Training: 2022-01-15 14:51:43,310-rank_id: 0
Training: 2022-01-15 14:52:04,169-: loss cosface
Training: 2022-01-15 14:52:04,170-: network r50
Training: 2022-01-15 14:52:04,170-: resume False
Training: 2022-01-15 14:52:04,170-: output work_dirs/webface42m_r50_lr01_pfc02_bs8k_16gpus
Training: 2022-01-15 14:52:04,170-: embedding_size 512
Training: 2022-01-15 14:52:04,170-: sample_rate 0.2
Training: 2022-01-15 14:52:04,171-: fp16 True
Training: 2022-01-15 14:52:04,171-: momentum 0.9
Training: 2022-01-15 14:52:04,171-: weight_decay 0.0005
Training: 2022-01-15 14:52:04,171-: batch_size 512
Training: 2022-01-15 14:52:04,171-: lr 0.6
Training: 2022-01-15 14:52:04,171-: dali True
Training: 2022-01-15 14:52:04,171-: verbose 10000
Training: 2022-01-15 14:52:04,171-: frequent 10
Training: 2022-01-15 14:52:04,171-: score None
Training: 2022-01-15 14:52:04,171-: rec /train_tmp/WebFace42M
Training: 2022-01-15 14:52:04,171-: num_classes 2059906
Training: 2022-01-15 14:52:04,171-: num_image 42474557
Training: 2022-01-15 14:52:04,171-: num_epoch 20
Training: 2022-01-15 14:52:04,171-: warmup_epoch 4
Training: 2022-01-15 14:52:04,171-: val_targets ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-01-15 14:52:04,171-: warmup_step 20736
Training: 2022-01-15 14:52:04,171-: total_step 103680
Training: 2022-01-15 14:53:11,153-Reducer buckets have been rebuilt in this iteration.
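Note on the step counts: the `warmup_step` and `total_step` values in the config dump can be reproduced from the other config values. The sketch below is a minimal check, not the training script's actual code. It assumes a world size of 16 GPUs (inferred only from the `bs8k_16gpus` suffix of the output directory) and a linear learning-rate warmup from 0 to `lr`; both are assumptions, and the names `world_size`, `steps_per_epoch`, and `warmup_lr` are hypothetical.

```python
# Values copied from the config dump in the log.
num_image = 42474557
batch_size = 512      # assumed to be the per-GPU batch size
num_epoch = 20
warmup_epoch = 4
base_lr = 0.6

# ASSUMPTION: 16 GPUs, inferred only from "bs8k_16gpus" in the output path.
world_size = 16

total_batch_size = batch_size * world_size           # global batch size
steps_per_epoch = num_image // total_batch_size      # incomplete last batch dropped
warmup_step = steps_per_epoch * warmup_epoch
total_step = steps_per_epoch * num_epoch


def warmup_lr(step: int) -> float:
    """ASSUMPTION: linear warmup from 0 to base_lr over warmup_step steps."""
    return base_lr * step / warmup_step


print(total_batch_size, steps_per_epoch, warmup_step, total_step)
# Compare against the LearningRate column at Global Step 20, 100, and 1000.
print(f"{warmup_lr(20):.4f} {warmup_lr(100):.4f} {warmup_lr(1000):.4f}")
```

Under these assumptions the computed values agree with the logged `warmup_step 20736` and `total_step 103680`, and the warmup values agree with the logged learning rates at steps 20, 100, and 1000 to four decimal places.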
Training: 2022-01-15 14:53:26,109-Speed 10538.89 samples/sec Loss 42.4980 LearningRate 0.0006 Epoch: 0 Global Step: 20 Fp16 Grad Scale: 16384 Required: 32 hours
Training: 2022-01-15 14:53:33,934-Speed 10472.32 samples/sec Loss 42.5048 LearningRate 0.0009 Epoch: 0 Global Step: 30 Fp16 Grad Scale: 16384 Required: 29 hours
Training: 2022-01-15 14:53:41,750-Speed 10482.47 samples/sec Loss 42.4908 LearningRate 0.0012 Epoch: 0 Global Step: 40 Fp16 Grad Scale: 16384 Required: 28 hours
Training: 2022-01-15 14:53:49,556-Speed 10496.88 samples/sec Loss 42.4908 LearningRate 0.0014 Epoch: 0 Global Step: 50 Fp16 Grad Scale: 16384 Required: 27 hours
Training: 2022-01-15 14:53:57,384-Speed 10468.01 samples/sec Loss 42.4911 LearningRate 0.0017 Epoch: 0 Global Step: 60 Fp16 Grad Scale: 16384 Required: 26 hours
Training: 2022-01-15 14:54:05,202-Speed 10479.91 samples/sec Loss 42.4814 LearningRate 0.0020 Epoch: 0 Global Step: 70 Fp16 Grad Scale: 16384 Required: 25 hours
Training: 2022-01-15 14:54:12,998-Speed 10509.65 samples/sec Loss 42.4677 LearningRate 0.0023 Epoch: 0 Global Step: 80 Fp16 Grad Scale: 16384 Required: 25 hours
Training: 2022-01-15 14:54:20,785-Speed 10522.36 samples/sec Loss 42.4678 LearningRate 0.0026 Epoch: 0 Global Step: 90 Fp16 Grad Scale: 16384 Required: 25 hours
Training: 2022-01-15 14:54:28,610-Speed 10471.12 samples/sec Loss 42.4525 LearningRate 0.0029 Epoch: 0 Global Step: 100 Fp16 Grad Scale: 16384 Required: 25 hours
Training: 2022-01-15 14:54:36,485-Speed 10404.45 samples/sec Loss 42.4509 LearningRate 0.0032 Epoch: 0 Global Step: 110 Fp16 Grad Scale: 32768 Required: 24 hours
Training: 2022-01-15 14:54:44,363-Speed 10400.99 samples/sec Loss 42.4429 LearningRate 0.0035 Epoch: 0 Global Step: 120 Fp16 Grad Scale: 32768 Required: 24 hours
Training: 2022-01-15 14:54:52,274-Speed 10357.30 samples/sec Loss 42.4529 LearningRate 0.0038 Epoch: 0 Global Step: 130 Fp16 Grad Scale: 32768 Required: 24 hours
Training: 2022-01-15 14:55:00,077-Speed 10500.45 samples/sec Loss 42.4124 LearningRate 0.0041 Epoch: 0 Global Step: 140 Fp16 Grad Scale: 32768 Required: 24 hours
Training: 2022-01-15 14:55:07,869-Speed 10514.96 samples/sec Loss 42.4071 LearningRate 0.0043 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 32768 Required: 24 hours
Training: 2022-01-15 14:55:15,684-Speed 10482.92 samples/sec Loss 42.3922 LearningRate 0.0046 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 32768 Required: 24 hours
Training: 2022-01-15 14:55:23,491-Speed 10495.91 samples/sec Loss 42.3490 LearningRate 0.0049 Epoch: 0 Global Step: 170 Fp16 Grad Scale: 32768 Required: 24 hours
Training: 2022-01-15 14:55:31,291-Speed 10504.49 samples/sec Loss 42.3321 LearningRate 0.0052 Epoch: 0 Global Step: 180 Fp16 Grad Scale: 32768 Required: 24 hours
Training: 2022-01-15 14:55:39,073-Speed 10529.22 samples/sec Loss 42.2759 LearningRate 0.0055 Epoch: 0 Global Step: 190 Fp16 Grad Scale: 32768 Required: 24 hours
Training: 2022-01-15 14:55:46,882-Speed 10495.15 samples/sec Loss 42.2436 LearningRate 0.0058 Epoch: 0 Global Step: 200 Fp16 Grad Scale: 32768 Required: 24 hours
Training: 2022-01-15 14:55:54,796-Speed 10352.03 samples/sec Loss 42.1667 LearningRate 0.0061 Epoch: 0 Global Step: 210 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:56:02,611-Speed 10484.63 samples/sec Loss 42.1119 LearningRate 0.0064 Epoch: 0 Global Step: 220 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:56:10,376-Speed 10552.01 samples/sec Loss 42.0450 LearningRate 0.0067 Epoch: 0 Global Step: 230 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:56:18,138-Speed 10555.45 samples/sec Loss 41.9606 LearningRate 0.0069 Epoch: 0 Global Step: 240 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:56:25,926-Speed 10522.16 samples/sec Loss 41.8878 LearningRate 0.0072 Epoch: 0 Global Step: 250 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:56:33,708-Speed 10527.91 samples/sec Loss 41.8050 LearningRate 0.0075 Epoch: 0 Global Step: 260 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:56:41,496-Speed 10521.71 samples/sec Loss 41.7209 LearningRate 0.0078 Epoch: 0 Global Step: 270 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:56:49,259-Speed 10555.28 samples/sec Loss 41.6181 LearningRate 0.0081 Epoch: 0 Global Step: 280 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:56:57,047-Speed 10519.80 samples/sec Loss 41.5033 LearningRate 0.0084 Epoch: 0 Global Step: 290 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:57:04,801-Speed 10565.80 samples/sec Loss 41.4112 LearningRate 0.0087 Epoch: 0 Global Step: 300 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:57:12,567-Speed 10551.28 samples/sec Loss 41.3446 LearningRate 0.0090 Epoch: 0 Global Step: 310 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:57:20,332-Speed 10552.23 samples/sec Loss 41.2324 LearningRate 0.0093 Epoch: 0 Global Step: 320 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:57:28,113-Speed 10529.74 samples/sec Loss 41.1418 LearningRate 0.0095 Epoch: 0 Global Step: 330 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:57:35,899-Speed 10521.81 samples/sec Loss 41.0666 LearningRate 0.0098 Epoch: 0 Global Step: 340 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:57:43,663-Speed 10553.62 samples/sec Loss 40.9541 LearningRate 0.0101 Epoch: 0 Global Step: 350 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:57:51,459-Speed 10510.38 samples/sec Loss 40.8803 LearningRate 0.0104 Epoch: 0 Global Step: 360 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:57:59,217-Speed 10560.87 samples/sec Loss 40.7930 LearningRate 0.0107 Epoch: 0 Global Step: 370 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:58:06,973-Speed 10563.53 samples/sec Loss 40.6926 LearningRate 0.0110 Epoch: 0 Global Step: 380 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:58:14,750-Speed 10535.93 samples/sec Loss 40.6391 LearningRate 0.0113 Epoch: 0 Global Step: 390 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:58:22,511-Speed 10556.58 samples/sec Loss 40.5538 LearningRate 0.0116 Epoch: 0 Global Step: 400 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:58:30,258-Speed 10576.90 samples/sec Loss 40.4857 LearningRate 0.0119 Epoch: 0 Global Step: 410 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:58:38,013-Speed 10565.16 samples/sec Loss 40.3895 LearningRate 0.0122 Epoch: 0 Global Step: 420 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:58:45,792-Speed 10536.41 samples/sec Loss 40.3293 LearningRate 0.0124 Epoch: 0 Global Step: 430 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:58:53,563-Speed 10543.34 samples/sec Loss 40.2559 LearningRate 0.0127 Epoch: 0 Global Step: 440 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:59:01,336-Speed 10542.08 samples/sec Loss 40.1917 LearningRate 0.0130 Epoch: 0 Global Step: 450 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:59:09,126-Speed 10517.69 samples/sec Loss 40.0976 LearningRate 0.0133 Epoch: 0 Global Step: 460 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:59:16,903-Speed 10541.06 samples/sec Loss 40.0501 LearningRate 0.0136 Epoch: 0 Global Step: 470 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:59:24,696-Speed 10513.07 samples/sec Loss 39.9828 LearningRate 0.0139 Epoch: 0 Global Step: 480 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:59:32,499-Speed 10500.02 samples/sec Loss 39.9063 LearningRate 0.0142 Epoch: 0 Global Step: 490 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:59:40,285-Speed 10522.96 samples/sec Loss 39.8531 LearningRate 0.0145 Epoch: 0 Global Step: 500 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:59:48,077-Speed 10516.17 samples/sec Loss 39.7836 LearningRate 0.0148 Epoch: 0 Global Step: 510 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 14:59:55,866-Speed 10518.50 samples/sec Loss 39.7285 LearningRate 0.0150 Epoch: 0 Global Step: 520 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 15:00:03,648-Speed 10528.19 samples/sec Loss 39.6652 LearningRate 0.0153 Epoch: 0 Global Step: 530 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 15:00:11,423-Speed 10538.79 samples/sec Loss 39.6111 LearningRate 0.0156 Epoch: 0 Global Step: 540 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 15:00:19,186-Speed 10555.47 samples/sec Loss 39.5778 LearningRate 0.0159 Epoch: 0 Global Step: 550 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 15:00:26,966-Speed 10530.43 samples/sec Loss 39.4876 LearningRate 0.0162 Epoch: 0 Global Step: 560 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 15:00:34,733-Speed 10549.76 samples/sec Loss 39.4386 LearningRate 0.0165 Epoch: 0 Global Step: 570 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 15:00:42,534-Speed 10502.25 samples/sec Loss 39.4006 LearningRate 0.0168 Epoch: 0 Global Step: 580 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 15:00:50,334-Speed 10504.16 samples/sec Loss 39.3441 LearningRate 0.0171 Epoch: 0 Global Step: 590 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 15:00:58,156-Speed 10475.69 samples/sec Loss 39.2938 LearningRate 0.0174 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 15:01:05,936-Speed 10530.61 samples/sec Loss 39.2431 LearningRate 0.0177 Epoch: 0 Global Step: 610 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 15:01:13,722-Speed 10523.59 samples/sec Loss 39.2166 LearningRate 0.0179 Epoch: 0 Global Step: 620 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-01-15 15:01:21,519-Speed 10510.90 samples/sec Loss 39.1755 LearningRate 0.0182 Epoch: 0 Global Step: 630 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-01-15 15:01:29,292-Speed 10540.97 samples/sec Loss 39.1499 LearningRate 0.0185 Epoch: 0 Global Step: 640 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-01-15 15:01:37,060-Speed 10547.51 samples/sec Loss 39.0989 LearningRate 0.0188 Epoch: 0 Global Step: 650 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-01-15 15:01:44,822-Speed 10555.47 samples/sec Loss 39.0727 LearningRate 0.0191 Epoch: 0 Global Step: 660 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-01-15 15:01:52,592-Speed 10544.66 samples/sec Loss 39.0130 LearningRate 0.0194 Epoch: 0 Global Step: 670 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-01-15 15:02:00,356-Speed 10553.55 samples/sec Loss 38.9975 LearningRate 0.0197 Epoch: 0 Global Step: 680 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-01-15 15:02:08,117-Speed 10556.65 samples/sec Loss 38.9499 LearningRate 0.0200 Epoch: 0 Global Step: 690 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-01-15 15:02:15,896-Speed 10533.14 samples/sec Loss 38.9410 LearningRate 0.0203 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-01-15 15:02:23,690-Speed 10512.85 samples/sec Loss 38.8997 LearningRate 0.0205 Epoch: 0 Global Step: 710 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-01-15 15:02:31,466-Speed 10536.29 samples/sec Loss 38.8655 LearningRate 0.0208 Epoch: 0 Global Step: 720 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 15:02:39,220-Speed 10571.25 samples/sec Loss 38.8419 LearningRate 0.0211 Epoch: 0 Global Step: 730 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 15:02:47,033-Speed 10485.95 samples/sec Loss 38.8172 LearningRate 0.0214 Epoch: 0 Global Step: 740 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 15:02:54,798-Speed 10552.53 samples/sec Loss 38.7806 LearningRate 0.0217 Epoch: 0 Global Step: 750 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-01-15 15:03:02,547-Speed 10581.24 samples/sec Loss 38.7802 LearningRate 0.0220 Epoch: 0 Global Step: 760 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-01-15 15:03:10,323-Speed 10536.87 samples/sec Loss 38.7691 LearningRate 0.0223 Epoch: 0 Global Step: 770 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-01-15 15:03:18,092-Speed 10546.67 samples/sec Loss 38.7287 LearningRate 0.0226 Epoch: 0 Global Step: 780 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-01-15 15:03:25,872-Speed 10534.40 samples/sec Loss 38.7063 LearningRate 0.0229 Epoch: 0 Global Step: 790 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-01-15 15:03:33,682-Speed 10490.84 samples/sec Loss 38.6898 LearningRate 0.0231 Epoch: 0 Global Step: 800 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-01-15 15:03:41,477-Speed 10511.84 samples/sec Loss 38.6850 LearningRate 0.0234 Epoch: 0 Global Step: 810 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-01-15 15:03:49,280-Speed 10501.17 samples/sec Loss 38.6583 LearningRate 0.0237 Epoch: 0 Global Step: 820 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-01-15 15:03:57,146-Speed 10416.54 samples/sec Loss 38.6302 LearningRate 0.0240 Epoch: 0 Global Step: 830 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-01-15 15:04:04,948-Speed 10501.66 samples/sec Loss 38.6152 LearningRate 0.0243 Epoch: 0 Global Step: 840 Fp16 Grad Scale: 32768 Required: 23 hours
Training: 2022-01-15 15:04:12,711-Speed 10554.32 samples/sec Loss 38.5989 LearningRate 0.0246 Epoch: 0 Global Step: 850 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:04:20,501-Speed 10517.37 samples/sec Loss 38.5868 LearningRate 0.0249 Epoch: 0 Global Step: 860 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:04:28,325-Speed 10472.15 samples/sec Loss 38.5808 LearningRate 0.0252 Epoch: 0 Global Step: 870 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:04:36,132-Speed 10496.18 samples/sec Loss 38.5699 LearningRate 0.0255 Epoch: 0 Global Step: 880 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:04:43,921-Speed 10519.62 samples/sec Loss 38.5524 LearningRate 0.0258 Epoch: 0 Global Step: 890 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:04:51,712-Speed 10516.82 samples/sec Loss 38.5357 LearningRate 0.0260 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:04:59,536-Speed 10473.58 samples/sec Loss 38.5317 LearningRate 0.0263 Epoch: 0 Global Step: 910 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:05:07,347-Speed 10493.15 samples/sec Loss 38.5047 LearningRate 0.0266 Epoch: 0 Global Step: 920 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:05:15,152-Speed 10497.44 samples/sec Loss 38.4897 LearningRate 0.0269 Epoch: 0 Global Step: 930 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:05:22,969-Speed 10481.83 samples/sec Loss 38.4940 LearningRate 0.0272 Epoch: 0 Global Step: 940 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:05:30,784-Speed 10485.88 samples/sec Loss 38.4776 LearningRate 0.0275 Epoch: 0 Global Step: 950 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:05:38,599-Speed 10485.32 samples/sec Loss 38.4629 LearningRate 0.0278 Epoch: 0 Global Step: 960 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:05:46,375-Speed 10537.75 samples/sec Loss 38.4521 LearningRate 0.0281 Epoch: 0 Global Step: 970 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:05:54,209-Speed 10457.97 samples/sec Loss 38.4495 LearningRate 0.0284 Epoch: 0 Global Step: 980 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:06:02,010-Speed 10504.63 samples/sec Loss 38.4249 LearningRate 0.0286 Epoch: 0 Global Step: 990 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:06:09,826-Speed 10483.87 samples/sec Loss 38.4208 LearningRate 0.0289 Epoch: 0 Global Step: 1000 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:06:17,685-Speed 10425.80 samples/sec Loss 38.4126 LearningRate 0.0292 Epoch: 0 Global Step: 1010 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:06:25,478-Speed 10513.39 samples/sec Loss 38.4139 LearningRate 0.0295 Epoch: 0 Global Step: 1020 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:06:33,277-Speed 10505.19 samples/sec Loss 38.4050 LearningRate 0.0298 Epoch: 0 Global Step: 1030 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:06:41,097-Speed 10477.67 samples/sec Loss 38.3907 LearningRate 0.0301 Epoch: 0 Global Step: 1040 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:06:48,887-Speed 10518.07 samples/sec Loss 38.3610 LearningRate 0.0304 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:06:56,718-Speed 10462.31 samples/sec Loss 38.3688 LearningRate 0.0307 Epoch: 0 Global Step: 1060 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:07:04,538-Speed 10477.88 samples/sec Loss 38.3434 LearningRate 0.0310 Epoch: 0 Global Step: 1070 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:07:12,355-Speed 10481.34 samples/sec Loss 38.3405 LearningRate 0.0312 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:07:20,193-Speed 10454.06 samples/sec Loss 38.3381 LearningRate 0.0315 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-01-15 15:07:28,019-Speed 10470.22 samples/sec Loss 38.3337 LearningRate 0.0318 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-01-15 15:07:35,853-Speed 10458.44 samples/sec Loss 38.3181 LearningRate 0.0321 Epoch: 0 Global Step: 1110 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-01-15 15:07:43,646-Speed 10514.32 samples/sec Loss 38.3201 LearningRate 0.0324 Epoch: 0 Global Step: 1120 Fp16 Grad Scale: 16384 Required: 22 hours
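Note on reading the progress entries: each entry carries the same fields (speed, loss, learning rate, epoch, global step, fp16 gradient scale, estimated hours remaining), so they can be pulled into records for plotting or for spotting gradient-scale changes. The sketch below is one possible parser, not part of the training code; the names `PROGRESS_RE`, `parse_progress`, and `scale_drops` are hypothetical. It rests on the assumption that a drop in `Fp16 Grad Scale` marks a step where the loss scaler backed off after an overflow, which the log itself does not state.

```python
import re

# Matches one progress entry; \s+ tolerates single or multiple spaces.
PROGRESS_RE = re.compile(
    r"Speed (?P<speed>[\d.]+) samples/sec\s+"
    r"Loss (?P<loss>[\d.]+)\s+"
    r"LearningRate (?P<lr>[\d.]+)\s+"
    r"Epoch: (?P<epoch>\d+)\s+"
    r"Global Step: (?P<step>\d+)\s+"
    r"Fp16 Grad Scale: (?P<scale>\d+)\s+"
    r"Required: (?P<hours>\d+) hours"
)


def parse_progress(text: str) -> list[dict]:
    """Return one record per complete progress entry found in text."""
    records = []
    for m in PROGRESS_RE.finditer(text):
        records.append({
            "speed": float(m["speed"]),
            "loss": float(m["loss"]),
            "lr": float(m["lr"]),
            "epoch": int(m["epoch"]),
            "step": int(m["step"]),
            "scale": int(m["scale"]),
            "hours": int(m["hours"]),
        })
    return records


def scale_drops(records: list[dict]) -> list[int]:
    """Steps whose grad scale is lower than in the previous record.

    ASSUMPTION: a drop indicates the loss scaler reduced its scale
    after an overflow; the log does not say this explicitly.
    """
    return [cur["step"] for prev, cur in zip(records, records[1:])
            if cur["scale"] < prev["scale"]]


# Two consecutive entries copied from the log.
sample = (
    "Training: 2022-01-15 15:07:12,355-Speed 10481.34 samples/sec Loss 38.3405 "
    "LearningRate 0.0312 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 32768 Required: 22 hours "
    "Training: 2022-01-15 15:07:20,193-Speed 10454.06 samples/sec Loss 38.3381 "
    "LearningRate 0.0315 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 16384 Required: 22 hours"
)
records = parse_progress(sample)
print(len(records), scale_drops(records))
```

Because the pattern is applied with `finditer` over the whole text, it works whether entries sit one per line or run together on a single line; a truncated final entry simply fails to match and is skipped.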
Training: 2022-01-15 15:07:51,433-Speed 10521.97 samples/sec Loss 38.3203 LearningRate 0.0327 Epoch: 0 Global Step: 1130 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-01-15 15:07:59,224-Speed 10516.49 samples/sec Loss 38.2987 LearningRate 0.0330 Epoch: 0 Global Step: 1140 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-01-15 15:08:07,021-Speed 10508.10 samples/sec Loss 38.3164 LearningRate 0.0333 Epoch: 0 Global Step: 1150 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-01-15 15:08:14,799-Speed 10533.34 samples/sec Loss 38.2953 LearningRate 0.0336 Epoch: 0 Global Step: 1160 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-01-15 15:08:22,592-Speed 10514.28 samples/sec Loss 38.2887 LearningRate 0.0339 Epoch: 0 Global Step: 1170 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-01-15 15:08:30,414-Speed 10475.18 samples/sec Loss 38.2872 LearningRate 0.0341 Epoch: 0 Global Step: 1180 Fp16 Grad Scale: 16384 Required: 22 hours
Training: 2022-01-15 15:08:38,219-Speed 10499.28 samples/sec Loss 38.2896 LearningRate 0.0344 Epoch: 0 Global Step: 1190 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:08:46,047-Speed 10466.41 samples/sec Loss 38.2829 LearningRate 0.0347 Epoch: 0 Global Step: 1200 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:08:53,833-Speed 10523.04 samples/sec Loss 38.2741 LearningRate 0.0350 Epoch: 0 Global Step: 1210 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:09:01,658-Speed 10470.89 samples/sec Loss 38.2550 LearningRate 0.0353 Epoch: 0 Global Step: 1220 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:09:09,481-Speed 10474.09 samples/sec Loss 38.2580 LearningRate 0.0356 Epoch: 0 Global Step: 1230 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:09:17,279-Speed 10506.92 samples/sec Loss 38.2631 LearningRate 0.0359 Epoch: 0 Global Step: 1240 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:09:25,064-Speed 10524.49 samples/sec Loss 38.2589 LearningRate 0.0362 Epoch: 0 Global Step: 1250 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:09:32,883-Speed 10478.86 samples/sec Loss 38.2594 LearningRate 0.0365 Epoch: 0 Global Step: 1260 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:09:40,717-Speed 10459.69 samples/sec Loss 38.2590 LearningRate 0.0367 Epoch: 0 Global Step: 1270 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:09:48,561-Speed 10444.52 samples/sec Loss 38.2660 LearningRate 0.0370 Epoch: 0 Global Step: 1280 Fp16 Grad Scale: 32768 Required: 22 hours
Training: 2022-01-15 15:09:56,401-Speed 10451.40 samples/sec Loss 38.2441 LearningRate 0.0373 Epoch: 0 Global Step: 1290 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:10:04,218-Speed 10480.84 samples/sec Loss 38.2531 LearningRate 0.0376 Epoch: 0 Global Step: 1300 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:10:12,055-Speed 10455.44 samples/sec Loss 38.2634 LearningRate 0.0379 Epoch: 0 Global Step: 1310 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:10:19,905-Speed 10438.29 samples/sec Loss 38.2514 LearningRate 0.0382 Epoch: 0 Global Step: 1320 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:10:27,719-Speed 10484.41 samples/sec Loss 38.2451 LearningRate 0.0385 Epoch: 0 Global Step: 1330 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:10:35,561-Speed 10448.89 samples/sec Loss 38.2466 LearningRate 0.0388 Epoch: 0 Global Step: 1340 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:10:43,407-Speed 10442.84 samples/sec Loss 38.2262 LearningRate 0.0391 Epoch: 0 Global Step: 1350 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:10:51,242-Speed 10456.23 samples/sec Loss 38.2228 LearningRate 0.0394 Epoch: 0 Global Step: 1360 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:10:59,067-Speed 10470.81 samples/sec Loss 38.2284 LearningRate 0.0396 Epoch: 0 Global Step: 1370 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:11:06,914-Speed 10440.98 samples/sec Loss 38.2449 LearningRate 0.0399 Epoch: 0 Global Step: 1380 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:11:14,746-Speed 10460.76 samples/sec Loss 38.2194 LearningRate 0.0402 Epoch: 0 Global Step: 1390 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:11:22,577-Speed 10463.14 samples/sec Loss 38.2370 LearningRate 0.0405 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:11:30,388-Speed 10490.00 samples/sec Loss 38.2302 LearningRate 0.0408 Epoch: 0 Global Step: 1410 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:11:38,166-Speed 10535.38 samples/sec Loss 38.2242 LearningRate 0.0411 Epoch: 0 Global Step: 1420 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:11:45,977-Speed 10492.37 samples/sec Loss 38.2461 LearningRate 0.0414 Epoch: 0 Global Step: 1430 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:11:53,777-Speed 10503.71 samples/sec Loss 38.2388 LearningRate 0.0417 Epoch: 0 Global Step: 1440 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:12:01,586-Speed 10496.64 samples/sec Loss 38.2325 LearningRate 0.0420 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:12:09,373-Speed 10522.35 samples/sec Loss 38.2465 LearningRate 0.0422 Epoch: 0 Global Step: 1460 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:12:17,194-Speed 10476.19 samples/sec Loss 38.2373 LearningRate 0.0425 Epoch: 0 Global Step: 1470 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:12:25,025-Speed 10463.27 samples/sec Loss 38.2213 LearningRate 0.0428 Epoch: 0 Global Step: 1480 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:12:32,836-Speed 10489.71 samples/sec Loss 38.2317 LearningRate 0.0431 Epoch: 0 Global Step: 1490 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:12:40,625-Speed 10519.25 samples/sec Loss 38.2165 LearningRate 0.0434 Epoch: 0 Global Step: 1500 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:12:48,418-Speed 10514.16 samples/sec Loss 38.2158 LearningRate 0.0437 Epoch: 0 Global Step: 1510 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:12:56,192-Speed 10537.78 samples/sec Loss 38.2174 LearningRate 0.0440 Epoch: 0 Global Step: 1520 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:13:03,982-Speed 10517.37 samples/sec Loss 38.2420 LearningRate 0.0443 Epoch: 0 Global Step: 1530 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:13:11,787-Speed 10497.41 samples/sec Loss 38.2211 LearningRate 0.0446 Epoch: 0 Global Step: 1540 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:13:19,604-Speed 10481.47 samples/sec Loss 38.2172 LearningRate 0.0448 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:13:27,405-Speed 10502.79 samples/sec Loss 38.2099 LearningRate 0.0451 Epoch: 0 Global Step: 1560 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:13:35,264-Speed 10425.89 samples/sec Loss 38.2204 LearningRate 0.0454 Epoch: 0 Global Step: 1570 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:13:43,069-Speed 10496.95 samples/sec Loss 38.2153 LearningRate 0.0457 Epoch: 0 Global Step: 1580 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:13:50,899-Speed 10463.62 samples/sec Loss 38.2126 LearningRate 0.0460 Epoch: 0 Global Step: 1590 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:13:58,688-Speed 10519.78 samples/sec Loss 38.2188 LearningRate 0.0463 Epoch: 0 Global Step: 1600 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:14:06,485-Speed 10508.26 samples/sec Loss 38.2258 LearningRate 0.0466 Epoch: 0 Global Step: 1610 Fp16 Grad Scale: 65536 Required: 22 hours
Training: 2022-01-15 15:14:14,342-Speed 10426.95 samples/sec Loss 38.2314 LearningRate 0.0469 Epoch: 0 Global Step: 1620 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:14:22,165-Speed 10473.64 samples/sec Loss 38.2210 LearningRate 0.0472 Epoch: 0 Global Step: 1630 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:14:29,995-Speed 10465.60 samples/sec Loss 38.2138 LearningRate 0.0475 Epoch: 0 Global Step: 1640 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:14:37,814-Speed 10478.86 samples/sec Loss 38.2127 LearningRate 0.0477 Epoch: 0 Global Step: 1650 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:14:45,603-Speed 10520.92 samples/sec Loss 38.2195 LearningRate 0.0480 Epoch: 0 Global Step: 1660 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:14:53,401-Speed 10507.92 samples/sec Loss 38.1962 LearningRate 0.0483 Epoch: 0 Global Step: 1670 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:15:01,200-Speed 10506.55 samples/sec Loss 38.2052 LearningRate 0.0486 Epoch: 0 Global Step: 1680 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:15:08,990-Speed 10518.17 samples/sec Loss 38.2062 LearningRate 0.0489 Epoch: 0 Global Step: 1690 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:15:16,792-Speed 10501.04 samples/sec Loss 38.2024 LearningRate 0.0492 Epoch: 0 Global Step: 1700 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:15:24,608-Speed 10481.50 samples/sec Loss 38.2109 LearningRate 0.0495 Epoch: 0 Global Step: 1710 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:15:32,405-Speed 10508.49 samples/sec Loss 38.1828 LearningRate 0.0498 Epoch: 0 Global Step: 1720 Fp16 Grad Scale: 262144 Required: 22 hours
Training: 2022-01-15 15:15:40,192-Speed 10522.60 samples/sec Loss 38.1993 LearningRate 0.0501 Epoch: 0 Global Step: 1730 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:15:47,972-Speed 10529.90 samples/sec Loss 38.1804 LearningRate 0.0503 Epoch: 0 Global Step: 1740 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:15:55,779-Speed 10493.89 samples/sec Loss 38.1848 LearningRate 0.0506 Epoch: 0 Global Step: 1750 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:16:03,576-Speed 10508.88 samples/sec Loss 38.1711 LearningRate 0.0509 Epoch: 0 Global Step: 1760 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:16:11,365-Speed 10517.99 samples/sec Loss 38.1686 LearningRate 0.0512 Epoch: 0 Global Step: 1770 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:16:19,182-Speed 10480.77 samples/sec Loss 38.1582 LearningRate 0.0515 Epoch: 0 Global Step: 1780 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:16:26,978-Speed 10509.48 samples/sec Loss 38.1432 LearningRate 0.0518 Epoch: 0 Global Step: 1790 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:16:34,801-Speed 10474.21 samples/sec Loss 38.1435 LearningRate 0.0521 Epoch: 0 Global Step: 1800 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:16:42,583-Speed 10527.10 samples/sec Loss 38.1528 LearningRate 0.0524 Epoch: 0 Global Step: 1810 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:16:50,387-Speed 10499.77 samples/sec Loss 38.1379 LearningRate 0.0527 Epoch: 0 Global Step: 1820 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:16:58,187-Speed 10505.29 samples/sec Loss 38.1374 LearningRate 0.0530 Epoch: 0 Global Step: 1830 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:17:05,974-Speed 10522.03 samples/sec Loss 38.1258 LearningRate 0.0532 Epoch: 0 Global Step: 1840 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:17:13,782-Speed 10493.58 samples/sec Loss 38.1100 LearningRate 0.0535 Epoch: 0 Global Step: 1850 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:17:21,576-Speed 10512.19 samples/sec Loss 38.0913 LearningRate 0.0538 Epoch: 0 Global Step: 1860 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:17:29,404-Speed 10468.36 samples/sec Loss 38.0832 LearningRate 0.0541 Epoch: 0 Global Step: 1870 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:17:37,227-Speed 10472.51 samples/sec Loss 38.0898 LearningRate 0.0544 Epoch: 0 Global Step: 1880 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:17:45,027-Speed 10505.35 samples/sec Loss 38.0614 LearningRate 0.0547 Epoch: 0 Global Step: 1890 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:17:52,823-Speed 10509.05 samples/sec Loss 38.0515 LearningRate 0.0550 Epoch: 0 Global Step: 1900 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:18:00,615-Speed 10520.08 samples/sec Loss 38.0361 LearningRate 0.0553 Epoch: 0 Global Step: 1910 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:18:08,430-Speed 10484.27 samples/sec Loss 38.0311 LearningRate 0.0556 Epoch: 0 Global Step: 1920 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:18:16,231-Speed 10502.84 samples/sec Loss 38.0174 LearningRate 0.0558 Epoch: 0 Global Step: 1930 Fp16 Grad Scale: 262144 Required: 22 hours
Training: 2022-01-15 15:18:24,025-Speed 10511.93 samples/sec Loss 37.9984 LearningRate 0.0561 Epoch: 0 Global Step: 1940 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:18:31,847-Speed 10474.63 samples/sec Loss 38.0002 LearningRate 0.0564 Epoch: 0 Global Step: 1950 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:18:39,679-Speed 10461.09 samples/sec Loss 37.9825 LearningRate 0.0567 Epoch: 0 Global Step: 1960 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:18:47,472-Speed 10513.76 samples/sec Loss 37.9691 LearningRate 0.0570 Epoch: 0 Global Step: 1970 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:18:55,260-Speed 10519.54 samples/sec Loss 37.9501 LearningRate 0.0573 Epoch: 0 Global Step: 1980 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:19:03,064-Speed 10499.98 samples/sec Loss 37.9256 LearningRate 0.0576 Epoch: 0 Global Step: 1990 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:19:10,867-Speed 10500.10 samples/sec Loss 37.9307 LearningRate 0.0579 Epoch: 0 Global Step: 2000 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:19:18,704-Speed 10460.23 samples/sec Loss 37.8758 LearningRate 0.0582 Epoch: 0 Global Step: 2010 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:19:26,510-Speed 10495.70 samples/sec Loss 37.8726 LearningRate 0.0584 Epoch: 0 Global Step: 2020 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:19:34,325-Speed 10485.33 samples/sec Loss 37.8509 LearningRate 0.0587 Epoch: 0 Global Step: 2030 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:19:42,114-Speed 10519.45 samples/sec Loss 37.8367 LearningRate 0.0590 Epoch: 0 Global Step: 2040 Fp16 Grad Scale: 262144 Required: 22 hours
Training: 2022-01-15 15:19:49,918-Speed 10498.36 samples/sec Loss 37.8342 LearningRate 0.0593 Epoch: 0 Global Step: 2050 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:19:57,739-Speed 10476.44 samples/sec Loss 37.7983 LearningRate 0.0596 Epoch: 0 Global Step: 2060 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:20:05,579-Speed 10449.43 samples/sec Loss 37.7672 LearningRate 0.0599 Epoch: 0 Global Step: 2070 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:20:13,373-Speed 10513.28 samples/sec Loss 37.7556 LearningRate 0.0602 Epoch: 0 Global Step: 2080 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:20:21,192-Speed 10479.21 samples/sec Loss 37.7510 LearningRate 0.0605 Epoch: 0 Global Step: 2090 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:20:29,076-Speed 10391.54 samples/sec Loss 37.6926 LearningRate 0.0608 Epoch: 0 Global Step: 2100 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:20:36,894-Speed 10479.97 samples/sec Loss 37.6947 LearningRate 0.0611 Epoch: 0 Global Step: 2110 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:20:44,756-Speed 10422.46 samples/sec Loss 37.6644 LearningRate 0.0613 Epoch: 0 Global Step: 2120 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:20:52,589-Speed 10462.30 samples/sec Loss 37.6445 LearningRate 0.0616 Epoch: 0 Global Step: 2130 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:21:00,455-Speed 10415.41 samples/sec Loss 37.6335 LearningRate 0.0619 Epoch: 0 Global Step: 2140 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:21:08,276-Speed 10477.50 samples/sec Loss 37.6002 LearningRate 0.0622 Epoch: 0 Global Step: 2150 Fp16 Grad Scale: 262144 Required: 22 hours
Training: 2022-01-15 15:21:16,089-Speed 10486.26 samples/sec Loss 37.5671 LearningRate 0.0625 Epoch: 0 Global Step: 2160 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:21:23,907-Speed 10480.15 samples/sec Loss 37.5524 LearningRate 0.0628 Epoch: 0 Global Step: 2170 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:21:31,726-Speed 10480.34 samples/sec Loss 37.5635 LearningRate 0.0631 Epoch: 0 Global Step: 2180 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:21:39,593-Speed 10414.93 samples/sec Loss 37.5049 LearningRate 0.0634 Epoch: 0 Global Step: 2190 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:21:47,412-Speed 10480.15 samples/sec Loss 37.4907 LearningRate 0.0637 Epoch: 0 Global Step: 2200 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:21:55,216-Speed 10499.33 samples/sec Loss 37.4874 LearningRate 0.0639 Epoch: 0 Global Step: 2210 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-01-15 15:22:03,039-Speed 10475.94 samples/sec Loss 37.4474 LearningRate 0.0642 Epoch: 0 Global Step: 2220 Fp16 Grad
Scale: 131072 Required: 22 hours Training: 2022-01-15 15:22:10,832-Speed 10515.31 samples/sec Loss 37.4129 LearningRate 0.0645 Epoch: 0 Global Step: 2230 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:22:18,643-Speed 10489.27 samples/sec Loss 37.3853 LearningRate 0.0648 Epoch: 0 Global Step: 2240 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:22:26,453-Speed 10491.88 samples/sec Loss 37.3638 LearningRate 0.0651 Epoch: 0 Global Step: 2250 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:22:34,245-Speed 10516.23 samples/sec Loss 37.3398 LearningRate 0.0654 Epoch: 0 Global Step: 2260 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:22:42,013-Speed 10546.88 samples/sec Loss 37.3354 LearningRate 0.0657 Epoch: 0 Global Step: 2270 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:22:49,816-Speed 10500.21 samples/sec Loss 37.2706 LearningRate 0.0660 Epoch: 0 Global Step: 2280 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:22:57,643-Speed 10467.79 samples/sec Loss 37.2561 LearningRate 0.0663 Epoch: 0 Global Step: 2290 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:23:05,447-Speed 10498.59 samples/sec Loss 37.2284 LearningRate 0.0666 Epoch: 0 Global Step: 2300 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:23:13,284-Speed 10455.31 samples/sec Loss 37.1947 LearningRate 0.0668 Epoch: 0 Global Step: 2310 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:23:21,089-Speed 10502.41 samples/sec Loss 37.1694 LearningRate 0.0671 Epoch: 0 Global Step: 2320 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:23:28,883-Speed 10512.93 samples/sec Loss 37.1521 LearningRate 0.0674 Epoch: 0 Global Step: 2330 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:23:36,697-Speed 10486.03 samples/sec Loss 37.1284 LearningRate 0.0677 Epoch: 0 Global Step: 2340 Fp16 Grad Scale: 131072 Required: 22 
hours Training: 2022-01-15 15:23:44,511-Speed 10486.82 samples/sec Loss 37.0817 LearningRate 0.0680 Epoch: 0 Global Step: 2350 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:23:52,323-Speed 10487.49 samples/sec Loss 37.0449 LearningRate 0.0683 Epoch: 0 Global Step: 2360 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:24:00,103-Speed 10533.22 samples/sec Loss 37.0263 LearningRate 0.0686 Epoch: 0 Global Step: 2370 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:24:07,914-Speed 10489.56 samples/sec Loss 37.0215 LearningRate 0.0689 Epoch: 0 Global Step: 2380 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:24:15,737-Speed 10472.67 samples/sec Loss 36.9717 LearningRate 0.0692 Epoch: 0 Global Step: 2390 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:24:23,545-Speed 10494.06 samples/sec Loss 36.9356 LearningRate 0.0694 Epoch: 0 Global Step: 2400 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:24:31,349-Speed 10499.00 samples/sec Loss 36.8997 LearningRate 0.0697 Epoch: 0 Global Step: 2410 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:24:39,151-Speed 10502.02 samples/sec Loss 36.9067 LearningRate 0.0700 Epoch: 0 Global Step: 2420 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:24:46,918-Speed 10548.83 samples/sec Loss 36.8396 LearningRate 0.0703 Epoch: 0 Global Step: 2430 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:24:54,702-Speed 10525.76 samples/sec Loss 36.8042 LearningRate 0.0706 Epoch: 0 Global Step: 2440 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:25:02,530-Speed 10467.31 samples/sec Loss 36.7970 LearningRate 0.0709 Epoch: 0 Global Step: 2450 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:25:10,348-Speed 10479.66 samples/sec Loss 36.7402 LearningRate 0.0712 Epoch: 0 Global Step: 2460 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 
15:25:18,135-Speed 10522.50 samples/sec Loss 36.7494 LearningRate 0.0715 Epoch: 0 Global Step: 2470 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:25:25,950-Speed 10484.56 samples/sec Loss 36.6892 LearningRate 0.0718 Epoch: 0 Global Step: 2480 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:25:33,741-Speed 10517.48 samples/sec Loss 36.6800 LearningRate 0.0720 Epoch: 0 Global Step: 2490 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:25:41,530-Speed 10519.84 samples/sec Loss 36.6308 LearningRate 0.0723 Epoch: 0 Global Step: 2500 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:25:49,346-Speed 10482.19 samples/sec Loss 36.5907 LearningRate 0.0726 Epoch: 0 Global Step: 2510 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:25:57,138-Speed 10516.35 samples/sec Loss 36.5710 LearningRate 0.0729 Epoch: 0 Global Step: 2520 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:26:04,959-Speed 10475.49 samples/sec Loss 36.5179 LearningRate 0.0732 Epoch: 0 Global Step: 2530 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:26:12,750-Speed 10516.80 samples/sec Loss 36.4886 LearningRate 0.0735 Epoch: 0 Global Step: 2540 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:26:20,565-Speed 10484.18 samples/sec Loss 36.4848 LearningRate 0.0738 Epoch: 0 Global Step: 2550 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:26:28,367-Speed 10502.45 samples/sec Loss 36.4192 LearningRate 0.0741 Epoch: 0 Global Step: 2560 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:26:36,137-Speed 10544.85 samples/sec Loss 36.3733 LearningRate 0.0744 Epoch: 0 Global Step: 2570 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:26:43,925-Speed 10520.58 samples/sec Loss 36.3664 LearningRate 0.0747 Epoch: 0 Global Step: 2580 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:26:51,737-Speed 10488.58 
samples/sec Loss 36.3128 LearningRate 0.0749 Epoch: 0 Global Step: 2590 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:26:59,540-Speed 10500.14 samples/sec Loss 36.2821 LearningRate 0.0752 Epoch: 0 Global Step: 2600 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:27:07,341-Speed 10502.79 samples/sec Loss 36.2552 LearningRate 0.0755 Epoch: 0 Global Step: 2610 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:27:15,156-Speed 10483.95 samples/sec Loss 36.2089 LearningRate 0.0758 Epoch: 0 Global Step: 2620 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:27:22,998-Speed 10446.35 samples/sec Loss 36.2069 LearningRate 0.0761 Epoch: 0 Global Step: 2630 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:27:30,905-Speed 10362.62 samples/sec Loss 36.1370 LearningRate 0.0764 Epoch: 0 Global Step: 2640 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:27:38,719-Speed 10485.85 samples/sec Loss 36.0996 LearningRate 0.0767 Epoch: 0 Global Step: 2650 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:27:46,534-Speed 10482.96 samples/sec Loss 36.0538 LearningRate 0.0770 Epoch: 0 Global Step: 2660 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:27:54,379-Speed 10445.79 samples/sec Loss 36.0217 LearningRate 0.0773 Epoch: 0 Global Step: 2670 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:28:02,305-Speed 10339.66 samples/sec Loss 36.0343 LearningRate 0.0775 Epoch: 0 Global Step: 2680 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:28:10,122-Speed 10481.53 samples/sec Loss 35.9923 LearningRate 0.0778 Epoch: 0 Global Step: 2690 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:28:17,963-Speed 10449.58 samples/sec Loss 35.9154 LearningRate 0.0781 Epoch: 0 Global Step: 2700 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:28:25,801-Speed 10452.70 samples/sec Loss 35.9000 
LearningRate 0.0784 Epoch: 0 Global Step: 2710 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:28:33,653-Speed 10436.45 samples/sec Loss 35.8745 LearningRate 0.0787 Epoch: 0 Global Step: 2720 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:28:41,465-Speed 10488.95 samples/sec Loss 35.7826 LearningRate 0.0790 Epoch: 0 Global Step: 2730 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:28:49,239-Speed 10539.32 samples/sec Loss 35.7663 LearningRate 0.0793 Epoch: 0 Global Step: 2740 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:28:57,019-Speed 10531.22 samples/sec Loss 35.7742 LearningRate 0.0796 Epoch: 0 Global Step: 2750 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:29:04,816-Speed 10507.66 samples/sec Loss 35.6836 LearningRate 0.0799 Epoch: 0 Global Step: 2760 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:29:12,597-Speed 10530.61 samples/sec Loss 35.6889 LearningRate 0.0802 Epoch: 0 Global Step: 2770 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:29:20,388-Speed 10517.57 samples/sec Loss 35.6139 LearningRate 0.0804 Epoch: 0 Global Step: 2780 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:29:28,179-Speed 10515.48 samples/sec Loss 35.6009 LearningRate 0.0807 Epoch: 0 Global Step: 2790 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:29:35,968-Speed 10518.82 samples/sec Loss 35.5053 LearningRate 0.0810 Epoch: 0 Global Step: 2800 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:29:43,806-Speed 10456.53 samples/sec Loss 35.4856 LearningRate 0.0813 Epoch: 0 Global Step: 2810 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:29:51,638-Speed 10462.49 samples/sec Loss 35.4605 LearningRate 0.0816 Epoch: 0 Global Step: 2820 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:29:59,442-Speed 10498.56 samples/sec Loss 35.4022 LearningRate 0.0819 Epoch: 0 
Global Step: 2830 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:30:07,206-Speed 10554.12 samples/sec Loss 35.3554 LearningRate 0.0822 Epoch: 0 Global Step: 2840 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:30:14,990-Speed 10531.96 samples/sec Loss 35.3110 LearningRate 0.0825 Epoch: 0 Global Step: 2850 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:30:22,811-Speed 10476.64 samples/sec Loss 35.2568 LearningRate 0.0828 Epoch: 0 Global Step: 2860 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:30:30,626-Speed 10484.59 samples/sec Loss 35.2275 LearningRate 0.0830 Epoch: 0 Global Step: 2870 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:30:38,461-Speed 10457.73 samples/sec Loss 35.1963 LearningRate 0.0833 Epoch: 0 Global Step: 2880 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:30:46,302-Speed 10449.24 samples/sec Loss 35.1357 LearningRate 0.0836 Epoch: 0 Global Step: 2890 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:30:54,085-Speed 10527.99 samples/sec Loss 35.0861 LearningRate 0.0839 Epoch: 0 Global Step: 2900 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:31:01,875-Speed 10517.28 samples/sec Loss 35.0436 LearningRate 0.0842 Epoch: 0 Global Step: 2910 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:31:09,666-Speed 10516.98 samples/sec Loss 34.9981 LearningRate 0.0845 Epoch: 0 Global Step: 2920 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:31:17,435-Speed 10545.17 samples/sec Loss 34.9496 LearningRate 0.0848 Epoch: 0 Global Step: 2930 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:31:25,245-Speed 10492.11 samples/sec Loss 34.9343 LearningRate 0.0851 Epoch: 0 Global Step: 2940 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:31:33,058-Speed 10488.17 samples/sec Loss 34.8839 LearningRate 0.0854 Epoch: 0 Global Step: 2950 Fp16 Grad 
Scale: 131072 Required: 22 hours Training: 2022-01-15 15:31:40,865-Speed 10495.47 samples/sec Loss 34.8213 LearningRate 0.0856 Epoch: 0 Global Step: 2960 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:31:48,669-Speed 10500.00 samples/sec Loss 34.7541 LearningRate 0.0859 Epoch: 0 Global Step: 2970 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:31:56,517-Speed 10444.70 samples/sec Loss 34.7399 LearningRate 0.0862 Epoch: 0 Global Step: 2980 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:32:04,341-Speed 10473.00 samples/sec Loss 34.7122 LearningRate 0.0865 Epoch: 0 Global Step: 2990 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:32:12,142-Speed 10507.68 samples/sec Loss 34.6536 LearningRate 0.0868 Epoch: 0 Global Step: 3000 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:32:19,944-Speed 10502.26 samples/sec Loss 34.5941 LearningRate 0.0871 Epoch: 0 Global Step: 3010 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:32:27,770-Speed 10470.14 samples/sec Loss 34.5121 LearningRate 0.0874 Epoch: 0 Global Step: 3020 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:32:35,562-Speed 10514.26 samples/sec Loss 34.4775 LearningRate 0.0877 Epoch: 0 Global Step: 3030 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:32:43,356-Speed 10512.73 samples/sec Loss 34.4509 LearningRate 0.0880 Epoch: 0 Global Step: 3040 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:32:51,160-Speed 10499.57 samples/sec Loss 34.3857 LearningRate 0.0883 Epoch: 0 Global Step: 3050 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:32:58,977-Speed 10483.51 samples/sec Loss 34.3196 LearningRate 0.0885 Epoch: 0 Global Step: 3060 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:33:06,812-Speed 10457.88 samples/sec Loss 34.2863 LearningRate 0.0888 Epoch: 0 Global Step: 3070 Fp16 Grad Scale: 262144 Required: 22 
hours Training: 2022-01-15 15:33:14,619-Speed 10495.16 samples/sec Loss 34.2116 LearningRate 0.0891 Epoch: 0 Global Step: 3080 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:33:22,464-Speed 10444.87 samples/sec Loss 34.2060 LearningRate 0.0894 Epoch: 0 Global Step: 3090 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:33:30,316-Speed 10435.27 samples/sec Loss 34.1100 LearningRate 0.0897 Epoch: 0 Global Step: 3100 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:33:38,127-Speed 10490.78 samples/sec Loss 34.0533 LearningRate 0.0900 Epoch: 0 Global Step: 3110 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:33:45,939-Speed 10488.72 samples/sec Loss 34.0500 LearningRate 0.0903 Epoch: 0 Global Step: 3120 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:33:53,745-Speed 10496.52 samples/sec Loss 33.9818 LearningRate 0.0906 Epoch: 0 Global Step: 3130 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:34:01,541-Speed 10510.39 samples/sec Loss 33.9360 LearningRate 0.0909 Epoch: 0 Global Step: 3140 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:34:09,363-Speed 10474.12 samples/sec Loss 33.8508 LearningRate 0.0911 Epoch: 0 Global Step: 3150 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:34:17,164-Speed 10502.07 samples/sec Loss 33.8342 LearningRate 0.0914 Epoch: 0 Global Step: 3160 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:34:24,967-Speed 10501.04 samples/sec Loss 33.7779 LearningRate 0.0917 Epoch: 0 Global Step: 3170 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:34:32,766-Speed 10509.29 samples/sec Loss 33.7098 LearningRate 0.0920 Epoch: 0 Global Step: 3180 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:34:40,581-Speed 10484.72 samples/sec Loss 33.6542 LearningRate 0.0923 Epoch: 0 Global Step: 3190 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 
15:34:48,375-Speed 10511.01 samples/sec Loss 33.6219 LearningRate 0.0926 Epoch: 0 Global Step: 3200 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:34:56,159-Speed 10527.03 samples/sec Loss 33.5585 LearningRate 0.0929 Epoch: 0 Global Step: 3210 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:35:03,932-Speed 10541.08 samples/sec Loss 33.4974 LearningRate 0.0932 Epoch: 0 Global Step: 3220 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:35:11,739-Speed 10496.01 samples/sec Loss 33.4497 LearningRate 0.0935 Epoch: 0 Global Step: 3230 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:35:19,517-Speed 10534.23 samples/sec Loss 33.4034 LearningRate 0.0938 Epoch: 0 Global Step: 3240 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:35:27,333-Speed 10482.94 samples/sec Loss 33.3321 LearningRate 0.0940 Epoch: 0 Global Step: 3250 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:35:35,124-Speed 10516.70 samples/sec Loss 33.2796 LearningRate 0.0943 Epoch: 0 Global Step: 3260 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:35:42,900-Speed 10540.48 samples/sec Loss 33.2219 LearningRate 0.0946 Epoch: 0 Global Step: 3270 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:35:50,686-Speed 10527.31 samples/sec Loss 33.2085 LearningRate 0.0949 Epoch: 0 Global Step: 3280 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:35:58,466-Speed 10531.65 samples/sec Loss 33.1250 LearningRate 0.0952 Epoch: 0 Global Step: 3290 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:36:06,261-Speed 10511.97 samples/sec Loss 33.0747 LearningRate 0.0955 Epoch: 0 Global Step: 3300 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:36:14,086-Speed 10471.75 samples/sec Loss 33.0011 LearningRate 0.0958 Epoch: 0 Global Step: 3310 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:36:21,877-Speed 10516.99 
samples/sec Loss 32.9442 LearningRate 0.0961 Epoch: 0 Global Step: 3320 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:36:29,677-Speed 10504.55 samples/sec Loss 32.9245 LearningRate 0.0964 Epoch: 0 Global Step: 3330 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:36:37,474-Speed 10509.25 samples/sec Loss 32.8794 LearningRate 0.0966 Epoch: 0 Global Step: 3340 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:36:45,302-Speed 10466.73 samples/sec Loss 32.7904 LearningRate 0.0969 Epoch: 0 Global Step: 3350 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:36:53,106-Speed 10499.43 samples/sec Loss 32.7350 LearningRate 0.0972 Epoch: 0 Global Step: 3360 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:37:00,916-Speed 10491.81 samples/sec Loss 32.7126 LearningRate 0.0975 Epoch: 0 Global Step: 3370 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:37:08,720-Speed 10500.33 samples/sec Loss 32.6052 LearningRate 0.0978 Epoch: 0 Global Step: 3380 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:37:16,513-Speed 10514.29 samples/sec Loss 32.5599 LearningRate 0.0981 Epoch: 0 Global Step: 3390 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:37:24,327-Speed 10486.17 samples/sec Loss 32.5151 LearningRate 0.0984 Epoch: 0 Global Step: 3400 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:37:32,145-Speed 10480.00 samples/sec Loss 32.4110 LearningRate 0.0987 Epoch: 0 Global Step: 3410 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:37:39,952-Speed 10495.50 samples/sec Loss 32.4042 LearningRate 0.0990 Epoch: 0 Global Step: 3420 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:37:47,769-Speed 10481.67 samples/sec Loss 32.3661 LearningRate 0.0992 Epoch: 0 Global Step: 3430 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:37:55,580-Speed 10493.29 samples/sec Loss 32.2395 
LearningRate 0.0995 Epoch: 0 Global Step: 3440 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:38:03,364-Speed 10526.15 samples/sec Loss 32.2077 LearningRate 0.0998 Epoch: 0 Global Step: 3450 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:38:11,214-Speed 10436.93 samples/sec Loss 32.1870 LearningRate 0.1001 Epoch: 0 Global Step: 3460 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:38:18,995-Speed 10530.28 samples/sec Loss 32.0700 LearningRate 0.1004 Epoch: 0 Global Step: 3470 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:38:26,826-Speed 10467.89 samples/sec Loss 32.0359 LearningRate 0.1007 Epoch: 0 Global Step: 3480 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:38:34,619-Speed 10513.45 samples/sec Loss 31.9767 LearningRate 0.1010 Epoch: 0 Global Step: 3490 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:38:42,418-Speed 10505.80 samples/sec Loss 31.9094 LearningRate 0.1013 Epoch: 0 Global Step: 3500 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:38:50,201-Speed 10528.22 samples/sec Loss 31.8932 LearningRate 0.1016 Epoch: 0 Global Step: 3510 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:38:57,987-Speed 10523.75 samples/sec Loss 31.8085 LearningRate 0.1019 Epoch: 0 Global Step: 3520 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:39:05,765-Speed 10535.46 samples/sec Loss 31.7308 LearningRate 0.1021 Epoch: 0 Global Step: 3530 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:39:13,562-Speed 10509.49 samples/sec Loss 31.6538 LearningRate 0.1024 Epoch: 0 Global Step: 3540 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:39:21,372-Speed 10490.85 samples/sec Loss 31.5949 LearningRate 0.1027 Epoch: 0 Global Step: 3550 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:39:29,213-Speed 10450.21 samples/sec Loss 31.5747 LearningRate 0.1030 Epoch: 0 
Global Step: 3560 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:39:37,035-Speed 10474.34 samples/sec Loss 31.4318 LearningRate 0.1033 Epoch: 0 Global Step: 3570 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:39:44,822-Speed 10521.69 samples/sec Loss 31.4191 LearningRate 0.1036 Epoch: 0 Global Step: 3580 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:39:52,654-Speed 10462.61 samples/sec Loss 31.3333 LearningRate 0.1039 Epoch: 0 Global Step: 3590 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:40:00,499-Speed 10443.65 samples/sec Loss 31.2756 LearningRate 0.1042 Epoch: 0 Global Step: 3600 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:40:08,268-Speed 10546.84 samples/sec Loss 31.2849 LearningRate 0.1045 Epoch: 0 Global Step: 3610 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:40:16,082-Speed 10485.18 samples/sec Loss 31.1584 LearningRate 0.1047 Epoch: 0 Global Step: 3620 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:40:23,887-Speed 10497.70 samples/sec Loss 31.1386 LearningRate 0.1050 Epoch: 0 Global Step: 3630 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:40:31,691-Speed 10497.67 samples/sec Loss 31.1046 LearningRate 0.1053 Epoch: 0 Global Step: 3640 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:40:39,522-Speed 10463.22 samples/sec Loss 31.0417 LearningRate 0.1056 Epoch: 0 Global Step: 3650 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:40:47,306-Speed 10526.62 samples/sec Loss 30.9521 LearningRate 0.1059 Epoch: 0 Global Step: 3660 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:40:55,098-Speed 10514.12 samples/sec Loss 30.8901 LearningRate 0.1062 Epoch: 0 Global Step: 3670 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:41:02,929-Speed 10464.51 samples/sec Loss 30.8120 LearningRate 0.1065 Epoch: 0 Global Step: 3680 Fp16 Grad 
Scale: 131072 Required: 22 hours Training: 2022-01-15 15:41:10,714-Speed 10524.27 samples/sec Loss 30.7770 LearningRate 0.1068 Epoch: 0 Global Step: 3690 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:41:18,522-Speed 10492.70 samples/sec Loss 30.7184 LearningRate 0.1071 Epoch: 0 Global Step: 3700 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:41:26,308-Speed 10524.30 samples/sec Loss 30.5893 LearningRate 0.1073 Epoch: 0 Global Step: 3710 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:41:34,115-Speed 10494.15 samples/sec Loss 30.6073 LearningRate 0.1076 Epoch: 0 Global Step: 3720 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:41:41,942-Speed 10468.96 samples/sec Loss 30.5308 LearningRate 0.1079 Epoch: 0 Global Step: 3730 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:41:49,788-Speed 10443.70 samples/sec Loss 30.4862 LearningRate 0.1082 Epoch: 0 Global Step: 3740 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:41:57,577-Speed 10518.82 samples/sec Loss 30.3750 LearningRate 0.1085 Epoch: 0 Global Step: 3750 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:42:05,377-Speed 10504.17 samples/sec Loss 30.3702 LearningRate 0.1088 Epoch: 0 Global Step: 3760 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:42:13,202-Speed 10472.70 samples/sec Loss 30.2370 LearningRate 0.1091 Epoch: 0 Global Step: 3770 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:42:21,010-Speed 10493.14 samples/sec Loss 30.2190 LearningRate 0.1094 Epoch: 0 Global Step: 3780 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:42:28,841-Speed 10463.30 samples/sec Loss 30.1498 LearningRate 0.1097 Epoch: 0 Global Step: 3790 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:42:36,639-Speed 10506.50 samples/sec Loss 30.1364 LearningRate 0.1100 Epoch: 0 Global Step: 3800 Fp16 Grad Scale: 131072 Required: 22 
hours Training: 2022-01-15 15:42:44,420-Speed 10531.01 samples/sec Loss 29.9757 LearningRate 0.1102 Epoch: 0 Global Step: 3810 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:42:52,207-Speed 10522.71 samples/sec Loss 29.9074 LearningRate 0.1105 Epoch: 0 Global Step: 3820 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:42:59,989-Speed 10529.29 samples/sec Loss 29.8381 LearningRate 0.1108 Epoch: 0 Global Step: 3830 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:43:07,807-Speed 10479.28 samples/sec Loss 29.8431 LearningRate 0.1111 Epoch: 0 Global Step: 3840 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:43:15,618-Speed 10489.77 samples/sec Loss 29.7514 LearningRate 0.1114 Epoch: 0 Global Step: 3850 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:43:23,432-Speed 10486.11 samples/sec Loss 29.6768 LearningRate 0.1117 Epoch: 0 Global Step: 3860 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:43:31,230-Speed 10505.53 samples/sec Loss 29.5710 LearningRate 0.1120 Epoch: 0 Global Step: 3870 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:43:38,996-Speed 10550.25 samples/sec Loss 29.5396 LearningRate 0.1123 Epoch: 0 Global Step: 3880 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:43:46,790-Speed 10513.25 samples/sec Loss 29.5692 LearningRate 0.1126 Epoch: 0 Global Step: 3890 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:43:54,626-Speed 10461.16 samples/sec Loss 29.4409 LearningRate 0.1128 Epoch: 0 Global Step: 3900 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:44:02,428-Speed 10502.38 samples/sec Loss 29.3718 LearningRate 0.1131 Epoch: 0 Global Step: 3910 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:44:10,248-Speed 10476.53 samples/sec Loss 29.2817 LearningRate 0.1134 Epoch: 0 Global Step: 3920 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 
15:44:18,055-Speed 10495.07 samples/sec Loss 29.2354 LearningRate 0.1137 Epoch: 0 Global Step: 3930 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-15 15:44:25,910-Speed 10431.11 samples/sec Loss 29.2058 LearningRate 0.1140 Epoch: 0 Global Step: 3940 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-15 15:44:33,708-Speed 10506.54 samples/sec Loss 29.1715 LearningRate 0.1143 Epoch: 0 Global Step: 3950 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-15 15:44:41,498-Speed 10518.29 samples/sec Loss 29.0271 LearningRate 0.1146 Epoch: 0 Global Step: 3960 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-15 15:44:49,312-Speed 10485.44 samples/sec Loss 29.0186 LearningRate 0.1149 Epoch: 0 Global Step: 3970 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-15 15:44:57,099-Speed 10521.55 samples/sec Loss 28.9325 LearningRate 0.1152 Epoch: 0 Global Step: 3980 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-15 15:45:04,896-Speed 10507.92 samples/sec Loss 28.8363 LearningRate 0.1155 Epoch: 0 Global Step: 3990 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-15 15:45:12,692-Speed 10510.17 samples/sec Loss 28.8304 LearningRate 0.1157 Epoch: 0 Global Step: 4000 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-15 15:45:20,470-Speed 10534.97 samples/sec Loss 28.7399 LearningRate 0.1160 Epoch: 0 Global Step: 4010 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-15 15:45:28,251-Speed 10529.90 samples/sec Loss 28.6998 LearningRate 0.1163 Epoch: 0 Global Step: 4020 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-15 15:45:36,065-Speed 10484.95 samples/sec Loss 28.5734 LearningRate 0.1166 Epoch: 0 Global Step: 4030 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:45:43,843-Speed 10534.28 samples/sec Loss 28.4916 LearningRate 0.1169 Epoch: 0 Global Step: 4040 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:45:51,656-Speed 10487.04 samples/sec 
Loss 28.3875 LearningRate 0.1172 Epoch: 0 Global Step: 4050 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:45:59,424-Speed 10547.00 samples/sec Loss 28.4237 LearningRate 0.1175 Epoch: 0 Global Step: 4060 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:46:07,234-Speed 10490.63 samples/sec Loss 28.2989 LearningRate 0.1178 Epoch: 0 Global Step: 4070 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:46:15,031-Speed 10509.23 samples/sec Loss 28.2516 LearningRate 0.1181 Epoch: 0 Global Step: 4080 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:46:22,820-Speed 10519.97 samples/sec Loss 28.1823 LearningRate 0.1183 Epoch: 0 Global Step: 4090 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:46:30,614-Speed 10512.40 samples/sec Loss 28.0971 LearningRate 0.1186 Epoch: 0 Global Step: 4100 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:46:38,420-Speed 10496.29 samples/sec Loss 27.9701 LearningRate 0.1189 Epoch: 0 Global Step: 4110 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:46:46,248-Speed 10467.39 samples/sec Loss 27.9133 LearningRate 0.1192 Epoch: 0 Global Step: 4120 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:46:54,033-Speed 10524.84 samples/sec Loss 27.8962 LearningRate 0.1195 Epoch: 0 Global Step: 4130 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:47:01,811-Speed 10534.79 samples/sec Loss 27.8093 LearningRate 0.1198 Epoch: 0 Global Step: 4140 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:47:11,082-Speed 8837.86 samples/sec Loss 27.8495 LearningRate 0.1201 Epoch: 0 Global Step: 4150 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:47:18,873-Speed 10516.92 samples/sec Loss 27.7178 LearningRate 0.1204 Epoch: 0 Global Step: 4160 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:47:26,680-Speed 10495.77 samples/sec Loss 27.6475 LearningRate 0.1207 
Epoch: 0 Global Step: 4170 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:47:34,477-Speed 10508.21 samples/sec Loss 27.5682 LearningRate 0.1209 Epoch: 0 Global Step: 4180 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:47:42,291-Speed 10486.78 samples/sec Loss 27.4922 LearningRate 0.1212 Epoch: 0 Global Step: 4190 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:47:50,101-Speed 10496.89 samples/sec Loss 27.4057 LearningRate 0.1215 Epoch: 0 Global Step: 4200 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:47:57,894-Speed 10513.42 samples/sec Loss 27.3779 LearningRate 0.1218 Epoch: 0 Global Step: 4210 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:48:05,679-Speed 10525.71 samples/sec Loss 27.2868 LearningRate 0.1221 Epoch: 0 Global Step: 4220 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:48:13,462-Speed 10527.01 samples/sec Loss 27.2341 LearningRate 0.1224 Epoch: 0 Global Step: 4230 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:48:21,249-Speed 10522.57 samples/sec Loss 27.1557 LearningRate 0.1227 Epoch: 0 Global Step: 4240 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:48:29,031-Speed 10527.91 samples/sec Loss 27.0681 LearningRate 0.1230 Epoch: 0 Global Step: 4250 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:48:36,813-Speed 10527.69 samples/sec Loss 26.9960 LearningRate 0.1233 Epoch: 0 Global Step: 4260 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:48:44,623-Speed 10497.33 samples/sec Loss 26.9321 LearningRate 0.1236 Epoch: 0 Global Step: 4270 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:48:52,436-Speed 10487.22 samples/sec Loss 26.9226 LearningRate 0.1238 Epoch: 0 Global Step: 4280 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:49:00,251-Speed 10483.92 samples/sec Loss 26.8117 LearningRate 0.1241 Epoch: 0 Global Step: 4290 Fp16 
Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:49:08,043-Speed 10515.39 samples/sec Loss 26.7446 LearningRate 0.1244 Epoch: 0 Global Step: 4300 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:49:15,889-Speed 10443.33 samples/sec Loss 26.6975 LearningRate 0.1247 Epoch: 0 Global Step: 4310 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:49:23,684-Speed 10509.89 samples/sec Loss 26.6354 LearningRate 0.1250 Epoch: 0 Global Step: 4320 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:49:31,503-Speed 10479.33 samples/sec Loss 26.5283 LearningRate 0.1253 Epoch: 0 Global Step: 4330 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:49:39,347-Speed 10445.27 samples/sec Loss 26.4475 LearningRate 0.1256 Epoch: 0 Global Step: 4340 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:49:47,140-Speed 10513.90 samples/sec Loss 26.3016 LearningRate 0.1259 Epoch: 0 Global Step: 4350 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:49:54,930-Speed 10517.61 samples/sec Loss 26.3724 LearningRate 0.1262 Epoch: 0 Global Step: 4360 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:50:02,731-Speed 10504.05 samples/sec Loss 26.3121 LearningRate 0.1264 Epoch: 0 Global Step: 4370 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:50:10,538-Speed 10494.97 samples/sec Loss 26.2437 LearningRate 0.1267 Epoch: 0 Global Step: 4380 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:50:18,333-Speed 10510.04 samples/sec Loss 26.1366 LearningRate 0.1270 Epoch: 0 Global Step: 4390 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:50:26,100-Speed 10549.70 samples/sec Loss 26.0627 LearningRate 0.1273 Epoch: 0 Global Step: 4400 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:50:33,911-Speed 10489.65 samples/sec Loss 26.0746 LearningRate 0.1276 Epoch: 0 Global Step: 4410 Fp16 Grad Scale: 131072 Required: 22 
hours Training: 2022-01-15 15:50:41,722-Speed 10489.51 samples/sec Loss 25.9536 LearningRate 0.1279 Epoch: 0 Global Step: 4420 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:50:49,515-Speed 10513.88 samples/sec Loss 25.8462 LearningRate 0.1282 Epoch: 0 Global Step: 4430 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:50:57,326-Speed 10488.66 samples/sec Loss 25.8146 LearningRate 0.1285 Epoch: 0 Global Step: 4440 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:51:05,161-Speed 10458.43 samples/sec Loss 25.7183 LearningRate 0.1288 Epoch: 0 Global Step: 4450 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:51:12,963-Speed 10501.40 samples/sec Loss 25.6395 LearningRate 0.1291 Epoch: 0 Global Step: 4460 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:51:20,741-Speed 10533.87 samples/sec Loss 25.5915 LearningRate 0.1293 Epoch: 0 Global Step: 4470 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:51:28,541-Speed 10502.25 samples/sec Loss 25.4946 LearningRate 0.1296 Epoch: 0 Global Step: 4480 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-01-15 15:51:36,330-Speed 10519.69 samples/sec Loss 25.4817 LearningRate 0.1299 Epoch: 0 Global Step: 4490 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:51:44,122-Speed 10515.55 samples/sec Loss 25.2924 LearningRate 0.1302 Epoch: 0 Global Step: 4500 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:51:51,983-Speed 10421.18 samples/sec Loss 25.3029 LearningRate 0.1305 Epoch: 0 Global Step: 4510 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:51:59,801-Speed 10480.02 samples/sec Loss 25.2564 LearningRate 0.1308 Epoch: 0 Global Step: 4520 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:52:07,603-Speed 10503.05 samples/sec Loss 25.1238 LearningRate 0.1311 Epoch: 0 Global Step: 4530 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 
15:52:15,387-Speed 10528.41 samples/sec Loss 25.0810 LearningRate 0.1314 Epoch: 0 Global Step: 4540 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:52:23,169-Speed 10528.22 samples/sec Loss 24.9932 LearningRate 0.1317 Epoch: 0 Global Step: 4550 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:52:30,998-Speed 10466.56 samples/sec Loss 24.9511 LearningRate 0.1319 Epoch: 0 Global Step: 4560 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:52:38,832-Speed 10459.76 samples/sec Loss 24.9410 LearningRate 0.1322 Epoch: 0 Global Step: 4570 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:52:46,676-Speed 10445.58 samples/sec Loss 24.8144 LearningRate 0.1325 Epoch: 0 Global Step: 4580 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:52:54,495-Speed 10478.62 samples/sec Loss 24.7410 LearningRate 0.1328 Epoch: 0 Global Step: 4590 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:53:02,298-Speed 10500.67 samples/sec Loss 24.6295 LearningRate 0.1331 Epoch: 0 Global Step: 4600 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-15 15:53:10,123-Speed 10470.71 samples/sec Loss 24.6516 LearningRate 0.1334 Epoch: 0 Global Step: 4610 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-15 15:53:17,933-Speed 10491.36 samples/sec Loss 24.6048 LearningRate 0.1337 Epoch: 0 Global Step: 4620 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-15 15:53:25,734-Speed 10503.43 samples/sec Loss 24.4456 LearningRate 0.1340 Epoch: 0 Global Step: 4630 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-15 15:53:33,516-Speed 10527.26 samples/sec Loss 24.4727 LearningRate 0.1343 Epoch: 0 Global Step: 4640 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-15 15:53:41,311-Speed 10515.08 samples/sec Loss 24.3216 LearningRate 0.1345 Epoch: 0 Global Step: 4650 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-15 15:53:49,083-Speed 10541.82 
samples/sec Loss 24.3399 LearningRate 0.1348 Epoch: 0 Global Step: 4660 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-15 15:53:56,859-Speed 10535.80 samples/sec Loss 24.2126 LearningRate 0.1351 Epoch: 0 Global Step: 4670 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-15 15:54:06,930-Speed 8135.17 samples/sec Loss 24.1315 LearningRate 0.1354 Epoch: 0 Global Step: 4680 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-15 15:54:14,695-Speed 10552.00 samples/sec Loss 24.0211 LearningRate 0.1357 Epoch: 0 Global Step: 4690 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-01-15 15:54:22,479-Speed 10531.31 samples/sec Loss 24.0521 LearningRate 0.1360 Epoch: 0 Global Step: 4700 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:54:30,256-Speed 10535.63 samples/sec Loss 23.9430 LearningRate 0.1363 Epoch: 0 Global Step: 4710 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:54:38,035-Speed 10532.38 samples/sec Loss 23.8784 LearningRate 0.1366 Epoch: 0 Global Step: 4720 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:54:45,813-Speed 10535.78 samples/sec Loss 23.8262 LearningRate 0.1369 Epoch: 0 Global Step: 4730 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:54:53,642-Speed 10464.64 samples/sec Loss 23.7200 LearningRate 0.1372 Epoch: 0 Global Step: 4740 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:55:01,520-Speed 10399.96 samples/sec Loss 23.6307 LearningRate 0.1374 Epoch: 0 Global Step: 4750 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:55:09,323-Speed 10500.50 samples/sec Loss 23.6713 LearningRate 0.1377 Epoch: 0 Global Step: 4760 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-01-15 15:55:17,093-Speed 10546.26 samples/sec Loss 23.5056 LearningRate 0.1380 Epoch: 0 Global Step: 4770 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-15 15:55:24,890-Speed 10507.02 samples/sec Loss 23.4991 LearningRate 
0.1383 Epoch: 0 Global Step: 4780 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-15 15:55:32,685-Speed 10512.44 samples/sec Loss 23.4273 LearningRate 0.1386 Epoch: 0 Global Step: 4790 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-01-15 15:55:40,472-Speed 10521.73 samples/sec Loss 23.3681 LearningRate 0.1389 Epoch: 0 Global Step: 4800 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 15:55:48,252-Speed 10532.60 samples/sec Loss 23.3039 LearningRate 0.1392 Epoch: 0 Global Step: 4810 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 15:55:56,066-Speed 10486.69 samples/sec Loss 23.1922 LearningRate 0.1395 Epoch: 0 Global Step: 4820 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 15:56:03,880-Speed 10485.58 samples/sec Loss 23.0905 LearningRate 0.1398 Epoch: 0 Global Step: 4830 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 15:56:11,682-Speed 10500.00 samples/sec Loss 23.1084 LearningRate 0.1400 Epoch: 0 Global Step: 4840 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 15:56:19,490-Speed 10493.51 samples/sec Loss 22.9152 LearningRate 0.1403 Epoch: 0 Global Step: 4850 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 15:56:27,293-Speed 10501.44 samples/sec Loss 22.8883 LearningRate 0.1406 Epoch: 0 Global Step: 4860 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 15:56:35,118-Speed 10474.04 samples/sec Loss 22.8544 LearningRate 0.1409 Epoch: 0 Global Step: 4870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 15:56:42,954-Speed 10457.50 samples/sec Loss 22.8564 LearningRate 0.1412 Epoch: 0 Global Step: 4880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 15:56:50,775-Speed 10476.84 samples/sec Loss 22.7073 LearningRate 0.1415 Epoch: 0 Global Step: 4890 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 15:56:58,559-Speed 10525.04 samples/sec Loss 22.6033 LearningRate 0.1418 Epoch: 0 Global Step: 4900 Fp16 Grad 
Scale: 65536 Required: 21 hours Training: 2022-01-15 15:57:06,395-Speed 10456.79 samples/sec Loss 22.5397 LearningRate 0.1421 Epoch: 0 Global Step: 4910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 15:57:14,191-Speed 10511.64 samples/sec Loss 22.5665 LearningRate 0.1424 Epoch: 0 Global Step: 4920 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 15:57:21,987-Speed 10510.51 samples/sec Loss 22.4893 LearningRate 0.1427 Epoch: 0 Global Step: 4930 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 15:57:29,775-Speed 10519.58 samples/sec Loss 22.3900 LearningRate 0.1429 Epoch: 0 Global Step: 4940 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 15:57:37,581-Speed 10496.02 samples/sec Loss 22.3162 LearningRate 0.1432 Epoch: 0 Global Step: 4950 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 15:57:45,385-Speed 10498.43 samples/sec Loss 22.2510 LearningRate 0.1435 Epoch: 0 Global Step: 4960 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 15:57:53,191-Speed 10497.02 samples/sec Loss 22.2205 LearningRate 0.1438 Epoch: 0 Global Step: 4970 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 15:58:00,978-Speed 10521.37 samples/sec Loss 22.1409 LearningRate 0.1441 Epoch: 0 Global Step: 4980 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 15:58:08,766-Speed 10520.08 samples/sec Loss 22.0570 LearningRate 0.1444 Epoch: 0 Global Step: 4990 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 15:58:16,544-Speed 10533.53 samples/sec Loss 21.9856 LearningRate 0.1447 Epoch: 0 Global Step: 5000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 15:58:24,349-Speed 10497.68 samples/sec Loss 21.9345 LearningRate 0.1450 Epoch: 0 Global Step: 5010 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 15:58:32,182-Speed 10459.19 samples/sec Loss 21.8381 LearningRate 0.1453 Epoch: 0 Global Step: 5020 Fp16 Grad Scale: 131072 Required: 21 hours 
Training: 2022-01-15 15:58:39,974-Speed 10513.81 samples/sec Loss 21.7888 LearningRate 0.1455 Epoch: 0 Global Step: 5030 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 15:58:47,759-Speed 10524.80 samples/sec Loss 21.7161 LearningRate 0.1458 Epoch: 0 Global Step: 5040 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 15:58:55,558-Speed 10505.19 samples/sec Loss 21.6213 LearningRate 0.1461 Epoch: 0 Global Step: 5050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 15:59:03,355-Speed 10508.24 samples/sec Loss 21.5554 LearningRate 0.1464 Epoch: 0 Global Step: 5060 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 15:59:11,164-Speed 10493.40 samples/sec Loss 21.4834 LearningRate 0.1467 Epoch: 0 Global Step: 5070 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 15:59:18,962-Speed 10512.35 samples/sec Loss 21.4653 LearningRate 0.1470 Epoch: 0 Global Step: 5080 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 15:59:26,759-Speed 10508.89 samples/sec Loss 21.4166 LearningRate 0.1473 Epoch: 0 Global Step: 5090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 15:59:34,575-Speed 10482.03 samples/sec Loss 21.3460 LearningRate 0.1476 Epoch: 0 Global Step: 5100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 15:59:42,403-Speed 10468.79 samples/sec Loss 21.2477 LearningRate 0.1479 Epoch: 0 Global Step: 5110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 15:59:50,241-Speed 10452.56 samples/sec Loss 21.2263 LearningRate 0.1481 Epoch: 0 Global Step: 5120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 15:59:58,067-Speed 10470.94 samples/sec Loss 21.1825 LearningRate 0.1484 Epoch: 0 Global Step: 5130 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:00:05,897-Speed 10465.49 samples/sec Loss 21.1210 LearningRate 0.1487 Epoch: 0 Global Step: 5140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 
16:00:13,699-Speed 10500.97 samples/sec Loss 21.0489 LearningRate 0.1490 Epoch: 0 Global Step: 5150 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:00:21,522-Speed 10474.71 samples/sec Loss 21.0028 LearningRate 0.1493 Epoch: 0 Global Step: 5160 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:00:29,332-Speed 10490.34 samples/sec Loss 20.8953 LearningRate 0.1496 Epoch: 0 Global Step: 5170 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:00:37,143-Speed 10489.88 samples/sec Loss 20.8111 LearningRate 0.1499 Epoch: 0 Global Step: 5180 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:00:59,555-Speed 3655.55 samples/sec Loss 20.7306 LearningRate 0.1502 Epoch: 1 Global Step: 5190 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:01:07,353-Speed 10508.26 samples/sec Loss 20.7049 LearningRate 0.1505 Epoch: 1 Global Step: 5200 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:01:15,145-Speed 10515.83 samples/sec Loss 20.6998 LearningRate 0.1508 Epoch: 1 Global Step: 5210 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:01:22,942-Speed 10507.34 samples/sec Loss 20.6126 LearningRate 0.1510 Epoch: 1 Global Step: 5220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:01:30,742-Speed 10504.70 samples/sec Loss 20.4690 LearningRate 0.1513 Epoch: 1 Global Step: 5230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:01:38,503-Speed 10557.24 samples/sec Loss 20.4268 LearningRate 0.1516 Epoch: 1 Global Step: 5240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:01:46,308-Speed 10497.01 samples/sec Loss 20.3955 LearningRate 0.1519 Epoch: 1 Global Step: 5250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:01:54,125-Speed 10480.51 samples/sec Loss 20.3647 LearningRate 0.1522 Epoch: 1 Global Step: 5260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:02:01,892-Speed 10548.40 
samples/sec Loss 20.2729 LearningRate 0.1525 Epoch: 1 Global Step: 5270 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:02:09,683-Speed 10516.95 samples/sec Loss 20.2423 LearningRate 0.1528 Epoch: 1 Global Step: 5280 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 16:02:17,484-Speed 10501.94 samples/sec Loss 20.2150 LearningRate 0.1531 Epoch: 1 Global Step: 5290 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 16:02:25,275-Speed 10516.63 samples/sec Loss 20.0415 LearningRate 0.1534 Epoch: 1 Global Step: 5300 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 16:02:33,065-Speed 10516.98 samples/sec Loss 20.0058 LearningRate 0.1536 Epoch: 1 Global Step: 5310 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 16:02:40,872-Speed 10494.50 samples/sec Loss 19.9916 LearningRate 0.1539 Epoch: 1 Global Step: 5320 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 16:02:48,642-Speed 10545.12 samples/sec Loss 19.8870 LearningRate 0.1542 Epoch: 1 Global Step: 5330 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 16:02:56,423-Speed 10529.68 samples/sec Loss 19.7920 LearningRate 0.1545 Epoch: 1 Global Step: 5340 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 16:03:04,209-Speed 10524.84 samples/sec Loss 19.8201 LearningRate 0.1548 Epoch: 1 Global Step: 5350 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 16:03:12,010-Speed 10504.11 samples/sec Loss 19.7805 LearningRate 0.1551 Epoch: 1 Global Step: 5360 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 16:03:19,860-Speed 10437.17 samples/sec Loss 19.6860 LearningRate 0.1554 Epoch: 1 Global Step: 5370 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 16:03:27,676-Speed 10484.18 samples/sec Loss 19.6766 LearningRate 0.1557 Epoch: 1 Global Step: 5380 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:03:35,460-Speed 10525.73 samples/sec Loss 19.6099 LearningRate 
0.1560 Epoch: 1 Global Step: 5390 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:03:43,245-Speed 10525.17 samples/sec Loss 19.4645 LearningRate 0.1562 Epoch: 1 Global Step: 5400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:03:51,051-Speed 10495.76 samples/sec Loss 19.5297 LearningRate 0.1565 Epoch: 1 Global Step: 5410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:03:58,826-Speed 10537.98 samples/sec Loss 19.4194 LearningRate 0.1568 Epoch: 1 Global Step: 5420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:04:06,618-Speed 10518.42 samples/sec Loss 19.4323 LearningRate 0.1571 Epoch: 1 Global Step: 5430 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:04:14,440-Speed 10474.93 samples/sec Loss 19.3078 LearningRate 0.1574 Epoch: 1 Global Step: 5440 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:04:22,262-Speed 10474.79 samples/sec Loss 19.1968 LearningRate 0.1577 Epoch: 1 Global Step: 5450 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:04:30,079-Speed 10482.28 samples/sec Loss 19.2155 LearningRate 0.1580 Epoch: 1 Global Step: 5460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:04:37,890-Speed 10494.60 samples/sec Loss 19.1448 LearningRate 0.1583 Epoch: 1 Global Step: 5470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:04:45,686-Speed 10510.51 samples/sec Loss 19.0557 LearningRate 0.1586 Epoch: 1 Global Step: 5480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:04:53,501-Speed 10484.00 samples/sec Loss 18.9721 LearningRate 0.1589 Epoch: 1 Global Step: 5490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:05:01,311-Speed 10491.43 samples/sec Loss 18.9531 LearningRate 0.1591 Epoch: 1 Global Step: 5500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:05:09,107-Speed 10508.63 samples/sec Loss 18.9756 LearningRate 0.1594 Epoch: 1 Global Step: 5510 Fp16 
Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:05:16,908-Speed 10504.09 samples/sec Loss 18.8561 LearningRate 0.1597 Epoch: 1 Global Step: 5520 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:05:24,700-Speed 10514.57 samples/sec Loss 18.7858 LearningRate 0.1600 Epoch: 1 Global Step: 5530 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:05:32,478-Speed 10534.49 samples/sec Loss 18.7619 LearningRate 0.1603 Epoch: 1 Global Step: 5540 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:05:40,263-Speed 10533.63 samples/sec Loss 18.5925 LearningRate 0.1606 Epoch: 1 Global Step: 5550 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:05:48,117-Speed 10432.46 samples/sec Loss 18.5888 LearningRate 0.1609 Epoch: 1 Global Step: 5560 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:05:55,916-Speed 10505.93 samples/sec Loss 18.6380 LearningRate 0.1612 Epoch: 1 Global Step: 5570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:06:03,704-Speed 10520.55 samples/sec Loss 18.5644 LearningRate 0.1615 Epoch: 1 Global Step: 5580 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:06:11,534-Speed 10464.80 samples/sec Loss 18.4567 LearningRate 0.1617 Epoch: 1 Global Step: 5590 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:06:19,349-Speed 10483.26 samples/sec Loss 18.4461 LearningRate 0.1620 Epoch: 1 Global Step: 5600 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:06:27,122-Speed 10542.00 samples/sec Loss 18.4130 LearningRate 0.1623 Epoch: 1 Global Step: 5610 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:06:34,904-Speed 10529.47 samples/sec Loss 18.3210 LearningRate 0.1626 Epoch: 1 Global Step: 5620 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:06:42,696-Speed 10516.07 samples/sec Loss 18.2544 LearningRate 0.1629 Epoch: 1 Global Step: 5630 Fp16 Grad Scale: 65536 Required: 21 hours 
Training: 2022-01-15 16:06:50,478-Speed 10529.41 samples/sec Loss 18.2818 LearningRate 0.1632 Epoch: 1 Global Step: 5640 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:06:58,246-Speed 10548.60 samples/sec Loss 18.1940 LearningRate 0.1635 Epoch: 1 Global Step: 5650 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:07:06,011-Speed 10552.23 samples/sec Loss 18.2608 LearningRate 0.1638 Epoch: 1 Global Step: 5660 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:07:13,780-Speed 10545.45 samples/sec Loss 18.0607 LearningRate 0.1641 Epoch: 1 Global Step: 5670 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:07:21,561-Speed 10531.27 samples/sec Loss 18.0505 LearningRate 0.1644 Epoch: 1 Global Step: 5680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:07:29,333-Speed 10541.81 samples/sec Loss 18.0149 LearningRate 0.1646 Epoch: 1 Global Step: 5690 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:07:37,175-Speed 10447.57 samples/sec Loss 17.9031 LearningRate 0.1649 Epoch: 1 Global Step: 5700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:07:44,963-Speed 10521.47 samples/sec Loss 17.8329 LearningRate 0.1652 Epoch: 1 Global Step: 5710 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:07:52,773-Speed 10490.58 samples/sec Loss 17.8037 LearningRate 0.1655 Epoch: 1 Global Step: 5720 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:08:00,579-Speed 10496.42 samples/sec Loss 17.7336 LearningRate 0.1658 Epoch: 1 Global Step: 5730 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:08:08,368-Speed 10520.48 samples/sec Loss 17.6466 LearningRate 0.1661 Epoch: 1 Global Step: 5740 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:08:16,138-Speed 10544.79 samples/sec Loss 17.6486 LearningRate 0.1664 Epoch: 1 Global Step: 5750 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 
16:08:23,930-Speed 10515.53 samples/sec Loss 17.6712 LearningRate 0.1667 Epoch: 1 Global Step: 5760 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:08:31,751-Speed 10475.37 samples/sec Loss 17.6085 LearningRate 0.1670 Epoch: 1 Global Step: 5770 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:08:39,553-Speed 10501.98 samples/sec Loss 17.5113 LearningRate 0.1672 Epoch: 1 Global Step: 5780 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:08:47,334-Speed 10530.80 samples/sec Loss 17.5225 LearningRate 0.1675 Epoch: 1 Global Step: 5790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:08:55,111-Speed 10535.85 samples/sec Loss 17.4781 LearningRate 0.1678 Epoch: 1 Global Step: 5800 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:09:02,908-Speed 10508.93 samples/sec Loss 17.3773 LearningRate 0.1681 Epoch: 1 Global Step: 5810 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:09:10,721-Speed 10486.79 samples/sec Loss 17.3223 LearningRate 0.1684 Epoch: 1 Global Step: 5820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:09:18,486-Speed 10552.01 samples/sec Loss 17.2909 LearningRate 0.1687 Epoch: 1 Global Step: 5830 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:09:26,252-Speed 10556.49 samples/sec Loss 17.2713 LearningRate 0.1690 Epoch: 1 Global Step: 5840 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:09:34,022-Speed 10544.77 samples/sec Loss 17.1329 LearningRate 0.1693 Epoch: 1 Global Step: 5850 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:09:41,815-Speed 10513.33 samples/sec Loss 17.1332 LearningRate 0.1696 Epoch: 1 Global Step: 5860 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:09:49,627-Speed 10488.45 samples/sec Loss 17.1117 LearningRate 0.1698 Epoch: 1 Global Step: 5870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:09:57,417-Speed 10517.97 
samples/sec Loss 17.0782 LearningRate 0.1701 Epoch: 1 Global Step: 5880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:10:05,197-Speed 10530.75 samples/sec Loss 17.1290 LearningRate 0.1704 Epoch: 1 Global Step: 5890 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:10:12,987-Speed 10516.18 samples/sec Loss 16.9668 LearningRate 0.1707 Epoch: 1 Global Step: 5900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:10:20,767-Speed 10531.87 samples/sec Loss 16.8363 LearningRate 0.1710 Epoch: 1 Global Step: 5910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:10:28,561-Speed 10512.48 samples/sec Loss 16.8930 LearningRate 0.1713 Epoch: 1 Global Step: 5920 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:10:36,372-Speed 10488.68 samples/sec Loss 16.8121 LearningRate 0.1716 Epoch: 1 Global Step: 5930 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:10:44,146-Speed 10540.98 samples/sec Loss 16.8657 LearningRate 0.1719 Epoch: 1 Global Step: 5940 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:10:51,911-Speed 10551.64 samples/sec Loss 16.7803 LearningRate 0.1722 Epoch: 1 Global Step: 5950 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:10:59,759-Speed 10438.73 samples/sec Loss 16.7216 LearningRate 0.1725 Epoch: 1 Global Step: 5960 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:11:07,616-Speed 10428.94 samples/sec Loss 16.7231 LearningRate 0.1727 Epoch: 1 Global Step: 5970 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:11:15,461-Speed 10445.13 samples/sec Loss 16.6267 LearningRate 0.1730 Epoch: 1 Global Step: 5980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:11:23,292-Speed 10462.16 samples/sec Loss 16.6828 LearningRate 0.1733 Epoch: 1 Global Step: 5990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:11:31,133-Speed 10450.39 samples/sec Loss 16.5863 LearningRate 
0.1736 Epoch: 1 Global Step: 6000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:11:39,041-Speed 10361.53 samples/sec Loss 16.5387 LearningRate 0.1739 Epoch: 1 Global Step: 6010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:11:46,865-Speed 10473.35 samples/sec Loss 16.4689 LearningRate 0.1742 Epoch: 1 Global Step: 6020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:11:54,656-Speed 10516.21 samples/sec Loss 16.4359 LearningRate 0.1745 Epoch: 1 Global Step: 6030 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:12:02,448-Speed 10518.14 samples/sec Loss 16.3769 LearningRate 0.1748 Epoch: 1 Global Step: 6040 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:12:10,253-Speed 10497.75 samples/sec Loss 16.3304 LearningRate 0.1751 Epoch: 1 Global Step: 6050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:12:18,068-Speed 10484.77 samples/sec Loss 16.2754 LearningRate 0.1753 Epoch: 1 Global Step: 6060 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:12:25,908-Speed 10451.15 samples/sec Loss 16.2832 LearningRate 0.1756 Epoch: 1 Global Step: 6070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:12:33,716-Speed 10492.95 samples/sec Loss 16.1898 LearningRate 0.1759 Epoch: 1 Global Step: 6080 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:12:41,528-Speed 10488.82 samples/sec Loss 16.1883 LearningRate 0.1762 Epoch: 1 Global Step: 6090 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:12:49,342-Speed 10485.90 samples/sec Loss 16.1261 LearningRate 0.1765 Epoch: 1 Global Step: 6100 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:12:57,159-Speed 10480.99 samples/sec Loss 16.0928 LearningRate 0.1768 Epoch: 1 Global Step: 6110 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:13:05,047-Speed 10387.18 samples/sec Loss 16.1268 LearningRate 0.1771 Epoch: 1 Global Step: 6120 
Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:13:12,836-Speed 10518.54 samples/sec Loss 16.0212 LearningRate 0.1774 Epoch: 1 Global Step: 6130 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-01-15 16:13:20,665-Speed 10466.27 samples/sec Loss 15.9875 LearningRate 0.1777 Epoch: 1 Global Step: 6140 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-01-15 16:13:28,480-Speed 10484.54 samples/sec Loss 16.0045 LearningRate 0.1780 Epoch: 1 Global Step: 6150 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-01-15 16:13:36,274-Speed 10513.59 samples/sec Loss 15.9785 LearningRate 0.1782 Epoch: 1 Global Step: 6160 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-01-15 16:13:44,065-Speed 10517.71 samples/sec Loss 15.8755 LearningRate 0.1785 Epoch: 1 Global Step: 6170 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-01-15 16:13:51,880-Speed 10484.05 samples/sec Loss 15.9028 LearningRate 0.1788 Epoch: 1 Global Step: 6180 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-01-15 16:13:59,684-Speed 10499.97 samples/sec Loss 15.8497 LearningRate 0.1791 Epoch: 1 Global Step: 6190 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-01-15 16:14:07,541-Speed 10429.87 samples/sec Loss 15.7970 LearningRate 0.1794 Epoch: 1 Global Step: 6200 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-01-15 16:14:15,355-Speed 10486.22 samples/sec Loss 15.6995 LearningRate 0.1797 Epoch: 1 Global Step: 6210 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-01-15 16:14:23,159-Speed 10498.49 samples/sec Loss 15.6312 LearningRate 0.1800 Epoch: 1 Global Step: 6220 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-01-15 16:14:30,961-Speed 10502.75 samples/sec Loss 15.6446 LearningRate 0.1803 Epoch: 1 Global Step: 6230 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 16:14:38,785-Speed 10472.59 samples/sec Loss 15.5847 LearningRate 0.1806 Epoch: 1 Global Step: 6240 Fp16 Grad Scale: 32768 Required: 21 hours 
Training: 2022-01-15 16:14:46,609-Speed 10472.89 samples/sec Loss 15.6209 LearningRate 0.1808 Epoch: 1 Global Step: 6250 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 16:14:54,436-Speed 10468.87 samples/sec Loss 15.5582 LearningRate 0.1811 Epoch: 1 Global Step: 6260 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 16:15:02,265-Speed 10464.24 samples/sec Loss 15.5140 LearningRate 0.1814 Epoch: 1 Global Step: 6270 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 16:15:10,072-Speed 10495.81 samples/sec Loss 15.4925 LearningRate 0.1817 Epoch: 1 Global Step: 6280 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 16:15:17,870-Speed 10507.32 samples/sec Loss 15.4063 LearningRate 0.1820 Epoch: 1 Global Step: 6290 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 16:15:25,687-Speed 10481.31 samples/sec Loss 15.4281 LearningRate 0.1823 Epoch: 1 Global Step: 6300 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 16:15:33,522-Speed 10456.34 samples/sec Loss 15.3631 LearningRate 0.1826 Epoch: 1 Global Step: 6310 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 16:15:41,357-Speed 10458.69 samples/sec Loss 15.3153 LearningRate 0.1829 Epoch: 1 Global Step: 6320 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-01-15 16:15:49,173-Speed 10481.93 samples/sec Loss 15.3691 LearningRate 0.1832 Epoch: 1 Global Step: 6330 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:15:56,990-Speed 10480.77 samples/sec Loss 15.3549 LearningRate 0.1834 Epoch: 1 Global Step: 6340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:16:04,821-Speed 10462.85 samples/sec Loss 15.2624 LearningRate 0.1837 Epoch: 1 Global Step: 6350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:16:12,667-Speed 10443.27 samples/sec Loss 15.2029 LearningRate 0.1840 Epoch: 1 Global Step: 6360 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:16:20,476-Speed 
10491.75 samples/sec Loss 15.1562 LearningRate 0.1843 Epoch: 1 Global Step: 6370 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:16:28,305-Speed 10467.81 samples/sec Loss 15.1235 LearningRate 0.1846 Epoch: 1 Global Step: 6380 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:16:36,131-Speed 10468.67 samples/sec Loss 15.1337 LearningRate 0.1849 Epoch: 1 Global Step: 6390 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:16:43,948-Speed 10481.75 samples/sec Loss 15.0772 LearningRate 0.1852 Epoch: 1 Global Step: 6400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:16:51,791-Speed 10447.18 samples/sec Loss 15.0016 LearningRate 0.1855 Epoch: 1 Global Step: 6410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:16:59,604-Speed 10486.47 samples/sec Loss 15.0292 LearningRate 0.1858 Epoch: 1 Global Step: 6420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:17:07,411-Speed 10493.84 samples/sec Loss 14.9701 LearningRate 0.1861 Epoch: 1 Global Step: 6430 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:17:15,265-Speed 10432.88 samples/sec Loss 14.9295 LearningRate 0.1863 Epoch: 1 Global Step: 6440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:17:23,083-Speed 10480.18 samples/sec Loss 14.9026 LearningRate 0.1866 Epoch: 1 Global Step: 6450 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:17:30,925-Speed 10447.26 samples/sec Loss 14.8515 LearningRate 0.1869 Epoch: 1 Global Step: 6460 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:17:38,750-Speed 10470.57 samples/sec Loss 14.9038 LearningRate 0.1872 Epoch: 1 Global Step: 6470 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:17:46,548-Speed 10506.44 samples/sec Loss 14.8624 LearningRate 0.1875 Epoch: 1 Global Step: 6480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:17:54,387-Speed 10452.18 samples/sec Loss 14.8428 
LearningRate 0.1878 Epoch: 1 Global Step: 6490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:18:02,203-Speed 10481.69 samples/sec Loss 14.8304 LearningRate 0.1881 Epoch: 1 Global Step: 6500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:18:10,035-Speed 10462.22 samples/sec Loss 14.7872 LearningRate 0.1884 Epoch: 1 Global Step: 6510 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:18:17,842-Speed 10494.02 samples/sec Loss 14.7285 LearningRate 0.1887 Epoch: 1 Global Step: 6520 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:18:25,654-Speed 10493.92 samples/sec Loss 14.6669 LearningRate 0.1889 Epoch: 1 Global Step: 6530 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:18:33,477-Speed 10472.50 samples/sec Loss 14.7177 LearningRate 0.1892 Epoch: 1 Global Step: 6540 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:18:41,331-Speed 10433.10 samples/sec Loss 14.6245 LearningRate 0.1895 Epoch: 1 Global Step: 6550 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:18:49,192-Speed 10422.36 samples/sec Loss 14.5728 LearningRate 0.1898 Epoch: 1 Global Step: 6560 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:18:57,004-Speed 10489.94 samples/sec Loss 14.5656 LearningRate 0.1901 Epoch: 1 Global Step: 6570 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:19:04,876-Speed 10412.38 samples/sec Loss 14.5242 LearningRate 0.1904 Epoch: 1 Global Step: 6580 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:19:12,699-Speed 10474.54 samples/sec Loss 14.5771 LearningRate 0.1907 Epoch: 1 Global Step: 6590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:19:20,504-Speed 10498.25 samples/sec Loss 14.5146 LearningRate 0.1910 Epoch: 1 Global Step: 6600 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:19:28,352-Speed 10441.00 samples/sec Loss 14.4515 LearningRate 0.1913 Epoch: 1 
Global Step: 6610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:19:36,169-Speed 10481.21 samples/sec Loss 14.4438 LearningRate 0.1916 Epoch: 1 Global Step: 6620 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:19:43,992-Speed 10473.85 samples/sec Loss 14.4424 LearningRate 0.1918 Epoch: 1 Global Step: 6630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:19:51,833-Speed 10449.74 samples/sec Loss 14.3928 LearningRate 0.1921 Epoch: 1 Global Step: 6640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:19:59,673-Speed 10450.95 samples/sec Loss 14.2987 LearningRate 0.1924 Epoch: 1 Global Step: 6650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:20:07,505-Speed 10461.65 samples/sec Loss 14.3324 LearningRate 0.1927 Epoch: 1 Global Step: 6660 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:20:15,331-Speed 10469.94 samples/sec Loss 14.3624 LearningRate 0.1930 Epoch: 1 Global Step: 6670 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:20:23,149-Speed 10481.15 samples/sec Loss 14.2144 LearningRate 0.1933 Epoch: 1 Global Step: 6680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:20:30,982-Speed 10459.78 samples/sec Loss 14.2143 LearningRate 0.1936 Epoch: 1 Global Step: 6690 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:20:38,794-Speed 10488.48 samples/sec Loss 14.1985 LearningRate 0.1939 Epoch: 1 Global Step: 6700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:20:46,631-Speed 10454.77 samples/sec Loss 14.1937 LearningRate 0.1942 Epoch: 1 Global Step: 6710 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:20:54,453-Speed 10476.04 samples/sec Loss 14.1576 LearningRate 0.1944 Epoch: 1 Global Step: 6720 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:21:02,272-Speed 10479.52 samples/sec Loss 14.1459 LearningRate 0.1947 Epoch: 1 Global Step: 6730 Fp16 Grad 
Scale: 131072 Required: 21 hours Training: 2022-01-15 16:21:10,091-Speed 10478.27 samples/sec Loss 14.1308 LearningRate 0.1950 Epoch: 1 Global Step: 6740 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:21:17,928-Speed 10455.43 samples/sec Loss 14.1182 LearningRate 0.1953 Epoch: 1 Global Step: 6750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:21:25,766-Speed 10455.94 samples/sec Loss 14.1072 LearningRate 0.1956 Epoch: 1 Global Step: 6760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:21:33,590-Speed 10472.51 samples/sec Loss 14.0425 LearningRate 0.1959 Epoch: 1 Global Step: 6770 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:21:41,418-Speed 10468.51 samples/sec Loss 14.0038 LearningRate 0.1962 Epoch: 1 Global Step: 6780 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:21:49,269-Speed 10436.20 samples/sec Loss 13.9770 LearningRate 0.1965 Epoch: 1 Global Step: 6790 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:21:57,086-Speed 10481.54 samples/sec Loss 13.9645 LearningRate 0.1968 Epoch: 1 Global Step: 6800 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:22:04,910-Speed 10472.21 samples/sec Loss 13.9568 LearningRate 0.1970 Epoch: 1 Global Step: 6810 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:22:12,713-Speed 10500.58 samples/sec Loss 13.8665 LearningRate 0.1973 Epoch: 1 Global Step: 6820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:22:20,527-Speed 10485.14 samples/sec Loss 13.8960 LearningRate 0.1976 Epoch: 1 Global Step: 6830 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:22:28,363-Speed 10456.05 samples/sec Loss 13.7532 LearningRate 0.1979 Epoch: 1 Global Step: 6840 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:22:36,174-Speed 10489.75 samples/sec Loss 13.8504 LearningRate 0.1982 Epoch: 1 Global Step: 6850 Fp16 Grad Scale: 131072 Required: 21 hours 
Training: 2022-01-15 16:22:43,952-Speed 10542.04 samples/sec Loss 13.8627 LearningRate 0.1985 Epoch: 1 Global Step: 6860 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:22:51,754-Speed 10505.98 samples/sec Loss 13.8304 LearningRate 0.1988 Epoch: 1 Global Step: 6870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:22:59,544-Speed 10518.15 samples/sec Loss 13.8071 LearningRate 0.1991 Epoch: 1 Global Step: 6880 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:23:07,335-Speed 10517.38 samples/sec Loss 13.7442 LearningRate 0.1994 Epoch: 1 Global Step: 6890 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:23:15,129-Speed 10512.40 samples/sec Loss 13.7089 LearningRate 0.1997 Epoch: 1 Global Step: 6900 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:23:22,935-Speed 10496.54 samples/sec Loss 13.7029 LearningRate 0.1999 Epoch: 1 Global Step: 6910 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:23:30,761-Speed 10471.21 samples/sec Loss 13.7903 LearningRate 0.2002 Epoch: 1 Global Step: 6920 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:23:38,571-Speed 10490.10 samples/sec Loss 13.6416 LearningRate 0.2005 Epoch: 1 Global Step: 6930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:23:46,352-Speed 10529.98 samples/sec Loss 13.6929 LearningRate 0.2008 Epoch: 1 Global Step: 6940 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:23:54,192-Speed 10451.26 samples/sec Loss 13.6043 LearningRate 0.2011 Epoch: 1 Global Step: 6950 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:24:01,977-Speed 10524.70 samples/sec Loss 13.6320 LearningRate 0.2014 Epoch: 1 Global Step: 6960 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:24:09,807-Speed 10464.18 samples/sec Loss 13.5977 LearningRate 0.2017 Epoch: 1 Global Step: 6970 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 
16:24:17,606-Speed 10504.39 samples/sec Loss 13.6072 LearningRate 0.2020 Epoch: 1 Global Step: 6980 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:24:25,426-Speed 10476.61 samples/sec Loss 13.4894 LearningRate 0.2023 Epoch: 1 Global Step: 6990 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:24:33,276-Speed 10447.64 samples/sec Loss 13.4887 LearningRate 0.2025 Epoch: 1 Global Step: 7000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:24:41,060-Speed 10525.55 samples/sec Loss 13.5295 LearningRate 0.2028 Epoch: 1 Global Step: 7010 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:24:48,860-Speed 10503.79 samples/sec Loss 13.4742 LearningRate 0.2031 Epoch: 1 Global Step: 7020 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:24:56,670-Speed 10491.13 samples/sec Loss 13.4851 LearningRate 0.2034 Epoch: 1 Global Step: 7030 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:25:04,458-Speed 10521.01 samples/sec Loss 13.4577 LearningRate 0.2037 Epoch: 1 Global Step: 7040 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:25:12,256-Speed 10507.32 samples/sec Loss 13.3776 LearningRate 0.2040 Epoch: 1 Global Step: 7050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:25:20,041-Speed 10523.66 samples/sec Loss 13.4737 LearningRate 0.2043 Epoch: 1 Global Step: 7060 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:25:27,848-Speed 10494.97 samples/sec Loss 13.3618 LearningRate 0.2046 Epoch: 1 Global Step: 7070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:25:35,684-Speed 10455.77 samples/sec Loss 13.3411 LearningRate 0.2049 Epoch: 1 Global Step: 7080 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:25:43,515-Speed 10467.14 samples/sec Loss 13.3810 LearningRate 0.2052 Epoch: 1 Global Step: 7090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:25:51,312-Speed 10509.31 
samples/sec Loss 13.3300 LearningRate 0.2054 Epoch: 1 Global Step: 7100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:25:59,143-Speed 10462.06 samples/sec Loss 13.3492 LearningRate 0.2057 Epoch: 1 Global Step: 7110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:26:06,949-Speed 10496.95 samples/sec Loss 13.3506 LearningRate 0.2060 Epoch: 1 Global Step: 7120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:26:14,758-Speed 10493.34 samples/sec Loss 13.2424 LearningRate 0.2063 Epoch: 1 Global Step: 7130 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:26:22,555-Speed 10507.59 samples/sec Loss 13.2292 LearningRate 0.2066 Epoch: 1 Global Step: 7140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:26:30,381-Speed 10469.27 samples/sec Loss 13.1954 LearningRate 0.2069 Epoch: 1 Global Step: 7150 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:26:38,205-Speed 10472.43 samples/sec Loss 13.2087 LearningRate 0.2072 Epoch: 1 Global Step: 7160 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:26:46,044-Speed 10455.70 samples/sec Loss 13.2381 LearningRate 0.2075 Epoch: 1 Global Step: 7170 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:26:53,852-Speed 10494.46 samples/sec Loss 13.1533 LearningRate 0.2078 Epoch: 1 Global Step: 7180 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:27:01,656-Speed 10499.66 samples/sec Loss 13.1414 LearningRate 0.2080 Epoch: 1 Global Step: 7190 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:27:09,476-Speed 10477.29 samples/sec Loss 13.0861 LearningRate 0.2083 Epoch: 1 Global Step: 7200 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:27:17,276-Speed 10505.18 samples/sec Loss 13.1478 LearningRate 0.2086 Epoch: 1 Global Step: 7210 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:27:25,082-Speed 10496.81 samples/sec Loss 13.0986 
LearningRate 0.2089 Epoch: 1 Global Step: 7220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:27:32,898-Speed 10482.61 samples/sec Loss 13.0794 LearningRate 0.2092 Epoch: 1 Global Step: 7230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:27:40,740-Speed 10448.04 samples/sec Loss 13.0212 LearningRate 0.2095 Epoch: 1 Global Step: 7240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:27:48,574-Speed 10458.67 samples/sec Loss 13.0420 LearningRate 0.2098 Epoch: 1 Global Step: 7250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:27:56,359-Speed 10525.93 samples/sec Loss 13.0299 LearningRate 0.2101 Epoch: 1 Global Step: 7260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:28:04,145-Speed 10522.93 samples/sec Loss 12.9774 LearningRate 0.2104 Epoch: 1 Global Step: 7270 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:28:11,956-Speed 10489.15 samples/sec Loss 12.9439 LearningRate 0.2106 Epoch: 1 Global Step: 7280 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:28:19,738-Speed 10529.22 samples/sec Loss 12.9861 LearningRate 0.2109 Epoch: 1 Global Step: 7290 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:28:27,558-Speed 10477.24 samples/sec Loss 12.9449 LearningRate 0.2112 Epoch: 1 Global Step: 7300 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:28:35,380-Speed 10475.87 samples/sec Loss 12.9539 LearningRate 0.2115 Epoch: 1 Global Step: 7310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:28:43,188-Speed 10494.22 samples/sec Loss 12.8787 LearningRate 0.2118 Epoch: 1 Global Step: 7320 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:28:50,989-Speed 10503.22 samples/sec Loss 12.8944 LearningRate 0.2121 Epoch: 1 Global Step: 7330 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:28:58,798-Speed 10491.76 samples/sec Loss 12.9011 LearningRate 0.2124 Epoch: 1 
Global Step: 7340 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:29:06,587-Speed 10520.40 samples/sec Loss 12.8390 LearningRate 0.2127 Epoch: 1 Global Step: 7350 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:29:14,421-Speed 10459.13 samples/sec Loss 12.8835 LearningRate 0.2130 Epoch: 1 Global Step: 7360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:29:22,250-Speed 10465.56 samples/sec Loss 12.8311 LearningRate 0.2133 Epoch: 1 Global Step: 7370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:29:30,037-Speed 10522.06 samples/sec Loss 12.8248 LearningRate 0.2135 Epoch: 1 Global Step: 7380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:29:37,835-Speed 10510.14 samples/sec Loss 12.7816 LearningRate 0.2138 Epoch: 1 Global Step: 7390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:29:45,651-Speed 10482.42 samples/sec Loss 12.8135 LearningRate 0.2141 Epoch: 1 Global Step: 7400 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:29:53,457-Speed 10495.75 samples/sec Loss 12.8424 LearningRate 0.2144 Epoch: 1 Global Step: 7410 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:30:01,257-Speed 10505.48 samples/sec Loss 12.8746 LearningRate 0.2147 Epoch: 1 Global Step: 7420 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:30:09,098-Speed 10448.38 samples/sec Loss 12.7487 LearningRate 0.2150 Epoch: 1 Global Step: 7430 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:30:16,915-Speed 10482.03 samples/sec Loss 12.7455 LearningRate 0.2153 Epoch: 1 Global Step: 7440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:30:24,723-Speed 10493.19 samples/sec Loss 12.7063 LearningRate 0.2156 Epoch: 1 Global Step: 7450 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:30:32,564-Speed 10450.00 samples/sec Loss 12.7045 LearningRate 0.2159 Epoch: 1 Global Step: 7460 Fp16 Grad 
Scale: 131072 Required: 21 hours Training: 2022-01-15 16:30:40,359-Speed 10510.97 samples/sec Loss 12.6972 LearningRate 0.2161 Epoch: 1 Global Step: 7470 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:30:48,138-Speed 10531.81 samples/sec Loss 12.6699 LearningRate 0.2164 Epoch: 1 Global Step: 7480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:30:55,936-Speed 10507.83 samples/sec Loss 12.6814 LearningRate 0.2167 Epoch: 1 Global Step: 7490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:31:03,780-Speed 10444.58 samples/sec Loss 12.6577 LearningRate 0.2170 Epoch: 1 Global Step: 7500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:31:11,572-Speed 10516.25 samples/sec Loss 12.6433 LearningRate 0.2173 Epoch: 1 Global Step: 7510 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:31:19,382-Speed 10491.95 samples/sec Loss 12.6610 LearningRate 0.2176 Epoch: 1 Global Step: 7520 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:31:27,175-Speed 10514.64 samples/sec Loss 12.6079 LearningRate 0.2179 Epoch: 1 Global Step: 7530 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:31:34,988-Speed 10491.91 samples/sec Loss 12.6361 LearningRate 0.2182 Epoch: 1 Global Step: 7540 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:31:42,799-Speed 10490.75 samples/sec Loss 12.6176 LearningRate 0.2185 Epoch: 1 Global Step: 7550 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:31:50,607-Speed 10494.71 samples/sec Loss 12.6193 LearningRate 0.2187 Epoch: 1 Global Step: 7560 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:31:58,425-Speed 10481.12 samples/sec Loss 12.5687 LearningRate 0.2190 Epoch: 1 Global Step: 7570 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:32:06,231-Speed 10497.53 samples/sec Loss 12.5497 LearningRate 0.2193 Epoch: 1 Global Step: 7580 Fp16 Grad Scale: 262144 Required: 21 
hours Training: 2022-01-15 16:32:14,046-Speed 10484.23 samples/sec Loss 12.4628 LearningRate 0.2196 Epoch: 1 Global Step: 7590 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:32:21,862-Speed 10483.34 samples/sec Loss 12.5350 LearningRate 0.2199 Epoch: 1 Global Step: 7600 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:32:29,693-Speed 10463.87 samples/sec Loss 12.5011 LearningRate 0.2202 Epoch: 1 Global Step: 7610 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:32:37,504-Speed 10491.79 samples/sec Loss 12.5098 LearningRate 0.2205 Epoch: 1 Global Step: 7620 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:32:45,286-Speed 10529.92 samples/sec Loss 12.4639 LearningRate 0.2208 Epoch: 1 Global Step: 7630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:32:53,070-Speed 10527.50 samples/sec Loss 12.4655 LearningRate 0.2211 Epoch: 1 Global Step: 7640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:33:00,886-Speed 10482.36 samples/sec Loss 12.4872 LearningRate 0.2214 Epoch: 1 Global Step: 7650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:33:08,682-Speed 10510.68 samples/sec Loss 12.4759 LearningRate 0.2216 Epoch: 1 Global Step: 7660 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:33:16,510-Speed 10466.68 samples/sec Loss 12.4609 LearningRate 0.2219 Epoch: 1 Global Step: 7670 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:33:24,328-Speed 10480.85 samples/sec Loss 12.4898 LearningRate 0.2222 Epoch: 1 Global Step: 7680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:33:32,123-Speed 10510.77 samples/sec Loss 12.4782 LearningRate 0.2225 Epoch: 1 Global Step: 7690 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:33:39,944-Speed 10475.59 samples/sec Loss 12.4631 LearningRate 0.2228 Epoch: 1 Global Step: 7700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 
16:33:47,745-Speed 10503.35 samples/sec Loss 12.4007 LearningRate 0.2231 Epoch: 1 Global Step: 7710 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:33:55,554-Speed 10491.98 samples/sec Loss 12.3831 LearningRate 0.2234 Epoch: 1 Global Step: 7720 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:34:03,375-Speed 10476.63 samples/sec Loss 12.3720 LearningRate 0.2237 Epoch: 1 Global Step: 7730 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:34:11,208-Speed 10459.47 samples/sec Loss 12.3739 LearningRate 0.2240 Epoch: 1 Global Step: 7740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:34:18,993-Speed 10524.02 samples/sec Loss 12.3496 LearningRate 0.2242 Epoch: 1 Global Step: 7750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:34:26,800-Speed 10494.49 samples/sec Loss 12.3399 LearningRate 0.2245 Epoch: 1 Global Step: 7760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:34:34,607-Speed 10495.25 samples/sec Loss 12.2857 LearningRate 0.2248 Epoch: 1 Global Step: 7770 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:34:42,406-Speed 10505.19 samples/sec Loss 12.2689 LearningRate 0.2251 Epoch: 1 Global Step: 7780 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:34:50,234-Speed 10467.14 samples/sec Loss 12.2897 LearningRate 0.2254 Epoch: 1 Global Step: 7790 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:34:58,060-Speed 10469.11 samples/sec Loss 12.2552 LearningRate 0.2257 Epoch: 1 Global Step: 7800 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:35:05,867-Speed 10496.03 samples/sec Loss 12.2977 LearningRate 0.2260 Epoch: 1 Global Step: 7810 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:35:13,650-Speed 10526.36 samples/sec Loss 12.3160 LearningRate 0.2263 Epoch: 1 Global Step: 7820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:35:21,485-Speed 10457.64 samples/sec 
Loss 12.2141 LearningRate 0.2266 Epoch: 1 Global Step: 7830 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:35:29,282-Speed 10508.36 samples/sec Loss 12.2689 LearningRate 0.2269 Epoch: 1 Global Step: 7840 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:35:37,123-Speed 10450.48 samples/sec Loss 12.3546 LearningRate 0.2271 Epoch: 1 Global Step: 7850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:35:44,934-Speed 10487.76 samples/sec Loss 12.7375 LearningRate 0.2274 Epoch: 1 Global Step: 7860 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:35:52,749-Speed 10484.02 samples/sec Loss 13.3437 LearningRate 0.2277 Epoch: 1 Global Step: 7870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:36:00,530-Speed 10530.98 samples/sec Loss 13.2974 LearningRate 0.2280 Epoch: 1 Global Step: 7880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:36:08,383-Speed 10434.12 samples/sec Loss 12.8482 LearningRate 0.2283 Epoch: 1 Global Step: 7890 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:36:16,149-Speed 10549.64 samples/sec Loss 12.5436 LearningRate 0.2286 Epoch: 1 Global Step: 7900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:36:23,932-Speed 10526.93 samples/sec Loss 12.3282 LearningRate 0.2289 Epoch: 1 Global Step: 7910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:36:31,738-Speed 10497.35 samples/sec Loss 12.2899 LearningRate 0.2292 Epoch: 1 Global Step: 7920 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:36:39,523-Speed 10525.12 samples/sec Loss 12.3398 LearningRate 0.2295 Epoch: 1 Global Step: 7930 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:36:47,333-Speed 10491.44 samples/sec Loss 12.2825 LearningRate 0.2297 Epoch: 1 Global Step: 7940 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:36:55,150-Speed 10481.14 samples/sec Loss 12.2351 LearningRate 0.2300 Epoch: 
1 Global Step: 7950 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:37:02,957-Speed 10495.18 samples/sec Loss 12.1520 LearningRate 0.2303 Epoch: 1 Global Step: 7960 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:37:10,729-Speed 10542.38 samples/sec Loss 12.2647 LearningRate 0.2306 Epoch: 1 Global Step: 7970 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:37:18,529-Speed 10505.15 samples/sec Loss 12.2722 LearningRate 0.2309 Epoch: 1 Global Step: 7980 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:37:26,323-Speed 10512.54 samples/sec Loss 12.2129 LearningRate 0.2312 Epoch: 1 Global Step: 7990 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:37:34,098-Speed 10539.16 samples/sec Loss 12.1665 LearningRate 0.2315 Epoch: 1 Global Step: 8000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:37:41,894-Speed 10509.21 samples/sec Loss 12.1425 LearningRate 0.2318 Epoch: 1 Global Step: 8010 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:37:49,692-Speed 10506.59 samples/sec Loss 12.2000 LearningRate 0.2321 Epoch: 1 Global Step: 8020 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:37:57,518-Speed 10470.39 samples/sec Loss 12.0602 LearningRate 0.2323 Epoch: 1 Global Step: 8030 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:38:05,368-Speed 10438.08 samples/sec Loss 12.1357 LearningRate 0.2326 Epoch: 1 Global Step: 8040 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:38:13,177-Speed 10492.41 samples/sec Loss 12.1344 LearningRate 0.2329 Epoch: 1 Global Step: 8050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:38:20,956-Speed 10533.79 samples/sec Loss 12.0873 LearningRate 0.2332 Epoch: 1 Global Step: 8060 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:38:28,773-Speed 10481.70 samples/sec Loss 12.0849 LearningRate 0.2335 Epoch: 1 Global Step: 8070 Fp16 Grad 
Scale: 262144 Required: 21 hours Training: 2022-01-15 16:38:36,601-Speed 10473.00 samples/sec Loss 12.0774 LearningRate 0.2338 Epoch: 1 Global Step: 8080 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:38:44,401-Speed 10503.65 samples/sec Loss 12.0163 LearningRate 0.2341 Epoch: 1 Global Step: 8090 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:38:52,205-Speed 10499.59 samples/sec Loss 12.0346 LearningRate 0.2344 Epoch: 1 Global Step: 8100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:39:00,004-Speed 10506.85 samples/sec Loss 11.9754 LearningRate 0.2347 Epoch: 1 Global Step: 8110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:39:07,813-Speed 10493.13 samples/sec Loss 12.0105 LearningRate 0.2350 Epoch: 1 Global Step: 8120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:39:15,635-Speed 10473.63 samples/sec Loss 12.0141 LearningRate 0.2352 Epoch: 1 Global Step: 8130 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:39:23,408-Speed 10541.03 samples/sec Loss 12.1052 LearningRate 0.2355 Epoch: 1 Global Step: 8140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:39:31,195-Speed 10522.10 samples/sec Loss 12.0172 LearningRate 0.2358 Epoch: 1 Global Step: 8150 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:39:38,983-Speed 10520.79 samples/sec Loss 12.0111 LearningRate 0.2361 Epoch: 1 Global Step: 8160 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:39:46,775-Speed 10515.66 samples/sec Loss 11.9759 LearningRate 0.2364 Epoch: 1 Global Step: 8170 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:39:54,580-Speed 10496.57 samples/sec Loss 11.9699 LearningRate 0.2367 Epoch: 1 Global Step: 8180 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:40:02,359-Speed 10534.83 samples/sec Loss 11.9196 LearningRate 0.2370 Epoch: 1 Global Step: 8190 Fp16 Grad Scale: 131072 Required: 21 
hours Training: 2022-01-15 16:40:10,155-Speed 10514.66 samples/sec Loss 11.9134 LearningRate 0.2373 Epoch: 1 Global Step: 8200 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:40:17,984-Speed 10464.30 samples/sec Loss 12.0250 LearningRate 0.2376 Epoch: 1 Global Step: 8210 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:40:25,814-Speed 10465.07 samples/sec Loss 11.9306 LearningRate 0.2378 Epoch: 1 Global Step: 8220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:40:33,654-Speed 10453.16 samples/sec Loss 12.0114 LearningRate 0.2381 Epoch: 1 Global Step: 8230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:40:41,487-Speed 10461.38 samples/sec Loss 11.9947 LearningRate 0.2384 Epoch: 1 Global Step: 8240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:40:49,282-Speed 10511.17 samples/sec Loss 11.9013 LearningRate 0.2387 Epoch: 1 Global Step: 8250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:40:57,094-Speed 10489.42 samples/sec Loss 11.9253 LearningRate 0.2390 Epoch: 1 Global Step: 8260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:41:04,890-Speed 10509.59 samples/sec Loss 11.8885 LearningRate 0.2393 Epoch: 1 Global Step: 8270 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:41:12,679-Speed 10520.25 samples/sec Loss 11.8643 LearningRate 0.2396 Epoch: 1 Global Step: 8280 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:41:20,493-Speed 10484.75 samples/sec Loss 11.9053 LearningRate 0.2399 Epoch: 1 Global Step: 8290 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:41:28,335-Speed 10448.02 samples/sec Loss 11.9095 LearningRate 0.2402 Epoch: 1 Global Step: 8300 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:41:36,156-Speed 10477.14 samples/sec Loss 11.8087 LearningRate 0.2405 Epoch: 1 Global Step: 8310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 
16:41:43,986-Speed 10464.07 samples/sec Loss 11.8620 LearningRate 0.2407 Epoch: 1 Global Step: 8320 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:41:51,819-Speed 10459.62 samples/sec Loss 11.7955 LearningRate 0.2410 Epoch: 1 Global Step: 8330 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:41:59,654-Speed 10457.94 samples/sec Loss 11.9660 LearningRate 0.2413 Epoch: 1 Global Step: 8340 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:42:07,466-Speed 10487.31 samples/sec Loss 11.8188 LearningRate 0.2416 Epoch: 1 Global Step: 8350 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:42:15,294-Speed 10468.21 samples/sec Loss 11.8412 LearningRate 0.2419 Epoch: 1 Global Step: 8360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:42:23,126-Speed 10460.96 samples/sec Loss 11.8093 LearningRate 0.2422 Epoch: 1 Global Step: 8370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:42:30,979-Speed 10432.65 samples/sec Loss 11.8084 LearningRate 0.2425 Epoch: 1 Global Step: 8380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:42:38,829-Speed 10438.82 samples/sec Loss 11.8008 LearningRate 0.2428 Epoch: 1 Global Step: 8390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:42:46,612-Speed 10528.87 samples/sec Loss 11.8321 LearningRate 0.2431 Epoch: 1 Global Step: 8400 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:42:54,424-Speed 10495.41 samples/sec Loss 11.6838 LearningRate 0.2433 Epoch: 1 Global Step: 8410 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:43:02,236-Speed 10488.36 samples/sec Loss 11.7956 LearningRate 0.2436 Epoch: 1 Global Step: 8420 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:43:10,070-Speed 10458.37 samples/sec Loss 11.7644 LearningRate 0.2439 Epoch: 1 Global Step: 8430 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:43:17,916-Speed 10446.05 
samples/sec Loss 11.7601 LearningRate 0.2442 Epoch: 1 Global Step: 8440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:43:25,769-Speed 10432.89 samples/sec Loss 11.7641 LearningRate 0.2445 Epoch: 1 Global Step: 8450 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:43:33,582-Speed 10487.23 samples/sec Loss 11.8046 LearningRate 0.2448 Epoch: 1 Global Step: 8460 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:43:41,385-Speed 10499.54 samples/sec Loss 11.7857 LearningRate 0.2451 Epoch: 1 Global Step: 8470 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:43:49,174-Speed 10520.14 samples/sec Loss 11.6941 LearningRate 0.2454 Epoch: 1 Global Step: 8480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:43:56,980-Speed 10496.80 samples/sec Loss 11.7608 LearningRate 0.2457 Epoch: 1 Global Step: 8490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:44:04,771-Speed 10515.84 samples/sec Loss 11.6919 LearningRate 0.2459 Epoch: 1 Global Step: 8500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:44:12,562-Speed 10517.53 samples/sec Loss 11.6655 LearningRate 0.2462 Epoch: 1 Global Step: 8510 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:44:20,359-Speed 10507.82 samples/sec Loss 11.6948 LearningRate 0.2465 Epoch: 1 Global Step: 8520 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:44:28,147-Speed 10521.16 samples/sec Loss 11.6784 LearningRate 0.2468 Epoch: 1 Global Step: 8530 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:44:35,950-Speed 10500.91 samples/sec Loss 11.6714 LearningRate 0.2471 Epoch: 1 Global Step: 8540 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:44:43,743-Speed 10513.38 samples/sec Loss 11.7180 LearningRate 0.2474 Epoch: 1 Global Step: 8550 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:44:51,566-Speed 10473.99 samples/sec Loss 11.6242 
LearningRate 0.2477 Epoch: 1 Global Step: 8560 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:44:59,386-Speed 10477.21 samples/sec Loss 11.6543 LearningRate 0.2480 Epoch: 1 Global Step: 8570 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:45:07,216-Speed 10464.31 samples/sec Loss 11.7052 LearningRate 0.2483 Epoch: 1 Global Step: 8580 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:45:14,995-Speed 10532.32 samples/sec Loss 11.7368 LearningRate 0.2486 Epoch: 1 Global Step: 8590 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:45:22,786-Speed 10516.59 samples/sec Loss 11.6800 LearningRate 0.2488 Epoch: 1 Global Step: 8600 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:45:30,587-Speed 10503.41 samples/sec Loss 11.6955 LearningRate 0.2491 Epoch: 1 Global Step: 8610 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:45:38,394-Speed 10495.35 samples/sec Loss 11.5830 LearningRate 0.2494 Epoch: 1 Global Step: 8620 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:45:46,211-Speed 10480.22 samples/sec Loss 11.6039 LearningRate 0.2497 Epoch: 1 Global Step: 8630 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:45:54,030-Speed 10479.64 samples/sec Loss 11.5857 LearningRate 0.2500 Epoch: 1 Global Step: 8640 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:46:01,838-Speed 10493.07 samples/sec Loss 11.7083 LearningRate 0.2503 Epoch: 1 Global Step: 8650 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:46:09,638-Speed 10503.77 samples/sec Loss 11.6375 LearningRate 0.2506 Epoch: 1 Global Step: 8660 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:46:17,465-Speed 10467.74 samples/sec Loss 11.5984 LearningRate 0.2509 Epoch: 1 Global Step: 8670 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:46:25,273-Speed 10493.73 samples/sec Loss 11.5790 LearningRate 0.2512 Epoch: 1 Global 
Step: 8680 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-01-15 16:46:33,079-Speed 10500.22 samples/sec Loss 11.5658 LearningRate 0.2514 Epoch: 1 Global Step: 8690 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:46:40,862-Speed 10527.04 samples/sec Loss 11.5969 LearningRate 0.2517 Epoch: 1 Global Step: 8700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:46:48,657-Speed 10511.97 samples/sec Loss 11.5140 LearningRate 0.2520 Epoch: 1 Global Step: 8710 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:46:56,454-Speed 10510.14 samples/sec Loss 11.5778 LearningRate 0.2523 Epoch: 1 Global Step: 8720 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:47:04,241-Speed 10521.35 samples/sec Loss 11.6863 LearningRate 0.2526 Epoch: 1 Global Step: 8730 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:47:12,034-Speed 10513.25 samples/sec Loss 11.5752 LearningRate 0.2529 Epoch: 1 Global Step: 8740 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:47:19,875-Speed 10450.20 samples/sec Loss 11.6774 LearningRate 0.2532 Epoch: 1 Global Step: 8750 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:47:27,715-Speed 10451.28 samples/sec Loss 11.5664 LearningRate 0.2535 Epoch: 1 Global Step: 8760 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:47:35,516-Speed 10502.21 samples/sec Loss 11.5689 LearningRate 0.2538 Epoch: 1 Global Step: 8770 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:47:43,308-Speed 10514.63 samples/sec Loss 11.6229 LearningRate 0.2541 Epoch: 1 Global Step: 8780 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:47:51,089-Speed 10534.65 samples/sec Loss 11.5259 LearningRate 0.2543 Epoch: 1 Global Step: 8790 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:47:58,909-Speed 10478.01 samples/sec Loss 11.5222 LearningRate 0.2546 Epoch: 1 Global Step: 8800 Fp16 Grad Scale: 
262144 Required: 21 hours Training: 2022-01-15 16:48:06,745-Speed 10456.84 samples/sec Loss 11.4711 LearningRate 0.2549 Epoch: 1 Global Step: 8810 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:48:14,537-Speed 10516.45 samples/sec Loss 11.5199 LearningRate 0.2552 Epoch: 1 Global Step: 8820 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:48:22,348-Speed 10488.77 samples/sec Loss 11.4774 LearningRate 0.2555 Epoch: 1 Global Step: 8830 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:48:30,171-Speed 10472.94 samples/sec Loss 11.4265 LearningRate 0.2558 Epoch: 1 Global Step: 8840 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:48:37,960-Speed 10522.96 samples/sec Loss 11.6024 LearningRate 0.2561 Epoch: 1 Global Step: 8850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:48:45,789-Speed 10464.84 samples/sec Loss 11.5222 LearningRate 0.2564 Epoch: 1 Global Step: 8860 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:48:53,595-Speed 10496.42 samples/sec Loss 11.5405 LearningRate 0.2567 Epoch: 1 Global Step: 8870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:49:01,429-Speed 10458.87 samples/sec Loss 11.5194 LearningRate 0.2569 Epoch: 1 Global Step: 8880 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:49:09,229-Speed 10510.35 samples/sec Loss 11.4813 LearningRate 0.2572 Epoch: 1 Global Step: 8890 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:49:17,043-Speed 10485.45 samples/sec Loss 11.4391 LearningRate 0.2575 Epoch: 1 Global Step: 8900 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:49:24,880-Speed 10458.27 samples/sec Loss 11.5178 LearningRate 0.2578 Epoch: 1 Global Step: 8910 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:49:32,645-Speed 10550.56 samples/sec Loss 11.4312 LearningRate 0.2581 Epoch: 1 Global Step: 8920 Fp16 Grad Scale: 131072 Required: 21 hours 
Training: 2022-01-15 16:49:40,478-Speed 10459.79 samples/sec Loss 11.4211 LearningRate 0.2584 Epoch: 1 Global Step: 8930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:49:48,276-Speed 10507.99 samples/sec Loss 11.5513 LearningRate 0.2587 Epoch: 1 Global Step: 8940 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:49:56,089-Speed 10488.24 samples/sec Loss 11.5177 LearningRate 0.2590 Epoch: 1 Global Step: 8950 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:50:03,944-Speed 10429.74 samples/sec Loss 11.4024 LearningRate 0.2593 Epoch: 1 Global Step: 8960 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:50:11,775-Speed 10463.35 samples/sec Loss 11.4793 LearningRate 0.2595 Epoch: 1 Global Step: 8970 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:50:19,608-Speed 10460.66 samples/sec Loss 11.4461 LearningRate 0.2598 Epoch: 1 Global Step: 8980 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:50:27,437-Speed 10467.24 samples/sec Loss 11.3990 LearningRate 0.2601 Epoch: 1 Global Step: 8990 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:50:35,252-Speed 10484.87 samples/sec Loss 11.3759 LearningRate 0.2604 Epoch: 1 Global Step: 9000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:50:43,046-Speed 10512.23 samples/sec Loss 11.4295 LearningRate 0.2607 Epoch: 1 Global Step: 9010 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:50:50,841-Speed 10510.58 samples/sec Loss 11.4546 LearningRate 0.2610 Epoch: 1 Global Step: 9020 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:50:58,624-Speed 10527.10 samples/sec Loss 11.4330 LearningRate 0.2613 Epoch: 1 Global Step: 9030 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:51:06,414-Speed 10520.36 samples/sec Loss 11.4184 LearningRate 0.2616 Epoch: 1 Global Step: 9040 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 
16:51:14,215-Speed 10510.95 samples/sec Loss 11.4508 LearningRate 0.2619 Epoch: 1 Global Step: 9050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:51:22,037-Speed 10476.64 samples/sec Loss 11.4190 LearningRate 0.2622 Epoch: 1 Global Step: 9060 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:51:29,856-Speed 10479.27 samples/sec Loss 11.4153 LearningRate 0.2624 Epoch: 1 Global Step: 9070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:51:37,692-Speed 10455.18 samples/sec Loss 11.4071 LearningRate 0.2627 Epoch: 1 Global Step: 9080 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:51:45,517-Speed 10470.18 samples/sec Loss 11.4099 LearningRate 0.2630 Epoch: 1 Global Step: 9090 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:51:53,359-Speed 10448.13 samples/sec Loss 11.4077 LearningRate 0.2633 Epoch: 1 Global Step: 9100 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:52:01,200-Speed 10450.87 samples/sec Loss 11.3638 LearningRate 0.2636 Epoch: 1 Global Step: 9110 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:52:09,016-Speed 10482.24 samples/sec Loss 11.4007 LearningRate 0.2639 Epoch: 1 Global Step: 9120 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:52:16,812-Speed 10509.18 samples/sec Loss 11.3737 LearningRate 0.2642 Epoch: 1 Global Step: 9130 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:52:24,628-Speed 10483.66 samples/sec Loss 11.3402 LearningRate 0.2645 Epoch: 1 Global Step: 9140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:52:32,448-Speed 10478.42 samples/sec Loss 11.3829 LearningRate 0.2648 Epoch: 1 Global Step: 9150 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:52:40,287-Speed 10452.05 samples/sec Loss 11.3262 LearningRate 0.2650 Epoch: 1 Global Step: 9160 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:52:48,095-Speed 10494.02 
samples/sec Loss 11.2468 LearningRate 0.2653 Epoch: 1 Global Step: 9170 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:52:55,895-Speed 10504.61 samples/sec Loss 11.4098 LearningRate 0.2656 Epoch: 1 Global Step: 9180 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:53:03,678-Speed 10527.40 samples/sec Loss 11.4733 LearningRate 0.2659 Epoch: 1 Global Step: 9190 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:53:11,486-Speed 10505.81 samples/sec Loss 11.3295 LearningRate 0.2662 Epoch: 1 Global Step: 9200 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:53:19,272-Speed 10522.73 samples/sec Loss 11.4898 LearningRate 0.2665 Epoch: 1 Global Step: 9210 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:53:27,091-Speed 10479.66 samples/sec Loss 11.3724 LearningRate 0.2668 Epoch: 1 Global Step: 9220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:53:34,944-Speed 10433.62 samples/sec Loss 11.3762 LearningRate 0.2671 Epoch: 1 Global Step: 9230 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:53:42,744-Speed 10503.55 samples/sec Loss 11.3575 LearningRate 0.2674 Epoch: 1 Global Step: 9240 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:53:50,568-Speed 10470.45 samples/sec Loss 11.3596 LearningRate 0.2677 Epoch: 1 Global Step: 9250 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:53:58,370-Speed 10502.19 samples/sec Loss 11.2956 LearningRate 0.2679 Epoch: 1 Global Step: 9260 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:54:06,197-Speed 10468.94 samples/sec Loss 11.2785 LearningRate 0.2682 Epoch: 1 Global Step: 9270 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:54:14,028-Speed 10462.10 samples/sec Loss 11.2505 LearningRate 0.2685 Epoch: 1 Global Step: 9280 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:54:21,841-Speed 10486.48 samples/sec Loss 11.3133 
LearningRate 0.2688 Epoch: 1 Global Step: 9290 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:54:29,709-Speed 10413.45 samples/sec Loss 11.3311 LearningRate 0.2691 Epoch: 1 Global Step: 9300 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:54:37,548-Speed 10452.31 samples/sec Loss 11.3761 LearningRate 0.2694 Epoch: 1 Global Step: 9310 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:54:45,359-Speed 10490.85 samples/sec Loss 11.3335 LearningRate 0.2697 Epoch: 1 Global Step: 9320 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:54:53,195-Speed 10455.20 samples/sec Loss 11.3088 LearningRate 0.2700 Epoch: 1 Global Step: 9330 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:55:01,036-Speed 10450.96 samples/sec Loss 11.3158 LearningRate 0.2703 Epoch: 1 Global Step: 9340 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:55:08,811-Speed 10539.52 samples/sec Loss 11.3128 LearningRate 0.2705 Epoch: 1 Global Step: 9350 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:55:16,609-Speed 10507.16 samples/sec Loss 11.3122 LearningRate 0.2708 Epoch: 1 Global Step: 9360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:55:24,392-Speed 10529.03 samples/sec Loss 11.2317 LearningRate 0.2711 Epoch: 1 Global Step: 9370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:55:32,181-Speed 10523.91 samples/sec Loss 11.3335 LearningRate 0.2714 Epoch: 1 Global Step: 9380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:55:40,000-Speed 10479.63 samples/sec Loss 11.3166 LearningRate 0.2717 Epoch: 1 Global Step: 9390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:55:47,792-Speed 10515.15 samples/sec Loss 11.3479 LearningRate 0.2720 Epoch: 1 Global Step: 9400 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:55:55,603-Speed 10489.69 samples/sec Loss 11.2496 LearningRate 0.2723 Epoch: 1 
Global Step: 9410 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:56:03,402-Speed 10506.41 samples/sec Loss 11.2035 LearningRate 0.2726 Epoch: 1 Global Step: 9420 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:56:11,231-Speed 10465.09 samples/sec Loss 11.2797 LearningRate 0.2729 Epoch: 1 Global Step: 9430 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:56:19,040-Speed 10492.38 samples/sec Loss 11.2204 LearningRate 0.2731 Epoch: 1 Global Step: 9440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:56:26,820-Speed 10532.38 samples/sec Loss 11.3702 LearningRate 0.2734 Epoch: 1 Global Step: 9450 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 16:56:34,633-Speed 10487.47 samples/sec Loss 11.2495 LearningRate 0.2737 Epoch: 1 Global Step: 9460 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:56:42,419-Speed 10523.49 samples/sec Loss 11.2638 LearningRate 0.2740 Epoch: 1 Global Step: 9470 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 16:56:50,204-Speed 10525.02 samples/sec Loss 11.2630 LearningRate 0.2743 Epoch: 1 Global Step: 9480 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 16:56:58,001-Speed 10507.94 samples/sec Loss 11.1760 LearningRate 0.2746 Epoch: 1 Global Step: 9490 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 16:57:05,800-Speed 10506.18 samples/sec Loss 11.2269 LearningRate 0.2749 Epoch: 1 Global Step: 9500 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 16:57:13,592-Speed 10515.30 samples/sec Loss 11.2109 LearningRate 0.2752 Epoch: 1 Global Step: 9510 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 16:57:21,381-Speed 10519.60 samples/sec Loss 11.2375 LearningRate 0.2755 Epoch: 1 Global Step: 9520 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 16:57:29,175-Speed 10511.83 samples/sec Loss 11.2190 LearningRate 0.2758 Epoch: 1 Global Step: 9530 Fp16 Grad Scale: 
65536 Required: 20 hours Training: 2022-01-15 16:57:36,965-Speed 10517.53 samples/sec Loss 11.2582 LearningRate 0.2760 Epoch: 1 Global Step: 9540 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 16:57:44,774-Speed 10492.87 samples/sec Loss 11.2220 LearningRate 0.2763 Epoch: 1 Global Step: 9550 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 16:57:52,609-Speed 10458.37 samples/sec Loss 11.2654 LearningRate 0.2766 Epoch: 1 Global Step: 9560 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 16:58:00,420-Speed 10489.49 samples/sec Loss 11.2801 LearningRate 0.2769 Epoch: 1 Global Step: 9570 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 16:58:08,231-Speed 10492.38 samples/sec Loss 11.3125 LearningRate 0.2772 Epoch: 1 Global Step: 9580 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 16:58:16,029-Speed 10507.78 samples/sec Loss 11.2783 LearningRate 0.2775 Epoch: 1 Global Step: 9590 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 16:58:23,828-Speed 10507.11 samples/sec Loss 11.2738 LearningRate 0.2778 Epoch: 1 Global Step: 9600 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 16:58:31,630-Speed 10500.96 samples/sec Loss 11.2702 LearningRate 0.2781 Epoch: 1 Global Step: 9610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 16:58:39,420-Speed 10518.23 samples/sec Loss 11.1511 LearningRate 0.2784 Epoch: 1 Global Step: 9620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 16:58:47,218-Speed 10508.10 samples/sec Loss 11.2479 LearningRate 0.2786 Epoch: 1 Global Step: 9630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 16:58:55,021-Speed 10501.09 samples/sec Loss 11.2091 LearningRate 0.2789 Epoch: 1 Global Step: 9640 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 16:59:02,843-Speed 10474.23 samples/sec Loss 11.3256 LearningRate 0.2792 Epoch: 1 Global Step: 9650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 
2022-01-15 16:59:10,665-Speed 10475.16 samples/sec Loss 11.1924 LearningRate 0.2795 Epoch: 1 Global Step: 9660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 16:59:18,449-Speed 10525.95 samples/sec Loss 11.1874 LearningRate 0.2798 Epoch: 1 Global Step: 9670 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 16:59:26,255-Speed 10496.30 samples/sec Loss 11.2342 LearningRate 0.2801 Epoch: 1 Global Step: 9680 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 16:59:34,049-Speed 10511.63 samples/sec Loss 11.1464 LearningRate 0.2804 Epoch: 1 Global Step: 9690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 16:59:41,848-Speed 10505.75 samples/sec Loss 11.2223 LearningRate 0.2807 Epoch: 1 Global Step: 9700 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 16:59:49,708-Speed 10424.49 samples/sec Loss 11.2503 LearningRate 0.2810 Epoch: 1 Global Step: 9710 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 16:59:57,574-Speed 10416.44 samples/sec Loss 11.2204 LearningRate 0.2812 Epoch: 1 Global Step: 9720 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:00:05,421-Speed 10441.35 samples/sec Loss 11.2115 LearningRate 0.2815 Epoch: 1 Global Step: 9730 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:00:13,212-Speed 10516.35 samples/sec Loss 11.2207 LearningRate 0.2818 Epoch: 1 Global Step: 9740 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:00:21,020-Speed 10494.73 samples/sec Loss 11.0847 LearningRate 0.2821 Epoch: 1 Global Step: 9750 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:00:28,812-Speed 10515.55 samples/sec Loss 11.2637 LearningRate 0.2824 Epoch: 1 Global Step: 9760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:00:36,660-Speed 10438.57 samples/sec Loss 11.2441 LearningRate 0.2827 Epoch: 1 Global Step: 9770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:00:44,464-Speed 
10500.36 samples/sec Loss 11.3093 LearningRate 0.2830 Epoch: 1 Global Step: 9780 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:00:52,304-Speed 10449.85 samples/sec Loss 11.2310 LearningRate 0.2833 Epoch: 1 Global Step: 9790 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:01:00,122-Speed 10482.01 samples/sec Loss 11.2221 LearningRate 0.2836 Epoch: 1 Global Step: 9800 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:01:07,920-Speed 10515.44 samples/sec Loss 11.1947 LearningRate 0.2839 Epoch: 1 Global Step: 9810 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:01:15,728-Speed 10493.79 samples/sec Loss 11.1145 LearningRate 0.2841 Epoch: 1 Global Step: 9820 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:01:23,506-Speed 10532.96 samples/sec Loss 11.1388 LearningRate 0.2844 Epoch: 1 Global Step: 9830 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:01:31,348-Speed 10447.97 samples/sec Loss 11.1974 LearningRate 0.2847 Epoch: 1 Global Step: 9840 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:01:39,146-Speed 10509.02 samples/sec Loss 11.1942 LearningRate 0.2850 Epoch: 1 Global Step: 9850 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:01:46,955-Speed 10491.26 samples/sec Loss 11.2094 LearningRate 0.2853 Epoch: 1 Global Step: 9860 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:01:54,796-Speed 10449.80 samples/sec Loss 11.1447 LearningRate 0.2856 Epoch: 1 Global Step: 9870 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:02:02,582-Speed 10523.30 samples/sec Loss 11.1938 LearningRate 0.2859 Epoch: 1 Global Step: 9880 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:02:10,386-Speed 10500.76 samples/sec Loss 11.1721 LearningRate 0.2862 Epoch: 1 Global Step: 9890 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:02:18,183-Speed 10509.53 samples/sec Loss 
11.1450 LearningRate 0.2865 Epoch: 1 Global Step: 9900 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:02:25,958-Speed 10538.59 samples/sec Loss 11.2249 LearningRate 0.2867 Epoch: 1 Global Step: 9910 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:02:33,726-Speed 10551.54 samples/sec Loss 11.1986 LearningRate 0.2870 Epoch: 1 Global Step: 9920 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:02:41,538-Speed 10488.61 samples/sec Loss 11.1813 LearningRate 0.2873 Epoch: 1 Global Step: 9930 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:02:49,391-Speed 10433.47 samples/sec Loss 11.2198 LearningRate 0.2876 Epoch: 1 Global Step: 9940 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:02:57,195-Speed 10499.71 samples/sec Loss 11.1669 LearningRate 0.2879 Epoch: 1 Global Step: 9950 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:03:05,015-Speed 10482.29 samples/sec Loss 11.1200 LearningRate 0.2882 Epoch: 1 Global Step: 9960 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:03:12,801-Speed 10523.03 samples/sec Loss 11.1096 LearningRate 0.2885 Epoch: 1 Global Step: 9970 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:03:20,605-Speed 10499.88 samples/sec Loss 11.1261 LearningRate 0.2888 Epoch: 1 Global Step: 9980 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:03:28,427-Speed 10475.52 samples/sec Loss 11.1293 LearningRate 0.2891 Epoch: 1 Global Step: 9990 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:03:36,234-Speed 10495.42 samples/sec Loss 11.2263 LearningRate 0.2894 Epoch: 1 Global Step: 10000 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:04:03,465-[lfw][10000]XNorm: 23.064546 Training: 2022-01-15 17:04:03,466-[lfw][10000]Accuracy-Flip: 0.99483+-0.00391 Training: 2022-01-15 17:04:03,467-[lfw][10000]Accuracy-Highest: 0.99483 Training: 2022-01-15 
17:04:35,455-[cfp_fp][10000]XNorm: 21.010530 Training: 2022-01-15 17:04:35,455-[cfp_fp][10000]Accuracy-Flip: 0.96829+-0.01067 Training: 2022-01-15 17:04:35,456-[cfp_fp][10000]Accuracy-Highest: 0.96829 Training: 2022-01-15 17:05:03,455-[agedb_30][10000]XNorm: 22.595752 Training: 2022-01-15 17:05:03,457-[agedb_30][10000]Accuracy-Flip: 0.95250+-0.01083 Training: 2022-01-15 17:05:03,457-[agedb_30][10000]Accuracy-Highest: 0.95250 Training: 2022-01-15 17:05:11,292-Speed 861.80 samples/sec Loss 11.1196 LearningRate 0.2896 Epoch: 1 Global Step: 10010 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:05:19,060-Speed 10549.70 samples/sec Loss 11.1118 LearningRate 0.2899 Epoch: 1 Global Step: 10020 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:05:26,825-Speed 10551.57 samples/sec Loss 11.0616 LearningRate 0.2902 Epoch: 1 Global Step: 10030 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:05:34,592-Speed 10549.10 samples/sec Loss 11.0828 LearningRate 0.2905 Epoch: 1 Global Step: 10040 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:05:42,445-Speed 10434.52 samples/sec Loss 11.1688 LearningRate 0.2908 Epoch: 1 Global Step: 10050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:05:50,241-Speed 10513.65 samples/sec Loss 11.1544 LearningRate 0.2911 Epoch: 1 Global Step: 10060 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:05:58,001-Speed 10559.25 samples/sec Loss 11.1863 LearningRate 0.2914 Epoch: 1 Global Step: 10070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:06:05,794-Speed 10515.02 samples/sec Loss 11.1837 LearningRate 0.2917 Epoch: 1 Global Step: 10080 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:06:13,576-Speed 10529.29 samples/sec Loss 11.1362 LearningRate 0.2920 Epoch: 1 Global Step: 10090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:06:21,362-Speed 10524.26 samples/sec Loss 11.1585 LearningRate 
0.2922 Epoch: 1 Global Step: 10100 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 17:06:29,145-Speed 10536.33 samples/sec Loss 11.0764 LearningRate 0.2925 Epoch: 1 Global Step: 10110 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 17:06:36,916-Speed 10543.53 samples/sec Loss 11.1530 LearningRate 0.2928 Epoch: 1 Global Step: 10120 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 17:06:44,696-Speed 10531.00 samples/sec Loss 11.1490 LearningRate 0.2931 Epoch: 1 Global Step: 10130 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 17:06:52,470-Speed 10539.14 samples/sec Loss 11.3044 LearningRate 0.2934 Epoch: 1 Global Step: 10140 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 17:07:00,242-Speed 10542.22 samples/sec Loss 11.0486 LearningRate 0.2937 Epoch: 1 Global Step: 10150 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 17:07:08,028-Speed 10524.45 samples/sec Loss 11.0951 LearningRate 0.2940 Epoch: 1 Global Step: 10160 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 17:07:15,810-Speed 10528.67 samples/sec Loss 11.1485 LearningRate 0.2943 Epoch: 1 Global Step: 10170 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 17:07:23,590-Speed 10531.52 samples/sec Loss 11.0862 LearningRate 0.2946 Epoch: 1 Global Step: 10180 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:07:31,385-Speed 10514.56 samples/sec Loss 11.1484 LearningRate 0.2948 Epoch: 1 Global Step: 10190 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:07:39,165-Speed 10531.67 samples/sec Loss 11.0582 LearningRate 0.2951 Epoch: 1 Global Step: 10200 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:07:46,981-Speed 10482.54 samples/sec Loss 11.0968 LearningRate 0.2954 Epoch: 1 Global Step: 10210 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:07:54,764-Speed 10528.33 samples/sec Loss 11.1518 LearningRate 0.2957 Epoch: 1 
Global Step: 10220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:08:02,531-Speed 10549.10 samples/sec Loss 11.1228 LearningRate 0.2960 Epoch: 1 Global Step: 10230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:08:10,306-Speed 10538.13 samples/sec Loss 11.1999 LearningRate 0.2963 Epoch: 1 Global Step: 10240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:08:18,089-Speed 10527.55 samples/sec Loss 11.0920 LearningRate 0.2966 Epoch: 1 Global Step: 10250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:08:25,894-Speed 10497.92 samples/sec Loss 11.1351 LearningRate 0.2969 Epoch: 1 Global Step: 10260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:08:33,667-Speed 10541.03 samples/sec Loss 11.2410 LearningRate 0.2972 Epoch: 1 Global Step: 10270 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:08:41,465-Speed 10508.74 samples/sec Loss 11.1587 LearningRate 0.2975 Epoch: 1 Global Step: 10280 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 17:08:49,241-Speed 10536.26 samples/sec Loss 11.0385 LearningRate 0.2977 Epoch: 1 Global Step: 10290 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 17:08:57,063-Speed 10474.69 samples/sec Loss 11.1128 LearningRate 0.2980 Epoch: 1 Global Step: 10300 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 17:09:04,886-Speed 10472.71 samples/sec Loss 11.1826 LearningRate 0.2983 Epoch: 1 Global Step: 10310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:09:12,687-Speed 10504.10 samples/sec Loss 11.0643 LearningRate 0.2986 Epoch: 1 Global Step: 10320 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:09:20,517-Speed 10464.10 samples/sec Loss 11.1408 LearningRate 0.2989 Epoch: 1 Global Step: 10330 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:09:28,350-Speed 10460.84 samples/sec Loss 11.2111 LearningRate 0.2992 Epoch: 1 Global Step: 10340 
Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:09:36,225-Speed 10406.48 samples/sec Loss 11.2287 LearningRate 0.2995 Epoch: 1 Global Step: 10350 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:09:44,056-Speed 10463.13 samples/sec Loss 11.0545 LearningRate 0.2998 Epoch: 1 Global Step: 10360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:09:51,882-Speed 10468.49 samples/sec Loss 11.0891 LearningRate 0.3001 Epoch: 1 Global Step: 10370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:10:14,752-Speed 3582.17 samples/sec Loss 11.0807 LearningRate 0.3003 Epoch: 2 Global Step: 10380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:10:22,547-Speed 10511.78 samples/sec Loss 11.0924 LearningRate 0.3006 Epoch: 2 Global Step: 10390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:10:30,346-Speed 10509.31 samples/sec Loss 10.9982 LearningRate 0.3009 Epoch: 2 Global Step: 10400 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:10:38,119-Speed 10541.34 samples/sec Loss 11.1399 LearningRate 0.3012 Epoch: 2 Global Step: 10410 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 17:10:45,946-Speed 10467.32 samples/sec Loss 11.0933 LearningRate 0.3015 Epoch: 2 Global Step: 10420 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 17:10:53,749-Speed 10502.63 samples/sec Loss 11.0406 LearningRate 0.3018 Epoch: 2 Global Step: 10430 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 17:11:01,527-Speed 10534.46 samples/sec Loss 11.1111 LearningRate 0.3021 Epoch: 2 Global Step: 10440 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 17:11:09,350-Speed 10473.81 samples/sec Loss 11.0601 LearningRate 0.3024 Epoch: 2 Global Step: 10450 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 17:11:17,162-Speed 10488.77 samples/sec Loss 11.0802 LearningRate 0.3027 Epoch: 2 Global Step: 10460 Fp16 Grad Scale: 
262144 Required: 21 hours Training: 2022-01-15 17:11:24,996-Speed 10458.70 samples/sec Loss 11.0908 LearningRate 0.3030 Epoch: 2 Global Step: 10470 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-01-15 17:11:32,788-Speed 10515.61 samples/sec Loss 11.1481 LearningRate 0.3032 Epoch: 2 Global Step: 10480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:11:40,585-Speed 10508.67 samples/sec Loss 11.1577 LearningRate 0.3035 Epoch: 2 Global Step: 10490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:11:48,386-Speed 10503.21 samples/sec Loss 11.1018 LearningRate 0.3038 Epoch: 2 Global Step: 10500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:11:56,179-Speed 10514.79 samples/sec Loss 11.1315 LearningRate 0.3041 Epoch: 2 Global Step: 10510 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:12:03,970-Speed 10516.28 samples/sec Loss 11.0961 LearningRate 0.3044 Epoch: 2 Global Step: 10520 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:12:11,748-Speed 10534.51 samples/sec Loss 11.1705 LearningRate 0.3047 Epoch: 2 Global Step: 10530 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:12:19,545-Speed 10508.93 samples/sec Loss 11.1463 LearningRate 0.3050 Epoch: 2 Global Step: 10540 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:12:27,299-Speed 10566.23 samples/sec Loss 11.0101 LearningRate 0.3053 Epoch: 2 Global Step: 10550 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:12:35,058-Speed 10560.62 samples/sec Loss 11.1359 LearningRate 0.3056 Epoch: 2 Global Step: 10560 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:12:42,830-Speed 10542.66 samples/sec Loss 11.0290 LearningRate 0.3058 Epoch: 2 Global Step: 10570 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-01-15 17:12:50,609-Speed 10532.41 samples/sec Loss 11.0516 LearningRate 0.3061 Epoch: 2 Global Step: 10580 Fp16 Grad Scale: 262144 Required: 21 
hours Training: 2022-01-15 17:12:58,400-Speed 10515.24 samples/sec Loss 11.0224 LearningRate 0.3064 Epoch: 2 Global Step: 10590 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:13:06,182-Speed 10530.40 samples/sec Loss 11.0455 LearningRate 0.3067 Epoch: 2 Global Step: 10600 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:13:13,968-Speed 10522.87 samples/sec Loss 11.0806 LearningRate 0.3070 Epoch: 2 Global Step: 10610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:13:21,791-Speed 10473.67 samples/sec Loss 11.1035 LearningRate 0.3073 Epoch: 2 Global Step: 10620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:13:29,600-Speed 10494.13 samples/sec Loss 11.0577 LearningRate 0.3076 Epoch: 2 Global Step: 10630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:13:37,392-Speed 10516.56 samples/sec Loss 11.0638 LearningRate 0.3079 Epoch: 2 Global Step: 10640 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:13:45,181-Speed 10521.26 samples/sec Loss 11.2120 LearningRate 0.3082 Epoch: 2 Global Step: 10650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:13:53,013-Speed 10460.98 samples/sec Loss 11.1199 LearningRate 0.3084 Epoch: 2 Global Step: 10660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:14:00,795-Speed 10527.72 samples/sec Loss 11.0791 LearningRate 0.3087 Epoch: 2 Global Step: 10670 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:14:08,590-Speed 10512.71 samples/sec Loss 11.0554 LearningRate 0.3090 Epoch: 2 Global Step: 10680 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:14:16,375-Speed 10524.56 samples/sec Loss 11.0719 LearningRate 0.3093 Epoch: 2 Global Step: 10690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:14:24,179-Speed 10500.01 samples/sec Loss 11.0648 LearningRate 0.3096 Epoch: 2 Global Step: 10700 Fp16 Grad Scale: 262144 Required: 20 hours Training: 
2022-01-15 17:14:31,977-Speed 10507.59 samples/sec Loss 11.0231 LearningRate 0.3099 Epoch: 2 Global Step: 10710 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:14:39,765-Speed 10520.18 samples/sec Loss 11.0810 LearningRate 0.3102 Epoch: 2 Global Step: 10720 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:14:47,546-Speed 10531.34 samples/sec Loss 11.0914 LearningRate 0.3105 Epoch: 2 Global Step: 10730 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:14:55,331-Speed 10524.58 samples/sec Loss 11.1033 LearningRate 0.3108 Epoch: 2 Global Step: 10740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:15:03,127-Speed 10509.97 samples/sec Loss 11.0875 LearningRate 0.3111 Epoch: 2 Global Step: 10750 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:15:10,915-Speed 10520.36 samples/sec Loss 11.1098 LearningRate 0.3113 Epoch: 2 Global Step: 10760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:15:18,736-Speed 10485.23 samples/sec Loss 11.0785 LearningRate 0.3116 Epoch: 2 Global Step: 10770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:15:26,532-Speed 10509.12 samples/sec Loss 11.0668 LearningRate 0.3119 Epoch: 2 Global Step: 10780 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:15:34,359-Speed 10471.45 samples/sec Loss 11.0710 LearningRate 0.3122 Epoch: 2 Global Step: 10790 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:15:42,154-Speed 10511.55 samples/sec Loss 11.1413 LearningRate 0.3125 Epoch: 2 Global Step: 10800 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:15:49,983-Speed 10465.69 samples/sec Loss 11.0706 LearningRate 0.3128 Epoch: 2 Global Step: 10810 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:15:57,826-Speed 10444.89 samples/sec Loss 11.1146 LearningRate 0.3131 Epoch: 2 Global Step: 10820 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 
17:16:05,642-Speed 10484.28 samples/sec Loss 11.0667 LearningRate 0.3134 Epoch: 2 Global Step: 10830 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:16:13,446-Speed 10498.64 samples/sec Loss 11.0417 LearningRate 0.3137 Epoch: 2 Global Step: 10840 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:16:21,251-Speed 10505.62 samples/sec Loss 11.0707 LearningRate 0.3139 Epoch: 2 Global Step: 10850 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:16:29,072-Speed 10475.21 samples/sec Loss 11.1314 LearningRate 0.3142 Epoch: 2 Global Step: 10860 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:16:36,857-Speed 10525.34 samples/sec Loss 11.0097 LearningRate 0.3145 Epoch: 2 Global Step: 10870 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:16:44,644-Speed 10526.65 samples/sec Loss 11.1690 LearningRate 0.3148 Epoch: 2 Global Step: 10880 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:16:52,549-Speed 10365.75 samples/sec Loss 11.1114 LearningRate 0.3151 Epoch: 2 Global Step: 10890 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:17:00,371-Speed 10474.94 samples/sec Loss 11.0820 LearningRate 0.3154 Epoch: 2 Global Step: 10900 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:17:08,176-Speed 10497.46 samples/sec Loss 11.0670 LearningRate 0.3157 Epoch: 2 Global Step: 10910 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:17:15,961-Speed 10524.18 samples/sec Loss 11.0627 LearningRate 0.3160 Epoch: 2 Global Step: 10920 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:17:23,754-Speed 10517.21 samples/sec Loss 11.0671 LearningRate 0.3163 Epoch: 2 Global Step: 10930 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:17:31,529-Speed 10537.73 samples/sec Loss 11.0488 LearningRate 0.3166 Epoch: 2 Global Step: 10940 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 17:17:39,354-Speed 
10470.00 samples/sec Loss 11.0023 LearningRate 0.3168 Epoch: 2 Global Step: 10950 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 17:17:47,145-Speed 10516.86 samples/sec Loss 11.0131 LearningRate 0.3171 Epoch: 2 Global Step: 10960 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 17:17:54,941-Speed 10510.68 samples/sec Loss 11.0648 LearningRate 0.3174 Epoch: 2 Global Step: 10970 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 17:18:02,716-Speed 10537.22 samples/sec Loss 11.1550 LearningRate 0.3177 Epoch: 2 Global Step: 10980 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 17:18:10,519-Speed 10500.20 samples/sec Loss 11.1206 LearningRate 0.3180 Epoch: 2 Global Step: 10990 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 17:18:18,320-Speed 10502.62 samples/sec Loss 11.0520 LearningRate 0.3183 Epoch: 2 Global Step: 11000 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 17:18:26,144-Speed 10477.87 samples/sec Loss 11.1006 LearningRate 0.3186 Epoch: 2 Global Step: 11010 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 17:18:33,963-Speed 10477.85 samples/sec Loss 11.0638 LearningRate 0.3189 Epoch: 2 Global Step: 11020 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 17:18:41,750-Speed 10522.63 samples/sec Loss 11.0316 LearningRate 0.3192 Epoch: 2 Global Step: 11030 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 17:18:49,583-Speed 10460.48 samples/sec Loss 11.0401 LearningRate 0.3194 Epoch: 2 Global Step: 11040 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:18:57,382-Speed 10506.17 samples/sec Loss 11.1255 LearningRate 0.3197 Epoch: 2 Global Step: 11050 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:19:05,161-Speed 10531.89 samples/sec Loss 11.1191 LearningRate 0.3200 Epoch: 2 Global Step: 11060 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:19:12,931-Speed 10544.72 samples/sec Loss 
11.0418 LearningRate 0.3203 Epoch: 2 Global Step: 11070 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:19:20,717-Speed 10522.97 samples/sec Loss 11.0843 LearningRate 0.3206 Epoch: 2 Global Step: 11080 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:19:28,552-Speed 10457.90 samples/sec Loss 10.9939 LearningRate 0.3209 Epoch: 2 Global Step: 11090 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:19:36,381-Speed 10466.67 samples/sec Loss 11.1019 LearningRate 0.3212 Epoch: 2 Global Step: 11100 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:19:44,228-Speed 10440.96 samples/sec Loss 11.0629 LearningRate 0.3215 Epoch: 2 Global Step: 11110 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:19:52,004-Speed 10537.19 samples/sec Loss 11.0215 LearningRate 0.3218 Epoch: 2 Global Step: 11120 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:19:59,793-Speed 10517.84 samples/sec Loss 11.0880 LearningRate 0.3220 Epoch: 2 Global Step: 11130 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:20:07,615-Speed 10474.60 samples/sec Loss 11.0518 LearningRate 0.3223 Epoch: 2 Global Step: 11140 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:20:15,457-Speed 10448.42 samples/sec Loss 11.0599 LearningRate 0.3226 Epoch: 2 Global Step: 11150 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:20:23,309-Speed 10433.77 samples/sec Loss 11.0028 LearningRate 0.3229 Epoch: 2 Global Step: 11160 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:20:31,152-Speed 10447.96 samples/sec Loss 11.0711 LearningRate 0.3232 Epoch: 2 Global Step: 11170 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:20:38,992-Speed 10451.93 samples/sec Loss 11.0581 LearningRate 0.3235 Epoch: 2 Global Step: 11180 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:20:46,836-Speed 10446.65 samples/sec Loss 11.0877 
LearningRate 0.3238 Epoch: 2 Global Step: 11190 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:20:54,672-Speed 10456.51 samples/sec Loss 11.0548 LearningRate 0.3241 Epoch: 2 Global Step: 11200 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:21:02,499-Speed 10473.84 samples/sec Loss 11.1774 LearningRate 0.3244 Epoch: 2 Global Step: 11210 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:21:10,340-Speed 10449.92 samples/sec Loss 11.0767 LearningRate 0.3247 Epoch: 2 Global Step: 11220 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:21:18,161-Speed 10478.18 samples/sec Loss 11.0226 LearningRate 0.3249 Epoch: 2 Global Step: 11230 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:21:25,986-Speed 10471.27 samples/sec Loss 11.0503 LearningRate 0.3252 Epoch: 2 Global Step: 11240 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:21:33,834-Speed 10440.01 samples/sec Loss 11.0580 LearningRate 0.3255 Epoch: 2 Global Step: 11250 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:21:41,679-Speed 10444.48 samples/sec Loss 11.1335 LearningRate 0.3258 Epoch: 2 Global Step: 11260 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:21:49,556-Speed 10401.61 samples/sec Loss 11.0033 LearningRate 0.3261 Epoch: 2 Global Step: 11270 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:21:57,417-Speed 10422.92 samples/sec Loss 11.0827 LearningRate 0.3264 Epoch: 2 Global Step: 11280 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:22:05,228-Speed 10489.27 samples/sec Loss 11.0354 LearningRate 0.3267 Epoch: 2 Global Step: 11290 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:22:13,050-Speed 10475.55 samples/sec Loss 11.1449 LearningRate 0.3270 Epoch: 2 Global Step: 11300 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:22:20,860-Speed 10490.81 samples/sec Loss 11.0989 LearningRate 0.3273 
Epoch: 2 Global Step: 11310 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:22:28,722-Speed 10421.13 samples/sec Loss 11.0812 LearningRate 0.3275 Epoch: 2 Global Step: 11320 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:22:36,549-Speed 10468.45 samples/sec Loss 10.9581 LearningRate 0.3278 Epoch: 2 Global Step: 11330 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:22:44,373-Speed 10472.38 samples/sec Loss 10.9381 LearningRate 0.3281 Epoch: 2 Global Step: 11340 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:22:52,218-Speed 10443.97 samples/sec Loss 11.0328 LearningRate 0.3284 Epoch: 2 Global Step: 11350 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:23:00,045-Speed 10469.75 samples/sec Loss 11.0142 LearningRate 0.3287 Epoch: 2 Global Step: 11360 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:23:07,855-Speed 10491.47 samples/sec Loss 11.0356 LearningRate 0.3290 Epoch: 2 Global Step: 11370 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:23:15,693-Speed 10453.99 samples/sec Loss 11.0664 LearningRate 0.3293 Epoch: 2 Global Step: 11380 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:23:23,496-Speed 10501.25 samples/sec Loss 11.0824 LearningRate 0.3296 Epoch: 2 Global Step: 11390 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:23:31,331-Speed 10457.23 samples/sec Loss 11.0445 LearningRate 0.3299 Epoch: 2 Global Step: 11400 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:23:39,144-Speed 10487.98 samples/sec Loss 11.0296 LearningRate 0.3302 Epoch: 2 Global Step: 11410 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:23:46,960-Speed 10483.26 samples/sec Loss 11.0398 LearningRate 0.3304 Epoch: 2 Global Step: 11420 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:23:54,790-Speed 10464.15 samples/sec Loss 11.1006 LearningRate 0.3307 Epoch: 2 Global 
Step: 11430 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:24:02,621-Speed 10463.20 samples/sec Loss 11.1618 LearningRate 0.3310 Epoch: 2 Global Step: 11440 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:24:10,447-Speed 10469.12 samples/sec Loss 11.0243 LearningRate 0.3313 Epoch: 2 Global Step: 11450 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:24:18,291-Speed 10447.17 samples/sec Loss 11.0835 LearningRate 0.3316 Epoch: 2 Global Step: 11460 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:24:26,120-Speed 10464.51 samples/sec Loss 11.1126 LearningRate 0.3319 Epoch: 2 Global Step: 11470 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:24:33,938-Speed 10479.74 samples/sec Loss 11.0890 LearningRate 0.3322 Epoch: 2 Global Step: 11480 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:24:41,756-Speed 10481.70 samples/sec Loss 11.0531 LearningRate 0.3325 Epoch: 2 Global Step: 11490 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:24:49,598-Speed 10448.39 samples/sec Loss 11.0198 LearningRate 0.3328 Epoch: 2 Global Step: 11500 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:24:57,418-Speed 10477.47 samples/sec Loss 11.0632 LearningRate 0.3330 Epoch: 2 Global Step: 11510 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:25:05,253-Speed 10458.55 samples/sec Loss 11.0683 LearningRate 0.3333 Epoch: 2 Global Step: 11520 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:25:13,098-Speed 10444.92 samples/sec Loss 11.0756 LearningRate 0.3336 Epoch: 2 Global Step: 11530 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:25:21,030-Speed 10330.22 samples/sec Loss 11.0931 LearningRate 0.3339 Epoch: 2 Global Step: 11540 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:25:28,853-Speed 10474.87 samples/sec Loss 11.0739 LearningRate 0.3342 Epoch: 2 Global Step: 11550 Fp16 
Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:25:36,678-Speed 10470.45 samples/sec Loss 11.0108 LearningRate 0.3345 Epoch: 2 Global Step: 11560 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:25:44,509-Speed 10462.37 samples/sec Loss 11.1432 LearningRate 0.3348 Epoch: 2 Global Step: 11570 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:25:52,339-Speed 10463.38 samples/sec Loss 11.0157 LearningRate 0.3351 Epoch: 2 Global Step: 11580 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:26:00,174-Speed 10458.22 samples/sec Loss 11.0591 LearningRate 0.3354 Epoch: 2 Global Step: 11590 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:26:08,014-Speed 10449.65 samples/sec Loss 10.9985 LearningRate 0.3356 Epoch: 2 Global Step: 11600 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:26:15,887-Speed 10407.32 samples/sec Loss 10.9622 LearningRate 0.3359 Epoch: 2 Global Step: 11610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:26:23,716-Speed 10464.82 samples/sec Loss 11.0827 LearningRate 0.3362 Epoch: 2 Global Step: 11620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:26:31,529-Speed 10487.31 samples/sec Loss 11.0876 LearningRate 0.3365 Epoch: 2 Global Step: 11630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:26:39,387-Speed 10425.13 samples/sec Loss 11.0972 LearningRate 0.3368 Epoch: 2 Global Step: 11640 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:26:47,240-Speed 10433.44 samples/sec Loss 11.1535 LearningRate 0.3371 Epoch: 2 Global Step: 11650 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:26:55,077-Speed 10453.86 samples/sec Loss 11.3383 LearningRate 0.3374 Epoch: 2 Global Step: 11660 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:27:02,915-Speed 10453.73 samples/sec Loss 11.1599 LearningRate 0.3377 Epoch: 2 Global Step: 11670 Fp16 Grad Scale: 262144 
Required: 20 hours Training: 2022-01-15 17:27:10,751-Speed 10455.48 samples/sec Loss 11.0985 LearningRate 0.3380 Epoch: 2 Global Step: 11680 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:27:18,572-Speed 10475.13 samples/sec Loss 11.0811 LearningRate 0.3383 Epoch: 2 Global Step: 11690 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:27:26,408-Speed 10455.98 samples/sec Loss 11.0538 LearningRate 0.3385 Epoch: 2 Global Step: 11700 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:27:34,243-Speed 10457.17 samples/sec Loss 11.0359 LearningRate 0.3388 Epoch: 2 Global Step: 11710 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:27:42,068-Speed 10471.69 samples/sec Loss 11.0721 LearningRate 0.3391 Epoch: 2 Global Step: 11720 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:27:49,925-Speed 10428.16 samples/sec Loss 11.1705 LearningRate 0.3394 Epoch: 2 Global Step: 11730 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:27:57,762-Speed 10456.29 samples/sec Loss 11.0841 LearningRate 0.3397 Epoch: 2 Global Step: 11740 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:28:05,582-Speed 10477.56 samples/sec Loss 11.0069 LearningRate 0.3400 Epoch: 2 Global Step: 11750 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:28:13,405-Speed 10473.48 samples/sec Loss 11.0484 LearningRate 0.3403 Epoch: 2 Global Step: 11760 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:28:21,296-Speed 10386.91 samples/sec Loss 11.0909 LearningRate 0.3406 Epoch: 2 Global Step: 11770 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:28:29,185-Speed 10386.43 samples/sec Loss 11.0518 LearningRate 0.3409 Epoch: 2 Global Step: 11780 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:28:37,020-Speed 10457.38 samples/sec Loss 11.1126 LearningRate 0.3411 Epoch: 2 Global Step: 11790 Fp16 Grad Scale: 262144 Required: 20 hours 
Training: 2022-01-15 17:28:44,845-Speed 10469.71 samples/sec Loss 11.0786 LearningRate 0.3414 Epoch: 2 Global Step: 11800 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:28:52,710-Speed 10418.35 samples/sec Loss 11.0570 LearningRate 0.3417 Epoch: 2 Global Step: 11810 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:29:00,502-Speed 10515.18 samples/sec Loss 11.0853 LearningRate 0.3420 Epoch: 2 Global Step: 11820 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:29:08,304-Speed 10501.42 samples/sec Loss 11.3346 LearningRate 0.3423 Epoch: 2 Global Step: 11830 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:29:16,180-Speed 10401.57 samples/sec Loss 11.1900 LearningRate 0.3426 Epoch: 2 Global Step: 11840 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:29:23,988-Speed 10493.91 samples/sec Loss 11.0431 LearningRate 0.3429 Epoch: 2 Global Step: 11850 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:29:31,791-Speed 10500.22 samples/sec Loss 11.0203 LearningRate 0.3432 Epoch: 2 Global Step: 11860 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:29:39,611-Speed 10477.11 samples/sec Loss 11.0805 LearningRate 0.3435 Epoch: 2 Global Step: 11870 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:29:47,403-Speed 10515.34 samples/sec Loss 11.0941 LearningRate 0.3437 Epoch: 2 Global Step: 11880 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:29:55,258-Speed 10429.70 samples/sec Loss 11.0368 LearningRate 0.3440 Epoch: 2 Global Step: 11890 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:30:03,078-Speed 10478.89 samples/sec Loss 11.0515 LearningRate 0.3443 Epoch: 2 Global Step: 11900 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:30:10,881-Speed 10499.74 samples/sec Loss 11.0608 LearningRate 0.3446 Epoch: 2 Global Step: 11910 Fp16 Grad Scale: 262144 Required: 20 hours Training: 
2022-01-15 17:30:18,690-Speed 10491.25 samples/sec Loss 11.0354 LearningRate 0.3449 Epoch: 2 Global Step: 11920 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 17:30:26,521-Speed 10463.77 samples/sec Loss 11.0403 LearningRate 0.3452 Epoch: 2 Global Step: 11930 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 17:30:34,329-Speed 10493.02 samples/sec Loss 11.0120 LearningRate 0.3455 Epoch: 2 Global Step: 11940 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 17:30:42,192-Speed 10420.52 samples/sec Loss 11.0770 LearningRate 0.3458 Epoch: 2 Global Step: 11950 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 17:30:50,009-Speed 10480.04 samples/sec Loss 11.1610 LearningRate 0.3461 Epoch: 2 Global Step: 11960 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 17:30:57,866-Speed 10427.51 samples/sec Loss 11.0925 LearningRate 0.3464 Epoch: 2 Global Step: 11970 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 17:31:05,686-Speed 10478.94 samples/sec Loss 11.0927 LearningRate 0.3466 Epoch: 2 Global Step: 11980 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 17:31:13,526-Speed 10451.18 samples/sec Loss 11.1206 LearningRate 0.3469 Epoch: 2 Global Step: 11990 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 17:31:21,331-Speed 10497.42 samples/sec Loss 11.0411 LearningRate 0.3472 Epoch: 2 Global Step: 12000 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 17:31:29,135-Speed 10499.13 samples/sec Loss 11.1667 LearningRate 0.3475 Epoch: 2 Global Step: 12010 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-01-15 17:31:36,944-Speed 10492.84 samples/sec Loss 11.0826 LearningRate 0.3478 Epoch: 2 Global Step: 12020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:31:44,757-Speed 10487.23 samples/sec Loss 11.1820 LearningRate 0.3481 Epoch: 2 Global Step: 12030 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:31:52,610-Speed 
10433.16 samples/sec Loss 11.0456 LearningRate 0.3484 Epoch: 2 Global Step: 12040 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:32:00,477-Speed 10416.73 samples/sec Loss 11.1058 LearningRate 0.3487 Epoch: 2 Global Step: 12050 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:32:08,324-Speed 10440.77 samples/sec Loss 11.0163 LearningRate 0.3490 Epoch: 2 Global Step: 12060 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:32:16,177-Speed 10434.22 samples/sec Loss 11.1148 LearningRate 0.3492 Epoch: 2 Global Step: 12070 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:32:23,987-Speed 10489.90 samples/sec Loss 11.0939 LearningRate 0.3495 Epoch: 2 Global Step: 12080 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:32:31,781-Speed 10511.91 samples/sec Loss 11.0226 LearningRate 0.3498 Epoch: 2 Global Step: 12090 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:32:39,571-Speed 10517.57 samples/sec Loss 11.0764 LearningRate 0.3501 Epoch: 2 Global Step: 12100 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:32:47,387-Speed 10484.48 samples/sec Loss 11.0820 LearningRate 0.3504 Epoch: 2 Global Step: 12110 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:32:55,215-Speed 10465.48 samples/sec Loss 11.0292 LearningRate 0.3507 Epoch: 2 Global Step: 12120 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:33:03,151-Speed 10324.46 samples/sec Loss 11.0863 LearningRate 0.3510 Epoch: 2 Global Step: 12130 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:33:10,980-Speed 10465.74 samples/sec Loss 11.1281 LearningRate 0.3513 Epoch: 2 Global Step: 12140 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:33:18,755-Speed 10537.90 samples/sec Loss 11.0614 LearningRate 0.3516 Epoch: 2 Global Step: 12150 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:33:26,545-Speed 10518.28 
samples/sec Loss 11.1026 LearningRate 0.3519 Epoch: 2 Global Step: 12160 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:33:34,324-Speed 10532.23 samples/sec Loss 11.1431 LearningRate 0.3521 Epoch: 2 Global Step: 12170 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:33:42,141-Speed 10482.97 samples/sec Loss 11.0713 LearningRate 0.3524 Epoch: 2 Global Step: 12180 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:33:49,936-Speed 10510.92 samples/sec Loss 11.0866 LearningRate 0.3527 Epoch: 2 Global Step: 12190 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:33:57,735-Speed 10506.05 samples/sec Loss 11.0217 LearningRate 0.3530 Epoch: 2 Global Step: 12200 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:34:05,548-Speed 10485.74 samples/sec Loss 11.0795 LearningRate 0.3533 Epoch: 2 Global Step: 12210 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:34:13,372-Speed 10472.37 samples/sec Loss 11.1051 LearningRate 0.3536 Epoch: 2 Global Step: 12220 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:34:21,200-Speed 10467.81 samples/sec Loss 11.0990 LearningRate 0.3539 Epoch: 2 Global Step: 12230 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:34:29,002-Speed 10501.44 samples/sec Loss 11.1782 LearningRate 0.3542 Epoch: 2 Global Step: 12240 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:34:36,782-Speed 10535.99 samples/sec Loss 11.1739 LearningRate 0.3545 Epoch: 2 Global Step: 12250 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:34:44,578-Speed 10510.34 samples/sec Loss 11.1488 LearningRate 0.3547 Epoch: 2 Global Step: 12260 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:34:52,391-Speed 10486.78 samples/sec Loss 11.0959 LearningRate 0.3550 Epoch: 2 Global Step: 12270 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:35:00,210-Speed 10478.61 samples/sec Loss 
11.1439 LearningRate 0.3553 Epoch: 2 Global Step: 12280 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:35:08,025-Speed 10485.05 samples/sec Loss 11.0445 LearningRate 0.3556 Epoch: 2 Global Step: 12290 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:35:15,828-Speed 10499.87 samples/sec Loss 11.0640 LearningRate 0.3559 Epoch: 2 Global Step: 12300 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:35:23,664-Speed 10455.66 samples/sec Loss 11.0357 LearningRate 0.3562 Epoch: 2 Global Step: 12310 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:35:31,503-Speed 10453.42 samples/sec Loss 11.0688 LearningRate 0.3565 Epoch: 2 Global Step: 12320 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:35:39,297-Speed 10511.09 samples/sec Loss 11.0493 LearningRate 0.3568 Epoch: 2 Global Step: 12330 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:35:47,107-Speed 10491.94 samples/sec Loss 11.0447 LearningRate 0.3571 Epoch: 2 Global Step: 12340 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:35:54,919-Speed 10488.25 samples/sec Loss 11.2155 LearningRate 0.3573 Epoch: 2 Global Step: 12350 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:36:02,732-Speed 10487.12 samples/sec Loss 11.0893 LearningRate 0.3576 Epoch: 2 Global Step: 12360 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:36:10,527-Speed 10511.17 samples/sec Loss 11.1274 LearningRate 0.3579 Epoch: 2 Global Step: 12370 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:36:18,326-Speed 10505.59 samples/sec Loss 11.0582 LearningRate 0.3582 Epoch: 2 Global Step: 12380 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:36:26,162-Speed 10460.29 samples/sec Loss 11.1313 LearningRate 0.3585 Epoch: 2 Global Step: 12390 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:36:33,959-Speed 10507.11 samples/sec Loss 11.2474 
LearningRate 0.3588 Epoch: 2 Global Step: 12400 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:36:41,763-Speed 10499.64 samples/sec Loss 11.1819 LearningRate 0.3591 Epoch: 2 Global Step: 12410 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:36:49,554-Speed 10517.39 samples/sec Loss 11.1066 LearningRate 0.3594 Epoch: 2 Global Step: 12420 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:36:57,360-Speed 10494.95 samples/sec Loss 11.0853 LearningRate 0.3597 Epoch: 2 Global Step: 12430 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:37:05,148-Speed 10519.82 samples/sec Loss 11.0858 LearningRate 0.3600 Epoch: 2 Global Step: 12440 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:37:12,942-Speed 10513.01 samples/sec Loss 11.0876 LearningRate 0.3602 Epoch: 2 Global Step: 12450 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:37:20,738-Speed 10509.61 samples/sec Loss 11.0547 LearningRate 0.3605 Epoch: 2 Global Step: 12460 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:37:28,591-Speed 10434.42 samples/sec Loss 11.1442 LearningRate 0.3608 Epoch: 2 Global Step: 12470 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:37:36,437-Speed 10442.24 samples/sec Loss 11.0813 LearningRate 0.3611 Epoch: 2 Global Step: 12480 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:37:44,259-Speed 10475.11 samples/sec Loss 10.9981 LearningRate 0.3614 Epoch: 2 Global Step: 12490 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:37:52,102-Speed 10446.70 samples/sec Loss 11.1024 LearningRate 0.3617 Epoch: 2 Global Step: 12500 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:37:59,963-Speed 10422.54 samples/sec Loss 11.0372 LearningRate 0.3620 Epoch: 2 Global Step: 12510 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:38:07,759-Speed 10509.78 samples/sec Loss 11.0784 LearningRate 0.3623 
Epoch: 2 Global Step: 12520 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:38:15,534-Speed 10537.75 samples/sec Loss 11.0505 LearningRate 0.3626 Epoch: 2 Global Step: 12530 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:38:23,324-Speed 10517.80 samples/sec Loss 11.1460 LearningRate 0.3628 Epoch: 2 Global Step: 12540 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:38:31,132-Speed 10494.21 samples/sec Loss 11.1691 LearningRate 0.3631 Epoch: 2 Global Step: 12550 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:38:38,930-Speed 10505.62 samples/sec Loss 11.0876 LearningRate 0.3634 Epoch: 2 Global Step: 12560 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:38:46,723-Speed 10513.86 samples/sec Loss 11.0933 LearningRate 0.3637 Epoch: 2 Global Step: 12570 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:38:54,559-Speed 10457.12 samples/sec Loss 11.1167 LearningRate 0.3640 Epoch: 2 Global Step: 12580 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:39:02,377-Speed 10480.15 samples/sec Loss 11.1933 LearningRate 0.3643 Epoch: 2 Global Step: 12590 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:39:10,224-Speed 10441.69 samples/sec Loss 11.1063 LearningRate 0.3646 Epoch: 2 Global Step: 12600 Fp16 Grad Scale: 524288 Required: 20 hours Training: 2022-01-15 17:39:18,014-Speed 10517.74 samples/sec Loss 11.1771 LearningRate 0.3649 Epoch: 2 Global Step: 12610 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:39:25,849-Speed 10456.81 samples/sec Loss 11.0595 LearningRate 0.3652 Epoch: 2 Global Step: 12620 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:39:33,649-Speed 10504.02 samples/sec Loss 11.0798 LearningRate 0.3655 Epoch: 2 Global Step: 12630 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:39:41,451-Speed 10502.45 samples/sec Loss 11.0977 LearningRate 0.3657 Epoch: 2 Global 
Step: 12640 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:39:49,258-Speed 10494.22 samples/sec Loss 11.1585 LearningRate 0.3660 Epoch: 2 Global Step: 12650 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:39:57,058-Speed 10503.73 samples/sec Loss 11.0931 LearningRate 0.3663 Epoch: 2 Global Step: 12660 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:40:04,877-Speed 10482.24 samples/sec Loss 11.2376 LearningRate 0.3666 Epoch: 2 Global Step: 12670 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:40:12,687-Speed 10493.46 samples/sec Loss 11.1691 LearningRate 0.3669 Epoch: 2 Global Step: 12680 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:40:20,478-Speed 10515.56 samples/sec Loss 11.2703 LearningRate 0.3672 Epoch: 2 Global Step: 12690 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:40:28,304-Speed 10469.92 samples/sec Loss 11.2477 LearningRate 0.3675 Epoch: 2 Global Step: 12700 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:40:36,095-Speed 10516.55 samples/sec Loss 11.1244 LearningRate 0.3678 Epoch: 2 Global Step: 12710 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:40:43,898-Speed 10500.34 samples/sec Loss 11.1160 LearningRate 0.3681 Epoch: 2 Global Step: 12720 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:40:51,702-Speed 10499.02 samples/sec Loss 11.0949 LearningRate 0.3683 Epoch: 2 Global Step: 12730 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:40:59,509-Speed 10494.93 samples/sec Loss 11.1304 LearningRate 0.3686 Epoch: 2 Global Step: 12740 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:41:07,333-Speed 10471.49 samples/sec Loss 11.1129 LearningRate 0.3689 Epoch: 2 Global Step: 12750 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:41:15,109-Speed 10537.20 samples/sec Loss 11.0514 LearningRate 0.3692 Epoch: 2 Global Step: 12760 Fp16 
Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:41:22,902-Speed 10514.41 samples/sec Loss 11.2199 LearningRate 0.3695 Epoch: 2 Global Step: 12770 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:41:30,724-Speed 10474.57 samples/sec Loss 11.1238 LearningRate 0.3698 Epoch: 2 Global Step: 12780 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:41:38,626-Speed 10370.15 samples/sec Loss 11.0739 LearningRate 0.3701 Epoch: 2 Global Step: 12790 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:41:46,429-Speed 10499.77 samples/sec Loss 11.1500 LearningRate 0.3704 Epoch: 2 Global Step: 12800 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:41:54,228-Speed 10506.31 samples/sec Loss 11.1013 LearningRate 0.3707 Epoch: 2 Global Step: 12810 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:42:02,037-Speed 10493.65 samples/sec Loss 11.1497 LearningRate 0.3709 Epoch: 2 Global Step: 12820 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:42:09,830-Speed 10513.46 samples/sec Loss 11.0987 LearningRate 0.3712 Epoch: 2 Global Step: 12830 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:42:17,628-Speed 10506.90 samples/sec Loss 11.1568 LearningRate 0.3715 Epoch: 2 Global Step: 12840 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:42:25,418-Speed 10518.50 samples/sec Loss 11.1918 LearningRate 0.3718 Epoch: 2 Global Step: 12850 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:42:33,195-Speed 10534.92 samples/sec Loss 11.1326 LearningRate 0.3721 Epoch: 2 Global Step: 12860 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:42:41,026-Speed 10462.41 samples/sec Loss 11.0957 LearningRate 0.3724 Epoch: 2 Global Step: 12870 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:42:48,813-Speed 10522.73 samples/sec Loss 11.1358 LearningRate 0.3727 Epoch: 2 Global Step: 12880 Fp16 Grad Scale: 131072 
Required: 20 hours Training: 2022-01-15 17:42:56,633-Speed 10477.17 samples/sec Loss 11.1380 LearningRate 0.3730 Epoch: 2 Global Step: 12890 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:43:04,442-Speed 10496.88 samples/sec Loss 11.1679 LearningRate 0.3733 Epoch: 2 Global Step: 12900 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:43:12,225-Speed 10532.67 samples/sec Loss 11.1577 LearningRate 0.3736 Epoch: 2 Global Step: 12910 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:43:20,013-Speed 10522.56 samples/sec Loss 11.1871 LearningRate 0.3738 Epoch: 2 Global Step: 12920 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:43:27,846-Speed 10460.07 samples/sec Loss 11.2618 LearningRate 0.3741 Epoch: 2 Global Step: 12930 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:43:35,649-Speed 10499.71 samples/sec Loss 11.1821 LearningRate 0.3744 Epoch: 2 Global Step: 12940 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:43:43,438-Speed 10520.24 samples/sec Loss 11.2407 LearningRate 0.3747 Epoch: 2 Global Step: 12950 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:43:51,232-Speed 10512.83 samples/sec Loss 11.1870 LearningRate 0.3750 Epoch: 2 Global Step: 12960 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:43:59,053-Speed 10475.42 samples/sec Loss 11.1993 LearningRate 0.3753 Epoch: 2 Global Step: 12970 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:44:06,852-Speed 10505.59 samples/sec Loss 11.1663 LearningRate 0.3756 Epoch: 2 Global Step: 12980 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:44:14,647-Speed 10511.24 samples/sec Loss 11.1156 LearningRate 0.3759 Epoch: 2 Global Step: 12990 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:44:22,503-Speed 10430.64 samples/sec Loss 11.1230 LearningRate 0.3762 Epoch: 2 Global Step: 13000 Fp16 Grad Scale: 262144 Required: 20 hours 
Training: 2022-01-15 17:44:30,300-Speed 10509.38 samples/sec Loss 11.1376 LearningRate 0.3764 Epoch: 2 Global Step: 13010 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:44:38,098-Speed 10507.86 samples/sec Loss 11.1273 LearningRate 0.3767 Epoch: 2 Global Step: 13020 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:44:45,902-Speed 10497.75 samples/sec Loss 11.1065 LearningRate 0.3770 Epoch: 2 Global Step: 13030 Fp16 Grad Scale: 524288 Required: 20 hours Training: 2022-01-15 17:44:53,665-Speed 10554.17 samples/sec Loss 11.1460 LearningRate 0.3773 Epoch: 2 Global Step: 13040 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:45:01,499-Speed 10459.34 samples/sec Loss 11.2171 LearningRate 0.3776 Epoch: 2 Global Step: 13050 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:45:09,327-Speed 10466.55 samples/sec Loss 11.2399 LearningRate 0.3779 Epoch: 2 Global Step: 13060 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:45:17,174-Speed 10442.37 samples/sec Loss 11.1518 LearningRate 0.3782 Epoch: 2 Global Step: 13070 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:45:24,949-Speed 10539.38 samples/sec Loss 11.2482 LearningRate 0.3785 Epoch: 2 Global Step: 13080 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:45:32,753-Speed 10498.97 samples/sec Loss 11.2064 LearningRate 0.3788 Epoch: 2 Global Step: 13090 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:45:40,578-Speed 10472.24 samples/sec Loss 11.1599 LearningRate 0.3791 Epoch: 2 Global Step: 13100 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:45:48,377-Speed 10505.45 samples/sec Loss 11.0665 LearningRate 0.3793 Epoch: 2 Global Step: 13110 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:45:56,182-Speed 10499.47 samples/sec Loss 11.1183 LearningRate 0.3796 Epoch: 2 Global Step: 13120 Fp16 Grad Scale: 262144 Required: 20 hours Training: 
2022-01-15 17:46:04,043-Speed 10423.69 samples/sec Loss 11.0925 LearningRate 0.3799 Epoch: 2 Global Step: 13130 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:46:11,850-Speed 10494.90 samples/sec Loss 11.1393 LearningRate 0.3802 Epoch: 2 Global Step: 13140 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:46:19,654-Speed 10499.82 samples/sec Loss 11.2383 LearningRate 0.3805 Epoch: 2 Global Step: 13150 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:46:27,493-Speed 10457.60 samples/sec Loss 11.1904 LearningRate 0.3808 Epoch: 2 Global Step: 13160 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:46:35,343-Speed 10442.13 samples/sec Loss 11.1475 LearningRate 0.3811 Epoch: 2 Global Step: 13170 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:46:43,168-Speed 10478.05 samples/sec Loss 11.4029 LearningRate 0.3814 Epoch: 2 Global Step: 13180 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:46:50,992-Speed 10472.61 samples/sec Loss 11.2161 LearningRate 0.3817 Epoch: 2 Global Step: 13190 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:46:58,833-Speed 10451.87 samples/sec Loss 11.3283 LearningRate 0.3819 Epoch: 2 Global Step: 13200 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:47:06,637-Speed 10500.34 samples/sec Loss 11.1342 LearningRate 0.3822 Epoch: 2 Global Step: 13210 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:47:14,452-Speed 10485.19 samples/sec Loss 11.1879 LearningRate 0.3825 Epoch: 2 Global Step: 13220 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:47:22,249-Speed 10507.48 samples/sec Loss 11.1589 LearningRate 0.3828 Epoch: 2 Global Step: 13230 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:47:30,067-Speed 10482.24 samples/sec Loss 11.1753 LearningRate 0.3831 Epoch: 2 Global Step: 13240 Fp16 Grad Scale: 524288 Required: 20 hours Training: 2022-01-15 
17:47:37,874-Speed 10495.49 samples/sec Loss 11.1835 LearningRate 0.3834 Epoch: 2 Global Step: 13250 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:47:45,699-Speed 10471.58 samples/sec Loss 11.2122 LearningRate 0.3837 Epoch: 2 Global Step: 13260 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:47:53,499-Speed 10504.10 samples/sec Loss 11.1867 LearningRate 0.3840 Epoch: 2 Global Step: 13270 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:48:01,343-Speed 10446.26 samples/sec Loss 11.1795 LearningRate 0.3843 Epoch: 2 Global Step: 13280 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:48:09,145-Speed 10502.71 samples/sec Loss 11.1222 LearningRate 0.3845 Epoch: 2 Global Step: 13290 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:48:16,951-Speed 10495.66 samples/sec Loss 11.2074 LearningRate 0.3848 Epoch: 2 Global Step: 13300 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:48:24,786-Speed 10458.50 samples/sec Loss 11.2291 LearningRate 0.3851 Epoch: 2 Global Step: 13310 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:48:32,620-Speed 10459.44 samples/sec Loss 11.2779 LearningRate 0.3854 Epoch: 2 Global Step: 13320 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:48:40,421-Speed 10502.22 samples/sec Loss 11.2377 LearningRate 0.3857 Epoch: 2 Global Step: 13330 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:48:48,219-Speed 10508.20 samples/sec Loss 11.3052 LearningRate 0.3860 Epoch: 2 Global Step: 13340 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:48:56,019-Speed 10505.04 samples/sec Loss 11.2529 LearningRate 0.3863 Epoch: 2 Global Step: 13350 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:49:03,805-Speed 10523.17 samples/sec Loss 11.2933 LearningRate 0.3866 Epoch: 2 Global Step: 13360 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:49:11,633-Speed 
10467.25 samples/sec Loss 11.1773 LearningRate 0.3869 Epoch: 2 Global Step: 13370 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:49:19,437-Speed 10498.81 samples/sec Loss 11.1668 LearningRate 0.3872 Epoch: 2 Global Step: 13380 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:49:27,240-Speed 10501.21 samples/sec Loss 11.1327 LearningRate 0.3874 Epoch: 2 Global Step: 13390 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:49:35,058-Speed 10480.38 samples/sec Loss 11.1186 LearningRate 0.3877 Epoch: 2 Global Step: 13400 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:49:42,857-Speed 10510.04 samples/sec Loss 11.1404 LearningRate 0.3880 Epoch: 2 Global Step: 13410 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:49:50,680-Speed 10473.31 samples/sec Loss 11.1363 LearningRate 0.3883 Epoch: 2 Global Step: 13420 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:49:58,467-Speed 10520.48 samples/sec Loss 11.1994 LearningRate 0.3886 Epoch: 2 Global Step: 13430 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:50:06,268-Speed 10507.15 samples/sec Loss 11.2238 LearningRate 0.3889 Epoch: 2 Global Step: 13440 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:50:14,052-Speed 10525.95 samples/sec Loss 11.2633 LearningRate 0.3892 Epoch: 2 Global Step: 13450 Fp16 Grad Scale: 524288 Required: 20 hours Training: 2022-01-15 17:50:21,832-Speed 10531.26 samples/sec Loss 11.1666 LearningRate 0.3895 Epoch: 2 Global Step: 13460 Fp16 Grad Scale: 524288 Required: 20 hours Training: 2022-01-15 17:50:29,662-Speed 10464.51 samples/sec Loss 11.1303 LearningRate 0.3898 Epoch: 2 Global Step: 13470 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:50:37,462-Speed 10503.83 samples/sec Loss 11.2371 LearningRate 0.3900 Epoch: 2 Global Step: 13480 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:50:45,266-Speed 10499.84 
samples/sec Loss 11.2361 LearningRate 0.3903 Epoch: 2 Global Step: 13490 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:50:53,071-Speed 10497.84 samples/sec Loss 11.3397 LearningRate 0.3906 Epoch: 2 Global Step: 13500 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:51:00,864-Speed 10515.32 samples/sec Loss 11.2244 LearningRate 0.3909 Epoch: 2 Global Step: 13510 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:51:08,637-Speed 10540.52 samples/sec Loss 11.1677 LearningRate 0.3912 Epoch: 2 Global Step: 13520 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:51:16,407-Speed 10545.55 samples/sec Loss 11.2854 LearningRate 0.3915 Epoch: 2 Global Step: 13530 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:51:24,196-Speed 10519.57 samples/sec Loss 11.2405 LearningRate 0.3918 Epoch: 2 Global Step: 13540 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:51:31,980-Speed 10526.00 samples/sec Loss 11.2573 LearningRate 0.3921 Epoch: 2 Global Step: 13550 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:51:39,782-Speed 10501.30 samples/sec Loss 11.2180 LearningRate 0.3924 Epoch: 2 Global Step: 13560 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:51:47,585-Speed 10499.64 samples/sec Loss 11.2469 LearningRate 0.3927 Epoch: 2 Global Step: 13570 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:51:55,380-Speed 10510.42 samples/sec Loss 11.1716 LearningRate 0.3929 Epoch: 2 Global Step: 13580 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:52:03,170-Speed 10518.55 samples/sec Loss 11.0682 LearningRate 0.3932 Epoch: 2 Global Step: 13590 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:52:10,969-Speed 10504.93 samples/sec Loss 11.2394 LearningRate 0.3935 Epoch: 2 Global Step: 13600 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:52:18,753-Speed 10525.54 samples/sec Loss 
11.2716 LearningRate 0.3938 Epoch: 2 Global Step: 13610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:52:26,550-Speed 10508.34 samples/sec Loss 11.2205 LearningRate 0.3941 Epoch: 2 Global Step: 13620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:52:34,339-Speed 10517.61 samples/sec Loss 11.3248 LearningRate 0.3944 Epoch: 2 Global Step: 13630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:52:42,111-Speed 10542.19 samples/sec Loss 11.2070 LearningRate 0.3947 Epoch: 2 Global Step: 13640 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:52:49,936-Speed 10470.49 samples/sec Loss 11.2795 LearningRate 0.3950 Epoch: 2 Global Step: 13650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:52:57,727-Speed 10515.98 samples/sec Loss 11.1439 LearningRate 0.3953 Epoch: 2 Global Step: 13660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:53:05,553-Speed 10469.56 samples/sec Loss 11.2655 LearningRate 0.3955 Epoch: 2 Global Step: 13670 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:53:13,372-Speed 10478.07 samples/sec Loss 12.0651 LearningRate 0.3958 Epoch: 2 Global Step: 13680 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:53:21,165-Speed 10514.03 samples/sec Loss 13.2573 LearningRate 0.3961 Epoch: 2 Global Step: 13690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:53:28,948-Speed 10528.49 samples/sec Loss 12.5850 LearningRate 0.3964 Epoch: 2 Global Step: 13700 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:53:36,739-Speed 10516.53 samples/sec Loss 11.9859 LearningRate 0.3967 Epoch: 2 Global Step: 13710 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:53:44,517-Speed 10534.98 samples/sec Loss 11.4907 LearningRate 0.3970 Epoch: 2 Global Step: 13720 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:53:52,333-Speed 10484.74 samples/sec Loss 11.4102 
LearningRate 0.3973 Epoch: 2 Global Step: 13730 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:54:00,150-Speed 10481.33 samples/sec Loss 11.2825 LearningRate 0.3976 Epoch: 2 Global Step: 13740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:54:07,955-Speed 10499.13 samples/sec Loss 11.2098 LearningRate 0.3979 Epoch: 2 Global Step: 13750 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:54:15,749-Speed 10512.03 samples/sec Loss 11.1532 LearningRate 0.3981 Epoch: 2 Global Step: 13760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:54:23,549-Speed 10507.08 samples/sec Loss 11.0963 LearningRate 0.3984 Epoch: 2 Global Step: 13770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:54:31,380-Speed 10464.98 samples/sec Loss 11.2152 LearningRate 0.3987 Epoch: 2 Global Step: 13780 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:54:39,189-Speed 10491.53 samples/sec Loss 11.1040 LearningRate 0.3990 Epoch: 2 Global Step: 13790 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:54:47,012-Speed 10473.06 samples/sec Loss 11.1940 LearningRate 0.3993 Epoch: 2 Global Step: 13800 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:54:54,788-Speed 10537.76 samples/sec Loss 11.1627 LearningRate 0.3996 Epoch: 2 Global Step: 13810 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:55:02,588-Speed 10511.17 samples/sec Loss 11.1589 LearningRate 0.3999 Epoch: 2 Global Step: 13820 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:55:10,386-Speed 10505.49 samples/sec Loss 11.2428 LearningRate 0.4002 Epoch: 2 Global Step: 13830 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:55:18,207-Speed 10476.11 samples/sec Loss 11.1386 LearningRate 0.4005 Epoch: 2 Global Step: 13840 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:55:26,004-Speed 10511.40 samples/sec Loss 11.3230 LearningRate 0.4008 
Epoch: 2 Global Step: 13850 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:55:33,788-Speed 10525.71 samples/sec Loss 11.2764 LearningRate 0.4010 Epoch: 2 Global Step: 13860 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:55:41,575-Speed 10521.86 samples/sec Loss 11.1915 LearningRate 0.4013 Epoch: 2 Global Step: 13870 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:55:49,360-Speed 10523.11 samples/sec Loss 11.2416 LearningRate 0.4016 Epoch: 2 Global Step: 13880 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:55:57,130-Speed 10545.66 samples/sec Loss 11.1974 LearningRate 0.4019 Epoch: 2 Global Step: 13890 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:56:04,917-Speed 10522.04 samples/sec Loss 11.3124 LearningRate 0.4022 Epoch: 2 Global Step: 13900 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:56:12,703-Speed 10524.13 samples/sec Loss 11.2227 LearningRate 0.4025 Epoch: 2 Global Step: 13910 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:56:20,472-Speed 10546.24 samples/sec Loss 11.3028 LearningRate 0.4028 Epoch: 2 Global Step: 13920 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:56:28,257-Speed 10524.19 samples/sec Loss 11.2438 LearningRate 0.4031 Epoch: 2 Global Step: 13930 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:56:36,049-Speed 10515.32 samples/sec Loss 11.3071 LearningRate 0.4034 Epoch: 2 Global Step: 13940 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:56:43,900-Speed 10437.58 samples/sec Loss 11.2348 LearningRate 0.4036 Epoch: 2 Global Step: 13950 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:56:51,678-Speed 10533.51 samples/sec Loss 11.3294 LearningRate 0.4039 Epoch: 2 Global Step: 13960 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:56:59,466-Speed 10520.02 samples/sec Loss 11.2359 LearningRate 0.4042 Epoch: 2 Global 
Step: 13970 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:57:07,271-Speed 10498.22 samples/sec Loss 11.2516 LearningRate 0.4045 Epoch: 2 Global Step: 13980 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:57:15,043-Speed 10541.72 samples/sec Loss 11.2812 LearningRate 0.4048 Epoch: 2 Global Step: 13990 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:57:22,854-Speed 10488.43 samples/sec Loss 11.3345 LearningRate 0.4051 Epoch: 2 Global Step: 14000 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:57:30,672-Speed 10481.51 samples/sec Loss 11.3303 LearningRate 0.4054 Epoch: 2 Global Step: 14010 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:57:38,454-Speed 10528.43 samples/sec Loss 11.2954 LearningRate 0.4057 Epoch: 2 Global Step: 14020 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:57:46,274-Speed 10478.25 samples/sec Loss 11.2596 LearningRate 0.4060 Epoch: 2 Global Step: 14030 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:57:54,070-Speed 10509.50 samples/sec Loss 11.3127 LearningRate 0.4062 Epoch: 2 Global Step: 14040 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:58:01,843-Speed 10541.46 samples/sec Loss 11.4430 LearningRate 0.4065 Epoch: 2 Global Step: 14050 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:58:09,613-Speed 10545.68 samples/sec Loss 11.2597 LearningRate 0.4068 Epoch: 2 Global Step: 14060 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:58:17,407-Speed 10511.90 samples/sec Loss 11.2783 LearningRate 0.4071 Epoch: 2 Global Step: 14070 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:58:25,191-Speed 10525.26 samples/sec Loss 11.3027 LearningRate 0.4074 Epoch: 2 Global Step: 14080 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:58:33,004-Speed 10489.06 samples/sec Loss 11.2588 LearningRate 0.4077 Epoch: 2 Global Step: 14090 Fp16 
Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:58:40,827-Speed 10474.69 samples/sec Loss 11.2712 LearningRate 0.4080 Epoch: 2 Global Step: 14100 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:58:48,684-Speed 10427.15 samples/sec Loss 11.3011 LearningRate 0.4083 Epoch: 2 Global Step: 14110 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:58:56,492-Speed 10494.32 samples/sec Loss 11.3248 LearningRate 0.4086 Epoch: 2 Global Step: 14120 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:59:04,293-Speed 10509.08 samples/sec Loss 11.2896 LearningRate 0.4089 Epoch: 2 Global Step: 14130 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 17:59:12,095-Speed 10501.75 samples/sec Loss 11.2286 LearningRate 0.4091 Epoch: 2 Global Step: 14140 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:59:19,914-Speed 10477.72 samples/sec Loss 11.3222 LearningRate 0.4094 Epoch: 2 Global Step: 14150 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:59:27,705-Speed 10517.53 samples/sec Loss 11.3170 LearningRate 0.4097 Epoch: 2 Global Step: 14160 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:59:35,513-Speed 10493.98 samples/sec Loss 11.2199 LearningRate 0.4100 Epoch: 2 Global Step: 14170 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:59:43,332-Speed 10479.38 samples/sec Loss 11.3206 LearningRate 0.4103 Epoch: 2 Global Step: 14180 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:59:51,132-Speed 10507.08 samples/sec Loss 11.3088 LearningRate 0.4106 Epoch: 2 Global Step: 14190 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 17:59:58,948-Speed 10483.24 samples/sec Loss 11.3914 LearningRate 0.4109 Epoch: 2 Global Step: 14200 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:00:06,754-Speed 10497.52 samples/sec Loss 11.3178 LearningRate 0.4112 Epoch: 2 Global Step: 14210 Fp16 Grad Scale: 262144 
Required: 20 hours Training: 2022-01-15 18:00:14,575-Speed 10476.54 samples/sec Loss 11.2491 LearningRate 0.4115 Epoch: 2 Global Step: 14220 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:00:22,392-Speed 10481.58 samples/sec Loss 11.2652 LearningRate 0.4117 Epoch: 2 Global Step: 14230 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:00:30,204-Speed 10488.65 samples/sec Loss 11.3220 LearningRate 0.4120 Epoch: 2 Global Step: 14240 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:00:38,037-Speed 10459.60 samples/sec Loss 11.3701 LearningRate 0.4123 Epoch: 2 Global Step: 14250 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:00:45,866-Speed 10465.45 samples/sec Loss 11.2871 LearningRate 0.4126 Epoch: 2 Global Step: 14260 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:00:53,712-Speed 10444.75 samples/sec Loss 11.3335 LearningRate 0.4129 Epoch: 2 Global Step: 14270 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:01:01,512-Speed 10503.64 samples/sec Loss 11.2869 LearningRate 0.4132 Epoch: 2 Global Step: 14280 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:01:09,308-Speed 10511.03 samples/sec Loss 11.2795 LearningRate 0.4135 Epoch: 2 Global Step: 14290 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:01:17,109-Speed 10507.38 samples/sec Loss 11.3423 LearningRate 0.4138 Epoch: 2 Global Step: 14300 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:01:24,917-Speed 10493.08 samples/sec Loss 11.3457 LearningRate 0.4141 Epoch: 2 Global Step: 14310 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:01:32,713-Speed 10509.03 samples/sec Loss 11.3680 LearningRate 0.4144 Epoch: 2 Global Step: 14320 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:01:40,500-Speed 10526.76 samples/sec Loss 11.3425 LearningRate 0.4146 Epoch: 2 Global Step: 14330 Fp16 Grad Scale: 262144 Required: 20 hours 
Training: 2022-01-15 18:01:48,308-Speed 10496.53 samples/sec Loss 11.3496 LearningRate 0.4149 Epoch: 2 Global Step: 14340 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:01:56,081-Speed 10539.93 samples/sec Loss 11.2907 LearningRate 0.4152 Epoch: 2 Global Step: 14350 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:02:03,927-Speed 10448.04 samples/sec Loss 11.3107 LearningRate 0.4155 Epoch: 2 Global Step: 14360 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:02:11,718-Speed 10517.44 samples/sec Loss 11.2975 LearningRate 0.4158 Epoch: 2 Global Step: 14370 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:02:19,494-Speed 10536.13 samples/sec Loss 11.2613 LearningRate 0.4161 Epoch: 2 Global Step: 14380 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:02:27,284-Speed 10518.20 samples/sec Loss 11.3151 LearningRate 0.4164 Epoch: 2 Global Step: 14390 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:02:35,080-Speed 10509.28 samples/sec Loss 11.3425 LearningRate 0.4167 Epoch: 2 Global Step: 14400 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:02:42,911-Speed 10462.76 samples/sec Loss 11.5044 LearningRate 0.4170 Epoch: 2 Global Step: 14410 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:02:50,689-Speed 10539.67 samples/sec Loss 11.4723 LearningRate 0.4172 Epoch: 2 Global Step: 14420 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:02:58,488-Speed 10506.12 samples/sec Loss 11.3887 LearningRate 0.4175 Epoch: 2 Global Step: 14430 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:03:06,270-Speed 10528.68 samples/sec Loss 11.3700 LearningRate 0.4178 Epoch: 2 Global Step: 14440 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 18:03:14,092-Speed 10474.83 samples/sec Loss 11.3231 LearningRate 0.4181 Epoch: 2 Global Step: 14450 Fp16 Grad Scale: 131072 Required: 20 hours Training: 
2022-01-15 18:03:21,980-Speed 10388.79 samples/sec Loss 11.2703 LearningRate 0.4184 Epoch: 2 Global Step: 14460 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 18:03:29,783-Speed 10500.35 samples/sec Loss 11.4473 LearningRate 0.4187 Epoch: 2 Global Step: 14470 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 18:03:37,600-Speed 10480.49 samples/sec Loss 11.3316 LearningRate 0.4190 Epoch: 2 Global Step: 14480 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 18:03:45,384-Speed 10526.38 samples/sec Loss 11.3035 LearningRate 0.4193 Epoch: 2 Global Step: 14490 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 18:03:53,174-Speed 10517.69 samples/sec Loss 11.2937 LearningRate 0.4196 Epoch: 2 Global Step: 14500 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 18:04:00,963-Speed 10518.81 samples/sec Loss 11.3627 LearningRate 0.4198 Epoch: 2 Global Step: 14510 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 18:04:08,756-Speed 10514.39 samples/sec Loss 11.4959 LearningRate 0.4201 Epoch: 2 Global Step: 14520 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 18:04:16,609-Speed 10433.26 samples/sec Loss 11.4147 LearningRate 0.4204 Epoch: 2 Global Step: 14530 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 18:04:24,446-Speed 10455.22 samples/sec Loss 11.3157 LearningRate 0.4207 Epoch: 2 Global Step: 14540 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:04:32,263-Speed 10482.90 samples/sec Loss 11.3228 LearningRate 0.4210 Epoch: 2 Global Step: 14550 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:04:40,100-Speed 10455.24 samples/sec Loss 11.2717 LearningRate 0.4213 Epoch: 2 Global Step: 14560 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:04:47,917-Speed 10480.87 samples/sec Loss 11.4113 LearningRate 0.4216 Epoch: 2 Global Step: 14570 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 
18:04:55,751-Speed 10459.43 samples/sec Loss 11.2948 LearningRate 0.4219 Epoch: 2 Global Step: 14580 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 18:05:03,601-Speed 10438.21 samples/sec Loss 11.3428 LearningRate 0.4222 Epoch: 2 Global Step: 14590 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 18:05:11,437-Speed 10454.80 samples/sec Loss 11.3661 LearningRate 0.4225 Epoch: 2 Global Step: 14600 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 18:05:19,273-Speed 10457.08 samples/sec Loss 11.3019 LearningRate 0.4227 Epoch: 2 Global Step: 14610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 18:05:27,077-Speed 10499.12 samples/sec Loss 11.4225 LearningRate 0.4230 Epoch: 2 Global Step: 14620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 18:05:34,907-Speed 10464.35 samples/sec Loss 11.3066 LearningRate 0.4233 Epoch: 2 Global Step: 14630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 18:05:42,694-Speed 10523.53 samples/sec Loss 11.3711 LearningRate 0.4236 Epoch: 2 Global Step: 14640 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 18:05:50,476-Speed 10530.27 samples/sec Loss 11.3260 LearningRate 0.4239 Epoch: 2 Global Step: 14650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 18:05:58,276-Speed 10504.12 samples/sec Loss 11.3512 LearningRate 0.4242 Epoch: 2 Global Step: 14660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 18:06:06,080-Speed 10499.55 samples/sec Loss 11.4070 LearningRate 0.4245 Epoch: 2 Global Step: 14670 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-01-15 18:06:13,925-Speed 10445.30 samples/sec Loss 11.3927 LearningRate 0.4248 Epoch: 2 Global Step: 14680 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:06:21,735-Speed 10490.66 samples/sec Loss 11.5101 LearningRate 0.4251 Epoch: 2 Global Step: 14690 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:06:29,575-Speed 
10453.27 samples/sec Loss 11.4411 LearningRate 0.4253 Epoch: 2 Global Step: 14700 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:06:37,395-Speed 10477.68 samples/sec Loss 11.3339 LearningRate 0.4256 Epoch: 2 Global Step: 14710 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:06:45,181-Speed 10524.53 samples/sec Loss 11.3067 LearningRate 0.4259 Epoch: 2 Global Step: 14720 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:06:52,984-Speed 10499.49 samples/sec Loss 11.3913 LearningRate 0.4262 Epoch: 2 Global Step: 14730 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:07:00,783-Speed 10506.51 samples/sec Loss 11.3620 LearningRate 0.4265 Epoch: 2 Global Step: 14740 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:07:08,574-Speed 10516.85 samples/sec Loss 11.3724 LearningRate 0.4268 Epoch: 2 Global Step: 14750 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:07:16,383-Speed 10491.33 samples/sec Loss 11.3390 LearningRate 0.4271 Epoch: 2 Global Step: 14760 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-01-15 18:07:24,220-Speed 10453.69 samples/sec Loss 11.3780 LearningRate 0.4274 Epoch: 2 Global Step: 14770 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:07:31,998-Speed 10533.77 samples/sec Loss 11.2857 LearningRate 0.4277 Epoch: 2 Global Step: 14780 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:07:39,792-Speed 10513.96 samples/sec Loss 11.6025 LearningRate 0.4280 Epoch: 2 Global Step: 14790 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:07:47,604-Speed 10487.51 samples/sec Loss 11.4860 LearningRate 0.4282 Epoch: 2 Global Step: 14800 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:07:55,397-Speed 10513.83 samples/sec Loss 11.3576 LearningRate 0.4285 Epoch: 2 Global Step: 14810 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:08:03,188-Speed 10515.41 
samples/sec Loss 11.3811 LearningRate 0.4288 Epoch: 2 Global Step: 14820 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:08:10,990-Speed 10502.24 samples/sec Loss 11.3641 LearningRate 0.4291 Epoch: 2 Global Step: 14830 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:08:18,839-Speed 10438.86 samples/sec Loss 11.3962 LearningRate 0.4294 Epoch: 2 Global Step: 14840 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:08:26,615-Speed 10536.77 samples/sec Loss 11.3557 LearningRate 0.4297 Epoch: 2 Global Step: 14850 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:08:34,413-Speed 10507.22 samples/sec Loss 11.4245 LearningRate 0.4300 Epoch: 2 Global Step: 14860 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:08:42,220-Speed 10494.82 samples/sec Loss 11.3670 LearningRate 0.4303 Epoch: 2 Global Step: 14870 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:08:50,036-Speed 10483.24 samples/sec Loss 11.3829 LearningRate 0.4306 Epoch: 2 Global Step: 14880 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:08:57,829-Speed 10513.84 samples/sec Loss 11.4363 LearningRate 0.4308 Epoch: 2 Global Step: 14890 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:09:05,638-Speed 10492.16 samples/sec Loss 11.4235 LearningRate 0.4311 Epoch: 2 Global Step: 14900 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:09:13,511-Speed 10407.27 samples/sec Loss 11.3923 LearningRate 0.4314 Epoch: 2 Global Step: 14910 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:09:21,341-Speed 10472.27 samples/sec Loss 11.4905 LearningRate 0.4317 Epoch: 2 Global Step: 14920 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:09:29,167-Speed 10468.46 samples/sec Loss 11.4317 LearningRate 0.4320 Epoch: 2 Global Step: 14930 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:09:36,950-Speed 10528.89 samples/sec Loss 
11.3597 LearningRate 0.4323 Epoch: 2 Global Step: 14940 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:09:44,750-Speed 10504.66 samples/sec Loss 11.5007 LearningRate 0.4326 Epoch: 2 Global Step: 14950 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:09:52,552-Speed 10501.09 samples/sec Loss 11.4424 LearningRate 0.4329 Epoch: 2 Global Step: 14960 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:10:00,380-Speed 10472.72 samples/sec Loss 11.4122 LearningRate 0.4332 Epoch: 2 Global Step: 14970 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:10:08,191-Speed 10490.28 samples/sec Loss 11.3921 LearningRate 0.4334 Epoch: 2 Global Step: 14980 Fp16 Grad Scale: 524288 Required: 19 hours Training: 2022-01-15 18:10:16,001-Speed 10498.12 samples/sec Loss 11.4477 LearningRate 0.4337 Epoch: 2 Global Step: 14990 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:10:23,825-Speed 10472.96 samples/sec Loss 11.3779 LearningRate 0.4340 Epoch: 2 Global Step: 15000 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:10:31,621-Speed 10512.91 samples/sec Loss 11.4120 LearningRate 0.4343 Epoch: 2 Global Step: 15010 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:10:39,467-Speed 10442.63 samples/sec Loss 11.5130 LearningRate 0.4346 Epoch: 2 Global Step: 15020 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:10:47,253-Speed 10521.97 samples/sec Loss 11.3883 LearningRate 0.4349 Epoch: 2 Global Step: 15030 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:10:55,073-Speed 10481.20 samples/sec Loss 11.5630 LearningRate 0.4352 Epoch: 2 Global Step: 15040 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:11:02,953-Speed 10397.19 samples/sec Loss 11.4459 LearningRate 0.4355 Epoch: 2 Global Step: 15050 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:11:10,754-Speed 10503.11 samples/sec Loss 11.4011 
LearningRate 0.4358 Epoch: 2 Global Step: 15060 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:11:18,535-Speed 10530.12 samples/sec Loss 11.4070 LearningRate 0.4361 Epoch: 2 Global Step: 15070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:11:26,329-Speed 10512.46 samples/sec Loss 11.4384 LearningRate 0.4363 Epoch: 2 Global Step: 15080 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:11:34,116-Speed 10521.73 samples/sec Loss 11.4152 LearningRate 0.4366 Epoch: 2 Global Step: 15090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:11:41,926-Speed 10490.07 samples/sec Loss 11.3783 LearningRate 0.4369 Epoch: 2 Global Step: 15100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:11:49,745-Speed 10479.40 samples/sec Loss 11.4876 LearningRate 0.4372 Epoch: 2 Global Step: 15110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:11:57,567-Speed 10475.39 samples/sec Loss 11.4778 LearningRate 0.4375 Epoch: 2 Global Step: 15120 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:12:05,383-Speed 10486.68 samples/sec Loss 11.3581 LearningRate 0.4378 Epoch: 2 Global Step: 15130 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:12:13,188-Speed 10497.88 samples/sec Loss 11.3862 LearningRate 0.4381 Epoch: 2 Global Step: 15140 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:12:21,006-Speed 10481.38 samples/sec Loss 11.4255 LearningRate 0.4384 Epoch: 2 Global Step: 15150 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:12:28,822-Speed 10482.60 samples/sec Loss 11.5058 LearningRate 0.4387 Epoch: 2 Global Step: 15160 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:12:36,654-Speed 10461.63 samples/sec Loss 11.5171 LearningRate 0.4389 Epoch: 2 Global Step: 15170 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:12:44,441-Speed 10521.89 samples/sec Loss 11.4282 LearningRate 0.4392 
Epoch: 2 Global Step: 15180 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:12:52,274-Speed 10460.28 samples/sec Loss 11.4158 LearningRate 0.4395 Epoch: 2 Global Step: 15190 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:13:00,062-Speed 10521.33 samples/sec Loss 11.4215 LearningRate 0.4398 Epoch: 2 Global Step: 15200 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:13:07,856-Speed 10512.68 samples/sec Loss 11.5535 LearningRate 0.4401 Epoch: 2 Global Step: 15210 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:13:15,663-Speed 10495.74 samples/sec Loss 11.5299 LearningRate 0.4404 Epoch: 2 Global Step: 15220 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:13:23,475-Speed 10492.76 samples/sec Loss 11.5624 LearningRate 0.4407 Epoch: 2 Global Step: 15230 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:13:31,275-Speed 10503.78 samples/sec Loss 11.5520 LearningRate 0.4410 Epoch: 2 Global Step: 15240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:13:39,044-Speed 10545.29 samples/sec Loss 12.3371 LearningRate 0.4413 Epoch: 2 Global Step: 15250 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-01-15 18:13:46,841-Speed 10509.86 samples/sec Loss 12.8725 LearningRate 0.4416 Epoch: 2 Global Step: 15260 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-01-15 18:13:54,640-Speed 10506.33 samples/sec Loss 13.3339 LearningRate 0.4418 Epoch: 2 Global Step: 15270 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-01-15 18:14:02,426-Speed 10522.90 samples/sec Loss 12.5485 LearningRate 0.4421 Epoch: 2 Global Step: 15280 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-01-15 18:14:10,219-Speed 10513.93 samples/sec Loss 11.9106 LearningRate 0.4424 Epoch: 2 Global Step: 15290 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-01-15 18:14:18,017-Speed 10507.85 samples/sec Loss 11.6442 LearningRate 0.4427 Epoch: 2 Global Step: 
15300 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-01-15 18:14:25,802-Speed 10526.19 samples/sec Loss 11.5157 LearningRate 0.4430 Epoch: 2 Global Step: 15310 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-01-15 18:14:33,594-Speed 10514.73 samples/sec Loss 11.4547 LearningRate 0.4433 Epoch: 2 Global Step: 15320 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-01-15 18:14:41,409-Speed 10483.53 samples/sec Loss 11.4201 LearningRate 0.4436 Epoch: 2 Global Step: 15330 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-01-15 18:14:49,205-Speed 10509.76 samples/sec Loss 11.4542 LearningRate 0.4439 Epoch: 2 Global Step: 15340 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-01-15 18:14:56,997-Speed 10516.89 samples/sec Loss 11.4268 LearningRate 0.4442 Epoch: 2 Global Step: 15350 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-15 18:15:04,811-Speed 10486.07 samples/sec Loss 11.4493 LearningRate 0.4444 Epoch: 2 Global Step: 15360 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-15 18:15:12,607-Speed 10509.83 samples/sec Loss 11.4283 LearningRate 0.4447 Epoch: 2 Global Step: 15370 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-15 18:15:20,401-Speed 10512.94 samples/sec Loss 11.5602 LearningRate 0.4450 Epoch: 2 Global Step: 15380 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-15 18:15:28,197-Speed 10509.35 samples/sec Loss 11.5100 LearningRate 0.4453 Epoch: 2 Global Step: 15390 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-15 18:15:35,992-Speed 10512.50 samples/sec Loss 11.4934 LearningRate 0.4456 Epoch: 2 Global Step: 15400 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-15 18:15:43,809-Speed 10481.78 samples/sec Loss 11.4690 LearningRate 0.4459 Epoch: 2 Global Step: 15410 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-15 18:15:51,622-Speed 10485.86 samples/sec Loss 11.4914 LearningRate 0.4462 Epoch: 2 Global Step: 15420 Fp16 Grad Scale: 32768 
Required: 19 hours Training: 2022-01-15 18:15:59,395-Speed 10541.28 samples/sec Loss 11.4387 LearningRate 0.4465 Epoch: 2 Global Step: 15430 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-15 18:16:07,184-Speed 10519.57 samples/sec Loss 11.5773 LearningRate 0.4468 Epoch: 2 Global Step: 15440 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-15 18:16:15,014-Speed 10464.93 samples/sec Loss 11.5213 LearningRate 0.4470 Epoch: 2 Global Step: 15450 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-15 18:16:22,813-Speed 10504.72 samples/sec Loss 11.5333 LearningRate 0.4473 Epoch: 2 Global Step: 15460 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-15 18:16:30,615-Speed 10501.49 samples/sec Loss 11.5515 LearningRate 0.4476 Epoch: 2 Global Step: 15470 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-15 18:16:38,428-Speed 10487.25 samples/sec Loss 11.5726 LearningRate 0.4479 Epoch: 2 Global Step: 15480 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-15 18:16:46,238-Speed 10490.02 samples/sec Loss 11.5856 LearningRate 0.4482 Epoch: 2 Global Step: 15490 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-15 18:16:54,067-Speed 10465.16 samples/sec Loss 11.5398 LearningRate 0.4485 Epoch: 2 Global Step: 15500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-15 18:17:01,875-Speed 10500.70 samples/sec Loss 11.4546 LearningRate 0.4488 Epoch: 2 Global Step: 15510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-15 18:17:09,656-Speed 10529.75 samples/sec Loss 11.5021 LearningRate 0.4491 Epoch: 2 Global Step: 15520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-15 18:17:17,465-Speed 10492.99 samples/sec Loss 11.6297 LearningRate 0.4494 Epoch: 2 Global Step: 15530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-15 18:17:25,265-Speed 10504.72 samples/sec Loss 11.5098 LearningRate 0.4497 Epoch: 2 Global Step: 15540 Fp16 Grad Scale: 65536 Required: 19 hours Training: 
2022-01-15 18:17:33,098-Speed 10460.56 samples/sec Loss 11.6274 LearningRate 0.4499 Epoch: 2 Global Step: 15550 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:17:55,748-Speed 3616.96 samples/sec Loss 11.6290 LearningRate 0.4502 Epoch: 3 Global Step: 15560 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:18:03,519-Speed 10544.09 samples/sec Loss 11.4777 LearningRate 0.4505 Epoch: 3 Global Step: 15570 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:18:11,281-Speed 10555.98 samples/sec Loss 11.5090 LearningRate 0.4508 Epoch: 3 Global Step: 15580 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:18:19,031-Speed 10572.07 samples/sec Loss 11.4857 LearningRate 0.4511 Epoch: 3 Global Step: 15590 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:18:26,798-Speed 10549.32 samples/sec Loss 11.5063 LearningRate 0.4514 Epoch: 3 Global Step: 15600 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:18:34,595-Speed 10508.31 samples/sec Loss 11.6928 LearningRate 0.4517 Epoch: 3 Global Step: 15610 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-15 18:18:42,368-Speed 10540.12 samples/sec Loss 11.5525 LearningRate 0.4520 Epoch: 3 Global Step: 15620 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-15 18:18:50,142-Speed 10538.60 samples/sec Loss 11.4708 LearningRate 0.4523 Epoch: 3 Global Step: 15630 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-15 18:18:57,941-Speed 10506.57 samples/sec Loss 11.8085 LearningRate 0.4525 Epoch: 3 Global Step: 15640 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-15 18:19:05,730-Speed 10518.86 samples/sec Loss 11.9956 LearningRate 0.4528 Epoch: 3 Global Step: 15650 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-15 18:19:13,602-Speed 10412.09 samples/sec Loss 13.5376 LearningRate 0.4531 Epoch: 3 Global Step: 15660 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-01-15 
18:19:21,457-Speed 10430.35 samples/sec Loss 14.0490 LearningRate 0.4534 Epoch: 3 Global Step: 15670 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-01-15 18:19:29,231-Speed 10540.32 samples/sec Loss 13.0212 LearningRate 0.4537 Epoch: 3 Global Step: 15680 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-01-15 18:19:37,010-Speed 10533.62 samples/sec Loss 12.2648 LearningRate 0.4540 Epoch: 3 Global Step: 15690 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-01-15 18:19:44,780-Speed 10548.03 samples/sec Loss 11.8517 LearningRate 0.4543 Epoch: 3 Global Step: 15700 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-01-15 18:19:52,551-Speed 10544.66 samples/sec Loss 11.6415 LearningRate 0.4546 Epoch: 3 Global Step: 15710 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-01-15 18:20:00,335-Speed 10525.17 samples/sec Loss 11.6004 LearningRate 0.4549 Epoch: 3 Global Step: 15720 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-01-15 18:20:08,110-Speed 10538.27 samples/sec Loss 11.6001 LearningRate 0.4552 Epoch: 3 Global Step: 15730 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-01-15 18:20:15,869-Speed 10561.66 samples/sec Loss 11.5529 LearningRate 0.4554 Epoch: 3 Global Step: 15740 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-01-15 18:20:23,687-Speed 10479.84 samples/sec Loss 11.5184 LearningRate 0.4557 Epoch: 3 Global Step: 15750 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-01-15 18:20:31,491-Speed 10498.82 samples/sec Loss 11.6285 LearningRate 0.4560 Epoch: 3 Global Step: 15760 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-01-15 18:20:39,245-Speed 10566.46 samples/sec Loss 11.4928 LearningRate 0.4563 Epoch: 3 Global Step: 15770 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-01-15 18:20:47,013-Speed 10548.88 samples/sec Loss 11.6395 LearningRate 0.4566 Epoch: 3 Global Step: 15780 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-01-15 18:20:54,814-Speed 10502.81 samples/sec 
Loss 11.5869 LearningRate 0.4569 Epoch: 3 Global Step: 15790 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-01-15 18:21:02,600-Speed 10522.51 samples/sec Loss 11.5833 LearningRate 0.4572 Epoch: 3 Global Step: 15800 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-01-15 18:21:10,403-Speed 10500.23 samples/sec Loss 11.5780 LearningRate 0.4575 Epoch: 3 Global Step: 15810 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-01-15 18:21:18,227-Speed 10471.94 samples/sec Loss 11.5646 LearningRate 0.4578 Epoch: 3 Global Step: 15820 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-01-15 18:21:26,023-Speed 10510.75 samples/sec Loss 11.5159 LearningRate 0.4580 Epoch: 3 Global Step: 15830 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-01-15 18:21:33,863-Speed 10450.15 samples/sec Loss 11.6390 LearningRate 0.4583 Epoch: 3 Global Step: 15840 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-01-15 18:21:41,642-Speed 10533.37 samples/sec Loss 11.6842 LearningRate 0.4586 Epoch: 3 Global Step: 15850 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-01-15 18:21:49,437-Speed 10511.37 samples/sec Loss 11.7046 LearningRate 0.4589 Epoch: 3 Global Step: 15860 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-01-15 18:21:57,245-Speed 10493.42 samples/sec Loss 11.5431 LearningRate 0.4592 Epoch: 3 Global Step: 15870 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-01-15 18:22:05,014-Speed 10545.76 samples/sec Loss 11.7126 LearningRate 0.4595 Epoch: 3 Global Step: 15880 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-01-15 18:22:12,856-Speed 10449.88 samples/sec Loss 11.6695 LearningRate 0.4598 Epoch: 3 Global Step: 15890 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-01-15 18:22:20,650-Speed 10522.40 samples/sec Loss 11.7897 LearningRate 0.4601 Epoch: 3 Global Step: 15900 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-01-15 18:22:28,483-Speed 10459.15 samples/sec Loss 11.6151 LearningRate 0.4604 
Epoch: 3 Global Step: 15910 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-01-15 18:22:36,294-Speed 10488.83 samples/sec Loss 11.7332 LearningRate 0.4606 Epoch: 3 Global Step: 15920 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-01-15 18:22:44,099-Speed 10497.72 samples/sec Loss 11.6407 LearningRate 0.4609 Epoch: 3 Global Step: 15930 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-01-15 18:22:51,904-Speed 10498.71 samples/sec Loss 11.6824 LearningRate 0.4612 Epoch: 3 Global Step: 15940 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-01-15 18:22:59,714-Speed 10490.52 samples/sec Loss 11.6342 LearningRate 0.4615 Epoch: 3 Global Step: 15950 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-01-15 18:23:07,541-Speed 10468.70 samples/sec Loss 11.7166 LearningRate 0.4618 Epoch: 3 Global Step: 15960 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-15 18:23:15,382-Speed 10449.94 samples/sec Loss 11.6185 LearningRate 0.4621 Epoch: 3 Global Step: 15970 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-15 18:23:23,191-Speed 10493.59 samples/sec Loss 11.5882 LearningRate 0.4624 Epoch: 3 Global Step: 15980 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-15 18:23:31,027-Speed 10454.54 samples/sec Loss 11.7330 LearningRate 0.4627 Epoch: 3 Global Step: 15990 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-15 18:23:38,862-Speed 10458.53 samples/sec Loss 11.6200 LearningRate 0.4630 Epoch: 3 Global Step: 16000 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-15 18:23:46,721-Speed 10424.74 samples/sec Loss 11.7956 LearningRate 0.4633 Epoch: 3 Global Step: 16010 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-15 18:23:54,543-Speed 10474.23 samples/sec Loss 11.8032 LearningRate 0.4635 Epoch: 3 Global Step: 16020 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-15 18:24:02,422-Speed 10400.88 samples/sec Loss 11.6868 LearningRate 0.4638 Epoch: 3 Global Step: 16030 
Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-15 18:24:10,264-Speed 10447.04 samples/sec Loss 11.7434 LearningRate 0.4641 Epoch: 3 Global Step: 16040 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-15 18:24:18,109-Speed 10444.84 samples/sec Loss 11.7473 LearningRate 0.4644 Epoch: 3 Global Step: 16050 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-01-15 18:24:25,968-Speed 10425.28 samples/sec Loss 11.7611 LearningRate 0.4647 Epoch: 3 Global Step: 16060 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-15 18:24:33,823-Speed 10430.28 samples/sec Loss 11.6788 LearningRate 0.4650 Epoch: 3 Global Step: 16070 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-15 18:24:41,680-Speed 10434.10 samples/sec Loss 11.6443 LearningRate 0.4653 Epoch: 3 Global Step: 16080 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-15 18:24:49,515-Speed 10457.14 samples/sec Loss 11.6990 LearningRate 0.4656 Epoch: 3 Global Step: 16090 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-15 18:24:57,362-Speed 10440.96 samples/sec Loss 11.6650 LearningRate 0.4659 Epoch: 3 Global Step: 16100 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-15 18:25:05,208-Speed 10442.45 samples/sec Loss 11.5835 LearningRate 0.4661 Epoch: 3 Global Step: 16110 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-15 18:25:13,058-Speed 10445.82 samples/sec Loss 11.7084 LearningRate 0.4664 Epoch: 3 Global Step: 16120 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-15 18:25:20,885-Speed 10467.72 samples/sec Loss 11.6602 LearningRate 0.4667 Epoch: 3 Global Step: 16130 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-15 18:25:28,713-Speed 10466.22 samples/sec Loss 11.6141 LearningRate 0.4670 Epoch: 3 Global Step: 16140 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-01-15 18:25:36,557-Speed 10446.62 samples/sec Loss 11.6595 LearningRate 0.4673 Epoch: 3 Global Step: 16150 Fp16 Grad Scale: 65536 
Required: 19 hours Training: 2022-01-15 18:25:44,380-Speed 10475.14 samples/sec Loss 11.7081 LearningRate 0.4676 Epoch: 3 Global Step: 16160 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:25:52,198-Speed 10479.44 samples/sec Loss 11.7111 LearningRate 0.4679 Epoch: 3 Global Step: 16170 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:26:00,024-Speed 10470.11 samples/sec Loss 11.7502 LearningRate 0.4682 Epoch: 3 Global Step: 16180 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:26:07,899-Speed 10405.24 samples/sec Loss 11.6765 LearningRate 0.4685 Epoch: 3 Global Step: 16190 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:26:15,724-Speed 10469.45 samples/sec Loss 11.6295 LearningRate 0.4688 Epoch: 3 Global Step: 16200 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:26:23,560-Speed 10456.31 samples/sec Loss 11.6830 LearningRate 0.4690 Epoch: 3 Global Step: 16210 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:26:31,378-Speed 10481.89 samples/sec Loss 11.6296 LearningRate 0.4693 Epoch: 3 Global Step: 16220 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:26:39,211-Speed 10459.45 samples/sec Loss 11.8033 LearningRate 0.4696 Epoch: 3 Global Step: 16230 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:26:47,023-Speed 10488.85 samples/sec Loss 11.7066 LearningRate 0.4699 Epoch: 3 Global Step: 16240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:26:54,852-Speed 10465.98 samples/sec Loss 11.7407 LearningRate 0.4702 Epoch: 3 Global Step: 16250 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:27:02,692-Speed 10450.12 samples/sec Loss 11.6142 LearningRate 0.4705 Epoch: 3 Global Step: 16260 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:27:10,531-Speed 10451.96 samples/sec Loss 11.6570 LearningRate 0.4708 Epoch: 3 Global Step: 16270 Fp16 Grad Scale: 262144 Required: 19 hours 
Training: 2022-01-15 18:27:18,405-Speed 10405.95 samples/sec Loss 11.7358 LearningRate 0.4711 Epoch: 3 Global Step: 16280 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:27:26,263-Speed 10425.95 samples/sec Loss 11.7402 LearningRate 0.4714 Epoch: 3 Global Step: 16290 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:27:34,108-Speed 10448.00 samples/sec Loss 11.6772 LearningRate 0.4716 Epoch: 3 Global Step: 16300 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:27:41,983-Speed 10405.42 samples/sec Loss 11.7559 LearningRate 0.4719 Epoch: 3 Global Step: 16310 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:27:49,834-Speed 10434.85 samples/sec Loss 11.6920 LearningRate 0.4722 Epoch: 3 Global Step: 16320 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:27:57,699-Speed 10417.08 samples/sec Loss 11.5775 LearningRate 0.4725 Epoch: 3 Global Step: 16330 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:28:05,533-Speed 10459.35 samples/sec Loss 11.6025 LearningRate 0.4728 Epoch: 3 Global Step: 16340 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:28:13,373-Speed 10451.23 samples/sec Loss 11.6670 LearningRate 0.4731 Epoch: 3 Global Step: 16350 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:28:21,208-Speed 10457.30 samples/sec Loss 11.8496 LearningRate 0.4734 Epoch: 3 Global Step: 16360 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:28:29,077-Speed 10411.45 samples/sec Loss 11.7675 LearningRate 0.4737 Epoch: 3 Global Step: 16370 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:28:36,900-Speed 10474.83 samples/sec Loss 11.7945 LearningRate 0.4740 Epoch: 3 Global Step: 16380 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:28:44,754-Speed 10431.13 samples/sec Loss 11.5659 LearningRate 0.4742 Epoch: 3 Global Step: 16390 Fp16 Grad Scale: 262144 Required: 19 hours Training: 
2022-01-15 18:28:52,583-Speed 10465.17 samples/sec Loss 11.7314 LearningRate 0.4745 Epoch: 3 Global Step: 16400 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:29:00,435-Speed 10436.19 samples/sec Loss 11.8380 LearningRate 0.4748 Epoch: 3 Global Step: 16410 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:29:08,256-Speed 10476.57 samples/sec Loss 11.7518 LearningRate 0.4751 Epoch: 3 Global Step: 16420 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:29:16,087-Speed 10462.67 samples/sec Loss 11.7996 LearningRate 0.4754 Epoch: 3 Global Step: 16430 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:29:23,908-Speed 10475.81 samples/sec Loss 11.6704 LearningRate 0.4757 Epoch: 3 Global Step: 16440 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:29:31,751-Speed 10447.20 samples/sec Loss 11.7347 LearningRate 0.4760 Epoch: 3 Global Step: 16450 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:29:39,586-Speed 10457.87 samples/sec Loss 11.6904 LearningRate 0.4763 Epoch: 3 Global Step: 16460 Fp16 Grad Scale: 524288 Required: 19 hours Training: 2022-01-15 18:29:47,392-Speed 10495.33 samples/sec Loss 11.6542 LearningRate 0.4766 Epoch: 3 Global Step: 16470 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:29:55,180-Speed 10534.19 samples/sec Loss 11.7190 LearningRate 0.4769 Epoch: 3 Global Step: 16480 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:30:02,978-Speed 10506.63 samples/sec Loss 11.8613 LearningRate 0.4771 Epoch: 3 Global Step: 16490 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:30:10,804-Speed 10469.84 samples/sec Loss 11.9464 LearningRate 0.4774 Epoch: 3 Global Step: 16500 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:30:18,668-Speed 10419.04 samples/sec Loss 11.8869 LearningRate 0.4777 Epoch: 3 Global Step: 16510 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 
18:30:26,470-Speed 10501.01 samples/sec Loss 11.6816 LearningRate 0.4780 Epoch: 3 Global Step: 16520 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:30:34,306-Speed 10455.33 samples/sec Loss 11.6841 LearningRate 0.4783 Epoch: 3 Global Step: 16530 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:30:42,138-Speed 10461.32 samples/sec Loss 11.6751 LearningRate 0.4786 Epoch: 3 Global Step: 16540 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:30:49,938-Speed 10504.36 samples/sec Loss 11.6516 LearningRate 0.4789 Epoch: 3 Global Step: 16550 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:30:57,770-Speed 10461.39 samples/sec Loss 11.6532 LearningRate 0.4792 Epoch: 3 Global Step: 16560 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:31:05,598-Speed 10466.67 samples/sec Loss 11.7461 LearningRate 0.4795 Epoch: 3 Global Step: 16570 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:31:13,395-Speed 10508.64 samples/sec Loss 11.7403 LearningRate 0.4797 Epoch: 3 Global Step: 16580 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:31:21,203-Speed 10492.78 samples/sec Loss 11.7798 LearningRate 0.4800 Epoch: 3 Global Step: 16590 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:31:29,001-Speed 10506.88 samples/sec Loss 11.6913 LearningRate 0.4803 Epoch: 3 Global Step: 16600 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:31:36,825-Speed 10471.73 samples/sec Loss 11.6216 LearningRate 0.4806 Epoch: 3 Global Step: 16610 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:31:44,661-Speed 10455.26 samples/sec Loss 11.7677 LearningRate 0.4809 Epoch: 3 Global Step: 16620 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:31:52,443-Speed 10528.94 samples/sec Loss 11.8053 LearningRate 0.4812 Epoch: 3 Global Step: 16630 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:32:00,240-Speed 
10508.06 samples/sec Loss 11.6809 LearningRate 0.4815 Epoch: 3 Global Step: 16640 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:32:08,021-Speed 10528.61 samples/sec Loss 11.7203 LearningRate 0.4818 Epoch: 3 Global Step: 16650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:32:15,810-Speed 10519.59 samples/sec Loss 11.6317 LearningRate 0.4821 Epoch: 3 Global Step: 16660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:32:23,634-Speed 10471.90 samples/sec Loss 11.8121 LearningRate 0.4823 Epoch: 3 Global Step: 16670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:32:31,466-Speed 10460.40 samples/sec Loss 11.8724 LearningRate 0.4826 Epoch: 3 Global Step: 16680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:32:39,281-Speed 10484.06 samples/sec Loss 12.1755 LearningRate 0.4829 Epoch: 3 Global Step: 16690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:32:47,056-Speed 10538.70 samples/sec Loss 12.0990 LearningRate 0.4832 Epoch: 3 Global Step: 16700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:32:54,839-Speed 10526.49 samples/sec Loss 11.8462 LearningRate 0.4835 Epoch: 3 Global Step: 16710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:33:02,663-Speed 10472.01 samples/sec Loss 11.7298 LearningRate 0.4838 Epoch: 3 Global Step: 16720 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:33:10,478-Speed 10483.04 samples/sec Loss 11.6368 LearningRate 0.4841 Epoch: 3 Global Step: 16730 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:33:18,291-Speed 10487.74 samples/sec Loss 11.6585 LearningRate 0.4844 Epoch: 3 Global Step: 16740 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:33:26,105-Speed 10485.05 samples/sec Loss 11.7961 LearningRate 0.4847 Epoch: 3 Global Step: 16750 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:33:33,932-Speed 10468.67 
samples/sec Loss 11.7249 LearningRate 0.4850 Epoch: 3 Global Step: 16760 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:33:41,750-Speed 10479.15 samples/sec Loss 11.7818 LearningRate 0.4852 Epoch: 3 Global Step: 16770 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:33:49,609-Speed 10426.01 samples/sec Loss 11.6715 LearningRate 0.4855 Epoch: 3 Global Step: 16780 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:33:57,418-Speed 10492.18 samples/sec Loss 11.7783 LearningRate 0.4858 Epoch: 3 Global Step: 16790 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:34:05,230-Speed 10487.68 samples/sec Loss 11.8765 LearningRate 0.4861 Epoch: 3 Global Step: 16800 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:34:13,027-Speed 10509.15 samples/sec Loss 11.8373 LearningRate 0.4864 Epoch: 3 Global Step: 16810 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:34:20,836-Speed 10492.00 samples/sec Loss 11.6909 LearningRate 0.4867 Epoch: 3 Global Step: 16820 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:34:28,676-Speed 10450.53 samples/sec Loss 11.7435 LearningRate 0.4870 Epoch: 3 Global Step: 16830 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:34:36,479-Speed 10499.73 samples/sec Loss 11.7868 LearningRate 0.4873 Epoch: 3 Global Step: 16840 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:34:44,261-Speed 10528.68 samples/sec Loss 11.8070 LearningRate 0.4876 Epoch: 3 Global Step: 16850 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:34:52,081-Speed 10477.26 samples/sec Loss 11.7747 LearningRate 0.4878 Epoch: 3 Global Step: 16860 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:34:59,892-Speed 10488.13 samples/sec Loss 11.7475 LearningRate 0.4881 Epoch: 3 Global Step: 16870 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:35:07,737-Speed 10444.18 samples/sec Loss 
11.7357 LearningRate 0.4884 Epoch: 3 Global Step: 16880 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:35:15,543-Speed 10497.21 samples/sec Loss 11.7896 LearningRate 0.4887 Epoch: 3 Global Step: 16890 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:35:23,344-Speed 10502.10 samples/sec Loss 11.8310 LearningRate 0.4890 Epoch: 3 Global Step: 16900 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:35:31,189-Speed 10443.69 samples/sec Loss 11.7359 LearningRate 0.4893 Epoch: 3 Global Step: 16910 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:35:39,014-Speed 10470.24 samples/sec Loss 11.8230 LearningRate 0.4896 Epoch: 3 Global Step: 16920 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:35:46,832-Speed 10479.22 samples/sec Loss 11.8144 LearningRate 0.4899 Epoch: 3 Global Step: 16930 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:35:54,639-Speed 10495.59 samples/sec Loss 11.6792 LearningRate 0.4902 Epoch: 3 Global Step: 16940 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:36:02,424-Speed 10523.95 samples/sec Loss 11.8068 LearningRate 0.4905 Epoch: 3 Global Step: 16950 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:36:10,216-Speed 10515.03 samples/sec Loss 11.8519 LearningRate 0.4907 Epoch: 3 Global Step: 16960 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:36:18,021-Speed 10496.80 samples/sec Loss 11.7398 LearningRate 0.4910 Epoch: 3 Global Step: 16970 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:36:25,842-Speed 10476.14 samples/sec Loss 11.8474 LearningRate 0.4913 Epoch: 3 Global Step: 16980 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:36:33,628-Speed 10522.29 samples/sec Loss 11.8246 LearningRate 0.4916 Epoch: 3 Global Step: 16990 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:36:41,424-Speed 10510.46 samples/sec Loss 11.8366 
LearningRate 0.4919 Epoch: 3 Global Step: 17000 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:36:49,234-Speed 10490.01 samples/sec Loss 11.7986 LearningRate 0.4922 Epoch: 3 Global Step: 17010 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:36:57,034-Speed 10503.62 samples/sec Loss 11.9118 LearningRate 0.4925 Epoch: 3 Global Step: 17020 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:37:04,857-Speed 10474.11 samples/sec Loss 11.7765 LearningRate 0.4928 Epoch: 3 Global Step: 17030 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:37:12,676-Speed 10478.86 samples/sec Loss 11.9275 LearningRate 0.4931 Epoch: 3 Global Step: 17040 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:37:20,506-Speed 10461.87 samples/sec Loss 11.7553 LearningRate 0.4933 Epoch: 3 Global Step: 17050 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:37:28,309-Speed 10500.03 samples/sec Loss 11.8393 LearningRate 0.4936 Epoch: 3 Global Step: 17060 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:37:36,102-Speed 10514.01 samples/sec Loss 11.8677 LearningRate 0.4939 Epoch: 3 Global Step: 17070 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:37:43,903-Speed 10503.15 samples/sec Loss 11.8169 LearningRate 0.4942 Epoch: 3 Global Step: 17080 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:37:51,705-Speed 10500.35 samples/sec Loss 11.8226 LearningRate 0.4945 Epoch: 3 Global Step: 17090 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:37:59,486-Speed 10530.49 samples/sec Loss 11.7230 LearningRate 0.4948 Epoch: 3 Global Step: 17100 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:38:07,315-Speed 10464.62 samples/sec Loss 11.8191 LearningRate 0.4951 Epoch: 3 Global Step: 17110 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:38:15,140-Speed 10470.48 samples/sec Loss 11.8009 LearningRate 0.4954 
Epoch: 3 Global Step: 17120 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:38:22,923-Speed 10527.16 samples/sec Loss 11.9099 LearningRate 0.4957 Epoch: 3 Global Step: 17130 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:38:30,707-Speed 10525.08 samples/sec Loss 11.7706 LearningRate 0.4959 Epoch: 3 Global Step: 17140 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:38:38,515-Speed 10493.51 samples/sec Loss 11.7934 LearningRate 0.4962 Epoch: 3 Global Step: 17150 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:38:46,317-Speed 10501.41 samples/sec Loss 11.9074 LearningRate 0.4965 Epoch: 3 Global Step: 17160 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:38:54,124-Speed 10501.09 samples/sec Loss 11.8125 LearningRate 0.4968 Epoch: 3 Global Step: 17170 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:39:01,923-Speed 10506.06 samples/sec Loss 11.7734 LearningRate 0.4971 Epoch: 3 Global Step: 17180 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:39:09,722-Speed 10505.73 samples/sec Loss 11.8589 LearningRate 0.4974 Epoch: 3 Global Step: 17190 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:39:17,511-Speed 10518.74 samples/sec Loss 11.8432 LearningRate 0.4977 Epoch: 3 Global Step: 17200 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:39:25,318-Speed 10500.17 samples/sec Loss 12.0862 LearningRate 0.4980 Epoch: 3 Global Step: 17210 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:39:33,126-Speed 10493.99 samples/sec Loss 12.7682 LearningRate 0.4983 Epoch: 3 Global Step: 17220 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:39:40,919-Speed 10514.25 samples/sec Loss 13.4151 LearningRate 0.4986 Epoch: 3 Global Step: 17230 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:39:48,698-Speed 10531.88 samples/sec Loss 13.0059 LearningRate 0.4988 Epoch: 3 Global 
Step: 17240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:39:56,488-Speed 10519.09 samples/sec Loss 12.6866 LearningRate 0.4991 Epoch: 3 Global Step: 17250 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:40:04,320-Speed 10461.37 samples/sec Loss 12.1151 LearningRate 0.4994 Epoch: 3 Global Step: 17260 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:40:12,141-Speed 10477.26 samples/sec Loss 11.9756 LearningRate 0.4997 Epoch: 3 Global Step: 17270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:40:19,954-Speed 10485.92 samples/sec Loss 11.8628 LearningRate 0.5000 Epoch: 3 Global Step: 17280 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:40:27,761-Speed 10495.35 samples/sec Loss 11.7456 LearningRate 0.5003 Epoch: 3 Global Step: 17290 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:40:35,560-Speed 10505.97 samples/sec Loss 11.7876 LearningRate 0.5006 Epoch: 3 Global Step: 17300 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:40:43,410-Speed 10436.93 samples/sec Loss 11.7839 LearningRate 0.5009 Epoch: 3 Global Step: 17310 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:40:51,230-Speed 10479.76 samples/sec Loss 11.7913 LearningRate 0.5012 Epoch: 3 Global Step: 17320 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:40:59,039-Speed 10491.83 samples/sec Loss 11.9491 LearningRate 0.5014 Epoch: 3 Global Step: 17330 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:41:06,828-Speed 10519.56 samples/sec Loss 11.8159 LearningRate 0.5017 Epoch: 3 Global Step: 17340 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:41:14,634-Speed 10495.31 samples/sec Loss 11.8634 LearningRate 0.5020 Epoch: 3 Global Step: 17350 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:41:22,455-Speed 10476.64 samples/sec Loss 11.8930 LearningRate 0.5023 Epoch: 3 Global Step: 17360 Fp16 
Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:41:30,297-Speed 10448.08 samples/sec Loss 11.8727 LearningRate 0.5026 Epoch: 3 Global Step: 17370 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:41:38,143-Speed 10444.70 samples/sec Loss 11.8218 LearningRate 0.5029 Epoch: 3 Global Step: 17380 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:41:45,981-Speed 10452.55 samples/sec Loss 12.0003 LearningRate 0.5032 Epoch: 3 Global Step: 17390 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:41:53,801-Speed 10479.29 samples/sec Loss 11.9266 LearningRate 0.5035 Epoch: 3 Global Step: 17400 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:42:01,596-Speed 10511.56 samples/sec Loss 11.9036 LearningRate 0.5038 Epoch: 3 Global Step: 17410 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:42:09,390-Speed 10512.30 samples/sec Loss 11.9735 LearningRate 0.5041 Epoch: 3 Global Step: 17420 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:42:17,275-Speed 10391.61 samples/sec Loss 11.9486 LearningRate 0.5043 Epoch: 3 Global Step: 17430 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:42:25,084-Speed 10492.84 samples/sec Loss 11.9364 LearningRate 0.5046 Epoch: 3 Global Step: 17440 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:42:32,906-Speed 10474.53 samples/sec Loss 11.9600 LearningRate 0.5049 Epoch: 3 Global Step: 17450 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:42:40,731-Speed 10470.12 samples/sec Loss 11.8956 LearningRate 0.5052 Epoch: 3 Global Step: 17460 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:42:48,517-Speed 10524.70 samples/sec Loss 11.8688 LearningRate 0.5055 Epoch: 3 Global Step: 17470 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:42:56,342-Speed 10470.65 samples/sec Loss 11.9130 LearningRate 0.5058 Epoch: 3 Global Step: 17480 Fp16 Grad Scale: 131072 
Required: 19 hours Training: 2022-01-15 18:43:04,151-Speed 10492.95 samples/sec Loss 11.8852 LearningRate 0.5061 Epoch: 3 Global Step: 17490 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:43:11,941-Speed 10517.46 samples/sec Loss 11.9444 LearningRate 0.5064 Epoch: 3 Global Step: 17500 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:43:19,728-Speed 10521.36 samples/sec Loss 11.9658 LearningRate 0.5067 Epoch: 3 Global Step: 17510 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:43:27,540-Speed 10488.08 samples/sec Loss 12.0571 LearningRate 0.5069 Epoch: 3 Global Step: 17520 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:43:35,325-Speed 10523.88 samples/sec Loss 12.0620 LearningRate 0.5072 Epoch: 3 Global Step: 17530 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:43:43,173-Speed 10440.23 samples/sec Loss 11.9738 LearningRate 0.5075 Epoch: 3 Global Step: 17540 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:43:50,972-Speed 10504.64 samples/sec Loss 12.0672 LearningRate 0.5078 Epoch: 3 Global Step: 17550 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:43:58,783-Speed 10488.86 samples/sec Loss 11.9841 LearningRate 0.5081 Epoch: 3 Global Step: 17560 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:44:06,579-Speed 10510.08 samples/sec Loss 11.9214 LearningRate 0.5084 Epoch: 3 Global Step: 17570 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:44:14,385-Speed 10495.84 samples/sec Loss 11.8963 LearningRate 0.5087 Epoch: 3 Global Step: 17580 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:44:22,163-Speed 10533.68 samples/sec Loss 12.0027 LearningRate 0.5090 Epoch: 3 Global Step: 17590 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:44:29,984-Speed 10476.56 samples/sec Loss 11.9229 LearningRate 0.5093 Epoch: 3 Global Step: 17600 Fp16 Grad Scale: 262144 Required: 19 hours 
Training: 2022-01-15 18:44:37,777-Speed 10512.92 samples/sec Loss 12.0466 LearningRate 0.5095 Epoch: 3 Global Step: 17610 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:44:45,593-Speed 10482.41 samples/sec Loss 11.9440 LearningRate 0.5098 Epoch: 3 Global Step: 17620 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:44:53,424-Speed 10463.01 samples/sec Loss 11.9114 LearningRate 0.5101 Epoch: 3 Global Step: 17630 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:45:01,244-Speed 10477.11 samples/sec Loss 11.9477 LearningRate 0.5104 Epoch: 3 Global Step: 17640 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:45:09,125-Speed 10396.29 samples/sec Loss 11.8779 LearningRate 0.5107 Epoch: 3 Global Step: 17650 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:45:16,925-Speed 10503.81 samples/sec Loss 11.8945 LearningRate 0.5110 Epoch: 3 Global Step: 17660 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:45:24,734-Speed 10491.55 samples/sec Loss 11.8929 LearningRate 0.5113 Epoch: 3 Global Step: 17670 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:45:32,535-Speed 10503.40 samples/sec Loss 11.9910 LearningRate 0.5116 Epoch: 3 Global Step: 17680 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:45:40,399-Speed 10417.80 samples/sec Loss 12.0216 LearningRate 0.5119 Epoch: 3 Global Step: 17690 Fp16 Grad Scale: 524288 Required: 19 hours Training: 2022-01-15 18:45:48,208-Speed 10491.66 samples/sec Loss 11.9870 LearningRate 0.5122 Epoch: 3 Global Step: 17700 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:45:56,035-Speed 10468.43 samples/sec Loss 12.0230 LearningRate 0.5124 Epoch: 3 Global Step: 17710 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:46:03,828-Speed 10513.08 samples/sec Loss 12.0301 LearningRate 0.5127 Epoch: 3 Global Step: 17720 Fp16 Grad Scale: 262144 Required: 19 hours Training: 
2022-01-15 18:46:11,620-Speed 10515.90 samples/sec Loss 12.0141 LearningRate 0.5130 Epoch: 3 Global Step: 17730 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:46:19,423-Speed 10498.82 samples/sec Loss 12.0160 LearningRate 0.5133 Epoch: 3 Global Step: 17740 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:46:27,281-Speed 10426.42 samples/sec Loss 11.9745 LearningRate 0.5136 Epoch: 3 Global Step: 17750 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:46:35,078-Speed 10507.90 samples/sec Loss 11.9114 LearningRate 0.5139 Epoch: 3 Global Step: 17760 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:46:42,907-Speed 10464.72 samples/sec Loss 11.9825 LearningRate 0.5142 Epoch: 3 Global Step: 17770 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:46:50,701-Speed 10512.46 samples/sec Loss 12.0582 LearningRate 0.5145 Epoch: 3 Global Step: 17780 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:46:58,489-Speed 10519.91 samples/sec Loss 12.0438 LearningRate 0.5148 Epoch: 3 Global Step: 17790 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:47:06,286-Speed 10508.77 samples/sec Loss 11.9563 LearningRate 0.5150 Epoch: 3 Global Step: 17800 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:47:14,072-Speed 10521.72 samples/sec Loss 11.9620 LearningRate 0.5153 Epoch: 3 Global Step: 17810 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:47:21,875-Speed 10501.04 samples/sec Loss 12.1001 LearningRate 0.5156 Epoch: 3 Global Step: 17820 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:47:29,659-Speed 10524.83 samples/sec Loss 11.9760 LearningRate 0.5159 Epoch: 3 Global Step: 17830 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:47:37,476-Speed 10480.24 samples/sec Loss 11.9400 LearningRate 0.5162 Epoch: 3 Global Step: 17840 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 
18:47:45,301-Speed 10471.34 samples/sec Loss 11.9408 LearningRate 0.5165 Epoch: 3 Global Step: 17850 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:47:53,105-Speed 10500.12 samples/sec Loss 11.9552 LearningRate 0.5168 Epoch: 3 Global Step: 17860 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:48:00,904-Speed 10504.85 samples/sec Loss 12.1411 LearningRate 0.5171 Epoch: 3 Global Step: 17870 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:48:08,722-Speed 10481.07 samples/sec Loss 12.0029 LearningRate 0.5174 Epoch: 3 Global Step: 17880 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:48:16,525-Speed 10500.62 samples/sec Loss 12.0764 LearningRate 0.5177 Epoch: 3 Global Step: 17890 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:48:24,334-Speed 10491.94 samples/sec Loss 12.0524 LearningRate 0.5179 Epoch: 3 Global Step: 17900 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:48:32,168-Speed 10459.27 samples/sec Loss 12.0011 LearningRate 0.5182 Epoch: 3 Global Step: 17910 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:48:39,968-Speed 10504.82 samples/sec Loss 12.0107 LearningRate 0.5185 Epoch: 3 Global Step: 17920 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:48:47,763-Speed 10510.35 samples/sec Loss 11.9286 LearningRate 0.5188 Epoch: 3 Global Step: 17930 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:48:55,611-Speed 10439.87 samples/sec Loss 12.0408 LearningRate 0.5191 Epoch: 3 Global Step: 17940 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:49:03,411-Speed 10506.24 samples/sec Loss 11.9567 LearningRate 0.5194 Epoch: 3 Global Step: 17950 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:49:11,234-Speed 10472.91 samples/sec Loss 11.9497 LearningRate 0.5197 Epoch: 3 Global Step: 17960 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:49:19,062-Speed 
10467.64 samples/sec Loss 11.9688 LearningRate 0.5200 Epoch: 3 Global Step: 17970 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:49:26,902-Speed 10451.10 samples/sec Loss 12.0085 LearningRate 0.5203 Epoch: 3 Global Step: 17980 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:49:34,703-Speed 10502.66 samples/sec Loss 12.0857 LearningRate 0.5205 Epoch: 3 Global Step: 17990 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:49:42,497-Speed 10511.71 samples/sec Loss 12.2004 LearningRate 0.5208 Epoch: 3 Global Step: 18000 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:49:50,284-Speed 10523.21 samples/sec Loss 12.2238 LearningRate 0.5211 Epoch: 3 Global Step: 18010 Fp16 Grad Scale: 524288 Required: 19 hours Training: 2022-01-15 18:49:58,098-Speed 10486.33 samples/sec Loss 12.0649 LearningRate 0.5214 Epoch: 3 Global Step: 18020 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:50:05,900-Speed 10501.56 samples/sec Loss 12.0896 LearningRate 0.5217 Epoch: 3 Global Step: 18030 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:50:13,702-Speed 10501.48 samples/sec Loss 11.9843 LearningRate 0.5220 Epoch: 3 Global Step: 18040 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:50:21,497-Speed 10512.20 samples/sec Loss 11.9350 LearningRate 0.5223 Epoch: 3 Global Step: 18050 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:50:29,362-Speed 10418.24 samples/sec Loss 12.1410 LearningRate 0.5226 Epoch: 3 Global Step: 18060 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:50:37,204-Speed 10448.31 samples/sec Loss 12.0705 LearningRate 0.5229 Epoch: 3 Global Step: 18070 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:50:45,037-Speed 10459.97 samples/sec Loss 12.0849 LearningRate 0.5231 Epoch: 3 Global Step: 18080 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:50:52,852-Speed 10485.12 
samples/sec Loss 12.0653 LearningRate 0.5234 Epoch: 3 Global Step: 18090 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:51:00,684-Speed 10461.96 samples/sec Loss 11.9878 LearningRate 0.5237 Epoch: 3 Global Step: 18100 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:51:08,476-Speed 10514.56 samples/sec Loss 12.0470 LearningRate 0.5240 Epoch: 3 Global Step: 18110 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:51:16,307-Speed 10462.05 samples/sec Loss 12.0276 LearningRate 0.5243 Epoch: 3 Global Step: 18120 Fp16 Grad Scale: 524288 Required: 19 hours Training: 2022-01-15 18:51:24,098-Speed 10516.56 samples/sec Loss 12.1341 LearningRate 0.5246 Epoch: 3 Global Step: 18130 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:51:31,896-Speed 10506.17 samples/sec Loss 12.0896 LearningRate 0.5249 Epoch: 3 Global Step: 18140 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:51:39,730-Speed 10458.48 samples/sec Loss 11.9671 LearningRate 0.5252 Epoch: 3 Global Step: 18150 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:51:47,550-Speed 10477.18 samples/sec Loss 12.1154 LearningRate 0.5255 Epoch: 3 Global Step: 18160 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:51:55,338-Speed 10519.89 samples/sec Loss 12.0651 LearningRate 0.5258 Epoch: 3 Global Step: 18170 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:52:03,142-Speed 10498.56 samples/sec Loss 11.9477 LearningRate 0.5260 Epoch: 3 Global Step: 18180 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:52:10,942-Speed 10504.47 samples/sec Loss 12.0763 LearningRate 0.5263 Epoch: 3 Global Step: 18190 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:52:18,770-Speed 10466.26 samples/sec Loss 12.1193 LearningRate 0.5266 Epoch: 3 Global Step: 18200 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:52:26,571-Speed 10502.54 samples/sec Loss 
12.0638 LearningRate 0.5269 Epoch: 3 Global Step: 18210 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:52:34,389-Speed 10480.53 samples/sec Loss 12.1104 LearningRate 0.5272 Epoch: 3 Global Step: 18220 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:52:42,187-Speed 10505.75 samples/sec Loss 11.9843 LearningRate 0.5275 Epoch: 3 Global Step: 18230 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:52:50,004-Speed 10482.28 samples/sec Loss 12.1237 LearningRate 0.5278 Epoch: 3 Global Step: 18240 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:52:57,816-Speed 10487.61 samples/sec Loss 12.0495 LearningRate 0.5281 Epoch: 3 Global Step: 18250 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:53:05,609-Speed 10513.79 samples/sec Loss 12.0852 LearningRate 0.5284 Epoch: 3 Global Step: 18260 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:53:13,392-Speed 10527.46 samples/sec Loss 11.9999 LearningRate 0.5286 Epoch: 3 Global Step: 18270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:53:21,194-Speed 10500.81 samples/sec Loss 12.1545 LearningRate 0.5289 Epoch: 3 Global Step: 18280 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:53:29,005-Speed 10490.69 samples/sec Loss 12.0940 LearningRate 0.5292 Epoch: 3 Global Step: 18290 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:53:36,823-Speed 10480.47 samples/sec Loss 12.0274 LearningRate 0.5295 Epoch: 3 Global Step: 18300 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:53:44,653-Speed 10463.74 samples/sec Loss 12.2052 LearningRate 0.5298 Epoch: 3 Global Step: 18310 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:53:52,477-Speed 10471.51 samples/sec Loss 12.2092 LearningRate 0.5301 Epoch: 3 Global Step: 18320 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:54:00,264-Speed 10521.91 samples/sec Loss 12.0635 
LearningRate 0.5304 Epoch: 3 Global Step: 18330 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:54:08,083-Speed 10478.23 samples/sec Loss 12.0754 LearningRate 0.5307 Epoch: 3 Global Step: 18340 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:54:15,920-Speed 10455.79 samples/sec Loss 12.0006 LearningRate 0.5310 Epoch: 3 Global Step: 18350 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:54:23,725-Speed 10497.30 samples/sec Loss 12.1156 LearningRate 0.5312 Epoch: 3 Global Step: 18360 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:54:31,582-Speed 10428.07 samples/sec Loss 12.0796 LearningRate 0.5315 Epoch: 3 Global Step: 18370 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:54:39,435-Speed 10434.41 samples/sec Loss 12.1134 LearningRate 0.5318 Epoch: 3 Global Step: 18380 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:54:47,259-Speed 10472.40 samples/sec Loss 12.0862 LearningRate 0.5321 Epoch: 3 Global Step: 18390 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:54:55,065-Speed 10497.22 samples/sec Loss 12.0841 LearningRate 0.5324 Epoch: 3 Global Step: 18400 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:55:02,860-Speed 10510.87 samples/sec Loss 12.2147 LearningRate 0.5327 Epoch: 3 Global Step: 18410 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:55:10,688-Speed 10466.88 samples/sec Loss 12.1749 LearningRate 0.5330 Epoch: 3 Global Step: 18420 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:55:18,488-Speed 10503.50 samples/sec Loss 12.1147 LearningRate 0.5333 Epoch: 3 Global Step: 18430 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:55:26,317-Speed 10465.88 samples/sec Loss 12.1428 LearningRate 0.5336 Epoch: 3 Global Step: 18440 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:55:34,144-Speed 10468.69 samples/sec Loss 12.0519 LearningRate 0.5339 
Epoch: 3 Global Step: 18450 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:55:41,948-Speed 10498.58 samples/sec Loss 12.1095 LearningRate 0.5341 Epoch: 3 Global Step: 18460 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:55:49,736-Speed 10520.23 samples/sec Loss 12.1325 LearningRate 0.5344 Epoch: 3 Global Step: 18470 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:55:57,533-Speed 10508.63 samples/sec Loss 12.1669 LearningRate 0.5347 Epoch: 3 Global Step: 18480 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:56:05,340-Speed 10495.08 samples/sec Loss 12.1426 LearningRate 0.5350 Epoch: 3 Global Step: 18490 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:56:13,141-Speed 10502.12 samples/sec Loss 12.1573 LearningRate 0.5353 Epoch: 3 Global Step: 18500 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:56:20,920-Speed 10532.90 samples/sec Loss 12.0891 LearningRate 0.5356 Epoch: 3 Global Step: 18510 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:56:28,759-Speed 10451.43 samples/sec Loss 12.0874 LearningRate 0.5359 Epoch: 3 Global Step: 18520 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:56:36,582-Speed 10473.84 samples/sec Loss 12.1764 LearningRate 0.5362 Epoch: 3 Global Step: 18530 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:56:44,383-Speed 10504.49 samples/sec Loss 12.2285 LearningRate 0.5365 Epoch: 3 Global Step: 18540 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:56:52,198-Speed 10484.35 samples/sec Loss 12.0590 LearningRate 0.5367 Epoch: 3 Global Step: 18550 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:57:00,004-Speed 10498.41 samples/sec Loss 12.1520 LearningRate 0.5370 Epoch: 3 Global Step: 18560 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:57:07,810-Speed 10497.06 samples/sec Loss 12.2600 LearningRate 0.5373 Epoch: 3 Global 
Step: 18570 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:57:15,605-Speed 10510.52 samples/sec Loss 12.2144 LearningRate 0.5376 Epoch: 3 Global Step: 18580 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:57:23,405-Speed 10504.93 samples/sec Loss 12.0863 LearningRate 0.5379 Epoch: 3 Global Step: 18590 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:57:31,195-Speed 10517.57 samples/sec Loss 12.1355 LearningRate 0.5382 Epoch: 3 Global Step: 18600 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:57:38,968-Speed 10540.30 samples/sec Loss 12.0902 LearningRate 0.5385 Epoch: 3 Global Step: 18610 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:57:46,755-Speed 10522.45 samples/sec Loss 12.1146 LearningRate 0.5388 Epoch: 3 Global Step: 18620 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:57:54,590-Speed 10457.30 samples/sec Loss 12.1097 LearningRate 0.5391 Epoch: 3 Global Step: 18630 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:58:02,409-Speed 10480.25 samples/sec Loss 12.2284 LearningRate 0.5394 Epoch: 3 Global Step: 18640 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:58:10,258-Speed 10438.06 samples/sec Loss 12.1606 LearningRate 0.5396 Epoch: 3 Global Step: 18650 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:58:18,074-Speed 10483.56 samples/sec Loss 12.1744 LearningRate 0.5399 Epoch: 3 Global Step: 18660 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:58:25,855-Speed 10529.54 samples/sec Loss 12.0958 LearningRate 0.5402 Epoch: 3 Global Step: 18670 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:58:33,644-Speed 10518.80 samples/sec Loss 12.5120 LearningRate 0.5405 Epoch: 3 Global Step: 18680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:58:41,457-Speed 10487.13 samples/sec Loss 12.3626 LearningRate 0.5408 Epoch: 3 Global Step: 18690 Fp16 
Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:58:49,301-Speed 10445.71 samples/sec Loss 12.2440 LearningRate 0.5411 Epoch: 3 Global Step: 18700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:58:57,157-Speed 10429.15 samples/sec Loss 12.1942 LearningRate 0.5414 Epoch: 3 Global Step: 18710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:59:04,984-Speed 10468.97 samples/sec Loss 12.0612 LearningRate 0.5417 Epoch: 3 Global Step: 18720 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:59:12,818-Speed 10464.10 samples/sec Loss 11.9764 LearningRate 0.5420 Epoch: 3 Global Step: 18730 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:59:20,642-Speed 10471.71 samples/sec Loss 12.3581 LearningRate 0.5422 Epoch: 3 Global Step: 18740 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:59:28,481-Speed 10452.03 samples/sec Loss 12.1658 LearningRate 0.5425 Epoch: 3 Global Step: 18750 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:59:36,307-Speed 10468.42 samples/sec Loss 12.0532 LearningRate 0.5428 Epoch: 3 Global Step: 18760 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:59:44,142-Speed 10457.62 samples/sec Loss 12.0548 LearningRate 0.5431 Epoch: 3 Global Step: 18770 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 18:59:51,944-Speed 10501.18 samples/sec Loss 12.1186 LearningRate 0.5434 Epoch: 3 Global Step: 18780 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 18:59:59,754-Speed 10494.46 samples/sec Loss 12.1670 LearningRate 0.5437 Epoch: 3 Global Step: 18790 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:00:07,563-Speed 10493.73 samples/sec Loss 12.3220 LearningRate 0.5440 Epoch: 3 Global Step: 18800 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:00:15,346-Speed 10527.65 samples/sec Loss 12.3877 LearningRate 0.5443 Epoch: 3 Global Step: 18810 Fp16 Grad Scale: 262144 
Required: 19 hours Training: 2022-01-15 19:00:23,160-Speed 10485.30 samples/sec Loss 12.2129 LearningRate 0.5446 Epoch: 3 Global Step: 18820 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:00:30,991-Speed 10462.86 samples/sec Loss 12.0825 LearningRate 0.5448 Epoch: 3 Global Step: 18830 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:00:38,787-Speed 10508.51 samples/sec Loss 12.1543 LearningRate 0.5451 Epoch: 3 Global Step: 18840 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:00:46,589-Speed 10502.47 samples/sec Loss 12.1403 LearningRate 0.5454 Epoch: 3 Global Step: 18850 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:00:54,400-Speed 10489.59 samples/sec Loss 12.0790 LearningRate 0.5457 Epoch: 3 Global Step: 18860 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 19:01:02,198-Speed 10506.52 samples/sec Loss 12.2393 LearningRate 0.5460 Epoch: 3 Global Step: 18870 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 19:01:09,972-Speed 10540.19 samples/sec Loss 12.2499 LearningRate 0.5463 Epoch: 3 Global Step: 18880 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 19:01:17,787-Speed 10489.73 samples/sec Loss 12.1557 LearningRate 0.5466 Epoch: 3 Global Step: 18890 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 19:01:25,591-Speed 10501.00 samples/sec Loss 12.1416 LearningRate 0.5469 Epoch: 3 Global Step: 18900 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 19:01:33,389-Speed 10506.43 samples/sec Loss 12.1499 LearningRate 0.5472 Epoch: 3 Global Step: 18910 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 19:01:41,168-Speed 10533.34 samples/sec Loss 12.2740 LearningRate 0.5475 Epoch: 3 Global Step: 18920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 19:01:48,976-Speed 10494.67 samples/sec Loss 12.2094 LearningRate 0.5477 Epoch: 3 Global Step: 18930 Fp16 Grad Scale: 131072 Required: 19 hours 
Training: 2022-01-15 19:01:56,806-Speed 10464.26 samples/sec Loss 12.1851 LearningRate 0.5480 Epoch: 3 Global Step: 18940 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 19:02:04,593-Speed 10521.84 samples/sec Loss 12.1895 LearningRate 0.5483 Epoch: 3 Global Step: 18950 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-01-15 19:02:12,446-Speed 10433.39 samples/sec Loss 12.3039 LearningRate 0.5486 Epoch: 3 Global Step: 18960 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:02:20,245-Speed 10505.96 samples/sec Loss 12.2116 LearningRate 0.5489 Epoch: 3 Global Step: 18970 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:02:28,074-Speed 10466.31 samples/sec Loss 12.3654 LearningRate 0.5492 Epoch: 3 Global Step: 18980 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:02:35,870-Speed 10510.41 samples/sec Loss 12.2252 LearningRate 0.5495 Epoch: 3 Global Step: 18990 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:02:43,715-Speed 10443.10 samples/sec Loss 12.3250 LearningRate 0.5498 Epoch: 3 Global Step: 19000 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:02:51,568-Speed 10434.10 samples/sec Loss 12.1949 LearningRate 0.5501 Epoch: 3 Global Step: 19010 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:02:59,365-Speed 10507.68 samples/sec Loss 12.1719 LearningRate 0.5503 Epoch: 3 Global Step: 19020 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:03:07,244-Speed 10399.42 samples/sec Loss 12.1883 LearningRate 0.5506 Epoch: 3 Global Step: 19030 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:03:15,039-Speed 10510.98 samples/sec Loss 12.3595 LearningRate 0.5509 Epoch: 3 Global Step: 19040 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:03:22,844-Speed 10496.91 samples/sec Loss 12.2992 LearningRate 0.5512 Epoch: 3 Global Step: 19050 Fp16 Grad Scale: 262144 Required: 19 hours Training: 
2022-01-15 19:03:30,654-Speed 10490.94 samples/sec Loss 12.3547 LearningRate 0.5515 Epoch: 3 Global Step: 19060 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:03:38,482-Speed 10466.62 samples/sec Loss 12.1613 LearningRate 0.5518 Epoch: 3 Global Step: 19070 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:03:46,274-Speed 10513.93 samples/sec Loss 12.2766 LearningRate 0.5521 Epoch: 3 Global Step: 19080 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:03:54,123-Speed 10438.85 samples/sec Loss 12.1922 LearningRate 0.5524 Epoch: 3 Global Step: 19090 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:04:01,944-Speed 10475.85 samples/sec Loss 12.1631 LearningRate 0.5527 Epoch: 3 Global Step: 19100 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:04:09,751-Speed 10495.92 samples/sec Loss 12.2147 LearningRate 0.5530 Epoch: 3 Global Step: 19110 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:04:17,541-Speed 10517.31 samples/sec Loss 12.1419 LearningRate 0.5532 Epoch: 3 Global Step: 19120 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:04:25,339-Speed 10507.97 samples/sec Loss 12.2775 LearningRate 0.5535 Epoch: 3 Global Step: 19130 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:04:33,136-Speed 10508.48 samples/sec Loss 12.3242 LearningRate 0.5538 Epoch: 3 Global Step: 19140 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:04:40,917-Speed 10529.47 samples/sec Loss 12.2433 LearningRate 0.5541 Epoch: 3 Global Step: 19150 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:04:48,710-Speed 10514.45 samples/sec Loss 12.2546 LearningRate 0.5544 Epoch: 3 Global Step: 19160 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:04:56,522-Speed 10488.86 samples/sec Loss 12.2686 LearningRate 0.5547 Epoch: 3 Global Step: 19170 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 
19:05:04,335-Speed 10486.42 samples/sec Loss 12.3360 LearningRate 0.5550 Epoch: 3 Global Step: 19180 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:05:12,155-Speed 10477.79 samples/sec Loss 12.1749 LearningRate 0.5553 Epoch: 3 Global Step: 19190 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:05:19,950-Speed 10511.22 samples/sec Loss 12.2683 LearningRate 0.5556 Epoch: 3 Global Step: 19200 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-01-15 19:05:27,780-Speed 10464.04 samples/sec Loss 12.2182 LearningRate 0.5558 Epoch: 3 Global Step: 19210 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:05:35,585-Speed 10498.80 samples/sec Loss 12.2351 LearningRate 0.5561 Epoch: 3 Global Step: 19220 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:05:43,435-Speed 10436.93 samples/sec Loss 12.3015 LearningRate 0.5564 Epoch: 3 Global Step: 19230 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:05:51,267-Speed 10460.83 samples/sec Loss 12.4377 LearningRate 0.5567 Epoch: 3 Global Step: 19240 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:05:59,127-Speed 10424.16 samples/sec Loss 12.3110 LearningRate 0.5570 Epoch: 3 Global Step: 19250 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:06:06,982-Speed 10431.20 samples/sec Loss 12.3478 LearningRate 0.5573 Epoch: 3 Global Step: 19260 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:06:14,802-Speed 10477.41 samples/sec Loss 12.1849 LearningRate 0.5576 Epoch: 3 Global Step: 19270 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:06:22,589-Speed 10521.89 samples/sec Loss 12.3584 LearningRate 0.5579 Epoch: 3 Global Step: 19280 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:06:30,386-Speed 10508.51 samples/sec Loss 12.1789 LearningRate 0.5582 Epoch: 3 Global Step: 19290 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:06:38,191-Speed 
10496.58 samples/sec Loss 12.4373 LearningRate 0.5584 Epoch: 3 Global Step: 19300 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:06:46,034-Speed 10446.38 samples/sec Loss 12.2616 LearningRate 0.5587 Epoch: 3 Global Step: 19310 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:06:53,868-Speed 10458.39 samples/sec Loss 12.4072 LearningRate 0.5590 Epoch: 3 Global Step: 19320 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:07:01,661-Speed 10514.63 samples/sec Loss 12.1999 LearningRate 0.5593 Epoch: 3 Global Step: 19330 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:07:09,475-Speed 10484.71 samples/sec Loss 12.1403 LearningRate 0.5596 Epoch: 3 Global Step: 19340 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:07:17,275-Speed 10504.29 samples/sec Loss 12.3386 LearningRate 0.5599 Epoch: 3 Global Step: 19350 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:07:25,088-Speed 10487.72 samples/sec Loss 12.3693 LearningRate 0.5602 Epoch: 3 Global Step: 19360 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:07:32,875-Speed 10532.52 samples/sec Loss 12.3916 LearningRate 0.5605 Epoch: 3 Global Step: 19370 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:07:40,662-Speed 10521.31 samples/sec Loss 12.2073 LearningRate 0.5608 Epoch: 3 Global Step: 19380 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:07:48,502-Speed 10452.00 samples/sec Loss 12.2734 LearningRate 0.5611 Epoch: 3 Global Step: 19390 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:07:56,301-Speed 10505.51 samples/sec Loss 12.1767 LearningRate 0.5613 Epoch: 3 Global Step: 19400 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:08:04,102-Speed 10503.32 samples/sec Loss 12.3238 LearningRate 0.5616 Epoch: 3 Global Step: 19410 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:08:11,899-Speed 10507.40 
samples/sec Loss 12.2312 LearningRate 0.5619 Epoch: 3 Global Step: 19420 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:08:19,702-Speed 10500.74 samples/sec Loss 12.3531 LearningRate 0.5622 Epoch: 3 Global Step: 19430 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:08:27,503-Speed 10502.47 samples/sec Loss 12.3179 LearningRate 0.5625 Epoch: 3 Global Step: 19440 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:08:35,289-Speed 10524.10 samples/sec Loss 12.2464 LearningRate 0.5628 Epoch: 3 Global Step: 19450 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:08:43,066-Speed 10535.87 samples/sec Loss 12.3921 LearningRate 0.5631 Epoch: 3 Global Step: 19460 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:08:50,883-Speed 10482.25 samples/sec Loss 12.3867 LearningRate 0.5634 Epoch: 3 Global Step: 19470 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:08:58,675-Speed 10516.26 samples/sec Loss 12.4636 LearningRate 0.5637 Epoch: 3 Global Step: 19480 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:09:06,462-Speed 10522.75 samples/sec Loss 12.2854 LearningRate 0.5639 Epoch: 3 Global Step: 19490 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:09:14,237-Speed 10539.28 samples/sec Loss 12.3161 LearningRate 0.5642 Epoch: 3 Global Step: 19500 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:09:22,049-Speed 10488.72 samples/sec Loss 12.2619 LearningRate 0.5645 Epoch: 3 Global Step: 19510 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:09:29,877-Speed 10466.96 samples/sec Loss 12.2902 LearningRate 0.5648 Epoch: 3 Global Step: 19520 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:09:37,719-Speed 10448.51 samples/sec Loss 12.2895 LearningRate 0.5651 Epoch: 3 Global Step: 19530 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:09:45,560-Speed 10450.93 samples/sec Loss 
12.3900 LearningRate 0.5654 Epoch: 3 Global Step: 19540 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:09:53,358-Speed 10506.53 samples/sec Loss 12.3283 LearningRate 0.5657 Epoch: 3 Global Step: 19550 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:10:01,144-Speed 10524.65 samples/sec Loss 12.3656 LearningRate 0.5660 Epoch: 3 Global Step: 19560 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:10:08,927-Speed 10526.60 samples/sec Loss 12.3604 LearningRate 0.5663 Epoch: 3 Global Step: 19570 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:10:16,722-Speed 10511.57 samples/sec Loss 12.5545 LearningRate 0.5666 Epoch: 3 Global Step: 19580 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:10:24,537-Speed 10484.15 samples/sec Loss 12.3914 LearningRate 0.5668 Epoch: 3 Global Step: 19590 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:10:32,368-Speed 10463.29 samples/sec Loss 12.2527 LearningRate 0.5671 Epoch: 3 Global Step: 19600 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:10:40,184-Speed 10483.00 samples/sec Loss 12.3143 LearningRate 0.5674 Epoch: 3 Global Step: 19610 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:10:47,954-Speed 10545.88 samples/sec Loss 12.3268 LearningRate 0.5677 Epoch: 3 Global Step: 19620 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:10:55,742-Speed 10521.90 samples/sec Loss 12.2654 LearningRate 0.5680 Epoch: 3 Global Step: 19630 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:11:03,556-Speed 10485.66 samples/sec Loss 12.3464 LearningRate 0.5683 Epoch: 3 Global Step: 19640 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:11:11,365-Speed 10492.71 samples/sec Loss 12.3561 LearningRate 0.5686 Epoch: 3 Global Step: 19650 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:11:19,152-Speed 10521.72 samples/sec Loss 12.3390 
LearningRate 0.5689 Epoch: 3 Global Step: 19660 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:11:26,955-Speed 10499.66 samples/sec Loss 12.3428 LearningRate 0.5692 Epoch: 3 Global Step: 19670 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:11:34,749-Speed 10513.43 samples/sec Loss 12.2955 LearningRate 0.5694 Epoch: 3 Global Step: 19680 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-15 19:11:42,522-Speed 10541.06 samples/sec Loss 12.3604 LearningRate 0.5697 Epoch: 3 Global Step: 19690 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-15 19:11:50,291-Speed 10546.64 samples/sec Loss 12.3675 LearningRate 0.5700 Epoch: 3 Global Step: 19700 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-15 19:11:58,113-Speed 10475.58 samples/sec Loss 12.5989 LearningRate 0.5703 Epoch: 3 Global Step: 19710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-15 19:12:05,925-Speed 10487.76 samples/sec Loss 12.4316 LearningRate 0.5706 Epoch: 3 Global Step: 19720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-15 19:12:13,717-Speed 10514.09 samples/sec Loss 12.5177 LearningRate 0.5709 Epoch: 3 Global Step: 19730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-15 19:12:21,548-Speed 10463.41 samples/sec Loss 12.4544 LearningRate 0.5712 Epoch: 3 Global Step: 19740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-15 19:12:29,391-Speed 10445.67 samples/sec Loss 12.3679 LearningRate 0.5715 Epoch: 3 Global Step: 19750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-15 19:12:37,166-Speed 10538.17 samples/sec Loss 12.2312 LearningRate 0.5718 Epoch: 3 Global Step: 19760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-15 19:12:44,980-Speed 10484.61 samples/sec Loss 12.3506 LearningRate 0.5720 Epoch: 3 Global Step: 19770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-15 19:12:52,803-Speed 10473.06 samples/sec Loss 12.4179 LearningRate 0.5723 Epoch: 3 
Global Step: 19780 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:13:00,591-Speed 10520.87 samples/sec Loss 12.3161 LearningRate 0.5726 Epoch: 3 Global Step: 19790 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:13:08,409-Speed 10479.69 samples/sec Loss 12.3853 LearningRate 0.5729 Epoch: 3 Global Step: 19800 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:13:16,225-Speed 10481.73 samples/sec Loss 12.4117 LearningRate 0.5732 Epoch: 3 Global Step: 19810 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:13:24,061-Speed 10456.59 samples/sec Loss 12.2813 LearningRate 0.5735 Epoch: 3 Global Step: 19820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:13:31,929-Speed 10413.28 samples/sec Loss 12.3711 LearningRate 0.5738 Epoch: 3 Global Step: 19830 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:13:39,729-Speed 10504.31 samples/sec Loss 12.3222 LearningRate 0.5741 Epoch: 3 Global Step: 19840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:13:47,520-Speed 10515.08 samples/sec Loss 12.3376 LearningRate 0.5744 Epoch: 3 Global Step: 19850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:13:55,311-Speed 10515.87 samples/sec Loss 12.3915 LearningRate 0.5747 Epoch: 3 Global Step: 19860 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:14:03,110-Speed 10505.35 samples/sec Loss 12.4276 LearningRate 0.5749 Epoch: 3 Global Step: 19870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:14:10,925-Speed 10483.69 samples/sec Loss 12.2893 LearningRate 0.5752 Epoch: 3 Global Step: 19880 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:14:18,761-Speed 10456.38 samples/sec Loss 12.3668 LearningRate 0.5755 Epoch: 3 Global Step: 19890 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:14:26,586-Speed 10471.19 samples/sec Loss 12.4372 LearningRate 0.5758 Epoch: 3 Global Step: 19900 
Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:14:34,391-Speed 10496.40 samples/sec Loss 12.5691 LearningRate 0.5761 Epoch: 3 Global Step: 19910 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:14:42,209-Speed 10478.41 samples/sec Loss 12.4340 LearningRate 0.5764 Epoch: 3 Global Step: 19920 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:14:50,055-Speed 10442.94 samples/sec Loss 12.4227 LearningRate 0.5767 Epoch: 3 Global Step: 19930 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:14:57,883-Speed 10467.23 samples/sec Loss 12.4791 LearningRate 0.5770 Epoch: 3 Global Step: 19940 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:15:05,737-Speed 10430.33 samples/sec Loss 12.4970 LearningRate 0.5773 Epoch: 3 Global Step: 19950 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:15:13,530-Speed 10513.88 samples/sec Loss 12.4475 LearningRate 0.5775 Epoch: 3 Global Step: 19960 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:15:21,319-Speed 10519.62 samples/sec Loss 12.3304 LearningRate 0.5778 Epoch: 3 Global Step: 19970 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:15:29,137-Speed 10479.95 samples/sec Loss 12.3077 LearningRate 0.5781 Epoch: 3 Global Step: 19980 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:15:36,939-Speed 10501.43 samples/sec Loss 12.8934 LearningRate 0.5784 Epoch: 3 Global Step: 19990 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:15:44,714-Speed 10539.06 samples/sec Loss 12.8263 LearningRate 0.5787 Epoch: 3 Global Step: 20000 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:16:11,643-[lfw][20000]XNorm: 22.510489 Training: 2022-01-15 19:16:11,643-[lfw][20000]Accuracy-Flip: 0.99433+-0.00396 Training: 2022-01-15 19:16:11,644-[lfw][20000]Accuracy-Highest: 0.99483 Training: 2022-01-15 19:16:43,574-[cfp_fp][20000]XNorm: 19.549580 Training: 2022-01-15 
19:16:43,574-[cfp_fp][20000]Accuracy-Flip: 0.95829+-0.01002 Training: 2022-01-15 19:16:43,575-[cfp_fp][20000]Accuracy-Highest: 0.96829 Training: 2022-01-15 19:17:11,399-[agedb_30][20000]XNorm: 21.887470 Training: 2022-01-15 19:17:11,400-[agedb_30][20000]Accuracy-Flip: 0.95200+-0.00833 Training: 2022-01-15 19:17:11,401-[agedb_30][20000]Accuracy-Highest: 0.95250 Training: 2022-01-15 19:17:19,141-Speed 867.56 samples/sec Loss 12.5918 LearningRate 0.5790 Epoch: 3 Global Step: 20010 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:17:26,895-Speed 10567.30 samples/sec Loss 12.3186 LearningRate 0.5793 Epoch: 3 Global Step: 20020 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:17:34,635-Speed 10584.74 samples/sec Loss 12.3475 LearningRate 0.5796 Epoch: 3 Global Step: 20030 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:17:42,370-Speed 10593.35 samples/sec Loss 12.3007 LearningRate 0.5799 Epoch: 3 Global Step: 20040 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:17:50,113-Speed 10580.10 samples/sec Loss 12.2902 LearningRate 0.5802 Epoch: 3 Global Step: 20050 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:17:57,947-Speed 10459.32 samples/sec Loss 12.2658 LearningRate 0.5804 Epoch: 3 Global Step: 20060 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:18:05,717-Speed 10545.24 samples/sec Loss 12.3454 LearningRate 0.5807 Epoch: 3 Global Step: 20070 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:18:13,492-Speed 10536.71 samples/sec Loss 12.3956 LearningRate 0.5810 Epoch: 3 Global Step: 20080 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:18:21,264-Speed 10541.23 samples/sec Loss 12.4873 LearningRate 0.5813 Epoch: 3 Global Step: 20090 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:18:29,061-Speed 10508.75 samples/sec Loss 12.4286 LearningRate 0.5816 Epoch: 3 Global Step: 20100 Fp16 Grad Scale: 262144 
Required: 18 hours Training: 2022-01-15 19:18:36,837-Speed 10538.50 samples/sec Loss 12.4782 LearningRate 0.5819 Epoch: 3 Global Step: 20110 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:18:44,614-Speed 10534.04 samples/sec Loss 12.3507 LearningRate 0.5822 Epoch: 3 Global Step: 20120 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:18:52,396-Speed 10527.79 samples/sec Loss 12.4951 LearningRate 0.5825 Epoch: 3 Global Step: 20130 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:19:00,185-Speed 10519.03 samples/sec Loss 12.4214 LearningRate 0.5828 Epoch: 3 Global Step: 20140 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:19:07,960-Speed 10537.85 samples/sec Loss 12.3716 LearningRate 0.5830 Epoch: 3 Global Step: 20150 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:19:15,735-Speed 10538.02 samples/sec Loss 12.4427 LearningRate 0.5833 Epoch: 3 Global Step: 20160 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:19:23,531-Speed 10509.59 samples/sec Loss 12.4309 LearningRate 0.5836 Epoch: 3 Global Step: 20170 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:19:31,388-Speed 10427.38 samples/sec Loss 12.4116 LearningRate 0.5839 Epoch: 3 Global Step: 20180 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:19:39,300-Speed 10356.41 samples/sec Loss 12.4013 LearningRate 0.5842 Epoch: 3 Global Step: 20190 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:19:47,055-Speed 10565.02 samples/sec Loss 12.6275 LearningRate 0.5845 Epoch: 3 Global Step: 20200 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:19:54,839-Speed 10525.83 samples/sec Loss 12.5041 LearningRate 0.5848 Epoch: 3 Global Step: 20210 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:20:02,632-Speed 10513.67 samples/sec Loss 12.4671 LearningRate 0.5851 Epoch: 3 Global Step: 20220 Fp16 Grad Scale: 262144 Required: 18 hours 
Training: 2022-01-15 19:20:10,435-Speed 10500.50 samples/sec Loss 12.5309 LearningRate 0.5854 Epoch: 3 Global Step: 20230 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:20:18,221-Speed 10521.70 samples/sec Loss 12.4883 LearningRate 0.5856 Epoch: 3 Global Step: 20240 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:20:26,003-Speed 10528.69 samples/sec Loss 12.4980 LearningRate 0.5859 Epoch: 3 Global Step: 20250 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:20:33,782-Speed 10532.86 samples/sec Loss 12.4559 LearningRate 0.5862 Epoch: 3 Global Step: 20260 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:20:41,583-Speed 10502.10 samples/sec Loss 12.5175 LearningRate 0.5865 Epoch: 3 Global Step: 20270 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:20:49,361-Speed 10535.60 samples/sec Loss 12.4290 LearningRate 0.5868 Epoch: 3 Global Step: 20280 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:20:57,152-Speed 10519.44 samples/sec Loss 12.3616 LearningRate 0.5871 Epoch: 3 Global Step: 20290 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:21:05,053-Speed 10370.00 samples/sec Loss 12.4780 LearningRate 0.5874 Epoch: 3 Global Step: 20300 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:21:12,870-Speed 10480.79 samples/sec Loss 12.5735 LearningRate 0.5877 Epoch: 3 Global Step: 20310 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:21:20,680-Speed 10490.11 samples/sec Loss 12.4525 LearningRate 0.5880 Epoch: 3 Global Step: 20320 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:21:28,466-Speed 10524.18 samples/sec Loss 12.4721 LearningRate 0.5883 Epoch: 3 Global Step: 20330 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:21:36,281-Speed 10491.47 samples/sec Loss 12.5907 LearningRate 0.5885 Epoch: 3 Global Step: 20340 Fp16 Grad Scale: 262144 Required: 18 hours Training: 
2022-01-15 19:21:44,070-Speed 10517.17 samples/sec Loss 12.4866 LearningRate 0.5888 Epoch: 3 Global Step: 20350 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:21:51,853-Speed 10526.50 samples/sec Loss 12.5522 LearningRate 0.5891 Epoch: 3 Global Step: 20360 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:21:59,642-Speed 10523.21 samples/sec Loss 12.3940 LearningRate 0.5894 Epoch: 3 Global Step: 20370 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:22:07,422-Speed 10531.71 samples/sec Loss 12.3445 LearningRate 0.5897 Epoch: 3 Global Step: 20380 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:22:15,206-Speed 10523.66 samples/sec Loss 12.7067 LearningRate 0.5900 Epoch: 3 Global Step: 20390 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:22:22,987-Speed 10530.38 samples/sec Loss 12.6810 LearningRate 0.5903 Epoch: 3 Global Step: 20400 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:22:30,784-Speed 10508.45 samples/sec Loss 12.6548 LearningRate 0.5906 Epoch: 3 Global Step: 20410 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:22:38,572-Speed 10519.43 samples/sec Loss 12.5802 LearningRate 0.5909 Epoch: 3 Global Step: 20420 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:22:46,401-Speed 10465.16 samples/sec Loss 12.3844 LearningRate 0.5911 Epoch: 3 Global Step: 20430 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:22:54,196-Speed 10510.91 samples/sec Loss 12.4990 LearningRate 0.5914 Epoch: 3 Global Step: 20440 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:23:02,003-Speed 10494.63 samples/sec Loss 12.3410 LearningRate 0.5917 Epoch: 3 Global Step: 20450 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:23:09,793-Speed 10517.25 samples/sec Loss 12.6675 LearningRate 0.5920 Epoch: 3 Global Step: 20460 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 
19:23:17,615-Speed 10474.76 samples/sec Loss 12.6442 LearningRate 0.5923 Epoch: 3 Global Step: 20470 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:23:25,408-Speed 10512.76 samples/sec Loss 12.5585 LearningRate 0.5926 Epoch: 3 Global Step: 20480 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:23:33,230-Speed 10474.91 samples/sec Loss 12.4701 LearningRate 0.5929 Epoch: 3 Global Step: 20490 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:23:41,051-Speed 10474.82 samples/sec Loss 12.4811 LearningRate 0.5932 Epoch: 3 Global Step: 20500 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:23:48,832-Speed 10529.91 samples/sec Loss 12.5142 LearningRate 0.5935 Epoch: 3 Global Step: 20510 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:23:56,630-Speed 10506.29 samples/sec Loss 12.5239 LearningRate 0.5938 Epoch: 3 Global Step: 20520 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:24:04,496-Speed 10416.17 samples/sec Loss 12.5547 LearningRate 0.5940 Epoch: 3 Global Step: 20530 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:24:12,314-Speed 10481.16 samples/sec Loss 12.5357 LearningRate 0.5943 Epoch: 3 Global Step: 20540 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:24:20,166-Speed 10434.20 samples/sec Loss 12.5979 LearningRate 0.5946 Epoch: 3 Global Step: 20550 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:24:27,956-Speed 10516.42 samples/sec Loss 12.4919 LearningRate 0.5949 Epoch: 3 Global Step: 20560 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:24:35,757-Speed 10503.04 samples/sec Loss 12.5991 LearningRate 0.5952 Epoch: 3 Global Step: 20570 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:24:43,553-Speed 10509.07 samples/sec Loss 12.5139 LearningRate 0.5955 Epoch: 3 Global Step: 20580 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:24:51,337-Speed 
10526.15 samples/sec Loss 12.5921 LearningRate 0.5958 Epoch: 3 Global Step: 20590 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:24:59,141-Speed 10499.30 samples/sec Loss 12.4991 LearningRate 0.5961 Epoch: 3 Global Step: 20600 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:25:06,923-Speed 10527.95 samples/sec Loss 12.5502 LearningRate 0.5964 Epoch: 3 Global Step: 20610 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:25:14,737-Speed 10484.00 samples/sec Loss 12.5848 LearningRate 0.5966 Epoch: 3 Global Step: 20620 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:25:22,524-Speed 10521.75 samples/sec Loss 12.5501 LearningRate 0.5969 Epoch: 3 Global Step: 20630 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:25:30,325-Speed 10502.99 samples/sec Loss 12.4853 LearningRate 0.5972 Epoch: 3 Global Step: 20640 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:25:38,095-Speed 10545.63 samples/sec Loss 12.6028 LearningRate 0.5975 Epoch: 3 Global Step: 20650 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:25:45,900-Speed 10495.51 samples/sec Loss 12.6522 LearningRate 0.5978 Epoch: 3 Global Step: 20660 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:25:53,696-Speed 10509.90 samples/sec Loss 12.6707 LearningRate 0.5981 Epoch: 3 Global Step: 20670 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:26:01,477-Speed 10530.30 samples/sec Loss 12.5909 LearningRate 0.5984 Epoch: 3 Global Step: 20680 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:26:09,311-Speed 10459.34 samples/sec Loss 12.5585 LearningRate 0.5987 Epoch: 3 Global Step: 20690 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:26:17,121-Speed 10490.26 samples/sec Loss 12.7030 LearningRate 0.5990 Epoch: 3 Global Step: 20700 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:26:24,953-Speed 10462.47 
samples/sec Loss 12.6313 LearningRate 0.5992 Epoch: 3 Global Step: 20710 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:26:32,773-Speed 10476.76 samples/sec Loss 12.5787 LearningRate 0.5995 Epoch: 3 Global Step: 20720 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:26:40,599-Speed 10472.57 samples/sec Loss 12.5420 LearningRate 0.5998 Epoch: 3 Global Step: 20730 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:26:48,424-Speed 10470.19 samples/sec Loss 12.5760 LearningRate 0.5999 Epoch: 3 Global Step: 20740 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:27:11,285-Speed 3583.52 samples/sec Loss 12.7261 LearningRate 0.5998 Epoch: 4 Global Step: 20750 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:27:19,117-Speed 10462.12 samples/sec Loss 12.7621 LearningRate 0.5997 Epoch: 4 Global Step: 20760 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:27:26,934-Speed 10480.95 samples/sec Loss 12.5734 LearningRate 0.5995 Epoch: 4 Global Step: 20770 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:27:34,671-Speed 10590.13 samples/sec Loss 12.4603 LearningRate 0.5994 Epoch: 4 Global Step: 20780 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:27:42,420-Speed 10572.20 samples/sec Loss 12.4839 LearningRate 0.5992 Epoch: 4 Global Step: 20790 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:27:50,168-Speed 10575.58 samples/sec Loss 12.5619 LearningRate 0.5991 Epoch: 4 Global Step: 20800 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:27:57,967-Speed 10505.46 samples/sec Loss 12.5532 LearningRate 0.5989 Epoch: 4 Global Step: 20810 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:28:05,800-Speed 10458.67 samples/sec Loss 12.5422 LearningRate 0.5988 Epoch: 4 Global Step: 20820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:28:13,598-Speed 10507.04 samples/sec Loss 
12.5250 LearningRate 0.5986 Epoch: 4 Global Step: 20830 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:28:21,386-Speed 10520.92 samples/sec Loss 12.5142 LearningRate 0.5985 Epoch: 4 Global Step: 20840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:28:29,190-Speed 10497.95 samples/sec Loss 12.5303 LearningRate 0.5984 Epoch: 4 Global Step: 20850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:28:37,001-Speed 10489.59 samples/sec Loss 12.6478 LearningRate 0.5982 Epoch: 4 Global Step: 20860 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:28:44,834-Speed 10460.87 samples/sec Loss 12.5605 LearningRate 0.5981 Epoch: 4 Global Step: 20870 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:28:52,673-Speed 10451.78 samples/sec Loss 12.5962 LearningRate 0.5979 Epoch: 4 Global Step: 20880 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:29:00,448-Speed 10537.70 samples/sec Loss 12.4646 LearningRate 0.5978 Epoch: 4 Global Step: 20890 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:29:08,254-Speed 10498.14 samples/sec Loss 12.6246 LearningRate 0.5976 Epoch: 4 Global Step: 20900 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:29:16,051-Speed 10507.95 samples/sec Loss 12.6714 LearningRate 0.5975 Epoch: 4 Global Step: 20910 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:29:23,848-Speed 10507.24 samples/sec Loss 12.5431 LearningRate 0.5973 Epoch: 4 Global Step: 20920 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:29:31,640-Speed 10513.80 samples/sec Loss 12.5389 LearningRate 0.5972 Epoch: 4 Global Step: 20930 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:29:39,458-Speed 10480.41 samples/sec Loss 12.4869 LearningRate 0.5971 Epoch: 4 Global Step: 20940 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:29:47,245-Speed 10521.98 samples/sec Loss 12.5489 
LearningRate 0.5969 Epoch: 4 Global Step: 20950 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:29:55,059-Speed 10484.63 samples/sec Loss 12.6364 LearningRate 0.5968 Epoch: 4 Global Step: 20960 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:30:02,841-Speed 10527.68 samples/sec Loss 12.8342 LearningRate 0.5966 Epoch: 4 Global Step: 20970 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:30:10,636-Speed 10511.31 samples/sec Loss 12.6554 LearningRate 0.5965 Epoch: 4 Global Step: 20980 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:30:18,429-Speed 10514.31 samples/sec Loss 12.6061 LearningRate 0.5963 Epoch: 4 Global Step: 20990 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:30:26,257-Speed 10465.12 samples/sec Loss 12.4940 LearningRate 0.5962 Epoch: 4 Global Step: 21000 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:30:34,087-Speed 10464.51 samples/sec Loss 12.4853 LearningRate 0.5960 Epoch: 4 Global Step: 21010 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:30:41,896-Speed 10492.09 samples/sec Loss 12.5547 LearningRate 0.5959 Epoch: 4 Global Step: 21020 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:30:49,693-Speed 10508.11 samples/sec Loss 12.5150 LearningRate 0.5958 Epoch: 4 Global Step: 21030 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:30:57,501-Speed 10493.15 samples/sec Loss 12.5365 LearningRate 0.5956 Epoch: 4 Global Step: 21040 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:31:05,302-Speed 10502.47 samples/sec Loss 12.5860 LearningRate 0.5955 Epoch: 4 Global Step: 21050 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:31:13,114-Speed 10489.12 samples/sec Loss 12.4963 LearningRate 0.5953 Epoch: 4 Global Step: 21060 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:31:20,930-Speed 10482.57 samples/sec Loss 12.6114 LearningRate 0.5952 
Epoch: 4 Global Step: 21070 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:31:28,734-Speed 10497.99 samples/sec Loss 12.4771 LearningRate 0.5950 Epoch: 4 Global Step: 21080 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:31:36,538-Speed 10498.76 samples/sec Loss 12.5848 LearningRate 0.5949 Epoch: 4 Global Step: 21090 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:31:44,363-Speed 10471.22 samples/sec Loss 12.5823 LearningRate 0.5947 Epoch: 4 Global Step: 21100 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:31:52,271-Speed 10360.13 samples/sec Loss 12.5139 LearningRate 0.5946 Epoch: 4 Global Step: 21110 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:32:00,106-Speed 10455.91 samples/sec Loss 12.5594 LearningRate 0.5945 Epoch: 4 Global Step: 21120 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:32:07,938-Speed 10461.44 samples/sec Loss 12.5370 LearningRate 0.5943 Epoch: 4 Global Step: 21130 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:32:15,837-Speed 10373.15 samples/sec Loss 12.5980 LearningRate 0.5942 Epoch: 4 Global Step: 21140 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:32:23,685-Speed 10439.69 samples/sec Loss 12.4143 LearningRate 0.5940 Epoch: 4 Global Step: 21150 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:32:31,500-Speed 10483.53 samples/sec Loss 12.5104 LearningRate 0.5939 Epoch: 4 Global Step: 21160 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:32:39,320-Speed 10476.57 samples/sec Loss 12.4736 LearningRate 0.5937 Epoch: 4 Global Step: 21170 Fp16 Grad Scale: 524288 Required: 18 hours Training: 2022-01-15 19:32:47,152-Speed 10461.87 samples/sec Loss 12.5609 LearningRate 0.5936 Epoch: 4 Global Step: 21180 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:32:54,989-Speed 10452.89 samples/sec Loss 12.5598 LearningRate 0.5934 Epoch: 4 Global 
Step: 21190 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:33:02,873-Speed 10393.05 samples/sec Loss 12.4611 LearningRate 0.5933 Epoch: 4 Global Step: 21200 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:33:10,702-Speed 10464.24 samples/sec Loss 12.5378 LearningRate 0.5932 Epoch: 4 Global Step: 21210 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:33:18,547-Speed 10443.97 samples/sec Loss 12.4927 LearningRate 0.5930 Epoch: 4 Global Step: 21220 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:33:26,410-Speed 10419.89 samples/sec Loss 12.5414 LearningRate 0.5929 Epoch: 4 Global Step: 21230 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:33:34,256-Speed 10441.99 samples/sec Loss 12.6690 LearningRate 0.5927 Epoch: 4 Global Step: 21240 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:33:42,071-Speed 10484.91 samples/sec Loss 12.7050 LearningRate 0.5926 Epoch: 4 Global Step: 21250 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:33:49,910-Speed 10450.74 samples/sec Loss 12.5542 LearningRate 0.5924 Epoch: 4 Global Step: 21260 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:33:57,726-Speed 10481.77 samples/sec Loss 12.4857 LearningRate 0.5923 Epoch: 4 Global Step: 21270 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:34:05,559-Speed 10460.31 samples/sec Loss 12.4994 LearningRate 0.5922 Epoch: 4 Global Step: 21280 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:34:13,378-Speed 10479.78 samples/sec Loss 12.5757 LearningRate 0.5920 Epoch: 4 Global Step: 21290 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:34:21,215-Speed 10454.59 samples/sec Loss 12.4361 LearningRate 0.5919 Epoch: 4 Global Step: 21300 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:34:29,104-Speed 10384.25 samples/sec Loss 12.3928 LearningRate 0.5917 Epoch: 4 Global Step: 21310 Fp16 
Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:34:36,924-Speed 10477.44 samples/sec Loss 12.5955 LearningRate 0.5916 Epoch: 4 Global Step: 21320 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:34:44,790-Speed 10415.76 samples/sec Loss 12.5047 LearningRate 0.5914 Epoch: 4 Global Step: 21330 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:34:52,619-Speed 10466.82 samples/sec Loss 12.5168 LearningRate 0.5913 Epoch: 4 Global Step: 21340 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:35:00,468-Speed 10438.51 samples/sec Loss 12.6664 LearningRate 0.5911 Epoch: 4 Global Step: 21350 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:35:08,331-Speed 10419.94 samples/sec Loss 12.5407 LearningRate 0.5910 Epoch: 4 Global Step: 21360 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:35:16,173-Speed 10448.03 samples/sec Loss 12.5048 LearningRate 0.5909 Epoch: 4 Global Step: 21370 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:35:24,001-Speed 10466.48 samples/sec Loss 12.5130 LearningRate 0.5907 Epoch: 4 Global Step: 21380 Fp16 Grad Scale: 524288 Required: 18 hours Training: 2022-01-15 19:35:31,840-Speed 10451.66 samples/sec Loss 12.4568 LearningRate 0.5906 Epoch: 4 Global Step: 21390 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:35:39,684-Speed 10445.37 samples/sec Loss 12.5404 LearningRate 0.5904 Epoch: 4 Global Step: 21400 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:35:47,518-Speed 10456.85 samples/sec Loss 12.3680 LearningRate 0.5903 Epoch: 4 Global Step: 21410 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:35:55,342-Speed 10472.34 samples/sec Loss 12.4448 LearningRate 0.5901 Epoch: 4 Global Step: 21420 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:36:03,217-Speed 10404.36 samples/sec Loss 12.4346 LearningRate 0.5900 Epoch: 4 Global Step: 21430 Fp16 Grad Scale: 262144 
Required: 18 hours Training: 2022-01-15 19:36:11,051-Speed 10458.44 samples/sec Loss 12.4442 LearningRate 0.5899 Epoch: 4 Global Step: 21440 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:36:18,884-Speed 10458.88 samples/sec Loss 12.4460 LearningRate 0.5897 Epoch: 4 Global Step: 21450 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:36:26,758-Speed 10407.93 samples/sec Loss 12.4394 LearningRate 0.5896 Epoch: 4 Global Step: 21460 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:36:34,566-Speed 10494.80 samples/sec Loss 12.4038 LearningRate 0.5894 Epoch: 4 Global Step: 21470 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:36:42,407-Speed 10447.91 samples/sec Loss 12.4445 LearningRate 0.5893 Epoch: 4 Global Step: 21480 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:36:50,231-Speed 10472.04 samples/sec Loss 12.6080 LearningRate 0.5891 Epoch: 4 Global Step: 21490 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:36:58,041-Speed 10491.46 samples/sec Loss 12.5344 LearningRate 0.5890 Epoch: 4 Global Step: 21500 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:37:05,876-Speed 10456.61 samples/sec Loss 12.5061 LearningRate 0.5889 Epoch: 4 Global Step: 21510 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:37:13,684-Speed 10493.40 samples/sec Loss 12.5429 LearningRate 0.5887 Epoch: 4 Global Step: 21520 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:37:21,492-Speed 10492.58 samples/sec Loss 12.4759 LearningRate 0.5886 Epoch: 4 Global Step: 21530 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:37:29,331-Speed 10452.96 samples/sec Loss 12.3882 LearningRate 0.5884 Epoch: 4 Global Step: 21540 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:37:37,150-Speed 10477.39 samples/sec Loss 12.5075 LearningRate 0.5883 Epoch: 4 Global Step: 21550 Fp16 Grad Scale: 262144 Required: 18 hours 
Training: 2022-01-15 19:37:44,959-Speed 10491.21 samples/sec Loss 12.4250 LearningRate 0.5881 Epoch: 4 Global Step: 21560 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:37:52,751-Speed 10518.03 samples/sec Loss 12.5740 LearningRate 0.5880 Epoch: 4 Global Step: 21570 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:38:00,587-Speed 10455.66 samples/sec Loss 12.5156 LearningRate 0.5879 Epoch: 4 Global Step: 21580 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:38:08,383-Speed 10509.30 samples/sec Loss 12.4342 LearningRate 0.5877 Epoch: 4 Global Step: 21590 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:38:16,173-Speed 10519.89 samples/sec Loss 12.5136 LearningRate 0.5876 Epoch: 4 Global Step: 21600 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:38:23,990-Speed 10482.15 samples/sec Loss 12.3843 LearningRate 0.5874 Epoch: 4 Global Step: 21610 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:38:31,785-Speed 10517.08 samples/sec Loss 12.4669 LearningRate 0.5873 Epoch: 4 Global Step: 21620 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:38:39,572-Speed 10520.26 samples/sec Loss 12.3580 LearningRate 0.5871 Epoch: 4 Global Step: 21630 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:38:47,377-Speed 10497.37 samples/sec Loss 12.3555 LearningRate 0.5870 Epoch: 4 Global Step: 21640 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:38:55,197-Speed 10478.52 samples/sec Loss 12.4203 LearningRate 0.5868 Epoch: 4 Global Step: 21650 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:39:03,005-Speed 10492.58 samples/sec Loss 12.3854 LearningRate 0.5867 Epoch: 4 Global Step: 21660 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:39:10,828-Speed 10473.44 samples/sec Loss 12.4946 LearningRate 0.5866 Epoch: 4 Global Step: 21670 Fp16 Grad Scale: 262144 Required: 18 hours Training: 
2022-01-15 19:39:18,633-Speed 10497.04 samples/sec Loss 12.5136 LearningRate 0.5864 Epoch: 4 Global Step: 21680 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:39:26,430-Speed 10508.62 samples/sec Loss 12.5074 LearningRate 0.5863 Epoch: 4 Global Step: 21690 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:39:34,266-Speed 10455.04 samples/sec Loss 12.5631 LearningRate 0.5861 Epoch: 4 Global Step: 21700 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:39:42,054-Speed 10520.84 samples/sec Loss 12.4737 LearningRate 0.5860 Epoch: 4 Global Step: 21710 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:39:49,833-Speed 10532.44 samples/sec Loss 12.4038 LearningRate 0.5858 Epoch: 4 Global Step: 21720 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:39:57,631-Speed 10506.35 samples/sec Loss 12.2713 LearningRate 0.5857 Epoch: 4 Global Step: 21730 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:40:05,448-Speed 10481.35 samples/sec Loss 12.3641 LearningRate 0.5856 Epoch: 4 Global Step: 21740 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:40:13,282-Speed 10458.71 samples/sec Loss 12.3714 LearningRate 0.5854 Epoch: 4 Global Step: 21750 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:40:21,074-Speed 10514.90 samples/sec Loss 12.5048 LearningRate 0.5853 Epoch: 4 Global Step: 21760 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:40:28,905-Speed 10462.66 samples/sec Loss 12.4577 LearningRate 0.5851 Epoch: 4 Global Step: 21770 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:40:36,732-Speed 10466.50 samples/sec Loss 12.4207 LearningRate 0.5850 Epoch: 4 Global Step: 21780 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:40:44,557-Speed 10471.76 samples/sec Loss 12.3630 LearningRate 0.5848 Epoch: 4 Global Step: 21790 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 
19:40:52,365-Speed 10492.81 samples/sec Loss 12.3599 LearningRate 0.5847 Epoch: 4 Global Step: 21800 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:41:00,148-Speed 10526.04 samples/sec Loss 12.3051 LearningRate 0.5846 Epoch: 4 Global Step: 21810 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:41:07,961-Speed 10486.62 samples/sec Loss 12.3218 LearningRate 0.5844 Epoch: 4 Global Step: 21820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:41:15,773-Speed 10487.90 samples/sec Loss 12.4544 LearningRate 0.5843 Epoch: 4 Global Step: 21830 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:41:23,562-Speed 10520.75 samples/sec Loss 12.3380 LearningRate 0.5841 Epoch: 4 Global Step: 21840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:41:31,349-Speed 10520.48 samples/sec Loss 12.3240 LearningRate 0.5840 Epoch: 4 Global Step: 21850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:41:39,170-Speed 10476.00 samples/sec Loss 12.4256 LearningRate 0.5838 Epoch: 4 Global Step: 21860 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:41:46,997-Speed 10468.83 samples/sec Loss 12.4305 LearningRate 0.5837 Epoch: 4 Global Step: 21870 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:41:54,817-Speed 10476.79 samples/sec Loss 12.3768 LearningRate 0.5836 Epoch: 4 Global Step: 21880 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:42:02,643-Speed 10469.48 samples/sec Loss 12.4685 LearningRate 0.5834 Epoch: 4 Global Step: 21890 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:42:10,473-Speed 10464.17 samples/sec Loss 12.3976 LearningRate 0.5833 Epoch: 4 Global Step: 21900 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:42:18,319-Speed 10441.47 samples/sec Loss 12.3955 LearningRate 0.5831 Epoch: 4 Global Step: 21910 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:42:26,133-Speed 
10488.18 samples/sec Loss 12.3767 LearningRate 0.5830 Epoch: 4 Global Step: 21920 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:42:33,933-Speed 10503.94 samples/sec Loss 12.4123 LearningRate 0.5829 Epoch: 4 Global Step: 21930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:42:41,791-Speed 10427.75 samples/sec Loss 12.4311 LearningRate 0.5827 Epoch: 4 Global Step: 21940 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-15 19:42:49,605-Speed 10485.58 samples/sec Loss 12.4874 LearningRate 0.5826 Epoch: 4 Global Step: 21950 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-15 19:42:57,497-Speed 10381.71 samples/sec Loss 12.4889 LearningRate 0.5824 Epoch: 4 Global Step: 21960 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-15 19:43:05,331-Speed 10458.45 samples/sec Loss 12.3010 LearningRate 0.5823 Epoch: 4 Global Step: 21970 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-15 19:43:13,154-Speed 10472.90 samples/sec Loss 12.3431 LearningRate 0.5821 Epoch: 4 Global Step: 21980 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-15 19:43:20,973-Speed 10477.74 samples/sec Loss 12.3414 LearningRate 0.5820 Epoch: 4 Global Step: 21990 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-15 19:43:28,796-Speed 10473.81 samples/sec Loss 12.3197 LearningRate 0.5819 Epoch: 4 Global Step: 22000 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-15 19:43:36,614-Speed 10479.02 samples/sec Loss 12.3368 LearningRate 0.5817 Epoch: 4 Global Step: 22010 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-15 19:43:44,441-Speed 10466.91 samples/sec Loss 12.3984 LearningRate 0.5816 Epoch: 4 Global Step: 22020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-15 19:43:52,264-Speed 10474.35 samples/sec Loss 12.3128 LearningRate 0.5814 Epoch: 4 Global Step: 22030 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-01-15 19:44:00,121-Speed 10428.36 samples/sec Loss 
12.3180 LearningRate 0.5813 Epoch: 4 Global Step: 22040 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:44:08,010-Speed 10385.30 samples/sec Loss 12.3302 LearningRate 0.5811 Epoch: 4 Global Step: 22050 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:44:15,862-Speed 10440.71 samples/sec Loss 12.3638 LearningRate 0.5810 Epoch: 4 Global Step: 22060 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:44:23,708-Speed 10442.39 samples/sec Loss 12.5253 LearningRate 0.5809 Epoch: 4 Global Step: 22070 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:44:31,576-Speed 10413.28 samples/sec Loss 12.4457 LearningRate 0.5807 Epoch: 4 Global Step: 22080 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:44:39,427-Speed 10434.87 samples/sec Loss 12.4124 LearningRate 0.5806 Epoch: 4 Global Step: 22090 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:44:47,247-Speed 10477.63 samples/sec Loss 12.4199 LearningRate 0.5804 Epoch: 4 Global Step: 22100 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:44:55,094-Speed 10440.98 samples/sec Loss 12.2905 LearningRate 0.5803 Epoch: 4 Global Step: 22110 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:45:02,877-Speed 10526.86 samples/sec Loss 12.2896 LearningRate 0.5801 Epoch: 4 Global Step: 22120 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:45:10,691-Speed 10484.80 samples/sec Loss 12.3308 LearningRate 0.5800 Epoch: 4 Global Step: 22130 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:45:18,504-Speed 10486.39 samples/sec Loss 12.3901 LearningRate 0.5799 Epoch: 4 Global Step: 22140 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:45:26,315-Speed 10489.99 samples/sec Loss 12.3845 LearningRate 0.5797 Epoch: 4 Global Step: 22150 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:45:34,146-Speed 10465.87 samples/sec Loss 12.3855 
LearningRate 0.5796 Epoch: 4 Global Step: 22160 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:45:41,969-Speed 10471.49 samples/sec Loss 12.3340 LearningRate 0.5794 Epoch: 4 Global Step: 22170 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:45:49,782-Speed 10486.59 samples/sec Loss 12.2505 LearningRate 0.5793 Epoch: 4 Global Step: 22180 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:45:57,613-Speed 10463.13 samples/sec Loss 12.3022 LearningRate 0.5791 Epoch: 4 Global Step: 22190 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:46:05,443-Speed 10466.39 samples/sec Loss 12.2761 LearningRate 0.5790 Epoch: 4 Global Step: 22200 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:46:13,277-Speed 10457.35 samples/sec Loss 12.3297 LearningRate 0.5789 Epoch: 4 Global Step: 22210 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:46:21,123-Speed 10442.76 samples/sec Loss 12.3456 LearningRate 0.5787 Epoch: 4 Global Step: 22220 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:46:28,940-Speed 10482.13 samples/sec Loss 12.5471 LearningRate 0.5786 Epoch: 4 Global Step: 22230 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:46:36,743-Speed 10499.59 samples/sec Loss 12.3665 LearningRate 0.5784 Epoch: 4 Global Step: 22240 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:46:44,537-Speed 10512.14 samples/sec Loss 12.2699 LearningRate 0.5783 Epoch: 4 Global Step: 22250 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:46:52,347-Speed 10490.30 samples/sec Loss 12.3042 LearningRate 0.5782 Epoch: 4 Global Step: 22260 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:47:00,139-Speed 10515.14 samples/sec Loss 12.2532 LearningRate 0.5780 Epoch: 4 Global Step: 22270 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:47:07,960-Speed 10475.42 samples/sec Loss 12.3204 LearningRate 0.5779 
Epoch: 4 Global Step: 22280 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:47:15,758-Speed 10508.20 samples/sec Loss 12.3186 LearningRate 0.5777 Epoch: 4 Global Step: 22290 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:47:23,558-Speed 10503.87 samples/sec Loss 12.3128 LearningRate 0.5776 Epoch: 4 Global Step: 22300 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:47:31,377-Speed 10479.14 samples/sec Loss 12.3209 LearningRate 0.5774 Epoch: 4 Global Step: 22310 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:47:39,204-Speed 10466.89 samples/sec Loss 12.3550 LearningRate 0.5773 Epoch: 4 Global Step: 22320 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:47:47,013-Speed 10492.00 samples/sec Loss 12.3169 LearningRate 0.5772 Epoch: 4 Global Step: 22330 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:47:54,824-Speed 10489.42 samples/sec Loss 12.1907 LearningRate 0.5770 Epoch: 4 Global Step: 22340 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:48:02,597-Speed 10540.29 samples/sec Loss 12.2545 LearningRate 0.5769 Epoch: 4 Global Step: 22350 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:48:10,382-Speed 10523.72 samples/sec Loss 12.4793 LearningRate 0.5767 Epoch: 4 Global Step: 22360 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:48:18,169-Speed 10521.06 samples/sec Loss 12.3609 LearningRate 0.5766 Epoch: 4 Global Step: 22370 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:48:25,953-Speed 10526.53 samples/sec Loss 12.3559 LearningRate 0.5765 Epoch: 4 Global Step: 22380 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:48:33,747-Speed 10511.68 samples/sec Loss 12.2540 LearningRate 0.5763 Epoch: 4 Global Step: 22390 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:48:41,556-Speed 10491.88 samples/sec Loss 12.3982 LearningRate 0.5762 Epoch: 4 Global 
Step: 22400 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:48:49,406-Speed 10437.92 samples/sec Loss 12.2597 LearningRate 0.5760 Epoch: 4 Global Step: 22410 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:48:57,220-Speed 10484.20 samples/sec Loss 12.2961 LearningRate 0.5759 Epoch: 4 Global Step: 22420 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:49:05,038-Speed 10479.51 samples/sec Loss 12.2321 LearningRate 0.5757 Epoch: 4 Global Step: 22430 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:49:12,857-Speed 10483.45 samples/sec Loss 12.3037 LearningRate 0.5756 Epoch: 4 Global Step: 22440 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:49:20,694-Speed 10453.93 samples/sec Loss 12.2577 LearningRate 0.5755 Epoch: 4 Global Step: 22450 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:49:28,534-Speed 10449.21 samples/sec Loss 12.2746 LearningRate 0.5753 Epoch: 4 Global Step: 22460 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:49:36,391-Speed 10429.18 samples/sec Loss 12.2021 LearningRate 0.5752 Epoch: 4 Global Step: 22470 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:49:44,224-Speed 10459.99 samples/sec Loss 12.1819 LearningRate 0.5750 Epoch: 4 Global Step: 22480 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:49:52,044-Speed 10476.94 samples/sec Loss 12.3963 LearningRate 0.5749 Epoch: 4 Global Step: 22490 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:49:59,831-Speed 10520.97 samples/sec Loss 12.3110 LearningRate 0.5748 Epoch: 4 Global Step: 22500 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:50:07,646-Speed 10492.01 samples/sec Loss 12.2835 LearningRate 0.5746 Epoch: 4 Global Step: 22510 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:50:15,444-Speed 10506.39 samples/sec Loss 12.4966 LearningRate 0.5745 Epoch: 4 Global Step: 22520 Fp16 
Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:50:23,272-Speed 10465.47 samples/sec Loss 12.3647 LearningRate 0.5743 Epoch: 4 Global Step: 22530 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:50:31,093-Speed 10475.26 samples/sec Loss 12.2286 LearningRate 0.5742 Epoch: 4 Global Step: 22540 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:50:38,878-Speed 10525.22 samples/sec Loss 12.2124 LearningRate 0.5740 Epoch: 4 Global Step: 22550 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:50:46,668-Speed 10517.48 samples/sec Loss 12.2356 LearningRate 0.5739 Epoch: 4 Global Step: 22560 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:50:54,525-Speed 10427.74 samples/sec Loss 12.1878 LearningRate 0.5738 Epoch: 4 Global Step: 22570 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:51:02,351-Speed 10467.95 samples/sec Loss 12.2162 LearningRate 0.5736 Epoch: 4 Global Step: 22580 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:51:10,171-Speed 10477.12 samples/sec Loss 12.2395 LearningRate 0.5735 Epoch: 4 Global Step: 22590 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:51:17,961-Speed 10517.74 samples/sec Loss 12.3788 LearningRate 0.5733 Epoch: 4 Global Step: 22600 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:51:25,778-Speed 10481.35 samples/sec Loss 12.2901 LearningRate 0.5732 Epoch: 4 Global Step: 22610 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:51:33,592-Speed 10484.60 samples/sec Loss 12.2931 LearningRate 0.5731 Epoch: 4 Global Step: 22620 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:51:41,397-Speed 10498.85 samples/sec Loss 12.1961 LearningRate 0.5729 Epoch: 4 Global Step: 22630 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:51:49,242-Speed 10443.53 samples/sec Loss 12.2696 LearningRate 0.5728 Epoch: 4 Global Step: 22640 Fp16 Grad Scale: 262144 
Required: 18 hours Training: 2022-01-15 19:51:57,053-Speed 10488.52 samples/sec Loss 12.2481 LearningRate 0.5726 Epoch: 4 Global Step: 22650 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:52:04,891-Speed 10453.28 samples/sec Loss 12.2677 LearningRate 0.5725 Epoch: 4 Global Step: 22660 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:52:12,713-Speed 10474.88 samples/sec Loss 12.1531 LearningRate 0.5723 Epoch: 4 Global Step: 22670 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:52:20,560-Speed 10441.49 samples/sec Loss 12.2378 LearningRate 0.5722 Epoch: 4 Global Step: 22680 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:52:28,391-Speed 10461.96 samples/sec Loss 12.1333 LearningRate 0.5721 Epoch: 4 Global Step: 22690 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:52:36,200-Speed 10492.30 samples/sec Loss 12.2776 LearningRate 0.5719 Epoch: 4 Global Step: 22700 Fp16 Grad Scale: 524288 Required: 18 hours Training: 2022-01-15 19:52:44,004-Speed 10502.91 samples/sec Loss 12.3006 LearningRate 0.5718 Epoch: 4 Global Step: 22710 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:52:51,834-Speed 10462.94 samples/sec Loss 12.1963 LearningRate 0.5716 Epoch: 4 Global Step: 22720 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:52:59,639-Speed 10497.44 samples/sec Loss 12.2713 LearningRate 0.5715 Epoch: 4 Global Step: 22730 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:53:07,460-Speed 10476.35 samples/sec Loss 12.2607 LearningRate 0.5714 Epoch: 4 Global Step: 22740 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:53:15,271-Speed 10489.37 samples/sec Loss 12.2769 LearningRate 0.5712 Epoch: 4 Global Step: 22750 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:53:23,067-Speed 10509.20 samples/sec Loss 12.1735 LearningRate 0.5711 Epoch: 4 Global Step: 22760 Fp16 Grad Scale: 262144 Required: 18 hours 
Training: 2022-01-15 19:53:30,840-Speed 10540.56 samples/sec Loss 12.2783 LearningRate 0.5709 Epoch: 4 Global Step: 22770 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:53:38,624-Speed 10524.51 samples/sec Loss 12.1650 LearningRate 0.5708 Epoch: 4 Global Step: 22780 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:53:46,452-Speed 10466.84 samples/sec Loss 12.2101 LearningRate 0.5707 Epoch: 4 Global Step: 22790 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:53:54,233-Speed 10529.43 samples/sec Loss 12.1369 LearningRate 0.5705 Epoch: 4 Global Step: 22800 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:54:02,038-Speed 10497.04 samples/sec Loss 12.2259 LearningRate 0.5704 Epoch: 4 Global Step: 22810 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:54:09,851-Speed 10486.51 samples/sec Loss 12.2217 LearningRate 0.5702 Epoch: 4 Global Step: 22820 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:54:17,715-Speed 10419.16 samples/sec Loss 12.2383 LearningRate 0.5701 Epoch: 4 Global Step: 22830 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:54:25,520-Speed 10499.27 samples/sec Loss 12.2055 LearningRate 0.5699 Epoch: 4 Global Step: 22840 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:54:33,352-Speed 10461.17 samples/sec Loss 12.1832 LearningRate 0.5698 Epoch: 4 Global Step: 22850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:54:41,182-Speed 10463.22 samples/sec Loss 12.1449 LearningRate 0.5697 Epoch: 4 Global Step: 22860 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:54:48,998-Speed 10482.10 samples/sec Loss 12.2342 LearningRate 0.5695 Epoch: 4 Global Step: 22870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:54:56,807-Speed 10491.87 samples/sec Loss 12.1620 LearningRate 0.5694 Epoch: 4 Global Step: 22880 Fp16 Grad Scale: 131072 Required: 18 hours Training: 
2022-01-15 19:55:04,652-Speed 10443.82 samples/sec Loss 12.1423 LearningRate 0.5692 Epoch: 4 Global Step: 22890 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:55:12,482-Speed 10463.59 samples/sec Loss 12.2624 LearningRate 0.5691 Epoch: 4 Global Step: 22900 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:55:20,352-Speed 10410.06 samples/sec Loss 12.1565 LearningRate 0.5690 Epoch: 4 Global Step: 22910 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:55:28,159-Speed 10495.22 samples/sec Loss 12.3110 LearningRate 0.5688 Epoch: 4 Global Step: 22920 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:55:35,969-Speed 10490.63 samples/sec Loss 12.1819 LearningRate 0.5687 Epoch: 4 Global Step: 22930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:55:43,773-Speed 10499.08 samples/sec Loss 12.1617 LearningRate 0.5685 Epoch: 4 Global Step: 22940 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 19:55:51,565-Speed 10514.45 samples/sec Loss 12.1900 LearningRate 0.5684 Epoch: 4 Global Step: 22950 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:55:59,366-Speed 10503.11 samples/sec Loss 12.2540 LearningRate 0.5683 Epoch: 4 Global Step: 22960 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:56:07,161-Speed 10510.26 samples/sec Loss 12.2415 LearningRate 0.5681 Epoch: 4 Global Step: 22970 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:56:14,990-Speed 10465.73 samples/sec Loss 12.1272 LearningRate 0.5680 Epoch: 4 Global Step: 22980 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:56:22,841-Speed 10435.20 samples/sec Loss 12.2388 LearningRate 0.5678 Epoch: 4 Global Step: 22990 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:56:30,637-Speed 10510.24 samples/sec Loss 12.2430 LearningRate 0.5677 Epoch: 4 Global Step: 23000 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 
19:56:38,454-Speed 10480.64 samples/sec Loss 12.1653 LearningRate 0.5676 Epoch: 4 Global Step: 23010 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:56:46,254-Speed 10504.04 samples/sec Loss 12.2062 LearningRate 0.5674 Epoch: 4 Global Step: 23020 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:56:54,089-Speed 10457.62 samples/sec Loss 12.2331 LearningRate 0.5673 Epoch: 4 Global Step: 23030 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:57:01,887-Speed 10506.11 samples/sec Loss 12.2499 LearningRate 0.5671 Epoch: 4 Global Step: 23040 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:57:09,701-Speed 10486.39 samples/sec Loss 12.0925 LearningRate 0.5670 Epoch: 4 Global Step: 23050 Fp16 Grad Scale: 524288 Required: 18 hours Training: 2022-01-15 19:57:17,486-Speed 10523.94 samples/sec Loss 12.1365 LearningRate 0.5668 Epoch: 4 Global Step: 23060 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:57:25,289-Speed 10499.52 samples/sec Loss 12.1932 LearningRate 0.5667 Epoch: 4 Global Step: 23070 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:57:33,098-Speed 10491.86 samples/sec Loss 12.1920 LearningRate 0.5666 Epoch: 4 Global Step: 23080 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:57:40,935-Speed 10454.16 samples/sec Loss 12.0571 LearningRate 0.5664 Epoch: 4 Global Step: 23090 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:57:48,753-Speed 10480.86 samples/sec Loss 12.2760 LearningRate 0.5663 Epoch: 4 Global Step: 23100 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:57:56,566-Speed 10485.83 samples/sec Loss 12.1595 LearningRate 0.5661 Epoch: 4 Global Step: 23110 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:58:04,397-Speed 10461.75 samples/sec Loss 12.2088 LearningRate 0.5660 Epoch: 4 Global Step: 23120 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:58:12,203-Speed 
10496.38 samples/sec Loss 12.1050 LearningRate 0.5659 Epoch: 4 Global Step: 23130 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:58:19,998-Speed 10511.18 samples/sec Loss 12.1623 LearningRate 0.5657 Epoch: 4 Global Step: 23140 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:58:27,818-Speed 10477.57 samples/sec Loss 12.2473 LearningRate 0.5656 Epoch: 4 Global Step: 23150 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:58:35,608-Speed 10517.45 samples/sec Loss 12.2291 LearningRate 0.5654 Epoch: 4 Global Step: 23160 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:58:43,424-Speed 10482.79 samples/sec Loss 12.2777 LearningRate 0.5653 Epoch: 4 Global Step: 23170 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:58:51,229-Speed 10497.01 samples/sec Loss 12.2427 LearningRate 0.5652 Epoch: 4 Global Step: 23180 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:58:59,034-Speed 10497.19 samples/sec Loss 12.1519 LearningRate 0.5650 Epoch: 4 Global Step: 23190 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:59:06,874-Speed 10451.03 samples/sec Loss 12.0810 LearningRate 0.5649 Epoch: 4 Global Step: 23200 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:59:14,707-Speed 10460.57 samples/sec Loss 12.1074 LearningRate 0.5647 Epoch: 4 Global Step: 23210 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:59:22,544-Speed 10453.92 samples/sec Loss 12.3328 LearningRate 0.5646 Epoch: 4 Global Step: 23220 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:59:30,416-Speed 10408.74 samples/sec Loss 12.2660 LearningRate 0.5645 Epoch: 4 Global Step: 23230 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:59:38,247-Speed 10463.43 samples/sec Loss 12.2261 LearningRate 0.5643 Epoch: 4 Global Step: 23240 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:59:46,053-Speed 10496.75 
samples/sec Loss 12.1564 LearningRate 0.5642 Epoch: 4 Global Step: 23250 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 19:59:53,833-Speed 10531.58 samples/sec Loss 12.1392 LearningRate 0.5640 Epoch: 4 Global Step: 23260 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:00:01,641-Speed 10492.76 samples/sec Loss 12.0712 LearningRate 0.5639 Epoch: 4 Global Step: 23270 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:00:09,438-Speed 10508.08 samples/sec Loss 12.0689 LearningRate 0.5638 Epoch: 4 Global Step: 23280 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:00:17,224-Speed 10523.71 samples/sec Loss 12.0580 LearningRate 0.5636 Epoch: 4 Global Step: 23290 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:00:25,041-Speed 10481.87 samples/sec Loss 12.2051 LearningRate 0.5635 Epoch: 4 Global Step: 23300 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:00:32,826-Speed 10523.17 samples/sec Loss 12.1852 LearningRate 0.5633 Epoch: 4 Global Step: 23310 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:00:40,628-Speed 10501.59 samples/sec Loss 12.1659 LearningRate 0.5632 Epoch: 4 Global Step: 23320 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:00:48,447-Speed 10478.95 samples/sec Loss 12.1150 LearningRate 0.5631 Epoch: 4 Global Step: 23330 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:00:56,267-Speed 10476.63 samples/sec Loss 12.1256 LearningRate 0.5629 Epoch: 4 Global Step: 23340 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:01:04,070-Speed 10500.48 samples/sec Loss 12.1249 LearningRate 0.5628 Epoch: 4 Global Step: 23350 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:01:11,922-Speed 10435.19 samples/sec Loss 12.1349 LearningRate 0.5626 Epoch: 4 Global Step: 23360 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:01:19,703-Speed 10530.29 samples/sec Loss 
12.1453 LearningRate 0.5625 Epoch: 4 Global Step: 23370 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:01:27,510-Speed 10494.09 samples/sec Loss 12.0794 LearningRate 0.5624 Epoch: 4 Global Step: 23380 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:01:35,293-Speed 10527.17 samples/sec Loss 12.1404 LearningRate 0.5622 Epoch: 4 Global Step: 23390 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:01:43,089-Speed 10509.71 samples/sec Loss 11.9810 LearningRate 0.5621 Epoch: 4 Global Step: 23400 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:01:50,880-Speed 10516.56 samples/sec Loss 12.3219 LearningRate 0.5619 Epoch: 4 Global Step: 23410 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:01:58,683-Speed 10500.22 samples/sec Loss 12.2596 LearningRate 0.5618 Epoch: 4 Global Step: 23420 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:02:06,501-Speed 10479.56 samples/sec Loss 12.1748 LearningRate 0.5617 Epoch: 4 Global Step: 23430 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:02:14,326-Speed 10470.28 samples/sec Loss 12.0390 LearningRate 0.5615 Epoch: 4 Global Step: 23440 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:02:22,135-Speed 10491.38 samples/sec Loss 11.9951 LearningRate 0.5614 Epoch: 4 Global Step: 23450 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:02:29,943-Speed 10493.86 samples/sec Loss 12.1058 LearningRate 0.5612 Epoch: 4 Global Step: 23460 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:02:37,732-Speed 10518.69 samples/sec Loss 12.0527 LearningRate 0.5611 Epoch: 4 Global Step: 23470 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:02:45,519-Speed 10521.59 samples/sec Loss 12.2075 LearningRate 0.5610 Epoch: 4 Global Step: 23480 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:02:53,340-Speed 10474.84 samples/sec Loss 12.0428 
LearningRate 0.5608 Epoch: 4 Global Step: 23490 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:03:01,155-Speed 10484.22 samples/sec Loss 12.0905 LearningRate 0.5607 Epoch: 4 Global Step: 23500 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:03:09,001-Speed 10443.03 samples/sec Loss 12.1525 LearningRate 0.5605 Epoch: 4 Global Step: 23510 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:03:16,803-Speed 10502.06 samples/sec Loss 12.2287 LearningRate 0.5604 Epoch: 4 Global Step: 23520 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:03:24,600-Speed 10507.05 samples/sec Loss 12.1284 LearningRate 0.5603 Epoch: 4 Global Step: 23530 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:03:32,389-Speed 10519.56 samples/sec Loss 12.0687 LearningRate 0.5601 Epoch: 4 Global Step: 23540 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:03:40,177-Speed 10520.07 samples/sec Loss 12.0013 LearningRate 0.5600 Epoch: 4 Global Step: 23550 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:03:47,945-Speed 10546.47 samples/sec Loss 12.0055 LearningRate 0.5598 Epoch: 4 Global Step: 23560 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:03:55,741-Speed 10509.97 samples/sec Loss 12.0589 LearningRate 0.5597 Epoch: 4 Global Step: 23570 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:04:03,527-Speed 10523.93 samples/sec Loss 12.0349 LearningRate 0.5596 Epoch: 4 Global Step: 23580 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:04:11,343-Speed 10481.96 samples/sec Loss 12.0809 LearningRate 0.5594 Epoch: 4 Global Step: 23590 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:04:19,139-Speed 10508.72 samples/sec Loss 12.1090 LearningRate 0.5593 Epoch: 4 Global Step: 23600 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:04:26,927-Speed 10520.28 samples/sec Loss 12.0549 LearningRate 0.5591 
Epoch: 4 Global Step: 23610 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:04:34,693-Speed 10551.19 samples/sec Loss 12.1859 LearningRate 0.5590 Epoch: 4 Global Step: 23620 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 20:04:42,486-Speed 10512.78 samples/sec Loss 12.0152 LearningRate 0.5589 Epoch: 4 Global Step: 23630 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 20:04:50,273-Speed 10521.16 samples/sec Loss 12.1799 LearningRate 0.5587 Epoch: 4 Global Step: 23640 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 20:04:58,059-Speed 10523.95 samples/sec Loss 12.2077 LearningRate 0.5586 Epoch: 4 Global Step: 23650 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 20:05:05,872-Speed 10486.53 samples/sec Loss 12.0973 LearningRate 0.5584 Epoch: 4 Global Step: 23660 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 20:05:13,694-Speed 10475.20 samples/sec Loss 12.1550 LearningRate 0.5583 Epoch: 4 Global Step: 23670 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 20:05:21,497-Speed 10499.13 samples/sec Loss 12.1458 LearningRate 0.5582 Epoch: 4 Global Step: 23680 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 20:05:29,303-Speed 10495.54 samples/sec Loss 11.9718 LearningRate 0.5580 Epoch: 4 Global Step: 23690 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 20:05:37,081-Speed 10533.55 samples/sec Loss 12.0675 LearningRate 0.5579 Epoch: 4 Global Step: 23700 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 20:05:44,892-Speed 10489.09 samples/sec Loss 12.0477 LearningRate 0.5577 Epoch: 4 Global Step: 23710 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 20:05:52,682-Speed 10518.66 samples/sec Loss 12.1185 LearningRate 0.5576 Epoch: 4 Global Step: 23720 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:06:00,500-Speed 10478.61 samples/sec Loss 11.9341 LearningRate 0.5575 Epoch: 4 Global 
Step: 23730 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:06:08,285-Speed 10524.74 samples/sec Loss 12.0875 LearningRate 0.5573 Epoch: 4 Global Step: 23740 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:06:16,063-Speed 10534.22 samples/sec Loss 12.0575 LearningRate 0.5572 Epoch: 4 Global Step: 23750 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:06:23,874-Speed 10489.02 samples/sec Loss 12.1561 LearningRate 0.5570 Epoch: 4 Global Step: 23760 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:06:31,670-Speed 10509.88 samples/sec Loss 12.1268 LearningRate 0.5569 Epoch: 4 Global Step: 23770 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:06:39,484-Speed 10483.73 samples/sec Loss 12.0285 LearningRate 0.5568 Epoch: 4 Global Step: 23780 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:06:47,271-Speed 10522.01 samples/sec Loss 12.0651 LearningRate 0.5566 Epoch: 4 Global Step: 23790 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:06:55,067-Speed 10510.04 samples/sec Loss 11.9973 LearningRate 0.5565 Epoch: 4 Global Step: 23800 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:07:02,862-Speed 10510.02 samples/sec Loss 12.0738 LearningRate 0.5564 Epoch: 4 Global Step: 23810 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:07:10,631-Speed 10545.15 samples/sec Loss 11.9558 LearningRate 0.5562 Epoch: 4 Global Step: 23820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 20:07:18,439-Speed 10494.57 samples/sec Loss 12.2688 LearningRate 0.5561 Epoch: 4 Global Step: 23830 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 20:07:26,236-Speed 10508.33 samples/sec Loss 12.2008 LearningRate 0.5559 Epoch: 4 Global Step: 23840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 20:07:34,029-Speed 10515.55 samples/sec Loss 12.1129 LearningRate 0.5558 Epoch: 4 Global Step: 23850 Fp16 
Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 20:07:41,822-Speed 10513.71 samples/sec Loss 12.0341 LearningRate 0.5557 Epoch: 4 Global Step: 23860 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 20:07:49,601-Speed 10532.46 samples/sec Loss 11.9096 LearningRate 0.5555 Epoch: 4 Global Step: 23870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 20:07:57,396-Speed 10510.43 samples/sec Loss 12.0333 LearningRate 0.5554 Epoch: 4 Global Step: 23880 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 20:08:05,184-Speed 10519.59 samples/sec Loss 11.9639 LearningRate 0.5552 Epoch: 4 Global Step: 23890 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 20:08:12,999-Speed 10484.14 samples/sec Loss 12.0948 LearningRate 0.5551 Epoch: 4 Global Step: 23900 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 20:08:20,802-Speed 10500.49 samples/sec Loss 12.0883 LearningRate 0.5550 Epoch: 4 Global Step: 23910 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-01-15 20:08:28,576-Speed 10541.17 samples/sec Loss 12.1476 LearningRate 0.5548 Epoch: 4 Global Step: 23920 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:08:36,371-Speed 10509.94 samples/sec Loss 11.9819 LearningRate 0.5547 Epoch: 4 Global Step: 23930 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:08:44,180-Speed 10492.27 samples/sec Loss 11.8622 LearningRate 0.5545 Epoch: 4 Global Step: 23940 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:08:51,979-Speed 10504.68 samples/sec Loss 11.9604 LearningRate 0.5544 Epoch: 4 Global Step: 23950 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:08:59,800-Speed 10476.68 samples/sec Loss 12.0207 LearningRate 0.5543 Epoch: 4 Global Step: 23960 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:09:07,619-Speed 10477.99 samples/sec Loss 12.1096 LearningRate 0.5541 Epoch: 4 Global Step: 23970 Fp16 Grad Scale: 262144 
Required: 18 hours Training: 2022-01-15 20:09:15,447-Speed 10467.04 samples/sec Loss 12.0228 LearningRate 0.5540 Epoch: 4 Global Step: 23980 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:09:23,267-Speed 10476.43 samples/sec Loss 11.9553 LearningRate 0.5538 Epoch: 4 Global Step: 23990 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:09:31,114-Speed 10440.34 samples/sec Loss 12.0181 LearningRate 0.5537 Epoch: 4 Global Step: 24000 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:09:38,923-Speed 10499.10 samples/sec Loss 12.0705 LearningRate 0.5536 Epoch: 4 Global Step: 24010 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:09:46,746-Speed 10473.76 samples/sec Loss 12.0013 LearningRate 0.5534 Epoch: 4 Global Step: 24020 Fp16 Grad Scale: 524288 Required: 18 hours Training: 2022-01-15 20:09:54,541-Speed 10510.62 samples/sec Loss 12.1112 LearningRate 0.5533 Epoch: 4 Global Step: 24030 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:10:02,389-Speed 10438.50 samples/sec Loss 11.9836 LearningRate 0.5532 Epoch: 4 Global Step: 24040 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-01-15 20:10:10,195-Speed 10500.87 samples/sec Loss 12.0527 LearningRate 0.5530 Epoch: 4 Global Step: 24050 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:10:18,015-Speed 10476.96 samples/sec Loss 12.0248 LearningRate 0.5529 Epoch: 4 Global Step: 24060 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:10:25,811-Speed 10508.49 samples/sec Loss 11.9486 LearningRate 0.5527 Epoch: 4 Global Step: 24070 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:10:33,632-Speed 10476.15 samples/sec Loss 11.9332 LearningRate 0.5526 Epoch: 4 Global Step: 24080 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:10:41,424-Speed 10515.92 samples/sec Loss 12.0275 LearningRate 0.5525 Epoch: 4 Global Step: 24090 Fp16 Grad Scale: 262144 Required: 17 hours 
Training: 2022-01-15 20:10:49,192-Speed 10546.36 samples/sec Loss 11.9258 LearningRate 0.5523 Epoch: 4 Global Step: 24100 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:10:56,966-Speed 10539.30 samples/sec Loss 12.0652 LearningRate 0.5522 Epoch: 4 Global Step: 24110 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:11:04,809-Speed 10447.26 samples/sec Loss 12.0042 LearningRate 0.5520 Epoch: 4 Global Step: 24120 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:11:12,612-Speed 10500.27 samples/sec Loss 11.9818 LearningRate 0.5519 Epoch: 4 Global Step: 24130 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:11:20,399-Speed 10520.80 samples/sec Loss 12.1607 LearningRate 0.5518 Epoch: 4 Global Step: 24140 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:11:28,228-Speed 10465.17 samples/sec Loss 12.0176 LearningRate 0.5516 Epoch: 4 Global Step: 24150 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:11:36,038-Speed 10490.85 samples/sec Loss 12.0502 LearningRate 0.5515 Epoch: 4 Global Step: 24160 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:11:43,832-Speed 10511.54 samples/sec Loss 11.9732 LearningRate 0.5513 Epoch: 4 Global Step: 24170 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:11:51,644-Speed 10488.06 samples/sec Loss 11.9463 LearningRate 0.5512 Epoch: 4 Global Step: 24180 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:11:59,462-Speed 10479.12 samples/sec Loss 11.9412 LearningRate 0.5511 Epoch: 4 Global Step: 24190 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:12:07,273-Speed 10490.15 samples/sec Loss 11.8583 LearningRate 0.5509 Epoch: 4 Global Step: 24200 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:12:15,050-Speed 10537.42 samples/sec Loss 11.9314 LearningRate 0.5508 Epoch: 4 Global Step: 24210 Fp16 Grad Scale: 262144 Required: 17 hours Training: 
2022-01-15 20:12:22,838-Speed 10519.51 samples/sec Loss 12.1290 LearningRate 0.5507 Epoch: 4 Global Step: 24220 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:12:30,634-Speed 10508.77 samples/sec Loss 11.9852 LearningRate 0.5505 Epoch: 4 Global Step: 24230 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:12:38,499-Speed 10417.97 samples/sec Loss 11.8330 LearningRate 0.5504 Epoch: 4 Global Step: 24240 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:12:46,290-Speed 10516.06 samples/sec Loss 11.9359 LearningRate 0.5502 Epoch: 4 Global Step: 24250 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:12:54,116-Speed 10468.90 samples/sec Loss 12.0236 LearningRate 0.5501 Epoch: 4 Global Step: 24260 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:13:01,928-Speed 10487.80 samples/sec Loss 11.9400 LearningRate 0.5500 Epoch: 4 Global Step: 24270 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:13:09,732-Speed 10502.14 samples/sec Loss 11.9526 LearningRate 0.5498 Epoch: 4 Global Step: 24280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:13:17,523-Speed 10515.61 samples/sec Loss 12.0330 LearningRate 0.5497 Epoch: 4 Global Step: 24290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:13:25,322-Speed 10505.68 samples/sec Loss 11.9958 LearningRate 0.5495 Epoch: 4 Global Step: 24300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:13:33,142-Speed 10477.21 samples/sec Loss 12.0054 LearningRate 0.5494 Epoch: 4 Global Step: 24310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:13:40,938-Speed 10508.41 samples/sec Loss 11.9696 LearningRate 0.5493 Epoch: 4 Global Step: 24320 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:13:48,727-Speed 10519.17 samples/sec Loss 11.8384 LearningRate 0.5491 Epoch: 4 Global Step: 24330 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 
20:13:56,554-Speed 10468.36 samples/sec Loss 11.9554 LearningRate 0.5490 Epoch: 4 Global Step: 24340 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:14:04,363-Speed 10492.08 samples/sec Loss 11.8754 LearningRate 0.5489 Epoch: 4 Global Step: 24350 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:14:12,173-Speed 10491.16 samples/sec Loss 11.9904 LearningRate 0.5487 Epoch: 4 Global Step: 24360 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:14:20,011-Speed 10459.21 samples/sec Loss 11.9267 LearningRate 0.5486 Epoch: 4 Global Step: 24370 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:14:27,809-Speed 10506.88 samples/sec Loss 11.8796 LearningRate 0.5484 Epoch: 4 Global Step: 24380 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:14:35,623-Speed 10486.17 samples/sec Loss 11.9717 LearningRate 0.5483 Epoch: 4 Global Step: 24390 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:14:43,442-Speed 10478.46 samples/sec Loss 11.8965 LearningRate 0.5482 Epoch: 4 Global Step: 24400 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:14:51,270-Speed 10466.97 samples/sec Loss 12.0143 LearningRate 0.5480 Epoch: 4 Global Step: 24410 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:14:59,096-Speed 10469.12 samples/sec Loss 11.9583 LearningRate 0.5479 Epoch: 4 Global Step: 24420 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:15:06,904-Speed 10494.12 samples/sec Loss 11.9385 LearningRate 0.5477 Epoch: 4 Global Step: 24430 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:15:14,747-Speed 10446.03 samples/sec Loss 11.9281 LearningRate 0.5476 Epoch: 4 Global Step: 24440 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:15:22,589-Speed 10447.29 samples/sec Loss 11.9526 LearningRate 0.5475 Epoch: 4 Global Step: 24450 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:15:30,424-Speed 
10457.96 samples/sec Loss 11.8945 LearningRate 0.5473 Epoch: 4 Global Step: 24460 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:15:38,224-Speed 10504.32 samples/sec Loss 11.9302 LearningRate 0.5472 Epoch: 4 Global Step: 24470 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:15:46,006-Speed 10527.44 samples/sec Loss 12.0597 LearningRate 0.5471 Epoch: 4 Global Step: 24480 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:15:53,819-Speed 10487.00 samples/sec Loss 12.0158 LearningRate 0.5469 Epoch: 4 Global Step: 24490 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:16:01,636-Speed 10480.52 samples/sec Loss 11.9352 LearningRate 0.5468 Epoch: 4 Global Step: 24500 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:16:09,431-Speed 10510.62 samples/sec Loss 11.8221 LearningRate 0.5466 Epoch: 4 Global Step: 24510 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:16:17,215-Speed 10526.56 samples/sec Loss 11.8803 LearningRate 0.5465 Epoch: 4 Global Step: 24520 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:16:25,096-Speed 10395.21 samples/sec Loss 11.8816 LearningRate 0.5464 Epoch: 4 Global Step: 24530 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:16:32,916-Speed 10477.33 samples/sec Loss 12.0065 LearningRate 0.5462 Epoch: 4 Global Step: 24540 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:16:40,712-Speed 10509.92 samples/sec Loss 11.8783 LearningRate 0.5461 Epoch: 4 Global Step: 24550 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:16:48,507-Speed 10510.16 samples/sec Loss 11.8979 LearningRate 0.5460 Epoch: 4 Global Step: 24560 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:16:56,297-Speed 10517.58 samples/sec Loss 11.9100 LearningRate 0.5458 Epoch: 4 Global Step: 24570 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:17:04,097-Speed 10504.23 
samples/sec Loss 11.9170 LearningRate 0.5457 Epoch: 4 Global Step: 24580 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:17:11,887-Speed 10517.68 samples/sec Loss 11.8504 LearningRate 0.5455 Epoch: 4 Global Step: 24590 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:17:19,678-Speed 10516.58 samples/sec Loss 11.8847 LearningRate 0.5454 Epoch: 4 Global Step: 24600 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:17:27,464-Speed 10522.27 samples/sec Loss 11.8216 LearningRate 0.5453 Epoch: 4 Global Step: 24610 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:17:35,298-Speed 10458.87 samples/sec Loss 11.9307 LearningRate 0.5451 Epoch: 4 Global Step: 24620 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:17:43,200-Speed 10368.59 samples/sec Loss 11.9092 LearningRate 0.5450 Epoch: 4 Global Step: 24630 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:17:51,042-Speed 10446.41 samples/sec Loss 11.8749 LearningRate 0.5448 Epoch: 4 Global Step: 24640 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:17:58,867-Speed 10471.40 samples/sec Loss 11.9084 LearningRate 0.5447 Epoch: 4 Global Step: 24650 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:18:06,724-Speed 10428.56 samples/sec Loss 11.8232 LearningRate 0.5446 Epoch: 4 Global Step: 24660 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:18:14,538-Speed 10484.42 samples/sec Loss 11.8780 LearningRate 0.5444 Epoch: 4 Global Step: 24670 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:18:22,321-Speed 10528.09 samples/sec Loss 11.9137 LearningRate 0.5443 Epoch: 4 Global Step: 24680 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:18:30,116-Speed 10510.89 samples/sec Loss 11.9579 LearningRate 0.5442 Epoch: 4 Global Step: 24690 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:18:37,913-Speed 10508.18 samples/sec Loss 
11.9114 LearningRate 0.5440 Epoch: 4 Global Step: 24700 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:18:45,716-Speed 10500.30 samples/sec Loss 11.8836 LearningRate 0.5439 Epoch: 4 Global Step: 24710 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:18:53,559-Speed 10445.51 samples/sec Loss 11.8471 LearningRate 0.5437 Epoch: 4 Global Step: 24720 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:19:01,353-Speed 10511.61 samples/sec Loss 11.8390 LearningRate 0.5436 Epoch: 4 Global Step: 24730 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:19:09,174-Speed 10476.36 samples/sec Loss 11.8828 LearningRate 0.5435 Epoch: 4 Global Step: 24740 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:19:16,984-Speed 10490.52 samples/sec Loss 11.9011 LearningRate 0.5433 Epoch: 4 Global Step: 24750 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:19:24,831-Speed 10441.15 samples/sec Loss 12.0718 LearningRate 0.5432 Epoch: 4 Global Step: 24760 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:19:32,657-Speed 10468.53 samples/sec Loss 11.9510 LearningRate 0.5431 Epoch: 4 Global Step: 24770 Fp16 Grad Scale: 524288 Required: 17 hours Training: 2022-01-15 20:19:40,460-Speed 10500.44 samples/sec Loss 11.9287 LearningRate 0.5429 Epoch: 4 Global Step: 24780 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:19:48,283-Speed 10473.53 samples/sec Loss 11.8265 LearningRate 0.5428 Epoch: 4 Global Step: 24790 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:19:56,101-Speed 10479.83 samples/sec Loss 11.7918 LearningRate 0.5426 Epoch: 4 Global Step: 24800 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:20:03,907-Speed 10495.25 samples/sec Loss 11.8152 LearningRate 0.5425 Epoch: 4 Global Step: 24810 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:20:11,699-Speed 10515.30 samples/sec Loss 11.8416 
LearningRate 0.5424 Epoch: 4 Global Step: 24820 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:20:19,531-Speed 10460.45 samples/sec Loss 11.8321 LearningRate 0.5422 Epoch: 4 Global Step: 24830 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:20:27,379-Speed 10439.69 samples/sec Loss 11.8925 LearningRate 0.5421 Epoch: 4 Global Step: 24840 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:20:35,179-Speed 10503.07 samples/sec Loss 11.8159 LearningRate 0.5420 Epoch: 4 Global Step: 24850 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:20:42,967-Speed 10521.51 samples/sec Loss 11.8361 LearningRate 0.5418 Epoch: 4 Global Step: 24860 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:20:50,755-Speed 10519.68 samples/sec Loss 11.9194 LearningRate 0.5417 Epoch: 4 Global Step: 24870 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:20:58,543-Speed 10520.25 samples/sec Loss 11.8974 LearningRate 0.5415 Epoch: 4 Global Step: 24880 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:21:06,341-Speed 10506.21 samples/sec Loss 11.7744 LearningRate 0.5414 Epoch: 4 Global Step: 24890 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:21:14,132-Speed 10516.61 samples/sec Loss 11.7859 LearningRate 0.5413 Epoch: 4 Global Step: 24900 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:21:21,916-Speed 10526.08 samples/sec Loss 11.7498 LearningRate 0.5411 Epoch: 4 Global Step: 24910 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:21:29,722-Speed 10495.69 samples/sec Loss 11.7297 LearningRate 0.5410 Epoch: 4 Global Step: 24920 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:21:37,538-Speed 10483.16 samples/sec Loss 11.7893 LearningRate 0.5409 Epoch: 4 Global Step: 24930 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:21:45,305-Speed 10549.14 samples/sec Loss 11.8556 LearningRate 0.5407 
Epoch: 4 Global Step: 24940 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:21:53,115-Speed 10489.92 samples/sec Loss 11.8777 LearningRate 0.5406 Epoch: 4 Global Step: 24950 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:22:00,915-Speed 10504.95 samples/sec Loss 11.8627 LearningRate 0.5404 Epoch: 4 Global Step: 24960 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:22:08,710-Speed 10509.59 samples/sec Loss 11.9693 LearningRate 0.5403 Epoch: 4 Global Step: 24970 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:22:16,514-Speed 10498.28 samples/sec Loss 11.9957 LearningRate 0.5402 Epoch: 4 Global Step: 24980 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:22:24,311-Speed 10508.47 samples/sec Loss 11.7989 LearningRate 0.5400 Epoch: 4 Global Step: 24990 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:22:32,130-Speed 10478.83 samples/sec Loss 11.7395 LearningRate 0.5399 Epoch: 4 Global Step: 25000 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:22:39,930-Speed 10504.02 samples/sec Loss 11.8584 LearningRate 0.5398 Epoch: 4 Global Step: 25010 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:22:47,729-Speed 10505.09 samples/sec Loss 11.8184 LearningRate 0.5396 Epoch: 4 Global Step: 25020 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:22:55,513-Speed 10525.82 samples/sec Loss 11.7818 LearningRate 0.5395 Epoch: 4 Global Step: 25030 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:23:03,295-Speed 10528.63 samples/sec Loss 11.7593 LearningRate 0.5393 Epoch: 4 Global Step: 25040 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:23:11,092-Speed 10507.99 samples/sec Loss 11.8607 LearningRate 0.5392 Epoch: 4 Global Step: 25050 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:23:18,885-Speed 10513.32 samples/sec Loss 11.8720 LearningRate 0.5391 Epoch: 4 Global 
Step: 25060 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:23:26,688-Speed 10500.45 samples/sec Loss 11.9175 LearningRate 0.5389 Epoch: 4 Global Step: 25070 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:23:34,523-Speed 10457.22 samples/sec Loss 11.8222 LearningRate 0.5388 Epoch: 4 Global Step: 25080 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:23:42,302-Speed 10531.54 samples/sec Loss 11.7814 LearningRate 0.5387 Epoch: 4 Global Step: 25090 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:23:50,105-Speed 10500.58 samples/sec Loss 11.7589 LearningRate 0.5385 Epoch: 4 Global Step: 25100 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:23:57,888-Speed 10527.01 samples/sec Loss 11.7383 LearningRate 0.5384 Epoch: 4 Global Step: 25110 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:24:05,683-Speed 10510.26 samples/sec Loss 12.0425 LearningRate 0.5383 Epoch: 4 Global Step: 25120 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:24:13,486-Speed 10500.48 samples/sec Loss 11.8035 LearningRate 0.5381 Epoch: 4 Global Step: 25130 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:24:21,298-Speed 10486.76 samples/sec Loss 11.7646 LearningRate 0.5380 Epoch: 4 Global Step: 25140 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:24:29,112-Speed 10485.29 samples/sec Loss 11.7945 LearningRate 0.5378 Epoch: 4 Global Step: 25150 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:24:36,934-Speed 10475.77 samples/sec Loss 11.8463 LearningRate 0.5377 Epoch: 4 Global Step: 25160 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:24:44,720-Speed 10521.41 samples/sec Loss 11.8312 LearningRate 0.5376 Epoch: 4 Global Step: 25170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:24:52,514-Speed 10512.20 samples/sec Loss 11.8259 LearningRate 0.5374 Epoch: 4 Global Step: 25180 Fp16 
Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:25:00,334-Speed 10482.14 samples/sec Loss 11.6703 LearningRate 0.5373 Epoch: 4 Global Step: 25190 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:25:08,150-Speed 10482.65 samples/sec Loss 11.7805 LearningRate 0.5372 Epoch: 4 Global Step: 25200 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:25:15,948-Speed 10507.77 samples/sec Loss 11.9054 LearningRate 0.5370 Epoch: 4 Global Step: 25210 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:25:23,754-Speed 10494.72 samples/sec Loss 11.7970 LearningRate 0.5369 Epoch: 4 Global Step: 25220 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:25:31,577-Speed 10473.05 samples/sec Loss 11.7989 LearningRate 0.5367 Epoch: 4 Global Step: 25230 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:25:39,390-Speed 10486.21 samples/sec Loss 11.7340 LearningRate 0.5366 Epoch: 4 Global Step: 25240 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:25:47,198-Speed 10493.83 samples/sec Loss 11.6849 LearningRate 0.5365 Epoch: 4 Global Step: 25250 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:25:55,044-Speed 10441.32 samples/sec Loss 11.8403 LearningRate 0.5363 Epoch: 4 Global Step: 25260 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:26:02,898-Speed 10432.72 samples/sec Loss 11.7446 LearningRate 0.5362 Epoch: 4 Global Step: 25270 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:26:10,739-Speed 10459.59 samples/sec Loss 11.7511 LearningRate 0.5361 Epoch: 4 Global Step: 25280 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:26:18,544-Speed 10496.50 samples/sec Loss 11.7025 LearningRate 0.5359 Epoch: 4 Global Step: 25290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:26:26,378-Speed 10458.77 samples/sec Loss 11.8328 LearningRate 0.5358 Epoch: 4 Global Step: 25300 Fp16 Grad Scale: 131072 
Required: 17 hours Training: 2022-01-15 20:26:34,159-Speed 10529.18 samples/sec Loss 11.7298 LearningRate 0.5356 Epoch: 4 Global Step: 25310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:26:41,986-Speed 10468.07 samples/sec Loss 11.8606 LearningRate 0.5355 Epoch: 4 Global Step: 25320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:26:49,760-Speed 10539.15 samples/sec Loss 11.8327 LearningRate 0.5354 Epoch: 4 Global Step: 25330 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:26:57,567-Speed 10495.32 samples/sec Loss 11.8470 LearningRate 0.5352 Epoch: 4 Global Step: 25340 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:27:05,364-Speed 10506.63 samples/sec Loss 11.7100 LearningRate 0.5351 Epoch: 4 Global Step: 25350 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:27:13,141-Speed 10534.32 samples/sec Loss 11.7741 LearningRate 0.5350 Epoch: 4 Global Step: 25360 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:27:20,913-Speed 10542.79 samples/sec Loss 11.7931 LearningRate 0.5348 Epoch: 4 Global Step: 25370 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:27:28,712-Speed 10505.36 samples/sec Loss 11.7164 LearningRate 0.5347 Epoch: 4 Global Step: 25380 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:27:36,509-Speed 10507.97 samples/sec Loss 11.7737 LearningRate 0.5346 Epoch: 4 Global Step: 25390 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:27:44,302-Speed 10513.23 samples/sec Loss 11.8989 LearningRate 0.5344 Epoch: 4 Global Step: 25400 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:27:52,113-Speed 10489.32 samples/sec Loss 11.7086 LearningRate 0.5343 Epoch: 4 Global Step: 25410 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:27:59,920-Speed 10494.68 samples/sec Loss 11.7255 LearningRate 0.5341 Epoch: 4 Global Step: 25420 Fp16 Grad Scale: 262144 Required: 17 hours 
Training: 2022-01-15 20:28:07,750-Speed 10464.18 samples/sec Loss 11.8999 LearningRate 0.5340 Epoch: 4 Global Step: 25430 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:28:15,595-Speed 10443.29 samples/sec Loss 11.7476 LearningRate 0.5339 Epoch: 4 Global Step: 25440 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:28:23,403-Speed 10493.94 samples/sec Loss 11.7663 LearningRate 0.5337 Epoch: 4 Global Step: 25450 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:28:31,200-Speed 10507.43 samples/sec Loss 11.6970 LearningRate 0.5336 Epoch: 4 Global Step: 25460 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:28:38,983-Speed 10527.29 samples/sec Loss 11.7808 LearningRate 0.5335 Epoch: 4 Global Step: 25470 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:28:46,774-Speed 10516.21 samples/sec Loss 11.7040 LearningRate 0.5333 Epoch: 4 Global Step: 25480 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:28:54,614-Speed 10450.26 samples/sec Loss 11.6923 LearningRate 0.5332 Epoch: 4 Global Step: 25490 Fp16 Grad Scale: 524288 Required: 17 hours Training: 2022-01-15 20:29:02,403-Speed 10518.54 samples/sec Loss 11.7351 LearningRate 0.5331 Epoch: 4 Global Step: 25500 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:29:10,200-Speed 10508.43 samples/sec Loss 11.8150 LearningRate 0.5329 Epoch: 4 Global Step: 25510 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:29:17,979-Speed 10536.28 samples/sec Loss 11.6368 LearningRate 0.5328 Epoch: 4 Global Step: 25520 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:29:25,778-Speed 10506.04 samples/sec Loss 11.7843 LearningRate 0.5326 Epoch: 4 Global Step: 25530 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:29:33,560-Speed 10527.45 samples/sec Loss 11.6734 LearningRate 0.5325 Epoch: 4 Global Step: 25540 Fp16 Grad Scale: 131072 Required: 17 hours Training: 
2022-01-15 20:29:41,350-Speed 10518.15 samples/sec Loss 11.7519 LearningRate 0.5324 Epoch: 4 Global Step: 25550 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:29:49,143-Speed 10512.99 samples/sec Loss 11.6811 LearningRate 0.5322 Epoch: 4 Global Step: 25560 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:29:56,956-Speed 10486.55 samples/sec Loss 11.6625 LearningRate 0.5321 Epoch: 4 Global Step: 25570 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:30:04,784-Speed 10466.66 samples/sec Loss 11.7189 LearningRate 0.5320 Epoch: 4 Global Step: 25580 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:30:12,599-Speed 10482.94 samples/sec Loss 11.7735 LearningRate 0.5318 Epoch: 4 Global Step: 25590 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:30:20,409-Speed 10490.72 samples/sec Loss 11.8235 LearningRate 0.5317 Epoch: 4 Global Step: 25600 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:30:28,232-Speed 10474.05 samples/sec Loss 11.7017 LearningRate 0.5316 Epoch: 4 Global Step: 25610 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:30:36,036-Speed 10498.44 samples/sec Loss 11.7692 LearningRate 0.5314 Epoch: 4 Global Step: 25620 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:30:43,907-Speed 10408.80 samples/sec Loss 11.6852 LearningRate 0.5313 Epoch: 4 Global Step: 25630 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:30:51,708-Speed 10503.71 samples/sec Loss 11.6604 LearningRate 0.5311 Epoch: 4 Global Step: 25640 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:30:59,522-Speed 10485.15 samples/sec Loss 11.6269 LearningRate 0.5310 Epoch: 4 Global Step: 25650 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:31:07,336-Speed 10485.01 samples/sec Loss 11.7365 LearningRate 0.5309 Epoch: 4 Global Step: 25660 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 
20:31:15,173-Speed 10454.31 samples/sec Loss 11.7648 LearningRate 0.5307 Epoch: 4 Global Step: 25670 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:31:22,969-Speed 10510.00 samples/sec Loss 11.6609 LearningRate 0.5306 Epoch: 4 Global Step: 25680 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:31:30,790-Speed 10475.98 samples/sec Loss 11.6954 LearningRate 0.5305 Epoch: 4 Global Step: 25690 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:31:38,643-Speed 10433.74 samples/sec Loss 11.8274 LearningRate 0.5303 Epoch: 4 Global Step: 25700 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:31:46,441-Speed 10506.18 samples/sec Loss 11.7476 LearningRate 0.5302 Epoch: 4 Global Step: 25710 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:31:54,229-Speed 10519.44 samples/sec Loss 11.6978 LearningRate 0.5301 Epoch: 4 Global Step: 25720 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:32:02,032-Speed 10499.88 samples/sec Loss 11.6548 LearningRate 0.5299 Epoch: 4 Global Step: 25730 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:32:09,845-Speed 10486.61 samples/sec Loss 11.9575 LearningRate 0.5298 Epoch: 4 Global Step: 25740 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:32:17,689-Speed 10445.89 samples/sec Loss 11.7317 LearningRate 0.5297 Epoch: 4 Global Step: 25750 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:32:25,485-Speed 10508.65 samples/sec Loss 11.8161 LearningRate 0.5295 Epoch: 4 Global Step: 25760 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:32:33,294-Speed 10494.08 samples/sec Loss 11.7716 LearningRate 0.5294 Epoch: 4 Global Step: 25770 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:32:41,143-Speed 10438.97 samples/sec Loss 11.6409 LearningRate 0.5292 Epoch: 4 Global Step: 25780 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:32:48,938-Speed 
10511.20 samples/sec Loss 11.5906 LearningRate 0.5291 Epoch: 4 Global Step: 25790 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:32:56,734-Speed 10508.89 samples/sec Loss 11.6242 LearningRate 0.5290 Epoch: 4 Global Step: 25800 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:33:04,557-Speed 10473.71 samples/sec Loss 11.7545 LearningRate 0.5288 Epoch: 4 Global Step: 25810 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:33:12,350-Speed 10513.36 samples/sec Loss 11.6642 LearningRate 0.5287 Epoch: 4 Global Step: 25820 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:33:20,159-Speed 10491.37 samples/sec Loss 11.5773 LearningRate 0.5286 Epoch: 4 Global Step: 25830 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:33:27,989-Speed 10463.55 samples/sec Loss 11.6258 LearningRate 0.5284 Epoch: 4 Global Step: 25840 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:33:35,846-Speed 10428.31 samples/sec Loss 11.6605 LearningRate 0.5283 Epoch: 4 Global Step: 25850 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:33:43,658-Speed 10486.97 samples/sec Loss 11.6642 LearningRate 0.5282 Epoch: 4 Global Step: 25860 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:33:51,503-Speed 10444.42 samples/sec Loss 11.7924 LearningRate 0.5280 Epoch: 4 Global Step: 25870 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:33:59,322-Speed 10478.30 samples/sec Loss 11.6813 LearningRate 0.5279 Epoch: 4 Global Step: 25880 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:34:07,121-Speed 10505.54 samples/sec Loss 11.6982 LearningRate 0.5278 Epoch: 4 Global Step: 25890 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:34:14,955-Speed 10457.96 samples/sec Loss 11.8291 LearningRate 0.5276 Epoch: 4 Global Step: 25900 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:34:22,763-Speed 10493.50 
samples/sec Loss 11.6738 LearningRate 0.5275 Epoch: 4 Global Step: 25910 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:34:30,570-Speed 10494.96 samples/sec Loss 11.6323 LearningRate 0.5273 Epoch: 4 Global Step: 25920 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:34:53,084-Speed 3638.98 samples/sec Loss 11.5993 LearningRate 0.5272 Epoch: 5 Global Step: 25930 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:35:00,876-Speed 10515.01 samples/sec Loss 11.5999 LearningRate 0.5271 Epoch: 5 Global Step: 25940 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:35:08,636-Speed 10560.70 samples/sec Loss 11.5908 LearningRate 0.5269 Epoch: 5 Global Step: 25950 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:35:16,401-Speed 10551.81 samples/sec Loss 11.6921 LearningRate 0.5268 Epoch: 5 Global Step: 25960 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:35:24,198-Speed 10508.05 samples/sec Loss 11.7018 LearningRate 0.5267 Epoch: 5 Global Step: 25970 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:35:31,967-Speed 10547.35 samples/sec Loss 11.6235 LearningRate 0.5265 Epoch: 5 Global Step: 25980 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:35:39,760-Speed 10513.63 samples/sec Loss 11.5926 LearningRate 0.5264 Epoch: 5 Global Step: 25990 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:35:47,576-Speed 10481.94 samples/sec Loss 11.6421 LearningRate 0.5263 Epoch: 5 Global Step: 26000 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:35:55,400-Speed 10472.85 samples/sec Loss 11.6315 LearningRate 0.5261 Epoch: 5 Global Step: 26010 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:36:03,205-Speed 10497.02 samples/sec Loss 11.6276 LearningRate 0.5260 Epoch: 5 Global Step: 26020 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:36:11,010-Speed 10497.53 samples/sec Loss 
11.6953 LearningRate 0.5259 Epoch: 5 Global Step: 26030 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:36:18,819-Speed 10491.36 samples/sec Loss 11.5460 LearningRate 0.5257 Epoch: 5 Global Step: 26040 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:36:26,613-Speed 10511.86 samples/sec Loss 11.5718 LearningRate 0.5256 Epoch: 5 Global Step: 26050 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:36:34,435-Speed 10474.92 samples/sec Loss 11.7888 LearningRate 0.5254 Epoch: 5 Global Step: 26060 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:36:42,232-Speed 10510.49 samples/sec Loss 11.6705 LearningRate 0.5253 Epoch: 5 Global Step: 26070 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:36:50,029-Speed 10508.19 samples/sec Loss 11.7511 LearningRate 0.5252 Epoch: 5 Global Step: 26080 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:36:57,814-Speed 10524.17 samples/sec Loss 11.5599 LearningRate 0.5250 Epoch: 5 Global Step: 26090 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:37:05,596-Speed 10527.60 samples/sec Loss 11.6869 LearningRate 0.5249 Epoch: 5 Global Step: 26100 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:37:13,370-Speed 10539.79 samples/sec Loss 11.5647 LearningRate 0.5248 Epoch: 5 Global Step: 26110 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:37:21,159-Speed 10518.18 samples/sec Loss 11.6154 LearningRate 0.5246 Epoch: 5 Global Step: 26120 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:37:28,951-Speed 10514.24 samples/sec Loss 11.6185 LearningRate 0.5245 Epoch: 5 Global Step: 26130 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:37:36,725-Speed 10539.69 samples/sec Loss 11.6090 LearningRate 0.5244 Epoch: 5 Global Step: 26140 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:37:44,519-Speed 10512.04 samples/sec Loss 11.5796 
LearningRate 0.5242 Epoch: 5 Global Step: 26150 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:37:52,335-Speed 10481.57 samples/sec Loss 11.6650 LearningRate 0.5241 Epoch: 5 Global Step: 26160 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:38:00,131-Speed 10510.16 samples/sec Loss 11.6437 LearningRate 0.5240 Epoch: 5 Global Step: 26170 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:38:07,912-Speed 10529.55 samples/sec Loss 11.6191 LearningRate 0.5238 Epoch: 5 Global Step: 26180 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:38:15,725-Speed 10486.08 samples/sec Loss 11.5717 LearningRate 0.5237 Epoch: 5 Global Step: 26190 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:38:23,506-Speed 10530.29 samples/sec Loss 11.6123 LearningRate 0.5236 Epoch: 5 Global Step: 26200 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:38:31,282-Speed 10535.46 samples/sec Loss 11.5797 LearningRate 0.5234 Epoch: 5 Global Step: 26210 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:38:39,071-Speed 10521.20 samples/sec Loss 11.5965 LearningRate 0.5233 Epoch: 5 Global Step: 26220 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:38:46,863-Speed 10514.27 samples/sec Loss 11.6313 LearningRate 0.5231 Epoch: 5 Global Step: 26230 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:38:54,658-Speed 10511.39 samples/sec Loss 11.5765 LearningRate 0.5230 Epoch: 5 Global Step: 26240 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:39:02,461-Speed 10499.95 samples/sec Loss 11.5752 LearningRate 0.5229 Epoch: 5 Global Step: 26250 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:39:10,248-Speed 10527.70 samples/sec Loss 11.6392 LearningRate 0.5227 Epoch: 5 Global Step: 26260 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:39:18,067-Speed 10477.80 samples/sec Loss 11.6402 LearningRate 0.5226 
Epoch: 5 Global Step: 26270 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:39:25,874-Speed 10495.05 samples/sec Loss 11.6120 LearningRate 0.5225 Epoch: 5 Global Step: 26280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:39:33,700-Speed 10468.97 samples/sec Loss 11.5333 LearningRate 0.5223 Epoch: 5 Global Step: 26290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:39:41,527-Speed 10466.88 samples/sec Loss 11.4495 LearningRate 0.5222 Epoch: 5 Global Step: 26300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:39:49,337-Speed 10491.74 samples/sec Loss 11.6247 LearningRate 0.5221 Epoch: 5 Global Step: 26310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:39:57,177-Speed 10450.90 samples/sec Loss 11.6675 LearningRate 0.5219 Epoch: 5 Global Step: 26320 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:40:05,029-Speed 10434.27 samples/sec Loss 11.5482 LearningRate 0.5218 Epoch: 5 Global Step: 26330 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:40:12,905-Speed 10402.39 samples/sec Loss 11.5168 LearningRate 0.5217 Epoch: 5 Global Step: 26340 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:40:20,719-Speed 10484.87 samples/sec Loss 11.5586 LearningRate 0.5215 Epoch: 5 Global Step: 26350 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:40:28,554-Speed 10457.08 samples/sec Loss 11.6230 LearningRate 0.5214 Epoch: 5 Global Step: 26360 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:40:36,400-Speed 10443.07 samples/sec Loss 11.6813 LearningRate 0.5213 Epoch: 5 Global Step: 26370 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:40:44,238-Speed 10453.31 samples/sec Loss 11.7015 LearningRate 0.5211 Epoch: 5 Global Step: 26380 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:40:52,087-Speed 10437.28 samples/sec Loss 11.6386 LearningRate 0.5210 Epoch: 5 Global 
Step: 26390 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:40:59,906-Speed 10479.65 samples/sec Loss 11.6462 LearningRate 0.5209 Epoch: 5 Global Step: 26400 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:41:07,759-Speed 10436.27 samples/sec Loss 11.5753 LearningRate 0.5207 Epoch: 5 Global Step: 26410 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:41:15,655-Speed 10376.64 samples/sec Loss 11.6649 LearningRate 0.5206 Epoch: 5 Global Step: 26420 Fp16 Grad Scale: 524288 Required: 17 hours Training: 2022-01-15 20:41:23,501-Speed 10442.43 samples/sec Loss 11.5699 LearningRate 0.5204 Epoch: 5 Global Step: 26430 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:41:31,325-Speed 10471.40 samples/sec Loss 11.5682 LearningRate 0.5203 Epoch: 5 Global Step: 26440 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:41:39,137-Speed 10487.79 samples/sec Loss 11.5556 LearningRate 0.5202 Epoch: 5 Global Step: 26450 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:41:46,962-Speed 10470.35 samples/sec Loss 11.5295 LearningRate 0.5200 Epoch: 5 Global Step: 26460 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:41:54,817-Speed 10430.10 samples/sec Loss 11.4330 LearningRate 0.5199 Epoch: 5 Global Step: 26470 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:42:02,618-Speed 10503.26 samples/sec Loss 11.6522 LearningRate 0.5198 Epoch: 5 Global Step: 26480 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-15 20:42:10,474-Speed 10428.99 samples/sec Loss 11.6638 LearningRate 0.5196 Epoch: 5 Global Step: 26490 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-15 20:42:18,325-Speed 10435.03 samples/sec Loss 11.4805 LearningRate 0.5195 Epoch: 5 Global Step: 26500 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-15 20:42:26,140-Speed 10484.36 samples/sec Loss 11.5475 LearningRate 0.5194 Epoch: 5 Global Step: 26510 Fp16 Grad 
Scale: 65536 Required: 17 hours Training: 2022-01-15 20:42:33,975-Speed 10457.52 samples/sec Loss 11.6115 LearningRate 0.5192 Epoch: 5 Global Step: 26520 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-15 20:42:41,833-Speed 10426.33 samples/sec Loss 11.5830 LearningRate 0.5191 Epoch: 5 Global Step: 26530 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-15 20:42:49,669-Speed 10456.02 samples/sec Loss 11.4472 LearningRate 0.5190 Epoch: 5 Global Step: 26540 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-15 20:42:57,490-Speed 10474.97 samples/sec Loss 11.7099 LearningRate 0.5188 Epoch: 5 Global Step: 26550 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-15 20:43:05,372-Speed 10395.50 samples/sec Loss 11.5818 LearningRate 0.5187 Epoch: 5 Global Step: 26560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-15 20:43:13,215-Speed 10445.70 samples/sec Loss 11.6345 LearningRate 0.5186 Epoch: 5 Global Step: 26570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-01-15 20:43:21,050-Speed 10456.95 samples/sec Loss 11.4925 LearningRate 0.5184 Epoch: 5 Global Step: 26580 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:43:28,892-Speed 10447.43 samples/sec Loss 11.4977 LearningRate 0.5183 Epoch: 5 Global Step: 26590 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:43:36,702-Speed 10490.49 samples/sec Loss 11.5077 LearningRate 0.5182 Epoch: 5 Global Step: 26600 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:43:44,513-Speed 10489.18 samples/sec Loss 11.5432 LearningRate 0.5180 Epoch: 5 Global Step: 26610 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:43:52,340-Speed 10468.05 samples/sec Loss 11.5280 LearningRate 0.5179 Epoch: 5 Global Step: 26620 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:44:00,198-Speed 10426.35 samples/sec Loss 11.5048 LearningRate 0.5178 Epoch: 5 Global Step: 26630 Fp16 Grad Scale: 131072 Required: 17 
hours Training: 2022-01-15 20:44:08,014-Speed 10483.29 samples/sec Loss 11.4621 LearningRate 0.5176 Epoch: 5 Global Step: 26640 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:44:15,815-Speed 10501.76 samples/sec Loss 11.5512 LearningRate 0.5175 Epoch: 5 Global Step: 26650 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:44:23,631-Speed 10482.62 samples/sec Loss 11.5617 LearningRate 0.5174 Epoch: 5 Global Step: 26660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:44:31,474-Speed 10446.15 samples/sec Loss 11.5064 LearningRate 0.5172 Epoch: 5 Global Step: 26670 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:44:39,274-Speed 10504.15 samples/sec Loss 11.5240 LearningRate 0.5171 Epoch: 5 Global Step: 26680 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:44:47,093-Speed 10478.89 samples/sec Loss 11.6499 LearningRate 0.5170 Epoch: 5 Global Step: 26690 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:44:54,883-Speed 10516.69 samples/sec Loss 11.6228 LearningRate 0.5168 Epoch: 5 Global Step: 26700 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:45:02,681-Speed 10507.70 samples/sec Loss 11.5234 LearningRate 0.5167 Epoch: 5 Global Step: 26710 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:45:10,485-Speed 10498.46 samples/sec Loss 11.4469 LearningRate 0.5165 Epoch: 5 Global Step: 26720 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:45:18,286-Speed 10507.39 samples/sec Loss 11.5871 LearningRate 0.5164 Epoch: 5 Global Step: 26730 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:45:26,096-Speed 10491.55 samples/sec Loss 11.4404 LearningRate 0.5163 Epoch: 5 Global Step: 26740 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:45:33,916-Speed 10476.71 samples/sec Loss 11.5462 LearningRate 0.5161 Epoch: 5 Global Step: 26750 Fp16 Grad Scale: 262144 Required: 17 hours Training: 
2022-01-15 20:45:41,745-Speed 10465.34 samples/sec Loss 11.4628 LearningRate 0.5160 Epoch: 5 Global Step: 26760 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:45:49,572-Speed 10467.85 samples/sec Loss 11.4131 LearningRate 0.5159 Epoch: 5 Global Step: 26770 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:45:57,381-Speed 10491.44 samples/sec Loss 11.5333 LearningRate 0.5157 Epoch: 5 Global Step: 26780 Fp16 Grad Scale: 524288 Required: 17 hours Training: 2022-01-15 20:46:05,184-Speed 10500.87 samples/sec Loss 11.4788 LearningRate 0.5156 Epoch: 5 Global Step: 26790 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:46:13,039-Speed 10430.65 samples/sec Loss 11.5344 LearningRate 0.5155 Epoch: 5 Global Step: 26800 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:46:20,825-Speed 10522.76 samples/sec Loss 11.5557 LearningRate 0.5153 Epoch: 5 Global Step: 26810 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:46:28,611-Speed 10523.39 samples/sec Loss 11.5709 LearningRate 0.5152 Epoch: 5 Global Step: 26820 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:46:36,412-Speed 10502.01 samples/sec Loss 11.5047 LearningRate 0.5151 Epoch: 5 Global Step: 26830 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:46:44,252-Speed 10449.52 samples/sec Loss 11.6312 LearningRate 0.5149 Epoch: 5 Global Step: 26840 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:46:52,046-Speed 10513.04 samples/sec Loss 11.5472 LearningRate 0.5148 Epoch: 5 Global Step: 26850 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:46:59,858-Speed 10488.02 samples/sec Loss 11.4820 LearningRate 0.5147 Epoch: 5 Global Step: 26860 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:47:07,661-Speed 10500.51 samples/sec Loss 11.5031 LearningRate 0.5145 Epoch: 5 Global Step: 26870 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 
20:47:15,474-Speed 10485.05 samples/sec Loss 11.5336 LearningRate 0.5144 Epoch: 5 Global Step: 26880 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:47:23,256-Speed 10528.94 samples/sec Loss 11.3686 LearningRate 0.5143 Epoch: 5 Global Step: 26890 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:47:31,049-Speed 10514.27 samples/sec Loss 11.4629 LearningRate 0.5141 Epoch: 5 Global Step: 26900 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:47:38,834-Speed 10522.92 samples/sec Loss 11.5549 LearningRate 0.5140 Epoch: 5 Global Step: 26910 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:47:46,652-Speed 10479.60 samples/sec Loss 11.4240 LearningRate 0.5139 Epoch: 5 Global Step: 26920 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:47:54,463-Speed 10489.96 samples/sec Loss 11.4739 LearningRate 0.5137 Epoch: 5 Global Step: 26930 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:48:02,264-Speed 10502.20 samples/sec Loss 11.4634 LearningRate 0.5136 Epoch: 5 Global Step: 26940 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:48:10,093-Speed 10465.26 samples/sec Loss 11.3550 LearningRate 0.5135 Epoch: 5 Global Step: 26950 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:48:17,904-Speed 10489.18 samples/sec Loss 11.4440 LearningRate 0.5133 Epoch: 5 Global Step: 26960 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:48:25,695-Speed 10516.61 samples/sec Loss 11.4975 LearningRate 0.5132 Epoch: 5 Global Step: 26970 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:48:33,474-Speed 10532.98 samples/sec Loss 11.4726 LearningRate 0.5131 Epoch: 5 Global Step: 26980 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:48:41,261-Speed 10520.75 samples/sec Loss 11.4067 LearningRate 0.5129 Epoch: 5 Global Step: 26990 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:48:49,061-Speed 
10504.19 samples/sec Loss 11.3730 LearningRate 0.5128 Epoch: 5 Global Step: 27000 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:48:56,933-Speed 10407.83 samples/sec Loss 11.4016 LearningRate 0.5127 Epoch: 5 Global Step: 27010 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:49:04,732-Speed 10505.58 samples/sec Loss 11.5206 LearningRate 0.5125 Epoch: 5 Global Step: 27020 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:49:12,529-Speed 10508.23 samples/sec Loss 11.6074 LearningRate 0.5124 Epoch: 5 Global Step: 27030 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:49:20,338-Speed 10491.54 samples/sec Loss 11.5190 LearningRate 0.5123 Epoch: 5 Global Step: 27040 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:49:28,156-Speed 10480.53 samples/sec Loss 11.4642 LearningRate 0.5121 Epoch: 5 Global Step: 27050 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:49:35,949-Speed 10514.19 samples/sec Loss 11.4856 LearningRate 0.5120 Epoch: 5 Global Step: 27060 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:49:43,770-Speed 10474.84 samples/sec Loss 11.3899 LearningRate 0.5119 Epoch: 5 Global Step: 27070 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:49:51,604-Speed 10458.17 samples/sec Loss 11.4782 LearningRate 0.5117 Epoch: 5 Global Step: 27080 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:49:59,397-Speed 10512.83 samples/sec Loss 11.4562 LearningRate 0.5116 Epoch: 5 Global Step: 27090 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:50:07,185-Speed 10520.14 samples/sec Loss 11.4942 LearningRate 0.5115 Epoch: 5 Global Step: 27100 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:50:15,000-Speed 10484.43 samples/sec Loss 11.4984 LearningRate 0.5113 Epoch: 5 Global Step: 27110 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:50:22,813-Speed 10486.72 
samples/sec Loss 11.3993 LearningRate 0.5112 Epoch: 5 Global Step: 27120 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:50:30,601-Speed 10520.39 samples/sec Loss 11.4810 LearningRate 0.5111 Epoch: 5 Global Step: 27130 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:50:38,402-Speed 10503.12 samples/sec Loss 11.4280 LearningRate 0.5109 Epoch: 5 Global Step: 27140 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:50:46,205-Speed 10500.51 samples/sec Loss 11.4385 LearningRate 0.5108 Epoch: 5 Global Step: 27150 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:50:53,997-Speed 10515.04 samples/sec Loss 11.5393 LearningRate 0.5107 Epoch: 5 Global Step: 27160 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:51:01,776-Speed 10531.43 samples/sec Loss 11.5099 LearningRate 0.5105 Epoch: 5 Global Step: 27170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:51:09,558-Speed 10527.99 samples/sec Loss 11.3963 LearningRate 0.5104 Epoch: 5 Global Step: 27180 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:51:17,375-Speed 10481.69 samples/sec Loss 11.3918 LearningRate 0.5103 Epoch: 5 Global Step: 27190 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:51:25,195-Speed 10477.83 samples/sec Loss 11.4580 LearningRate 0.5101 Epoch: 5 Global Step: 27200 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:51:33,004-Speed 10490.79 samples/sec Loss 11.4447 LearningRate 0.5100 Epoch: 5 Global Step: 27210 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:51:40,790-Speed 10522.53 samples/sec Loss 11.4227 LearningRate 0.5099 Epoch: 5 Global Step: 27220 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:51:48,622-Speed 10464.36 samples/sec Loss 11.6004 LearningRate 0.5097 Epoch: 5 Global Step: 27230 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:51:56,419-Speed 10507.82 samples/sec Loss 
11.5106 LearningRate 0.5096 Epoch: 5 Global Step: 27240 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:52:04,202-Speed 10526.36 samples/sec Loss 11.4108 LearningRate 0.5095 Epoch: 5 Global Step: 27250 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:52:12,001-Speed 10505.30 samples/sec Loss 11.4345 LearningRate 0.5093 Epoch: 5 Global Step: 27260 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:52:19,791-Speed 10517.42 samples/sec Loss 11.4928 LearningRate 0.5092 Epoch: 5 Global Step: 27270 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:52:27,593-Speed 10501.73 samples/sec Loss 11.3901 LearningRate 0.5091 Epoch: 5 Global Step: 27280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:52:35,372-Speed 10531.46 samples/sec Loss 11.3230 LearningRate 0.5089 Epoch: 5 Global Step: 27290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:52:43,171-Speed 10506.48 samples/sec Loss 11.5677 LearningRate 0.5088 Epoch: 5 Global Step: 27300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:52:50,957-Speed 10524.92 samples/sec Loss 11.4708 LearningRate 0.5087 Epoch: 5 Global Step: 27310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:52:58,759-Speed 10500.04 samples/sec Loss 11.4211 LearningRate 0.5085 Epoch: 5 Global Step: 27320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:53:06,557-Speed 10506.85 samples/sec Loss 11.6343 LearningRate 0.5084 Epoch: 5 Global Step: 27330 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:53:14,346-Speed 10519.42 samples/sec Loss 11.4743 LearningRate 0.5083 Epoch: 5 Global Step: 27340 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:53:22,158-Speed 10494.17 samples/sec Loss 11.4275 LearningRate 0.5081 Epoch: 5 Global Step: 27350 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:53:29,987-Speed 10464.73 samples/sec Loss 11.3643 
LearningRate 0.5080 Epoch: 5 Global Step: 27360 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:53:37,791-Speed 10498.57 samples/sec Loss 11.3775 LearningRate 0.5079 Epoch: 5 Global Step: 27370 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:53:45,578-Speed 10521.73 samples/sec Loss 11.3653 LearningRate 0.5077 Epoch: 5 Global Step: 27380 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:53:53,371-Speed 10513.97 samples/sec Loss 11.3186 LearningRate 0.5076 Epoch: 5 Global Step: 27390 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:54:01,168-Speed 10507.72 samples/sec Loss 11.3827 LearningRate 0.5075 Epoch: 5 Global Step: 27400 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:54:08,955-Speed 10520.86 samples/sec Loss 11.4389 LearningRate 0.5073 Epoch: 5 Global Step: 27410 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:54:16,781-Speed 10468.63 samples/sec Loss 11.5264 LearningRate 0.5072 Epoch: 5 Global Step: 27420 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:54:24,584-Speed 10499.57 samples/sec Loss 11.4346 LearningRate 0.5071 Epoch: 5 Global Step: 27430 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:54:32,376-Speed 10515.39 samples/sec Loss 11.3606 LearningRate 0.5069 Epoch: 5 Global Step: 27440 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:54:40,173-Speed 10507.90 samples/sec Loss 11.2959 LearningRate 0.5068 Epoch: 5 Global Step: 27450 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:54:47,959-Speed 10522.27 samples/sec Loss 11.3856 LearningRate 0.5067 Epoch: 5 Global Step: 27460 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:54:55,754-Speed 10511.00 samples/sec Loss 11.3733 LearningRate 0.5065 Epoch: 5 Global Step: 27470 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:55:03,549-Speed 10510.72 samples/sec Loss 11.3693 LearningRate 0.5064 
Epoch: 5 Global Step: 27480 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:55:11,383-Speed 10459.03 samples/sec Loss 11.3690 LearningRate 0.5063 Epoch: 5 Global Step: 27490 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:55:19,237-Speed 10430.97 samples/sec Loss 11.2917 LearningRate 0.5061 Epoch: 5 Global Step: 27500 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:55:27,051-Speed 10485.47 samples/sec Loss 11.4692 LearningRate 0.5060 Epoch: 5 Global Step: 27510 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:55:34,852-Speed 10502.91 samples/sec Loss 11.3435 LearningRate 0.5059 Epoch: 5 Global Step: 27520 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:55:42,669-Speed 10480.80 samples/sec Loss 11.3998 LearningRate 0.5057 Epoch: 5 Global Step: 27530 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:55:50,481-Speed 10487.69 samples/sec Loss 11.4277 LearningRate 0.5056 Epoch: 5 Global Step: 27540 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:55:58,283-Speed 10501.27 samples/sec Loss 11.4501 LearningRate 0.5055 Epoch: 5 Global Step: 27550 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:56:06,073-Speed 10517.41 samples/sec Loss 11.3364 LearningRate 0.5053 Epoch: 5 Global Step: 27560 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:56:13,923-Speed 10436.74 samples/sec Loss 11.4428 LearningRate 0.5052 Epoch: 5 Global Step: 27570 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:56:21,712-Speed 10519.14 samples/sec Loss 11.4038 LearningRate 0.5051 Epoch: 5 Global Step: 27580 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:56:29,497-Speed 10523.80 samples/sec Loss 11.3434 LearningRate 0.5049 Epoch: 5 Global Step: 27590 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:56:37,299-Speed 10501.15 samples/sec Loss 11.3011 LearningRate 0.5048 Epoch: 5 Global 
Step: 27600 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:56:45,082-Speed 10528.08 samples/sec Loss 11.4356 LearningRate 0.5047 Epoch: 5 Global Step: 27610 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:56:52,894-Speed 10490.81 samples/sec Loss 11.3593 LearningRate 0.5045 Epoch: 5 Global Step: 27620 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:57:00,708-Speed 10484.11 samples/sec Loss 11.3282 LearningRate 0.5044 Epoch: 5 Global Step: 27630 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:57:08,491-Speed 10527.19 samples/sec Loss 11.3899 LearningRate 0.5043 Epoch: 5 Global Step: 27640 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:57:16,288-Speed 10508.83 samples/sec Loss 11.3257 LearningRate 0.5041 Epoch: 5 Global Step: 27650 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:57:24,078-Speed 10517.02 samples/sec Loss 11.3423 LearningRate 0.5040 Epoch: 5 Global Step: 27660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:57:31,899-Speed 10475.70 samples/sec Loss 11.2896 LearningRate 0.5039 Epoch: 5 Global Step: 27670 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:57:39,697-Speed 10506.59 samples/sec Loss 11.4102 LearningRate 0.5037 Epoch: 5 Global Step: 27680 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:57:47,506-Speed 10491.65 samples/sec Loss 11.4192 LearningRate 0.5036 Epoch: 5 Global Step: 27690 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:57:55,296-Speed 10517.10 samples/sec Loss 11.3391 LearningRate 0.5035 Epoch: 5 Global Step: 27700 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:58:03,112-Speed 10483.60 samples/sec Loss 11.3657 LearningRate 0.5033 Epoch: 5 Global Step: 27710 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:58:10,888-Speed 10535.95 samples/sec Loss 11.3564 LearningRate 0.5032 Epoch: 5 Global Step: 27720 Fp16 
Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:58:18,684-Speed 10508.82 samples/sec Loss 11.2551 LearningRate 0.5031 Epoch: 5 Global Step: 27730 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:58:26,488-Speed 10499.64 samples/sec Loss 11.3508 LearningRate 0.5029 Epoch: 5 Global Step: 27740 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:58:34,281-Speed 10513.12 samples/sec Loss 11.2627 LearningRate 0.5028 Epoch: 5 Global Step: 27750 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 20:58:42,082-Speed 10502.42 samples/sec Loss 11.6106 LearningRate 0.5027 Epoch: 5 Global Step: 27760 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:58:49,862-Speed 10530.42 samples/sec Loss 11.6839 LearningRate 0.5026 Epoch: 5 Global Step: 27770 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:58:57,662-Speed 10504.85 samples/sec Loss 11.4967 LearningRate 0.5024 Epoch: 5 Global Step: 27780 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:59:05,455-Speed 10513.48 samples/sec Loss 11.4512 LearningRate 0.5023 Epoch: 5 Global Step: 27790 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:59:13,244-Speed 10519.29 samples/sec Loss 11.3100 LearningRate 0.5022 Epoch: 5 Global Step: 27800 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:59:21,111-Speed 10414.27 samples/sec Loss 11.3139 LearningRate 0.5020 Epoch: 5 Global Step: 27810 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:59:28,933-Speed 10474.27 samples/sec Loss 11.2781 LearningRate 0.5019 Epoch: 5 Global Step: 27820 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:59:36,756-Speed 10473.32 samples/sec Loss 11.2811 LearningRate 0.5018 Epoch: 5 Global Step: 27830 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 20:59:44,547-Speed 10516.60 samples/sec Loss 11.2082 LearningRate 0.5016 Epoch: 5 Global Step: 27840 Fp16 Grad Scale: 131072 
Required: 17 hours Training: 2022-01-15 20:59:52,344-Speed 10507.67 samples/sec Loss 11.3166 LearningRate 0.5015 Epoch: 5 Global Step: 27850 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:00:00,154-Speed 10489.93 samples/sec Loss 11.2725 LearningRate 0.5014 Epoch: 5 Global Step: 27860 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:00:07,958-Speed 10499.85 samples/sec Loss 11.2806 LearningRate 0.5012 Epoch: 5 Global Step: 27870 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:00:15,793-Speed 10456.77 samples/sec Loss 11.3655 LearningRate 0.5011 Epoch: 5 Global Step: 27880 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:00:23,634-Speed 10448.14 samples/sec Loss 11.2712 LearningRate 0.5010 Epoch: 5 Global Step: 27890 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:00:31,418-Speed 10525.42 samples/sec Loss 11.3997 LearningRate 0.5008 Epoch: 5 Global Step: 27900 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:00:39,209-Speed 10516.87 samples/sec Loss 11.3384 LearningRate 0.5007 Epoch: 5 Global Step: 27910 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:00:46,983-Speed 10539.63 samples/sec Loss 11.4662 LearningRate 0.5006 Epoch: 5 Global Step: 27920 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:00:54,779-Speed 10507.96 samples/sec Loss 11.2809 LearningRate 0.5004 Epoch: 5 Global Step: 27930 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:01:02,588-Speed 10492.62 samples/sec Loss 11.3288 LearningRate 0.5003 Epoch: 5 Global Step: 27940 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:01:10,423-Speed 10457.04 samples/sec Loss 11.2570 LearningRate 0.5002 Epoch: 5 Global Step: 27950 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:01:18,275-Speed 10434.91 samples/sec Loss 11.2453 LearningRate 0.5000 Epoch: 5 Global Step: 27960 Fp16 Grad Scale: 131072 Required: 17 hours 
Training: 2022-01-15 21:01:26,149-Speed 10404.48 samples/sec Loss 11.3233 LearningRate 0.4999 Epoch: 5 Global Step: 27970 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:01:33,931-Speed 10528.58 samples/sec Loss 11.3083 LearningRate 0.4998 Epoch: 5 Global Step: 27980 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:01:41,729-Speed 10506.34 samples/sec Loss 11.2847 LearningRate 0.4996 Epoch: 5 Global Step: 27990 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:01:49,525-Speed 10508.29 samples/sec Loss 11.3412 LearningRate 0.4995 Epoch: 5 Global Step: 28000 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:01:57,316-Speed 10517.57 samples/sec Loss 11.2777 LearningRate 0.4994 Epoch: 5 Global Step: 28010 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:02:05,129-Speed 10486.16 samples/sec Loss 11.3098 LearningRate 0.4992 Epoch: 5 Global Step: 28020 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:02:12,929-Speed 10504.20 samples/sec Loss 11.1763 LearningRate 0.4991 Epoch: 5 Global Step: 28030 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:02:20,743-Speed 10485.45 samples/sec Loss 11.3284 LearningRate 0.4990 Epoch: 5 Global Step: 28040 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:02:28,559-Speed 10483.26 samples/sec Loss 11.1767 LearningRate 0.4988 Epoch: 5 Global Step: 28050 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:02:36,396-Speed 10454.22 samples/sec Loss 11.3779 LearningRate 0.4987 Epoch: 5 Global Step: 28060 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:02:44,195-Speed 10505.13 samples/sec Loss 11.2167 LearningRate 0.4986 Epoch: 5 Global Step: 28070 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:02:52,014-Speed 10477.83 samples/sec Loss 11.2841 LearningRate 0.4985 Epoch: 5 Global Step: 28080 Fp16 Grad Scale: 262144 Required: 17 hours Training: 
2022-01-15 21:02:59,803-Speed 10520.01 samples/sec Loss 11.2522 LearningRate 0.4983 Epoch: 5 Global Step: 28090 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:03:07,599-Speed 10509.15 samples/sec Loss 11.1962 LearningRate 0.4982 Epoch: 5 Global Step: 28100 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:03:15,408-Speed 10491.54 samples/sec Loss 11.2483 LearningRate 0.4981 Epoch: 5 Global Step: 28110 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:03:23,262-Speed 10431.96 samples/sec Loss 11.2969 LearningRate 0.4979 Epoch: 5 Global Step: 28120 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:03:31,102-Speed 10450.97 samples/sec Loss 11.2108 LearningRate 0.4978 Epoch: 5 Global Step: 28130 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:03:38,921-Speed 10477.97 samples/sec Loss 11.2784 LearningRate 0.4977 Epoch: 5 Global Step: 28140 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:03:46,762-Speed 10448.48 samples/sec Loss 11.4269 LearningRate 0.4975 Epoch: 5 Global Step: 28150 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:03:54,592-Speed 10463.76 samples/sec Loss 11.2380 LearningRate 0.4974 Epoch: 5 Global Step: 28160 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:04:02,411-Speed 10478.28 samples/sec Loss 11.4248 LearningRate 0.4973 Epoch: 5 Global Step: 28170 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:04:10,223-Speed 10488.63 samples/sec Loss 11.3771 LearningRate 0.4971 Epoch: 5 Global Step: 28180 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:04:18,010-Speed 10521.49 samples/sec Loss 11.3162 LearningRate 0.4970 Epoch: 5 Global Step: 28190 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:04:25,815-Speed 10497.72 samples/sec Loss 11.2761 LearningRate 0.4969 Epoch: 5 Global Step: 28200 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 
21:04:33,614-Speed 10504.43 samples/sec Loss 11.2385 LearningRate 0.4967 Epoch: 5 Global Step: 28210 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:04:41,433-Speed 10478.75 samples/sec Loss 11.2344 LearningRate 0.4966 Epoch: 5 Global Step: 28220 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:04:49,245-Speed 10488.01 samples/sec Loss 11.2854 LearningRate 0.4965 Epoch: 5 Global Step: 28230 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:04:57,028-Speed 10526.50 samples/sec Loss 11.2647 LearningRate 0.4963 Epoch: 5 Global Step: 28240 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:05:04,805-Speed 10535.39 samples/sec Loss 11.2319 LearningRate 0.4962 Epoch: 5 Global Step: 28250 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:05:12,607-Speed 10501.20 samples/sec Loss 11.3377 LearningRate 0.4961 Epoch: 5 Global Step: 28260 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:05:20,395-Speed 10520.15 samples/sec Loss 11.2648 LearningRate 0.4960 Epoch: 5 Global Step: 28270 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:05:28,176-Speed 10528.97 samples/sec Loss 11.3686 LearningRate 0.4958 Epoch: 5 Global Step: 28280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:05:35,992-Speed 10483.15 samples/sec Loss 11.3483 LearningRate 0.4957 Epoch: 5 Global Step: 28290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:05:43,797-Speed 10497.14 samples/sec Loss 11.3267 LearningRate 0.4956 Epoch: 5 Global Step: 28300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:05:51,581-Speed 10525.32 samples/sec Loss 11.3181 LearningRate 0.4954 Epoch: 5 Global Step: 28310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:05:59,378-Speed 10507.54 samples/sec Loss 11.2079 LearningRate 0.4953 Epoch: 5 Global Step: 28320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:06:07,198-Speed 
10477.64 samples/sec Loss 11.2332 LearningRate 0.4952 Epoch: 5 Global Step: 28330 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:06:15,022-Speed 10471.93 samples/sec Loss 11.2790 LearningRate 0.4950 Epoch: 5 Global Step: 28340 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:06:22,838-Speed 10484.09 samples/sec Loss 11.1644 LearningRate 0.4949 Epoch: 5 Global Step: 28350 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:06:30,660-Speed 10474.25 samples/sec Loss 11.2849 LearningRate 0.4948 Epoch: 5 Global Step: 28360 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:06:38,452-Speed 10514.55 samples/sec Loss 11.2281 LearningRate 0.4946 Epoch: 5 Global Step: 28370 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:06:46,288-Speed 10456.50 samples/sec Loss 11.2733 LearningRate 0.4945 Epoch: 5 Global Step: 28380 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:06:54,125-Speed 10454.06 samples/sec Loss 11.2209 LearningRate 0.4944 Epoch: 5 Global Step: 28390 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:07:01,931-Speed 10495.06 samples/sec Loss 11.2438 LearningRate 0.4942 Epoch: 5 Global Step: 28400 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:07:09,763-Speed 10461.81 samples/sec Loss 11.2472 LearningRate 0.4941 Epoch: 5 Global Step: 28410 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:07:17,561-Speed 10506.32 samples/sec Loss 11.2334 LearningRate 0.4940 Epoch: 5 Global Step: 28420 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:07:25,338-Speed 10535.24 samples/sec Loss 11.2209 LearningRate 0.4938 Epoch: 5 Global Step: 28430 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:07:33,158-Speed 10477.63 samples/sec Loss 11.2446 LearningRate 0.4937 Epoch: 5 Global Step: 28440 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:07:40,960-Speed 10500.72 
samples/sec Loss 11.2987 LearningRate 0.4936 Epoch: 5 Global Step: 28450 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:07:48,743-Speed 10527.51 samples/sec Loss 11.2381 LearningRate 0.4935 Epoch: 5 Global Step: 28460 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-01-15 21:07:56,561-Speed 10484.47 samples/sec Loss 11.2724 LearningRate 0.4933 Epoch: 5 Global Step: 28470 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:08:04,373-Speed 10486.83 samples/sec Loss 11.3449 LearningRate 0.4932 Epoch: 5 Global Step: 28480 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:08:12,204-Speed 10463.68 samples/sec Loss 11.2447 LearningRate 0.4931 Epoch: 5 Global Step: 28490 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-01-15 21:08:20,070-Speed 10415.63 samples/sec Loss 11.1560 LearningRate 0.4929 Epoch: 5 Global Step: 28500 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:08:27,918-Speed 10444.88 samples/sec Loss 11.2265 LearningRate 0.4928 Epoch: 5 Global Step: 28510 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:08:35,745-Speed 10467.71 samples/sec Loss 11.2882 LearningRate 0.4927 Epoch: 5 Global Step: 28520 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:08:43,531-Speed 10524.11 samples/sec Loss 11.2390 LearningRate 0.4925 Epoch: 5 Global Step: 28530 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:08:51,305-Speed 10537.85 samples/sec Loss 11.2508 LearningRate 0.4924 Epoch: 5 Global Step: 28540 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:08:59,092-Speed 10522.16 samples/sec Loss 11.2039 LearningRate 0.4923 Epoch: 5 Global Step: 28550 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:09:06,952-Speed 10423.96 samples/sec Loss 11.1841 LearningRate 0.4921 Epoch: 5 Global Step: 28560 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:09:14,752-Speed 10504.60 samples/sec Loss 
11.2096 LearningRate 0.4920 Epoch: 5 Global Step: 28570 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:09:22,565-Speed 10486.14 samples/sec Loss 11.2000 LearningRate 0.4919 Epoch: 5 Global Step: 28580 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:09:30,349-Speed 10525.66 samples/sec Loss 11.2570 LearningRate 0.4918 Epoch: 5 Global Step: 28590 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:09:38,139-Speed 10519.02 samples/sec Loss 11.2374 LearningRate 0.4916 Epoch: 5 Global Step: 28600 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:09:45,922-Speed 10526.60 samples/sec Loss 11.2292 LearningRate 0.4915 Epoch: 5 Global Step: 28610 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:09:53,791-Speed 10411.24 samples/sec Loss 11.1801 LearningRate 0.4914 Epoch: 5 Global Step: 28620 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:10:01,606-Speed 10484.43 samples/sec Loss 11.1471 LearningRate 0.4912 Epoch: 5 Global Step: 28630 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:10:09,417-Speed 10489.88 samples/sec Loss 11.2570 LearningRate 0.4911 Epoch: 5 Global Step: 28640 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:10:17,304-Speed 10387.79 samples/sec Loss 11.2006 LearningRate 0.4910 Epoch: 5 Global Step: 28650 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:10:25,097-Speed 10515.44 samples/sec Loss 11.1271 LearningRate 0.4908 Epoch: 5 Global Step: 28660 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:10:32,870-Speed 10539.46 samples/sec Loss 11.2136 LearningRate 0.4907 Epoch: 5 Global Step: 28670 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:10:40,670-Speed 10505.29 samples/sec Loss 11.3073 LearningRate 0.4906 Epoch: 5 Global Step: 28680 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:10:48,447-Speed 10534.82 samples/sec Loss 11.2727 
LearningRate 0.4904 Epoch: 5 Global Step: 28690 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:10:56,245-Speed 10505.32 samples/sec Loss 11.2822 LearningRate 0.4903 Epoch: 5 Global Step: 28700 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:11:04,061-Speed 10483.76 samples/sec Loss 11.1830 LearningRate 0.4902 Epoch: 5 Global Step: 28710 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:11:11,856-Speed 10511.43 samples/sec Loss 11.2059 LearningRate 0.4901 Epoch: 5 Global Step: 28720 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:11:19,659-Speed 10498.64 samples/sec Loss 11.1729 LearningRate 0.4899 Epoch: 5 Global Step: 28730 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:11:27,459-Speed 10503.82 samples/sec Loss 11.0767 LearningRate 0.4898 Epoch: 5 Global Step: 28740 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:11:35,248-Speed 10519.16 samples/sec Loss 11.1869 LearningRate 0.4897 Epoch: 5 Global Step: 28750 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:11:43,053-Speed 10497.31 samples/sec Loss 11.2702 LearningRate 0.4895 Epoch: 5 Global Step: 28760 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:11:50,863-Speed 10491.48 samples/sec Loss 11.1985 LearningRate 0.4894 Epoch: 5 Global Step: 28770 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:11:58,675-Speed 10487.88 samples/sec Loss 11.1591 LearningRate 0.4893 Epoch: 5 Global Step: 28780 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:12:06,472-Speed 10507.35 samples/sec Loss 11.1165 LearningRate 0.4891 Epoch: 5 Global Step: 28790 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:12:14,294-Speed 10475.31 samples/sec Loss 11.2077 LearningRate 0.4890 Epoch: 5 Global Step: 28800 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:12:22,121-Speed 10467.33 samples/sec Loss 11.1520 LearningRate 0.4889 
Epoch: 5 Global Step: 28810 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:12:29,962-Speed 10449.32 samples/sec Loss 11.1945 LearningRate 0.4887 Epoch: 5 Global Step: 28820 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:12:37,796-Speed 10458.42 samples/sec Loss 11.1694 LearningRate 0.4886 Epoch: 5 Global Step: 28830 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:12:45,594-Speed 10506.70 samples/sec Loss 11.2184 LearningRate 0.4885 Epoch: 5 Global Step: 28840 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:12:53,428-Speed 10458.41 samples/sec Loss 11.2278 LearningRate 0.4884 Epoch: 5 Global Step: 28850 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:13:01,264-Speed 10456.26 samples/sec Loss 11.2343 LearningRate 0.4882 Epoch: 5 Global Step: 28860 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:13:09,116-Speed 10434.48 samples/sec Loss 11.0576 LearningRate 0.4881 Epoch: 5 Global Step: 28870 Fp16 Grad Scale: 524288 Required: 16 hours Training: 2022-01-15 21:13:16,910-Speed 10512.70 samples/sec Loss 11.1304 LearningRate 0.4880 Epoch: 5 Global Step: 28880 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:13:24,703-Speed 10513.18 samples/sec Loss 11.2631 LearningRate 0.4878 Epoch: 5 Global Step: 28890 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:13:32,513-Speed 10489.99 samples/sec Loss 11.2502 LearningRate 0.4877 Epoch: 5 Global Step: 28900 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:13:40,372-Speed 10425.16 samples/sec Loss 11.1852 LearningRate 0.4876 Epoch: 5 Global Step: 28910 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:13:48,216-Speed 10446.00 samples/sec Loss 11.2902 LearningRate 0.4874 Epoch: 5 Global Step: 28920 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:13:56,017-Speed 10502.84 samples/sec Loss 11.1389 LearningRate 0.4873 Epoch: 5 Global 
Step: 28930 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:14:03,815-Speed 10505.78 samples/sec Loss 11.1910 LearningRate 0.4872 Epoch: 5 Global Step: 28940 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:14:11,607-Speed 10515.37 samples/sec Loss 11.1367 LearningRate 0.4870 Epoch: 5 Global Step: 28950 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:14:19,462-Speed 10429.93 samples/sec Loss 11.1250 LearningRate 0.4869 Epoch: 5 Global Step: 28960 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:14:27,304-Speed 10448.19 samples/sec Loss 11.0777 LearningRate 0.4868 Epoch: 5 Global Step: 28970 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:14:35,099-Speed 10511.51 samples/sec Loss 11.1337 LearningRate 0.4867 Epoch: 5 Global Step: 28980 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:14:42,916-Speed 10481.74 samples/sec Loss 11.0601 LearningRate 0.4865 Epoch: 5 Global Step: 28990 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:14:50,727-Speed 10489.28 samples/sec Loss 11.1656 LearningRate 0.4864 Epoch: 5 Global Step: 29000 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:14:58,529-Speed 10500.87 samples/sec Loss 11.1464 LearningRate 0.4863 Epoch: 5 Global Step: 29010 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:15:06,343-Speed 10485.61 samples/sec Loss 11.3368 LearningRate 0.4861 Epoch: 5 Global Step: 29020 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:15:14,126-Speed 10528.73 samples/sec Loss 11.1771 LearningRate 0.4860 Epoch: 5 Global Step: 29030 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:15:21,913-Speed 10521.12 samples/sec Loss 11.0722 LearningRate 0.4859 Epoch: 5 Global Step: 29040 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:15:29,749-Speed 10456.73 samples/sec Loss 11.1570 LearningRate 0.4857 Epoch: 5 Global Step: 29050 Fp16 
Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:15:37,534-Speed 10523.37 samples/sec Loss 11.1168 LearningRate 0.4856 Epoch: 5 Global Step: 29060 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:15:45,330-Speed 10510.19 samples/sec Loss 11.1611 LearningRate 0.4855 Epoch: 5 Global Step: 29070 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:15:53,134-Speed 10498.59 samples/sec Loss 11.2108 LearningRate 0.4854 Epoch: 5 Global Step: 29080 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:16:00,933-Speed 10505.96 samples/sec Loss 11.1793 LearningRate 0.4852 Epoch: 5 Global Step: 29090 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:16:08,742-Speed 10493.13 samples/sec Loss 11.1694 LearningRate 0.4851 Epoch: 5 Global Step: 29100 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:16:16,540-Speed 10506.67 samples/sec Loss 11.1109 LearningRate 0.4850 Epoch: 5 Global Step: 29110 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:16:24,328-Speed 10520.05 samples/sec Loss 11.1386 LearningRate 0.4848 Epoch: 5 Global Step: 29120 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:16:32,124-Speed 10508.28 samples/sec Loss 11.0996 LearningRate 0.4847 Epoch: 5 Global Step: 29130 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:16:39,933-Speed 10493.09 samples/sec Loss 11.1345 LearningRate 0.4846 Epoch: 5 Global Step: 29140 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:16:47,757-Speed 10471.59 samples/sec Loss 11.0976 LearningRate 0.4844 Epoch: 5 Global Step: 29150 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:16:55,566-Speed 10492.19 samples/sec Loss 11.0929 LearningRate 0.4843 Epoch: 5 Global Step: 29160 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:17:03,377-Speed 10489.48 samples/sec Loss 11.1127 LearningRate 0.4842 Epoch: 5 Global Step: 29170 Fp16 Grad Scale: 262144 
Required: 16 hours Training: 2022-01-15 21:17:11,177-Speed 10504.53 samples/sec Loss 11.1484 LearningRate 0.4841 Epoch: 5 Global Step: 29180 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:17:18,960-Speed 10526.98 samples/sec Loss 11.0210 LearningRate 0.4839 Epoch: 5 Global Step: 29190 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:17:26,769-Speed 10491.23 samples/sec Loss 11.0734 LearningRate 0.4838 Epoch: 5 Global Step: 29200 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:17:34,565-Speed 10510.37 samples/sec Loss 11.0817 LearningRate 0.4837 Epoch: 5 Global Step: 29210 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:17:42,371-Speed 10495.54 samples/sec Loss 11.1901 LearningRate 0.4835 Epoch: 5 Global Step: 29220 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:17:50,173-Speed 10501.73 samples/sec Loss 11.1022 LearningRate 0.4834 Epoch: 5 Global Step: 29230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:17:57,941-Speed 10546.95 samples/sec Loss 11.1052 LearningRate 0.4833 Epoch: 5 Global Step: 29240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:18:05,750-Speed 10492.01 samples/sec Loss 11.1204 LearningRate 0.4831 Epoch: 5 Global Step: 29250 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:18:13,562-Speed 10489.00 samples/sec Loss 11.2460 LearningRate 0.4830 Epoch: 5 Global Step: 29260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:18:21,352-Speed 10516.44 samples/sec Loss 11.1313 LearningRate 0.4829 Epoch: 5 Global Step: 29270 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:18:29,153-Speed 10503.96 samples/sec Loss 11.1387 LearningRate 0.4828 Epoch: 5 Global Step: 29280 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:18:36,939-Speed 10521.81 samples/sec Loss 11.1097 LearningRate 0.4826 Epoch: 5 Global Step: 29290 Fp16 Grad Scale: 262144 Required: 16 hours 
Training: 2022-01-15 21:18:44,785-Speed 10443.81 samples/sec Loss 11.0621 LearningRate 0.4825 Epoch: 5 Global Step: 29300 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:18:52,580-Speed 10510.57 samples/sec Loss 11.1273 LearningRate 0.4824 Epoch: 5 Global Step: 29310 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:19:00,404-Speed 10472.87 samples/sec Loss 11.2578 LearningRate 0.4822 Epoch: 5 Global Step: 29320 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:19:08,217-Speed 10485.53 samples/sec Loss 11.1437 LearningRate 0.4821 Epoch: 5 Global Step: 29330 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:19:16,025-Speed 10493.97 samples/sec Loss 11.0301 LearningRate 0.4820 Epoch: 5 Global Step: 29340 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:19:23,842-Speed 10480.34 samples/sec Loss 11.0538 LearningRate 0.4818 Epoch: 5 Global Step: 29350 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:19:31,627-Speed 10524.86 samples/sec Loss 11.0418 LearningRate 0.4817 Epoch: 5 Global Step: 29360 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:19:39,413-Speed 10522.71 samples/sec Loss 11.1943 LearningRate 0.4816 Epoch: 5 Global Step: 29370 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:19:47,223-Speed 10489.91 samples/sec Loss 11.2535 LearningRate 0.4815 Epoch: 5 Global Step: 29380 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:19:55,021-Speed 10508.17 samples/sec Loss 11.1034 LearningRate 0.4813 Epoch: 5 Global Step: 29390 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:20:02,823-Speed 10501.25 samples/sec Loss 11.0475 LearningRate 0.4812 Epoch: 5 Global Step: 29400 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:20:10,638-Speed 10482.44 samples/sec Loss 11.0682 LearningRate 0.4811 Epoch: 5 Global Step: 29410 Fp16 Grad Scale: 262144 Required: 16 hours Training: 
2022-01-15 21:20:18,438-Speed 10503.69 samples/sec Loss 10.9854 LearningRate 0.4809 Epoch: 5 Global Step: 29420 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:20:26,238-Speed 10504.96 samples/sec Loss 10.9783 LearningRate 0.4808 Epoch: 5 Global Step: 29430 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:20:34,052-Speed 10485.85 samples/sec Loss 11.0527 LearningRate 0.4807 Epoch: 5 Global Step: 29440 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:20:41,871-Speed 10477.28 samples/sec Loss 11.0657 LearningRate 0.4806 Epoch: 5 Global Step: 29450 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:20:49,690-Speed 10477.99 samples/sec Loss 11.1048 LearningRate 0.4804 Epoch: 5 Global Step: 29460 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:20:57,486-Speed 10511.35 samples/sec Loss 11.0260 LearningRate 0.4803 Epoch: 5 Global Step: 29470 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:21:05,264-Speed 10533.40 samples/sec Loss 11.0543 LearningRate 0.4802 Epoch: 5 Global Step: 29480 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:21:13,044-Speed 10531.29 samples/sec Loss 11.1709 LearningRate 0.4800 Epoch: 5 Global Step: 29490 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:21:20,846-Speed 10501.91 samples/sec Loss 11.0662 LearningRate 0.4799 Epoch: 5 Global Step: 29500 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:21:28,665-Speed 10478.72 samples/sec Loss 11.0176 LearningRate 0.4798 Epoch: 5 Global Step: 29510 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:21:36,457-Speed 10515.67 samples/sec Loss 11.0795 LearningRate 0.4796 Epoch: 5 Global Step: 29520 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:21:44,254-Speed 10506.99 samples/sec Loss 11.0525 LearningRate 0.4795 Epoch: 5 Global Step: 29530 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 
21:21:52,056-Speed 10502.52 samples/sec Loss 11.0408 LearningRate 0.4794 Epoch: 5 Global Step: 29540 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:21:59,873-Speed 10481.45 samples/sec Loss 11.1322 LearningRate 0.4793 Epoch: 5 Global Step: 29550 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:22:07,675-Speed 10501.13 samples/sec Loss 11.1569 LearningRate 0.4791 Epoch: 5 Global Step: 29560 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:22:15,474-Speed 10504.22 samples/sec Loss 11.0588 LearningRate 0.4790 Epoch: 5 Global Step: 29570 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:22:23,263-Speed 10519.36 samples/sec Loss 11.0661 LearningRate 0.4789 Epoch: 5 Global Step: 29580 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:22:31,122-Speed 10425.27 samples/sec Loss 11.0368 LearningRate 0.4787 Epoch: 5 Global Step: 29590 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:22:38,938-Speed 10482.74 samples/sec Loss 11.1334 LearningRate 0.4786 Epoch: 5 Global Step: 29600 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:22:46,756-Speed 10479.78 samples/sec Loss 10.9943 LearningRate 0.4785 Epoch: 5 Global Step: 29610 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:22:54,565-Speed 10492.55 samples/sec Loss 11.1094 LearningRate 0.4784 Epoch: 5 Global Step: 29620 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:23:02,349-Speed 10525.94 samples/sec Loss 11.0068 LearningRate 0.4782 Epoch: 5 Global Step: 29630 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:23:10,129-Speed 10530.73 samples/sec Loss 11.1003 LearningRate 0.4781 Epoch: 5 Global Step: 29640 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:23:17,907-Speed 10533.39 samples/sec Loss 11.1184 LearningRate 0.4780 Epoch: 5 Global Step: 29650 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:23:25,708-Speed 
10502.80 samples/sec Loss 11.0390 LearningRate 0.4778 Epoch: 5 Global Step: 29660 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:23:33,496-Speed 10519.20 samples/sec Loss 10.9793 LearningRate 0.4777 Epoch: 5 Global Step: 29670 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:23:41,295-Speed 10506.27 samples/sec Loss 11.0715 LearningRate 0.4776 Epoch: 5 Global Step: 29680 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:23:49,096-Speed 10502.93 samples/sec Loss 11.0504 LearningRate 0.4774 Epoch: 5 Global Step: 29690 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:23:56,874-Speed 10532.69 samples/sec Loss 11.0655 LearningRate 0.4773 Epoch: 5 Global Step: 29700 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:24:04,674-Speed 10504.03 samples/sec Loss 10.9768 LearningRate 0.4772 Epoch: 5 Global Step: 29710 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:24:12,466-Speed 10515.39 samples/sec Loss 10.9786 LearningRate 0.4771 Epoch: 5 Global Step: 29720 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:24:20,236-Speed 10544.44 samples/sec Loss 10.9651 LearningRate 0.4769 Epoch: 5 Global Step: 29730 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:24:28,027-Speed 10515.42 samples/sec Loss 11.1011 LearningRate 0.4768 Epoch: 5 Global Step: 29740 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:24:35,801-Speed 10539.89 samples/sec Loss 11.0369 LearningRate 0.4767 Epoch: 5 Global Step: 29750 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:24:43,624-Speed 10473.11 samples/sec Loss 11.1068 LearningRate 0.4765 Epoch: 5 Global Step: 29760 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:24:51,462-Speed 10453.05 samples/sec Loss 11.0695 LearningRate 0.4764 Epoch: 5 Global Step: 29770 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:24:59,277-Speed 10484.59 
samples/sec Loss 11.0664 LearningRate 0.4763 Epoch: 5 Global Step: 29780 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:25:07,049-Speed 10541.83 samples/sec Loss 11.0493 LearningRate 0.4762 Epoch: 5 Global Step: 29790 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:25:14,831-Speed 10528.91 samples/sec Loss 10.9747 LearningRate 0.4760 Epoch: 5 Global Step: 29800 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:25:22,623-Speed 10514.05 samples/sec Loss 10.9830 LearningRate 0.4759 Epoch: 5 Global Step: 29810 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:25:30,412-Speed 10519.65 samples/sec Loss 11.0189 LearningRate 0.4758 Epoch: 5 Global Step: 29820 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:25:38,199-Speed 10520.86 samples/sec Loss 10.9483 LearningRate 0.4756 Epoch: 5 Global Step: 29830 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:25:46,025-Speed 10469.65 samples/sec Loss 11.0064 LearningRate 0.4755 Epoch: 5 Global Step: 29840 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:25:53,815-Speed 10516.96 samples/sec Loss 11.0020 LearningRate 0.4754 Epoch: 5 Global Step: 29850 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:26:01,611-Speed 10509.24 samples/sec Loss 10.9629 LearningRate 0.4753 Epoch: 5 Global Step: 29860 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:26:09,403-Speed 10515.71 samples/sec Loss 11.0939 LearningRate 0.4751 Epoch: 5 Global Step: 29870 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:26:17,196-Speed 10513.70 samples/sec Loss 11.0515 LearningRate 0.4750 Epoch: 5 Global Step: 29880 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:26:24,979-Speed 10525.82 samples/sec Loss 11.2207 LearningRate 0.4749 Epoch: 5 Global Step: 29890 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:26:32,752-Speed 10541.02 samples/sec Loss 
11.2967 LearningRate 0.4747 Epoch: 5 Global Step: 29900 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:26:40,542-Speed 10517.68 samples/sec Loss 11.0515 LearningRate 0.4746 Epoch: 5 Global Step: 29910 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:26:48,329-Speed 10521.23 samples/sec Loss 10.9319 LearningRate 0.4745 Epoch: 5 Global Step: 29920 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:26:56,097-Speed 10547.84 samples/sec Loss 10.9318 LearningRate 0.4744 Epoch: 5 Global Step: 29930 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 21:27:03,869-Speed 10541.26 samples/sec Loss 11.0362 LearningRate 0.4742 Epoch: 5 Global Step: 29940 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 21:27:11,650-Speed 10529.36 samples/sec Loss 10.9426 LearningRate 0.4741 Epoch: 5 Global Step: 29950 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 21:27:19,435-Speed 10525.11 samples/sec Loss 11.0239 LearningRate 0.4740 Epoch: 5 Global Step: 29960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 21:27:27,236-Speed 10502.63 samples/sec Loss 10.9342 LearningRate 0.4738 Epoch: 5 Global Step: 29970 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 21:27:35,030-Speed 10513.14 samples/sec Loss 10.9995 LearningRate 0.4737 Epoch: 5 Global Step: 29980 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 21:27:42,808-Speed 10534.00 samples/sec Loss 10.9694 LearningRate 0.4736 Epoch: 5 Global Step: 29990 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 21:27:50,572-Speed 10551.99 samples/sec Loss 10.9665 LearningRate 0.4735 Epoch: 5 Global Step: 30000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 21:28:17,886-[lfw][30000]XNorm: 22.428017 Training: 2022-01-15 21:28:17,887-[lfw][30000]Accuracy-Flip: 0.99667+-0.00197 Training: 2022-01-15 21:28:17,888-[lfw][30000]Accuracy-Highest: 0.99667 Training: 2022-01-15 
21:28:50,212-[cfp_fp][30000]XNorm: 19.699531 Training: 2022-01-15 21:28:50,212-[cfp_fp][30000]Accuracy-Flip: 0.97286+-0.00767 Training: 2022-01-15 21:28:50,213-[cfp_fp][30000]Accuracy-Highest: 0.97286 Training: 2022-01-15 21:29:18,414-[agedb_30][30000]XNorm: 21.906889 Training: 2022-01-15 21:29:18,414-[agedb_30][30000]Accuracy-Flip: 0.96250+-0.00735 Training: 2022-01-15 21:29:18,415-[agedb_30][30000]Accuracy-Highest: 0.96250 Training: 2022-01-15 21:29:26,157-Speed 857.05 samples/sec Loss 10.9846 LearningRate 0.4733 Epoch: 5 Global Step: 30010 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 21:29:33,896-Speed 10587.58 samples/sec Loss 11.0951 LearningRate 0.4732 Epoch: 5 Global Step: 30020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 21:29:41,641-Speed 10581.82 samples/sec Loss 10.9771 LearningRate 0.4731 Epoch: 5 Global Step: 30030 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:29:49,394-Speed 10568.47 samples/sec Loss 11.0717 LearningRate 0.4729 Epoch: 5 Global Step: 30040 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:29:57,185-Speed 10515.60 samples/sec Loss 10.9423 LearningRate 0.4728 Epoch: 5 Global Step: 30050 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:30:04,949-Speed 10553.31 samples/sec Loss 10.9395 LearningRate 0.4727 Epoch: 5 Global Step: 30060 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:30:12,762-Speed 10486.10 samples/sec Loss 10.9522 LearningRate 0.4726 Epoch: 5 Global Step: 30070 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:30:20,555-Speed 10513.41 samples/sec Loss 11.0042 LearningRate 0.4724 Epoch: 5 Global Step: 30080 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:30:28,378-Speed 10473.06 samples/sec Loss 10.9720 LearningRate 0.4723 Epoch: 5 Global Step: 30090 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:30:36,188-Speed 10494.22 samples/sec Loss 10.9151 LearningRate 
0.4722 Epoch: 5 Global Step: 30100 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:30:43,962-Speed 10538.63 samples/sec Loss 10.9579 LearningRate 0.4720 Epoch: 5 Global Step: 30110 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:30:51,735-Speed 10539.93 samples/sec Loss 10.8753 LearningRate 0.4719 Epoch: 5 Global Step: 30120 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:30:59,533-Speed 10506.28 samples/sec Loss 11.0052 LearningRate 0.4718 Epoch: 5 Global Step: 30130 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:31:07,307-Speed 10540.34 samples/sec Loss 11.0083 LearningRate 0.4717 Epoch: 5 Global Step: 30140 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:31:15,076-Speed 10545.66 samples/sec Loss 11.0175 LearningRate 0.4715 Epoch: 5 Global Step: 30150 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:31:22,852-Speed 10536.52 samples/sec Loss 10.9315 LearningRate 0.4714 Epoch: 5 Global Step: 30160 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:31:30,659-Speed 10493.57 samples/sec Loss 10.9745 LearningRate 0.4713 Epoch: 5 Global Step: 30170 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:31:38,426-Speed 10549.01 samples/sec Loss 10.9293 LearningRate 0.4711 Epoch: 5 Global Step: 30180 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:31:46,206-Speed 10531.21 samples/sec Loss 10.9750 LearningRate 0.4710 Epoch: 5 Global Step: 30190 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:31:53,972-Speed 10549.80 samples/sec Loss 10.9908 LearningRate 0.4709 Epoch: 5 Global Step: 30200 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:32:01,752-Speed 10529.97 samples/sec Loss 11.0327 LearningRate 0.4708 Epoch: 5 Global Step: 30210 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:32:09,526-Speed 10539.57 samples/sec Loss 11.0191 LearningRate 0.4706 Epoch: 5 
Global Step: 30220 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:32:17,331-Speed 10497.72 samples/sec Loss 11.0312 LearningRate 0.4705 Epoch: 5 Global Step: 30230 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:32:25,107-Speed 10536.10 samples/sec Loss 11.0436 LearningRate 0.4704 Epoch: 5 Global Step: 30240 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:32:32,909-Speed 10501.21 samples/sec Loss 11.0398 LearningRate 0.4702 Epoch: 5 Global Step: 30250 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:32:40,684-Speed 10538.24 samples/sec Loss 10.9271 LearningRate 0.4701 Epoch: 5 Global Step: 30260 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:32:48,499-Speed 10484.05 samples/sec Loss 10.9309 LearningRate 0.4700 Epoch: 5 Global Step: 30270 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:32:56,297-Speed 10507.02 samples/sec Loss 11.0602 LearningRate 0.4699 Epoch: 5 Global Step: 30280 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:33:04,125-Speed 10466.23 samples/sec Loss 10.8343 LearningRate 0.4697 Epoch: 5 Global Step: 30290 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:33:11,902-Speed 10535.48 samples/sec Loss 10.8587 LearningRate 0.4696 Epoch: 5 Global Step: 30300 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:33:19,695-Speed 10514.06 samples/sec Loss 10.9129 LearningRate 0.4695 Epoch: 5 Global Step: 30310 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:33:27,484-Speed 10518.43 samples/sec Loss 10.9708 LearningRate 0.4694 Epoch: 5 Global Step: 30320 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:33:35,249-Speed 10551.99 samples/sec Loss 10.9819 LearningRate 0.4692 Epoch: 5 Global Step: 30330 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:33:43,036-Speed 10521.01 samples/sec Loss 10.9189 LearningRate 0.4691 Epoch: 5 Global Step: 30340 
Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:33:50,834-Speed 10508.08 samples/sec Loss 10.9296 LearningRate 0.4690 Epoch: 5 Global Step: 30350 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:33:58,637-Speed 10499.67 samples/sec Loss 10.9137 LearningRate 0.4688 Epoch: 5 Global Step: 30360 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:34:06,420-Speed 10526.22 samples/sec Loss 10.8951 LearningRate 0.4687 Epoch: 5 Global Step: 30370 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:34:14,222-Speed 10501.07 samples/sec Loss 10.9193 LearningRate 0.4686 Epoch: 5 Global Step: 30380 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:34:22,018-Speed 10510.20 samples/sec Loss 10.9404 LearningRate 0.4685 Epoch: 5 Global Step: 30390 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:34:29,837-Speed 10478.06 samples/sec Loss 10.9178 LearningRate 0.4683 Epoch: 5 Global Step: 30400 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:34:37,610-Speed 10540.80 samples/sec Loss 11.0210 LearningRate 0.4682 Epoch: 5 Global Step: 30410 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:34:45,409-Speed 10504.46 samples/sec Loss 10.9006 LearningRate 0.4681 Epoch: 5 Global Step: 30420 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:34:53,200-Speed 10516.16 samples/sec Loss 10.8690 LearningRate 0.4679 Epoch: 5 Global Step: 30430 Fp16 Grad Scale: 524288 Required: 16 hours Training: 2022-01-15 21:35:00,997-Speed 10509.13 samples/sec Loss 10.9334 LearningRate 0.4678 Epoch: 5 Global Step: 30440 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:35:08,799-Speed 10500.38 samples/sec Loss 10.9028 LearningRate 0.4677 Epoch: 5 Global Step: 30450 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:35:16,608-Speed 10492.06 samples/sec Loss 10.9651 LearningRate 0.4676 Epoch: 5 Global Step: 30460 Fp16 Grad Scale: 
262144 Required: 16 hours Training: 2022-01-15 21:35:24,408-Speed 10504.84 samples/sec Loss 10.9172 LearningRate 0.4674 Epoch: 5 Global Step: 30470 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:35:32,206-Speed 10507.45 samples/sec Loss 10.8901 LearningRate 0.4673 Epoch: 5 Global Step: 30480 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:35:39,979-Speed 10540.96 samples/sec Loss 11.0168 LearningRate 0.4672 Epoch: 5 Global Step: 30490 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:35:47,761-Speed 10529.14 samples/sec Loss 10.9179 LearningRate 0.4671 Epoch: 5 Global Step: 30500 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:35:55,593-Speed 10460.53 samples/sec Loss 10.8889 LearningRate 0.4669 Epoch: 5 Global Step: 30510 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:36:03,423-Speed 10463.64 samples/sec Loss 10.9255 LearningRate 0.4668 Epoch: 5 Global Step: 30520 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:36:11,230-Speed 10498.55 samples/sec Loss 10.8610 LearningRate 0.4667 Epoch: 5 Global Step: 30530 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:36:19,032-Speed 10501.92 samples/sec Loss 10.8818 LearningRate 0.4665 Epoch: 5 Global Step: 30540 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:36:26,881-Speed 10438.83 samples/sec Loss 10.9209 LearningRate 0.4664 Epoch: 5 Global Step: 30550 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:36:34,650-Speed 10545.12 samples/sec Loss 10.9706 LearningRate 0.4663 Epoch: 5 Global Step: 30560 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:36:42,465-Speed 10483.56 samples/sec Loss 10.8809 LearningRate 0.4662 Epoch: 5 Global Step: 30570 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:36:50,282-Speed 10480.88 samples/sec Loss 11.0207 LearningRate 0.4660 Epoch: 5 Global Step: 30580 Fp16 Grad Scale: 262144 Required: 16 
hours Training: 2022-01-15 21:36:58,110-Speed 10466.91 samples/sec Loss 11.0076 LearningRate 0.4659 Epoch: 5 Global Step: 30590 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:37:05,915-Speed 10497.00 samples/sec Loss 10.9248 LearningRate 0.4658 Epoch: 5 Global Step: 30600 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:37:13,700-Speed 10528.75 samples/sec Loss 10.9839 LearningRate 0.4656 Epoch: 5 Global Step: 30610 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:37:21,509-Speed 10492.12 samples/sec Loss 10.9332 LearningRate 0.4655 Epoch: 5 Global Step: 30620 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:37:29,304-Speed 10512.01 samples/sec Loss 10.8613 LearningRate 0.4654 Epoch: 5 Global Step: 30630 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:37:37,093-Speed 10518.49 samples/sec Loss 10.8972 LearningRate 0.4653 Epoch: 5 Global Step: 30640 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:37:44,870-Speed 10534.04 samples/sec Loss 10.9298 LearningRate 0.4651 Epoch: 5 Global Step: 30650 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:37:52,659-Speed 10518.57 samples/sec Loss 10.8507 LearningRate 0.4650 Epoch: 5 Global Step: 30660 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:38:00,438-Speed 10533.06 samples/sec Loss 10.9003 LearningRate 0.4649 Epoch: 5 Global Step: 30670 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:38:08,205-Speed 10547.37 samples/sec Loss 10.8423 LearningRate 0.4648 Epoch: 5 Global Step: 30680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 21:38:16,024-Speed 10478.89 samples/sec Loss 10.9671 LearningRate 0.4646 Epoch: 5 Global Step: 30690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 21:38:23,864-Speed 10450.66 samples/sec Loss 10.8745 LearningRate 0.4645 Epoch: 5 Global Step: 30700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 
2022-01-15 21:38:31,650-Speed 10523.53 samples/sec Loss 10.9129 LearningRate 0.4644 Epoch: 5 Global Step: 30710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 21:38:39,442-Speed 10514.70 samples/sec Loss 10.9318 LearningRate 0.4642 Epoch: 5 Global Step: 30720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 21:38:47,221-Speed 10532.02 samples/sec Loss 10.9396 LearningRate 0.4641 Epoch: 5 Global Step: 30730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 21:38:55,006-Speed 10523.60 samples/sec Loss 10.9732 LearningRate 0.4640 Epoch: 5 Global Step: 30740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 21:39:02,774-Speed 10547.90 samples/sec Loss 10.9020 LearningRate 0.4639 Epoch: 5 Global Step: 30750 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 21:39:10,554-Speed 10530.89 samples/sec Loss 10.7753 LearningRate 0.4637 Epoch: 5 Global Step: 30760 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 21:39:18,339-Speed 10524.26 samples/sec Loss 10.8174 LearningRate 0.4636 Epoch: 5 Global Step: 30770 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 21:39:26,110-Speed 10543.72 samples/sec Loss 10.8951 LearningRate 0.4635 Epoch: 5 Global Step: 30780 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:39:33,935-Speed 10471.37 samples/sec Loss 10.8624 LearningRate 0.4634 Epoch: 5 Global Step: 30790 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:39:41,757-Speed 10473.65 samples/sec Loss 10.9435 LearningRate 0.4632 Epoch: 5 Global Step: 30800 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:39:49,591-Speed 10458.15 samples/sec Loss 10.8911 LearningRate 0.4631 Epoch: 5 Global Step: 30810 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:39:57,377-Speed 10522.60 samples/sec Loss 10.9234 LearningRate 0.4630 Epoch: 5 Global Step: 30820 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 
21:40:05,155-Speed 10534.28 samples/sec Loss 10.7841 LearningRate 0.4629 Epoch: 5 Global Step: 30830 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:40:12,956-Speed 10505.91 samples/sec Loss 10.8306 LearningRate 0.4627 Epoch: 5 Global Step: 30840 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:40:20,752-Speed 10510.38 samples/sec Loss 10.8382 LearningRate 0.4626 Epoch: 5 Global Step: 30850 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:40:28,562-Speed 10490.74 samples/sec Loss 10.8856 LearningRate 0.4625 Epoch: 5 Global Step: 30860 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:40:36,361-Speed 10504.05 samples/sec Loss 10.8876 LearningRate 0.4623 Epoch: 5 Global Step: 30870 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:40:44,181-Speed 10477.30 samples/sec Loss 10.9080 LearningRate 0.4622 Epoch: 5 Global Step: 30880 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:40:52,003-Speed 10475.64 samples/sec Loss 10.9218 LearningRate 0.4621 Epoch: 5 Global Step: 30890 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:40:59,810-Speed 10493.87 samples/sec Loss 10.9234 LearningRate 0.4620 Epoch: 5 Global Step: 30900 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:41:07,591-Speed 10529.33 samples/sec Loss 10.8804 LearningRate 0.4618 Epoch: 5 Global Step: 30910 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:41:15,390-Speed 10505.54 samples/sec Loss 10.8417 LearningRate 0.4617 Epoch: 5 Global Step: 30920 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:41:23,176-Speed 10523.27 samples/sec Loss 10.8578 LearningRate 0.4616 Epoch: 5 Global Step: 30930 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:41:30,959-Speed 10526.40 samples/sec Loss 10.7944 LearningRate 0.4615 Epoch: 5 Global Step: 30940 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:41:38,752-Speed 
10518.09 samples/sec Loss 11.1091 LearningRate 0.4613 Epoch: 5 Global Step: 30950 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:41:46,574-Speed 10474.79 samples/sec Loss 10.9174 LearningRate 0.4612 Epoch: 5 Global Step: 30960 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:41:54,367-Speed 10513.03 samples/sec Loss 10.8765 LearningRate 0.4611 Epoch: 5 Global Step: 30970 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:42:02,141-Speed 10538.48 samples/sec Loss 10.7882 LearningRate 0.4609 Epoch: 5 Global Step: 30980 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:42:09,963-Speed 10475.00 samples/sec Loss 10.8068 LearningRate 0.4608 Epoch: 5 Global Step: 30990 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:42:17,777-Speed 10485.30 samples/sec Loss 10.8720 LearningRate 0.4607 Epoch: 5 Global Step: 31000 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:42:25,575-Speed 10506.05 samples/sec Loss 10.8801 LearningRate 0.4606 Epoch: 5 Global Step: 31010 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:42:33,357-Speed 10529.05 samples/sec Loss 10.8804 LearningRate 0.4604 Epoch: 5 Global Step: 31020 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:42:41,149-Speed 10516.13 samples/sec Loss 10.7619 LearningRate 0.4603 Epoch: 5 Global Step: 31030 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:42:48,929-Speed 10531.46 samples/sec Loss 10.8619 LearningRate 0.4602 Epoch: 5 Global Step: 31040 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:42:56,740-Speed 10488.44 samples/sec Loss 10.8580 LearningRate 0.4601 Epoch: 5 Global Step: 31050 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:43:04,531-Speed 10516.67 samples/sec Loss 10.8430 LearningRate 0.4599 Epoch: 5 Global Step: 31060 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:43:12,325-Speed 10513.10 
samples/sec Loss 10.8462 LearningRate 0.4598 Epoch: 5 Global Step: 31070 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:43:20,097-Speed 10541.84 samples/sec Loss 10.8702 LearningRate 0.4597 Epoch: 5 Global Step: 31080 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:43:27,925-Speed 10466.19 samples/sec Loss 10.9111 LearningRate 0.4596 Epoch: 5 Global Step: 31090 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:43:35,746-Speed 10477.70 samples/sec Loss 10.9134 LearningRate 0.4594 Epoch: 5 Global Step: 31100 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:43:43,568-Speed 10474.02 samples/sec Loss 10.7533 LearningRate 0.4593 Epoch: 5 Global Step: 31110 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:44:06,823-Speed 3522.84 samples/sec Loss 10.7530 LearningRate 0.4592 Epoch: 6 Global Step: 31120 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:44:14,613-Speed 10518.15 samples/sec Loss 10.7925 LearningRate 0.4590 Epoch: 6 Global Step: 31130 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:44:22,371-Speed 10562.04 samples/sec Loss 10.8982 LearningRate 0.4589 Epoch: 6 Global Step: 31140 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:44:30,133-Speed 10555.43 samples/sec Loss 10.8596 LearningRate 0.4588 Epoch: 6 Global Step: 31150 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:44:37,916-Speed 10527.23 samples/sec Loss 10.8835 LearningRate 0.4587 Epoch: 6 Global Step: 31160 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:44:45,702-Speed 10522.07 samples/sec Loss 10.7393 LearningRate 0.4585 Epoch: 6 Global Step: 31170 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:44:53,505-Speed 10500.02 samples/sec Loss 10.7237 LearningRate 0.4584 Epoch: 6 Global Step: 31180 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:45:01,286-Speed 10529.67 samples/sec Loss 
10.7982 LearningRate 0.4583 Epoch: 6 Global Step: 31190 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:45:09,084-Speed 10506.57 samples/sec Loss 10.7044 LearningRate 0.4582 Epoch: 6 Global Step: 31200 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:45:16,892-Speed 10493.13 samples/sec Loss 10.8499 LearningRate 0.4580 Epoch: 6 Global Step: 31210 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:45:24,697-Speed 10497.11 samples/sec Loss 10.8678 LearningRate 0.4579 Epoch: 6 Global Step: 31220 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:45:32,531-Speed 10458.15 samples/sec Loss 10.7444 LearningRate 0.4578 Epoch: 6 Global Step: 31230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:45:40,314-Speed 10527.45 samples/sec Loss 10.7924 LearningRate 0.4577 Epoch: 6 Global Step: 31240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:45:48,109-Speed 10510.39 samples/sec Loss 10.7829 LearningRate 0.4575 Epoch: 6 Global Step: 31250 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:45:56,006-Speed 10375.58 samples/sec Loss 10.7445 LearningRate 0.4574 Epoch: 6 Global Step: 31260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:46:03,812-Speed 10495.33 samples/sec Loss 10.9208 LearningRate 0.4573 Epoch: 6 Global Step: 31270 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:46:11,655-Speed 10446.84 samples/sec Loss 10.7622 LearningRate 0.4571 Epoch: 6 Global Step: 31280 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:46:19,469-Speed 10485.46 samples/sec Loss 10.7911 LearningRate 0.4570 Epoch: 6 Global Step: 31290 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:46:27,306-Speed 10453.98 samples/sec Loss 10.8759 LearningRate 0.4569 Epoch: 6 Global Step: 31300 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:46:35,129-Speed 10473.16 samples/sec Loss 10.8818 
LearningRate 0.4568 Epoch: 6 Global Step: 31310 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:46:42,946-Speed 10480.69 samples/sec Loss 10.8510 LearningRate 0.4566 Epoch: 6 Global Step: 31320 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:46:50,732-Speed 10523.43 samples/sec Loss 10.7983 LearningRate 0.4565 Epoch: 6 Global Step: 31330 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:46:58,519-Speed 10520.96 samples/sec Loss 10.7771 LearningRate 0.4564 Epoch: 6 Global Step: 31340 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:47:06,307-Speed 10520.45 samples/sec Loss 10.7348 LearningRate 0.4563 Epoch: 6 Global Step: 31350 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:47:14,091-Speed 10526.58 samples/sec Loss 10.7398 LearningRate 0.4561 Epoch: 6 Global Step: 31360 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:47:21,873-Speed 10529.42 samples/sec Loss 10.8142 LearningRate 0.4560 Epoch: 6 Global Step: 31370 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:47:29,688-Speed 10483.71 samples/sec Loss 10.7490 LearningRate 0.4559 Epoch: 6 Global Step: 31380 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:47:37,478-Speed 10519.27 samples/sec Loss 10.8543 LearningRate 0.4558 Epoch: 6 Global Step: 31390 Fp16 Grad Scale: 524288 Required: 16 hours Training: 2022-01-15 21:47:45,262-Speed 10524.04 samples/sec Loss 10.8129 LearningRate 0.4556 Epoch: 6 Global Step: 31400 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:47:53,078-Speed 10482.32 samples/sec Loss 10.7868 LearningRate 0.4555 Epoch: 6 Global Step: 31410 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:48:00,858-Speed 10531.60 samples/sec Loss 10.7623 LearningRate 0.4554 Epoch: 6 Global Step: 31420 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:48:08,687-Speed 10465.76 samples/sec Loss 10.7286 LearningRate 0.4553 
Epoch: 6 Global Step: 31430 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:48:16,481-Speed 10511.64 samples/sec Loss 10.8896 LearningRate 0.4551 Epoch: 6 Global Step: 31440 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:48:24,287-Speed 10496.59 samples/sec Loss 10.8060 LearningRate 0.4550 Epoch: 6 Global Step: 31450 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:48:32,097-Speed 10489.78 samples/sec Loss 10.7673 LearningRate 0.4549 Epoch: 6 Global Step: 31460 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:48:39,899-Speed 10502.89 samples/sec Loss 10.7485 LearningRate 0.4548 Epoch: 6 Global Step: 31470 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:48:47,689-Speed 10516.69 samples/sec Loss 10.7806 LearningRate 0.4546 Epoch: 6 Global Step: 31480 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:48:55,537-Speed 10440.34 samples/sec Loss 10.7949 LearningRate 0.4545 Epoch: 6 Global Step: 31490 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:49:03,335-Speed 10505.81 samples/sec Loss 10.8297 LearningRate 0.4544 Epoch: 6 Global Step: 31500 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:49:11,153-Speed 10479.99 samples/sec Loss 10.8311 LearningRate 0.4542 Epoch: 6 Global Step: 31510 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:49:19,015-Speed 10421.23 samples/sec Loss 10.7614 LearningRate 0.4541 Epoch: 6 Global Step: 31520 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:49:26,837-Speed 10474.67 samples/sec Loss 10.7803 LearningRate 0.4540 Epoch: 6 Global Step: 31530 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:49:34,656-Speed 10479.22 samples/sec Loss 10.7920 LearningRate 0.4539 Epoch: 6 Global Step: 31540 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:49:42,489-Speed 10459.34 samples/sec Loss 10.8628 LearningRate 0.4537 Epoch: 6 Global 
Step: 31550 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:49:50,344-Speed 10430.43 samples/sec Loss 10.8298 LearningRate 0.4536 Epoch: 6 Global Step: 31560 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:49:58,190-Speed 10441.93 samples/sec Loss 10.6995 LearningRate 0.4535 Epoch: 6 Global Step: 31570 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:50:06,053-Speed 10420.89 samples/sec Loss 10.7911 LearningRate 0.4534 Epoch: 6 Global Step: 31580 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:50:13,884-Speed 10462.18 samples/sec Loss 10.7751 LearningRate 0.4532 Epoch: 6 Global Step: 31590 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:50:21,725-Speed 10449.80 samples/sec Loss 10.7903 LearningRate 0.4531 Epoch: 6 Global Step: 31600 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:50:29,590-Speed 10416.09 samples/sec Loss 10.7500 LearningRate 0.4530 Epoch: 6 Global Step: 31610 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:50:37,467-Speed 10402.32 samples/sec Loss 10.7820 LearningRate 0.4529 Epoch: 6 Global Step: 31620 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:50:45,295-Speed 10466.53 samples/sec Loss 10.6809 LearningRate 0.4527 Epoch: 6 Global Step: 31630 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:50:53,110-Speed 10482.63 samples/sec Loss 10.8306 LearningRate 0.4526 Epoch: 6 Global Step: 31640 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:51:00,927-Speed 10481.22 samples/sec Loss 10.7179 LearningRate 0.4525 Epoch: 6 Global Step: 31650 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:51:08,756-Speed 10465.19 samples/sec Loss 10.7535 LearningRate 0.4524 Epoch: 6 Global Step: 31660 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:51:16,587-Speed 10462.51 samples/sec Loss 10.7084 LearningRate 0.4522 Epoch: 6 Global Step: 31670 Fp16 
Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:51:24,418-Speed 10462.42 samples/sec Loss 10.8269 LearningRate 0.4521 Epoch: 6 Global Step: 31680 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:51:32,244-Speed 10469.13 samples/sec Loss 10.7442 LearningRate 0.4520 Epoch: 6 Global Step: 31690 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:51:40,069-Speed 10470.32 samples/sec Loss 10.6782 LearningRate 0.4519 Epoch: 6 Global Step: 31700 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:51:47,921-Speed 10433.94 samples/sec Loss 10.7736 LearningRate 0.4517 Epoch: 6 Global Step: 31710 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:51:55,738-Speed 10481.74 samples/sec Loss 10.7615 LearningRate 0.4516 Epoch: 6 Global Step: 31720 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:52:03,558-Speed 10476.78 samples/sec Loss 10.7317 LearningRate 0.4515 Epoch: 6 Global Step: 31730 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:52:11,394-Speed 10455.88 samples/sec Loss 10.7569 LearningRate 0.4514 Epoch: 6 Global Step: 31740 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:52:19,212-Speed 10479.66 samples/sec Loss 10.6814 LearningRate 0.4512 Epoch: 6 Global Step: 31750 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:52:27,031-Speed 10478.08 samples/sec Loss 10.7136 LearningRate 0.4511 Epoch: 6 Global Step: 31760 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:52:34,859-Speed 10469.59 samples/sec Loss 10.7331 LearningRate 0.4510 Epoch: 6 Global Step: 31770 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:52:42,694-Speed 10456.71 samples/sec Loss 10.6429 LearningRate 0.4509 Epoch: 6 Global Step: 31780 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:52:50,538-Speed 10445.29 samples/sec Loss 10.7070 LearningRate 0.4507 Epoch: 6 Global Step: 31790 Fp16 Grad Scale: 262144 
Required: 16 hours Training: 2022-01-15 21:52:58,389-Speed 10436.77 samples/sec Loss 10.8842 LearningRate 0.4506 Epoch: 6 Global Step: 31800 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:53:06,229-Speed 10449.33 samples/sec Loss 10.8185 LearningRate 0.4505 Epoch: 6 Global Step: 31810 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:53:14,074-Speed 10443.59 samples/sec Loss 10.7243 LearningRate 0.4504 Epoch: 6 Global Step: 31820 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:53:21,943-Speed 10413.17 samples/sec Loss 10.6901 LearningRate 0.4502 Epoch: 6 Global Step: 31830 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:53:29,750-Speed 10494.68 samples/sec Loss 10.6922 LearningRate 0.4501 Epoch: 6 Global Step: 31840 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:53:37,566-Speed 10480.89 samples/sec Loss 10.7757 LearningRate 0.4500 Epoch: 6 Global Step: 31850 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:53:45,389-Speed 10473.81 samples/sec Loss 10.7258 LearningRate 0.4499 Epoch: 6 Global Step: 31860 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:53:53,272-Speed 10394.82 samples/sec Loss 10.7366 LearningRate 0.4497 Epoch: 6 Global Step: 31870 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:54:01,074-Speed 10500.80 samples/sec Loss 10.7269 LearningRate 0.4496 Epoch: 6 Global Step: 31880 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:54:08,858-Speed 10525.39 samples/sec Loss 10.6844 LearningRate 0.4495 Epoch: 6 Global Step: 31890 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:54:16,651-Speed 10513.04 samples/sec Loss 10.7001 LearningRate 0.4494 Epoch: 6 Global Step: 31900 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:54:24,432-Speed 10529.12 samples/sec Loss 10.7429 LearningRate 0.4492 Epoch: 6 Global Step: 31910 Fp16 Grad Scale: 131072 Required: 16 hours 
Training: 2022-01-15 21:54:32,224-Speed 10522.76 samples/sec Loss 10.7876 LearningRate 0.4491 Epoch: 6 Global Step: 31920 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:54:40,006-Speed 10528.40 samples/sec Loss 10.7588 LearningRate 0.4490 Epoch: 6 Global Step: 31930 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:54:47,795-Speed 10519.54 samples/sec Loss 10.6852 LearningRate 0.4489 Epoch: 6 Global Step: 31940 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:54:55,580-Speed 10523.65 samples/sec Loss 10.7627 LearningRate 0.4487 Epoch: 6 Global Step: 31950 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:55:03,357-Speed 10534.76 samples/sec Loss 10.6946 LearningRate 0.4486 Epoch: 6 Global Step: 31960 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:55:11,149-Speed 10520.33 samples/sec Loss 10.6570 LearningRate 0.4485 Epoch: 6 Global Step: 31970 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:55:18,931-Speed 10528.84 samples/sec Loss 10.6622 LearningRate 0.4484 Epoch: 6 Global Step: 31980 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:55:26,715-Speed 10524.50 samples/sec Loss 10.7156 LearningRate 0.4482 Epoch: 6 Global Step: 31990 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:55:34,513-Speed 10510.30 samples/sec Loss 10.7195 LearningRate 0.4481 Epoch: 6 Global Step: 32000 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:55:42,291-Speed 10533.97 samples/sec Loss 10.6806 LearningRate 0.4480 Epoch: 6 Global Step: 32010 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:55:50,064-Speed 10540.10 samples/sec Loss 10.6963 LearningRate 0.4479 Epoch: 6 Global Step: 32020 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:55:57,881-Speed 10481.87 samples/sec Loss 10.6396 LearningRate 0.4477 Epoch: 6 Global Step: 32030 Fp16 Grad Scale: 131072 Required: 16 hours Training: 
2022-01-15 21:56:05,692-Speed 10489.12 samples/sec Loss 10.8020 LearningRate 0.4476 Epoch: 6 Global Step: 32040 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:56:13,510-Speed 10479.13 samples/sec Loss 10.6672 LearningRate 0.4475 Epoch: 6 Global Step: 32050 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:56:21,314-Speed 10499.37 samples/sec Loss 10.7686 LearningRate 0.4474 Epoch: 6 Global Step: 32060 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:56:29,111-Speed 10508.11 samples/sec Loss 10.7190 LearningRate 0.4472 Epoch: 6 Global Step: 32070 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:56:36,928-Speed 10481.18 samples/sec Loss 10.7127 LearningRate 0.4471 Epoch: 6 Global Step: 32080 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:56:44,725-Speed 10507.87 samples/sec Loss 10.5985 LearningRate 0.4470 Epoch: 6 Global Step: 32090 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:56:52,506-Speed 10530.23 samples/sec Loss 10.6821 LearningRate 0.4469 Epoch: 6 Global Step: 32100 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:57:00,320-Speed 10485.31 samples/sec Loss 10.6975 LearningRate 0.4467 Epoch: 6 Global Step: 32110 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:57:08,109-Speed 10518.46 samples/sec Loss 10.7166 LearningRate 0.4466 Epoch: 6 Global Step: 32120 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:57:15,939-Speed 10464.05 samples/sec Loss 10.6826 LearningRate 0.4465 Epoch: 6 Global Step: 32130 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:57:23,738-Speed 10504.32 samples/sec Loss 10.6898 LearningRate 0.4464 Epoch: 6 Global Step: 32140 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:57:31,530-Speed 10514.84 samples/sec Loss 10.6852 LearningRate 0.4462 Epoch: 6 Global Step: 32150 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 
21:57:39,323-Speed 10514.21 samples/sec Loss 10.7105 LearningRate 0.4461 Epoch: 6 Global Step: 32160 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:57:47,114-Speed 10515.52 samples/sec Loss 10.6503 LearningRate 0.4460 Epoch: 6 Global Step: 32170 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:57:54,886-Speed 10541.72 samples/sec Loss 10.6319 LearningRate 0.4459 Epoch: 6 Global Step: 32180 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:58:02,682-Speed 10510.02 samples/sec Loss 10.6348 LearningRate 0.4457 Epoch: 6 Global Step: 32190 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:58:10,472-Speed 10518.02 samples/sec Loss 10.6172 LearningRate 0.4456 Epoch: 6 Global Step: 32200 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:58:18,262-Speed 10517.06 samples/sec Loss 10.6446 LearningRate 0.4455 Epoch: 6 Global Step: 32210 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:58:26,073-Speed 10489.14 samples/sec Loss 10.7371 LearningRate 0.4454 Epoch: 6 Global Step: 32220 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:58:33,874-Speed 10502.24 samples/sec Loss 10.6640 LearningRate 0.4452 Epoch: 6 Global Step: 32230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:58:41,695-Speed 10476.78 samples/sec Loss 10.6456 LearningRate 0.4451 Epoch: 6 Global Step: 32240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:58:49,476-Speed 10528.90 samples/sec Loss 10.7356 LearningRate 0.4450 Epoch: 6 Global Step: 32250 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:58:57,261-Speed 10523.67 samples/sec Loss 10.7428 LearningRate 0.4449 Epoch: 6 Global Step: 32260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:59:05,074-Speed 10486.66 samples/sec Loss 10.6016 LearningRate 0.4447 Epoch: 6 Global Step: 32270 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:59:12,859-Speed 
10524.01 samples/sec Loss 10.6540 LearningRate 0.4446 Epoch: 6 Global Step: 32280 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:59:20,641-Speed 10529.22 samples/sec Loss 10.7345 LearningRate 0.4445 Epoch: 6 Global Step: 32290 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:59:28,428-Speed 10520.82 samples/sec Loss 10.7116 LearningRate 0.4444 Epoch: 6 Global Step: 32300 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:59:36,210-Speed 10528.88 samples/sec Loss 10.7053 LearningRate 0.4442 Epoch: 6 Global Step: 32310 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:59:43,996-Speed 10522.74 samples/sec Loss 10.7840 LearningRate 0.4441 Epoch: 6 Global Step: 32320 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 21:59:51,806-Speed 10490.02 samples/sec Loss 10.6756 LearningRate 0.4440 Epoch: 6 Global Step: 32330 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 21:59:59,595-Speed 10519.13 samples/sec Loss 10.5766 LearningRate 0.4439 Epoch: 6 Global Step: 32340 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:00:07,382-Speed 10521.82 samples/sec Loss 10.6444 LearningRate 0.4437 Epoch: 6 Global Step: 32350 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:00:15,170-Speed 10520.32 samples/sec Loss 10.7189 LearningRate 0.4436 Epoch: 6 Global Step: 32360 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:00:22,977-Speed 10494.52 samples/sec Loss 10.7001 LearningRate 0.4435 Epoch: 6 Global Step: 32370 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:00:30,775-Speed 10507.39 samples/sec Loss 10.6181 LearningRate 0.4434 Epoch: 6 Global Step: 32380 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:00:38,586-Speed 10489.14 samples/sec Loss 10.6366 LearningRate 0.4432 Epoch: 6 Global Step: 32390 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:00:46,361-Speed 10537.57 
samples/sec Loss 10.6775 LearningRate 0.4431 Epoch: 6 Global Step: 32400 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:00:54,184-Speed 10472.49 samples/sec Loss 10.6734 LearningRate 0.4430 Epoch: 6 Global Step: 32410 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 22:01:01,967-Speed 10527.11 samples/sec Loss 10.6276 LearningRate 0.4429 Epoch: 6 Global Step: 32420 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 22:01:09,788-Speed 10476.03 samples/sec Loss 10.6395 LearningRate 0.4427 Epoch: 6 Global Step: 32430 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 22:01:17,566-Speed 10533.51 samples/sec Loss 10.6435 LearningRate 0.4426 Epoch: 6 Global Step: 32440 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 22:01:25,352-Speed 10523.13 samples/sec Loss 10.6323 LearningRate 0.4425 Epoch: 6 Global Step: 32450 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 22:01:33,124-Speed 10542.37 samples/sec Loss 10.6749 LearningRate 0.4424 Epoch: 6 Global Step: 32460 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 22:01:40,911-Speed 10521.20 samples/sec Loss 10.6330 LearningRate 0.4422 Epoch: 6 Global Step: 32470 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 22:01:48,749-Speed 10455.95 samples/sec Loss 10.7165 LearningRate 0.4421 Epoch: 6 Global Step: 32480 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 22:01:56,541-Speed 10514.08 samples/sec Loss 10.6292 LearningRate 0.4420 Epoch: 6 Global Step: 32490 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 22:02:04,345-Speed 10499.11 samples/sec Loss 10.6677 LearningRate 0.4419 Epoch: 6 Global Step: 32500 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-01-15 22:02:12,138-Speed 10513.22 samples/sec Loss 10.6454 LearningRate 0.4417 Epoch: 6 Global Step: 32510 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:02:19,941-Speed 10499.91 samples/sec Loss 10.6312 
LearningRate 0.4416 Epoch: 6 Global Step: 32520 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:02:27,733-Speed 10513.97 samples/sec Loss 10.6374 LearningRate 0.4415 Epoch: 6 Global Step: 32530 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:02:35,537-Speed 10499.08 samples/sec Loss 10.6232 LearningRate 0.4414 Epoch: 6 Global Step: 32540 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:02:43,333-Speed 10509.49 samples/sec Loss 10.6333 LearningRate 0.4413 Epoch: 6 Global Step: 32550 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:02:51,115-Speed 10527.86 samples/sec Loss 10.5883 LearningRate 0.4411 Epoch: 6 Global Step: 32560 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:02:58,921-Speed 10496.94 samples/sec Loss 10.7083 LearningRate 0.4410 Epoch: 6 Global Step: 32570 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:03:06,706-Speed 10523.82 samples/sec Loss 10.6108 LearningRate 0.4409 Epoch: 6 Global Step: 32580 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:03:14,505-Speed 10505.00 samples/sec Loss 10.7116 LearningRate 0.4408 Epoch: 6 Global Step: 32590 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:03:22,314-Speed 10491.40 samples/sec Loss 10.6583 LearningRate 0.4406 Epoch: 6 Global Step: 32600 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:03:30,142-Speed 10468.06 samples/sec Loss 10.6008 LearningRate 0.4405 Epoch: 6 Global Step: 32610 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:03:37,952-Speed 10490.16 samples/sec Loss 10.6295 LearningRate 0.4404 Epoch: 6 Global Step: 32620 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:03:45,744-Speed 10514.14 samples/sec Loss 10.5695 LearningRate 0.4403 Epoch: 6 Global Step: 32630 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:03:53,555-Speed 10490.17 samples/sec Loss 10.6201 LearningRate 0.4401 
Epoch: 6 Global Step: 32640 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:04:01,414-Speed 10425.77 samples/sec Loss 10.6726 LearningRate 0.4400 Epoch: 6 Global Step: 32650 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:04:09,215-Speed 10503.54 samples/sec Loss 10.6730 LearningRate 0.4399 Epoch: 6 Global Step: 32660 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:04:17,014-Speed 10505.46 samples/sec Loss 10.6851 LearningRate 0.4398 Epoch: 6 Global Step: 32670 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:04:24,827-Speed 10486.76 samples/sec Loss 10.6245 LearningRate 0.4396 Epoch: 6 Global Step: 32680 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:04:32,639-Speed 10488.50 samples/sec Loss 10.5113 LearningRate 0.4395 Epoch: 6 Global Step: 32690 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:04:40,431-Speed 10513.57 samples/sec Loss 10.5455 LearningRate 0.4394 Epoch: 6 Global Step: 32700 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:04:48,211-Speed 10531.62 samples/sec Loss 10.5476 LearningRate 0.4393 Epoch: 6 Global Step: 32710 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:04:56,014-Speed 10499.34 samples/sec Loss 10.5654 LearningRate 0.4391 Epoch: 6 Global Step: 32720 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:05:03,817-Speed 10500.86 samples/sec Loss 10.5487 LearningRate 0.4390 Epoch: 6 Global Step: 32730 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:05:11,642-Speed 10470.33 samples/sec Loss 10.6973 LearningRate 0.4389 Epoch: 6 Global Step: 32740 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:05:19,439-Speed 10507.97 samples/sec Loss 10.6353 LearningRate 0.4388 Epoch: 6 Global Step: 32750 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:05:27,227-Speed 10520.30 samples/sec Loss 10.5988 LearningRate 0.4387 Epoch: 6 Global 
Step: 32760 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:05:35,030-Speed 10499.21 samples/sec Loss 10.5538 LearningRate 0.4385 Epoch: 6 Global Step: 32770 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:05:42,804-Speed 10539.69 samples/sec Loss 10.5515 LearningRate 0.4384 Epoch: 6 Global Step: 32780 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:05:50,629-Speed 10469.83 samples/sec Loss 10.6289 LearningRate 0.4383 Epoch: 6 Global Step: 32790 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:05:58,414-Speed 10524.12 samples/sec Loss 10.5462 LearningRate 0.4382 Epoch: 6 Global Step: 32800 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:06:06,194-Speed 10531.38 samples/sec Loss 10.6167 LearningRate 0.4380 Epoch: 6 Global Step: 32810 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:06:14,016-Speed 10474.65 samples/sec Loss 10.4915 LearningRate 0.4379 Epoch: 6 Global Step: 32820 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:06:21,826-Speed 10489.67 samples/sec Loss 10.6343 LearningRate 0.4378 Epoch: 6 Global Step: 32830 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:06:29,610-Speed 10526.63 samples/sec Loss 10.6528 LearningRate 0.4377 Epoch: 6 Global Step: 32840 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:06:37,431-Speed 10475.26 samples/sec Loss 10.5414 LearningRate 0.4375 Epoch: 6 Global Step: 32850 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:06:45,233-Speed 10500.75 samples/sec Loss 10.6421 LearningRate 0.4374 Epoch: 6 Global Step: 32860 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:06:53,026-Speed 10513.43 samples/sec Loss 10.5232 LearningRate 0.4373 Epoch: 6 Global Step: 32870 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:07:00,830-Speed 10498.55 samples/sec Loss 10.5911 LearningRate 0.4372 Epoch: 6 Global Step: 32880 Fp16 
Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:07:08,624-Speed 10512.73 samples/sec Loss 10.5261 LearningRate 0.4370 Epoch: 6 Global Step: 32890 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:07:16,405-Speed 10530.16 samples/sec Loss 10.6160 LearningRate 0.4369 Epoch: 6 Global Step: 32900 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:07:24,228-Speed 10472.26 samples/sec Loss 10.6107 LearningRate 0.4368 Epoch: 6 Global Step: 32910 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:07:32,049-Speed 10476.14 samples/sec Loss 10.5099 LearningRate 0.4367 Epoch: 6 Global Step: 32920 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:07:39,824-Speed 10538.29 samples/sec Loss 10.5937 LearningRate 0.4366 Epoch: 6 Global Step: 32930 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:07:47,621-Speed 10507.38 samples/sec Loss 10.5000 LearningRate 0.4364 Epoch: 6 Global Step: 32940 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:07:55,450-Speed 10465.40 samples/sec Loss 10.5598 LearningRate 0.4363 Epoch: 6 Global Step: 32950 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:08:03,224-Speed 10539.17 samples/sec Loss 10.6460 LearningRate 0.4362 Epoch: 6 Global Step: 32960 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:08:11,023-Speed 10505.52 samples/sec Loss 10.6313 LearningRate 0.4361 Epoch: 6 Global Step: 32970 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:08:18,857-Speed 10457.80 samples/sec Loss 10.5221 LearningRate 0.4359 Epoch: 6 Global Step: 32980 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:08:26,664-Speed 10494.69 samples/sec Loss 10.6902 LearningRate 0.4358 Epoch: 6 Global Step: 32990 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:08:34,476-Speed 10489.01 samples/sec Loss 10.5769 LearningRate 0.4357 Epoch: 6 Global Step: 33000 Fp16 Grad Scale: 262144 
Required: 16 hours Training: 2022-01-15 22:08:42,281-Speed 10497.50 samples/sec Loss 10.6221 LearningRate 0.4356 Epoch: 6 Global Step: 33010 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:08:50,068-Speed 10521.08 samples/sec Loss 10.5591 LearningRate 0.4354 Epoch: 6 Global Step: 33020 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:08:57,857-Speed 10518.91 samples/sec Loss 10.4766 LearningRate 0.4353 Epoch: 6 Global Step: 33030 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:09:05,639-Speed 10528.42 samples/sec Loss 10.4916 LearningRate 0.4352 Epoch: 6 Global Step: 33040 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:09:13,414-Speed 10537.58 samples/sec Loss 10.4823 LearningRate 0.4351 Epoch: 6 Global Step: 33050 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:09:21,215-Speed 10502.77 samples/sec Loss 10.5944 LearningRate 0.4349 Epoch: 6 Global Step: 33060 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:09:29,015-Speed 10504.44 samples/sec Loss 10.5560 LearningRate 0.4348 Epoch: 6 Global Step: 33070 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:09:36,812-Speed 10508.32 samples/sec Loss 10.4854 LearningRate 0.4347 Epoch: 6 Global Step: 33080 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:09:44,597-Speed 10524.34 samples/sec Loss 10.5458 LearningRate 0.4346 Epoch: 6 Global Step: 33090 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:09:52,396-Speed 10504.59 samples/sec Loss 10.4890 LearningRate 0.4345 Epoch: 6 Global Step: 33100 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:10:00,218-Speed 10475.06 samples/sec Loss 10.5854 LearningRate 0.4343 Epoch: 6 Global Step: 33110 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:10:08,008-Speed 10517.08 samples/sec Loss 10.4440 LearningRate 0.4342 Epoch: 6 Global Step: 33120 Fp16 Grad Scale: 262144 Required: 16 hours 
Training: 2022-01-15 22:10:15,791-Speed 10527.39 samples/sec Loss 10.4674 LearningRate 0.4341 Epoch: 6 Global Step: 33130 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-01-15 22:10:23,586-Speed 10511.52 samples/sec Loss 10.4409 LearningRate 0.4340 Epoch: 6 Global Step: 33140 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:10:31,369-Speed 10526.84 samples/sec Loss 10.5370 LearningRate 0.4338 Epoch: 6 Global Step: 33150 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:10:39,179-Speed 10489.39 samples/sec Loss 10.6172 LearningRate 0.4337 Epoch: 6 Global Step: 33160 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:10:47,002-Speed 10474.16 samples/sec Loss 10.6226 LearningRate 0.4336 Epoch: 6 Global Step: 33170 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:10:54,793-Speed 10521.08 samples/sec Loss 10.5520 LearningRate 0.4335 Epoch: 6 Global Step: 33180 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:11:02,599-Speed 10495.63 samples/sec Loss 10.5688 LearningRate 0.4333 Epoch: 6 Global Step: 33190 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:11:10,398-Speed 10504.87 samples/sec Loss 10.4723 LearningRate 0.4332 Epoch: 6 Global Step: 33200 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-01-15 22:11:18,203-Speed 10497.05 samples/sec Loss 10.4654 LearningRate 0.4331 Epoch: 6 Global Step: 33210 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:11:29,891-Speed 10539.31 samples/sec Loss 10.5022 LearningRate 0.4330 Epoch: 6 Global Step: 33220 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:11:37,657-Speed 10553.87 samples/sec Loss 10.4975 LearningRate 0.4329 Epoch: 6 Global Step: 33230 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:11:45,450-Speed 10512.37 samples/sec Loss 10.4904 LearningRate 0.4327 Epoch: 6 Global Step: 33240 Fp16 Grad Scale: 262144 Required: 15 hours Training: 
2022-01-15 22:11:53,276-Speed 10469.33 samples/sec Loss 10.4851 LearningRate 0.4326 Epoch: 6 Global Step: 33250 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:12:01,082-Speed 10496.40 samples/sec Loss 10.5133 LearningRate 0.4325 Epoch: 6 Global Step: 33260 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:12:08,869-Speed 10521.16 samples/sec Loss 10.5299 LearningRate 0.4324 Epoch: 6 Global Step: 33270 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:12:16,689-Speed 10480.33 samples/sec Loss 10.5138 LearningRate 0.4322 Epoch: 6 Global Step: 33280 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:12:24,481-Speed 10519.97 samples/sec Loss 10.5127 LearningRate 0.4321 Epoch: 6 Global Step: 33290 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:12:32,283-Speed 10502.58 samples/sec Loss 10.5040 LearningRate 0.4320 Epoch: 6 Global Step: 33300 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:12:40,085-Speed 10500.18 samples/sec Loss 10.5539 LearningRate 0.4319 Epoch: 6 Global Step: 33310 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:12:47,875-Speed 10518.02 samples/sec Loss 10.4786 LearningRate 0.4318 Epoch: 6 Global Step: 33320 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:12:55,662-Speed 10522.66 samples/sec Loss 10.5692 LearningRate 0.4316 Epoch: 6 Global Step: 33330 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:13:03,456-Speed 10511.88 samples/sec Loss 10.5745 LearningRate 0.4315 Epoch: 6 Global Step: 33340 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:13:11,277-Speed 10482.13 samples/sec Loss 10.4089 LearningRate 0.4314 Epoch: 6 Global Step: 33350 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:13:19,123-Speed 10442.61 samples/sec Loss 10.5198 LearningRate 0.4313 Epoch: 6 Global Step: 33360 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 
22:13:27,057-Speed 10327.23 samples/sec Loss 10.4282 LearningRate 0.4311 Epoch: 6 Global Step: 33370 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:13:34,927-Speed 10412.38 samples/sec Loss 10.5441 LearningRate 0.4310 Epoch: 6 Global Step: 33380 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:13:42,721-Speed 10512.26 samples/sec Loss 10.5607 LearningRate 0.4309 Epoch: 6 Global Step: 33390 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:13:50,546-Speed 10469.81 samples/sec Loss 10.5604 LearningRate 0.4308 Epoch: 6 Global Step: 33400 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:13:58,329-Speed 10527.52 samples/sec Loss 10.4541 LearningRate 0.4306 Epoch: 6 Global Step: 33410 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:14:06,116-Speed 10521.66 samples/sec Loss 10.5186 LearningRate 0.4305 Epoch: 6 Global Step: 33420 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:14:13,912-Speed 10509.51 samples/sec Loss 10.4762 LearningRate 0.4304 Epoch: 6 Global Step: 33430 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:14:21,711-Speed 10505.60 samples/sec Loss 10.4854 LearningRate 0.4303 Epoch: 6 Global Step: 33440 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:14:29,538-Speed 10468.40 samples/sec Loss 10.5265 LearningRate 0.4302 Epoch: 6 Global Step: 33450 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:14:37,388-Speed 10438.55 samples/sec Loss 10.6596 LearningRate 0.4300 Epoch: 6 Global Step: 33460 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:14:45,223-Speed 10457.42 samples/sec Loss 10.5177 LearningRate 0.4299 Epoch: 6 Global Step: 33470 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:14:53,007-Speed 10524.23 samples/sec Loss 10.4944 LearningRate 0.4298 Epoch: 6 Global Step: 33480 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:15:00,784-Speed 
10535.26 samples/sec Loss 10.4971 LearningRate 0.4297 Epoch: 6 Global Step: 33490 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:15:08,547-Speed 10554.35 samples/sec Loss 10.4550 LearningRate 0.4295 Epoch: 6 Global Step: 33500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:15:16,318-Speed 10542.53 samples/sec Loss 10.5359 LearningRate 0.4294 Epoch: 6 Global Step: 33510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:15:24,109-Speed 10516.34 samples/sec Loss 10.5272 LearningRate 0.4293 Epoch: 6 Global Step: 33520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:15:31,885-Speed 10538.09 samples/sec Loss 10.4958 LearningRate 0.4292 Epoch: 6 Global Step: 33530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:15:39,696-Speed 10489.35 samples/sec Loss 10.5394 LearningRate 0.4291 Epoch: 6 Global Step: 33540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:15:47,487-Speed 10516.65 samples/sec Loss 10.4735 LearningRate 0.4289 Epoch: 6 Global Step: 33550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:15:55,276-Speed 10517.61 samples/sec Loss 10.4876 LearningRate 0.4288 Epoch: 6 Global Step: 33560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:16:03,078-Speed 10502.39 samples/sec Loss 10.4114 LearningRate 0.4287 Epoch: 6 Global Step: 33570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:16:10,876-Speed 10506.00 samples/sec Loss 10.5213 LearningRate 0.4286 Epoch: 6 Global Step: 33580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:16:18,665-Speed 10524.29 samples/sec Loss 10.5212 LearningRate 0.4284 Epoch: 6 Global Step: 33590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:16:26,467-Speed 10500.59 samples/sec Loss 10.4664 LearningRate 0.4283 Epoch: 6 Global Step: 33600 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:16:34,243-Speed 10536.89 samples/sec Loss 
10.4052 LearningRate 0.4282 Epoch: 6 Global Step: 33610 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:16:42,078-Speed 10457.02 samples/sec Loss 10.4805 LearningRate 0.4281 Epoch: 6 Global Step: 33620 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:16:49,899-Speed 10475.95 samples/sec Loss 10.4657 LearningRate 0.4280 Epoch: 6 Global Step: 33630 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:16:57,694-Speed 10512.54 samples/sec Loss 10.5832 LearningRate 0.4278 Epoch: 6 Global Step: 33640 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:17:05,487-Speed 10513.23 samples/sec Loss 10.4893 LearningRate 0.4277 Epoch: 6 Global Step: 33650 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:17:13,266-Speed 10532.63 samples/sec Loss 10.4614 LearningRate 0.4276 Epoch: 6 Global Step: 33660 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:17:21,053-Speed 10521.39 samples/sec Loss 10.5049 LearningRate 0.4275 Epoch: 6 Global Step: 33670 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:17:28,887-Speed 10459.04 samples/sec Loss 10.4114 LearningRate 0.4273 Epoch: 6 Global Step: 33680 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:17:36,679-Speed 10515.20 samples/sec Loss 10.4969 LearningRate 0.4272 Epoch: 6 Global Step: 33690 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:17:44,455-Speed 10535.73 samples/sec Loss 10.4505 LearningRate 0.4271 Epoch: 6 Global Step: 33700 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:17:52,271-Speed 10483.52 samples/sec Loss 10.4923 LearningRate 0.4270 Epoch: 6 Global Step: 33710 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:18:00,072-Speed 10502.76 samples/sec Loss 10.4377 LearningRate 0.4269 Epoch: 6 Global Step: 33720 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:18:07,859-Speed 10520.85 samples/sec Loss 10.4228 
LearningRate 0.4267 Epoch: 6 Global Step: 33730 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:18:15,667-Speed 10493.29 samples/sec Loss 10.5014 LearningRate 0.4266 Epoch: 6 Global Step: 33740 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:18:23,468-Speed 10502.67 samples/sec Loss 10.4267 LearningRate 0.4265 Epoch: 6 Global Step: 33750 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:18:31,262-Speed 10511.73 samples/sec Loss 10.4078 LearningRate 0.4264 Epoch: 6 Global Step: 33760 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:18:39,126-Speed 10418.42 samples/sec Loss 10.4205 LearningRate 0.4262 Epoch: 6 Global Step: 33770 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:18:46,936-Speed 10491.00 samples/sec Loss 10.4661 LearningRate 0.4261 Epoch: 6 Global Step: 33780 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:18:54,740-Speed 10497.52 samples/sec Loss 10.3963 LearningRate 0.4260 Epoch: 6 Global Step: 33790 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:19:02,549-Speed 10493.71 samples/sec Loss 10.5014 LearningRate 0.4259 Epoch: 6 Global Step: 33800 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:19:10,367-Speed 10479.58 samples/sec Loss 10.4093 LearningRate 0.4258 Epoch: 6 Global Step: 33810 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:19:18,161-Speed 10510.92 samples/sec Loss 10.3447 LearningRate 0.4256 Epoch: 6 Global Step: 33820 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:19:25,966-Speed 10498.17 samples/sec Loss 10.3748 LearningRate 0.4255 Epoch: 6 Global Step: 33830 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:19:33,753-Speed 10522.24 samples/sec Loss 10.3804 LearningRate 0.4254 Epoch: 6 Global Step: 33840 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:19:41,548-Speed 10509.35 samples/sec Loss 10.4880 LearningRate 0.4253 
Epoch: 6 Global Step: 33850 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:19:49,406-Speed 10426.32 samples/sec Loss 10.4622 LearningRate 0.4251 Epoch: 6 Global Step: 33860 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:19:57,186-Speed 10531.45 samples/sec Loss 10.4266 LearningRate 0.4250 Epoch: 6 Global Step: 33870 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:20:05,050-Speed 10418.90 samples/sec Loss 10.4222 LearningRate 0.4249 Epoch: 6 Global Step: 33880 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:20:12,874-Speed 10471.69 samples/sec Loss 10.4049 LearningRate 0.4248 Epoch: 6 Global Step: 33890 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:20:20,664-Speed 10516.74 samples/sec Loss 10.5095 LearningRate 0.4247 Epoch: 6 Global Step: 33900 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:20:28,445-Speed 10530.69 samples/sec Loss 10.4320 LearningRate 0.4245 Epoch: 6 Global Step: 33910 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:20:36,241-Speed 10508.74 samples/sec Loss 10.4435 LearningRate 0.4244 Epoch: 6 Global Step: 33920 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:20:44,012-Speed 10542.61 samples/sec Loss 10.5016 LearningRate 0.4243 Epoch: 6 Global Step: 33930 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:20:51,834-Speed 10473.39 samples/sec Loss 10.4545 LearningRate 0.4242 Epoch: 6 Global Step: 33940 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:20:59,650-Speed 10483.33 samples/sec Loss 10.4675 LearningRate 0.4241 Epoch: 6 Global Step: 33950 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:21:07,487-Speed 10455.71 samples/sec Loss 10.4204 LearningRate 0.4239 Epoch: 6 Global Step: 33960 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:21:15,287-Speed 10503.10 samples/sec Loss 10.4097 LearningRate 0.4238 Epoch: 6 Global 
Step: 33970 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:21:23,078-Speed 10516.68 samples/sec Loss 10.4130 LearningRate 0.4237 Epoch: 6 Global Step: 33980 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:21:30,919-Speed 10449.41 samples/sec Loss 10.5152 LearningRate 0.4236 Epoch: 6 Global Step: 33990 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:21:38,699-Speed 10531.09 samples/sec Loss 10.4041 LearningRate 0.4234 Epoch: 6 Global Step: 34000 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:21:46,489-Speed 10516.67 samples/sec Loss 10.4725 LearningRate 0.4233 Epoch: 6 Global Step: 34010 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:21:54,309-Speed 10476.95 samples/sec Loss 10.3935 LearningRate 0.4232 Epoch: 6 Global Step: 34020 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:22:02,134-Speed 10469.98 samples/sec Loss 10.3942 LearningRate 0.4231 Epoch: 6 Global Step: 34030 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:22:09,964-Speed 10464.29 samples/sec Loss 10.3930 LearningRate 0.4230 Epoch: 6 Global Step: 34040 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:22:17,767-Speed 10500.09 samples/sec Loss 10.3796 LearningRate 0.4228 Epoch: 6 Global Step: 34050 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:22:25,634-Speed 10413.84 samples/sec Loss 10.3586 LearningRate 0.4227 Epoch: 6 Global Step: 34060 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:22:33,433-Speed 10506.29 samples/sec Loss 10.4057 LearningRate 0.4226 Epoch: 6 Global Step: 34070 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:22:41,227-Speed 10512.00 samples/sec Loss 10.4061 LearningRate 0.4225 Epoch: 6 Global Step: 34080 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:22:49,047-Speed 10476.14 samples/sec Loss 10.3241 LearningRate 0.4224 Epoch: 6 Global Step: 34090 Fp16 
Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:22:56,858-Speed 10489.18 samples/sec Loss 10.3491 LearningRate 0.4222 Epoch: 6 Global Step: 34100 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:23:04,666-Speed 10493.33 samples/sec Loss 10.5251 LearningRate 0.4221 Epoch: 6 Global Step: 34110 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:23:12,481-Speed 10485.10 samples/sec Loss 10.4789 LearningRate 0.4220 Epoch: 6 Global Step: 34120 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:23:20,257-Speed 10537.07 samples/sec Loss 10.3522 LearningRate 0.4219 Epoch: 6 Global Step: 34130 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:23:28,097-Speed 10450.74 samples/sec Loss 10.3076 LearningRate 0.4217 Epoch: 6 Global Step: 34140 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:23:35,895-Speed 10506.30 samples/sec Loss 10.4003 LearningRate 0.4216 Epoch: 6 Global Step: 34150 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:23:43,698-Speed 10501.47 samples/sec Loss 10.4089 LearningRate 0.4215 Epoch: 6 Global Step: 34160 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:23:51,512-Speed 10485.35 samples/sec Loss 10.4182 LearningRate 0.4214 Epoch: 6 Global Step: 34170 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:23:59,314-Speed 10501.21 samples/sec Loss 10.3687 LearningRate 0.4213 Epoch: 6 Global Step: 34180 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:24:07,090-Speed 10537.61 samples/sec Loss 10.3378 LearningRate 0.4211 Epoch: 6 Global Step: 34190 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:24:14,944-Speed 10432.56 samples/sec Loss 10.3503 LearningRate 0.4210 Epoch: 6 Global Step: 34200 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:24:22,739-Speed 10510.11 samples/sec Loss 10.4220 LearningRate 0.4209 Epoch: 6 Global Step: 34210 Fp16 Grad Scale: 131072 
Required: 15 hours Training: 2022-01-15 22:24:30,538-Speed 10505.13 samples/sec Loss 10.4507 LearningRate 0.4208 Epoch: 6 Global Step: 34220 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:24:38,348-Speed 10491.38 samples/sec Loss 10.4566 LearningRate 0.4207 Epoch: 6 Global Step: 34230 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:24:46,145-Speed 10508.85 samples/sec Loss 10.4310 LearningRate 0.4205 Epoch: 6 Global Step: 34240 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:24:53,932-Speed 10521.00 samples/sec Loss 10.3583 LearningRate 0.4204 Epoch: 6 Global Step: 34250 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:25:01,726-Speed 10510.99 samples/sec Loss 10.3753 LearningRate 0.4203 Epoch: 6 Global Step: 34260 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:25:09,513-Speed 10522.31 samples/sec Loss 10.4206 LearningRate 0.4202 Epoch: 6 Global Step: 34270 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:25:17,316-Speed 10500.10 samples/sec Loss 10.3905 LearningRate 0.4200 Epoch: 6 Global Step: 34280 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:25:25,152-Speed 10454.99 samples/sec Loss 10.3566 LearningRate 0.4199 Epoch: 6 Global Step: 34290 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:25:34,948-Speed 8363.11 samples/sec Loss 10.3355 LearningRate 0.4198 Epoch: 6 Global Step: 34300 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:25:42,734-Speed 10523.72 samples/sec Loss 10.3927 LearningRate 0.4197 Epoch: 6 Global Step: 34310 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:25:50,568-Speed 10459.11 samples/sec Loss 10.3844 LearningRate 0.4196 Epoch: 6 Global Step: 34320 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:25:58,400-Speed 10461.21 samples/sec Loss 10.3651 LearningRate 0.4194 Epoch: 6 Global Step: 34330 Fp16 Grad Scale: 131072 Required: 15 hours 
Training: 2022-01-15 22:26:06,202-Speed 10501.97 samples/sec Loss 10.2941 LearningRate 0.4193 Epoch: 6 Global Step: 34340 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:26:14,004-Speed 10501.22 samples/sec Loss 10.3646 LearningRate 0.4192 Epoch: 6 Global Step: 34350 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:26:21,792-Speed 10520.58 samples/sec Loss 10.2838 LearningRate 0.4191 Epoch: 6 Global Step: 34360 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:26:29,618-Speed 10469.27 samples/sec Loss 10.3613 LearningRate 0.4190 Epoch: 6 Global Step: 34370 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:26:37,413-Speed 10510.41 samples/sec Loss 10.3555 LearningRate 0.4188 Epoch: 6 Global Step: 34380 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:26:45,238-Speed 10470.42 samples/sec Loss 10.4727 LearningRate 0.4187 Epoch: 6 Global Step: 34390 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:26:53,053-Speed 10484.62 samples/sec Loss 10.3791 LearningRate 0.4186 Epoch: 6 Global Step: 34400 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:27:00,833-Speed 10530.87 samples/sec Loss 10.2950 LearningRate 0.4185 Epoch: 6 Global Step: 34410 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:27:08,614-Speed 10528.87 samples/sec Loss 10.3685 LearningRate 0.4184 Epoch: 6 Global Step: 34420 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:27:16,394-Speed 10531.41 samples/sec Loss 10.3030 LearningRate 0.4182 Epoch: 6 Global Step: 34430 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:27:24,186-Speed 10514.08 samples/sec Loss 10.3727 LearningRate 0.4181 Epoch: 6 Global Step: 34440 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:27:31,979-Speed 10512.94 samples/sec Loss 10.3339 LearningRate 0.4180 Epoch: 6 Global Step: 34450 Fp16 Grad Scale: 262144 Required: 15 hours Training: 
2022-01-15 22:27:39,768-Speed 10519.26 samples/sec Loss 10.2887 LearningRate 0.4179 Epoch: 6 Global Step: 34460 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:27:47,568-Speed 10504.35 samples/sec Loss 10.3046 LearningRate 0.4178 Epoch: 6 Global Step: 34470 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:27:55,338-Speed 10544.30 samples/sec Loss 10.2943 LearningRate 0.4176 Epoch: 6 Global Step: 34480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:28:03,124-Speed 10523.45 samples/sec Loss 10.3520 LearningRate 0.4175 Epoch: 6 Global Step: 34490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:28:10,908-Speed 10525.75 samples/sec Loss 10.3795 LearningRate 0.4174 Epoch: 6 Global Step: 34500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:28:18,704-Speed 10508.74 samples/sec Loss 10.2875 LearningRate 0.4173 Epoch: 6 Global Step: 34510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:28:26,497-Speed 10513.85 samples/sec Loss 10.4088 LearningRate 0.4171 Epoch: 6 Global Step: 34520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:28:34,361-Speed 10418.14 samples/sec Loss 10.3360 LearningRate 0.4170 Epoch: 6 Global Step: 34530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:28:42,165-Speed 10498.42 samples/sec Loss 10.3390 LearningRate 0.4169 Epoch: 6 Global Step: 34540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:28:49,957-Speed 10515.89 samples/sec Loss 10.2853 LearningRate 0.4168 Epoch: 6 Global Step: 34550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:28:57,764-Speed 10493.86 samples/sec Loss 10.2905 LearningRate 0.4167 Epoch: 6 Global Step: 34560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:29:05,607-Speed 10447.02 samples/sec Loss 10.2717 LearningRate 0.4165 Epoch: 6 Global Step: 34570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:29:13,433-Speed 
10469.89 samples/sec Loss 10.3090 LearningRate 0.4164 Epoch: 6 Global Step: 34580 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:29:21,259-Speed 10472.45 samples/sec Loss 10.3731 LearningRate 0.4163 Epoch: 6 Global Step: 34590 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:29:29,081-Speed 10475.46 samples/sec Loss 10.3734 LearningRate 0.4162 Epoch: 6 Global Step: 34600 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:29:36,899-Speed 10479.87 samples/sec Loss 10.3853 LearningRate 0.4161 Epoch: 6 Global Step: 34610 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:29:44,687-Speed 10520.51 samples/sec Loss 10.3637 LearningRate 0.4159 Epoch: 6 Global Step: 34620 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:29:52,496-Speed 10492.02 samples/sec Loss 10.3101 LearningRate 0.4158 Epoch: 6 Global Step: 34630 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:30:00,323-Speed 10466.90 samples/sec Loss 10.2283 LearningRate 0.4157 Epoch: 6 Global Step: 34640 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:30:08,149-Speed 10469.43 samples/sec Loss 10.3284 LearningRate 0.4156 Epoch: 6 Global Step: 34650 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:30:15,953-Speed 10499.08 samples/sec Loss 10.3424 LearningRate 0.4155 Epoch: 6 Global Step: 34660 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:30:23,760-Speed 10494.17 samples/sec Loss 10.3643 LearningRate 0.4153 Epoch: 6 Global Step: 34670 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:30:31,547-Speed 10521.41 samples/sec Loss 10.2631 LearningRate 0.4152 Epoch: 6 Global Step: 34680 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:30:39,380-Speed 10459.28 samples/sec Loss 10.2201 LearningRate 0.4151 Epoch: 6 Global Step: 34690 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:30:47,204-Speed 10471.87 
samples/sec Loss 10.2544 LearningRate 0.4150 Epoch: 6 Global Step: 34700 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:30:55,003-Speed 10506.33 samples/sec Loss 10.2586 LearningRate 0.4149 Epoch: 6 Global Step: 34710 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:31:02,804-Speed 10502.21 samples/sec Loss 10.2859 LearningRate 0.4147 Epoch: 6 Global Step: 34720 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:31:10,593-Speed 10518.61 samples/sec Loss 10.3134 LearningRate 0.4146 Epoch: 6 Global Step: 34730 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:31:18,387-Speed 10513.33 samples/sec Loss 10.3116 LearningRate 0.4145 Epoch: 6 Global Step: 34740 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:31:26,188-Speed 10502.05 samples/sec Loss 10.3641 LearningRate 0.4144 Epoch: 6 Global Step: 34750 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:31:34,008-Speed 10477.00 samples/sec Loss 10.2577 LearningRate 0.4143 Epoch: 6 Global Step: 34760 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:31:41,818-Speed 10492.08 samples/sec Loss 10.3184 LearningRate 0.4141 Epoch: 6 Global Step: 34770 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:31:49,611-Speed 10514.53 samples/sec Loss 10.3019 LearningRate 0.4140 Epoch: 6 Global Step: 34780 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:31:57,431-Speed 10476.78 samples/sec Loss 10.3422 LearningRate 0.4139 Epoch: 6 Global Step: 34790 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:32:05,244-Speed 10486.02 samples/sec Loss 10.2001 LearningRate 0.4138 Epoch: 6 Global Step: 34800 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:32:13,049-Speed 10498.49 samples/sec Loss 10.3051 LearningRate 0.4137 Epoch: 6 Global Step: 34810 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:32:20,860-Speed 10488.78 samples/sec Loss 
10.3073 LearningRate 0.4135 Epoch: 6 Global Step: 34820 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:32:28,678-Speed 10479.64 samples/sec Loss 10.2470 LearningRate 0.4134 Epoch: 6 Global Step: 34830 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:32:36,505-Speed 10468.48 samples/sec Loss 10.2667 LearningRate 0.4133 Epoch: 6 Global Step: 34840 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:32:44,304-Speed 10505.65 samples/sec Loss 10.2982 LearningRate 0.4132 Epoch: 6 Global Step: 34850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:32:52,106-Speed 10502.36 samples/sec Loss 10.2609 LearningRate 0.4131 Epoch: 6 Global Step: 34860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:32:59,879-Speed 10539.17 samples/sec Loss 10.3130 LearningRate 0.4129 Epoch: 6 Global Step: 34870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:33:07,718-Speed 10452.57 samples/sec Loss 10.3175 LearningRate 0.4128 Epoch: 6 Global Step: 34880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:33:15,523-Speed 10499.02 samples/sec Loss 10.2988 LearningRate 0.4127 Epoch: 6 Global Step: 34890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:33:23,315-Speed 10514.81 samples/sec Loss 10.2594 LearningRate 0.4126 Epoch: 6 Global Step: 34900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:33:31,117-Speed 10501.83 samples/sec Loss 10.2875 LearningRate 0.4125 Epoch: 6 Global Step: 34910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:33:38,933-Speed 10482.96 samples/sec Loss 10.1869 LearningRate 0.4123 Epoch: 6 Global Step: 34920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:33:46,714-Speed 10529.23 samples/sec Loss 10.2527 LearningRate 0.4122 Epoch: 6 Global Step: 34930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:33:54,497-Speed 10527.74 samples/sec Loss 10.2810 LearningRate 0.4121 
Epoch: 6 Global Step: 34940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:34:02,279-Speed 10526.99 samples/sec Loss 10.1922 LearningRate 0.4120 Epoch: 6 Global Step: 34950 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:34:10,069-Speed 10518.19 samples/sec Loss 10.2219 LearningRate 0.4119 Epoch: 6 Global Step: 34960 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:34:17,916-Speed 10441.16 samples/sec Loss 10.3222 LearningRate 0.4117 Epoch: 6 Global Step: 34970 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:34:25,728-Speed 10488.34 samples/sec Loss 10.2223 LearningRate 0.4116 Epoch: 6 Global Step: 34980 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:34:33,517-Speed 10518.28 samples/sec Loss 10.2802 LearningRate 0.4115 Epoch: 6 Global Step: 34990 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:34:41,314-Speed 10509.16 samples/sec Loss 10.1400 LearningRate 0.4114 Epoch: 6 Global Step: 35000 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:34:49,116-Speed 10501.66 samples/sec Loss 10.2349 LearningRate 0.4113 Epoch: 6 Global Step: 35010 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:34:56,897-Speed 10530.65 samples/sec Loss 10.1922 LearningRate 0.4111 Epoch: 6 Global Step: 35020 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:35:04,751-Speed 10430.60 samples/sec Loss 10.1918 LearningRate 0.4110 Epoch: 6 Global Step: 35030 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:35:12,558-Speed 10494.53 samples/sec Loss 10.2691 LearningRate 0.4109 Epoch: 6 Global Step: 35040 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:35:20,379-Speed 10476.82 samples/sec Loss 10.3418 LearningRate 0.4108 Epoch: 6 Global Step: 35050 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:35:28,198-Speed 10478.84 samples/sec Loss 10.2964 LearningRate 0.4107 Epoch: 6 Global 
Step: 35060 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:35:36,001-Speed 10499.14 samples/sec Loss 10.2977 LearningRate 0.4105 Epoch: 6 Global Step: 35070 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:35:43,789-Speed 10520.70 samples/sec Loss 10.2428 LearningRate 0.4104 Epoch: 6 Global Step: 35080 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:35:51,621-Speed 10460.64 samples/sec Loss 10.1852 LearningRate 0.4103 Epoch: 6 Global Step: 35090 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:35:59,397-Speed 10537.02 samples/sec Loss 10.2114 LearningRate 0.4102 Epoch: 6 Global Step: 35100 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:36:07,181-Speed 10525.46 samples/sec Loss 10.2644 LearningRate 0.4101 Epoch: 6 Global Step: 35110 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:36:14,959-Speed 10533.45 samples/sec Loss 10.3309 LearningRate 0.4099 Epoch: 6 Global Step: 35120 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:36:22,763-Speed 10498.58 samples/sec Loss 10.2290 LearningRate 0.4098 Epoch: 6 Global Step: 35130 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:36:30,556-Speed 10513.84 samples/sec Loss 10.2214 LearningRate 0.4097 Epoch: 6 Global Step: 35140 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:36:38,351-Speed 10511.12 samples/sec Loss 10.2501 LearningRate 0.4096 Epoch: 6 Global Step: 35150 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:36:46,137-Speed 10522.15 samples/sec Loss 10.1938 LearningRate 0.4095 Epoch: 6 Global Step: 35160 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:36:53,941-Speed 10498.26 samples/sec Loss 10.2176 LearningRate 0.4093 Epoch: 6 Global Step: 35170 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:37:01,732-Speed 10517.43 samples/sec Loss 10.1716 LearningRate 0.4092 Epoch: 6 Global Step: 35180 Fp16 
Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:37:09,534-Speed 10501.48 samples/sec Loss 10.2138 LearningRate 0.4091 Epoch: 6 Global Step: 35190 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:37:17,341-Speed 10494.17 samples/sec Loss 10.2632 LearningRate 0.4090 Epoch: 6 Global Step: 35200 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:37:25,127-Speed 10523.88 samples/sec Loss 10.2307 LearningRate 0.4089 Epoch: 6 Global Step: 35210 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:37:32,915-Speed 10518.88 samples/sec Loss 10.2400 LearningRate 0.4087 Epoch: 6 Global Step: 35220 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:37:40,723-Speed 10493.75 samples/sec Loss 10.1674 LearningRate 0.4086 Epoch: 6 Global Step: 35230 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:37:48,523-Speed 10505.10 samples/sec Loss 10.2045 LearningRate 0.4085 Epoch: 6 Global Step: 35240 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:37:56,313-Speed 10515.86 samples/sec Loss 10.2509 LearningRate 0.4084 Epoch: 6 Global Step: 35250 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:38:04,113-Speed 10504.18 samples/sec Loss 10.2225 LearningRate 0.4083 Epoch: 6 Global Step: 35260 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:38:11,918-Speed 10497.89 samples/sec Loss 10.1475 LearningRate 0.4082 Epoch: 6 Global Step: 35270 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:38:19,753-Speed 10458.20 samples/sec Loss 10.1916 LearningRate 0.4080 Epoch: 6 Global Step: 35280 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:38:27,541-Speed 10518.90 samples/sec Loss 10.1822 LearningRate 0.4079 Epoch: 6 Global Step: 35290 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:38:35,326-Speed 10524.83 samples/sec Loss 10.4451 LearningRate 0.4078 Epoch: 6 Global Step: 35300 Fp16 Grad Scale: 131072 
Required: 15 hours Training: 2022-01-15 22:38:43,128-Speed 10501.40 samples/sec Loss 10.2437 LearningRate 0.4077 Epoch: 6 Global Step: 35310 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:38:50,907-Speed 10533.53 samples/sec Loss 10.1713 LearningRate 0.4076 Epoch: 6 Global Step: 35320 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:38:58,696-Speed 10518.33 samples/sec Loss 10.1431 LearningRate 0.4074 Epoch: 6 Global Step: 35330 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:39:06,504-Speed 10493.38 samples/sec Loss 10.2169 LearningRate 0.4073 Epoch: 6 Global Step: 35340 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:39:14,291-Speed 10521.30 samples/sec Loss 10.1834 LearningRate 0.4072 Epoch: 6 Global Step: 35350 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:39:22,097-Speed 10497.05 samples/sec Loss 10.1921 LearningRate 0.4071 Epoch: 6 Global Step: 35360 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:39:29,937-Speed 10449.53 samples/sec Loss 10.0828 LearningRate 0.4070 Epoch: 6 Global Step: 35370 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:39:37,734-Speed 10508.91 samples/sec Loss 10.1845 LearningRate 0.4068 Epoch: 6 Global Step: 35380 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:39:45,500-Speed 10550.42 samples/sec Loss 10.1933 LearningRate 0.4067 Epoch: 6 Global Step: 35390 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:39:53,298-Speed 10507.30 samples/sec Loss 10.2096 LearningRate 0.4066 Epoch: 6 Global Step: 35400 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:40:01,127-Speed 10471.50 samples/sec Loss 10.1330 LearningRate 0.4065 Epoch: 6 Global Step: 35410 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:40:08,903-Speed 10535.66 samples/sec Loss 10.2892 LearningRate 0.4064 Epoch: 6 Global Step: 35420 Fp16 Grad Scale: 262144 Required: 15 hours 
Training: 2022-01-15 22:40:16,733-Speed 10463.77 samples/sec Loss 10.2176 LearningRate 0.4062 Epoch: 6 Global Step: 35430 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:40:24,531-Speed 10507.25 samples/sec Loss 10.1717 LearningRate 0.4061 Epoch: 6 Global Step: 35440 Fp16 Grad Scale: 524288 Required: 15 hours Training: 2022-01-15 22:40:32,307-Speed 10535.86 samples/sec Loss 10.2507 LearningRate 0.4060 Epoch: 6 Global Step: 35450 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:40:40,096-Speed 10519.72 samples/sec Loss 10.1886 LearningRate 0.4059 Epoch: 6 Global Step: 35460 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:40:47,888-Speed 10515.97 samples/sec Loss 10.1828 LearningRate 0.4058 Epoch: 6 Global Step: 35470 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:40:55,669-Speed 10528.37 samples/sec Loss 10.1087 LearningRate 0.4056 Epoch: 6 Global Step: 35480 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:41:03,483-Speed 10485.69 samples/sec Loss 10.1512 LearningRate 0.4055 Epoch: 6 Global Step: 35490 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:41:11,289-Speed 10496.74 samples/sec Loss 10.2476 LearningRate 0.4054 Epoch: 6 Global Step: 35500 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:41:19,064-Speed 10537.73 samples/sec Loss 10.1678 LearningRate 0.4053 Epoch: 6 Global Step: 35510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:41:26,880-Speed 10481.59 samples/sec Loss 10.1656 LearningRate 0.4052 Epoch: 6 Global Step: 35520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:41:34,667-Speed 10521.97 samples/sec Loss 10.1803 LearningRate 0.4051 Epoch: 6 Global Step: 35530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:41:42,450-Speed 10527.01 samples/sec Loss 10.1455 LearningRate 0.4049 Epoch: 6 Global Step: 35540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 
22:41:50,234-Speed 10526.26 samples/sec Loss 10.1985 LearningRate 0.4048 Epoch: 6 Global Step: 35550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:41:58,014-Speed 10531.03 samples/sec Loss 10.1557 LearningRate 0.4047 Epoch: 6 Global Step: 35560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:42:05,831-Speed 10480.70 samples/sec Loss 10.2243 LearningRate 0.4046 Epoch: 6 Global Step: 35570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:42:13,633-Speed 10501.04 samples/sec Loss 10.1208 LearningRate 0.4045 Epoch: 6 Global Step: 35580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:42:21,430-Speed 10507.54 samples/sec Loss 10.1229 LearningRate 0.4043 Epoch: 6 Global Step: 35590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:42:29,231-Speed 10503.87 samples/sec Loss 10.1603 LearningRate 0.4042 Epoch: 6 Global Step: 35600 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-01-15 22:42:37,034-Speed 10499.02 samples/sec Loss 10.1534 LearningRate 0.4041 Epoch: 6 Global Step: 35610 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:42:44,830-Speed 10509.63 samples/sec Loss 10.1287 LearningRate 0.4040 Epoch: 6 Global Step: 35620 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:42:52,612-Speed 10528.68 samples/sec Loss 10.1101 LearningRate 0.4039 Epoch: 6 Global Step: 35630 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:43:00,417-Speed 10502.23 samples/sec Loss 10.1432 LearningRate 0.4037 Epoch: 6 Global Step: 35640 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:43:08,216-Speed 10504.60 samples/sec Loss 10.1317 LearningRate 0.4036 Epoch: 6 Global Step: 35650 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:43:16,010-Speed 10512.86 samples/sec Loss 10.1613 LearningRate 0.4035 Epoch: 6 Global Step: 35660 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:43:23,808-Speed 
10505.22 samples/sec Loss 10.1808 LearningRate 0.4034 Epoch: 6 Global Step: 35670 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:43:31,612-Speed 10499.66 samples/sec Loss 10.1017 LearningRate 0.4033 Epoch: 6 Global Step: 35680 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:43:39,414-Speed 10500.64 samples/sec Loss 10.1121 LearningRate 0.4032 Epoch: 6 Global Step: 35690 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:43:47,186-Speed 10542.48 samples/sec Loss 10.1938 LearningRate 0.4030 Epoch: 6 Global Step: 35700 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:43:54,990-Speed 10498.34 samples/sec Loss 10.1420 LearningRate 0.4029 Epoch: 6 Global Step: 35710 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:44:02,772-Speed 10528.99 samples/sec Loss 10.0436 LearningRate 0.4028 Epoch: 6 Global Step: 35720 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:44:10,581-Speed 10490.98 samples/sec Loss 10.1540 LearningRate 0.4027 Epoch: 6 Global Step: 35730 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:44:18,393-Speed 10488.17 samples/sec Loss 10.1031 LearningRate 0.4026 Epoch: 6 Global Step: 35740 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:44:26,188-Speed 10511.54 samples/sec Loss 10.1728 LearningRate 0.4024 Epoch: 6 Global Step: 35750 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:44:34,025-Speed 10453.87 samples/sec Loss 10.1074 LearningRate 0.4023 Epoch: 6 Global Step: 35760 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:44:41,822-Speed 10508.53 samples/sec Loss 10.1139 LearningRate 0.4022 Epoch: 6 Global Step: 35770 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:44:49,620-Speed 10506.40 samples/sec Loss 10.1197 LearningRate 0.4021 Epoch: 6 Global Step: 35780 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:44:57,413-Speed 10513.48 
samples/sec Loss 10.1098 LearningRate 0.4020 Epoch: 6 Global Step: 35790 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:45:05,237-Speed 10471.89 samples/sec Loss 10.1254 LearningRate 0.4019 Epoch: 6 Global Step: 35800 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:45:13,067-Speed 10464.87 samples/sec Loss 10.1410 LearningRate 0.4017 Epoch: 6 Global Step: 35810 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:45:20,861-Speed 10511.80 samples/sec Loss 10.1325 LearningRate 0.4016 Epoch: 6 Global Step: 35820 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:45:28,658-Speed 10508.07 samples/sec Loss 10.1778 LearningRate 0.4015 Epoch: 6 Global Step: 35830 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:45:36,454-Speed 10509.02 samples/sec Loss 10.1465 LearningRate 0.4014 Epoch: 6 Global Step: 35840 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:45:44,240-Speed 10522.33 samples/sec Loss 10.0990 LearningRate 0.4013 Epoch: 6 Global Step: 35850 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:45:52,048-Speed 10494.03 samples/sec Loss 10.0820 LearningRate 0.4011 Epoch: 6 Global Step: 35860 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:45:59,837-Speed 10517.29 samples/sec Loss 10.1477 LearningRate 0.4010 Epoch: 6 Global Step: 35870 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:46:07,620-Speed 10527.29 samples/sec Loss 10.0733 LearningRate 0.4009 Epoch: 6 Global Step: 35880 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:46:15,423-Speed 10500.36 samples/sec Loss 10.0859 LearningRate 0.4008 Epoch: 6 Global Step: 35890 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:46:23,240-Speed 10480.92 samples/sec Loss 10.0933 LearningRate 0.4007 Epoch: 6 Global Step: 35900 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:46:31,023-Speed 10527.11 samples/sec Loss 
10.2055 LearningRate 0.4005 Epoch: 6 Global Step: 35910 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:46:38,840-Speed 10484.33 samples/sec Loss 10.1305 LearningRate 0.4004 Epoch: 6 Global Step: 35920 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:46:46,636-Speed 10508.89 samples/sec Loss 10.1122 LearningRate 0.4003 Epoch: 6 Global Step: 35930 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:46:54,450-Speed 10484.57 samples/sec Loss 10.0682 LearningRate 0.4002 Epoch: 6 Global Step: 35940 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:47:02,271-Speed 10476.62 samples/sec Loss 10.2006 LearningRate 0.4001 Epoch: 6 Global Step: 35950 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:47:10,130-Speed 10425.13 samples/sec Loss 10.1366 LearningRate 0.4000 Epoch: 6 Global Step: 35960 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:47:17,906-Speed 10537.13 samples/sec Loss 10.0858 LearningRate 0.3998 Epoch: 6 Global Step: 35970 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:47:25,698-Speed 10514.77 samples/sec Loss 10.1639 LearningRate 0.3997 Epoch: 6 Global Step: 35980 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:47:33,516-Speed 10480.48 samples/sec Loss 10.1114 LearningRate 0.3996 Epoch: 6 Global Step: 35990 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:47:41,327-Speed 10489.44 samples/sec Loss 10.1912 LearningRate 0.3995 Epoch: 6 Global Step: 36000 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:47:49,130-Speed 10499.90 samples/sec Loss 10.0723 LearningRate 0.3994 Epoch: 6 Global Step: 36010 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:47:56,930-Speed 10503.78 samples/sec Loss 10.0668 LearningRate 0.3993 Epoch: 6 Global Step: 36020 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:48:04,750-Speed 10476.83 samples/sec Loss 10.0098 
LearningRate 0.3991 Epoch: 6 Global Step: 36030 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:48:12,523-Speed 10542.27 samples/sec Loss 10.1189 LearningRate 0.3990 Epoch: 6 Global Step: 36040 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:48:20,318-Speed 10511.97 samples/sec Loss 10.0376 LearningRate 0.3989 Epoch: 6 Global Step: 36050 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:48:28,142-Speed 10470.59 samples/sec Loss 9.9816 LearningRate 0.3988 Epoch: 6 Global Step: 36060 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:48:35,940-Speed 10506.06 samples/sec Loss 10.1005 LearningRate 0.3987 Epoch: 6 Global Step: 36070 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:48:43,789-Speed 10439.53 samples/sec Loss 10.0590 LearningRate 0.3985 Epoch: 6 Global Step: 36080 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:48:51,599-Speed 10490.11 samples/sec Loss 10.2509 LearningRate 0.3984 Epoch: 6 Global Step: 36090 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:48:59,386-Speed 10520.86 samples/sec Loss 10.2780 LearningRate 0.3983 Epoch: 6 Global Step: 36100 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:49:07,170-Speed 10526.03 samples/sec Loss 10.1362 LearningRate 0.3982 Epoch: 6 Global Step: 36110 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:49:14,979-Speed 10491.98 samples/sec Loss 10.0623 LearningRate 0.3981 Epoch: 6 Global Step: 36120 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:49:22,771-Speed 10515.33 samples/sec Loss 9.9433 LearningRate 0.3980 Epoch: 6 Global Step: 36130 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:49:30,565-Speed 10516.71 samples/sec Loss 10.0444 LearningRate 0.3978 Epoch: 6 Global Step: 36140 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:49:38,378-Speed 10486.63 samples/sec Loss 10.0302 LearningRate 0.3977 
Epoch: 6 Global Step: 36150 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:49:46,156-Speed 10538.10 samples/sec Loss 10.0566 LearningRate 0.3976 Epoch: 6 Global Step: 36160 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:49:53,963-Speed 10493.71 samples/sec Loss 10.1550 LearningRate 0.3975 Epoch: 6 Global Step: 36170 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:50:01,768-Speed 10497.36 samples/sec Loss 10.0674 LearningRate 0.3974 Epoch: 6 Global Step: 36180 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:50:09,567-Speed 10505.53 samples/sec Loss 10.0322 LearningRate 0.3972 Epoch: 6 Global Step: 36190 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:50:17,347-Speed 10531.41 samples/sec Loss 10.0022 LearningRate 0.3971 Epoch: 6 Global Step: 36200 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:50:25,133-Speed 10522.69 samples/sec Loss 10.1109 LearningRate 0.3970 Epoch: 6 Global Step: 36210 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:50:32,936-Speed 10500.25 samples/sec Loss 10.0303 LearningRate 0.3969 Epoch: 6 Global Step: 36220 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:50:40,726-Speed 10517.67 samples/sec Loss 10.0840 LearningRate 0.3968 Epoch: 6 Global Step: 36230 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:50:48,526-Speed 10504.13 samples/sec Loss 10.0706 LearningRate 0.3967 Epoch: 6 Global Step: 36240 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:50:56,341-Speed 10483.76 samples/sec Loss 10.1009 LearningRate 0.3965 Epoch: 6 Global Step: 36250 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:51:04,119-Speed 10532.97 samples/sec Loss 10.0677 LearningRate 0.3964 Epoch: 6 Global Step: 36260 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:51:11,917-Speed 10507.64 samples/sec Loss 10.0637 LearningRate 0.3963 Epoch: 6 Global 
Step: 36270 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:51:19,753-Speed 10455.46 samples/sec Loss 10.1333 LearningRate 0.3962 Epoch: 6 Global Step: 36280 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:51:27,542-Speed 10518.93 samples/sec Loss 10.0071 LearningRate 0.3961 Epoch: 6 Global Step: 36290 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:51:50,498-Speed 3568.64 samples/sec Loss 10.0450 LearningRate 0.3960 Epoch: 7 Global Step: 36300 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:51:58,287-Speed 10520.32 samples/sec Loss 10.0143 LearningRate 0.3958 Epoch: 7 Global Step: 36310 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:52:06,061-Speed 10539.12 samples/sec Loss 10.0526 LearningRate 0.3957 Epoch: 7 Global Step: 36320 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:52:13,878-Speed 10480.56 samples/sec Loss 10.0492 LearningRate 0.3956 Epoch: 7 Global Step: 36330 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:52:21,671-Speed 10513.82 samples/sec Loss 10.0366 LearningRate 0.3955 Epoch: 7 Global Step: 36340 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:52:29,446-Speed 10538.33 samples/sec Loss 10.0851 LearningRate 0.3954 Epoch: 7 Global Step: 36350 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:52:37,239-Speed 10512.75 samples/sec Loss 10.0690 LearningRate 0.3952 Epoch: 7 Global Step: 36360 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:52:45,029-Speed 10516.73 samples/sec Loss 10.0518 LearningRate 0.3951 Epoch: 7 Global Step: 36370 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:52:52,853-Speed 10472.18 samples/sec Loss 9.9860 LearningRate 0.3950 Epoch: 7 Global Step: 36380 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:53:00,639-Speed 10523.38 samples/sec Loss 10.0421 LearningRate 0.3949 Epoch: 7 Global Step: 36390 Fp16 Grad 
Scale: 131072 Required: 15 hours Training: 2022-01-15 22:53:08,461-Speed 10474.71 samples/sec Loss 9.9442 LearningRate 0.3948 Epoch: 7 Global Step: 36400 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:53:16,261-Speed 10503.67 samples/sec Loss 9.9787 LearningRate 0.3947 Epoch: 7 Global Step: 36410 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:53:24,121-Speed 10423.33 samples/sec Loss 10.0186 LearningRate 0.3945 Epoch: 7 Global Step: 36420 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:53:31,935-Speed 10485.37 samples/sec Loss 10.0578 LearningRate 0.3944 Epoch: 7 Global Step: 36430 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:53:39,720-Speed 10524.02 samples/sec Loss 9.9911 LearningRate 0.3943 Epoch: 7 Global Step: 36440 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:53:47,501-Speed 10529.95 samples/sec Loss 10.0579 LearningRate 0.3942 Epoch: 7 Global Step: 36450 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:53:55,299-Speed 10505.94 samples/sec Loss 10.0217 LearningRate 0.3941 Epoch: 7 Global Step: 36460 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:54:03,064-Speed 10551.97 samples/sec Loss 9.9466 LearningRate 0.3940 Epoch: 7 Global Step: 36470 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:54:10,865-Speed 10503.16 samples/sec Loss 9.9845 LearningRate 0.3938 Epoch: 7 Global Step: 36480 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:54:18,634-Speed 10545.44 samples/sec Loss 10.0502 LearningRate 0.3937 Epoch: 7 Global Step: 36490 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:54:26,401-Speed 10549.24 samples/sec Loss 9.8856 LearningRate 0.3936 Epoch: 7 Global Step: 36500 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:54:34,185-Speed 10526.05 samples/sec Loss 9.9970 LearningRate 0.3935 Epoch: 7 Global Step: 36510 Fp16 Grad Scale: 131072 Required: 15 
hours Training: 2022-01-15 22:54:41,975-Speed 10516.92 samples/sec Loss 9.9819 LearningRate 0.3934 Epoch: 7 Global Step: 36520 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:54:49,776-Speed 10501.42 samples/sec Loss 10.0001 LearningRate 0.3933 Epoch: 7 Global Step: 36530 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:54:57,587-Speed 10489.65 samples/sec Loss 9.9920 LearningRate 0.3931 Epoch: 7 Global Step: 36540 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:55:05,395-Speed 10494.30 samples/sec Loss 9.9976 LearningRate 0.3930 Epoch: 7 Global Step: 36550 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:55:13,243-Speed 10438.37 samples/sec Loss 10.0893 LearningRate 0.3929 Epoch: 7 Global Step: 36560 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:55:21,079-Speed 10456.56 samples/sec Loss 9.9903 LearningRate 0.3928 Epoch: 7 Global Step: 36570 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:55:28,893-Speed 10485.34 samples/sec Loss 10.0172 LearningRate 0.3927 Epoch: 7 Global Step: 36580 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:55:36,696-Speed 10500.34 samples/sec Loss 9.9323 LearningRate 0.3926 Epoch: 7 Global Step: 36590 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:55:44,502-Speed 10495.35 samples/sec Loss 10.0849 LearningRate 0.3924 Epoch: 7 Global Step: 36600 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:55:52,323-Speed 10475.76 samples/sec Loss 10.0242 LearningRate 0.3923 Epoch: 7 Global Step: 36610 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:56:00,186-Speed 10419.36 samples/sec Loss 9.8826 LearningRate 0.3922 Epoch: 7 Global Step: 36620 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:56:08,000-Speed 10486.98 samples/sec Loss 9.9413 LearningRate 0.3921 Epoch: 7 Global Step: 36630 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 
22:56:15,828-Speed 10466.28 samples/sec Loss 10.1826 LearningRate 0.3920 Epoch: 7 Global Step: 36640 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:56:23,647-Speed 10478.70 samples/sec Loss 9.9924 LearningRate 0.3918 Epoch: 7 Global Step: 36650 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:56:31,508-Speed 10422.56 samples/sec Loss 10.0997 LearningRate 0.3917 Epoch: 7 Global Step: 36660 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:56:39,330-Speed 10480.95 samples/sec Loss 10.0441 LearningRate 0.3916 Epoch: 7 Global Step: 36670 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:56:47,196-Speed 10416.07 samples/sec Loss 10.0149 LearningRate 0.3915 Epoch: 7 Global Step: 36680 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:56:55,041-Speed 10443.41 samples/sec Loss 9.9797 LearningRate 0.3914 Epoch: 7 Global Step: 36690 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:57:02,933-Speed 10381.10 samples/sec Loss 9.9747 LearningRate 0.3913 Epoch: 7 Global Step: 36700 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:57:10,760-Speed 10468.75 samples/sec Loss 9.9845 LearningRate 0.3911 Epoch: 7 Global Step: 36710 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:57:18,594-Speed 10457.44 samples/sec Loss 9.9504 LearningRate 0.3910 Epoch: 7 Global Step: 36720 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:57:26,445-Speed 10436.36 samples/sec Loss 10.0785 LearningRate 0.3909 Epoch: 7 Global Step: 36730 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:57:34,284-Speed 10452.25 samples/sec Loss 9.9686 LearningRate 0.3908 Epoch: 7 Global Step: 36740 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:57:42,154-Speed 10411.07 samples/sec Loss 9.9510 LearningRate 0.3907 Epoch: 7 Global Step: 36750 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:57:50,021-Speed 
10413.31 samples/sec Loss 10.0710 LearningRate 0.3906 Epoch: 7 Global Step: 36760 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:57:57,931-Speed 10358.62 samples/sec Loss 10.0093 LearningRate 0.3904 Epoch: 7 Global Step: 36770 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:58:05,781-Speed 10437.17 samples/sec Loss 9.9705 LearningRate 0.3903 Epoch: 7 Global Step: 36780 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:58:13,612-Speed 10463.82 samples/sec Loss 9.9667 LearningRate 0.3902 Epoch: 7 Global Step: 36790 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:58:21,459-Speed 10440.21 samples/sec Loss 9.9366 LearningRate 0.3901 Epoch: 7 Global Step: 36800 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 22:58:29,303-Speed 10445.53 samples/sec Loss 9.9009 LearningRate 0.3900 Epoch: 7 Global Step: 36810 Fp16 Grad Scale: 524288 Required: 15 hours Training: 2022-01-15 22:58:37,140-Speed 10454.66 samples/sec Loss 9.9888 LearningRate 0.3899 Epoch: 7 Global Step: 36820 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:58:44,997-Speed 10428.38 samples/sec Loss 9.9793 LearningRate 0.3897 Epoch: 7 Global Step: 36830 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:58:52,824-Speed 10467.61 samples/sec Loss 10.0011 LearningRate 0.3896 Epoch: 7 Global Step: 36840 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:59:00,664-Speed 10449.73 samples/sec Loss 10.0110 LearningRate 0.3895 Epoch: 7 Global Step: 36850 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:59:08,546-Speed 10394.70 samples/sec Loss 9.9574 LearningRate 0.3894 Epoch: 7 Global Step: 36860 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:59:16,389-Speed 10447.13 samples/sec Loss 9.9326 LearningRate 0.3893 Epoch: 7 Global Step: 36870 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:59:24,226-Speed 10454.13 samples/sec Loss 
9.9322 LearningRate 0.3892 Epoch: 7 Global Step: 36880 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:59:32,072-Speed 10443.56 samples/sec Loss 9.9953 LearningRate 0.3890 Epoch: 7 Global Step: 36890 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:59:39,909-Speed 10454.44 samples/sec Loss 9.9678 LearningRate 0.3889 Epoch: 7 Global Step: 36900 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:59:47,727-Speed 10480.48 samples/sec Loss 9.9332 LearningRate 0.3888 Epoch: 7 Global Step: 36910 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 22:59:55,545-Speed 10478.84 samples/sec Loss 9.9655 LearningRate 0.3887 Epoch: 7 Global Step: 36920 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:00:03,396-Speed 10436.54 samples/sec Loss 9.9476 LearningRate 0.3886 Epoch: 7 Global Step: 36930 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:00:11,239-Speed 10446.20 samples/sec Loss 9.9339 LearningRate 0.3885 Epoch: 7 Global Step: 36940 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:00:19,074-Speed 10456.90 samples/sec Loss 9.9024 LearningRate 0.3884 Epoch: 7 Global Step: 36950 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:00:26,924-Speed 10437.88 samples/sec Loss 9.9812 LearningRate 0.3882 Epoch: 7 Global Step: 36960 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:00:34,774-Speed 10437.26 samples/sec Loss 10.0086 LearningRate 0.3881 Epoch: 7 Global Step: 36970 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:00:42,619-Speed 10443.63 samples/sec Loss 9.9440 LearningRate 0.3880 Epoch: 7 Global Step: 36980 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:00:50,480-Speed 10421.56 samples/sec Loss 9.8873 LearningRate 0.3879 Epoch: 7 Global Step: 36990 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:00:58,322-Speed 10449.35 samples/sec Loss 9.9303 LearningRate 0.3878 
Epoch: 7 Global Step: 37000 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:01:06,167-Speed 10443.19 samples/sec Loss 9.9216 LearningRate 0.3877 Epoch: 7 Global Step: 37010 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:01:13,977-Speed 10490.07 samples/sec Loss 9.9267 LearningRate 0.3875 Epoch: 7 Global Step: 37020 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:01:21,776-Speed 10505.72 samples/sec Loss 9.9687 LearningRate 0.3874 Epoch: 7 Global Step: 37030 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:01:29,596-Speed 10477.02 samples/sec Loss 10.0431 LearningRate 0.3873 Epoch: 7 Global Step: 37040 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:01:37,404-Speed 10493.16 samples/sec Loss 9.9566 LearningRate 0.3872 Epoch: 7 Global Step: 37050 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:01:45,213-Speed 10492.25 samples/sec Loss 9.8926 LearningRate 0.3871 Epoch: 7 Global Step: 37060 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:01:53,009-Speed 10509.21 samples/sec Loss 9.9643 LearningRate 0.3870 Epoch: 7 Global Step: 37070 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:02:00,823-Speed 10485.19 samples/sec Loss 9.9351 LearningRate 0.3868 Epoch: 7 Global Step: 37080 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:02:08,641-Speed 10482.16 samples/sec Loss 9.8978 LearningRate 0.3867 Epoch: 7 Global Step: 37090 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:02:16,457-Speed 10482.54 samples/sec Loss 9.9295 LearningRate 0.3866 Epoch: 7 Global Step: 37100 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:02:24,270-Speed 10486.54 samples/sec Loss 9.8710 LearningRate 0.3865 Epoch: 7 Global Step: 37110 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:02:32,124-Speed 10430.91 samples/sec Loss 9.9073 LearningRate 0.3864 Epoch: 7 Global Step: 37120 
Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:02:39,930-Speed 10496.58 samples/sec Loss 9.9475 LearningRate 0.3863 Epoch: 7 Global Step: 37130 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:02:47,727-Speed 10507.91 samples/sec Loss 9.9075 LearningRate 0.3861 Epoch: 7 Global Step: 37140 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:02:55,556-Speed 10465.98 samples/sec Loss 9.8430 LearningRate 0.3860 Epoch: 7 Global Step: 37150 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:03:03,367-Speed 10488.33 samples/sec Loss 9.9925 LearningRate 0.3859 Epoch: 7 Global Step: 37160 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:03:11,193-Speed 10469.02 samples/sec Loss 9.9474 LearningRate 0.3858 Epoch: 7 Global Step: 37170 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:03:19,015-Speed 10475.19 samples/sec Loss 9.9262 LearningRate 0.3857 Epoch: 7 Global Step: 37180 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:03:26,830-Speed 10483.14 samples/sec Loss 9.9680 LearningRate 0.3856 Epoch: 7 Global Step: 37190 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:03:34,624-Speed 10512.63 samples/sec Loss 9.8878 LearningRate 0.3854 Epoch: 7 Global Step: 37200 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:03:42,413-Speed 10518.24 samples/sec Loss 9.8864 LearningRate 0.3853 Epoch: 7 Global Step: 37210 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:03:50,216-Speed 10500.65 samples/sec Loss 9.9425 LearningRate 0.3852 Epoch: 7 Global Step: 37220 Fp16 Grad Scale: 524288 Required: 15 hours Training: 2022-01-15 23:03:57,990-Speed 10538.79 samples/sec Loss 9.9396 LearningRate 0.3851 Epoch: 7 Global Step: 37230 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:04:05,808-Speed 10480.20 samples/sec Loss 9.9278 LearningRate 0.3850 Epoch: 7 Global Step: 37240 Fp16 Grad Scale: 262144 
Required: 15 hours Training: 2022-01-15 23:04:13,622-Speed 10484.85 samples/sec Loss 9.9379 LearningRate 0.3849 Epoch: 7 Global Step: 37250 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:04:21,423-Speed 10502.79 samples/sec Loss 9.8651 LearningRate 0.3848 Epoch: 7 Global Step: 37260 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:04:29,217-Speed 10512.77 samples/sec Loss 9.8628 LearningRate 0.3846 Epoch: 7 Global Step: 37270 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:04:37,044-Speed 10467.43 samples/sec Loss 9.7972 LearningRate 0.3845 Epoch: 7 Global Step: 37280 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:04:44,827-Speed 10526.47 samples/sec Loss 9.7992 LearningRate 0.3844 Epoch: 7 Global Step: 37290 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 23:04:52,621-Speed 10512.49 samples/sec Loss 9.9416 LearningRate 0.3843 Epoch: 7 Global Step: 37300 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 23:05:00,400-Speed 10533.60 samples/sec Loss 9.8575 LearningRate 0.3842 Epoch: 7 Global Step: 37310 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 23:05:08,221-Speed 10476.21 samples/sec Loss 9.8523 LearningRate 0.3841 Epoch: 7 Global Step: 37320 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 23:05:16,009-Speed 10519.56 samples/sec Loss 9.9159 LearningRate 0.3839 Epoch: 7 Global Step: 37330 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 23:05:23,804-Speed 10511.20 samples/sec Loss 9.9134 LearningRate 0.3838 Epoch: 7 Global Step: 37340 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 23:05:31,601-Speed 10508.32 samples/sec Loss 9.8287 LearningRate 0.3837 Epoch: 7 Global Step: 37350 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 23:05:39,406-Speed 10497.15 samples/sec Loss 9.9629 LearningRate 0.3836 Epoch: 7 Global Step: 37360 Fp16 Grad Scale: 131072 Required: 15 hours Training: 
2022-01-15 23:05:47,215-Speed 10492.79 samples/sec Loss 9.8865 LearningRate 0.3835 Epoch: 7 Global Step: 37370 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 23:05:55,023-Speed 10493.31 samples/sec Loss 9.8047 LearningRate 0.3834 Epoch: 7 Global Step: 37380 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 23:06:02,821-Speed 10506.53 samples/sec Loss 9.8426 LearningRate 0.3832 Epoch: 7 Global Step: 37390 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:06:10,646-Speed 10470.86 samples/sec Loss 9.8667 LearningRate 0.3831 Epoch: 7 Global Step: 37400 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:06:18,460-Speed 10486.03 samples/sec Loss 9.8645 LearningRate 0.3830 Epoch: 7 Global Step: 37410 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:06:26,247-Speed 10519.96 samples/sec Loss 9.9642 LearningRate 0.3829 Epoch: 7 Global Step: 37420 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:06:34,037-Speed 10518.45 samples/sec Loss 9.8919 LearningRate 0.3828 Epoch: 7 Global Step: 37430 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:06:41,844-Speed 10494.46 samples/sec Loss 9.9075 LearningRate 0.3827 Epoch: 7 Global Step: 37440 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:06:49,641-Speed 10508.29 samples/sec Loss 9.8416 LearningRate 0.3826 Epoch: 7 Global Step: 37450 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:06:57,432-Speed 10515.54 samples/sec Loss 9.8876 LearningRate 0.3824 Epoch: 7 Global Step: 37460 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:07:05,260-Speed 10472.04 samples/sec Loss 9.8873 LearningRate 0.3823 Epoch: 7 Global Step: 37470 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:07:13,042-Speed 10528.78 samples/sec Loss 9.8339 LearningRate 0.3822 Epoch: 7 Global Step: 37480 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:07:20,826-Speed 
10525.20 samples/sec Loss 9.8649 LearningRate 0.3821 Epoch: 7 Global Step: 37490 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:07:28,629-Speed 10499.17 samples/sec Loss 9.9124 LearningRate 0.3820 Epoch: 7 Global Step: 37500 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 23:07:36,426-Speed 10508.32 samples/sec Loss 9.9359 LearningRate 0.3819 Epoch: 7 Global Step: 37510 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 23:07:44,240-Speed 10485.99 samples/sec Loss 9.8840 LearningRate 0.3817 Epoch: 7 Global Step: 37520 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 23:07:52,077-Speed 10453.50 samples/sec Loss 9.8947 LearningRate 0.3816 Epoch: 7 Global Step: 37530 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 23:07:59,907-Speed 10464.96 samples/sec Loss 9.9020 LearningRate 0.3815 Epoch: 7 Global Step: 37540 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 23:08:07,707-Speed 10503.99 samples/sec Loss 9.8044 LearningRate 0.3814 Epoch: 7 Global Step: 37550 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 23:08:15,521-Speed 10486.11 samples/sec Loss 9.8448 LearningRate 0.3813 Epoch: 7 Global Step: 37560 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 23:08:23,344-Speed 10472.19 samples/sec Loss 9.8240 LearningRate 0.3812 Epoch: 7 Global Step: 37570 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 23:08:31,170-Speed 10469.60 samples/sec Loss 9.7901 LearningRate 0.3811 Epoch: 7 Global Step: 37580 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 23:08:39,001-Speed 10461.79 samples/sec Loss 9.8384 LearningRate 0.3809 Epoch: 7 Global Step: 37590 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-01-15 23:08:46,834-Speed 10460.21 samples/sec Loss 9.8173 LearningRate 0.3808 Epoch: 7 Global Step: 37600 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:08:54,632-Speed 10505.96 samples/sec Loss 
9.8001 LearningRate 0.3807 Epoch: 7 Global Step: 37610 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:09:02,468-Speed 10456.67 samples/sec Loss 9.8705 LearningRate 0.3806 Epoch: 7 Global Step: 37620 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:09:10,277-Speed 10491.55 samples/sec Loss 9.8713 LearningRate 0.3805 Epoch: 7 Global Step: 37630 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:09:18,058-Speed 10529.84 samples/sec Loss 9.8782 LearningRate 0.3804 Epoch: 7 Global Step: 37640 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:09:25,863-Speed 10497.71 samples/sec Loss 9.8786 LearningRate 0.3802 Epoch: 7 Global Step: 37650 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:09:33,632-Speed 10545.20 samples/sec Loss 9.8305 LearningRate 0.3801 Epoch: 7 Global Step: 37660 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:09:41,419-Speed 10521.46 samples/sec Loss 9.9002 LearningRate 0.3800 Epoch: 7 Global Step: 37670 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:09:49,227-Speed 10492.60 samples/sec Loss 9.8674 LearningRate 0.3799 Epoch: 7 Global Step: 37680 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-01-15 23:09:57,011-Speed 10525.80 samples/sec Loss 9.8102 LearningRate 0.3798 Epoch: 7 Global Step: 37690 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:10:04,797-Speed 10522.81 samples/sec Loss 9.8884 LearningRate 0.3797 Epoch: 7 Global Step: 37700 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:10:12,597-Speed 10504.15 samples/sec Loss 9.8124 LearningRate 0.3796 Epoch: 7 Global Step: 37710 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:10:20,422-Speed 10471.53 samples/sec Loss 9.8204 LearningRate 0.3794 Epoch: 7 Global Step: 37720 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:10:28,226-Speed 10497.65 samples/sec Loss 9.8198 LearningRate 0.3793 
Epoch: 7 Global Step: 37730 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:10:36,055-Speed 10465.01 samples/sec Loss 9.8086 LearningRate 0.3792 Epoch: 7 Global Step: 37740 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:10:43,899-Speed 10445.75 samples/sec Loss 9.9178 LearningRate 0.3791 Epoch: 7 Global Step: 37750 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:10:51,684-Speed 10523.33 samples/sec Loss 9.8763 LearningRate 0.3790 Epoch: 7 Global Step: 37760 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:10:59,471-Speed 10522.56 samples/sec Loss 9.8202 LearningRate 0.3789 Epoch: 7 Global Step: 37770 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:11:07,256-Speed 10524.53 samples/sec Loss 9.7694 LearningRate 0.3787 Epoch: 7 Global Step: 37780 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:11:15,059-Speed 10500.02 samples/sec Loss 9.7551 LearningRate 0.3786 Epoch: 7 Global Step: 37790 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:11:22,851-Speed 10516.16 samples/sec Loss 9.8306 LearningRate 0.3785 Epoch: 7 Global Step: 37800 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:11:30,665-Speed 10485.00 samples/sec Loss 9.8317 LearningRate 0.3784 Epoch: 7 Global Step: 37810 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:11:38,468-Speed 10500.02 samples/sec Loss 9.9081 LearningRate 0.3783 Epoch: 7 Global Step: 37820 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:11:46,259-Speed 10515.80 samples/sec Loss 9.8418 LearningRate 0.3782 Epoch: 7 Global Step: 37830 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:11:54,043-Speed 10525.72 samples/sec Loss 9.8240 LearningRate 0.3781 Epoch: 7 Global Step: 37840 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:12:01,826-Speed 10526.36 samples/sec Loss 9.7961 LearningRate 0.3779 Epoch: 7 Global Step: 37850 
Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:12:09,624-Speed 10508.34 samples/sec Loss 9.8021 LearningRate 0.3778 Epoch: 7 Global Step: 37860 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:12:17,449-Speed 10470.04 samples/sec Loss 9.7585 LearningRate 0.3777 Epoch: 7 Global Step: 37870 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:12:25,265-Speed 10483.38 samples/sec Loss 9.8425 LearningRate 0.3776 Epoch: 7 Global Step: 37880 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:12:33,067-Speed 10502.45 samples/sec Loss 9.7681 LearningRate 0.3775 Epoch: 7 Global Step: 37890 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:12:40,932-Speed 10416.45 samples/sec Loss 9.7741 LearningRate 0.3774 Epoch: 7 Global Step: 37900 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:12:48,721-Speed 10518.74 samples/sec Loss 9.7527 LearningRate 0.3773 Epoch: 7 Global Step: 37910 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:12:56,497-Speed 10536.80 samples/sec Loss 9.7733 LearningRate 0.3771 Epoch: 7 Global Step: 37920 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:13:04,320-Speed 10472.77 samples/sec Loss 9.8460 LearningRate 0.3770 Epoch: 7 Global Step: 37930 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:13:12,142-Speed 10474.99 samples/sec Loss 9.8017 LearningRate 0.3769 Epoch: 7 Global Step: 37940 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:13:19,930-Speed 10520.03 samples/sec Loss 9.8121 LearningRate 0.3768 Epoch: 7 Global Step: 37950 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:13:27,745-Speed 10483.88 samples/sec Loss 9.7744 LearningRate 0.3767 Epoch: 7 Global Step: 37960 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:13:35,533-Speed 10520.28 samples/sec Loss 9.7955 LearningRate 0.3766 Epoch: 7 Global Step: 37970 Fp16 Grad Scale: 65536 
Required: 14 hours Training: 2022-01-15 23:13:43,325-Speed 10514.31 samples/sec Loss 9.7520 LearningRate 0.3765 Epoch: 7 Global Step: 37980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-15 23:13:51,153-Speed 10466.09 samples/sec Loss 9.7770 LearningRate 0.3763 Epoch: 7 Global Step: 37990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-15 23:13:58,930-Speed 10534.88 samples/sec Loss 9.7686 LearningRate 0.3762 Epoch: 7 Global Step: 38000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-15 23:14:06,729-Speed 10505.64 samples/sec Loss 9.7108 LearningRate 0.3761 Epoch: 7 Global Step: 38010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-15 23:14:14,524-Speed 10510.31 samples/sec Loss 9.7651 LearningRate 0.3760 Epoch: 7 Global Step: 38020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-15 23:14:22,370-Speed 10442.78 samples/sec Loss 9.8598 LearningRate 0.3759 Epoch: 7 Global Step: 38030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-15 23:14:30,167-Speed 10507.42 samples/sec Loss 9.8156 LearningRate 0.3758 Epoch: 7 Global Step: 38040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-15 23:14:37,956-Speed 10518.40 samples/sec Loss 9.7905 LearningRate 0.3757 Epoch: 7 Global Step: 38050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-15 23:14:45,766-Speed 10491.41 samples/sec Loss 9.7582 LearningRate 0.3755 Epoch: 7 Global Step: 38060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-15 23:14:53,555-Speed 10518.86 samples/sec Loss 9.7677 LearningRate 0.3754 Epoch: 7 Global Step: 38070 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:15:01,457-Speed 10368.48 samples/sec Loss 9.7089 LearningRate 0.3753 Epoch: 7 Global Step: 38080 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:15:09,324-Speed 10414.19 samples/sec Loss 9.7577 LearningRate 0.3752 Epoch: 7 Global Step: 38090 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 
23:15:17,134-Speed 10490.61 samples/sec Loss 9.7835 LearningRate 0.3751 Epoch: 7 Global Step: 38100 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:15:24,955-Speed 10475.43 samples/sec Loss 9.7860 LearningRate 0.3750 Epoch: 7 Global Step: 38110 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:15:32,760-Speed 10496.98 samples/sec Loss 9.8778 LearningRate 0.3749 Epoch: 7 Global Step: 38120 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:15:40,557-Speed 10508.26 samples/sec Loss 9.7577 LearningRate 0.3747 Epoch: 7 Global Step: 38130 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:15:48,371-Speed 10485.80 samples/sec Loss 9.7785 LearningRate 0.3746 Epoch: 7 Global Step: 38140 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:15:56,171-Speed 10503.84 samples/sec Loss 9.7697 LearningRate 0.3745 Epoch: 7 Global Step: 38150 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:16:03,972-Speed 10502.10 samples/sec Loss 9.7760 LearningRate 0.3744 Epoch: 7 Global Step: 38160 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-15 23:16:11,795-Speed 10472.82 samples/sec Loss 9.8586 LearningRate 0.3743 Epoch: 7 Global Step: 38170 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-15 23:16:19,625-Speed 10464.77 samples/sec Loss 9.7258 LearningRate 0.3742 Epoch: 7 Global Step: 38180 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-15 23:16:27,438-Speed 10485.59 samples/sec Loss 9.6733 LearningRate 0.3741 Epoch: 7 Global Step: 38190 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-15 23:16:35,230-Speed 10515.98 samples/sec Loss 9.6947 LearningRate 0.3739 Epoch: 7 Global Step: 38200 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-15 23:16:43,067-Speed 10454.94 samples/sec Loss 9.7407 LearningRate 0.3738 Epoch: 7 Global Step: 38210 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-15 23:16:50,878-Speed 10489.75 
samples/sec Loss 9.7334 LearningRate 0.3737 Epoch: 7 Global Step: 38220 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-15 23:16:58,688-Speed 10491.26 samples/sec Loss 9.6989 LearningRate 0.3736 Epoch: 7 Global Step: 38230 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-15 23:17:06,518-Speed 10462.79 samples/sec Loss 9.7370 LearningRate 0.3735 Epoch: 7 Global Step: 38240 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-15 23:17:14,314-Speed 10509.85 samples/sec Loss 9.6990 LearningRate 0.3734 Epoch: 7 Global Step: 38250 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-01-15 23:17:22,126-Speed 10487.96 samples/sec Loss 9.7854 LearningRate 0.3733 Epoch: 7 Global Step: 38260 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-15 23:17:30,001-Speed 10404.95 samples/sec Loss 9.6897 LearningRate 0.3731 Epoch: 7 Global Step: 38270 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-15 23:17:37,885-Speed 10391.35 samples/sec Loss 9.7765 LearningRate 0.3730 Epoch: 7 Global Step: 38280 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-15 23:17:45,693-Speed 10493.14 samples/sec Loss 9.7403 LearningRate 0.3729 Epoch: 7 Global Step: 38290 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-15 23:17:53,517-Speed 10471.73 samples/sec Loss 9.7033 LearningRate 0.3728 Epoch: 7 Global Step: 38300 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-15 23:18:01,325-Speed 10492.99 samples/sec Loss 9.7257 LearningRate 0.3727 Epoch: 7 Global Step: 38310 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-15 23:18:09,106-Speed 10529.60 samples/sec Loss 9.6460 LearningRate 0.3726 Epoch: 7 Global Step: 38320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-15 23:18:16,901-Speed 10511.45 samples/sec Loss 9.7566 LearningRate 0.3725 Epoch: 7 Global Step: 38330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-15 23:18:24,743-Speed 10447.58 samples/sec Loss 9.8122 LearningRate 0.3723 
Epoch: 7 Global Step: 38340 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-15 23:18:32,553-Speed 10490.68 samples/sec Loss 9.7605 LearningRate 0.3722 Epoch: 7 Global Step: 38350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-01-15 23:18:40,331-Speed 10533.46 samples/sec Loss 9.7273 LearningRate 0.3721 Epoch: 7 Global Step: 38360 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:18:48,156-Speed 10470.36 samples/sec Loss 9.7573 LearningRate 0.3720 Epoch: 7 Global Step: 38370 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:18:55,989-Speed 10460.13 samples/sec Loss 9.6641 LearningRate 0.3719 Epoch: 7 Global Step: 38380 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:19:03,829-Speed 10451.24 samples/sec Loss 9.7404 LearningRate 0.3718 Epoch: 7 Global Step: 38390 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:19:11,611-Speed 10527.99 samples/sec Loss 9.7243 LearningRate 0.3717 Epoch: 7 Global Step: 38400 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:19:19,406-Speed 10510.60 samples/sec Loss 9.6134 LearningRate 0.3715 Epoch: 7 Global Step: 38410 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:19:27,247-Speed 10449.99 samples/sec Loss 9.7346 LearningRate 0.3714 Epoch: 7 Global Step: 38420 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:19:35,087-Speed 10450.29 samples/sec Loss 9.6899 LearningRate 0.3713 Epoch: 7 Global Step: 38430 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:19:42,876-Speed 10517.31 samples/sec Loss 9.6771 LearningRate 0.3712 Epoch: 7 Global Step: 38440 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:19:50,674-Speed 10507.62 samples/sec Loss 9.7469 LearningRate 0.3711 Epoch: 7 Global Step: 38450 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:19:58,480-Speed 10496.01 samples/sec Loss 9.8443 LearningRate 0.3710 Epoch: 7 Global Step: 38460 Fp16 
Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:20:06,272-Speed 10514.05 samples/sec Loss 9.7056 LearningRate 0.3709 Epoch: 7 Global Step: 38470 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:20:14,078-Speed 10496.64 samples/sec Loss 9.6756 LearningRate 0.3707 Epoch: 7 Global Step: 38480 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:20:21,894-Speed 10482.34 samples/sec Loss 9.7279 LearningRate 0.3706 Epoch: 7 Global Step: 38490 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:20:29,760-Speed 10420.25 samples/sec Loss 9.7143 LearningRate 0.3705 Epoch: 7 Global Step: 38500 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:20:37,578-Speed 10480.82 samples/sec Loss 9.7421 LearningRate 0.3704 Epoch: 7 Global Step: 38510 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:20:45,384-Speed 10495.84 samples/sec Loss 9.6715 LearningRate 0.3703 Epoch: 7 Global Step: 38520 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:20:53,186-Speed 10500.55 samples/sec Loss 9.6813 LearningRate 0.3702 Epoch: 7 Global Step: 38530 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:21:00,984-Speed 10507.05 samples/sec Loss 9.7665 LearningRate 0.3701 Epoch: 7 Global Step: 38540 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:21:08,773-Speed 10518.31 samples/sec Loss 9.6886 LearningRate 0.3700 Epoch: 7 Global Step: 38550 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:21:16,578-Speed 10498.06 samples/sec Loss 9.6870 LearningRate 0.3698 Epoch: 7 Global Step: 38560 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:21:24,401-Speed 10472.10 samples/sec Loss 9.6252 LearningRate 0.3697 Epoch: 7 Global Step: 38570 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:21:32,206-Speed 10498.02 samples/sec Loss 9.7309 LearningRate 0.3696 Epoch: 7 Global Step: 38580 Fp16 Grad Scale: 262144 Required: 14 
hours Training: 2022-01-15 23:21:40,005-Speed 10505.36 samples/sec Loss 9.7243 LearningRate 0.3695 Epoch: 7 Global Step: 38590 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:21:47,810-Speed 10496.15 samples/sec Loss 9.6857 LearningRate 0.3694 Epoch: 7 Global Step: 38600 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:21:55,610-Speed 10503.87 samples/sec Loss 9.7554 LearningRate 0.3693 Epoch: 7 Global Step: 38610 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:22:03,414-Speed 10500.47 samples/sec Loss 9.6648 LearningRate 0.3692 Epoch: 7 Global Step: 38620 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:22:11,217-Speed 10498.86 samples/sec Loss 9.7663 LearningRate 0.3690 Epoch: 7 Global Step: 38630 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:22:19,004-Speed 10521.59 samples/sec Loss 9.6363 LearningRate 0.3689 Epoch: 7 Global Step: 38640 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:22:26,799-Speed 10510.15 samples/sec Loss 9.7505 LearningRate 0.3688 Epoch: 7 Global Step: 38650 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:22:34,618-Speed 10481.85 samples/sec Loss 9.7064 LearningRate 0.3687 Epoch: 7 Global Step: 38660 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:22:42,454-Speed 10455.81 samples/sec Loss 9.6924 LearningRate 0.3686 Epoch: 7 Global Step: 38670 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:22:50,255-Speed 10501.55 samples/sec Loss 9.7104 LearningRate 0.3685 Epoch: 7 Global Step: 38680 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:22:58,068-Speed 10487.33 samples/sec Loss 9.6730 LearningRate 0.3684 Epoch: 7 Global Step: 38690 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:23:05,878-Speed 10490.28 samples/sec Loss 9.6557 LearningRate 0.3682 Epoch: 7 Global Step: 38700 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 
23:23:13,684-Speed 10495.70 samples/sec Loss 9.6612 LearningRate 0.3681 Epoch: 7 Global Step: 38710 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:23:21,477-Speed 10513.65 samples/sec Loss 9.6560 LearningRate 0.3680 Epoch: 7 Global Step: 38720 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:23:29,266-Speed 10519.63 samples/sec Loss 9.7321 LearningRate 0.3679 Epoch: 7 Global Step: 38730 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:23:37,053-Speed 10520.75 samples/sec Loss 9.7296 LearningRate 0.3678 Epoch: 7 Global Step: 38740 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:23:44,847-Speed 10512.76 samples/sec Loss 9.7706 LearningRate 0.3677 Epoch: 7 Global Step: 38750 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:23:52,667-Speed 10475.93 samples/sec Loss 9.6674 LearningRate 0.3676 Epoch: 7 Global Step: 38760 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:24:00,466-Speed 10506.41 samples/sec Loss 9.6779 LearningRate 0.3675 Epoch: 7 Global Step: 38770 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:24:08,260-Speed 10511.80 samples/sec Loss 9.6190 LearningRate 0.3673 Epoch: 7 Global Step: 38780 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:24:16,048-Speed 10519.96 samples/sec Loss 9.6659 LearningRate 0.3672 Epoch: 7 Global Step: 38790 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:24:23,827-Speed 10532.50 samples/sec Loss 9.6379 LearningRate 0.3671 Epoch: 7 Global Step: 38800 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:24:31,631-Speed 10499.12 samples/sec Loss 9.7186 LearningRate 0.3670 Epoch: 7 Global Step: 38810 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:24:39,423-Speed 10515.11 samples/sec Loss 9.6425 LearningRate 0.3669 Epoch: 7 Global Step: 38820 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:24:47,234-Speed 10488.93 
samples/sec Loss 9.6253 LearningRate 0.3668 Epoch: 7 Global Step: 38830 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:24:55,019-Speed 10522.55 samples/sec Loss 9.6789 LearningRate 0.3667 Epoch: 7 Global Step: 38840 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:25:02,848-Speed 10465.96 samples/sec Loss 9.6450 LearningRate 0.3666 Epoch: 7 Global Step: 38850 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:25:10,675-Speed 10468.87 samples/sec Loss 9.6634 LearningRate 0.3664 Epoch: 7 Global Step: 38860 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:25:18,497-Speed 10473.51 samples/sec Loss 9.6520 LearningRate 0.3663 Epoch: 7 Global Step: 38870 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:25:26,312-Speed 10483.71 samples/sec Loss 9.6807 LearningRate 0.3662 Epoch: 7 Global Step: 38880 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:25:34,113-Speed 10503.53 samples/sec Loss 9.6501 LearningRate 0.3661 Epoch: 7 Global Step: 38890 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:25:41,924-Speed 10488.31 samples/sec Loss 9.6083 LearningRate 0.3660 Epoch: 7 Global Step: 38900 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:25:49,750-Speed 10469.62 samples/sec Loss 9.6694 LearningRate 0.3659 Epoch: 7 Global Step: 38910 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:25:57,583-Speed 10460.24 samples/sec Loss 9.6463 LearningRate 0.3658 Epoch: 7 Global Step: 38920 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:26:05,406-Speed 10472.41 samples/sec Loss 9.7194 LearningRate 0.3656 Epoch: 7 Global Step: 38930 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:26:13,241-Speed 10457.34 samples/sec Loss 9.6408 LearningRate 0.3655 Epoch: 7 Global Step: 38940 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:26:21,059-Speed 10479.53 samples/sec Loss 9.6719 
LearningRate 0.3654 Epoch: 7 Global Step: 38950 Fp16 Grad Scale: 524288 Required: 14 hours Training: 2022-01-15 23:26:28,842-Speed 10527.35 samples/sec Loss 9.6686 LearningRate 0.3653 Epoch: 7 Global Step: 38960 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:26:36,625-Speed 10526.50 samples/sec Loss 9.6503 LearningRate 0.3652 Epoch: 7 Global Step: 38970 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:26:44,433-Speed 10493.00 samples/sec Loss 9.6186 LearningRate 0.3651 Epoch: 7 Global Step: 38980 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:26:52,250-Speed 10481.04 samples/sec Loss 9.6504 LearningRate 0.3650 Epoch: 7 Global Step: 38990 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:27:00,048-Speed 10507.26 samples/sec Loss 9.6293 LearningRate 0.3649 Epoch: 7 Global Step: 39000 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:27:07,839-Speed 10516.90 samples/sec Loss 9.6685 LearningRate 0.3647 Epoch: 7 Global Step: 39010 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:27:15,627-Speed 10519.79 samples/sec Loss 9.6778 LearningRate 0.3646 Epoch: 7 Global Step: 39020 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:27:23,438-Speed 10488.66 samples/sec Loss 9.6395 LearningRate 0.3645 Epoch: 7 Global Step: 39030 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:27:31,249-Speed 10489.16 samples/sec Loss 9.6574 LearningRate 0.3644 Epoch: 7 Global Step: 39040 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:27:39,042-Speed 10514.18 samples/sec Loss 9.6950 LearningRate 0.3643 Epoch: 7 Global Step: 39050 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:27:46,864-Speed 10473.75 samples/sec Loss 9.6484 LearningRate 0.3642 Epoch: 7 Global Step: 39060 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:27:54,651-Speed 10520.94 samples/sec Loss 9.6208 LearningRate 0.3641 Epoch: 7 
Global Step: 39070 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:28:02,450-Speed 10505.74 samples/sec Loss 9.6107 LearningRate 0.3640 Epoch: 7 Global Step: 39080 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:28:10,254-Speed 10498.84 samples/sec Loss 9.6182 LearningRate 0.3638 Epoch: 7 Global Step: 39090 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:28:18,063-Speed 10493.01 samples/sec Loss 9.6345 LearningRate 0.3637 Epoch: 7 Global Step: 39100 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:28:25,871-Speed 10492.67 samples/sec Loss 9.5895 LearningRate 0.3636 Epoch: 7 Global Step: 39110 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:28:33,681-Speed 10489.64 samples/sec Loss 9.6571 LearningRate 0.3635 Epoch: 7 Global Step: 39120 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:28:41,460-Speed 10533.00 samples/sec Loss 9.6567 LearningRate 0.3634 Epoch: 7 Global Step: 39130 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:28:49,264-Speed 10498.46 samples/sec Loss 9.6140 LearningRate 0.3633 Epoch: 7 Global Step: 39140 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:28:57,061-Speed 10508.14 samples/sec Loss 9.6451 LearningRate 0.3632 Epoch: 7 Global Step: 39150 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:29:04,870-Speed 10491.80 samples/sec Loss 9.5767 LearningRate 0.3631 Epoch: 7 Global Step: 39160 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:29:12,692-Speed 10477.45 samples/sec Loss 9.6520 LearningRate 0.3629 Epoch: 7 Global Step: 39170 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:29:20,481-Speed 10518.92 samples/sec Loss 9.6383 LearningRate 0.3628 Epoch: 7 Global Step: 39180 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:29:28,266-Speed 10524.71 samples/sec Loss 9.6072 LearningRate 0.3627 Epoch: 7 Global Step: 39190 Fp16 Grad 
Scale: 262144 Required: 14 hours Training: 2022-01-15 23:29:36,100-Speed 10457.22 samples/sec Loss 9.6869 LearningRate 0.3626 Epoch: 7 Global Step: 39200 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:29:43,908-Speed 10492.78 samples/sec Loss 9.6555 LearningRate 0.3625 Epoch: 7 Global Step: 39210 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:29:51,714-Speed 10495.88 samples/sec Loss 9.6444 LearningRate 0.3624 Epoch: 7 Global Step: 39220 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:29:59,516-Speed 10501.98 samples/sec Loss 9.5714 LearningRate 0.3623 Epoch: 7 Global Step: 39230 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:30:07,338-Speed 10474.18 samples/sec Loss 9.5702 LearningRate 0.3622 Epoch: 7 Global Step: 39240 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:30:15,124-Speed 10523.22 samples/sec Loss 9.5067 LearningRate 0.3620 Epoch: 7 Global Step: 39250 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:30:22,906-Speed 10527.87 samples/sec Loss 9.6641 LearningRate 0.3619 Epoch: 7 Global Step: 39260 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:30:30,689-Speed 10527.67 samples/sec Loss 9.7016 LearningRate 0.3618 Epoch: 7 Global Step: 39270 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:30:38,478-Speed 10518.36 samples/sec Loss 9.5905 LearningRate 0.3617 Epoch: 7 Global Step: 39280 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:30:46,278-Speed 10503.25 samples/sec Loss 9.6221 LearningRate 0.3616 Epoch: 7 Global Step: 39290 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:30:54,081-Speed 10501.14 samples/sec Loss 9.6274 LearningRate 0.3615 Epoch: 7 Global Step: 39300 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:31:01,898-Speed 10481.63 samples/sec Loss 9.5320 LearningRate 0.3614 Epoch: 7 Global Step: 39310 Fp16 Grad Scale: 262144 Required: 14 
hours Training: 2022-01-15 23:31:09,737-Speed 10451.05 samples/sec Loss 9.6100 LearningRate 0.3613 Epoch: 7 Global Step: 39320 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:31:17,541-Speed 10498.56 samples/sec Loss 9.5873 LearningRate 0.3611 Epoch: 7 Global Step: 39330 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:31:25,344-Speed 10500.63 samples/sec Loss 9.5586 LearningRate 0.3610 Epoch: 7 Global Step: 39340 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:31:33,141-Speed 10507.55 samples/sec Loss 9.5415 LearningRate 0.3609 Epoch: 7 Global Step: 39350 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:31:40,929-Speed 10520.90 samples/sec Loss 9.5303 LearningRate 0.3608 Epoch: 7 Global Step: 39360 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:31:48,736-Speed 10493.73 samples/sec Loss 9.6018 LearningRate 0.3607 Epoch: 7 Global Step: 39370 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:31:56,545-Speed 10492.34 samples/sec Loss 9.6654 LearningRate 0.3606 Epoch: 7 Global Step: 39380 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:32:04,334-Speed 10518.82 samples/sec Loss 9.6428 LearningRate 0.3605 Epoch: 7 Global Step: 39390 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:32:12,115-Speed 10530.03 samples/sec Loss 9.6620 LearningRate 0.3604 Epoch: 7 Global Step: 39400 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:32:19,904-Speed 10518.52 samples/sec Loss 9.6622 LearningRate 0.3602 Epoch: 7 Global Step: 39410 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:32:27,696-Speed 10514.81 samples/sec Loss 9.5463 LearningRate 0.3601 Epoch: 7 Global Step: 39420 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:32:35,521-Speed 10470.53 samples/sec Loss 9.5403 LearningRate 0.3600 Epoch: 7 Global Step: 39430 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 
23:32:43,301-Speed 10531.05 samples/sec Loss 9.4992 LearningRate 0.3599 Epoch: 7 Global Step: 39440 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:32:51,129-Speed 10467.10 samples/sec Loss 9.5790 LearningRate 0.3598 Epoch: 7 Global Step: 39450 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:32:58,952-Speed 10472.90 samples/sec Loss 9.5538 LearningRate 0.3597 Epoch: 7 Global Step: 39460 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:33:06,762-Speed 10490.42 samples/sec Loss 9.5832 LearningRate 0.3596 Epoch: 7 Global Step: 39470 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:33:14,552-Speed 10517.44 samples/sec Loss 9.4907 LearningRate 0.3595 Epoch: 7 Global Step: 39480 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:33:22,350-Speed 10507.20 samples/sec Loss 9.5335 LearningRate 0.3593 Epoch: 7 Global Step: 39490 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:33:30,140-Speed 10516.58 samples/sec Loss 9.5994 LearningRate 0.3592 Epoch: 7 Global Step: 39500 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:33:38,000-Speed 10424.00 samples/sec Loss 9.5468 LearningRate 0.3591 Epoch: 7 Global Step: 39510 Fp16 Grad Scale: 524288 Required: 14 hours Training: 2022-01-15 23:33:45,825-Speed 10470.59 samples/sec Loss 9.6292 LearningRate 0.3590 Epoch: 7 Global Step: 39520 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:33:53,625-Speed 10505.22 samples/sec Loss 9.5422 LearningRate 0.3589 Epoch: 7 Global Step: 39530 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:34:01,411-Speed 10523.53 samples/sec Loss 9.5555 LearningRate 0.3588 Epoch: 7 Global Step: 39540 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:34:09,231-Speed 10476.29 samples/sec Loss 9.5569 LearningRate 0.3587 Epoch: 7 Global Step: 39550 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:34:17,081-Speed 10436.95 
samples/sec Loss 9.5600 LearningRate 0.3586 Epoch: 7 Global Step: 39560 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:34:24,859-Speed 10533.76 samples/sec Loss 9.5858 LearningRate 0.3585 Epoch: 7 Global Step: 39570 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:34:32,648-Speed 10528.04 samples/sec Loss 9.6149 LearningRate 0.3583 Epoch: 7 Global Step: 39580 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:34:40,454-Speed 10496.46 samples/sec Loss 9.5185 LearningRate 0.3582 Epoch: 7 Global Step: 39590 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:34:48,267-Speed 10485.07 samples/sec Loss 9.5579 LearningRate 0.3581 Epoch: 7 Global Step: 39600 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:34:56,065-Speed 10506.60 samples/sec Loss 9.5968 LearningRate 0.3580 Epoch: 7 Global Step: 39610 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:35:03,853-Speed 10521.06 samples/sec Loss 9.5895 LearningRate 0.3579 Epoch: 7 Global Step: 39620 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:35:11,654-Speed 10503.36 samples/sec Loss 9.5477 LearningRate 0.3578 Epoch: 7 Global Step: 39630 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:35:19,445-Speed 10515.38 samples/sec Loss 9.5645 LearningRate 0.3577 Epoch: 7 Global Step: 39640 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:35:27,239-Speed 10513.08 samples/sec Loss 9.5118 LearningRate 0.3576 Epoch: 7 Global Step: 39650 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:35:35,014-Speed 10536.93 samples/sec Loss 9.4659 LearningRate 0.3574 Epoch: 7 Global Step: 39660 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:35:42,838-Speed 10472.05 samples/sec Loss 9.5577 LearningRate 0.3573 Epoch: 7 Global Step: 39670 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:35:50,622-Speed 10525.00 samples/sec Loss 9.5891 
LearningRate 0.3572 Epoch: 7 Global Step: 39680 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:35:58,444-Speed 10475.52 samples/sec Loss 9.5182 LearningRate 0.3571 Epoch: 7 Global Step: 39690 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:36:06,245-Speed 10501.81 samples/sec Loss 9.4771 LearningRate 0.3570 Epoch: 7 Global Step: 39700 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:36:14,044-Speed 10505.59 samples/sec Loss 9.5182 LearningRate 0.3569 Epoch: 7 Global Step: 39710 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:36:21,835-Speed 10516.16 samples/sec Loss 9.5268 LearningRate 0.3568 Epoch: 7 Global Step: 39720 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:36:29,630-Speed 10510.89 samples/sec Loss 9.4943 LearningRate 0.3567 Epoch: 7 Global Step: 39730 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:36:37,423-Speed 10513.52 samples/sec Loss 9.5758 LearningRate 0.3566 Epoch: 7 Global Step: 39740 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:36:45,228-Speed 10496.42 samples/sec Loss 9.5045 LearningRate 0.3564 Epoch: 7 Global Step: 39750 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:36:53,031-Speed 10500.69 samples/sec Loss 9.5953 LearningRate 0.3563 Epoch: 7 Global Step: 39760 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:37:00,842-Speed 10489.19 samples/sec Loss 9.6606 LearningRate 0.3562 Epoch: 7 Global Step: 39770 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:37:08,633-Speed 10516.11 samples/sec Loss 9.5618 LearningRate 0.3561 Epoch: 7 Global Step: 39780 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:37:16,422-Speed 10518.94 samples/sec Loss 9.5242 LearningRate 0.3560 Epoch: 7 Global Step: 39790 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:37:24,239-Speed 10481.15 samples/sec Loss 9.5222 LearningRate 0.3559 Epoch: 7 
Global Step: 39800 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:37:32,042-Speed 10500.34 samples/sec Loss 9.5112 LearningRate 0.3558 Epoch: 7 Global Step: 39810 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:37:39,854-Speed 10487.53 samples/sec Loss 9.4632 LearningRate 0.3557 Epoch: 7 Global Step: 39820 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:37:47,673-Speed 10478.19 samples/sec Loss 9.5221 LearningRate 0.3556 Epoch: 7 Global Step: 39830 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:37:55,491-Speed 10479.88 samples/sec Loss 9.5361 LearningRate 0.3554 Epoch: 7 Global Step: 39840 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:38:03,282-Speed 10516.98 samples/sec Loss 9.4927 LearningRate 0.3553 Epoch: 7 Global Step: 39850 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:38:11,062-Speed 10531.02 samples/sec Loss 9.4745 LearningRate 0.3552 Epoch: 7 Global Step: 39860 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:38:18,864-Speed 10502.14 samples/sec Loss 9.5269 LearningRate 0.3551 Epoch: 7 Global Step: 39870 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:38:26,631-Speed 10547.59 samples/sec Loss 9.4470 LearningRate 0.3550 Epoch: 7 Global Step: 39880 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:38:34,424-Speed 10512.67 samples/sec Loss 9.4486 LearningRate 0.3549 Epoch: 7 Global Step: 39890 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:38:42,239-Speed 10484.04 samples/sec Loss 9.5675 LearningRate 0.3548 Epoch: 7 Global Step: 39900 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:38:50,027-Speed 10520.44 samples/sec Loss 9.6014 LearningRate 0.3547 Epoch: 7 Global Step: 39910 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:38:57,799-Speed 10541.18 samples/sec Loss 9.4758 LearningRate 0.3546 Epoch: 7 Global Step: 39920 Fp16 Grad 
Scale: 262144 Required: 14 hours Training: 2022-01-15 23:39:05,567-Speed 10547.10 samples/sec Loss 9.5176 LearningRate 0.3544 Epoch: 7 Global Step: 39930 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:39:13,361-Speed 10511.86 samples/sec Loss 9.5031 LearningRate 0.3543 Epoch: 7 Global Step: 39940 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:39:21,153-Speed 10516.36 samples/sec Loss 9.5189 LearningRate 0.3542 Epoch: 7 Global Step: 39950 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:39:29,031-Speed 10399.28 samples/sec Loss 9.5243 LearningRate 0.3541 Epoch: 7 Global Step: 39960 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:39:36,814-Speed 10526.82 samples/sec Loss 9.4545 LearningRate 0.3540 Epoch: 7 Global Step: 39970 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:39:44,602-Speed 10520.06 samples/sec Loss 9.5439 LearningRate 0.3539 Epoch: 7 Global Step: 39980 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:39:52,418-Speed 10482.81 samples/sec Loss 9.5078 LearningRate 0.3538 Epoch: 7 Global Step: 39990 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:40:00,202-Speed 10525.62 samples/sec Loss 9.4429 LearningRate 0.3537 Epoch: 7 Global Step: 40000 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:40:28,043-[lfw][40000]XNorm: 23.041705 Training: 2022-01-15 23:40:28,043-[lfw][40000]Accuracy-Flip: 0.99667+-0.00236 Training: 2022-01-15 23:40:28,044-[lfw][40000]Accuracy-Highest: 0.99667 Training: 2022-01-15 23:41:00,975-[cfp_fp][40000]XNorm: 19.753488 Training: 2022-01-15 23:41:00,976-[cfp_fp][40000]Accuracy-Flip: 0.97571+-0.00818 Training: 2022-01-15 23:41:00,976-[cfp_fp][40000]Accuracy-Highest: 0.97571 Training: 2022-01-15 23:41:29,290-[agedb_30][40000]XNorm: 22.239426 Training: 2022-01-15 23:41:29,291-[agedb_30][40000]Accuracy-Flip: 0.96467+-0.01092 Training: 2022-01-15 23:41:29,291-[agedb_30][40000]Accuracy-Highest: 
0.96467 Training: 2022-01-15 23:41:37,051-Speed 845.86 samples/sec Loss 9.5444 LearningRate 0.3536 Epoch: 7 Global Step: 40010 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:41:44,797-Speed 10577.06 samples/sec Loss 9.4968 LearningRate 0.3534 Epoch: 7 Global Step: 40020 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:41:52,560-Speed 10554.31 samples/sec Loss 9.5220 LearningRate 0.3533 Epoch: 7 Global Step: 40030 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:42:00,306-Speed 10577.08 samples/sec Loss 9.5181 LearningRate 0.3532 Epoch: 7 Global Step: 40040 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:42:08,044-Speed 10589.06 samples/sec Loss 9.4939 LearningRate 0.3531 Epoch: 7 Global Step: 40050 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:42:15,822-Speed 10533.74 samples/sec Loss 9.4221 LearningRate 0.3530 Epoch: 7 Global Step: 40060 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:42:23,579-Speed 10561.51 samples/sec Loss 9.5659 LearningRate 0.3529 Epoch: 7 Global Step: 40070 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:42:31,347-Speed 10548.31 samples/sec Loss 9.5836 LearningRate 0.3528 Epoch: 7 Global Step: 40080 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:42:39,122-Speed 10536.69 samples/sec Loss 9.4509 LearningRate 0.3527 Epoch: 7 Global Step: 40090 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:42:46,896-Speed 10540.15 samples/sec Loss 9.4689 LearningRate 0.3526 Epoch: 7 Global Step: 40100 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:42:54,683-Speed 10520.88 samples/sec Loss 9.4502 LearningRate 0.3524 Epoch: 7 Global Step: 40110 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:43:02,495-Speed 10487.43 samples/sec Loss 9.4475 LearningRate 0.3523 Epoch: 7 Global Step: 40120 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 
23:43:10,257-Speed 10555.53 samples/sec Loss 9.4761 LearningRate 0.3522 Epoch: 7 Global Step: 40130 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:43:18,025-Speed 10549.20 samples/sec Loss 9.4941 LearningRate 0.3521 Epoch: 7 Global Step: 40140 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:43:25,821-Speed 10508.98 samples/sec Loss 9.4824 LearningRate 0.3520 Epoch: 7 Global Step: 40150 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:43:33,583-Speed 10555.37 samples/sec Loss 9.5030 LearningRate 0.3519 Epoch: 7 Global Step: 40160 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:43:41,377-Speed 10512.48 samples/sec Loss 9.4451 LearningRate 0.3518 Epoch: 7 Global Step: 40170 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:43:49,164-Speed 10530.15 samples/sec Loss 9.4338 LearningRate 0.3517 Epoch: 7 Global Step: 40180 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:43:56,922-Speed 10560.40 samples/sec Loss 9.4351 LearningRate 0.3516 Epoch: 7 Global Step: 40190 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:44:04,702-Speed 10530.54 samples/sec Loss 9.4430 LearningRate 0.3514 Epoch: 7 Global Step: 40200 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:44:12,459-Speed 10561.07 samples/sec Loss 9.4231 LearningRate 0.3513 Epoch: 7 Global Step: 40210 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:44:20,227-Speed 10548.57 samples/sec Loss 9.4594 LearningRate 0.3512 Epoch: 7 Global Step: 40220 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:44:27,982-Speed 10565.56 samples/sec Loss 9.5063 LearningRate 0.3511 Epoch: 7 Global Step: 40230 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:44:35,739-Speed 10561.47 samples/sec Loss 9.4351 LearningRate 0.3510 Epoch: 7 Global Step: 40240 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:44:43,543-Speed 10499.45 
samples/sec Loss 9.4359 LearningRate 0.3509 Epoch: 7 Global Step: 40250 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:44:51,315-Speed 10541.68 samples/sec Loss 9.3857 LearningRate 0.3508 Epoch: 7 Global Step: 40260 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:44:59,098-Speed 10528.03 samples/sec Loss 9.4327 LearningRate 0.3507 Epoch: 7 Global Step: 40270 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:45:06,867-Speed 10544.34 samples/sec Loss 9.4328 LearningRate 0.3506 Epoch: 7 Global Step: 40280 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:45:14,633-Speed 10550.87 samples/sec Loss 9.4417 LearningRate 0.3504 Epoch: 7 Global Step: 40290 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:45:22,434-Speed 10503.49 samples/sec Loss 9.4564 LearningRate 0.3503 Epoch: 7 Global Step: 40300 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:45:30,244-Speed 10490.71 samples/sec Loss 9.6271 LearningRate 0.3502 Epoch: 7 Global Step: 40310 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:45:38,020-Speed 10535.94 samples/sec Loss 9.6370 LearningRate 0.3501 Epoch: 7 Global Step: 40320 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:45:45,795-Speed 10538.16 samples/sec Loss 9.4685 LearningRate 0.3500 Epoch: 7 Global Step: 40330 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:45:53,561-Speed 10549.65 samples/sec Loss 9.4110 LearningRate 0.3499 Epoch: 7 Global Step: 40340 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:46:01,386-Speed 10471.40 samples/sec Loss 9.4316 LearningRate 0.3498 Epoch: 7 Global Step: 40350 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:46:09,155-Speed 10545.47 samples/sec Loss 9.4785 LearningRate 0.3497 Epoch: 7 Global Step: 40360 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:46:16,946-Speed 10516.67 samples/sec Loss 9.4468 
LearningRate 0.3496 Epoch: 7 Global Step: 40370 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:46:24,725-Speed 10530.73 samples/sec Loss 9.4421 LearningRate 0.3495 Epoch: 7 Global Step: 40380 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:46:32,546-Speed 10476.46 samples/sec Loss 9.4311 LearningRate 0.3493 Epoch: 7 Global Step: 40390 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:46:40,324-Speed 10534.01 samples/sec Loss 9.4562 LearningRate 0.3492 Epoch: 7 Global Step: 40400 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:46:48,104-Speed 10530.51 samples/sec Loss 9.4157 LearningRate 0.3491 Epoch: 7 Global Step: 40410 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:46:55,890-Speed 10523.77 samples/sec Loss 9.4226 LearningRate 0.3490 Epoch: 7 Global Step: 40420 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:47:03,691-Speed 10503.09 samples/sec Loss 9.4104 LearningRate 0.3489 Epoch: 7 Global Step: 40430 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:47:11,476-Speed 10522.82 samples/sec Loss 9.3782 LearningRate 0.3488 Epoch: 7 Global Step: 40440 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:47:19,283-Speed 10494.43 samples/sec Loss 9.4942 LearningRate 0.3487 Epoch: 7 Global Step: 40450 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:47:27,095-Speed 10488.00 samples/sec Loss 9.4177 LearningRate 0.3486 Epoch: 7 Global Step: 40460 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:47:34,879-Speed 10528.23 samples/sec Loss 9.3755 LearningRate 0.3485 Epoch: 7 Global Step: 40470 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:47:42,667-Speed 10519.73 samples/sec Loss 9.4180 LearningRate 0.3483 Epoch: 7 Global Step: 40480 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:47:50,440-Speed 10541.66 samples/sec Loss 9.4487 LearningRate 0.3482 Epoch: 7 
Global Step: 40490 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:47:58,215-Speed 10541.61 samples/sec Loss 9.4442 LearningRate 0.3481 Epoch: 7 Global Step: 40500 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:48:05,994-Speed 10532.72 samples/sec Loss 9.4587 LearningRate 0.3480 Epoch: 7 Global Step: 40510 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:48:13,775-Speed 10529.44 samples/sec Loss 9.4099 LearningRate 0.3479 Epoch: 7 Global Step: 40520 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:48:21,584-Speed 10493.43 samples/sec Loss 9.3958 LearningRate 0.3478 Epoch: 7 Global Step: 40530 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:48:29,383-Speed 10509.13 samples/sec Loss 9.4336 LearningRate 0.3477 Epoch: 7 Global Step: 40540 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:48:37,209-Speed 10469.23 samples/sec Loss 9.4286 LearningRate 0.3476 Epoch: 7 Global Step: 40550 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:48:44,998-Speed 10518.84 samples/sec Loss 9.4116 LearningRate 0.3475 Epoch: 7 Global Step: 40560 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:48:52,782-Speed 10524.23 samples/sec Loss 9.4466 LearningRate 0.3474 Epoch: 7 Global Step: 40570 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:49:00,610-Speed 10466.60 samples/sec Loss 9.3672 LearningRate 0.3472 Epoch: 7 Global Step: 40580 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:49:08,403-Speed 10513.83 samples/sec Loss 9.3763 LearningRate 0.3471 Epoch: 7 Global Step: 40590 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:49:16,180-Speed 10534.43 samples/sec Loss 9.4458 LearningRate 0.3470 Epoch: 7 Global Step: 40600 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:49:23,983-Speed 10500.57 samples/sec Loss 9.4556 LearningRate 0.3469 Epoch: 7 Global Step: 40610 Fp16 Grad 
Scale: 131072 Required: 14 hours Training: 2022-01-15 23:49:31,782-Speed 10504.94 samples/sec Loss 9.3895 LearningRate 0.3468 Epoch: 7 Global Step: 40620 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:49:39,587-Speed 10496.57 samples/sec Loss 9.3164 LearningRate 0.3467 Epoch: 7 Global Step: 40630 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:49:47,365-Speed 10534.62 samples/sec Loss 9.5108 LearningRate 0.3466 Epoch: 7 Global Step: 40640 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:49:55,137-Speed 10541.87 samples/sec Loss 9.3823 LearningRate 0.3465 Epoch: 7 Global Step: 40650 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:50:02,940-Speed 10500.30 samples/sec Loss 9.3482 LearningRate 0.3464 Epoch: 7 Global Step: 40660 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:50:10,711-Speed 10543.60 samples/sec Loss 9.4366 LearningRate 0.3463 Epoch: 7 Global Step: 40670 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:50:18,500-Speed 10517.76 samples/sec Loss 9.3558 LearningRate 0.3461 Epoch: 7 Global Step: 40680 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:50:26,294-Speed 10512.57 samples/sec Loss 9.4990 LearningRate 0.3460 Epoch: 7 Global Step: 40690 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:50:34,083-Speed 10519.07 samples/sec Loss 9.3515 LearningRate 0.3459 Epoch: 7 Global Step: 40700 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:50:41,867-Speed 10526.49 samples/sec Loss 9.3968 LearningRate 0.3458 Epoch: 7 Global Step: 40710 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:50:49,653-Speed 10522.03 samples/sec Loss 9.3713 LearningRate 0.3457 Epoch: 7 Global Step: 40720 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:50:57,463-Speed 10490.54 samples/sec Loss 9.3617 LearningRate 0.3456 Epoch: 7 Global Step: 40730 Fp16 Grad Scale: 262144 Required: 14 
hours Training: 2022-01-15 23:51:05,236-Speed 10539.91 samples/sec Loss 9.4164 LearningRate 0.3455 Epoch: 7 Global Step: 40740 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:51:13,001-Speed 10552.56 samples/sec Loss 9.3856 LearningRate 0.3454 Epoch: 7 Global Step: 40750 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:51:20,757-Speed 10563.32 samples/sec Loss 9.3755 LearningRate 0.3453 Epoch: 7 Global Step: 40760 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:51:28,540-Speed 10526.83 samples/sec Loss 9.4331 LearningRate 0.3452 Epoch: 7 Global Step: 40770 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:51:36,329-Speed 10518.09 samples/sec Loss 9.3431 LearningRate 0.3451 Epoch: 7 Global Step: 40780 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:51:44,116-Speed 10522.13 samples/sec Loss 9.3325 LearningRate 0.3449 Epoch: 7 Global Step: 40790 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:51:51,894-Speed 10533.11 samples/sec Loss 9.3458 LearningRate 0.3448 Epoch: 7 Global Step: 40800 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:51:59,689-Speed 10510.58 samples/sec Loss 9.3725 LearningRate 0.3447 Epoch: 7 Global Step: 40810 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:52:07,499-Speed 10491.31 samples/sec Loss 9.3465 LearningRate 0.3446 Epoch: 7 Global Step: 40820 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:52:15,309-Speed 10491.17 samples/sec Loss 9.3400 LearningRate 0.3445 Epoch: 7 Global Step: 40830 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:52:23,120-Speed 10488.60 samples/sec Loss 9.3761 LearningRate 0.3444 Epoch: 7 Global Step: 40840 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:52:30,914-Speed 10512.05 samples/sec Loss 9.3555 LearningRate 0.3443 Epoch: 7 Global Step: 40850 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 
23:52:38,695-Speed 10530.90 samples/sec Loss 9.3108 LearningRate 0.3442 Epoch: 7 Global Step: 40860 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:52:46,570-Speed 10403.19 samples/sec Loss 9.4265 LearningRate 0.3441 Epoch: 7 Global Step: 40870 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:52:54,355-Speed 10525.01 samples/sec Loss 9.3575 LearningRate 0.3440 Epoch: 7 Global Step: 40880 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:53:02,136-Speed 10529.10 samples/sec Loss 9.3793 LearningRate 0.3438 Epoch: 7 Global Step: 40890 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:53:09,927-Speed 10516.53 samples/sec Loss 9.3554 LearningRate 0.3437 Epoch: 7 Global Step: 40900 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:53:17,733-Speed 10495.56 samples/sec Loss 9.3806 LearningRate 0.3436 Epoch: 7 Global Step: 40910 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:53:25,516-Speed 10527.01 samples/sec Loss 9.4079 LearningRate 0.3435 Epoch: 7 Global Step: 40920 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:53:33,314-Speed 10505.72 samples/sec Loss 9.3379 LearningRate 0.3434 Epoch: 7 Global Step: 40930 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:53:41,088-Speed 10539.60 samples/sec Loss 9.3450 LearningRate 0.3433 Epoch: 7 Global Step: 40940 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:53:48,888-Speed 10504.00 samples/sec Loss 9.3124 LearningRate 0.3432 Epoch: 7 Global Step: 40950 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:53:56,688-Speed 10503.92 samples/sec Loss 9.4147 LearningRate 0.3431 Epoch: 7 Global Step: 40960 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:54:04,476-Speed 10520.09 samples/sec Loss 9.4043 LearningRate 0.3430 Epoch: 7 Global Step: 40970 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:54:12,258-Speed 10528.84 
samples/sec Loss 9.3520 LearningRate 0.3429 Epoch: 7 Global Step: 40980 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:54:20,043-Speed 10524.20 samples/sec Loss 9.3953 LearningRate 0.3428 Epoch: 7 Global Step: 40990 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:54:27,860-Speed 10480.75 samples/sec Loss 9.4412 LearningRate 0.3426 Epoch: 7 Global Step: 41000 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:54:35,666-Speed 10495.38 samples/sec Loss 9.3547 LearningRate 0.3425 Epoch: 7 Global Step: 41010 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:54:43,462-Speed 10510.56 samples/sec Loss 9.3208 LearningRate 0.3424 Epoch: 7 Global Step: 41020 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:54:51,248-Speed 10522.35 samples/sec Loss 9.3017 LearningRate 0.3423 Epoch: 7 Global Step: 41030 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:54:59,050-Speed 10501.07 samples/sec Loss 9.3741 LearningRate 0.3422 Epoch: 7 Global Step: 41040 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:55:06,858-Speed 10493.32 samples/sec Loss 9.2801 LearningRate 0.3421 Epoch: 7 Global Step: 41050 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:55:14,676-Speed 10482.59 samples/sec Loss 9.3247 LearningRate 0.3420 Epoch: 7 Global Step: 41060 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:55:22,471-Speed 10510.54 samples/sec Loss 9.2787 LearningRate 0.3419 Epoch: 7 Global Step: 41070 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:55:30,294-Speed 10473.25 samples/sec Loss 9.2954 LearningRate 0.3418 Epoch: 7 Global Step: 41080 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:55:38,088-Speed 10511.77 samples/sec Loss 9.2915 LearningRate 0.3417 Epoch: 7 Global Step: 41090 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:55:45,877-Speed 10519.18 samples/sec Loss 9.3985 
LearningRate 0.3415 Epoch: 7 Global Step: 41100 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:55:53,664-Speed 10521.86 samples/sec Loss 9.3051 LearningRate 0.3414 Epoch: 7 Global Step: 41110 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:56:01,458-Speed 10511.38 samples/sec Loss 9.4037 LearningRate 0.3413 Epoch: 7 Global Step: 41120 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:56:09,284-Speed 10469.42 samples/sec Loss 9.4060 LearningRate 0.3412 Epoch: 7 Global Step: 41130 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:56:17,122-Speed 10454.57 samples/sec Loss 9.3174 LearningRate 0.3411 Epoch: 7 Global Step: 41140 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:56:24,947-Speed 10470.97 samples/sec Loss 9.2853 LearningRate 0.3410 Epoch: 7 Global Step: 41150 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:56:32,751-Speed 10497.72 samples/sec Loss 9.4273 LearningRate 0.3409 Epoch: 7 Global Step: 41160 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:56:40,589-Speed 10454.40 samples/sec Loss 9.3601 LearningRate 0.3408 Epoch: 7 Global Step: 41170 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:56:48,404-Speed 10483.24 samples/sec Loss 9.3250 LearningRate 0.3407 Epoch: 7 Global Step: 41180 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:56:56,207-Speed 10499.92 samples/sec Loss 9.2813 LearningRate 0.3406 Epoch: 7 Global Step: 41190 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:57:03,991-Speed 10525.58 samples/sec Loss 9.2820 LearningRate 0.3405 Epoch: 7 Global Step: 41200 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:57:11,782-Speed 10517.92 samples/sec Loss 9.3394 LearningRate 0.3403 Epoch: 7 Global Step: 41210 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:57:19,598-Speed 10482.80 samples/sec Loss 9.3285 LearningRate 0.3402 Epoch: 7 
Global Step: 41220 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:57:27,398-Speed 10503.30 samples/sec Loss 9.2781 LearningRate 0.3401 Epoch: 7 Global Step: 41230 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:57:35,199-Speed 10503.59 samples/sec Loss 9.3434 LearningRate 0.3400 Epoch: 7 Global Step: 41240 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:57:43,029-Speed 10466.08 samples/sec Loss 9.3209 LearningRate 0.3399 Epoch: 7 Global Step: 41250 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-15 23:57:50,833-Speed 10498.73 samples/sec Loss 9.3380 LearningRate 0.3398 Epoch: 7 Global Step: 41260 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:57:58,632-Speed 10505.81 samples/sec Loss 9.3316 LearningRate 0.3397 Epoch: 7 Global Step: 41270 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:58:06,466-Speed 10458.10 samples/sec Loss 9.2687 LearningRate 0.3396 Epoch: 7 Global Step: 41280 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:58:14,262-Speed 10508.63 samples/sec Loss 9.3121 LearningRate 0.3395 Epoch: 7 Global Step: 41290 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:58:22,063-Speed 10503.51 samples/sec Loss 9.3280 LearningRate 0.3394 Epoch: 7 Global Step: 41300 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:58:29,884-Speed 10475.70 samples/sec Loss 9.2963 LearningRate 0.3393 Epoch: 7 Global Step: 41310 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:58:37,658-Speed 10539.44 samples/sec Loss 9.3155 LearningRate 0.3392 Epoch: 7 Global Step: 41320 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:58:45,443-Speed 10524.12 samples/sec Loss 9.2436 LearningRate 0.3390 Epoch: 7 Global Step: 41330 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:58:53,239-Speed 10509.57 samples/sec Loss 9.3147 LearningRate 0.3389 Epoch: 7 Global Step: 41340 Fp16 Grad 
Scale: 131072 Required: 14 hours Training: 2022-01-15 23:59:01,013-Speed 10539.83 samples/sec Loss 9.2864 LearningRate 0.3388 Epoch: 7 Global Step: 41350 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:59:08,787-Speed 10539.79 samples/sec Loss 9.3470 LearningRate 0.3387 Epoch: 7 Global Step: 41360 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:59:16,570-Speed 10525.59 samples/sec Loss 9.5356 LearningRate 0.3386 Epoch: 7 Global Step: 41370 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:59:24,396-Speed 10468.20 samples/sec Loss 9.3917 LearningRate 0.3385 Epoch: 7 Global Step: 41380 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:59:32,193-Speed 10508.86 samples/sec Loss 9.3074 LearningRate 0.3384 Epoch: 7 Global Step: 41390 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:59:39,985-Speed 10517.60 samples/sec Loss 9.2656 LearningRate 0.3383 Epoch: 7 Global Step: 41400 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:59:47,805-Speed 10475.52 samples/sec Loss 9.2315 LearningRate 0.3382 Epoch: 7 Global Step: 41410 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-15 23:59:55,702-Speed 10374.80 samples/sec Loss 9.2843 LearningRate 0.3381 Epoch: 7 Global Step: 41420 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:00:03,529-Speed 10468.67 samples/sec Loss 9.2505 LearningRate 0.3380 Epoch: 7 Global Step: 41430 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:00:11,334-Speed 10499.28 samples/sec Loss 9.2828 LearningRate 0.3378 Epoch: 7 Global Step: 41440 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:00:19,114-Speed 10531.77 samples/sec Loss 9.2549 LearningRate 0.3377 Epoch: 7 Global Step: 41450 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:00:26,942-Speed 10466.58 samples/sec Loss 9.3226 LearningRate 0.3376 Epoch: 7 Global Step: 41460 Fp16 Grad Scale: 262144 Required: 14 
hours Training: 2022-01-16 00:00:34,755-Speed 10487.00 samples/sec Loss 9.3094 LearningRate 0.3375 Epoch: 7 Global Step: 41470 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:00:42,602-Speed 10440.79 samples/sec Loss 9.2814 LearningRate 0.3374 Epoch: 7 Global Step: 41480 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:01:04,874-Speed 3678.26 samples/sec Loss 9.3820 LearningRate 0.3373 Epoch: 8 Global Step: 41490 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:01:12,656-Speed 10529.37 samples/sec Loss 9.2966 LearningRate 0.3372 Epoch: 8 Global Step: 41500 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:01:20,415-Speed 10559.32 samples/sec Loss 9.4096 LearningRate 0.3371 Epoch: 8 Global Step: 41510 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:01:28,204-Speed 10519.11 samples/sec Loss 9.2986 LearningRate 0.3370 Epoch: 8 Global Step: 41520 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:01:35,972-Speed 10545.92 samples/sec Loss 9.2792 LearningRate 0.3369 Epoch: 8 Global Step: 41530 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:01:43,754-Speed 10529.66 samples/sec Loss 9.2263 LearningRate 0.3368 Epoch: 8 Global Step: 41540 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:01:51,536-Speed 10527.40 samples/sec Loss 9.2597 LearningRate 0.3367 Epoch: 8 Global Step: 41550 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:01:59,321-Speed 10525.05 samples/sec Loss 9.2767 LearningRate 0.3365 Epoch: 8 Global Step: 41560 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:02:07,104-Speed 10525.97 samples/sec Loss 9.2512 LearningRate 0.3364 Epoch: 8 Global Step: 41570 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:02:14,889-Speed 10525.43 samples/sec Loss 9.2355 LearningRate 0.3363 Epoch: 8 Global Step: 41580 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 
00:02:22,671-Speed 10527.03 samples/sec Loss 9.2054 LearningRate 0.3362 Epoch: 8 Global Step: 41590 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:02:30,479-Speed 10493.67 samples/sec Loss 9.2477 LearningRate 0.3361 Epoch: 8 Global Step: 41600 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:02:38,308-Speed 10465.58 samples/sec Loss 9.3171 LearningRate 0.3360 Epoch: 8 Global Step: 41610 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:02:46,103-Speed 10511.60 samples/sec Loss 9.2921 LearningRate 0.3359 Epoch: 8 Global Step: 41620 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:02:53,905-Speed 10500.62 samples/sec Loss 9.2149 LearningRate 0.3358 Epoch: 8 Global Step: 41630 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:03:01,694-Speed 10517.91 samples/sec Loss 9.1655 LearningRate 0.3357 Epoch: 8 Global Step: 41640 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:03:09,501-Speed 10495.37 samples/sec Loss 9.2812 LearningRate 0.3356 Epoch: 8 Global Step: 41650 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:03:17,302-Speed 10503.17 samples/sec Loss 9.2022 LearningRate 0.3355 Epoch: 8 Global Step: 41660 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:03:25,102-Speed 10503.01 samples/sec Loss 9.2255 LearningRate 0.3354 Epoch: 8 Global Step: 41670 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:03:32,902-Speed 10504.45 samples/sec Loss 9.2069 LearningRate 0.3352 Epoch: 8 Global Step: 41680 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:03:40,683-Speed 10531.27 samples/sec Loss 9.3627 LearningRate 0.3351 Epoch: 8 Global Step: 41690 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:03:48,486-Speed 10500.81 samples/sec Loss 9.2853 LearningRate 0.3350 Epoch: 8 Global Step: 41700 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:03:56,297-Speed 10489.06 
samples/sec Loss 9.2672 LearningRate 0.3349 Epoch: 8 Global Step: 41710 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:04:04,083-Speed 10523.04 samples/sec Loss 9.2049 LearningRate 0.3348 Epoch: 8 Global Step: 41720 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:04:11,875-Speed 10514.91 samples/sec Loss 9.2662 LearningRate 0.3347 Epoch: 8 Global Step: 41730 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:04:19,671-Speed 10509.42 samples/sec Loss 9.2434 LearningRate 0.3346 Epoch: 8 Global Step: 41740 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:04:27,458-Speed 10521.11 samples/sec Loss 9.2325 LearningRate 0.3345 Epoch: 8 Global Step: 41750 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:04:35,243-Speed 10523.84 samples/sec Loss 9.1685 LearningRate 0.3344 Epoch: 8 Global Step: 41760 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:04:43,060-Speed 10482.62 samples/sec Loss 9.1676 LearningRate 0.3343 Epoch: 8 Global Step: 41770 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:04:50,878-Speed 10480.30 samples/sec Loss 9.3607 LearningRate 0.3342 Epoch: 8 Global Step: 41780 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:04:58,755-Speed 10401.48 samples/sec Loss 9.1986 LearningRate 0.3341 Epoch: 8 Global Step: 41790 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:05:06,581-Speed 10468.46 samples/sec Loss 9.3580 LearningRate 0.3340 Epoch: 8 Global Step: 41800 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:05:14,379-Speed 10508.82 samples/sec Loss 9.2322 LearningRate 0.3338 Epoch: 8 Global Step: 41810 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:05:22,174-Speed 10509.80 samples/sec Loss 9.2875 LearningRate 0.3337 Epoch: 8 Global Step: 41820 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:05:29,980-Speed 10496.02 samples/sec Loss 9.2470 
LearningRate 0.3336 Epoch: 8 Global Step: 41830 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:05:37,776-Speed 10509.63 samples/sec Loss 9.1983 LearningRate 0.3335 Epoch: 8 Global Step: 41840 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:05:45,580-Speed 10499.09 samples/sec Loss 9.1920 LearningRate 0.3334 Epoch: 8 Global Step: 41850 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:05:53,386-Speed 10495.93 samples/sec Loss 9.2629 LearningRate 0.3333 Epoch: 8 Global Step: 41860 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:06:01,201-Speed 10483.70 samples/sec Loss 9.2745 LearningRate 0.3332 Epoch: 8 Global Step: 41870 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:06:09,021-Speed 10478.07 samples/sec Loss 9.2265 LearningRate 0.3331 Epoch: 8 Global Step: 41880 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:06:16,860-Speed 10451.40 samples/sec Loss 9.2200 LearningRate 0.3330 Epoch: 8 Global Step: 41890 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:06:24,701-Speed 10449.62 samples/sec Loss 9.2364 LearningRate 0.3329 Epoch: 8 Global Step: 41900 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:06:32,566-Speed 10416.85 samples/sec Loss 9.2318 LearningRate 0.3328 Epoch: 8 Global Step: 41910 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:06:40,411-Speed 10442.87 samples/sec Loss 9.2550 LearningRate 0.3327 Epoch: 8 Global Step: 41920 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:06:48,237-Speed 10469.77 samples/sec Loss 9.3247 LearningRate 0.3325 Epoch: 8 Global Step: 41930 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:06:56,079-Speed 10447.32 samples/sec Loss 9.2919 LearningRate 0.3324 Epoch: 8 Global Step: 41940 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:07:03,913-Speed 10458.80 samples/sec Loss 9.1980 LearningRate 0.3323 Epoch: 8 
Global Step: 41950 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:07:11,802-Speed 10386.18 samples/sec Loss 9.1723 LearningRate 0.3322 Epoch: 8 Global Step: 41960 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:07:19,629-Speed 10467.68 samples/sec Loss 9.2285 LearningRate 0.3321 Epoch: 8 Global Step: 41970 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:07:27,466-Speed 10454.19 samples/sec Loss 9.2436 LearningRate 0.3320 Epoch: 8 Global Step: 41980 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:07:35,298-Speed 10460.53 samples/sec Loss 9.1849 LearningRate 0.3319 Epoch: 8 Global Step: 41990 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:07:43,131-Speed 10460.55 samples/sec Loss 9.2764 LearningRate 0.3318 Epoch: 8 Global Step: 42000 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:07:50,985-Speed 10432.12 samples/sec Loss 9.2337 LearningRate 0.3317 Epoch: 8 Global Step: 42010 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:07:58,817-Speed 10462.20 samples/sec Loss 9.1950 LearningRate 0.3316 Epoch: 8 Global Step: 42020 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:08:06,632-Speed 10484.59 samples/sec Loss 9.1909 LearningRate 0.3315 Epoch: 8 Global Step: 42030 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:08:14,471-Speed 10451.39 samples/sec Loss 9.1746 LearningRate 0.3314 Epoch: 8 Global Step: 42040 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:08:22,314-Speed 10447.46 samples/sec Loss 9.1433 LearningRate 0.3313 Epoch: 8 Global Step: 42050 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:08:30,159-Speed 10444.48 samples/sec Loss 9.1685 LearningRate 0.3311 Epoch: 8 Global Step: 42060 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:08:38,002-Speed 10446.77 samples/sec Loss 9.1641 LearningRate 0.3310 Epoch: 8 Global Step: 42070 Fp16 Grad 
Scale: 262144 Required: 14 hours Training: 2022-01-16 00:08:45,829-Speed 10467.51 samples/sec Loss 9.2064 LearningRate 0.3309 Epoch: 8 Global Step: 42080 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:08:53,657-Speed 10467.13 samples/sec Loss 9.1750 LearningRate 0.3308 Epoch: 8 Global Step: 42090 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:09:01,521-Speed 10419.62 samples/sec Loss 9.2255 LearningRate 0.3307 Epoch: 8 Global Step: 42100 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:09:09,343-Speed 10473.88 samples/sec Loss 9.1982 LearningRate 0.3306 Epoch: 8 Global Step: 42110 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:09:17,172-Speed 10464.69 samples/sec Loss 9.2153 LearningRate 0.3305 Epoch: 8 Global Step: 42120 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:09:25,005-Speed 10460.81 samples/sec Loss 9.1972 LearningRate 0.3304 Epoch: 8 Global Step: 42130 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:09:32,834-Speed 10464.12 samples/sec Loss 9.1413 LearningRate 0.3303 Epoch: 8 Global Step: 42140 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:09:40,667-Speed 10460.16 samples/sec Loss 9.1589 LearningRate 0.3302 Epoch: 8 Global Step: 42150 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:09:48,474-Speed 10495.07 samples/sec Loss 9.2001 LearningRate 0.3301 Epoch: 8 Global Step: 42160 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:09:56,298-Speed 10470.56 samples/sec Loss 9.1816 LearningRate 0.3300 Epoch: 8 Global Step: 42170 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:10:04,132-Speed 10458.99 samples/sec Loss 9.1371 LearningRate 0.3299 Epoch: 8 Global Step: 42180 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:10:12,035-Speed 10367.33 samples/sec Loss 9.1827 LearningRate 0.3298 Epoch: 8 Global Step: 42190 Fp16 Grad Scale: 131072 Required: 14 
hours Training: 2022-01-16 00:10:19,879-Speed 10445.74 samples/sec Loss 9.2110 LearningRate 0.3296 Epoch: 8 Global Step: 42200 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:10:27,758-Speed 10398.96 samples/sec Loss 9.1487 LearningRate 0.3295 Epoch: 8 Global Step: 42210 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:10:35,603-Speed 10443.17 samples/sec Loss 9.1853 LearningRate 0.3294 Epoch: 8 Global Step: 42220 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:10:43,471-Speed 10414.15 samples/sec Loss 9.2123 LearningRate 0.3293 Epoch: 8 Global Step: 42230 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:10:51,312-Speed 10447.82 samples/sec Loss 9.2023 LearningRate 0.3292 Epoch: 8 Global Step: 42240 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:10:59,130-Speed 10483.21 samples/sec Loss 9.1845 LearningRate 0.3291 Epoch: 8 Global Step: 42250 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:11:06,939-Speed 10492.07 samples/sec Loss 9.2026 LearningRate 0.3290 Epoch: 8 Global Step: 42260 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:11:14,778-Speed 10452.00 samples/sec Loss 9.1608 LearningRate 0.3289 Epoch: 8 Global Step: 42270 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:11:22,574-Speed 10509.32 samples/sec Loss 9.1567 LearningRate 0.3288 Epoch: 8 Global Step: 42280 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-01-16 00:11:30,343-Speed 10546.19 samples/sec Loss 9.2070 LearningRate 0.3287 Epoch: 8 Global Step: 42290 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:11:38,140-Speed 10508.61 samples/sec Loss 9.1614 LearningRate 0.3286 Epoch: 8 Global Step: 42300 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:11:45,939-Speed 10505.90 samples/sec Loss 9.1704 LearningRate 0.3285 Epoch: 8 Global Step: 42310 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 
00:11:53,767-Speed 10465.87 samples/sec Loss 9.1194 LearningRate 0.3284 Epoch: 8 Global Step: 42320 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:12:01,587-Speed 10477.51 samples/sec Loss 9.1684 LearningRate 0.3283 Epoch: 8 Global Step: 42330 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-01-16 00:12:09,389-Speed 10502.92 samples/sec Loss 9.0836 LearningRate 0.3281 Epoch: 8 Global Step: 42340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:12:17,201-Speed 10487.03 samples/sec Loss 9.1396 LearningRate 0.3280 Epoch: 8 Global Step: 42350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:12:24,997-Speed 10510.54 samples/sec Loss 9.1578 LearningRate 0.3279 Epoch: 8 Global Step: 42360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:12:32,857-Speed 10423.33 samples/sec Loss 9.2376 LearningRate 0.3278 Epoch: 8 Global Step: 42370 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:12:40,665-Speed 10493.74 samples/sec Loss 9.1588 LearningRate 0.3277 Epoch: 8 Global Step: 42380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:12:48,473-Speed 10492.84 samples/sec Loss 9.1667 LearningRate 0.3276 Epoch: 8 Global Step: 42390 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:12:56,283-Speed 10489.89 samples/sec Loss 9.1862 LearningRate 0.3275 Epoch: 8 Global Step: 42400 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:13:04,108-Speed 10470.06 samples/sec Loss 9.0938 LearningRate 0.3274 Epoch: 8 Global Step: 42410 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:13:11,899-Speed 10516.97 samples/sec Loss 9.1903 LearningRate 0.3273 Epoch: 8 Global Step: 42420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:13:19,710-Speed 10489.29 samples/sec Loss 9.1408 LearningRate 0.3272 Epoch: 8 Global Step: 42430 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:13:27,539-Speed 10465.08 
samples/sec Loss 9.0785 LearningRate 0.3271 Epoch: 8 Global Step: 42440 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:13:35,374-Speed 10457.01 samples/sec Loss 9.0881 LearningRate 0.3270 Epoch: 8 Global Step: 42450 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:13:43,186-Speed 10488.31 samples/sec Loss 9.0603 LearningRate 0.3269 Epoch: 8 Global Step: 42460 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:13:50,993-Speed 10494.25 samples/sec Loss 9.1069 LearningRate 0.3268 Epoch: 8 Global Step: 42470 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:13:58,804-Speed 10488.92 samples/sec Loss 9.2112 LearningRate 0.3267 Epoch: 8 Global Step: 42480 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:14:06,613-Speed 10494.80 samples/sec Loss 9.1129 LearningRate 0.3265 Epoch: 8 Global Step: 42490 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:14:14,428-Speed 10484.35 samples/sec Loss 9.1703 LearningRate 0.3264 Epoch: 8 Global Step: 42500 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:14:22,225-Speed 10507.05 samples/sec Loss 9.1462 LearningRate 0.3263 Epoch: 8 Global Step: 42510 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:14:30,083-Speed 10427.27 samples/sec Loss 9.2074 LearningRate 0.3262 Epoch: 8 Global Step: 42520 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:14:37,899-Speed 10482.39 samples/sec Loss 9.1725 LearningRate 0.3261 Epoch: 8 Global Step: 42530 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:14:45,696-Speed 10508.59 samples/sec Loss 9.1351 LearningRate 0.3260 Epoch: 8 Global Step: 42540 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:14:53,515-Speed 10478.75 samples/sec Loss 9.1300 LearningRate 0.3259 Epoch: 8 Global Step: 42550 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:15:01,332-Speed 10480.89 samples/sec Loss 9.1124 
LearningRate 0.3258 Epoch: 8 Global Step: 42560 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:15:09,162-Speed 10463.90 samples/sec Loss 9.1041 LearningRate 0.3257 Epoch: 8 Global Step: 42570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:15:16,963-Speed 10503.60 samples/sec Loss 9.1501 LearningRate 0.3256 Epoch: 8 Global Step: 42580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:15:24,782-Speed 10478.32 samples/sec Loss 9.1397 LearningRate 0.3255 Epoch: 8 Global Step: 42590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:15:32,573-Speed 10516.55 samples/sec Loss 9.0032 LearningRate 0.3254 Epoch: 8 Global Step: 42600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:15:40,358-Speed 10523.62 samples/sec Loss 9.1504 LearningRate 0.3253 Epoch: 8 Global Step: 42610 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:15:48,166-Speed 10494.23 samples/sec Loss 9.0733 LearningRate 0.3252 Epoch: 8 Global Step: 42620 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:15:55,968-Speed 10501.26 samples/sec Loss 9.1208 LearningRate 0.3251 Epoch: 8 Global Step: 42630 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:16:03,799-Speed 10461.68 samples/sec Loss 9.1880 LearningRate 0.3249 Epoch: 8 Global Step: 42640 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:16:11,655-Speed 10429.34 samples/sec Loss 9.1665 LearningRate 0.3248 Epoch: 8 Global Step: 42650 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:16:19,480-Speed 10470.49 samples/sec Loss 9.1542 LearningRate 0.3247 Epoch: 8 Global Step: 42660 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:16:27,274-Speed 10511.44 samples/sec Loss 9.1040 LearningRate 0.3246 Epoch: 8 Global Step: 42670 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:16:35,119-Speed 10443.26 samples/sec Loss 9.1426 LearningRate 0.3245 Epoch: 8 
Global Step: 42680 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:16:42,968-Speed 10438.39 samples/sec Loss 9.1364 LearningRate 0.3244 Epoch: 8 Global Step: 42690 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:16:50,762-Speed 10513.35 samples/sec Loss 9.0640 LearningRate 0.3243 Epoch: 8 Global Step: 42700 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:16:58,579-Speed 10481.51 samples/sec Loss 9.0739 LearningRate 0.3242 Epoch: 8 Global Step: 42710 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:17:06,389-Speed 10489.60 samples/sec Loss 9.0950 LearningRate 0.3241 Epoch: 8 Global Step: 42720 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:17:14,202-Speed 10486.42 samples/sec Loss 9.0884 LearningRate 0.3240 Epoch: 8 Global Step: 42730 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:17:22,017-Speed 10484.37 samples/sec Loss 9.1525 LearningRate 0.3239 Epoch: 8 Global Step: 42740 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:17:29,868-Speed 10435.87 samples/sec Loss 9.1920 LearningRate 0.3238 Epoch: 8 Global Step: 42750 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:17:37,683-Speed 10484.03 samples/sec Loss 9.1426 LearningRate 0.3237 Epoch: 8 Global Step: 42760 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:17:45,486-Speed 10499.31 samples/sec Loss 9.1264 LearningRate 0.3236 Epoch: 8 Global Step: 42770 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:17:53,275-Speed 10519.71 samples/sec Loss 9.1250 LearningRate 0.3235 Epoch: 8 Global Step: 42780 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:18:01,065-Speed 10516.99 samples/sec Loss 9.1213 LearningRate 0.3234 Epoch: 8 Global Step: 42790 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:18:08,859-Speed 10512.45 samples/sec Loss 9.0517 LearningRate 0.3232 Epoch: 8 Global Step: 42800 Fp16 Grad 
Scale: 131072 Required: 13 hours Training: 2022-01-16 00:18:16,649-Speed 10518.05 samples/sec Loss 9.0808 LearningRate 0.3231 Epoch: 8 Global Step: 42810 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:18:24,462-Speed 10486.83 samples/sec Loss 9.0567 LearningRate 0.3230 Epoch: 8 Global Step: 42820 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:18:32,249-Speed 10521.16 samples/sec Loss 9.0532 LearningRate 0.3229 Epoch: 8 Global Step: 42830 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:18:40,043-Speed 10511.79 samples/sec Loss 9.0840 LearningRate 0.3228 Epoch: 8 Global Step: 42840 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:18:47,821-Speed 10533.37 samples/sec Loss 8.9961 LearningRate 0.3227 Epoch: 8 Global Step: 42850 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:18:55,622-Speed 10503.82 samples/sec Loss 9.0124 LearningRate 0.3226 Epoch: 8 Global Step: 42860 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:19:03,446-Speed 10471.52 samples/sec Loss 9.1017 LearningRate 0.3225 Epoch: 8 Global Step: 42870 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:19:11,246-Speed 10504.47 samples/sec Loss 9.1506 LearningRate 0.3224 Epoch: 8 Global Step: 42880 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:19:19,041-Speed 10511.06 samples/sec Loss 9.0850 LearningRate 0.3223 Epoch: 8 Global Step: 42890 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:19:26,838-Speed 10508.34 samples/sec Loss 9.1351 LearningRate 0.3222 Epoch: 8 Global Step: 42900 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:19:34,655-Speed 10481.57 samples/sec Loss 9.0995 LearningRate 0.3221 Epoch: 8 Global Step: 42910 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:19:42,463-Speed 10493.28 samples/sec Loss 9.1359 LearningRate 0.3220 Epoch: 8 Global Step: 42920 Fp16 Grad Scale: 131072 Required: 13 
hours Training: 2022-01-16 00:19:50,289-Speed 10468.72 samples/sec Loss 9.1530 LearningRate 0.3219 Epoch: 8 Global Step: 42930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:19:58,082-Speed 10513.34 samples/sec Loss 9.0082 LearningRate 0.3218 Epoch: 8 Global Step: 42940 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:20:05,902-Speed 10476.83 samples/sec Loss 9.1310 LearningRate 0.3217 Epoch: 8 Global Step: 42950 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:20:13,706-Speed 10498.68 samples/sec Loss 9.0920 LearningRate 0.3215 Epoch: 8 Global Step: 42960 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:20:21,535-Speed 10465.16 samples/sec Loss 9.0871 LearningRate 0.3214 Epoch: 8 Global Step: 42970 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:20:29,323-Speed 10519.75 samples/sec Loss 9.0819 LearningRate 0.3213 Epoch: 8 Global Step: 42980 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:20:37,136-Speed 10486.71 samples/sec Loss 9.0563 LearningRate 0.3212 Epoch: 8 Global Step: 42990 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:20:44,935-Speed 10504.64 samples/sec Loss 9.0769 LearningRate 0.3211 Epoch: 8 Global Step: 43000 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:20:52,753-Speed 10481.70 samples/sec Loss 9.0512 LearningRate 0.3210 Epoch: 8 Global Step: 43010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:21:00,582-Speed 10465.73 samples/sec Loss 8.9861 LearningRate 0.3209 Epoch: 8 Global Step: 43020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:21:08,385-Speed 10499.87 samples/sec Loss 9.0861 LearningRate 0.3208 Epoch: 8 Global Step: 43030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:21:16,183-Speed 10506.75 samples/sec Loss 9.0396 LearningRate 0.3207 Epoch: 8 Global Step: 43040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 
00:21:23,977-Speed 10512.50 samples/sec Loss 8.9851 LearningRate 0.3206 Epoch: 8 Global Step: 43050 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:21:31,790-Speed 10487.50 samples/sec Loss 9.0632 LearningRate 0.3205 Epoch: 8 Global Step: 43060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:21:39,590-Speed 10503.63 samples/sec Loss 9.0188 LearningRate 0.3204 Epoch: 8 Global Step: 43070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:21:47,421-Speed 10462.38 samples/sec Loss 9.1049 LearningRate 0.3203 Epoch: 8 Global Step: 43080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:21:55,280-Speed 10424.71 samples/sec Loss 9.0591 LearningRate 0.3202 Epoch: 8 Global Step: 43090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:22:03,076-Speed 10508.91 samples/sec Loss 9.1228 LearningRate 0.3201 Epoch: 8 Global Step: 43100 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:22:10,908-Speed 10461.25 samples/sec Loss 9.0550 LearningRate 0.3200 Epoch: 8 Global Step: 43110 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:22:18,720-Speed 10487.99 samples/sec Loss 9.0252 LearningRate 0.3199 Epoch: 8 Global Step: 43120 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:22:26,526-Speed 10495.95 samples/sec Loss 9.0529 LearningRate 0.3197 Epoch: 8 Global Step: 43130 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:22:34,332-Speed 10496.11 samples/sec Loss 9.0564 LearningRate 0.3196 Epoch: 8 Global Step: 43140 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:22:42,159-Speed 10467.46 samples/sec Loss 9.0802 LearningRate 0.3195 Epoch: 8 Global Step: 43150 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:22:49,965-Speed 10495.39 samples/sec Loss 9.0454 LearningRate 0.3194 Epoch: 8 Global Step: 43160 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:22:57,824-Speed 10425.97 
samples/sec Loss 9.0592 LearningRate 0.3193 Epoch: 8 Global Step: 43170 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:23:05,624-Speed 10504.16 samples/sec Loss 9.1117 LearningRate 0.3192 Epoch: 8 Global Step: 43180 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:23:13,429-Speed 10497.50 samples/sec Loss 8.9973 LearningRate 0.3191 Epoch: 8 Global Step: 43190 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:23:21,212-Speed 10527.76 samples/sec Loss 9.0048 LearningRate 0.3190 Epoch: 8 Global Step: 43200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:23:28,992-Speed 10531.24 samples/sec Loss 9.0908 LearningRate 0.3189 Epoch: 8 Global Step: 43210 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:23:36,789-Speed 10507.79 samples/sec Loss 9.0160 LearningRate 0.3188 Epoch: 8 Global Step: 43220 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:23:44,619-Speed 10463.06 samples/sec Loss 9.0205 LearningRate 0.3187 Epoch: 8 Global Step: 43230 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:23:52,408-Speed 10519.17 samples/sec Loss 8.9887 LearningRate 0.3186 Epoch: 8 Global Step: 43240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:24:00,212-Speed 10499.99 samples/sec Loss 8.9905 LearningRate 0.3185 Epoch: 8 Global Step: 43250 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:24:08,011-Speed 10505.15 samples/sec Loss 9.0025 LearningRate 0.3184 Epoch: 8 Global Step: 43260 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:24:15,814-Speed 10499.59 samples/sec Loss 9.0233 LearningRate 0.3183 Epoch: 8 Global Step: 43270 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:24:23,613-Speed 10505.83 samples/sec Loss 9.0052 LearningRate 0.3182 Epoch: 8 Global Step: 43280 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:24:31,528-Speed 10351.45 samples/sec Loss 8.9746 
LearningRate 0.3181 Epoch: 8 Global Step: 43290 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:24:39,324-Speed 10510.56 samples/sec Loss 9.0239 LearningRate 0.3180 Epoch: 8 Global Step: 43300 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:24:47,104-Speed 10530.31 samples/sec Loss 9.0224 LearningRate 0.3179 Epoch: 8 Global Step: 43310 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:24:54,884-Speed 10530.92 samples/sec Loss 8.9650 LearningRate 0.3177 Epoch: 8 Global Step: 43320 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:25:02,705-Speed 10475.09 samples/sec Loss 9.0383 LearningRate 0.3176 Epoch: 8 Global Step: 43330 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:25:10,495-Speed 10517.75 samples/sec Loss 9.0405 LearningRate 0.3175 Epoch: 8 Global Step: 43340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:25:18,309-Speed 10485.36 samples/sec Loss 9.0684 LearningRate 0.3174 Epoch: 8 Global Step: 43350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:25:26,167-Speed 10425.22 samples/sec Loss 9.0526 LearningRate 0.3173 Epoch: 8 Global Step: 43360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:25:33,995-Speed 10472.86 samples/sec Loss 9.0885 LearningRate 0.3172 Epoch: 8 Global Step: 43370 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:25:41,784-Speed 10523.29 samples/sec Loss 9.0778 LearningRate 0.3171 Epoch: 8 Global Step: 43380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:25:49,590-Speed 10496.31 samples/sec Loss 9.0146 LearningRate 0.3170 Epoch: 8 Global Step: 43390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:25:57,401-Speed 10490.93 samples/sec Loss 9.0147 LearningRate 0.3169 Epoch: 8 Global Step: 43400 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:26:05,208-Speed 10495.61 samples/sec Loss 9.0747 LearningRate 0.3168 Epoch: 8 
Global Step: 43410 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:26:13,019-Speed 10489.19 samples/sec Loss 8.9830 LearningRate 0.3167 Epoch: 8 Global Step: 43420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:26:20,830-Speed 10489.62 samples/sec Loss 8.9040 LearningRate 0.3166 Epoch: 8 Global Step: 43430 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:26:28,651-Speed 10476.81 samples/sec Loss 8.9492 LearningRate 0.3165 Epoch: 8 Global Step: 43440 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:26:36,469-Speed 10480.06 samples/sec Loss 8.9776 LearningRate 0.3164 Epoch: 8 Global Step: 43450 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:26:44,299-Speed 10463.94 samples/sec Loss 9.0190 LearningRate 0.3163 Epoch: 8 Global Step: 43460 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:26:52,077-Speed 10533.42 samples/sec Loss 9.0054 LearningRate 0.3162 Epoch: 8 Global Step: 43470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:26:59,865-Speed 10520.22 samples/sec Loss 9.0613 LearningRate 0.3161 Epoch: 8 Global Step: 43480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:27:07,681-Speed 10482.39 samples/sec Loss 8.9538 LearningRate 0.3160 Epoch: 8 Global Step: 43490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:27:15,506-Speed 10470.22 samples/sec Loss 8.9145 LearningRate 0.3159 Epoch: 8 Global Step: 43500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:27:23,346-Speed 10450.30 samples/sec Loss 8.9816 LearningRate 0.3157 Epoch: 8 Global Step: 43510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:27:31,159-Speed 10487.37 samples/sec Loss 9.0055 LearningRate 0.3156 Epoch: 8 Global Step: 43520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:27:38,980-Speed 10475.13 samples/sec Loss 8.9379 LearningRate 0.3155 Epoch: 8 Global Step: 43530 Fp16 Grad Scale: 
65536 Required: 13 hours Training: 2022-01-16 00:27:46,762-Speed 10528.79 samples/sec Loss 8.9819 LearningRate 0.3154 Epoch: 8 Global Step: 43540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:27:54,583-Speed 10475.82 samples/sec Loss 8.9619 LearningRate 0.3153 Epoch: 8 Global Step: 43550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:28:02,382-Speed 10504.20 samples/sec Loss 8.9701 LearningRate 0.3152 Epoch: 8 Global Step: 43560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:28:10,218-Speed 10460.19 samples/sec Loss 8.9669 LearningRate 0.3151 Epoch: 8 Global Step: 43570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:28:18,072-Speed 10432.07 samples/sec Loss 9.0432 LearningRate 0.3150 Epoch: 8 Global Step: 43580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:28:25,909-Speed 10455.04 samples/sec Loss 8.9812 LearningRate 0.3149 Epoch: 8 Global Step: 43590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:28:33,731-Speed 10473.84 samples/sec Loss 8.9734 LearningRate 0.3148 Epoch: 8 Global Step: 43600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:28:41,567-Speed 10457.05 samples/sec Loss 8.9941 LearningRate 0.3147 Epoch: 8 Global Step: 43610 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:28:49,372-Speed 10497.12 samples/sec Loss 9.0099 LearningRate 0.3146 Epoch: 8 Global Step: 43620 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:28:57,196-Speed 10471.66 samples/sec Loss 8.9999 LearningRate 0.3145 Epoch: 8 Global Step: 43630 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:29:05,041-Speed 10446.30 samples/sec Loss 8.9753 LearningRate 0.3144 Epoch: 8 Global Step: 43640 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:29:12,832-Speed 10515.58 samples/sec Loss 8.9699 LearningRate 0.3143 Epoch: 8 Global Step: 43650 Fp16 Grad Scale: 131072 Required: 13 hours Training: 
2022-01-16 00:29:20,644-Speed 10488.70 samples/sec Loss 8.9422 LearningRate 0.3142 Epoch: 8 Global Step: 43660 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:29:28,429-Speed 10524.31 samples/sec Loss 8.9560 LearningRate 0.3141 Epoch: 8 Global Step: 43670 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:29:36,245-Speed 10482.20 samples/sec Loss 8.9898 LearningRate 0.3140 Epoch: 8 Global Step: 43680 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:29:44,041-Speed 10510.30 samples/sec Loss 9.0299 LearningRate 0.3139 Epoch: 8 Global Step: 43690 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:29:51,855-Speed 10484.44 samples/sec Loss 8.9614 LearningRate 0.3138 Epoch: 8 Global Step: 43700 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:29:59,688-Speed 10459.00 samples/sec Loss 9.0419 LearningRate 0.3137 Epoch: 8 Global Step: 43710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:30:07,514-Speed 10469.73 samples/sec Loss 8.9727 LearningRate 0.3135 Epoch: 8 Global Step: 43720 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:30:15,324-Speed 10491.11 samples/sec Loss 8.9804 LearningRate 0.3134 Epoch: 8 Global Step: 43730 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:30:23,119-Speed 10510.60 samples/sec Loss 8.9344 LearningRate 0.3133 Epoch: 8 Global Step: 43740 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:30:30,904-Speed 10524.68 samples/sec Loss 8.9397 LearningRate 0.3132 Epoch: 8 Global Step: 43750 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:30:38,732-Speed 10466.41 samples/sec Loss 8.9993 LearningRate 0.3131 Epoch: 8 Global Step: 43760 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:30:46,560-Speed 10466.49 samples/sec Loss 8.9938 LearningRate 0.3130 Epoch: 8 Global Step: 43770 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:30:54,359-Speed 
10505.35 samples/sec Loss 8.9706 LearningRate 0.3129 Epoch: 8 Global Step: 43780 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:31:02,158-Speed 10503.98 samples/sec Loss 8.9601 LearningRate 0.3128 Epoch: 8 Global Step: 43790 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:31:09,987-Speed 10465.39 samples/sec Loss 9.0125 LearningRate 0.3127 Epoch: 8 Global Step: 43800 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:31:17,773-Speed 10524.49 samples/sec Loss 8.9561 LearningRate 0.3126 Epoch: 8 Global Step: 43810 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:31:25,581-Speed 10491.89 samples/sec Loss 9.0057 LearningRate 0.3125 Epoch: 8 Global Step: 43820 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:31:33,389-Speed 10493.37 samples/sec Loss 8.9722 LearningRate 0.3124 Epoch: 8 Global Step: 43830 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:31:41,211-Speed 10474.02 samples/sec Loss 8.9549 LearningRate 0.3123 Epoch: 8 Global Step: 43840 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:31:49,007-Speed 10510.08 samples/sec Loss 8.9216 LearningRate 0.3122 Epoch: 8 Global Step: 43850 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:31:56,823-Speed 10482.09 samples/sec Loss 8.9411 LearningRate 0.3121 Epoch: 8 Global Step: 43860 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:32:04,661-Speed 10453.15 samples/sec Loss 8.9631 LearningRate 0.3120 Epoch: 8 Global Step: 43870 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:32:12,449-Speed 10519.96 samples/sec Loss 8.9809 LearningRate 0.3119 Epoch: 8 Global Step: 43880 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:32:20,279-Speed 10464.78 samples/sec Loss 8.8975 LearningRate 0.3118 Epoch: 8 Global Step: 43890 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:32:28,073-Speed 10512.13 samples/sec Loss 
9.0341 LearningRate 0.3117 Epoch: 8 Global Step: 43900 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:32:35,892-Speed 10477.81 samples/sec Loss 8.9256 LearningRate 0.3116 Epoch: 8 Global Step: 43910 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:32:43,710-Speed 10480.59 samples/sec Loss 8.9279 LearningRate 0.3115 Epoch: 8 Global Step: 43920 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:32:51,498-Speed 10520.49 samples/sec Loss 8.9590 LearningRate 0.3114 Epoch: 8 Global Step: 43930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:32:59,324-Speed 10469.38 samples/sec Loss 8.8729 LearningRate 0.3113 Epoch: 8 Global Step: 43940 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:33:07,156-Speed 10461.62 samples/sec Loss 8.9124 LearningRate 0.3111 Epoch: 8 Global Step: 43950 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:33:14,988-Speed 10461.20 samples/sec Loss 8.8993 LearningRate 0.3110 Epoch: 8 Global Step: 43960 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:33:22,799-Speed 10489.59 samples/sec Loss 8.9241 LearningRate 0.3109 Epoch: 8 Global Step: 43970 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:33:30,608-Speed 10490.74 samples/sec Loss 8.9192 LearningRate 0.3108 Epoch: 8 Global Step: 43980 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:33:38,411-Speed 10500.56 samples/sec Loss 8.8960 LearningRate 0.3107 Epoch: 8 Global Step: 43990 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:33:46,235-Speed 10472.44 samples/sec Loss 8.8917 LearningRate 0.3106 Epoch: 8 Global Step: 44000 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:33:54,048-Speed 10486.50 samples/sec Loss 8.9398 LearningRate 0.3105 Epoch: 8 Global Step: 44010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:34:01,890-Speed 10448.12 samples/sec Loss 8.9395 LearningRate 0.3104 
Epoch: 8 Global Step: 44020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:34:09,713-Speed 10472.68 samples/sec Loss 8.9802 LearningRate 0.3103 Epoch: 8 Global Step: 44030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:34:17,551-Speed 10453.35 samples/sec Loss 8.9367 LearningRate 0.3102 Epoch: 8 Global Step: 44040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:34:25,360-Speed 10492.86 samples/sec Loss 8.9359 LearningRate 0.3101 Epoch: 8 Global Step: 44050 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:34:33,167-Speed 10494.49 samples/sec Loss 8.8827 LearningRate 0.3100 Epoch: 8 Global Step: 44060 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:34:40,962-Speed 10509.66 samples/sec Loss 8.9269 LearningRate 0.3099 Epoch: 8 Global Step: 44070 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:34:48,765-Speed 10500.58 samples/sec Loss 8.8933 LearningRate 0.3098 Epoch: 8 Global Step: 44080 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:34:56,564-Speed 10506.79 samples/sec Loss 9.0110 LearningRate 0.3097 Epoch: 8 Global Step: 44090 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:35:04,377-Speed 10485.81 samples/sec Loss 8.9812 LearningRate 0.3096 Epoch: 8 Global Step: 44100 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:35:12,166-Speed 10525.97 samples/sec Loss 8.8977 LearningRate 0.3095 Epoch: 8 Global Step: 44110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:35:19,965-Speed 10505.38 samples/sec Loss 8.8692 LearningRate 0.3094 Epoch: 8 Global Step: 44120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:35:27,833-Speed 10413.34 samples/sec Loss 8.9036 LearningRate 0.3093 Epoch: 8 Global Step: 44130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:35:35,631-Speed 10506.36 samples/sec Loss 8.8641 LearningRate 0.3092 Epoch: 8 Global Step: 44140 Fp16 
Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:35:43,433-Speed 10501.47 samples/sec Loss 8.9184 LearningRate 0.3091 Epoch: 8 Global Step: 44150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:35:51,274-Speed 10449.52 samples/sec Loss 8.9109 LearningRate 0.3090 Epoch: 8 Global Step: 44160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:35:59,078-Speed 10497.99 samples/sec Loss 8.8658 LearningRate 0.3089 Epoch: 8 Global Step: 44170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:36:06,889-Speed 10489.02 samples/sec Loss 8.8643 LearningRate 0.3088 Epoch: 8 Global Step: 44180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:36:14,670-Speed 10530.17 samples/sec Loss 8.8954 LearningRate 0.3087 Epoch: 8 Global Step: 44190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:36:22,468-Speed 10506.76 samples/sec Loss 8.9117 LearningRate 0.3085 Epoch: 8 Global Step: 44200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:36:30,280-Speed 10488.71 samples/sec Loss 8.9546 LearningRate 0.3084 Epoch: 8 Global Step: 44210 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:36:38,102-Speed 10473.21 samples/sec Loss 8.9284 LearningRate 0.3083 Epoch: 8 Global Step: 44220 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:36:45,924-Speed 10474.24 samples/sec Loss 8.8798 LearningRate 0.3082 Epoch: 8 Global Step: 44230 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:36:53,747-Speed 10473.54 samples/sec Loss 8.8467 LearningRate 0.3081 Epoch: 8 Global Step: 44240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:37:01,589-Speed 10447.90 samples/sec Loss 8.9518 LearningRate 0.3080 Epoch: 8 Global Step: 44250 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:37:09,388-Speed 10504.26 samples/sec Loss 8.9151 LearningRate 0.3079 Epoch: 8 Global Step: 44260 Fp16 Grad Scale: 131072 Required: 13 hours 
Training: 2022-01-16 00:37:17,207-Speed 10478.85 samples/sec Loss 8.9363 LearningRate 0.3078 Epoch: 8 Global Step: 44270 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:37:25,043-Speed 10455.61 samples/sec Loss 8.9199 LearningRate 0.3077 Epoch: 8 Global Step: 44280 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:37:32,852-Speed 10492.34 samples/sec Loss 8.9505 LearningRate 0.3076 Epoch: 8 Global Step: 44290 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:37:40,652-Speed 10503.24 samples/sec Loss 8.8771 LearningRate 0.3075 Epoch: 8 Global Step: 44300 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:37:48,452-Speed 10508.97 samples/sec Loss 8.9301 LearningRate 0.3074 Epoch: 8 Global Step: 44310 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:37:56,279-Speed 10468.13 samples/sec Loss 8.9259 LearningRate 0.3073 Epoch: 8 Global Step: 44320 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:38:04,095-Speed 10483.80 samples/sec Loss 8.8316 LearningRate 0.3072 Epoch: 8 Global Step: 44330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:38:11,889-Speed 10510.71 samples/sec Loss 8.9026 LearningRate 0.3071 Epoch: 8 Global Step: 44340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:38:19,680-Speed 10516.54 samples/sec Loss 8.8511 LearningRate 0.3070 Epoch: 8 Global Step: 44350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:38:27,482-Speed 10501.72 samples/sec Loss 8.9200 LearningRate 0.3069 Epoch: 8 Global Step: 44360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:38:35,290-Speed 10493.59 samples/sec Loss 8.8702 LearningRate 0.3068 Epoch: 8 Global Step: 44370 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:38:43,087-Speed 10507.55 samples/sec Loss 8.8487 LearningRate 0.3067 Epoch: 8 Global Step: 44380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 
00:38:50,860-Speed 10540.15 samples/sec Loss 8.8744 LearningRate 0.3066 Epoch: 8 Global Step: 44390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:38:58,651-Speed 10515.66 samples/sec Loss 8.8177 LearningRate 0.3065 Epoch: 8 Global Step: 44400 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:39:06,446-Speed 10512.70 samples/sec Loss 8.7859 LearningRate 0.3064 Epoch: 8 Global Step: 44410 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:39:14,251-Speed 10496.37 samples/sec Loss 8.8856 LearningRate 0.3063 Epoch: 8 Global Step: 44420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:39:22,084-Speed 10459.16 samples/sec Loss 8.8864 LearningRate 0.3062 Epoch: 8 Global Step: 44430 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:39:29,909-Speed 10471.64 samples/sec Loss 8.8446 LearningRate 0.3061 Epoch: 8 Global Step: 44440 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:39:37,779-Speed 10411.02 samples/sec Loss 8.8035 LearningRate 0.3060 Epoch: 8 Global Step: 44450 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:39:45,593-Speed 10486.57 samples/sec Loss 8.8910 LearningRate 0.3059 Epoch: 8 Global Step: 44460 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:39:53,383-Speed 10517.31 samples/sec Loss 8.8419 LearningRate 0.3058 Epoch: 8 Global Step: 44470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:40:01,202-Speed 10479.08 samples/sec Loss 8.8399 LearningRate 0.3057 Epoch: 8 Global Step: 44480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:40:08,996-Speed 10512.63 samples/sec Loss 8.8835 LearningRate 0.3055 Epoch: 8 Global Step: 44490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:40:16,796-Speed 10503.38 samples/sec Loss 8.8618 LearningRate 0.3054 Epoch: 8 Global Step: 44500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:40:24,616-Speed 10477.47 
samples/sec Loss 8.8933 LearningRate 0.3053 Epoch: 8 Global Step: 44510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:40:32,403-Speed 10521.32 samples/sec Loss 8.8586 LearningRate 0.3052 Epoch: 8 Global Step: 44520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:40:40,213-Speed 10490.36 samples/sec Loss 8.8507 LearningRate 0.3051 Epoch: 8 Global Step: 44530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:40:47,997-Speed 10525.67 samples/sec Loss 8.8371 LearningRate 0.3050 Epoch: 8 Global Step: 44540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:40:55,787-Speed 10518.26 samples/sec Loss 8.8452 LearningRate 0.3049 Epoch: 8 Global Step: 44550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:41:03,587-Speed 10507.52 samples/sec Loss 8.8939 LearningRate 0.3048 Epoch: 8 Global Step: 44560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:41:11,372-Speed 10525.03 samples/sec Loss 8.9037 LearningRate 0.3047 Epoch: 8 Global Step: 44570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:41:19,203-Speed 10461.72 samples/sec Loss 8.7875 LearningRate 0.3046 Epoch: 8 Global Step: 44580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:41:27,022-Speed 10478.76 samples/sec Loss 8.8655 LearningRate 0.3045 Epoch: 8 Global Step: 44590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:41:34,856-Speed 10458.26 samples/sec Loss 8.7833 LearningRate 0.3044 Epoch: 8 Global Step: 44600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:41:42,671-Speed 10483.85 samples/sec Loss 8.8236 LearningRate 0.3043 Epoch: 8 Global Step: 44610 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:41:50,463-Speed 10514.89 samples/sec Loss 8.8029 LearningRate 0.3042 Epoch: 8 Global Step: 44620 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:41:58,290-Speed 10467.76 samples/sec Loss 8.8704 LearningRate 
0.3041 Epoch: 8 Global Step: 44630 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:42:06,107-Speed 10486.65 samples/sec Loss 8.7706 LearningRate 0.3040 Epoch: 8 Global Step: 44640 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:42:13,892-Speed 10524.85 samples/sec Loss 8.8291 LearningRate 0.3039 Epoch: 8 Global Step: 44650 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:42:21,700-Speed 10492.31 samples/sec Loss 8.8205 LearningRate 0.3038 Epoch: 8 Global Step: 44660 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:42:29,502-Speed 10500.99 samples/sec Loss 8.8692 LearningRate 0.3037 Epoch: 8 Global Step: 44670 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:42:37,308-Speed 10496.35 samples/sec Loss 8.7957 LearningRate 0.3036 Epoch: 8 Global Step: 44680 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:42:45,087-Speed 10532.14 samples/sec Loss 8.8003 LearningRate 0.3035 Epoch: 8 Global Step: 44690 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:42:52,862-Speed 10538.09 samples/sec Loss 8.8780 LearningRate 0.3034 Epoch: 8 Global Step: 44700 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:43:00,673-Speed 10488.13 samples/sec Loss 8.8349 LearningRate 0.3033 Epoch: 8 Global Step: 44710 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:43:08,514-Speed 10448.97 samples/sec Loss 8.8399 LearningRate 0.3032 Epoch: 8 Global Step: 44720 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:43:16,392-Speed 10400.17 samples/sec Loss 8.8153 LearningRate 0.3031 Epoch: 8 Global Step: 44730 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:43:24,236-Speed 10445.37 samples/sec Loss 8.8543 LearningRate 0.3030 Epoch: 8 Global Step: 44740 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:43:32,118-Speed 10395.19 samples/sec Loss 8.8003 LearningRate 0.3029 Epoch: 8 Global Step: 
44750 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:43:39,956-Speed 10453.94 samples/sec Loss 8.8368 LearningRate 0.3028 Epoch: 8 Global Step: 44760 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:43:47,759-Speed 10500.34 samples/sec Loss 8.7464 LearningRate 0.3027 Epoch: 8 Global Step: 44770 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:43:55,559-Speed 10505.02 samples/sec Loss 8.8184 LearningRate 0.3026 Epoch: 8 Global Step: 44780 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:44:03,357-Speed 10507.57 samples/sec Loss 8.7980 LearningRate 0.3025 Epoch: 8 Global Step: 44790 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:44:11,177-Speed 10477.69 samples/sec Loss 8.7508 LearningRate 0.3024 Epoch: 8 Global Step: 44800 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:44:19,046-Speed 10412.27 samples/sec Loss 8.7906 LearningRate 0.3023 Epoch: 8 Global Step: 44810 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:44:26,868-Speed 10473.91 samples/sec Loss 8.7794 LearningRate 0.3021 Epoch: 8 Global Step: 44820 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:44:34,677-Speed 10491.69 samples/sec Loss 8.8412 LearningRate 0.3020 Epoch: 8 Global Step: 44830 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:44:42,486-Speed 10492.99 samples/sec Loss 8.8017 LearningRate 0.3019 Epoch: 8 Global Step: 44840 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:44:50,303-Speed 10480.48 samples/sec Loss 8.8007 LearningRate 0.3018 Epoch: 8 Global Step: 44850 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:44:58,090-Speed 10522.12 samples/sec Loss 8.8219 LearningRate 0.3017 Epoch: 8 Global Step: 44860 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:45:05,899-Speed 10491.74 samples/sec Loss 8.7458 LearningRate 0.3016 Epoch: 8 Global Step: 44870 Fp16 Grad Scale: 262144 
Required: 13 hours Training: 2022-01-16 00:45:13,726-Speed 10467.26 samples/sec Loss 8.8321 LearningRate 0.3015 Epoch: 8 Global Step: 44880 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:45:21,602-Speed 10402.60 samples/sec Loss 8.7848 LearningRate 0.3014 Epoch: 8 Global Step: 44890 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:45:29,409-Speed 10494.90 samples/sec Loss 8.7697 LearningRate 0.3013 Epoch: 8 Global Step: 44900 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:45:37,276-Speed 10413.36 samples/sec Loss 8.7732 LearningRate 0.3012 Epoch: 8 Global Step: 44910 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:45:45,078-Speed 10502.63 samples/sec Loss 8.8279 LearningRate 0.3011 Epoch: 8 Global Step: 44920 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:45:52,891-Speed 10486.39 samples/sec Loss 8.8392 LearningRate 0.3010 Epoch: 8 Global Step: 44930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:46:00,691-Speed 10503.31 samples/sec Loss 8.8193 LearningRate 0.3009 Epoch: 8 Global Step: 44940 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:46:08,493-Speed 10502.21 samples/sec Loss 8.7208 LearningRate 0.3008 Epoch: 8 Global Step: 44950 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:46:16,306-Speed 10486.76 samples/sec Loss 8.7848 LearningRate 0.3007 Epoch: 8 Global Step: 44960 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:46:24,100-Speed 10512.17 samples/sec Loss 8.7891 LearningRate 0.3006 Epoch: 8 Global Step: 44970 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:46:31,894-Speed 10511.87 samples/sec Loss 8.7636 LearningRate 0.3005 Epoch: 8 Global Step: 44980 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:46:39,713-Speed 10478.57 samples/sec Loss 8.7353 LearningRate 0.3004 Epoch: 8 Global Step: 44990 Fp16 Grad Scale: 262144 Required: 13 hours Training: 
2022-01-16 00:46:47,504-Speed 10516.37 samples/sec Loss 8.7817 LearningRate 0.3003 Epoch: 8 Global Step: 45000 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:46:55,302-Speed 10507.08 samples/sec Loss 8.7856 LearningRate 0.3002 Epoch: 8 Global Step: 45010 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:47:03,121-Speed 10477.73 samples/sec Loss 8.7694 LearningRate 0.3001 Epoch: 8 Global Step: 45020 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:47:10,909-Speed 10519.91 samples/sec Loss 8.7875 LearningRate 0.3000 Epoch: 8 Global Step: 45030 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:47:18,739-Speed 10463.40 samples/sec Loss 8.7574 LearningRate 0.2999 Epoch: 8 Global Step: 45040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:47:26,536-Speed 10508.19 samples/sec Loss 8.7551 LearningRate 0.2998 Epoch: 8 Global Step: 45050 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:47:34,347-Speed 10489.69 samples/sec Loss 8.7835 LearningRate 0.2997 Epoch: 8 Global Step: 45060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:47:42,144-Speed 10507.42 samples/sec Loss 8.7988 LearningRate 0.2996 Epoch: 8 Global Step: 45070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:47:49,934-Speed 10518.25 samples/sec Loss 8.7894 LearningRate 0.2995 Epoch: 8 Global Step: 45080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:47:57,744-Speed 10489.66 samples/sec Loss 8.7544 LearningRate 0.2994 Epoch: 8 Global Step: 45090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:48:05,554-Speed 10490.65 samples/sec Loss 8.7296 LearningRate 0.2993 Epoch: 8 Global Step: 45100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:48:13,414-Speed 10425.10 samples/sec Loss 8.8050 LearningRate 0.2992 Epoch: 8 Global Step: 45110 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:48:21,216-Speed 
10501.42 samples/sec Loss 8.7038 LearningRate 0.2991 Epoch: 8 Global Step: 45120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:48:29,003-Speed 10521.17 samples/sec Loss 8.7657 LearningRate 0.2990 Epoch: 8 Global Step: 45130 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:48:36,781-Speed 10533.21 samples/sec Loss 8.7686 LearningRate 0.2989 Epoch: 8 Global Step: 45140 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:48:44,575-Speed 10512.41 samples/sec Loss 8.7379 LearningRate 0.2988 Epoch: 8 Global Step: 45150 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:48:52,394-Speed 10479.14 samples/sec Loss 8.7254 LearningRate 0.2987 Epoch: 8 Global Step: 45160 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:49:00,224-Speed 10463.88 samples/sec Loss 8.7494 LearningRate 0.2986 Epoch: 8 Global Step: 45170 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:49:08,008-Speed 10524.54 samples/sec Loss 8.7720 LearningRate 0.2985 Epoch: 8 Global Step: 45180 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:49:15,789-Speed 10530.07 samples/sec Loss 8.7208 LearningRate 0.2984 Epoch: 8 Global Step: 45190 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:49:23,581-Speed 10515.57 samples/sec Loss 8.8287 LearningRate 0.2983 Epoch: 8 Global Step: 45200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:49:31,356-Speed 10537.43 samples/sec Loss 8.7595 LearningRate 0.2982 Epoch: 8 Global Step: 45210 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:49:39,192-Speed 10456.25 samples/sec Loss 8.6998 LearningRate 0.2981 Epoch: 8 Global Step: 45220 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:49:47,046-Speed 10431.77 samples/sec Loss 8.7742 LearningRate 0.2980 Epoch: 8 Global Step: 45230 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:49:54,851-Speed 10496.65 samples/sec Loss 
8.7759 LearningRate 0.2979 Epoch: 8 Global Step: 45240 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:50:02,675-Speed 10472.36 samples/sec Loss 8.7445 LearningRate 0.2978 Epoch: 8 Global Step: 45250 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:50:10,511-Speed 10455.93 samples/sec Loss 8.7323 LearningRate 0.2976 Epoch: 8 Global Step: 45260 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:50:18,332-Speed 10476.53 samples/sec Loss 8.7535 LearningRate 0.2975 Epoch: 8 Global Step: 45270 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:50:26,151-Speed 10477.76 samples/sec Loss 8.7425 LearningRate 0.2974 Epoch: 8 Global Step: 45280 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:50:33,970-Speed 10479.49 samples/sec Loss 8.8148 LearningRate 0.2973 Epoch: 8 Global Step: 45290 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:50:41,792-Speed 10474.61 samples/sec Loss 8.7943 LearningRate 0.2972 Epoch: 8 Global Step: 45300 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:50:49,646-Speed 10430.82 samples/sec Loss 8.7036 LearningRate 0.2971 Epoch: 8 Global Step: 45310 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:50:57,482-Speed 10455.46 samples/sec Loss 8.7596 LearningRate 0.2970 Epoch: 8 Global Step: 45320 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:51:05,279-Speed 10507.45 samples/sec Loss 8.6624 LearningRate 0.2969 Epoch: 8 Global Step: 45330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:51:13,149-Speed 10411.37 samples/sec Loss 8.7266 LearningRate 0.2968 Epoch: 8 Global Step: 45340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:51:20,956-Speed 10493.75 samples/sec Loss 8.7456 LearningRate 0.2967 Epoch: 8 Global Step: 45350 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:51:28,757-Speed 10504.92 samples/sec Loss 8.7163 LearningRate 0.2966 
Epoch: 8 Global Step: 45360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:51:36,571-Speed 10486.07 samples/sec Loss 8.7350 LearningRate 0.2965 Epoch: 8 Global Step: 45370 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:51:44,373-Speed 10501.99 samples/sec Loss 8.7263 LearningRate 0.2964 Epoch: 8 Global Step: 45380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:51:52,206-Speed 10459.71 samples/sec Loss 8.7016 LearningRate 0.2963 Epoch: 8 Global Step: 45390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:52:00,046-Speed 10449.79 samples/sec Loss 8.6825 LearningRate 0.2962 Epoch: 8 Global Step: 45400 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:52:07,867-Speed 10475.55 samples/sec Loss 8.7525 LearningRate 0.2961 Epoch: 8 Global Step: 45410 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:52:15,672-Speed 10497.03 samples/sec Loss 8.7228 LearningRate 0.2960 Epoch: 8 Global Step: 45420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:52:23,503-Speed 10462.38 samples/sec Loss 8.7187 LearningRate 0.2959 Epoch: 8 Global Step: 45430 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:52:31,332-Speed 10464.75 samples/sec Loss 8.6770 LearningRate 0.2958 Epoch: 8 Global Step: 45440 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:52:39,154-Speed 10474.78 samples/sec Loss 8.6844 LearningRate 0.2957 Epoch: 8 Global Step: 45450 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:52:46,968-Speed 10486.02 samples/sec Loss 8.6902 LearningRate 0.2956 Epoch: 8 Global Step: 45460 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:52:54,800-Speed 10466.29 samples/sec Loss 8.7701 LearningRate 0.2955 Epoch: 8 Global Step: 45470 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:53:02,639-Speed 10451.36 samples/sec Loss 8.6823 LearningRate 0.2954 Epoch: 8 Global Step: 45480 
Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:53:10,464-Speed 10470.72 samples/sec Loss 8.7021 LearningRate 0.2953 Epoch: 8 Global Step: 45490 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:53:18,296-Speed 10466.23 samples/sec Loss 8.7764 LearningRate 0.2952 Epoch: 8 Global Step: 45500 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:53:26,147-Speed 10436.45 samples/sec Loss 8.6554 LearningRate 0.2951 Epoch: 8 Global Step: 45510 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:53:33,944-Speed 10506.73 samples/sec Loss 8.6693 LearningRate 0.2950 Epoch: 8 Global Step: 45520 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:53:41,785-Speed 10450.04 samples/sec Loss 8.7134 LearningRate 0.2949 Epoch: 8 Global Step: 45530 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:53:49,606-Speed 10476.37 samples/sec Loss 8.7140 LearningRate 0.2948 Epoch: 8 Global Step: 45540 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:53:57,462-Speed 10428.39 samples/sec Loss 8.7152 LearningRate 0.2947 Epoch: 8 Global Step: 45550 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:54:05,288-Speed 10469.16 samples/sec Loss 8.6681 LearningRate 0.2946 Epoch: 8 Global Step: 45560 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:54:13,117-Speed 10464.81 samples/sec Loss 8.7045 LearningRate 0.2945 Epoch: 8 Global Step: 45570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:54:20,946-Speed 10465.06 samples/sec Loss 8.6593 LearningRate 0.2944 Epoch: 8 Global Step: 45580 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:54:28,776-Speed 10463.87 samples/sec Loss 8.6928 LearningRate 0.2943 Epoch: 8 Global Step: 45590 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:54:36,609-Speed 10459.62 samples/sec Loss 8.6504 LearningRate 0.2942 Epoch: 8 Global Step: 45600 Fp16 Grad Scale: 262144 
Required: 13 hours Training: 2022-01-16 00:54:44,427-Speed 10479.02 samples/sec Loss 8.6556 LearningRate 0.2941 Epoch: 8 Global Step: 45610 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:54:52,222-Speed 10511.46 samples/sec Loss 8.6850 LearningRate 0.2940 Epoch: 8 Global Step: 45620 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:55:00,017-Speed 10510.71 samples/sec Loss 8.6845 LearningRate 0.2939 Epoch: 8 Global Step: 45630 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:55:07,808-Speed 10515.82 samples/sec Loss 8.7452 LearningRate 0.2938 Epoch: 8 Global Step: 45640 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:55:15,605-Speed 10508.27 samples/sec Loss 8.7458 LearningRate 0.2937 Epoch: 8 Global Step: 45650 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:55:23,412-Speed 10494.26 samples/sec Loss 8.6772 LearningRate 0.2936 Epoch: 8 Global Step: 45660 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:55:31,201-Speed 10518.76 samples/sec Loss 8.6270 LearningRate 0.2935 Epoch: 8 Global Step: 45670 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:55:39,008-Speed 10495.52 samples/sec Loss 8.6864 LearningRate 0.2934 Epoch: 8 Global Step: 45680 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:55:46,801-Speed 10512.52 samples/sec Loss 8.6851 LearningRate 0.2933 Epoch: 8 Global Step: 45690 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:55:54,586-Speed 10525.42 samples/sec Loss 8.6425 LearningRate 0.2932 Epoch: 8 Global Step: 45700 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:56:02,370-Speed 10525.78 samples/sec Loss 8.6584 LearningRate 0.2931 Epoch: 8 Global Step: 45710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:56:10,149-Speed 10532.04 samples/sec Loss 8.6764 LearningRate 0.2930 Epoch: 8 Global Step: 45720 Fp16 Grad Scale: 131072 Required: 13 hours Training: 
2022-01-16 00:56:17,931-Speed 10526.96 samples/sec Loss 8.6411 LearningRate 0.2929 Epoch: 8 Global Step: 45730 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:56:25,711-Speed 10535.26 samples/sec Loss 8.6653 LearningRate 0.2928 Epoch: 8 Global Step: 45740 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:56:33,509-Speed 10506.48 samples/sec Loss 8.5999 LearningRate 0.2927 Epoch: 8 Global Step: 45750 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:56:41,311-Speed 10501.48 samples/sec Loss 8.6702 LearningRate 0.2926 Epoch: 8 Global Step: 45760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:56:49,081-Speed 10544.92 samples/sec Loss 8.7091 LearningRate 0.2925 Epoch: 8 Global Step: 45770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:56:56,864-Speed 10527.06 samples/sec Loss 8.6299 LearningRate 0.2924 Epoch: 8 Global Step: 45780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:57:04,686-Speed 10475.35 samples/sec Loss 8.6385 LearningRate 0.2923 Epoch: 8 Global Step: 45790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:57:12,477-Speed 10516.38 samples/sec Loss 8.7646 LearningRate 0.2922 Epoch: 8 Global Step: 45800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:57:20,273-Speed 10508.30 samples/sec Loss 8.7138 LearningRate 0.2921 Epoch: 8 Global Step: 45810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:57:28,068-Speed 10509.69 samples/sec Loss 8.6300 LearningRate 0.2920 Epoch: 8 Global Step: 45820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:57:35,850-Speed 10529.91 samples/sec Loss 8.6699 LearningRate 0.2919 Epoch: 8 Global Step: 45830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:57:43,693-Speed 10446.25 samples/sec Loss 8.6498 LearningRate 0.2918 Epoch: 8 Global Step: 45840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:57:51,522-Speed 10463.66 
samples/sec Loss 8.6081 LearningRate 0.2917 Epoch: 8 Global Step: 45850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-01-16 00:57:59,347-Speed 10475.27 samples/sec Loss 8.6459 LearningRate 0.2916 Epoch: 8 Global Step: 45860 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:58:07,154-Speed 10500.17 samples/sec Loss 8.5707 LearningRate 0.2915 Epoch: 8 Global Step: 45870 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:58:14,977-Speed 10473.24 samples/sec Loss 8.6948 LearningRate 0.2914 Epoch: 8 Global Step: 45880 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:58:22,765-Speed 10520.46 samples/sec Loss 8.6399 LearningRate 0.2913 Epoch: 8 Global Step: 45890 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:58:30,569-Speed 10498.32 samples/sec Loss 8.6213 LearningRate 0.2912 Epoch: 8 Global Step: 45900 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:58:38,374-Speed 10496.42 samples/sec Loss 8.6553 LearningRate 0.2911 Epoch: 8 Global Step: 45910 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:58:46,167-Speed 10514.52 samples/sec Loss 8.6117 LearningRate 0.2910 Epoch: 8 Global Step: 45920 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:58:53,975-Speed 10493.38 samples/sec Loss 8.6055 LearningRate 0.2909 Epoch: 8 Global Step: 45930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:59:01,785-Speed 10489.53 samples/sec Loss 8.6768 LearningRate 0.2908 Epoch: 8 Global Step: 45940 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:59:09,607-Speed 10474.71 samples/sec Loss 8.6365 LearningRate 0.2907 Epoch: 8 Global Step: 45950 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:59:17,408-Speed 10502.08 samples/sec Loss 8.7095 LearningRate 0.2906 Epoch: 8 Global Step: 45960 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:59:25,239-Speed 10462.75 samples/sec Loss 8.6948 
LearningRate 0.2905 Epoch: 8 Global Step: 45970 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:59:33,072-Speed 10459.98 samples/sec Loss 8.6075 LearningRate 0.2904 Epoch: 8 Global Step: 45980 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 00:59:40,865-Speed 10512.98 samples/sec Loss 8.6016 LearningRate 0.2903 Epoch: 8 Global Step: 45990 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:59:48,681-Speed 10487.40 samples/sec Loss 8.6197 LearningRate 0.2902 Epoch: 8 Global Step: 46000 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 00:59:56,507-Speed 10468.53 samples/sec Loss 8.5776 LearningRate 0.2901 Epoch: 8 Global Step: 46010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:00:04,330-Speed 10473.54 samples/sec Loss 8.6915 LearningRate 0.2900 Epoch: 8 Global Step: 46020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:00:12,153-Speed 10473.52 samples/sec Loss 8.6442 LearningRate 0.2899 Epoch: 8 Global Step: 46030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:00:19,951-Speed 10506.21 samples/sec Loss 8.5897 LearningRate 0.2898 Epoch: 8 Global Step: 46040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:00:27,778-Speed 10467.63 samples/sec Loss 8.6499 LearningRate 0.2897 Epoch: 8 Global Step: 46050 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:00:35,605-Speed 10468.84 samples/sec Loss 8.5994 LearningRate 0.2896 Epoch: 8 Global Step: 46060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:00:43,389-Speed 10525.00 samples/sec Loss 8.6046 LearningRate 0.2895 Epoch: 8 Global Step: 46070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:00:51,186-Speed 10508.68 samples/sec Loss 8.5894 LearningRate 0.2894 Epoch: 8 Global Step: 46080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:00:58,981-Speed 10510.42 samples/sec Loss 8.5998 LearningRate 0.2893 Epoch: 8 
Global Step: 46090 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 01:01:06,769-Speed 10520.87 samples/sec Loss 8.6242 LearningRate 0.2892 Epoch: 8 Global Step: 46100 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 01:01:14,542-Speed 10542.29 samples/sec Loss 8.6282 LearningRate 0.2891 Epoch: 8 Global Step: 46110 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 01:01:22,331-Speed 10519.46 samples/sec Loss 8.5514 LearningRate 0.2890 Epoch: 8 Global Step: 46120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:01:30,139-Speed 10492.82 samples/sec Loss 8.7057 LearningRate 0.2888 Epoch: 8 Global Step: 46130 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:01:37,930-Speed 10516.76 samples/sec Loss 8.5885 LearningRate 0.2887 Epoch: 8 Global Step: 46140 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:01:45,715-Speed 10523.70 samples/sec Loss 8.5804 LearningRate 0.2886 Epoch: 8 Global Step: 46150 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:01:53,507-Speed 10514.23 samples/sec Loss 8.6329 LearningRate 0.2885 Epoch: 8 Global Step: 46160 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:02:01,302-Speed 10511.71 samples/sec Loss 8.6021 LearningRate 0.2884 Epoch: 8 Global Step: 46170 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:02:09,117-Speed 10483.14 samples/sec Loss 8.5864 LearningRate 0.2883 Epoch: 8 Global Step: 46180 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:02:16,922-Speed 10497.57 samples/sec Loss 8.6708 LearningRate 0.2882 Epoch: 8 Global Step: 46190 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:02:24,711-Speed 10518.90 samples/sec Loss 8.5372 LearningRate 0.2881 Epoch: 8 Global Step: 46200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:02:32,516-Speed 10498.89 samples/sec Loss 8.7088 LearningRate 0.2880 Epoch: 8 Global Step: 46210 Fp16 Grad 
Scale: 131072 Required: 13 hours Training: 2022-01-16 01:02:40,345-Speed 10464.74 samples/sec Loss 8.6457 LearningRate 0.2879 Epoch: 8 Global Step: 46220 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 01:02:48,130-Speed 10523.12 samples/sec Loss 8.5796 LearningRate 0.2878 Epoch: 8 Global Step: 46230 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 01:02:55,918-Speed 10521.05 samples/sec Loss 8.6382 LearningRate 0.2877 Epoch: 8 Global Step: 46240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:03:03,721-Speed 10500.35 samples/sec Loss 8.6339 LearningRate 0.2876 Epoch: 8 Global Step: 46250 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:03:11,523-Speed 10499.99 samples/sec Loss 8.6365 LearningRate 0.2875 Epoch: 8 Global Step: 46260 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:03:19,349-Speed 10469.30 samples/sec Loss 8.5173 LearningRate 0.2874 Epoch: 8 Global Step: 46270 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:03:27,164-Speed 10484.85 samples/sec Loss 8.5060 LearningRate 0.2873 Epoch: 8 Global Step: 46280 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:03:34,953-Speed 10518.53 samples/sec Loss 8.5849 LearningRate 0.2872 Epoch: 8 Global Step: 46290 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:03:42,747-Speed 10511.87 samples/sec Loss 8.6023 LearningRate 0.2871 Epoch: 8 Global Step: 46300 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:03:50,581-Speed 10458.48 samples/sec Loss 8.5808 LearningRate 0.2870 Epoch: 8 Global Step: 46310 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:03:58,387-Speed 10495.26 samples/sec Loss 8.5733 LearningRate 0.2869 Epoch: 8 Global Step: 46320 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:04:06,159-Speed 10542.54 samples/sec Loss 8.5762 LearningRate 0.2868 Epoch: 8 Global Step: 46330 Fp16 Grad Scale: 131072 Required: 13 
hours Training: 2022-01-16 01:04:13,939-Speed 10530.87 samples/sec Loss 8.5975 LearningRate 0.2867 Epoch: 8 Global Step: 46340 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 01:04:21,731-Speed 10514.54 samples/sec Loss 8.6368 LearningRate 0.2866 Epoch: 8 Global Step: 46350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:04:29,521-Speed 10517.64 samples/sec Loss 8.6373 LearningRate 0.2865 Epoch: 8 Global Step: 46360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:04:37,306-Speed 10524.72 samples/sec Loss 8.5775 LearningRate 0.2864 Epoch: 8 Global Step: 46370 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:04:45,092-Speed 10523.04 samples/sec Loss 8.5365 LearningRate 0.2863 Epoch: 8 Global Step: 46380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:04:52,901-Speed 10492.18 samples/sec Loss 8.5758 LearningRate 0.2862 Epoch: 8 Global Step: 46390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:05:00,682-Speed 10528.97 samples/sec Loss 8.4723 LearningRate 0.2861 Epoch: 8 Global Step: 46400 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:05:08,517-Speed 10456.88 samples/sec Loss 8.5577 LearningRate 0.2860 Epoch: 8 Global Step: 46410 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:05:16,336-Speed 10478.60 samples/sec Loss 8.6000 LearningRate 0.2859 Epoch: 8 Global Step: 46420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:05:24,136-Speed 10504.35 samples/sec Loss 8.5180 LearningRate 0.2858 Epoch: 8 Global Step: 46430 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:05:31,940-Speed 10499.53 samples/sec Loss 8.5356 LearningRate 0.2857 Epoch: 8 Global Step: 46440 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:05:39,733-Speed 10512.91 samples/sec Loss 8.5733 LearningRate 0.2856 Epoch: 8 Global Step: 46450 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 
01:05:47,529-Speed 10508.54 samples/sec Loss 8.5776 LearningRate 0.2855 Epoch: 8 Global Step: 46460 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:05:55,333-Speed 10499.43 samples/sec Loss 8.5352 LearningRate 0.2854 Epoch: 8 Global Step: 46470 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:06:03,152-Speed 10477.76 samples/sec Loss 8.5740 LearningRate 0.2853 Epoch: 8 Global Step: 46480 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:06:10,952-Speed 10504.93 samples/sec Loss 8.5883 LearningRate 0.2852 Epoch: 8 Global Step: 46490 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:06:18,756-Speed 10497.71 samples/sec Loss 8.6044 LearningRate 0.2851 Epoch: 8 Global Step: 46500 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:06:26,530-Speed 10539.84 samples/sec Loss 8.5138 LearningRate 0.2850 Epoch: 8 Global Step: 46510 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:06:34,310-Speed 10530.37 samples/sec Loss 8.5689 LearningRate 0.2849 Epoch: 8 Global Step: 46520 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:06:42,101-Speed 10516.76 samples/sec Loss 8.6413 LearningRate 0.2848 Epoch: 8 Global Step: 46530 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:06:49,909-Speed 10493.59 samples/sec Loss 8.5319 LearningRate 0.2847 Epoch: 8 Global Step: 46540 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:06:57,702-Speed 10512.26 samples/sec Loss 8.5290 LearningRate 0.2846 Epoch: 8 Global Step: 46550 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 01:07:05,492-Speed 10517.09 samples/sec Loss 8.4994 LearningRate 0.2845 Epoch: 8 Global Step: 46560 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 01:07:13,305-Speed 10486.72 samples/sec Loss 8.5712 LearningRate 0.2844 Epoch: 8 Global Step: 46570 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 01:07:21,089-Speed 10525.51 
samples/sec Loss 8.5636 LearningRate 0.2844 Epoch: 8 Global Step: 46580 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 01:07:28,887-Speed 10506.68 samples/sec Loss 8.5147 LearningRate 0.2843 Epoch: 8 Global Step: 46590 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 01:07:36,675-Speed 10519.79 samples/sec Loss 8.5406 LearningRate 0.2842 Epoch: 8 Global Step: 46600 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 01:07:44,467-Speed 10515.55 samples/sec Loss 8.5328 LearningRate 0.2841 Epoch: 8 Global Step: 46610 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 01:07:52,275-Speed 10493.15 samples/sec Loss 8.5496 LearningRate 0.2840 Epoch: 8 Global Step: 46620 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:08:00,083-Speed 10492.55 samples/sec Loss 8.5272 LearningRate 0.2839 Epoch: 8 Global Step: 46630 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:08:07,896-Speed 10487.45 samples/sec Loss 8.6432 LearningRate 0.2838 Epoch: 8 Global Step: 46640 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:08:15,701-Speed 10496.88 samples/sec Loss 8.5777 LearningRate 0.2837 Epoch: 8 Global Step: 46650 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:08:23,499-Speed 10507.48 samples/sec Loss 8.5631 LearningRate 0.2836 Epoch: 8 Global Step: 46660 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:08:46,366-Speed 3582.50 samples/sec Loss 8.5290 LearningRate 0.2835 Epoch: 9 Global Step: 46670 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:08:54,120-Speed 10566.48 samples/sec Loss 8.5241 LearningRate 0.2834 Epoch: 9 Global Step: 46680 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:09:01,916-Speed 10509.65 samples/sec Loss 8.5198 LearningRate 0.2833 Epoch: 9 Global Step: 46690 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:09:09,708-Speed 10514.37 samples/sec Loss 8.5253 
LearningRate 0.2832 Epoch: 9 Global Step: 46700 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:09:17,499-Speed 10515.52 samples/sec Loss 8.5235 LearningRate 0.2831 Epoch: 9 Global Step: 46710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:09:25,298-Speed 10505.70 samples/sec Loss 8.5280 LearningRate 0.2830 Epoch: 9 Global Step: 46720 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 01:09:33,103-Speed 10497.86 samples/sec Loss 8.5253 LearningRate 0.2829 Epoch: 9 Global Step: 46730 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 01:09:40,865-Speed 10554.07 samples/sec Loss 8.5771 LearningRate 0.2828 Epoch: 9 Global Step: 46740 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 01:09:48,664-Speed 10505.67 samples/sec Loss 8.5426 LearningRate 0.2827 Epoch: 9 Global Step: 46750 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 01:09:56,475-Speed 10490.09 samples/sec Loss 8.5394 LearningRate 0.2826 Epoch: 9 Global Step: 46760 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 01:10:04,266-Speed 10515.40 samples/sec Loss 8.4594 LearningRate 0.2825 Epoch: 9 Global Step: 46770 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-01-16 01:10:12,058-Speed 10515.01 samples/sec Loss 8.4466 LearningRate 0.2824 Epoch: 9 Global Step: 46780 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:10:19,858-Speed 10506.40 samples/sec Loss 8.5026 LearningRate 0.2823 Epoch: 9 Global Step: 46790 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:10:27,656-Speed 10507.38 samples/sec Loss 8.5061 LearningRate 0.2822 Epoch: 9 Global Step: 46800 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:10:35,457-Speed 10502.60 samples/sec Loss 8.4761 LearningRate 0.2821 Epoch: 9 Global Step: 46810 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:10:43,256-Speed 10505.30 samples/sec Loss 8.4912 LearningRate 0.2820 Epoch: 9 
Global Step: 46820 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-01-16 01:10:51,043-Speed 10521.28 samples/sec Loss 8.4246 LearningRate 0.2819 Epoch: 9 Global Step: 46830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:10:58,838-Speed 10511.92 samples/sec Loss 8.5136 LearningRate 0.2818 Epoch: 9 Global Step: 46840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:11:06,691-Speed 10431.83 samples/sec Loss 8.4984 LearningRate 0.2817 Epoch: 9 Global Step: 46850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:11:14,507-Speed 10483.06 samples/sec Loss 8.5289 LearningRate 0.2816 Epoch: 9 Global Step: 46860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:11:22,296-Speed 10519.14 samples/sec Loss 8.4792 LearningRate 0.2815 Epoch: 9 Global Step: 46870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:11:30,105-Speed 10491.13 samples/sec Loss 8.4835 LearningRate 0.2814 Epoch: 9 Global Step: 46880 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:11:37,897-Speed 10515.46 samples/sec Loss 8.4679 LearningRate 0.2813 Epoch: 9 Global Step: 46890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:11:45,727-Speed 10462.73 samples/sec Loss 8.4893 LearningRate 0.2812 Epoch: 9 Global Step: 46900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:11:53,506-Speed 10531.66 samples/sec Loss 8.4688 LearningRate 0.2811 Epoch: 9 Global Step: 46910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:12:01,288-Speed 10529.42 samples/sec Loss 8.5072 LearningRate 0.2810 Epoch: 9 Global Step: 46920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:12:09,091-Speed 10498.94 samples/sec Loss 8.5141 LearningRate 0.2809 Epoch: 9 Global Step: 46930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:12:16,872-Speed 10529.59 samples/sec Loss 8.5236 LearningRate 0.2808 Epoch: 9 Global Step: 46940 Fp16 Grad Scale: 
65536 Required: 12 hours Training: 2022-01-16 01:12:24,660-Speed 10520.15 samples/sec Loss 8.5304 LearningRate 0.2807 Epoch: 9 Global Step: 46950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:12:32,446-Speed 10522.86 samples/sec Loss 8.5329 LearningRate 0.2806 Epoch: 9 Global Step: 46960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:12:40,245-Speed 10505.07 samples/sec Loss 8.4182 LearningRate 0.2805 Epoch: 9 Global Step: 46970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:12:48,028-Speed 10527.02 samples/sec Loss 8.5120 LearningRate 0.2804 Epoch: 9 Global Step: 46980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:12:55,811-Speed 10528.42 samples/sec Loss 8.4625 LearningRate 0.2803 Epoch: 9 Global Step: 46990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:13:03,652-Speed 10448.26 samples/sec Loss 8.5790 LearningRate 0.2802 Epoch: 9 Global Step: 47000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:13:11,445-Speed 10513.21 samples/sec Loss 8.5485 LearningRate 0.2801 Epoch: 9 Global Step: 47010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:13:19,235-Speed 10518.19 samples/sec Loss 8.4775 LearningRate 0.2800 Epoch: 9 Global Step: 47020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:13:27,047-Speed 10489.77 samples/sec Loss 8.4822 LearningRate 0.2799 Epoch: 9 Global Step: 47030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:13:34,850-Speed 10500.10 samples/sec Loss 8.4981 LearningRate 0.2798 Epoch: 9 Global Step: 47040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:13:42,645-Speed 10509.51 samples/sec Loss 8.4881 LearningRate 0.2797 Epoch: 9 Global Step: 47050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:13:50,456-Speed 10489.57 samples/sec Loss 8.4540 LearningRate 0.2796 Epoch: 9 Global Step: 47060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 
2022-01-16 01:13:58,280-Speed 10473.21 samples/sec Loss 8.5184 LearningRate 0.2795 Epoch: 9 Global Step: 47070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:14:06,130-Speed 10435.82 samples/sec Loss 8.4891 LearningRate 0.2794 Epoch: 9 Global Step: 47080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:14:13,961-Speed 10462.75 samples/sec Loss 8.4792 LearningRate 0.2793 Epoch: 9 Global Step: 47090 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:14:21,776-Speed 10483.21 samples/sec Loss 8.4227 LearningRate 0.2792 Epoch: 9 Global Step: 47100 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:14:29,609-Speed 10460.59 samples/sec Loss 8.4732 LearningRate 0.2791 Epoch: 9 Global Step: 47110 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:14:37,432-Speed 10472.08 samples/sec Loss 8.4398 LearningRate 0.2790 Epoch: 9 Global Step: 47120 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:14:45,268-Speed 10455.94 samples/sec Loss 8.4664 LearningRate 0.2789 Epoch: 9 Global Step: 47130 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:14:53,098-Speed 10464.02 samples/sec Loss 8.5057 LearningRate 0.2788 Epoch: 9 Global Step: 47140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:15:00,942-Speed 10445.25 samples/sec Loss 8.4885 LearningRate 0.2787 Epoch: 9 Global Step: 47150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:15:08,810-Speed 10412.23 samples/sec Loss 8.5047 LearningRate 0.2786 Epoch: 9 Global Step: 47160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:15:16,666-Speed 10429.78 samples/sec Loss 8.4672 LearningRate 0.2785 Epoch: 9 Global Step: 47170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:15:24,507-Speed 10448.63 samples/sec Loss 8.4390 LearningRate 0.2784 Epoch: 9 Global Step: 47180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:15:32,335-Speed 
10467.40 samples/sec Loss 8.5087 LearningRate 0.2783 Epoch: 9 Global Step: 47190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:15:40,167-Speed 10460.75 samples/sec Loss 8.4060 LearningRate 0.2782 Epoch: 9 Global Step: 47200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:15:47,997-Speed 10463.08 samples/sec Loss 8.4555 LearningRate 0.2781 Epoch: 9 Global Step: 47210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:15:55,825-Speed 10466.21 samples/sec Loss 8.5167 LearningRate 0.2780 Epoch: 9 Global Step: 47220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:16:03,674-Speed 10439.23 samples/sec Loss 8.4597 LearningRate 0.2779 Epoch: 9 Global Step: 47230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:16:11,531-Speed 10427.00 samples/sec Loss 8.4517 LearningRate 0.2778 Epoch: 9 Global Step: 47240 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:16:19,362-Speed 10462.65 samples/sec Loss 8.4058 LearningRate 0.2777 Epoch: 9 Global Step: 47250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:16:27,180-Speed 10479.54 samples/sec Loss 8.4293 LearningRate 0.2776 Epoch: 9 Global Step: 47260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:16:35,001-Speed 10475.08 samples/sec Loss 8.4287 LearningRate 0.2775 Epoch: 9 Global Step: 47270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:16:42,864-Speed 10420.49 samples/sec Loss 8.4417 LearningRate 0.2774 Epoch: 9 Global Step: 47280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:16:50,694-Speed 10463.68 samples/sec Loss 8.3890 LearningRate 0.2773 Epoch: 9 Global Step: 47290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:16:58,517-Speed 10472.73 samples/sec Loss 8.4513 LearningRate 0.2772 Epoch: 9 Global Step: 47300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:17:06,351-Speed 10458.88 samples/sec Loss 8.4281 
LearningRate 0.2771 Epoch: 9 Global Step: 47310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:17:14,175-Speed 10471.60 samples/sec Loss 8.4566 LearningRate 0.2770 Epoch: 9 Global Step: 47320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:17:22,110-Speed 10324.27 samples/sec Loss 8.3915 LearningRate 0.2769 Epoch: 9 Global Step: 47330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:17:29,962-Speed 10439.22 samples/sec Loss 8.4972 LearningRate 0.2768 Epoch: 9 Global Step: 47340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:17:37,817-Speed 10430.07 samples/sec Loss 8.4295 LearningRate 0.2767 Epoch: 9 Global Step: 47350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:17:45,628-Speed 10489.17 samples/sec Loss 8.3650 LearningRate 0.2766 Epoch: 9 Global Step: 47360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:17:53,465-Speed 10454.23 samples/sec Loss 8.4141 LearningRate 0.2765 Epoch: 9 Global Step: 47370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:18:01,283-Speed 10481.81 samples/sec Loss 8.4396 LearningRate 0.2764 Epoch: 9 Global Step: 47380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:18:09,132-Speed 10439.96 samples/sec Loss 8.3756 LearningRate 0.2763 Epoch: 9 Global Step: 47390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:18:17,001-Speed 10412.08 samples/sec Loss 8.4396 LearningRate 0.2762 Epoch: 9 Global Step: 47400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:18:24,833-Speed 10461.29 samples/sec Loss 8.4160 LearningRate 0.2761 Epoch: 9 Global Step: 47410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:18:32,664-Speed 10462.56 samples/sec Loss 8.3846 LearningRate 0.2760 Epoch: 9 Global Step: 47420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:18:40,519-Speed 10430.29 samples/sec Loss 8.3977 LearningRate 0.2759 Epoch: 9 Global 
Step: 47430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:18:48,329-Speed 10491.63 samples/sec Loss 8.4027 LearningRate 0.2758 Epoch: 9 Global Step: 47440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:18:56,179-Speed 10436.40 samples/sec Loss 8.4477 LearningRate 0.2758 Epoch: 9 Global Step: 47450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:19:04,005-Speed 10469.99 samples/sec Loss 8.4227 LearningRate 0.2757 Epoch: 9 Global Step: 47460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:19:11,802-Speed 10507.47 samples/sec Loss 8.4479 LearningRate 0.2756 Epoch: 9 Global Step: 47470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:19:19,593-Speed 10516.56 samples/sec Loss 8.4323 LearningRate 0.2755 Epoch: 9 Global Step: 47480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:19:27,427-Speed 10457.61 samples/sec Loss 8.4266 LearningRate 0.2754 Epoch: 9 Global Step: 47490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:19:35,237-Speed 10492.01 samples/sec Loss 8.3772 LearningRate 0.2753 Epoch: 9 Global Step: 47500 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:19:43,045-Speed 10493.08 samples/sec Loss 8.4060 LearningRate 0.2752 Epoch: 9 Global Step: 47510 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:19:50,843-Speed 10506.48 samples/sec Loss 8.4164 LearningRate 0.2751 Epoch: 9 Global Step: 47520 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:19:58,628-Speed 10524.47 samples/sec Loss 8.3882 LearningRate 0.2750 Epoch: 9 Global Step: 47530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:20:06,432-Speed 10498.75 samples/sec Loss 8.5234 LearningRate 0.2749 Epoch: 9 Global Step: 47540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:20:14,262-Speed 10462.84 samples/sec Loss 8.4663 LearningRate 0.2748 Epoch: 9 Global Step: 47550 Fp16 Grad Scale: 
131072 Required: 12 hours Training: 2022-01-16 01:20:22,058-Speed 10509.86 samples/sec Loss 8.3944 LearningRate 0.2747 Epoch: 9 Global Step: 47560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:20:29,864-Speed 10495.85 samples/sec Loss 8.4140 LearningRate 0.2746 Epoch: 9 Global Step: 47570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:20:37,696-Speed 10461.17 samples/sec Loss 8.3692 LearningRate 0.2745 Epoch: 9 Global Step: 47580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:20:45,526-Speed 10462.79 samples/sec Loss 8.3766 LearningRate 0.2744 Epoch: 9 Global Step: 47590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:20:53,333-Speed 10494.99 samples/sec Loss 8.4659 LearningRate 0.2743 Epoch: 9 Global Step: 47600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:21:01,186-Speed 10432.74 samples/sec Loss 8.4580 LearningRate 0.2742 Epoch: 9 Global Step: 47610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:21:08,988-Speed 10501.94 samples/sec Loss 8.3850 LearningRate 0.2741 Epoch: 9 Global Step: 47620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:21:16,786-Speed 10507.61 samples/sec Loss 8.3362 LearningRate 0.2740 Epoch: 9 Global Step: 47630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:21:24,607-Speed 10474.69 samples/sec Loss 8.3475 LearningRate 0.2739 Epoch: 9 Global Step: 47640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:21:32,401-Speed 10512.53 samples/sec Loss 8.3851 LearningRate 0.2738 Epoch: 9 Global Step: 47650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:21:40,190-Speed 10521.31 samples/sec Loss 8.3556 LearningRate 0.2737 Epoch: 9 Global Step: 47660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:21:47,994-Speed 10502.43 samples/sec Loss 8.3316 LearningRate 0.2736 Epoch: 9 Global Step: 47670 Fp16 Grad Scale: 131072 Required: 12 hours 
Training: 2022-01-16 01:21:55,782-Speed 10520.41 samples/sec Loss 8.3484 LearningRate 0.2735 Epoch: 9 Global Step: 47680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:22:03,609-Speed 10468.26 samples/sec Loss 8.3494 LearningRate 0.2734 Epoch: 9 Global Step: 47690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:22:11,397-Speed 10519.42 samples/sec Loss 8.3521 LearningRate 0.2733 Epoch: 9 Global Step: 47700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:22:19,239-Speed 10447.44 samples/sec Loss 8.3505 LearningRate 0.2732 Epoch: 9 Global Step: 47710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:22:27,075-Speed 10456.81 samples/sec Loss 8.3856 LearningRate 0.2731 Epoch: 9 Global Step: 47720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:22:34,867-Speed 10514.85 samples/sec Loss 8.3413 LearningRate 0.2730 Epoch: 9 Global Step: 47730 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:22:42,645-Speed 10533.22 samples/sec Loss 8.3938 LearningRate 0.2729 Epoch: 9 Global Step: 47740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:22:50,428-Speed 10527.32 samples/sec Loss 8.3856 LearningRate 0.2728 Epoch: 9 Global Step: 47750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:22:58,223-Speed 10509.89 samples/sec Loss 8.3652 LearningRate 0.2727 Epoch: 9 Global Step: 47760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:23:06,028-Speed 10498.97 samples/sec Loss 8.3688 LearningRate 0.2726 Epoch: 9 Global Step: 47770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:23:13,802-Speed 10539.38 samples/sec Loss 8.3552 LearningRate 0.2725 Epoch: 9 Global Step: 47780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:23:21,582-Speed 10531.49 samples/sec Loss 8.3621 LearningRate 0.2724 Epoch: 9 Global Step: 47790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 
01:23:29,389-Speed 10494.39 samples/sec Loss 8.4133 LearningRate 0.2723 Epoch: 9 Global Step: 47800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:23:37,182-Speed 10514.39 samples/sec Loss 8.3416 LearningRate 0.2722 Epoch: 9 Global Step: 47810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:23:44,955-Speed 10540.10 samples/sec Loss 8.3639 LearningRate 0.2721 Epoch: 9 Global Step: 47820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:23:52,747-Speed 10515.21 samples/sec Loss 8.3476 LearningRate 0.2720 Epoch: 9 Global Step: 47830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:24:00,561-Speed 10485.95 samples/sec Loss 8.3378 LearningRate 0.2719 Epoch: 9 Global Step: 47840 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:24:08,349-Speed 10518.56 samples/sec Loss 8.4363 LearningRate 0.2718 Epoch: 9 Global Step: 47850 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:24:16,171-Speed 10474.23 samples/sec Loss 8.4001 LearningRate 0.2717 Epoch: 9 Global Step: 47860 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:24:23,967-Speed 10510.29 samples/sec Loss 8.3648 LearningRate 0.2716 Epoch: 9 Global Step: 47870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:24:31,764-Speed 10508.14 samples/sec Loss 8.3420 LearningRate 0.2715 Epoch: 9 Global Step: 47880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:24:39,571-Speed 10494.35 samples/sec Loss 8.3486 LearningRate 0.2715 Epoch: 9 Global Step: 47890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:24:47,356-Speed 10524.05 samples/sec Loss 8.3421 LearningRate 0.2714 Epoch: 9 Global Step: 47900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:24:55,171-Speed 10484.22 samples/sec Loss 8.3464 LearningRate 0.2713 Epoch: 9 Global Step: 47910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:25:02,966-Speed 10511.16 
samples/sec Loss 8.3826 LearningRate 0.2712 Epoch: 9 Global Step: 47920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:25:10,766-Speed 10504.35 samples/sec Loss 8.3964 LearningRate 0.2711 Epoch: 9 Global Step: 47930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:25:18,588-Speed 10474.04 samples/sec Loss 8.3335 LearningRate 0.2710 Epoch: 9 Global Step: 47940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:25:26,415-Speed 10466.89 samples/sec Loss 8.3409 LearningRate 0.2709 Epoch: 9 Global Step: 47950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:25:34,213-Speed 10507.40 samples/sec Loss 8.4193 LearningRate 0.2708 Epoch: 9 Global Step: 47960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:25:42,020-Speed 10493.87 samples/sec Loss 8.3378 LearningRate 0.2707 Epoch: 9 Global Step: 47970 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:25:49,824-Speed 10499.54 samples/sec Loss 8.2920 LearningRate 0.2706 Epoch: 9 Global Step: 47980 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:25:57,668-Speed 10444.91 samples/sec Loss 8.3425 LearningRate 0.2705 Epoch: 9 Global Step: 47990 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:26:05,477-Speed 10491.53 samples/sec Loss 8.3647 LearningRate 0.2704 Epoch: 9 Global Step: 48000 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:26:13,268-Speed 10515.81 samples/sec Loss 8.3385 LearningRate 0.2703 Epoch: 9 Global Step: 48010 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:26:21,055-Speed 10521.87 samples/sec Loss 8.3427 LearningRate 0.2702 Epoch: 9 Global Step: 48020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:26:28,855-Speed 10504.52 samples/sec Loss 8.3236 LearningRate 0.2701 Epoch: 9 Global Step: 48030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:26:36,694-Speed 10450.72 samples/sec Loss 8.2692 
LearningRate 0.2700 Epoch: 9 Global Step: 48040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:26:44,523-Speed 10465.93 samples/sec Loss 8.3595 LearningRate 0.2699 Epoch: 9 Global Step: 48050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:26:52,387-Speed 10418.48 samples/sec Loss 8.3367 LearningRate 0.2698 Epoch: 9 Global Step: 48060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:27:00,190-Speed 10500.94 samples/sec Loss 8.3472 LearningRate 0.2697 Epoch: 9 Global Step: 48070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:27:08,003-Speed 10486.45 samples/sec Loss 8.3022 LearningRate 0.2696 Epoch: 9 Global Step: 48080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:27:15,805-Speed 10501.03 samples/sec Loss 8.3720 LearningRate 0.2695 Epoch: 9 Global Step: 48090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:27:23,580-Speed 10537.25 samples/sec Loss 8.3088 LearningRate 0.2694 Epoch: 9 Global Step: 48100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:27:31,382-Speed 10501.47 samples/sec Loss 8.3594 LearningRate 0.2693 Epoch: 9 Global Step: 48110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:27:39,168-Speed 10523.24 samples/sec Loss 8.3009 LearningRate 0.2692 Epoch: 9 Global Step: 48120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:27:46,971-Speed 10499.83 samples/sec Loss 8.3381 LearningRate 0.2691 Epoch: 9 Global Step: 48130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:27:54,773-Speed 10501.07 samples/sec Loss 8.3079 LearningRate 0.2690 Epoch: 9 Global Step: 48140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:28:02,567-Speed 10511.77 samples/sec Loss 8.3203 LearningRate 0.2689 Epoch: 9 Global Step: 48150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:28:10,370-Speed 10499.73 samples/sec Loss 8.3892 LearningRate 0.2688 Epoch: 9 
Global Step: 48160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:28:18,187-Speed 10481.37 samples/sec Loss 8.3200 LearningRate 0.2687 Epoch: 9 Global Step: 48170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:28:25,986-Speed 10505.57 samples/sec Loss 8.2859 LearningRate 0.2686 Epoch: 9 Global Step: 48180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:28:33,802-Speed 10482.12 samples/sec Loss 8.3415 LearningRate 0.2685 Epoch: 9 Global Step: 48190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:28:41,608-Speed 10495.65 samples/sec Loss 8.2717 LearningRate 0.2684 Epoch: 9 Global Step: 48200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:28:49,399-Speed 10516.20 samples/sec Loss 8.3459 LearningRate 0.2683 Epoch: 9 Global Step: 48210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:28:57,246-Speed 10440.92 samples/sec Loss 8.2991 LearningRate 0.2683 Epoch: 9 Global Step: 48220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:29:05,064-Speed 10480.06 samples/sec Loss 8.2946 LearningRate 0.2682 Epoch: 9 Global Step: 48230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:29:12,898-Speed 10458.97 samples/sec Loss 8.2262 LearningRate 0.2681 Epoch: 9 Global Step: 48240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:29:20,737-Speed 10450.69 samples/sec Loss 8.3229 LearningRate 0.2680 Epoch: 9 Global Step: 48250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:29:28,606-Speed 10415.75 samples/sec Loss 8.3445 LearningRate 0.2679 Epoch: 9 Global Step: 48260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:29:36,427-Speed 10476.65 samples/sec Loss 8.2887 LearningRate 0.2678 Epoch: 9 Global Step: 48270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:29:44,246-Speed 10479.39 samples/sec Loss 8.2385 LearningRate 0.2677 Epoch: 9 Global Step: 48280 Fp16 Grad Scale: 
131072 Required: 12 hours Training: 2022-01-16 01:29:52,080-Speed 10457.33 samples/sec Loss 8.2988 LearningRate 0.2676 Epoch: 9 Global Step: 48290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:29:59,882-Speed 10501.57 samples/sec Loss 8.2758 LearningRate 0.2675 Epoch: 9 Global Step: 48300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:30:07,709-Speed 10468.05 samples/sec Loss 8.2798 LearningRate 0.2674 Epoch: 9 Global Step: 48310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:30:15,506-Speed 10507.73 samples/sec Loss 8.3251 LearningRate 0.2673 Epoch: 9 Global Step: 48320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:30:23,316-Speed 10491.72 samples/sec Loss 8.2959 LearningRate 0.2672 Epoch: 9 Global Step: 48330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:30:31,103-Speed 10521.93 samples/sec Loss 8.2925 LearningRate 0.2671 Epoch: 9 Global Step: 48340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:30:38,886-Speed 10526.31 samples/sec Loss 8.3073 LearningRate 0.2670 Epoch: 9 Global Step: 48350 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:30:46,672-Speed 10523.54 samples/sec Loss 8.2378 LearningRate 0.2669 Epoch: 9 Global Step: 48360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:30:54,463-Speed 10516.59 samples/sec Loss 8.2677 LearningRate 0.2668 Epoch: 9 Global Step: 48370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:31:02,301-Speed 10453.13 samples/sec Loss 8.2248 LearningRate 0.2667 Epoch: 9 Global Step: 48380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:31:10,089-Speed 10519.09 samples/sec Loss 8.2852 LearningRate 0.2666 Epoch: 9 Global Step: 48390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:31:17,878-Speed 10520.74 samples/sec Loss 8.3503 LearningRate 0.2665 Epoch: 9 Global Step: 48400 Fp16 Grad Scale: 131072 Required: 12 hours 
Training: 2022-01-16 01:31:25,667-Speed 10517.74 samples/sec Loss 8.2970 LearningRate 0.2664 Epoch: 9 Global Step: 48410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:31:33,482-Speed 10484.16 samples/sec Loss 8.2115 LearningRate 0.2663 Epoch: 9 Global Step: 48420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:31:41,295-Speed 10487.15 samples/sec Loss 8.2826 LearningRate 0.2662 Epoch: 9 Global Step: 48430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:31:49,120-Speed 10470.13 samples/sec Loss 8.3098 LearningRate 0.2661 Epoch: 9 Global Step: 48440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:31:56,900-Speed 10530.69 samples/sec Loss 8.2461 LearningRate 0.2660 Epoch: 9 Global Step: 48450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:32:04,679-Speed 10532.79 samples/sec Loss 8.2798 LearningRate 0.2659 Epoch: 9 Global Step: 48460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:32:12,489-Speed 10490.41 samples/sec Loss 8.2959 LearningRate 0.2658 Epoch: 9 Global Step: 48470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:32:20,281-Speed 10515.26 samples/sec Loss 8.2761 LearningRate 0.2657 Epoch: 9 Global Step: 48480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:32:28,115-Speed 10457.80 samples/sec Loss 8.2590 LearningRate 0.2656 Epoch: 9 Global Step: 48490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:32:35,967-Speed 10433.59 samples/sec Loss 8.2444 LearningRate 0.2655 Epoch: 9 Global Step: 48500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:32:43,791-Speed 10472.65 samples/sec Loss 8.2891 LearningRate 0.2655 Epoch: 9 Global Step: 48510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:32:51,585-Speed 10511.75 samples/sec Loss 8.2636 LearningRate 0.2654 Epoch: 9 Global Step: 48520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 
01:32:59,418-Speed 10459.63 samples/sec Loss 8.2755 LearningRate 0.2653 Epoch: 9 Global Step: 48530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:33:07,253-Speed 10456.75 samples/sec Loss 8.2362 LearningRate 0.2652 Epoch: 9 Global Step: 48540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:33:15,043-Speed 10518.79 samples/sec Loss 8.3559 LearningRate 0.2651 Epoch: 9 Global Step: 48550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:33:22,844-Speed 10502.32 samples/sec Loss 8.2396 LearningRate 0.2650 Epoch: 9 Global Step: 48560 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:33:30,649-Speed 10496.47 samples/sec Loss 8.2564 LearningRate 0.2649 Epoch: 9 Global Step: 48570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:33:38,476-Speed 10468.57 samples/sec Loss 8.1766 LearningRate 0.2648 Epoch: 9 Global Step: 48580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:33:46,298-Speed 10475.10 samples/sec Loss 8.2009 LearningRate 0.2647 Epoch: 9 Global Step: 48590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:33:54,135-Speed 10456.37 samples/sec Loss 8.2304 LearningRate 0.2646 Epoch: 9 Global Step: 48600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:34:01,940-Speed 10497.52 samples/sec Loss 8.2881 LearningRate 0.2645 Epoch: 9 Global Step: 48610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:34:09,780-Speed 10450.76 samples/sec Loss 8.2581 LearningRate 0.2644 Epoch: 9 Global Step: 48620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:34:17,584-Speed 10498.33 samples/sec Loss 8.2664 LearningRate 0.2643 Epoch: 9 Global Step: 48630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:34:25,403-Speed 10478.59 samples/sec Loss 8.2363 LearningRate 0.2642 Epoch: 9 Global Step: 48640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:34:33,213-Speed 10491.47 
samples/sec Loss 8.2534 LearningRate 0.2641 Epoch: 9 Global Step: 48650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:34:41,015-Speed 10500.92 samples/sec Loss 8.2028 LearningRate 0.2640 Epoch: 9 Global Step: 48660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:34:48,822-Speed 10495.50 samples/sec Loss 8.2190 LearningRate 0.2639 Epoch: 9 Global Step: 48670 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:34:56,617-Speed 10509.99 samples/sec Loss 8.2401 LearningRate 0.2638 Epoch: 9 Global Step: 48680 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:35:04,404-Speed 10521.43 samples/sec Loss 8.2239 LearningRate 0.2637 Epoch: 9 Global Step: 48690 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:35:12,201-Speed 10508.72 samples/sec Loss 8.2903 LearningRate 0.2636 Epoch: 9 Global Step: 48700 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:35:19,977-Speed 10535.56 samples/sec Loss 8.2085 LearningRate 0.2635 Epoch: 9 Global Step: 48710 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:35:27,777-Speed 10504.28 samples/sec Loss 8.2306 LearningRate 0.2634 Epoch: 9 Global Step: 48720 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:35:35,569-Speed 10515.35 samples/sec Loss 8.2068 LearningRate 0.2633 Epoch: 9 Global Step: 48730 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:35:43,364-Speed 10511.34 samples/sec Loss 8.2827 LearningRate 0.2632 Epoch: 9 Global Step: 48740 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:35:51,202-Speed 10452.46 samples/sec Loss 8.2146 LearningRate 0.2631 Epoch: 9 Global Step: 48750 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:35:59,007-Speed 10496.74 samples/sec Loss 8.2635 LearningRate 0.2631 Epoch: 9 Global Step: 48760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:36:06,791-Speed 10525.33 samples/sec Loss 8.2158 LearningRate 
0.2630 Epoch: 9 Global Step: 48770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:36:14,594-Speed 10499.93 samples/sec Loss 8.2113 LearningRate 0.2629 Epoch: 9 Global Step: 48780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:36:22,403-Speed 10491.88 samples/sec Loss 8.1967 LearningRate 0.2628 Epoch: 9 Global Step: 48790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:36:30,197-Speed 10512.36 samples/sec Loss 8.2446 LearningRate 0.2627 Epoch: 9 Global Step: 48800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:36:38,008-Speed 10488.63 samples/sec Loss 8.1887 LearningRate 0.2626 Epoch: 9 Global Step: 48810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:36:45,824-Speed 10482.85 samples/sec Loss 8.2383 LearningRate 0.2625 Epoch: 9 Global Step: 48820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:36:53,632-Speed 10494.33 samples/sec Loss 8.1653 LearningRate 0.2624 Epoch: 9 Global Step: 48830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:37:01,523-Speed 10383.23 samples/sec Loss 8.1907 LearningRate 0.2623 Epoch: 9 Global Step: 48840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:37:09,307-Speed 10526.43 samples/sec Loss 8.2269 LearningRate 0.2622 Epoch: 9 Global Step: 48850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:37:17,106-Speed 10505.16 samples/sec Loss 8.2167 LearningRate 0.2621 Epoch: 9 Global Step: 48860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:37:24,898-Speed 10514.22 samples/sec Loss 8.2619 LearningRate 0.2620 Epoch: 9 Global Step: 48870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:37:32,714-Speed 10481.89 samples/sec Loss 8.2017 LearningRate 0.2619 Epoch: 9 Global Step: 48880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:37:40,522-Speed 10494.03 samples/sec Loss 8.2572 LearningRate 0.2618 Epoch: 9 Global Step: 48890 
Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:37:48,326-Speed 10498.22 samples/sec Loss 8.2298 LearningRate 0.2617 Epoch: 9 Global Step: 48900 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:37:56,132-Speed 10496.38 samples/sec Loss 8.2493 LearningRate 0.2616 Epoch: 9 Global Step: 48910 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:38:03,927-Speed 10511.07 samples/sec Loss 8.2236 LearningRate 0.2615 Epoch: 9 Global Step: 48920 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:38:11,725-Speed 10506.32 samples/sec Loss 8.1825 LearningRate 0.2614 Epoch: 9 Global Step: 48930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:38:19,542-Speed 10482.03 samples/sec Loss 8.2216 LearningRate 0.2613 Epoch: 9 Global Step: 48940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:38:27,348-Speed 10494.76 samples/sec Loss 8.2390 LearningRate 0.2612 Epoch: 9 Global Step: 48950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:38:35,124-Speed 10535.36 samples/sec Loss 8.2215 LearningRate 0.2611 Epoch: 9 Global Step: 48960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:38:42,927-Speed 10501.85 samples/sec Loss 8.1630 LearningRate 0.2610 Epoch: 9 Global Step: 48970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:38:50,722-Speed 10510.92 samples/sec Loss 8.2337 LearningRate 0.2609 Epoch: 9 Global Step: 48980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:38:58,526-Speed 10497.51 samples/sec Loss 8.1654 LearningRate 0.2609 Epoch: 9 Global Step: 48990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:39:06,338-Speed 10487.61 samples/sec Loss 8.1927 LearningRate 0.2608 Epoch: 9 Global Step: 49000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:39:14,150-Speed 10491.42 samples/sec Loss 8.1288 LearningRate 0.2607 Epoch: 9 Global Step: 49010 Fp16 Grad Scale: 131072 
Required: 12 hours Training: 2022-01-16 01:39:21,941-Speed 10516.07 samples/sec Loss 8.1491 LearningRate 0.2606 Epoch: 9 Global Step: 49020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:39:29,725-Speed 10525.81 samples/sec Loss 8.1813 LearningRate 0.2605 Epoch: 9 Global Step: 49030 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:39:37,506-Speed 10529.19 samples/sec Loss 8.1537 LearningRate 0.2604 Epoch: 9 Global Step: 49040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:39:45,255-Speed 10572.95 samples/sec Loss 8.1719 LearningRate 0.2603 Epoch: 9 Global Step: 49050 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-16 01:39:53,028-Speed 10540.42 samples/sec Loss 8.2324 LearningRate 0.2602 Epoch: 9 Global Step: 49060 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-16 01:40:00,824-Speed 10509.81 samples/sec Loss 8.2225 LearningRate 0.2601 Epoch: 9 Global Step: 49070 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-16 01:40:08,611-Speed 10520.60 samples/sec Loss 8.1839 LearningRate 0.2600 Epoch: 9 Global Step: 49080 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-16 01:40:16,400-Speed 10519.94 samples/sec Loss 8.1816 LearningRate 0.2599 Epoch: 9 Global Step: 49090 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-16 01:40:24,197-Speed 10508.70 samples/sec Loss 8.1854 LearningRate 0.2598 Epoch: 9 Global Step: 49100 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-16 01:40:31,966-Speed 10544.65 samples/sec Loss 8.1935 LearningRate 0.2597 Epoch: 9 Global Step: 49110 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-16 01:40:39,778-Speed 10488.20 samples/sec Loss 8.2310 LearningRate 0.2596 Epoch: 9 Global Step: 49120 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-16 01:40:47,596-Speed 10479.63 samples/sec Loss 8.1784 LearningRate 0.2595 Epoch: 9 Global Step: 49130 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-16 
01:40:55,413-Speed 10480.49 samples/sec Loss 8.1964 LearningRate 0.2594 Epoch: 9 Global Step: 49140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-01-16 01:41:03,206-Speed 10513.96 samples/sec Loss 8.2547 LearningRate 0.2593 Epoch: 9 Global Step: 49150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:41:11,003-Speed 10508.38 samples/sec Loss 8.1184 LearningRate 0.2592 Epoch: 9 Global Step: 49160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:41:18,807-Speed 10498.84 samples/sec Loss 8.2089 LearningRate 0.2591 Epoch: 9 Global Step: 49170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:41:26,598-Speed 10516.49 samples/sec Loss 8.1916 LearningRate 0.2590 Epoch: 9 Global Step: 49180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:41:34,376-Speed 10534.95 samples/sec Loss 8.1940 LearningRate 0.2589 Epoch: 9 Global Step: 49190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:41:42,165-Speed 10519.08 samples/sec Loss 8.1753 LearningRate 0.2589 Epoch: 9 Global Step: 49200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:41:49,947-Speed 10527.97 samples/sec Loss 8.1210 LearningRate 0.2588 Epoch: 9 Global Step: 49210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:41:57,757-Speed 10491.29 samples/sec Loss 8.1516 LearningRate 0.2587 Epoch: 9 Global Step: 49220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:42:05,584-Speed 10467.97 samples/sec Loss 8.1230 LearningRate 0.2586 Epoch: 9 Global Step: 49230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:42:13,391-Speed 10494.89 samples/sec Loss 8.2223 LearningRate 0.2585 Epoch: 9 Global Step: 49240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:42:21,219-Speed 10469.37 samples/sec Loss 8.2610 LearningRate 0.2584 Epoch: 9 Global Step: 49250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:42:29,007-Speed 10521.09 samples/sec 
Loss 8.1560 LearningRate 0.2583 Epoch: 9 Global Step: 49260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:42:36,799-Speed 10514.23 samples/sec Loss 8.1766 LearningRate 0.2582 Epoch: 9 Global Step: 49270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:42:44,588-Speed 10518.78 samples/sec Loss 8.1588 LearningRate 0.2581 Epoch: 9 Global Step: 49280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:42:52,411-Speed 10472.87 samples/sec Loss 8.1640 LearningRate 0.2580 Epoch: 9 Global Step: 49290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:43:00,217-Speed 10496.65 samples/sec Loss 8.1519 LearningRate 0.2579 Epoch: 9 Global Step: 49300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:43:08,026-Speed 10493.08 samples/sec Loss 8.1363 LearningRate 0.2578 Epoch: 9 Global Step: 49310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:43:15,793-Speed 10547.69 samples/sec Loss 8.1126 LearningRate 0.2577 Epoch: 9 Global Step: 49320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:43:23,562-Speed 10546.58 samples/sec Loss 8.1113 LearningRate 0.2576 Epoch: 9 Global Step: 49330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:43:31,342-Speed 10530.78 samples/sec Loss 8.1658 LearningRate 0.2575 Epoch: 9 Global Step: 49340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:43:39,144-Speed 10502.51 samples/sec Loss 8.1806 LearningRate 0.2574 Epoch: 9 Global Step: 49350 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:43:46,937-Speed 10512.64 samples/sec Loss 8.1880 LearningRate 0.2573 Epoch: 9 Global Step: 49360 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:43:54,726-Speed 10518.30 samples/sec Loss 8.1914 LearningRate 0.2572 Epoch: 9 Global Step: 49370 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:44:02,521-Speed 10510.51 samples/sec Loss 8.1988 LearningRate 0.2571 
Epoch: 9 Global Step: 49380 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:44:10,290-Speed 10546.59 samples/sec Loss 8.2046 LearningRate 0.2571 Epoch: 9 Global Step: 49390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:44:18,075-Speed 10523.50 samples/sec Loss 8.1463 LearningRate 0.2570 Epoch: 9 Global Step: 49400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:44:25,913-Speed 10453.74 samples/sec Loss 8.1896 LearningRate 0.2569 Epoch: 9 Global Step: 49410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:44:33,703-Speed 10517.73 samples/sec Loss 8.1453 LearningRate 0.2568 Epoch: 9 Global Step: 49420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:44:41,499-Speed 10508.63 samples/sec Loss 8.1125 LearningRate 0.2567 Epoch: 9 Global Step: 49430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:44:49,313-Speed 10487.41 samples/sec Loss 8.0966 LearningRate 0.2566 Epoch: 9 Global Step: 49440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:44:57,109-Speed 10513.02 samples/sec Loss 8.1277 LearningRate 0.2565 Epoch: 9 Global Step: 49450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:45:04,893-Speed 10524.88 samples/sec Loss 8.1214 LearningRate 0.2564 Epoch: 9 Global Step: 49460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:45:12,681-Speed 10521.15 samples/sec Loss 8.1090 LearningRate 0.2563 Epoch: 9 Global Step: 49470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:45:20,506-Speed 10470.58 samples/sec Loss 8.1085 LearningRate 0.2562 Epoch: 9 Global Step: 49480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:45:28,292-Speed 10522.88 samples/sec Loss 8.1330 LearningRate 0.2561 Epoch: 9 Global Step: 49490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:45:36,078-Speed 10522.95 samples/sec Loss 8.1055 LearningRate 0.2560 Epoch: 9 Global Step: 49500 
Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:45:43,875-Speed 10509.92 samples/sec Loss 8.1083 LearningRate 0.2559 Epoch: 9 Global Step: 49510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:45:51,671-Speed 10509.42 samples/sec Loss 8.1503 LearningRate 0.2558 Epoch: 9 Global Step: 49520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:45:59,478-Speed 10494.17 samples/sec Loss 8.1443 LearningRate 0.2557 Epoch: 9 Global Step: 49530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:46:07,278-Speed 10504.69 samples/sec Loss 8.1085 LearningRate 0.2556 Epoch: 9 Global Step: 49540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:46:15,060-Speed 10528.08 samples/sec Loss 8.2522 LearningRate 0.2555 Epoch: 9 Global Step: 49550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:46:22,849-Speed 10519.62 samples/sec Loss 8.1286 LearningRate 0.2554 Epoch: 9 Global Step: 49560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:46:30,650-Speed 10502.21 samples/sec Loss 8.1214 LearningRate 0.2554 Epoch: 9 Global Step: 49570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:46:38,451-Speed 10503.19 samples/sec Loss 8.0887 LearningRate 0.2553 Epoch: 9 Global Step: 49580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:46:46,244-Speed 10513.35 samples/sec Loss 8.0842 LearningRate 0.2552 Epoch: 9 Global Step: 49590 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:46:54,048-Speed 10498.05 samples/sec Loss 8.1040 LearningRate 0.2551 Epoch: 9 Global Step: 49600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:47:01,857-Speed 10492.14 samples/sec Loss 8.1288 LearningRate 0.2550 Epoch: 9 Global Step: 49610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:47:09,637-Speed 10530.22 samples/sec Loss 8.0751 LearningRate 0.2549 Epoch: 9 Global Step: 49620 Fp16 Grad Scale: 131072 
Required: 12 hours Training: 2022-01-16 01:47:17,412-Speed 10537.97 samples/sec Loss 8.0792 LearningRate 0.2548 Epoch: 9 Global Step: 49630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:47:25,198-Speed 10523.31 samples/sec Loss 8.0998 LearningRate 0.2547 Epoch: 9 Global Step: 49640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:47:32,976-Speed 10533.27 samples/sec Loss 8.1011 LearningRate 0.2546 Epoch: 9 Global Step: 49650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:47:40,754-Speed 10534.05 samples/sec Loss 8.0824 LearningRate 0.2545 Epoch: 9 Global Step: 49660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:47:48,561-Speed 10494.52 samples/sec Loss 8.0736 LearningRate 0.2544 Epoch: 9 Global Step: 49670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:47:56,340-Speed 10532.02 samples/sec Loss 8.1109 LearningRate 0.2543 Epoch: 9 Global Step: 49680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:48:04,165-Speed 10470.13 samples/sec Loss 8.1207 LearningRate 0.2542 Epoch: 9 Global Step: 49690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:48:11,967-Speed 10501.41 samples/sec Loss 8.0570 LearningRate 0.2541 Epoch: 9 Global Step: 49700 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:48:19,784-Speed 10482.41 samples/sec Loss 8.1159 LearningRate 0.2540 Epoch: 9 Global Step: 49710 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:48:27,592-Speed 10493.55 samples/sec Loss 8.0986 LearningRate 0.2539 Epoch: 9 Global Step: 49720 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:48:35,379-Speed 10521.56 samples/sec Loss 8.0707 LearningRate 0.2538 Epoch: 9 Global Step: 49730 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:48:43,177-Speed 10509.15 samples/sec Loss 8.1303 LearningRate 0.2537 Epoch: 9 Global Step: 49740 Fp16 Grad Scale: 262144 Required: 12 hours Training: 
2022-01-16 01:48:50,980-Speed 10501.09 samples/sec Loss 8.0953 LearningRate 0.2537 Epoch: 9 Global Step: 49750 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:48:58,791-Speed 10490.35 samples/sec Loss 8.0710 LearningRate 0.2536 Epoch: 9 Global Step: 49760 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:49:06,584-Speed 10512.72 samples/sec Loss 8.1365 LearningRate 0.2535 Epoch: 9 Global Step: 49770 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:49:14,388-Speed 10498.23 samples/sec Loss 8.1220 LearningRate 0.2534 Epoch: 9 Global Step: 49780 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:49:22,215-Speed 10469.33 samples/sec Loss 8.0911 LearningRate 0.2533 Epoch: 9 Global Step: 49790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:49:30,015-Speed 10504.87 samples/sec Loss 8.1074 LearningRate 0.2532 Epoch: 9 Global Step: 49800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:49:37,809-Speed 10511.49 samples/sec Loss 8.0094 LearningRate 0.2531 Epoch: 9 Global Step: 49810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:49:45,618-Speed 10491.55 samples/sec Loss 8.0450 LearningRate 0.2530 Epoch: 9 Global Step: 49820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:49:53,422-Speed 10498.55 samples/sec Loss 8.0683 LearningRate 0.2529 Epoch: 9 Global Step: 49830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:50:01,210-Speed 10520.48 samples/sec Loss 8.0624 LearningRate 0.2528 Epoch: 9 Global Step: 49840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:50:08,989-Speed 10532.62 samples/sec Loss 8.1103 LearningRate 0.2527 Epoch: 9 Global Step: 49850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:50:16,781-Speed 10514.31 samples/sec Loss 8.0612 LearningRate 0.2526 Epoch: 9 Global Step: 49860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:50:24,575-Speed 10512.72 
samples/sec Loss 8.0897 LearningRate 0.2525 Epoch: 9 Global Step: 49870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:50:32,379-Speed 10498.29 samples/sec Loss 8.1327 LearningRate 0.2524 Epoch: 9 Global Step: 49880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:50:40,158-Speed 10532.34 samples/sec Loss 8.0759 LearningRate 0.2523 Epoch: 9 Global Step: 49890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:50:47,944-Speed 10523.71 samples/sec Loss 8.0619 LearningRate 0.2522 Epoch: 9 Global Step: 49900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:50:55,751-Speed 10494.28 samples/sec Loss 8.0850 LearningRate 0.2522 Epoch: 9 Global Step: 49910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:51:03,530-Speed 10531.62 samples/sec Loss 8.0915 LearningRate 0.2521 Epoch: 9 Global Step: 49920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:51:11,334-Speed 10499.23 samples/sec Loss 8.0329 LearningRate 0.2520 Epoch: 9 Global Step: 49930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:51:19,111-Speed 10534.84 samples/sec Loss 8.0745 LearningRate 0.2519 Epoch: 9 Global Step: 49940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:51:26,914-Speed 10499.82 samples/sec Loss 8.0757 LearningRate 0.2518 Epoch: 9 Global Step: 49950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:51:34,701-Speed 10521.31 samples/sec Loss 8.1231 LearningRate 0.2517 Epoch: 9 Global Step: 49960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:51:42,506-Speed 10497.04 samples/sec Loss 8.0123 LearningRate 0.2516 Epoch: 9 Global Step: 49970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:51:50,298-Speed 10515.42 samples/sec Loss 8.0001 LearningRate 0.2515 Epoch: 9 Global Step: 49980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:51:58,104-Speed 10496.43 samples/sec Loss 8.0594 
LearningRate 0.2514 Epoch: 9 Global Step: 49990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:52:05,923-Speed 10478.31 samples/sec Loss 8.0848 LearningRate 0.2513 Epoch: 9 Global Step: 50000 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:52:33,309-[lfw][50000]XNorm: 21.813953 Training: 2022-01-16 01:52:33,309-[lfw][50000]Accuracy-Flip: 0.99783+-0.00211 Training: 2022-01-16 01:52:33,310-[lfw][50000]Accuracy-Highest: 0.99783 Training: 2022-01-16 01:53:05,107-[cfp_fp][50000]XNorm: 18.823709 Training: 2022-01-16 01:53:05,108-[cfp_fp][50000]Accuracy-Flip: 0.97886+-0.00625 Training: 2022-01-16 01:53:05,109-[cfp_fp][50000]Accuracy-Highest: 0.97886 Training: 2022-01-16 01:53:33,424-[agedb_30][50000]XNorm: 21.084104 Training: 2022-01-16 01:53:33,424-[agedb_30][50000]Accuracy-Flip: 0.96667+-0.00972 Training: 2022-01-16 01:53:33,425-[agedb_30][50000]Accuracy-Highest: 0.96667 Training: 2022-01-16 01:53:41,229-Speed 859.57 samples/sec Loss 8.0596 LearningRate 0.2512 Epoch: 9 Global Step: 50010 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:53:49,048-Speed 10479.39 samples/sec Loss 8.0247 LearningRate 0.2511 Epoch: 9 Global Step: 50020 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:53:56,899-Speed 10437.16 samples/sec Loss 8.1111 LearningRate 0.2510 Epoch: 9 Global Step: 50030 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:54:04,721-Speed 10474.80 samples/sec Loss 8.1080 LearningRate 0.2509 Epoch: 9 Global Step: 50040 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:54:12,524-Speed 10500.72 samples/sec Loss 8.0238 LearningRate 0.2508 Epoch: 9 Global Step: 50050 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:54:20,325-Speed 10503.18 samples/sec Loss 8.0466 LearningRate 0.2507 Epoch: 9 Global Step: 50060 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:54:28,129-Speed 10499.34 samples/sec Loss 8.0405 LearningRate 0.2507 Epoch: 
9 Global Step: 50070 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:54:35,907-Speed 10532.90 samples/sec Loss 8.0383 LearningRate 0.2506 Epoch: 9 Global Step: 50080 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:54:43,720-Speed 10490.02 samples/sec Loss 8.0941 LearningRate 0.2505 Epoch: 9 Global Step: 50090 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:54:51,549-Speed 10469.73 samples/sec Loss 8.0650 LearningRate 0.2504 Epoch: 9 Global Step: 50100 Fp16 Grad Scale: 524288 Required: 12 hours Training: 2022-01-16 01:54:59,349-Speed 10504.38 samples/sec Loss 8.0925 LearningRate 0.2503 Epoch: 9 Global Step: 50110 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:55:07,145-Speed 10509.90 samples/sec Loss 8.0028 LearningRate 0.2502 Epoch: 9 Global Step: 50120 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:55:14,920-Speed 10538.29 samples/sec Loss 7.9982 LearningRate 0.2501 Epoch: 9 Global Step: 50130 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:55:22,698-Speed 10538.21 samples/sec Loss 8.0513 LearningRate 0.2500 Epoch: 9 Global Step: 50140 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:55:30,496-Speed 10506.60 samples/sec Loss 8.0340 LearningRate 0.2499 Epoch: 9 Global Step: 50150 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:55:38,296-Speed 10505.20 samples/sec Loss 8.0053 LearningRate 0.2498 Epoch: 9 Global Step: 50160 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:55:46,094-Speed 10507.29 samples/sec Loss 8.0316 LearningRate 0.2497 Epoch: 9 Global Step: 50170 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:55:53,863-Speed 10545.48 samples/sec Loss 8.0473 LearningRate 0.2496 Epoch: 9 Global Step: 50180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:56:01,627-Speed 10553.41 samples/sec Loss 8.0440 LearningRate 0.2495 Epoch: 9 Global Step: 50190 Fp16 Grad 
Scale: 131072 Required: 12 hours Training: 2022-01-16 01:56:09,427-Speed 10504.68 samples/sec Loss 7.9829 LearningRate 0.2494 Epoch: 9 Global Step: 50200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:56:17,186-Speed 10560.02 samples/sec Loss 8.0439 LearningRate 0.2493 Epoch: 9 Global Step: 50210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:56:24,972-Speed 10523.16 samples/sec Loss 8.0145 LearningRate 0.2493 Epoch: 9 Global Step: 50220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:56:32,763-Speed 10516.70 samples/sec Loss 8.0824 LearningRate 0.2492 Epoch: 9 Global Step: 50230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:56:40,555-Speed 10515.61 samples/sec Loss 8.0261 LearningRate 0.2491 Epoch: 9 Global Step: 50240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:56:48,360-Speed 10497.46 samples/sec Loss 8.0338 LearningRate 0.2490 Epoch: 9 Global Step: 50250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:56:56,169-Speed 10491.87 samples/sec Loss 8.0380 LearningRate 0.2489 Epoch: 9 Global Step: 50260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:57:04,026-Speed 10427.69 samples/sec Loss 8.0348 LearningRate 0.2488 Epoch: 9 Global Step: 50270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:57:11,817-Speed 10516.64 samples/sec Loss 7.9853 LearningRate 0.2487 Epoch: 9 Global Step: 50280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:57:19,597-Speed 10530.93 samples/sec Loss 8.1055 LearningRate 0.2486 Epoch: 9 Global Step: 50290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:57:27,380-Speed 10527.41 samples/sec Loss 7.9629 LearningRate 0.2485 Epoch: 9 Global Step: 50300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:57:35,171-Speed 10516.56 samples/sec Loss 7.9450 LearningRate 0.2484 Epoch: 9 Global Step: 50310 Fp16 Grad Scale: 65536 Required: 12 hours 
Training: 2022-01-16 01:57:42,969-Speed 10506.26 samples/sec Loss 7.9832 LearningRate 0.2483 Epoch: 9 Global Step: 50320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:57:50,765-Speed 10510.39 samples/sec Loss 8.0105 LearningRate 0.2482 Epoch: 9 Global Step: 50330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:57:58,557-Speed 10515.50 samples/sec Loss 8.0071 LearningRate 0.2481 Epoch: 9 Global Step: 50340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:58:06,344-Speed 10520.50 samples/sec Loss 8.0643 LearningRate 0.2480 Epoch: 9 Global Step: 50350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:58:14,130-Speed 10522.82 samples/sec Loss 8.0348 LearningRate 0.2479 Epoch: 9 Global Step: 50360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:58:21,948-Speed 10480.77 samples/sec Loss 8.0335 LearningRate 0.2479 Epoch: 9 Global Step: 50370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:58:29,761-Speed 10486.47 samples/sec Loss 7.9741 LearningRate 0.2478 Epoch: 9 Global Step: 50380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:58:37,573-Speed 10487.80 samples/sec Loss 8.0304 LearningRate 0.2477 Epoch: 9 Global Step: 50390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:58:45,388-Speed 10484.04 samples/sec Loss 8.0116 LearningRate 0.2476 Epoch: 9 Global Step: 50400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:58:53,177-Speed 10518.38 samples/sec Loss 8.0570 LearningRate 0.2475 Epoch: 9 Global Step: 50410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:59:00,995-Speed 10480.74 samples/sec Loss 8.0725 LearningRate 0.2474 Epoch: 9 Global Step: 50420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:59:08,799-Speed 10499.20 samples/sec Loss 8.0743 LearningRate 0.2473 Epoch: 9 Global Step: 50430 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 
01:59:16,599-Speed 10506.65 samples/sec Loss 7.9889 LearningRate 0.2472 Epoch: 9 Global Step: 50440 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 01:59:24,388-Speed 10518.76 samples/sec Loss 7.9759 LearningRate 0.2471 Epoch: 9 Global Step: 50450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 01:59:32,163-Speed 10537.80 samples/sec Loss 7.9637 LearningRate 0.2470 Epoch: 9 Global Step: 50460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:59:39,938-Speed 10537.71 samples/sec Loss 7.9456 LearningRate 0.2469 Epoch: 9 Global Step: 50470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:59:47,718-Speed 10530.91 samples/sec Loss 7.8899 LearningRate 0.2468 Epoch: 9 Global Step: 50480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 01:59:55,497-Speed 10532.29 samples/sec Loss 8.0310 LearningRate 0.2467 Epoch: 9 Global Step: 50490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 02:00:03,291-Speed 10515.63 samples/sec Loss 8.0259 LearningRate 0.2466 Epoch: 9 Global Step: 50500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 02:00:11,067-Speed 10535.61 samples/sec Loss 7.8999 LearningRate 0.2466 Epoch: 9 Global Step: 50510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 02:00:18,914-Speed 10441.66 samples/sec Loss 7.9440 LearningRate 0.2465 Epoch: 9 Global Step: 50520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 02:00:26,737-Speed 10473.75 samples/sec Loss 8.0107 LearningRate 0.2464 Epoch: 9 Global Step: 50530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 02:00:34,515-Speed 10533.86 samples/sec Loss 7.9407 LearningRate 0.2463 Epoch: 9 Global Step: 50540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 02:00:42,308-Speed 10513.24 samples/sec Loss 8.0079 LearningRate 0.2462 Epoch: 9 Global Step: 50550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 02:00:50,120-Speed 10488.76 samples/sec 
Loss 7.9826 LearningRate 0.2461 Epoch: 9 Global Step: 50560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:00:57,921-Speed 10503.90 samples/sec Loss 7.9863 LearningRate 0.2460 Epoch: 9 Global Step: 50570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:01:05,707-Speed 10522.73 samples/sec Loss 7.9747 LearningRate 0.2459 Epoch: 9 Global Step: 50580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:01:13,481-Speed 10540.64 samples/sec Loss 7.9800 LearningRate 0.2458 Epoch: 9 Global Step: 50590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:01:21,289-Speed 10493.27 samples/sec Loss 7.9017 LearningRate 0.2457 Epoch: 9 Global Step: 50600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:01:29,116-Speed 10468.05 samples/sec Loss 7.9420 LearningRate 0.2456 Epoch: 9 Global Step: 50610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:01:36,917-Speed 10506.82 samples/sec Loss 7.9170 LearningRate 0.2455 Epoch: 9 Global Step: 50620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:01:44,705-Speed 10519.59 samples/sec Loss 7.9609 LearningRate 0.2454 Epoch: 9 Global Step: 50630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:01:52,494-Speed 10518.91 samples/sec Loss 7.9962 LearningRate 0.2454 Epoch: 9 Global Step: 50640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:02:00,273-Speed 10533.04 samples/sec Loss 7.9557 LearningRate 0.2453 Epoch: 9 Global Step: 50650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:02:08,065-Speed 10513.89 samples/sec Loss 8.0050 LearningRate 0.2452 Epoch: 9 Global Step: 50660 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 02:02:15,860-Speed 10510.87 samples/sec Loss 7.9217 LearningRate 0.2451 Epoch: 9 Global Step: 50670 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 02:02:23,630-Speed 10544.27 samples/sec Loss 7.9940 LearningRate 0.2450 
Epoch: 9 Global Step: 50680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:02:31,418-Speed 10519.78 samples/sec Loss 7.9791 LearningRate 0.2449 Epoch: 9 Global Step: 50690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:02:39,195-Speed 10536.17 samples/sec Loss 7.9397 LearningRate 0.2448 Epoch: 9 Global Step: 50700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:02:47,020-Speed 10471.00 samples/sec Loss 8.0250 LearningRate 0.2447 Epoch: 9 Global Step: 50710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:02:54,822-Speed 10499.69 samples/sec Loss 7.9596 LearningRate 0.2446 Epoch: 9 Global Step: 50720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:03:02,611-Speed 10519.24 samples/sec Loss 8.0089 LearningRate 0.2445 Epoch: 9 Global Step: 50730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:03:10,395-Speed 10524.88 samples/sec Loss 8.0395 LearningRate 0.2444 Epoch: 9 Global Step: 50740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:03:18,187-Speed 10515.72 samples/sec Loss 7.9750 LearningRate 0.2443 Epoch: 9 Global Step: 50750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:03:25,978-Speed 10515.35 samples/sec Loss 7.9193 LearningRate 0.2442 Epoch: 9 Global Step: 50760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:03:33,757-Speed 10533.51 samples/sec Loss 7.9522 LearningRate 0.2442 Epoch: 9 Global Step: 50770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:03:41,558-Speed 10502.89 samples/sec Loss 7.9308 LearningRate 0.2441 Epoch: 9 Global Step: 50780 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 02:03:49,373-Speed 10485.38 samples/sec Loss 7.9279 LearningRate 0.2440 Epoch: 9 Global Step: 50790 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 02:03:57,168-Speed 10512.32 samples/sec Loss 7.9076 LearningRate 0.2439 Epoch: 9 Global Step: 50800 
Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 02:04:04,956-Speed 10520.62 samples/sec Loss 7.9741 LearningRate 0.2438 Epoch: 9 Global Step: 50810 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 02:04:12,743-Speed 10521.87 samples/sec Loss 7.9444 LearningRate 0.2437 Epoch: 9 Global Step: 50820 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 02:04:20,529-Speed 10522.65 samples/sec Loss 7.9697 LearningRate 0.2436 Epoch: 9 Global Step: 50830 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 02:04:28,316-Speed 10522.27 samples/sec Loss 7.9084 LearningRate 0.2435 Epoch: 9 Global Step: 50840 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 02:04:36,100-Speed 10525.19 samples/sec Loss 7.9196 LearningRate 0.2434 Epoch: 9 Global Step: 50850 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 02:04:43,908-Speed 10493.23 samples/sec Loss 7.8803 LearningRate 0.2433 Epoch: 9 Global Step: 50860 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 02:04:51,721-Speed 10486.69 samples/sec Loss 7.9307 LearningRate 0.2432 Epoch: 9 Global Step: 50870 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 02:04:59,519-Speed 10507.16 samples/sec Loss 7.9328 LearningRate 0.2431 Epoch: 9 Global Step: 50880 Fp16 Grad Scale: 524288 Required: 12 hours Training: 2022-01-16 02:05:07,326-Speed 10495.05 samples/sec Loss 7.9301 LearningRate 0.2430 Epoch: 9 Global Step: 50890 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 02:05:15,128-Speed 10501.67 samples/sec Loss 7.8966 LearningRate 0.2430 Epoch: 9 Global Step: 50900 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 02:05:22,953-Speed 10469.58 samples/sec Loss 7.9431 LearningRate 0.2429 Epoch: 9 Global Step: 50910 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 02:05:30,756-Speed 10500.82 samples/sec Loss 7.9656 LearningRate 0.2428 Epoch: 9 Global Step: 50920 Fp16 Grad Scale: 131072 
Required: 12 hours Training: 2022-01-16 02:05:38,536-Speed 10530.41 samples/sec Loss 7.8851 LearningRate 0.2427 Epoch: 9 Global Step: 50930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:05:46,312-Speed 10536.06 samples/sec Loss 7.9038 LearningRate 0.2426 Epoch: 9 Global Step: 50940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:05:54,138-Speed 10468.93 samples/sec Loss 7.8982 LearningRate 0.2425 Epoch: 9 Global Step: 50950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:06:01,949-Speed 10489.78 samples/sec Loss 7.9183 LearningRate 0.2424 Epoch: 9 Global Step: 50960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:06:09,737-Speed 10519.67 samples/sec Loss 7.9063 LearningRate 0.2423 Epoch: 9 Global Step: 50970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:06:17,522-Speed 10524.97 samples/sec Loss 7.8806 LearningRate 0.2422 Epoch: 9 Global Step: 50980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:06:25,333-Speed 10488.51 samples/sec Loss 7.9196 LearningRate 0.2421 Epoch: 9 Global Step: 50990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:06:33,126-Speed 10514.84 samples/sec Loss 7.8673 LearningRate 0.2420 Epoch: 9 Global Step: 51000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:06:40,932-Speed 10494.30 samples/sec Loss 7.9135 LearningRate 0.2419 Epoch: 9 Global Step: 51010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:06:48,746-Speed 10485.94 samples/sec Loss 7.9371 LearningRate 0.2418 Epoch: 9 Global Step: 51020 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 02:06:56,526-Speed 10531.26 samples/sec Loss 7.8480 LearningRate 0.2418 Epoch: 9 Global Step: 51030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:07:04,344-Speed 10480.20 samples/sec Loss 7.9114 LearningRate 0.2417 Epoch: 9 Global Step: 51040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 
2022-01-16 02:07:12,149-Speed 10496.15 samples/sec Loss 7.9390 LearningRate 0.2416 Epoch: 9 Global Step: 51050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:07:19,955-Speed 10495.89 samples/sec Loss 7.8924 LearningRate 0.2415 Epoch: 9 Global Step: 51060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:07:27,747-Speed 10515.71 samples/sec Loss 7.8143 LearningRate 0.2414 Epoch: 9 Global Step: 51070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:07:35,544-Speed 10509.68 samples/sec Loss 7.9384 LearningRate 0.2413 Epoch: 9 Global Step: 51080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:07:43,321-Speed 10534.80 samples/sec Loss 7.9680 LearningRate 0.2412 Epoch: 9 Global Step: 51090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:07:51,133-Speed 10486.48 samples/sec Loss 7.9099 LearningRate 0.2411 Epoch: 9 Global Step: 51100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:07:58,933-Speed 10509.60 samples/sec Loss 7.9176 LearningRate 0.2410 Epoch: 9 Global Step: 51110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:08:06,720-Speed 10520.66 samples/sec Loss 7.8641 LearningRate 0.2409 Epoch: 9 Global Step: 51120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:08:14,537-Speed 10481.24 samples/sec Loss 7.8820 LearningRate 0.2408 Epoch: 9 Global Step: 51130 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 02:08:22,354-Speed 10481.98 samples/sec Loss 7.9119 LearningRate 0.2407 Epoch: 9 Global Step: 51140 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 02:08:30,144-Speed 10517.34 samples/sec Loss 7.9264 LearningRate 0.2407 Epoch: 9 Global Step: 51150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 02:08:37,945-Speed 10503.93 samples/sec Loss 7.9581 LearningRate 0.2406 Epoch: 9 Global Step: 51160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 02:08:45,739-Speed 
10511.81 samples/sec Loss 7.8592 LearningRate 0.2405 Epoch: 9 Global Step: 51170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 02:08:53,527-Speed 10520.33 samples/sec Loss 7.8681 LearningRate 0.2404 Epoch: 9 Global Step: 51180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 02:09:01,334-Speed 10493.77 samples/sec Loss 7.8989 LearningRate 0.2403 Epoch: 9 Global Step: 51190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 02:09:09,120-Speed 10523.71 samples/sec Loss 7.8537 LearningRate 0.2402 Epoch: 9 Global Step: 51200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 02:09:16,919-Speed 10505.20 samples/sec Loss 7.8616 LearningRate 0.2401 Epoch: 9 Global Step: 51210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 02:09:24,746-Speed 10468.27 samples/sec Loss 7.9195 LearningRate 0.2400 Epoch: 9 Global Step: 51220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 02:09:32,563-Speed 10480.81 samples/sec Loss 7.9006 LearningRate 0.2399 Epoch: 9 Global Step: 51230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 02:09:40,373-Speed 10490.66 samples/sec Loss 7.8974 LearningRate 0.2398 Epoch: 9 Global Step: 51240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-01-16 02:09:48,157-Speed 10526.17 samples/sec Loss 7.8805 LearningRate 0.2397 Epoch: 9 Global Step: 51250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:09:55,966-Speed 10490.33 samples/sec Loss 7.8998 LearningRate 0.2396 Epoch: 9 Global Step: 51260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:10:03,768-Speed 10504.59 samples/sec Loss 7.9237 LearningRate 0.2396 Epoch: 9 Global Step: 51270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:10:11,607-Speed 10452.61 samples/sec Loss 7.8626 LearningRate 0.2395 Epoch: 9 Global Step: 51280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:10:19,394-Speed 10521.88 samples/sec Loss 7.8903 
LearningRate 0.2394 Epoch: 9 Global Step: 51290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:10:27,201-Speed 10495.03 samples/sec Loss 7.9129 LearningRate 0.2393 Epoch: 9 Global Step: 51300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:10:34,994-Speed 10513.72 samples/sec Loss 7.8396 LearningRate 0.2392 Epoch: 9 Global Step: 51310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:10:42,795-Speed 10502.84 samples/sec Loss 7.8283 LearningRate 0.2391 Epoch: 9 Global Step: 51320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:10:50,614-Speed 10478.49 samples/sec Loss 7.8889 LearningRate 0.2390 Epoch: 9 Global Step: 51330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:10:58,399-Speed 10524.62 samples/sec Loss 7.8955 LearningRate 0.2389 Epoch: 9 Global Step: 51340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:11:06,211-Speed 10487.98 samples/sec Loss 7.8284 LearningRate 0.2388 Epoch: 9 Global Step: 51350 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 02:11:14,038-Speed 10468.18 samples/sec Loss 7.8400 LearningRate 0.2387 Epoch: 9 Global Step: 51360 Fp16 Grad Scale: 262144 Required: 12 hours Training: 2022-01-16 02:11:21,845-Speed 10495.80 samples/sec Loss 7.8222 LearningRate 0.2386 Epoch: 9 Global Step: 51370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:11:29,651-Speed 10495.20 samples/sec Loss 7.8453 LearningRate 0.2386 Epoch: 9 Global Step: 51380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:11:37,475-Speed 10471.83 samples/sec Loss 7.8188 LearningRate 0.2385 Epoch: 9 Global Step: 51390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:11:45,312-Speed 10453.66 samples/sec Loss 7.8900 LearningRate 0.2384 Epoch: 9 Global Step: 51400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:11:53,136-Speed 10472.91 samples/sec Loss 7.9015 LearningRate 0.2383 Epoch: 9 
Global Step: 51410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:12:00,992-Speed 10429.95 samples/sec Loss 7.8663 LearningRate 0.2382 Epoch: 9 Global Step: 51420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-01-16 02:12:08,795-Speed 10498.40 samples/sec Loss 7.8553 LearningRate 0.2381 Epoch: 9 Global Step: 51430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:12:16,601-Speed 10496.45 samples/sec Loss 7.8306 LearningRate 0.2380 Epoch: 9 Global Step: 51440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:12:24,418-Speed 10481.06 samples/sec Loss 7.8381 LearningRate 0.2379 Epoch: 9 Global Step: 51450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:12:32,202-Speed 10526.92 samples/sec Loss 7.8383 LearningRate 0.2378 Epoch: 9 Global Step: 51460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:12:40,000-Speed 10505.40 samples/sec Loss 7.8308 LearningRate 0.2377 Epoch: 9 Global Step: 51470 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:12:47,794-Speed 10512.21 samples/sec Loss 7.8497 LearningRate 0.2376 Epoch: 9 Global Step: 51480 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:12:55,591-Speed 10508.31 samples/sec Loss 7.8399 LearningRate 0.2376 Epoch: 9 Global Step: 51490 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:13:03,382-Speed 10516.29 samples/sec Loss 7.8789 LearningRate 0.2375 Epoch: 9 Global Step: 51500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:13:11,164-Speed 10528.58 samples/sec Loss 7.8557 LearningRate 0.2374 Epoch: 9 Global Step: 51510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:13:18,962-Speed 10506.10 samples/sec Loss 7.8457 LearningRate 0.2373 Epoch: 9 Global Step: 51520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:13:26,818-Speed 10430.63 samples/sec Loss 7.8903 LearningRate 0.2372 Epoch: 9 Global Step: 51530 Fp16 Grad 
Scale: 65536 Required: 11 hours Training: 2022-01-16 02:13:34,598-Speed 10530.74 samples/sec Loss 7.8810 LearningRate 0.2371 Epoch: 9 Global Step: 51540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:13:42,415-Speed 10480.88 samples/sec Loss 7.8094 LearningRate 0.2370 Epoch: 9 Global Step: 51550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:13:50,238-Speed 10473.75 samples/sec Loss 7.7899 LearningRate 0.2369 Epoch: 9 Global Step: 51560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:13:58,076-Speed 10453.36 samples/sec Loss 7.8215 LearningRate 0.2368 Epoch: 9 Global Step: 51570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:14:05,882-Speed 10495.28 samples/sec Loss 7.8508 LearningRate 0.2367 Epoch: 9 Global Step: 51580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:14:13,673-Speed 10515.29 samples/sec Loss 7.7884 LearningRate 0.2366 Epoch: 9 Global Step: 51590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:14:21,463-Speed 10518.18 samples/sec Loss 7.8028 LearningRate 0.2366 Epoch: 9 Global Step: 51600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:14:29,254-Speed 10517.10 samples/sec Loss 7.7978 LearningRate 0.2365 Epoch: 9 Global Step: 51610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:14:37,049-Speed 10510.23 samples/sec Loss 7.8088 LearningRate 0.2364 Epoch: 9 Global Step: 51620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:14:44,837-Speed 10520.03 samples/sec Loss 7.8363 LearningRate 0.2363 Epoch: 9 Global Step: 51630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:14:52,631-Speed 10512.61 samples/sec Loss 7.8230 LearningRate 0.2362 Epoch: 9 Global Step: 51640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:15:00,404-Speed 10541.07 samples/sec Loss 7.8148 LearningRate 0.2361 Epoch: 9 Global Step: 51650 Fp16 Grad Scale: 65536 Required: 11 hours 
Training: 2022-01-16 02:15:08,263-Speed 10423.61 samples/sec Loss 7.7875 LearningRate 0.2360 Epoch: 9 Global Step: 51660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:15:16,050-Speed 10523.16 samples/sec Loss 7.8423 LearningRate 0.2359 Epoch: 9 Global Step: 51670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:15:23,839-Speed 10518.63 samples/sec Loss 7.8885 LearningRate 0.2358 Epoch: 9 Global Step: 51680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:15:31,652-Speed 10486.52 samples/sec Loss 7.8357 LearningRate 0.2357 Epoch: 9 Global Step: 51690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:15:39,443-Speed 10516.39 samples/sec Loss 7.8028 LearningRate 0.2356 Epoch: 9 Global Step: 51700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:15:47,263-Speed 10476.69 samples/sec Loss 7.7616 LearningRate 0.2356 Epoch: 9 Global Step: 51710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:15:55,084-Speed 10476.81 samples/sec Loss 7.8202 LearningRate 0.2355 Epoch: 9 Global Step: 51720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:16:02,890-Speed 10496.10 samples/sec Loss 7.8022 LearningRate 0.2354 Epoch: 9 Global Step: 51730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:16:10,722-Speed 10460.49 samples/sec Loss 7.7891 LearningRate 0.2353 Epoch: 9 Global Step: 51740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:16:18,563-Speed 10449.68 samples/sec Loss 7.8494 LearningRate 0.2352 Epoch: 9 Global Step: 51750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:16:26,350-Speed 10521.25 samples/sec Loss 7.8016 LearningRate 0.2351 Epoch: 9 Global Step: 51760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:16:34,142-Speed 10515.04 samples/sec Loss 7.8450 LearningRate 0.2350 Epoch: 9 Global Step: 51770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:16:41,956-Speed 
10485.97 samples/sec Loss 7.8474 LearningRate 0.2349 Epoch: 9 Global Step: 51780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:16:49,747-Speed 10515.44 samples/sec Loss 7.8427 LearningRate 0.2348 Epoch: 9 Global Step: 51790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:16:57,534-Speed 10522.23 samples/sec Loss 7.8241 LearningRate 0.2347 Epoch: 9 Global Step: 51800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:17:05,342-Speed 10492.87 samples/sec Loss 7.7859 LearningRate 0.2346 Epoch: 9 Global Step: 51810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:17:13,139-Speed 10508.57 samples/sec Loss 7.7930 LearningRate 0.2346 Epoch: 9 Global Step: 51820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:17:20,932-Speed 10514.10 samples/sec Loss 7.8134 LearningRate 0.2345 Epoch: 9 Global Step: 51830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:17:28,746-Speed 10488.73 samples/sec Loss 7.8163 LearningRate 0.2344 Epoch: 9 Global Step: 51840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:17:51,182-Speed 3651.59 samples/sec Loss 7.8212 LearningRate 0.2343 Epoch: 10 Global Step: 51850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:17:58,953-Speed 10543.27 samples/sec Loss 7.7921 LearningRate 0.2342 Epoch: 10 Global Step: 51860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:18:06,721-Speed 10547.17 samples/sec Loss 7.7550 LearningRate 0.2341 Epoch: 10 Global Step: 51870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:18:14,510-Speed 10519.51 samples/sec Loss 7.7873 LearningRate 0.2340 Epoch: 10 Global Step: 51880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:18:22,280-Speed 10545.17 samples/sec Loss 7.7843 LearningRate 0.2339 Epoch: 10 Global Step: 51890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:18:30,048-Speed 10545.97 samples/sec Loss 
7.7155 LearningRate 0.2338 Epoch: 10 Global Step: 51900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:18:37,829-Speed 10530.10 samples/sec Loss 7.7742 LearningRate 0.2337 Epoch: 10 Global Step: 51910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:18:45,615-Speed 10525.65 samples/sec Loss 7.8075 LearningRate 0.2337 Epoch: 10 Global Step: 51920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:18:53,414-Speed 10506.56 samples/sec Loss 7.7251 LearningRate 0.2336 Epoch: 10 Global Step: 51930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:19:01,229-Speed 10483.99 samples/sec Loss 7.7761 LearningRate 0.2335 Epoch: 10 Global Step: 51940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:19:09,065-Speed 10454.32 samples/sec Loss 7.8183 LearningRate 0.2334 Epoch: 10 Global Step: 51950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:19:16,897-Speed 10460.94 samples/sec Loss 7.8202 LearningRate 0.2333 Epoch: 10 Global Step: 51960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:19:24,724-Speed 10468.32 samples/sec Loss 7.7561 LearningRate 0.2332 Epoch: 10 Global Step: 51970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:19:32,526-Speed 10501.05 samples/sec Loss 7.7458 LearningRate 0.2331 Epoch: 10 Global Step: 51980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:19:40,324-Speed 10507.05 samples/sec Loss 7.7339 LearningRate 0.2330 Epoch: 10 Global Step: 51990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:19:48,114-Speed 10517.62 samples/sec Loss 7.7187 LearningRate 0.2329 Epoch: 10 Global Step: 52000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:19:55,891-Speed 10534.95 samples/sec Loss 7.7635 LearningRate 0.2328 Epoch: 10 Global Step: 52010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:20:03,674-Speed 10527.27 samples/sec Loss 7.7815 LearningRate 
0.2328 Epoch: 10 Global Step: 52020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:20:11,478-Speed 10498.31 samples/sec Loss 7.8287 LearningRate 0.2327 Epoch: 10 Global Step: 52030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:20:19,267-Speed 10519.14 samples/sec Loss 7.7961 LearningRate 0.2326 Epoch: 10 Global Step: 52040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:20:27,076-Speed 10490.80 samples/sec Loss 7.7443 LearningRate 0.2325 Epoch: 10 Global Step: 52050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:20:34,875-Speed 10505.64 samples/sec Loss 7.7605 LearningRate 0.2324 Epoch: 10 Global Step: 52060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:20:42,661-Speed 10522.97 samples/sec Loss 7.7161 LearningRate 0.2323 Epoch: 10 Global Step: 52070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:20:50,450-Speed 10519.80 samples/sec Loss 7.7663 LearningRate 0.2322 Epoch: 10 Global Step: 52080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:20:58,230-Speed 10531.72 samples/sec Loss 7.7563 LearningRate 0.2321 Epoch: 10 Global Step: 52090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:21:06,011-Speed 10529.24 samples/sec Loss 7.7851 LearningRate 0.2320 Epoch: 10 Global Step: 52100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:21:13,802-Speed 10515.08 samples/sec Loss 7.7453 LearningRate 0.2319 Epoch: 10 Global Step: 52110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:21:21,588-Speed 10524.79 samples/sec Loss 7.7763 LearningRate 0.2319 Epoch: 10 Global Step: 52120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:21:29,370-Speed 10527.11 samples/sec Loss 7.7767 LearningRate 0.2318 Epoch: 10 Global Step: 52130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:21:37,180-Speed 10490.09 samples/sec Loss 7.7743 LearningRate 0.2317 Epoch: 10 Global 
Step: 52140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:21:44,968-Speed 10520.90 samples/sec Loss 7.7966 LearningRate 0.2316 Epoch: 10 Global Step: 52150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:21:52,775-Speed 10494.11 samples/sec Loss 7.7588 LearningRate 0.2315 Epoch: 10 Global Step: 52160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:22:00,567-Speed 10515.42 samples/sec Loss 7.7508 LearningRate 0.2314 Epoch: 10 Global Step: 52170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:22:08,353-Speed 10520.99 samples/sec Loss 7.7270 LearningRate 0.2313 Epoch: 10 Global Step: 52180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:22:16,157-Speed 10501.99 samples/sec Loss 7.7088 LearningRate 0.2312 Epoch: 10 Global Step: 52190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:22:23,950-Speed 10514.68 samples/sec Loss 7.7512 LearningRate 0.2311 Epoch: 10 Global Step: 52200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:22:31,727-Speed 10534.14 samples/sec Loss 7.7317 LearningRate 0.2310 Epoch: 10 Global Step: 52210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:22:39,521-Speed 10512.05 samples/sec Loss 7.7313 LearningRate 0.2310 Epoch: 10 Global Step: 52220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:22:47,310-Speed 10517.88 samples/sec Loss 7.7168 LearningRate 0.2309 Epoch: 10 Global Step: 52230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:22:55,122-Speed 10489.89 samples/sec Loss 7.7150 LearningRate 0.2308 Epoch: 10 Global Step: 52240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:23:02,937-Speed 10482.51 samples/sec Loss 7.7012 LearningRate 0.2307 Epoch: 10 Global Step: 52250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:23:10,730-Speed 10513.12 samples/sec Loss 7.7216 LearningRate 0.2306 Epoch: 10 Global Step: 52260 Fp16 
Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:23:18,543-Speed 10487.16 samples/sec Loss 7.7701 LearningRate 0.2305 Epoch: 10 Global Step: 52270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:23:26,367-Speed 10472.27 samples/sec Loss 7.7389 LearningRate 0.2304 Epoch: 10 Global Step: 52280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:23:34,178-Speed 10488.54 samples/sec Loss 7.8209 LearningRate 0.2303 Epoch: 10 Global Step: 52290 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:23:42,013-Speed 10457.35 samples/sec Loss 7.7921 LearningRate 0.2302 Epoch: 10 Global Step: 52300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:23:49,889-Speed 10402.86 samples/sec Loss 7.7514 LearningRate 0.2301 Epoch: 10 Global Step: 52310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:23:57,745-Speed 10429.84 samples/sec Loss 7.8016 LearningRate 0.2301 Epoch: 10 Global Step: 52320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:24:05,574-Speed 10464.37 samples/sec Loss 7.8241 LearningRate 0.2300 Epoch: 10 Global Step: 52330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:24:13,408-Speed 10458.72 samples/sec Loss 7.7548 LearningRate 0.2299 Epoch: 10 Global Step: 52340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:24:21,228-Speed 10478.58 samples/sec Loss 7.7256 LearningRate 0.2298 Epoch: 10 Global Step: 52350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:24:29,053-Speed 10469.71 samples/sec Loss 7.7831 LearningRate 0.2297 Epoch: 10 Global Step: 52360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:24:36,916-Speed 10419.12 samples/sec Loss 7.6993 LearningRate 0.2296 Epoch: 10 Global Step: 52370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:24:44,747-Speed 10463.33 samples/sec Loss 7.7693 LearningRate 0.2295 Epoch: 10 Global Step: 52380 Fp16 Grad Scale: 131072 
Required: 11 hours Training: 2022-01-16 02:24:52,569-Speed 10474.40 samples/sec Loss 7.7270 LearningRate 0.2294 Epoch: 10 Global Step: 52390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:25:00,442-Speed 10406.07 samples/sec Loss 7.6904 LearningRate 0.2293 Epoch: 10 Global Step: 52400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:25:08,295-Speed 10432.12 samples/sec Loss 7.7233 LearningRate 0.2292 Epoch: 10 Global Step: 52410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:25:16,120-Speed 10471.17 samples/sec Loss 7.6756 LearningRate 0.2292 Epoch: 10 Global Step: 52420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:25:24,003-Speed 10394.24 samples/sec Loss 7.7377 LearningRate 0.2291 Epoch: 10 Global Step: 52430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:25:31,856-Speed 10433.18 samples/sec Loss 7.6844 LearningRate 0.2290 Epoch: 10 Global Step: 52440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:25:39,688-Speed 10460.21 samples/sec Loss 7.7057 LearningRate 0.2289 Epoch: 10 Global Step: 52450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:25:47,520-Speed 10461.25 samples/sec Loss 7.7214 LearningRate 0.2288 Epoch: 10 Global Step: 52460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:25:55,296-Speed 10537.33 samples/sec Loss 7.7124 LearningRate 0.2287 Epoch: 10 Global Step: 52470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:26:03,085-Speed 10518.36 samples/sec Loss 7.6890 LearningRate 0.2286 Epoch: 10 Global Step: 52480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:26:10,883-Speed 10508.33 samples/sec Loss 7.6525 LearningRate 0.2285 Epoch: 10 Global Step: 52490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:26:18,652-Speed 10545.52 samples/sec Loss 7.6416 LearningRate 0.2284 Epoch: 10 Global Step: 52500 Fp16 Grad Scale: 65536 Required: 11 hours 
Training: 2022-01-16 02:26:26,442-Speed 10518.18 samples/sec Loss 7.7464 LearningRate 0.2284 Epoch: 10 Global Step: 52510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:26:34,222-Speed 10530.13 samples/sec Loss 7.6280 LearningRate 0.2283 Epoch: 10 Global Step: 52520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:26:42,017-Speed 10510.84 samples/sec Loss 7.6885 LearningRate 0.2282 Epoch: 10 Global Step: 52530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:26:49,793-Speed 10536.36 samples/sec Loss 7.6915 LearningRate 0.2281 Epoch: 10 Global Step: 52540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:26:57,569-Speed 10535.96 samples/sec Loss 7.6483 LearningRate 0.2280 Epoch: 10 Global Step: 52550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:27:05,396-Speed 10468.40 samples/sec Loss 7.6933 LearningRate 0.2279 Epoch: 10 Global Step: 52560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:27:13,161-Speed 10550.78 samples/sec Loss 7.6842 LearningRate 0.2278 Epoch: 10 Global Step: 52570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:27:20,976-Speed 10484.50 samples/sec Loss 7.7334 LearningRate 0.2277 Epoch: 10 Global Step: 52580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:27:28,782-Speed 10496.43 samples/sec Loss 7.6803 LearningRate 0.2276 Epoch: 10 Global Step: 52590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:27:36,572-Speed 10517.28 samples/sec Loss 7.7074 LearningRate 0.2276 Epoch: 10 Global Step: 52600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:27:44,348-Speed 10535.68 samples/sec Loss 7.6838 LearningRate 0.2275 Epoch: 10 Global Step: 52610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:27:52,111-Speed 10554.62 samples/sec Loss 7.6473 LearningRate 0.2274 Epoch: 10 Global Step: 52620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 
02:27:59,914-Speed 10499.58 samples/sec Loss 7.6653 LearningRate 0.2273 Epoch: 10 Global Step: 52630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:28:07,685-Speed 10542.75 samples/sec Loss 7.6483 LearningRate 0.2272 Epoch: 10 Global Step: 52640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:28:15,471-Speed 10524.10 samples/sec Loss 7.6634 LearningRate 0.2271 Epoch: 10 Global Step: 52650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:28:23,261-Speed 10517.25 samples/sec Loss 7.6506 LearningRate 0.2270 Epoch: 10 Global Step: 52660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:28:31,055-Speed 10512.98 samples/sec Loss 7.6904 LearningRate 0.2269 Epoch: 10 Global Step: 52670 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:28:38,884-Speed 10465.25 samples/sec Loss 7.7149 LearningRate 0.2268 Epoch: 10 Global Step: 52680 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:28:46,671-Speed 10521.81 samples/sec Loss 7.6837 LearningRate 0.2268 Epoch: 10 Global Step: 52690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:28:54,449-Speed 10534.05 samples/sec Loss 7.7252 LearningRate 0.2267 Epoch: 10 Global Step: 52700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:29:02,280-Speed 10466.17 samples/sec Loss 7.6890 LearningRate 0.2266 Epoch: 10 Global Step: 52710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:29:10,071-Speed 10515.41 samples/sec Loss 7.6976 LearningRate 0.2265 Epoch: 10 Global Step: 52720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:29:17,843-Speed 10541.25 samples/sec Loss 7.6718 LearningRate 0.2264 Epoch: 10 Global Step: 52730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:29:25,634-Speed 10516.10 samples/sec Loss 7.6454 LearningRate 0.2263 Epoch: 10 Global Step: 52740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:29:33,433-Speed 
10505.40 samples/sec Loss 7.6885 LearningRate 0.2262 Epoch: 10 Global Step: 52750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:29:41,235-Speed 10501.29 samples/sec Loss 7.6886 LearningRate 0.2261 Epoch: 10 Global Step: 52760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:29:49,028-Speed 10512.56 samples/sec Loss 7.6610 LearningRate 0.2260 Epoch: 10 Global Step: 52770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:29:56,848-Speed 10478.67 samples/sec Loss 7.6732 LearningRate 0.2260 Epoch: 10 Global Step: 52780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:30:04,644-Speed 10508.76 samples/sec Loss 7.6936 LearningRate 0.2259 Epoch: 10 Global Step: 52790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:30:12,438-Speed 10511.67 samples/sec Loss 7.6445 LearningRate 0.2258 Epoch: 10 Global Step: 52800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:30:20,231-Speed 10513.05 samples/sec Loss 7.6833 LearningRate 0.2257 Epoch: 10 Global Step: 52810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:30:28,015-Speed 10526.91 samples/sec Loss 7.6198 LearningRate 0.2256 Epoch: 10 Global Step: 52820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:30:35,850-Speed 10457.09 samples/sec Loss 7.6146 LearningRate 0.2255 Epoch: 10 Global Step: 52830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:30:43,668-Speed 10480.00 samples/sec Loss 7.6259 LearningRate 0.2254 Epoch: 10 Global Step: 52840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:30:51,487-Speed 10482.39 samples/sec Loss 7.6197 LearningRate 0.2253 Epoch: 10 Global Step: 52850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:30:59,289-Speed 10501.62 samples/sec Loss 7.6193 LearningRate 0.2252 Epoch: 10 Global Step: 52860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:31:07,097-Speed 10493.48 samples/sec 
Loss 7.6655 LearningRate 0.2252 Epoch: 10 Global Step: 52870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:31:14,881-Speed 10524.41 samples/sec Loss 7.6372 LearningRate 0.2251 Epoch: 10 Global Step: 52880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:31:22,693-Speed 10488.81 samples/sec Loss 7.6226 LearningRate 0.2250 Epoch: 10 Global Step: 52890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:31:30,500-Speed 10495.02 samples/sec Loss 7.6877 LearningRate 0.2249 Epoch: 10 Global Step: 52900 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:31:38,300-Speed 10502.99 samples/sec Loss 7.6647 LearningRate 0.2248 Epoch: 10 Global Step: 52910 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:31:46,125-Speed 10469.91 samples/sec Loss 7.6582 LearningRate 0.2247 Epoch: 10 Global Step: 52920 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:31:53,933-Speed 10493.85 samples/sec Loss 7.6399 LearningRate 0.2246 Epoch: 10 Global Step: 52930 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:32:01,724-Speed 10516.27 samples/sec Loss 7.6337 LearningRate 0.2245 Epoch: 10 Global Step: 52940 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:32:09,503-Speed 10531.99 samples/sec Loss 7.6603 LearningRate 0.2244 Epoch: 10 Global Step: 52950 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:32:17,284-Speed 10529.67 samples/sec Loss 7.6187 LearningRate 0.2244 Epoch: 10 Global Step: 52960 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-16 02:32:25,085-Speed 10501.89 samples/sec Loss 7.6457 LearningRate 0.2243 Epoch: 10 Global Step: 52970 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-16 02:32:32,883-Speed 10507.03 samples/sec Loss 7.6280 LearningRate 0.2242 Epoch: 10 Global Step: 52980 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-16 02:32:40,679-Speed 10509.47 samples/sec Loss 7.6375 
LearningRate 0.2241 Epoch: 10 Global Step: 52990 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-16 02:32:48,452-Speed 10539.90 samples/sec Loss 7.6623 LearningRate 0.2240 Epoch: 10 Global Step: 53000 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-16 02:32:56,261-Speed 10492.51 samples/sec Loss 7.6597 LearningRate 0.2239 Epoch: 10 Global Step: 53010 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-16 02:33:04,064-Speed 10499.46 samples/sec Loss 7.6374 LearningRate 0.2238 Epoch: 10 Global Step: 53020 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-16 02:33:11,884-Speed 10477.18 samples/sec Loss 7.6998 LearningRate 0.2237 Epoch: 10 Global Step: 53030 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-16 02:33:19,683-Speed 10505.74 samples/sec Loss 7.6562 LearningRate 0.2236 Epoch: 10 Global Step: 53040 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-16 02:33:27,471-Speed 10520.34 samples/sec Loss 7.6335 LearningRate 0.2236 Epoch: 10 Global Step: 53050 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-16 02:33:35,275-Speed 10498.02 samples/sec Loss 7.5531 LearningRate 0.2235 Epoch: 10 Global Step: 53060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:33:43,061-Speed 10523.33 samples/sec Loss 7.6041 LearningRate 0.2234 Epoch: 10 Global Step: 53070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:33:50,858-Speed 10508.16 samples/sec Loss 7.6459 LearningRate 0.2233 Epoch: 10 Global Step: 53080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:33:58,671-Speed 10486.92 samples/sec Loss 7.5931 LearningRate 0.2232 Epoch: 10 Global Step: 53090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:34:06,456-Speed 10523.09 samples/sec Loss 7.5848 LearningRate 0.2231 Epoch: 10 Global Step: 53100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:34:14,252-Speed 10509.12 samples/sec Loss 7.5847 LearningRate 0.2230 Epoch: 10 
Global Step: 53110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:34:22,025-Speed 10540.36 samples/sec Loss 7.6262 LearningRate 0.2229 Epoch: 10 Global Step: 53120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:34:29,806-Speed 10530.98 samples/sec Loss 7.6638 LearningRate 0.2229 Epoch: 10 Global Step: 53130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:34:37,682-Speed 10407.98 samples/sec Loss 7.6412 LearningRate 0.2228 Epoch: 10 Global Step: 53140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:34:45,465-Speed 10527.01 samples/sec Loss 7.6679 LearningRate 0.2227 Epoch: 10 Global Step: 53150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:34:53,243-Speed 10534.15 samples/sec Loss 7.5989 LearningRate 0.2226 Epoch: 10 Global Step: 53160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:35:01,050-Speed 10495.12 samples/sec Loss 7.5688 LearningRate 0.2225 Epoch: 10 Global Step: 53170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:35:08,844-Speed 10512.98 samples/sec Loss 7.6194 LearningRate 0.2224 Epoch: 10 Global Step: 53180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:35:16,644-Speed 10504.07 samples/sec Loss 7.6214 LearningRate 0.2223 Epoch: 10 Global Step: 53190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:35:24,454-Speed 10490.23 samples/sec Loss 7.6347 LearningRate 0.2222 Epoch: 10 Global Step: 53200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:35:32,241-Speed 10520.97 samples/sec Loss 7.6011 LearningRate 0.2222 Epoch: 10 Global Step: 53210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:35:40,015-Speed 10546.76 samples/sec Loss 7.6250 LearningRate 0.2221 Epoch: 10 Global Step: 53220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:35:47,801-Speed 10521.67 samples/sec Loss 7.5203 LearningRate 0.2220 Epoch: 10 Global Step: 53230 Fp16 
Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:35:55,601-Speed 10504.98 samples/sec Loss 7.6149 LearningRate 0.2219 Epoch: 10 Global Step: 53240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:36:03,383-Speed 10527.94 samples/sec Loss 7.6260 LearningRate 0.2218 Epoch: 10 Global Step: 53250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:36:11,209-Speed 10468.81 samples/sec Loss 7.6455 LearningRate 0.2217 Epoch: 10 Global Step: 53260 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:36:19,034-Speed 10470.61 samples/sec Loss 7.6136 LearningRate 0.2216 Epoch: 10 Global Step: 53270 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:36:26,864-Speed 10464.39 samples/sec Loss 7.6150 LearningRate 0.2215 Epoch: 10 Global Step: 53280 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:36:34,664-Speed 10503.73 samples/sec Loss 7.6133 LearningRate 0.2214 Epoch: 10 Global Step: 53290 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:36:42,448-Speed 10528.15 samples/sec Loss 7.6297 LearningRate 0.2214 Epoch: 10 Global Step: 53300 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:36:50,248-Speed 10503.88 samples/sec Loss 7.6103 LearningRate 0.2213 Epoch: 10 Global Step: 53310 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:36:58,030-Speed 10527.98 samples/sec Loss 7.5473 LearningRate 0.2212 Epoch: 10 Global Step: 53320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:37:05,820-Speed 10520.81 samples/sec Loss 7.5908 LearningRate 0.2211 Epoch: 10 Global Step: 53330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:37:13,617-Speed 10508.08 samples/sec Loss 7.5440 LearningRate 0.2210 Epoch: 10 Global Step: 53340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:37:21,456-Speed 10452.32 samples/sec Loss 7.5896 LearningRate 0.2209 Epoch: 10 Global Step: 53350 Fp16 Grad Scale: 131072 
Required: 11 hours Training: 2022-01-16 02:37:29,283-Speed 10468.14 samples/sec Loss 7.5552 LearningRate 0.2208 Epoch: 10 Global Step: 53360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:37:37,085-Speed 10501.18 samples/sec Loss 7.5364 LearningRate 0.2207 Epoch: 10 Global Step: 53370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:37:44,870-Speed 10523.55 samples/sec Loss 7.5405 LearningRate 0.2207 Epoch: 10 Global Step: 53380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:37:52,702-Speed 10461.39 samples/sec Loss 7.5385 LearningRate 0.2206 Epoch: 10 Global Step: 53390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:38:00,567-Speed 10417.47 samples/sec Loss 7.5916 LearningRate 0.2205 Epoch: 10 Global Step: 53400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:38:08,378-Speed 10488.58 samples/sec Loss 7.5763 LearningRate 0.2204 Epoch: 10 Global Step: 53410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:38:16,176-Speed 10507.87 samples/sec Loss 7.6317 LearningRate 0.2203 Epoch: 10 Global Step: 53420 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:38:24,015-Speed 10454.19 samples/sec Loss 7.5692 LearningRate 0.2202 Epoch: 10 Global Step: 53430 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:38:31,865-Speed 10435.95 samples/sec Loss 7.6296 LearningRate 0.2201 Epoch: 10 Global Step: 53440 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:38:39,684-Speed 10478.55 samples/sec Loss 7.6125 LearningRate 0.2200 Epoch: 10 Global Step: 53450 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:38:47,528-Speed 10444.70 samples/sec Loss 7.5531 LearningRate 0.2200 Epoch: 10 Global Step: 53460 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:38:55,360-Speed 10461.74 samples/sec Loss 7.5776 LearningRate 0.2199 Epoch: 10 Global Step: 53470 Fp16 Grad Scale: 131072 Required: 11 hours 
Training: 2022-01-16 02:39:03,184-Speed 10471.95 samples/sec Loss 7.5315 LearningRate 0.2198 Epoch: 10 Global Step: 53480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:39:11,019-Speed 10456.17 samples/sec Loss 7.5863 LearningRate 0.2197 Epoch: 10 Global Step: 53490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:39:18,827-Speed 10493.46 samples/sec Loss 7.6036 LearningRate 0.2196 Epoch: 10 Global Step: 53500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:39:26,643-Speed 10482.94 samples/sec Loss 7.5305 LearningRate 0.2195 Epoch: 10 Global Step: 53510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:39:34,448-Speed 10496.59 samples/sec Loss 7.5847 LearningRate 0.2194 Epoch: 10 Global Step: 53520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:39:42,263-Speed 10483.44 samples/sec Loss 7.5551 LearningRate 0.2193 Epoch: 10 Global Step: 53530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:39:50,086-Speed 10473.53 samples/sec Loss 7.5510 LearningRate 0.2193 Epoch: 10 Global Step: 53540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:39:57,874-Speed 10520.33 samples/sec Loss 7.4994 LearningRate 0.2192 Epoch: 10 Global Step: 53550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:40:05,676-Speed 10500.83 samples/sec Loss 7.5125 LearningRate 0.2191 Epoch: 10 Global Step: 53560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:40:13,505-Speed 10465.09 samples/sec Loss 7.5704 LearningRate 0.2190 Epoch: 10 Global Step: 53570 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:40:21,308-Speed 10505.57 samples/sec Loss 7.5832 LearningRate 0.2189 Epoch: 10 Global Step: 53580 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:40:29,094-Speed 10522.34 samples/sec Loss 7.5485 LearningRate 0.2188 Epoch: 10 Global Step: 53590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 
2022-01-16 02:40:36,891-Speed 10508.11 samples/sec Loss 7.5472 LearningRate 0.2187 Epoch: 10 Global Step: 53600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:40:44,677-Speed 10522.27 samples/sec Loss 7.5152 LearningRate 0.2186 Epoch: 10 Global Step: 53610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:40:52,470-Speed 10512.64 samples/sec Loss 7.5294 LearningRate 0.2186 Epoch: 10 Global Step: 53620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:41:00,274-Speed 10500.07 samples/sec Loss 7.5619 LearningRate 0.2185 Epoch: 10 Global Step: 53630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:41:08,068-Speed 10511.01 samples/sec Loss 7.5580 LearningRate 0.2184 Epoch: 10 Global Step: 53640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:41:15,875-Speed 10495.04 samples/sec Loss 7.5210 LearningRate 0.2183 Epoch: 10 Global Step: 53650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:41:23,695-Speed 10477.68 samples/sec Loss 7.5270 LearningRate 0.2182 Epoch: 10 Global Step: 53660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:41:31,463-Speed 10546.81 samples/sec Loss 7.5303 LearningRate 0.2181 Epoch: 10 Global Step: 53670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:41:39,233-Speed 10545.82 samples/sec Loss 7.4911 LearningRate 0.2180 Epoch: 10 Global Step: 53680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:41:47,036-Speed 10499.74 samples/sec Loss 7.5610 LearningRate 0.2179 Epoch: 10 Global Step: 53690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:41:54,830-Speed 10510.39 samples/sec Loss 7.6134 LearningRate 0.2179 Epoch: 10 Global Step: 53700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:42:02,618-Speed 10521.79 samples/sec Loss 7.5608 LearningRate 0.2178 Epoch: 10 Global Step: 53710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 
02:42:10,394-Speed 10535.84 samples/sec Loss 7.4678 LearningRate 0.2177 Epoch: 10 Global Step: 53720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:42:18,178-Speed 10525.58 samples/sec Loss 7.5400 LearningRate 0.2176 Epoch: 10 Global Step: 53730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:42:25,989-Speed 10488.38 samples/sec Loss 7.4827 LearningRate 0.2175 Epoch: 10 Global Step: 53740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:42:33,777-Speed 10521.00 samples/sec Loss 7.4976 LearningRate 0.2174 Epoch: 10 Global Step: 53750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:42:41,594-Speed 10480.88 samples/sec Loss 7.5276 LearningRate 0.2173 Epoch: 10 Global Step: 53760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:42:49,381-Speed 10522.63 samples/sec Loss 7.4778 LearningRate 0.2172 Epoch: 10 Global Step: 53770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:42:57,194-Speed 10484.86 samples/sec Loss 7.5068 LearningRate 0.2172 Epoch: 10 Global Step: 53780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:43:04,995-Speed 10502.76 samples/sec Loss 7.5148 LearningRate 0.2171 Epoch: 10 Global Step: 53790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:43:12,800-Speed 10497.24 samples/sec Loss 7.4893 LearningRate 0.2170 Epoch: 10 Global Step: 53800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:43:20,592-Speed 10515.20 samples/sec Loss 7.5301 LearningRate 0.2169 Epoch: 10 Global Step: 53810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:43:28,416-Speed 10472.25 samples/sec Loss 7.5612 LearningRate 0.2168 Epoch: 10 Global Step: 53820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:43:36,238-Speed 10473.77 samples/sec Loss 7.5065 LearningRate 0.2167 Epoch: 10 Global Step: 53830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:43:44,038-Speed 
10504.98 samples/sec Loss 7.5130 LearningRate 0.2166 Epoch: 10 Global Step: 53840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:43:51,867-Speed 10463.95 samples/sec Loss 7.4868 LearningRate 0.2166 Epoch: 10 Global Step: 53850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:43:59,667-Speed 10504.21 samples/sec Loss 7.5387 LearningRate 0.2165 Epoch: 10 Global Step: 53860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:44:07,463-Speed 10509.80 samples/sec Loss 7.5248 LearningRate 0.2164 Epoch: 10 Global Step: 53870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:44:15,280-Speed 10481.62 samples/sec Loss 7.4925 LearningRate 0.2163 Epoch: 10 Global Step: 53880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:44:23,108-Speed 10465.62 samples/sec Loss 7.5064 LearningRate 0.2162 Epoch: 10 Global Step: 53890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:44:30,911-Speed 10499.48 samples/sec Loss 7.5224 LearningRate 0.2161 Epoch: 10 Global Step: 53900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:44:38,712-Speed 10507.64 samples/sec Loss 7.5545 LearningRate 0.2160 Epoch: 10 Global Step: 53910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:44:46,493-Speed 10529.43 samples/sec Loss 7.5483 LearningRate 0.2159 Epoch: 10 Global Step: 53920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:44:54,269-Speed 10536.20 samples/sec Loss 7.4497 LearningRate 0.2159 Epoch: 10 Global Step: 53930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:45:02,104-Speed 10461.27 samples/sec Loss 7.4683 LearningRate 0.2158 Epoch: 10 Global Step: 53940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:45:09,905-Speed 10502.66 samples/sec Loss 7.4635 LearningRate 0.2157 Epoch: 10 Global Step: 53950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:45:17,723-Speed 10480.11 samples/sec Loss 
7.4085 LearningRate 0.2156 Epoch: 10 Global Step: 53960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:45:25,523-Speed 10502.72 samples/sec Loss 7.4822 LearningRate 0.2155 Epoch: 10 Global Step: 53970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:45:33,326-Speed 10500.78 samples/sec Loss 7.4912 LearningRate 0.2154 Epoch: 10 Global Step: 53980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:45:41,138-Speed 10488.99 samples/sec Loss 7.5047 LearningRate 0.2153 Epoch: 10 Global Step: 53990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:45:48,972-Speed 10458.02 samples/sec Loss 7.4916 LearningRate 0.2153 Epoch: 10 Global Step: 54000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:45:56,802-Speed 10463.17 samples/sec Loss 7.4730 LearningRate 0.2152 Epoch: 10 Global Step: 54010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:46:04,629-Speed 10467.11 samples/sec Loss 7.5005 LearningRate 0.2151 Epoch: 10 Global Step: 54020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:46:12,426-Speed 10507.70 samples/sec Loss 7.4839 LearningRate 0.2150 Epoch: 10 Global Step: 54030 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:46:20,216-Speed 10517.43 samples/sec Loss 7.5087 LearningRate 0.2149 Epoch: 10 Global Step: 54040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:46:28,025-Speed 10492.14 samples/sec Loss 7.5175 LearningRate 0.2148 Epoch: 10 Global Step: 54050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:46:35,829-Speed 10498.79 samples/sec Loss 7.4950 LearningRate 0.2147 Epoch: 10 Global Step: 54060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:46:43,629-Speed 10504.71 samples/sec Loss 7.5429 LearningRate 0.2146 Epoch: 10 Global Step: 54070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:46:51,447-Speed 10480.08 samples/sec Loss 7.5003 LearningRate 
0.2146 Epoch: 10 Global Step: 54080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:46:59,243-Speed 10509.73 samples/sec Loss 7.5048 LearningRate 0.2145 Epoch: 10 Global Step: 54090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:47:07,023-Speed 10530.36 samples/sec Loss 7.5282 LearningRate 0.2144 Epoch: 10 Global Step: 54100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:47:14,802-Speed 10532.58 samples/sec Loss 7.5020 LearningRate 0.2143 Epoch: 10 Global Step: 54110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:47:22,612-Speed 10491.12 samples/sec Loss 7.5001 LearningRate 0.2142 Epoch: 10 Global Step: 54120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:47:30,428-Speed 10482.29 samples/sec Loss 7.4842 LearningRate 0.2141 Epoch: 10 Global Step: 54130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:47:38,220-Speed 10514.62 samples/sec Loss 7.4345 LearningRate 0.2140 Epoch: 10 Global Step: 54140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:47:46,015-Speed 10510.48 samples/sec Loss 7.4550 LearningRate 0.2140 Epoch: 10 Global Step: 54150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:47:53,833-Speed 10480.78 samples/sec Loss 7.4501 LearningRate 0.2139 Epoch: 10 Global Step: 54160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:48:01,654-Speed 10476.06 samples/sec Loss 7.4611 LearningRate 0.2138 Epoch: 10 Global Step: 54170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:48:09,458-Speed 10498.28 samples/sec Loss 7.4836 LearningRate 0.2137 Epoch: 10 Global Step: 54180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:48:17,266-Speed 10493.74 samples/sec Loss 7.4330 LearningRate 0.2136 Epoch: 10 Global Step: 54190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:48:25,089-Speed 10472.18 samples/sec Loss 7.4826 LearningRate 0.2135 Epoch: 10 
Global Step: 54200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:48:32,905-Speed 10482.72 samples/sec Loss 7.4574 LearningRate 0.2134 Epoch: 10 Global Step: 54210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:48:40,720-Speed 10483.98 samples/sec Loss 7.4319 LearningRate 0.2133 Epoch: 10 Global Step: 54220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:48:48,507-Speed 10521.68 samples/sec Loss 7.5166 LearningRate 0.2133 Epoch: 10 Global Step: 54230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:48:56,300-Speed 10512.98 samples/sec Loss 7.4503 LearningRate 0.2132 Epoch: 10 Global Step: 54240 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:49:04,142-Speed 10448.40 samples/sec Loss 7.4530 LearningRate 0.2131 Epoch: 10 Global Step: 54250 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:49:11,942-Speed 10504.35 samples/sec Loss 7.4492 LearningRate 0.2130 Epoch: 10 Global Step: 54260 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:49:19,729-Speed 10521.57 samples/sec Loss 7.4830 LearningRate 0.2129 Epoch: 10 Global Step: 54270 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:49:27,540-Speed 10488.79 samples/sec Loss 7.4528 LearningRate 0.2128 Epoch: 10 Global Step: 54280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:49:35,317-Speed 10535.55 samples/sec Loss 7.4665 LearningRate 0.2127 Epoch: 10 Global Step: 54290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:49:43,107-Speed 10517.81 samples/sec Loss 7.4621 LearningRate 0.2127 Epoch: 10 Global Step: 54300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:49:50,922-Speed 10483.17 samples/sec Loss 7.4147 LearningRate 0.2126 Epoch: 10 Global Step: 54310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:49:58,706-Speed 10525.45 samples/sec Loss 7.5128 LearningRate 0.2125 Epoch: 10 Global Step: 54320 
Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:50:06,553-Speed 10441.35 samples/sec Loss 7.4653 LearningRate 0.2124 Epoch: 10 Global Step: 54330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:50:14,360-Speed 10495.40 samples/sec Loss 7.4265 LearningRate 0.2123 Epoch: 10 Global Step: 54340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:50:22,176-Speed 10481.08 samples/sec Loss 7.4577 LearningRate 0.2122 Epoch: 10 Global Step: 54350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:50:29,965-Speed 10518.52 samples/sec Loss 7.4409 LearningRate 0.2121 Epoch: 10 Global Step: 54360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:50:37,755-Speed 10518.19 samples/sec Loss 7.3806 LearningRate 0.2121 Epoch: 10 Global Step: 54370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:50:45,555-Speed 10503.91 samples/sec Loss 7.3950 LearningRate 0.2120 Epoch: 10 Global Step: 54380 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:50:53,357-Speed 10500.62 samples/sec Loss 7.4613 LearningRate 0.2119 Epoch: 10 Global Step: 54390 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:51:01,158-Speed 10503.36 samples/sec Loss 7.5230 LearningRate 0.2118 Epoch: 10 Global Step: 54400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:51:08,979-Speed 10475.49 samples/sec Loss 7.4643 LearningRate 0.2117 Epoch: 10 Global Step: 54410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:51:16,789-Speed 10490.29 samples/sec Loss 7.4230 LearningRate 0.2116 Epoch: 10 Global Step: 54420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:51:24,638-Speed 10437.80 samples/sec Loss 7.4963 LearningRate 0.2115 Epoch: 10 Global Step: 54430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:51:32,470-Speed 10461.00 samples/sec Loss 7.4931 LearningRate 0.2115 Epoch: 10 Global Step: 54440 Fp16 Grad Scale: 
131072 Required: 11 hours Training: 2022-01-16 02:51:40,271-Speed 10502.97 samples/sec Loss 7.4609 LearningRate 0.2114 Epoch: 10 Global Step: 54450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:51:48,057-Speed 10523.20 samples/sec Loss 7.4387 LearningRate 0.2113 Epoch: 10 Global Step: 54460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:51:55,881-Speed 10470.63 samples/sec Loss 7.4288 LearningRate 0.2112 Epoch: 10 Global Step: 54470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:52:03,688-Speed 10496.25 samples/sec Loss 7.4132 LearningRate 0.2111 Epoch: 10 Global Step: 54480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:52:11,460-Speed 10542.51 samples/sec Loss 7.3685 LearningRate 0.2110 Epoch: 10 Global Step: 54490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:52:19,289-Speed 10463.24 samples/sec Loss 7.4003 LearningRate 0.2109 Epoch: 10 Global Step: 54500 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:52:27,085-Speed 10510.36 samples/sec Loss 7.4193 LearningRate 0.2109 Epoch: 10 Global Step: 54510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:52:34,881-Speed 10510.30 samples/sec Loss 7.4138 LearningRate 0.2108 Epoch: 10 Global Step: 54520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:52:42,673-Speed 10514.32 samples/sec Loss 7.4208 LearningRate 0.2107 Epoch: 10 Global Step: 54530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:52:50,465-Speed 10514.53 samples/sec Loss 7.4074 LearningRate 0.2106 Epoch: 10 Global Step: 54540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:52:58,269-Speed 10498.53 samples/sec Loss 7.3900 LearningRate 0.2105 Epoch: 10 Global Step: 54550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:53:06,077-Speed 10493.41 samples/sec Loss 7.3708 LearningRate 0.2104 Epoch: 10 Global Step: 54560 Fp16 Grad Scale: 65536 Required: 11 
hours Training: 2022-01-16 02:53:13,867-Speed 10517.31 samples/sec Loss 7.3740 LearningRate 0.2103 Epoch: 10 Global Step: 54570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:53:21,656-Speed 10518.92 samples/sec Loss 7.3929 LearningRate 0.2103 Epoch: 10 Global Step: 54580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:53:29,500-Speed 10445.78 samples/sec Loss 7.4474 LearningRate 0.2102 Epoch: 10 Global Step: 54590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:53:37,299-Speed 10505.14 samples/sec Loss 7.4093 LearningRate 0.2101 Epoch: 10 Global Step: 54600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:53:45,128-Speed 10464.49 samples/sec Loss 7.3607 LearningRate 0.2100 Epoch: 10 Global Step: 54610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:53:52,905-Speed 10535.91 samples/sec Loss 7.4207 LearningRate 0.2099 Epoch: 10 Global Step: 54620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:54:00,747-Speed 10447.46 samples/sec Loss 7.4135 LearningRate 0.2098 Epoch: 10 Global Step: 54630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:54:08,533-Speed 10523.18 samples/sec Loss 7.4049 LearningRate 0.2097 Epoch: 10 Global Step: 54640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:54:16,333-Speed 10502.87 samples/sec Loss 7.4169 LearningRate 0.2097 Epoch: 10 Global Step: 54650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:54:24,140-Speed 10494.41 samples/sec Loss 7.4746 LearningRate 0.2096 Epoch: 10 Global Step: 54660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:54:31,986-Speed 10445.92 samples/sec Loss 7.4344 LearningRate 0.2095 Epoch: 10 Global Step: 54670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:54:39,801-Speed 10482.99 samples/sec Loss 7.4045 LearningRate 0.2094 Epoch: 10 Global Step: 54680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 
02:54:47,659-Speed 10427.67 samples/sec Loss 7.3982 LearningRate 0.2093 Epoch: 10 Global Step: 54690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:54:55,480-Speed 10475.96 samples/sec Loss 7.3874 LearningRate 0.2092 Epoch: 10 Global Step: 54700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:55:03,282-Speed 10500.50 samples/sec Loss 7.3847 LearningRate 0.2091 Epoch: 10 Global Step: 54710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:55:11,075-Speed 10513.38 samples/sec Loss 7.3856 LearningRate 0.2091 Epoch: 10 Global Step: 54720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:55:18,890-Speed 10484.04 samples/sec Loss 7.4272 LearningRate 0.2090 Epoch: 10 Global Step: 54730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:55:26,698-Speed 10494.09 samples/sec Loss 7.4406 LearningRate 0.2089 Epoch: 10 Global Step: 54740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:55:34,482-Speed 10526.56 samples/sec Loss 7.4112 LearningRate 0.2088 Epoch: 10 Global Step: 54750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:55:42,251-Speed 10544.90 samples/sec Loss 7.3935 LearningRate 0.2087 Epoch: 10 Global Step: 54760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:55:50,108-Speed 10427.53 samples/sec Loss 7.3863 LearningRate 0.2086 Epoch: 10 Global Step: 54770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:55:57,949-Speed 10449.86 samples/sec Loss 7.3373 LearningRate 0.2085 Epoch: 10 Global Step: 54780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:56:05,746-Speed 10508.47 samples/sec Loss 7.4057 LearningRate 0.2085 Epoch: 10 Global Step: 54790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:56:13,523-Speed 10534.87 samples/sec Loss 7.4084 LearningRate 0.2084 Epoch: 10 Global Step: 54800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:56:21,321-Speed 
10505.96 samples/sec Loss 7.4282 LearningRate 0.2083 Epoch: 10 Global Step: 54810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:56:29,163-Speed 10447.31 samples/sec Loss 7.4162 LearningRate 0.2082 Epoch: 10 Global Step: 54820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:56:36,953-Speed 10517.77 samples/sec Loss 7.3831 LearningRate 0.2081 Epoch: 10 Global Step: 54830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:56:44,735-Speed 10527.79 samples/sec Loss 7.3649 LearningRate 0.2080 Epoch: 10 Global Step: 54840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:56:52,518-Speed 10526.96 samples/sec Loss 7.3676 LearningRate 0.2079 Epoch: 10 Global Step: 54850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 02:57:00,319-Speed 10503.32 samples/sec Loss 7.4195 LearningRate 0.2079 Epoch: 10 Global Step: 54860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:57:08,121-Speed 10505.06 samples/sec Loss 7.4277 LearningRate 0.2078 Epoch: 10 Global Step: 54870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:57:15,903-Speed 10528.53 samples/sec Loss 7.3991 LearningRate 0.2077 Epoch: 10 Global Step: 54880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:57:23,687-Speed 10530.15 samples/sec Loss 7.3747 LearningRate 0.2076 Epoch: 10 Global Step: 54890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:57:31,476-Speed 10519.47 samples/sec Loss 7.3562 LearningRate 0.2075 Epoch: 10 Global Step: 54900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:57:39,340-Speed 10417.46 samples/sec Loss 7.3828 LearningRate 0.2074 Epoch: 10 Global Step: 54910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:57:47,126-Speed 10524.33 samples/sec Loss 7.2936 LearningRate 0.2074 Epoch: 10 Global Step: 54920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:57:54,903-Speed 10534.56 samples/sec 
Loss 7.3353 LearningRate 0.2073 Epoch: 10 Global Step: 54930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:58:02,722-Speed 10478.73 samples/sec Loss 7.2961 LearningRate 0.2072 Epoch: 10 Global Step: 54940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:58:10,572-Speed 10437.90 samples/sec Loss 7.4022 LearningRate 0.2071 Epoch: 10 Global Step: 54950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:58:18,363-Speed 10516.07 samples/sec Loss 7.3818 LearningRate 0.2070 Epoch: 10 Global Step: 54960 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:58:26,173-Speed 10489.49 samples/sec Loss 7.3772 LearningRate 0.2069 Epoch: 10 Global Step: 54970 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 02:58:33,975-Speed 10501.95 samples/sec Loss 7.3823 LearningRate 0.2068 Epoch: 10 Global Step: 54980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:58:41,783-Speed 10492.83 samples/sec Loss 7.3895 LearningRate 0.2068 Epoch: 10 Global Step: 54990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:58:49,600-Speed 10481.25 samples/sec Loss 7.3182 LearningRate 0.2067 Epoch: 10 Global Step: 55000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:58:59,249-Speed 8491.13 samples/sec Loss 7.3697 LearningRate 0.2066 Epoch: 10 Global Step: 55010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:59:07,038-Speed 10518.98 samples/sec Loss 7.2973 LearningRate 0.2065 Epoch: 10 Global Step: 55020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:59:14,838-Speed 10503.71 samples/sec Loss 7.3937 LearningRate 0.2064 Epoch: 10 Global Step: 55030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:59:22,641-Speed 10500.32 samples/sec Loss 7.3325 LearningRate 0.2063 Epoch: 10 Global Step: 55040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:59:30,466-Speed 10469.66 samples/sec Loss 7.3461 
LearningRate 0.2062 Epoch: 10 Global Step: 55050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:59:38,293-Speed 10468.05 samples/sec Loss 7.4069 LearningRate 0.2062 Epoch: 10 Global Step: 55060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:59:46,115-Speed 10474.74 samples/sec Loss 7.3137 LearningRate 0.2061 Epoch: 10 Global Step: 55070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 02:59:53,936-Speed 10475.02 samples/sec Loss 7.3541 LearningRate 0.2060 Epoch: 10 Global Step: 55080 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 03:00:01,773-Speed 10454.69 samples/sec Loss 7.3532 LearningRate 0.2059 Epoch: 10 Global Step: 55090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:00:09,611-Speed 10452.11 samples/sec Loss 7.3552 LearningRate 0.2058 Epoch: 10 Global Step: 55100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:00:17,476-Speed 10417.69 samples/sec Loss 7.3426 LearningRate 0.2057 Epoch: 10 Global Step: 55110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:00:25,301-Speed 10470.73 samples/sec Loss 7.2674 LearningRate 0.2057 Epoch: 10 Global Step: 55120 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:00:33,134-Speed 10458.80 samples/sec Loss 7.3196 LearningRate 0.2056 Epoch: 10 Global Step: 55130 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:00:40,928-Speed 10512.93 samples/sec Loss 7.3458 LearningRate 0.2055 Epoch: 10 Global Step: 55140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:00:48,719-Speed 10516.59 samples/sec Loss 7.3380 LearningRate 0.2054 Epoch: 10 Global Step: 55150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-16 03:00:56,554-Speed 10455.59 samples/sec Loss 7.3365 LearningRate 0.2053 Epoch: 10 Global Step: 55160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-16 03:01:04,375-Speed 10476.65 samples/sec Loss 7.3169 LearningRate 0.2052 
Epoch: 10 Global Step: 55170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-16 03:01:12,163-Speed 10520.26 samples/sec Loss 7.2829 LearningRate 0.2051 Epoch: 10 Global Step: 55180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-16 03:01:19,928-Speed 10551.15 samples/sec Loss 7.3703 LearningRate 0.2051 Epoch: 10 Global Step: 55190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-16 03:01:27,714-Speed 10522.40 samples/sec Loss 7.3266 LearningRate 0.2050 Epoch: 10 Global Step: 55200 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-16 03:01:35,500-Speed 10522.88 samples/sec Loss 7.3430 LearningRate 0.2049 Epoch: 10 Global Step: 55210 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-16 03:01:43,274-Speed 10540.00 samples/sec Loss 7.3129 LearningRate 0.2048 Epoch: 10 Global Step: 55220 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-16 03:01:51,051-Speed 10535.81 samples/sec Loss 7.4008 LearningRate 0.2047 Epoch: 10 Global Step: 55230 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-16 03:01:58,877-Speed 10467.34 samples/sec Loss 7.3585 LearningRate 0.2046 Epoch: 10 Global Step: 55240 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-01-16 03:02:06,697-Speed 10477.99 samples/sec Loss 7.3373 LearningRate 0.2046 Epoch: 10 Global Step: 55250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 03:02:14,490-Speed 10514.57 samples/sec Loss 7.2922 LearningRate 0.2045 Epoch: 10 Global Step: 55260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 03:02:22,289-Speed 10505.57 samples/sec Loss 7.2726 LearningRate 0.2044 Epoch: 10 Global Step: 55270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 03:02:30,061-Speed 10540.97 samples/sec Loss 7.3225 LearningRate 0.2043 Epoch: 10 Global Step: 55280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 03:02:37,871-Speed 10490.33 samples/sec Loss 7.2342 LearningRate 0.2042 Epoch: 10 Global Step: 55290 
Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 03:02:45,691-Speed 10477.59 samples/sec Loss 7.3368 LearningRate 0.2041 Epoch: 10 Global Step: 55300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 03:02:53,493-Speed 10503.07 samples/sec Loss 7.3113 LearningRate 0.2040 Epoch: 10 Global Step: 55310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 03:03:01,286-Speed 10513.86 samples/sec Loss 7.2735 LearningRate 0.2040 Epoch: 10 Global Step: 55320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 03:03:09,104-Speed 10479.63 samples/sec Loss 7.2738 LearningRate 0.2039 Epoch: 10 Global Step: 55330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 03:03:16,956-Speed 10434.42 samples/sec Loss 7.2810 LearningRate 0.2038 Epoch: 10 Global Step: 55340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 03:03:24,753-Speed 10507.92 samples/sec Loss 7.3085 LearningRate 0.2037 Epoch: 10 Global Step: 55350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:03:32,556-Speed 10504.32 samples/sec Loss 7.2749 LearningRate 0.2036 Epoch: 10 Global Step: 55360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:03:40,340-Speed 10525.30 samples/sec Loss 7.2396 LearningRate 0.2035 Epoch: 10 Global Step: 55370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:03:48,116-Speed 10536.24 samples/sec Loss 7.3050 LearningRate 0.2035 Epoch: 10 Global Step: 55380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:03:55,973-Speed 10427.61 samples/sec Loss 7.3508 LearningRate 0.2034 Epoch: 10 Global Step: 55390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:04:03,784-Speed 10489.10 samples/sec Loss 7.3013 LearningRate 0.2033 Epoch: 10 Global Step: 55400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:04:11,590-Speed 10495.38 samples/sec Loss 7.3087 LearningRate 0.2032 Epoch: 10 Global Step: 55410 Fp16 Grad Scale: 131072 
Required: 11 hours Training: 2022-01-16 03:04:19,397-Speed 10495.19 samples/sec Loss 7.3113 LearningRate 0.2031 Epoch: 10 Global Step: 55420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:04:27,233-Speed 10456.03 samples/sec Loss 7.3195 LearningRate 0.2030 Epoch: 10 Global Step: 55430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:04:35,035-Speed 10505.91 samples/sec Loss 7.3246 LearningRate 0.2030 Epoch: 10 Global Step: 55440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:04:42,859-Speed 10470.92 samples/sec Loss 7.3298 LearningRate 0.2029 Epoch: 10 Global Step: 55450 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 03:04:50,680-Speed 10477.11 samples/sec Loss 7.3009 LearningRate 0.2028 Epoch: 10 Global Step: 55460 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 03:04:58,544-Speed 10418.40 samples/sec Loss 7.2364 LearningRate 0.2027 Epoch: 10 Global Step: 55470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:05:06,366-Speed 10479.14 samples/sec Loss 7.2859 LearningRate 0.2026 Epoch: 10 Global Step: 55480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:05:14,152-Speed 10523.23 samples/sec Loss 7.2791 LearningRate 0.2025 Epoch: 10 Global Step: 55490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 03:05:21,967-Speed 10483.94 samples/sec Loss 7.2579 LearningRate 0.2024 Epoch: 10 Global Step: 55500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 03:05:29,789-Speed 10474.99 samples/sec Loss 7.2874 LearningRate 0.2024 Epoch: 10 Global Step: 55510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 03:05:37,585-Speed 10509.60 samples/sec Loss 7.3012 LearningRate 0.2023 Epoch: 10 Global Step: 55520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 03:05:45,362-Speed 10534.71 samples/sec Loss 7.2824 LearningRate 0.2022 Epoch: 10 Global Step: 55530 Fp16 Grad Scale: 65536 Required: 11 hours 
Training: 2022-01-16 03:05:53,180-Speed 10479.10 samples/sec Loss 7.2630 LearningRate 0.2021 Epoch: 10 Global Step: 55540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 03:06:00,966-Speed 10524.12 samples/sec Loss 7.2707 LearningRate 0.2020 Epoch: 10 Global Step: 55550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 03:06:08,763-Speed 10508.35 samples/sec Loss 7.2579 LearningRate 0.2019 Epoch: 10 Global Step: 55560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 03:06:16,594-Speed 10462.07 samples/sec Loss 7.2257 LearningRate 0.2019 Epoch: 10 Global Step: 55570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 03:06:24,379-Speed 10523.66 samples/sec Loss 7.2237 LearningRate 0.2018 Epoch: 10 Global Step: 55580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-01-16 03:06:32,174-Speed 10511.41 samples/sec Loss 7.2773 LearningRate 0.2017 Epoch: 10 Global Step: 55590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:06:39,990-Speed 10482.37 samples/sec Loss 7.2801 LearningRate 0.2016 Epoch: 10 Global Step: 55600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:06:47,775-Speed 10524.18 samples/sec Loss 7.3248 LearningRate 0.2015 Epoch: 10 Global Step: 55610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:06:55,562-Speed 10520.82 samples/sec Loss 7.3118 LearningRate 0.2014 Epoch: 10 Global Step: 55620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:07:03,383-Speed 10475.52 samples/sec Loss 7.2601 LearningRate 0.2014 Epoch: 10 Global Step: 55630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:07:11,223-Speed 10451.26 samples/sec Loss 7.2781 LearningRate 0.2013 Epoch: 10 Global Step: 55640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:07:19,027-Speed 10497.67 samples/sec Loss 7.2722 LearningRate 0.2012 Epoch: 10 Global Step: 55650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 
03:07:26,856-Speed 10466.28 samples/sec Loss 7.2778 LearningRate 0.2011 Epoch: 10 Global Step: 55660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:07:34,664-Speed 10492.83 samples/sec Loss 7.2151 LearningRate 0.2010 Epoch: 10 Global Step: 55670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:07:42,466-Speed 10501.24 samples/sec Loss 7.2700 LearningRate 0.2009 Epoch: 10 Global Step: 55680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:07:50,263-Speed 10508.30 samples/sec Loss 7.2859 LearningRate 0.2009 Epoch: 10 Global Step: 55690 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 03:07:58,070-Speed 10494.51 samples/sec Loss 7.1776 LearningRate 0.2008 Epoch: 10 Global Step: 55700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:08:05,862-Speed 10514.68 samples/sec Loss 7.2395 LearningRate 0.2007 Epoch: 10 Global Step: 55710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:08:13,641-Speed 10532.89 samples/sec Loss 7.2575 LearningRate 0.2006 Epoch: 10 Global Step: 55720 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:08:21,437-Speed 10508.74 samples/sec Loss 7.2187 LearningRate 0.2005 Epoch: 10 Global Step: 55730 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:08:29,278-Speed 10448.21 samples/sec Loss 7.2231 LearningRate 0.2004 Epoch: 10 Global Step: 55740 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:08:37,097-Speed 10482.89 samples/sec Loss 7.2733 LearningRate 0.2004 Epoch: 10 Global Step: 55750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:08:44,907-Speed 10492.32 samples/sec Loss 7.2789 LearningRate 0.2003 Epoch: 10 Global Step: 55760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:08:52,720-Speed 10484.87 samples/sec Loss 7.2354 LearningRate 0.2002 Epoch: 10 Global Step: 55770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:09:00,514-Speed 
10512.37 samples/sec Loss 7.2834 LearningRate 0.2001 Epoch: 10 Global Step: 55780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:09:08,316-Speed 10501.91 samples/sec Loss 7.2686 LearningRate 0.2000 Epoch: 10 Global Step: 55790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:09:16,097-Speed 10529.75 samples/sec Loss 7.1975 LearningRate 0.1999 Epoch: 10 Global Step: 55800 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 03:09:23,879-Speed 10528.62 samples/sec Loss 7.2345 LearningRate 0.1999 Epoch: 10 Global Step: 55810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:09:31,704-Speed 10469.38 samples/sec Loss 7.2399 LearningRate 0.1998 Epoch: 10 Global Step: 55820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:09:39,478-Speed 10540.39 samples/sec Loss 7.2633 LearningRate 0.1997 Epoch: 10 Global Step: 55830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:09:47,273-Speed 10509.91 samples/sec Loss 7.2597 LearningRate 0.1996 Epoch: 10 Global Step: 55840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:09:55,088-Speed 10484.28 samples/sec Loss 7.2196 LearningRate 0.1995 Epoch: 10 Global Step: 55850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:10:02,901-Speed 10486.30 samples/sec Loss 7.2370 LearningRate 0.1994 Epoch: 10 Global Step: 55860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:10:10,707-Speed 10497.35 samples/sec Loss 7.2525 LearningRate 0.1994 Epoch: 10 Global Step: 55870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:10:18,482-Speed 10536.35 samples/sec Loss 7.2155 LearningRate 0.1993 Epoch: 10 Global Step: 55880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:10:26,276-Speed 10513.49 samples/sec Loss 7.2719 LearningRate 0.1992 Epoch: 10 Global Step: 55890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:10:34,048-Speed 10541.66 
samples/sec Loss 7.2161 LearningRate 0.1991 Epoch: 10 Global Step: 55900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:10:41,867-Speed 10478.23 samples/sec Loss 7.2325 LearningRate 0.1990 Epoch: 10 Global Step: 55910 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-01-16 03:10:49,686-Speed 10478.76 samples/sec Loss 7.2121 LearningRate 0.1989 Epoch: 10 Global Step: 55920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-01-16 03:10:57,483-Speed 10509.36 samples/sec Loss 7.1976 LearningRate 0.1989 Epoch: 10 Global Step: 55930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:11:05,282-Speed 10505.60 samples/sec Loss 7.2435 LearningRate 0.1988 Epoch: 10 Global Step: 55940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:11:13,070-Speed 10520.45 samples/sec Loss 7.2102 LearningRate 0.1987 Epoch: 10 Global Step: 55950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:11:20,872-Speed 10501.74 samples/sec Loss 7.2460 LearningRate 0.1986 Epoch: 10 Global Step: 55960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:11:28,694-Speed 10473.38 samples/sec Loss 7.2125 LearningRate 0.1985 Epoch: 10 Global Step: 55970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:11:36,481-Speed 10522.57 samples/sec Loss 7.2088 LearningRate 0.1984 Epoch: 10 Global Step: 55980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:11:44,255-Speed 10538.71 samples/sec Loss 7.1549 LearningRate 0.1984 Epoch: 10 Global Step: 55990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:11:52,049-Speed 10511.37 samples/sec Loss 7.1991 LearningRate 0.1983 Epoch: 10 Global Step: 56000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:11:59,842-Speed 10514.19 samples/sec Loss 7.1972 LearningRate 0.1982 Epoch: 10 Global Step: 56010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:12:07,634-Speed 10515.71 samples/sec Loss 
7.1746 LearningRate 0.1981 Epoch: 10 Global Step: 56020 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:12:15,411-Speed 10535.28 samples/sec Loss 7.2329 LearningRate 0.1980 Epoch: 10 Global Step: 56030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:12:23,198-Speed 10521.69 samples/sec Loss 7.1645 LearningRate 0.1979 Epoch: 10 Global Step: 56040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:12:31,000-Speed 10502.33 samples/sec Loss 7.1805 LearningRate 0.1979 Epoch: 10 Global Step: 56050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:12:38,787-Speed 10520.52 samples/sec Loss 7.1747 LearningRate 0.1978 Epoch: 10 Global Step: 56060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:12:46,571-Speed 10525.35 samples/sec Loss 7.1843 LearningRate 0.1977 Epoch: 10 Global Step: 56070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:12:54,378-Speed 10496.54 samples/sec Loss 7.2058 LearningRate 0.1976 Epoch: 10 Global Step: 56080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:13:02,162-Speed 10524.67 samples/sec Loss 7.2215 LearningRate 0.1975 Epoch: 10 Global Step: 56090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:13:09,954-Speed 10514.47 samples/sec Loss 7.1903 LearningRate 0.1974 Epoch: 10 Global Step: 56100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:13:17,749-Speed 10511.51 samples/sec Loss 7.1766 LearningRate 0.1974 Epoch: 10 Global Step: 56110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:13:25,544-Speed 10511.23 samples/sec Loss 7.2457 LearningRate 0.1973 Epoch: 10 Global Step: 56120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:13:33,328-Speed 10525.16 samples/sec Loss 7.1817 LearningRate 0.1972 Epoch: 10 Global Step: 56130 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:13:41,125-Speed 10507.61 samples/sec Loss 7.1776 LearningRate 
0.1971 Epoch: 10 Global Step: 56140 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:13:48,928-Speed 10500.62 samples/sec Loss 7.1962 LearningRate 0.1970 Epoch: 10 Global Step: 56150 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:13:56,760-Speed 10461.28 samples/sec Loss 7.1609 LearningRate 0.1969 Epoch: 10 Global Step: 56160 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:14:04,556-Speed 10508.45 samples/sec Loss 7.2026 LearningRate 0.1969 Epoch: 10 Global Step: 56170 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:14:12,325-Speed 10547.36 samples/sec Loss 7.2000 LearningRate 0.1968 Epoch: 10 Global Step: 56180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:14:20,136-Speed 10489.21 samples/sec Loss 7.2141 LearningRate 0.1967 Epoch: 10 Global Step: 56190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:14:27,917-Speed 10528.64 samples/sec Loss 7.1860 LearningRate 0.1966 Epoch: 10 Global Step: 56200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:14:35,698-Speed 10530.17 samples/sec Loss 7.1817 LearningRate 0.1965 Epoch: 10 Global Step: 56210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:14:43,494-Speed 10509.61 samples/sec Loss 7.1529 LearningRate 0.1964 Epoch: 10 Global Step: 56220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:14:51,286-Speed 10514.20 samples/sec Loss 7.1493 LearningRate 0.1964 Epoch: 10 Global Step: 56230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:14:59,084-Speed 10506.33 samples/sec Loss 7.1750 LearningRate 0.1963 Epoch: 10 Global Step: 56240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:15:06,932-Speed 10439.60 samples/sec Loss 7.2322 LearningRate 0.1962 Epoch: 10 Global Step: 56250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:15:14,744-Speed 10488.54 samples/sec Loss 7.1761 LearningRate 0.1961 Epoch: 10 
Global Step: 56260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:15:22,537-Speed 10513.10 samples/sec Loss 7.1506 LearningRate 0.1960 Epoch: 10 Global Step: 56270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:15:30,328-Speed 10515.61 samples/sec Loss 7.2068 LearningRate 0.1959 Epoch: 10 Global Step: 56280 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:15:38,123-Speed 10510.20 samples/sec Loss 7.1597 LearningRate 0.1959 Epoch: 10 Global Step: 56290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:15:45,888-Speed 10551.55 samples/sec Loss 7.2051 LearningRate 0.1958 Epoch: 10 Global Step: 56300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:15:53,687-Speed 10505.86 samples/sec Loss 7.2168 LearningRate 0.1957 Epoch: 10 Global Step: 56310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:16:01,541-Speed 10432.56 samples/sec Loss 7.2193 LearningRate 0.1956 Epoch: 10 Global Step: 56320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:16:09,340-Speed 10504.83 samples/sec Loss 7.1335 LearningRate 0.1955 Epoch: 10 Global Step: 56330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:16:17,177-Speed 10453.88 samples/sec Loss 7.1938 LearningRate 0.1955 Epoch: 10 Global Step: 56340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:16:24,961-Speed 10526.08 samples/sec Loss 7.1784 LearningRate 0.1954 Epoch: 10 Global Step: 56350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:16:32,763-Speed 10500.48 samples/sec Loss 7.1610 LearningRate 0.1953 Epoch: 10 Global Step: 56360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:16:40,628-Speed 10418.44 samples/sec Loss 7.1242 LearningRate 0.1952 Epoch: 10 Global Step: 56370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:16:48,405-Speed 10534.14 samples/sec Loss 7.1035 LearningRate 0.1951 Epoch: 10 Global Step: 56380 Fp16 
Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:16:56,187-Speed 10528.42 samples/sec Loss 7.1538 LearningRate 0.1950 Epoch: 10 Global Step: 56390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:17:04,000-Speed 10486.21 samples/sec Loss 7.1485 LearningRate 0.1950 Epoch: 10 Global Step: 56400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:17:11,803-Speed 10500.10 samples/sec Loss 7.1579 LearningRate 0.1949 Epoch: 10 Global Step: 56410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:17:19,608-Speed 10496.94 samples/sec Loss 7.1649 LearningRate 0.1948 Epoch: 10 Global Step: 56420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:17:27,395-Speed 10521.59 samples/sec Loss 7.1025 LearningRate 0.1947 Epoch: 10 Global Step: 56430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:17:35,188-Speed 10513.98 samples/sec Loss 7.1643 LearningRate 0.1946 Epoch: 10 Global Step: 56440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:17:42,994-Speed 10495.62 samples/sec Loss 7.1478 LearningRate 0.1945 Epoch: 10 Global Step: 56450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:17:50,780-Speed 10524.11 samples/sec Loss 7.1453 LearningRate 0.1945 Epoch: 10 Global Step: 56460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:17:58,554-Speed 10545.37 samples/sec Loss 7.1646 LearningRate 0.1944 Epoch: 10 Global Step: 56470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:18:06,331-Speed 10538.19 samples/sec Loss 7.1431 LearningRate 0.1943 Epoch: 10 Global Step: 56480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:18:14,120-Speed 10519.94 samples/sec Loss 7.0613 LearningRate 0.1942 Epoch: 10 Global Step: 56490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:18:21,903-Speed 10525.89 samples/sec Loss 7.1523 LearningRate 0.1941 Epoch: 10 Global Step: 56500 Fp16 Grad Scale: 262144 
Required: 10 hours Training: 2022-01-16 03:18:29,677-Speed 10538.97 samples/sec Loss 7.1154 LearningRate 0.1940 Epoch: 10 Global Step: 56510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:18:37,491-Speed 10485.64 samples/sec Loss 7.1103 LearningRate 0.1940 Epoch: 10 Global Step: 56520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:18:45,272-Speed 10531.43 samples/sec Loss 7.1264 LearningRate 0.1939 Epoch: 10 Global Step: 56530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:18:53,071-Speed 10504.76 samples/sec Loss 7.1461 LearningRate 0.1938 Epoch: 10 Global Step: 56540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:19:00,860-Speed 10518.95 samples/sec Loss 7.1756 LearningRate 0.1937 Epoch: 10 Global Step: 56550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:19:08,687-Speed 10466.98 samples/sec Loss 7.1157 LearningRate 0.1936 Epoch: 10 Global Step: 56560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:19:16,502-Speed 10484.95 samples/sec Loss 7.1217 LearningRate 0.1936 Epoch: 10 Global Step: 56570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:19:24,277-Speed 10536.70 samples/sec Loss 7.1060 LearningRate 0.1935 Epoch: 10 Global Step: 56580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:19:32,059-Speed 10529.04 samples/sec Loss 7.1486 LearningRate 0.1934 Epoch: 10 Global Step: 56590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:19:39,836-Speed 10534.62 samples/sec Loss 7.1351 LearningRate 0.1933 Epoch: 10 Global Step: 56600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:19:47,636-Speed 10504.99 samples/sec Loss 7.1442 LearningRate 0.1932 Epoch: 10 Global Step: 56610 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:19:55,422-Speed 10521.78 samples/sec Loss 7.1234 LearningRate 0.1931 Epoch: 10 Global Step: 56620 Fp16 Grad Scale: 131072 Required: 10 hours 
Training: 2022-01-16 03:20:03,225-Speed 10499.75 samples/sec Loss 7.1253 LearningRate 0.1931 Epoch: 10 Global Step: 56630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:20:11,005-Speed 10531.98 samples/sec Loss 7.0897 LearningRate 0.1930 Epoch: 10 Global Step: 56640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:20:18,792-Speed 10521.97 samples/sec Loss 7.1201 LearningRate 0.1929 Epoch: 10 Global Step: 56650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:20:26,561-Speed 10545.77 samples/sec Loss 7.0824 LearningRate 0.1928 Epoch: 10 Global Step: 56660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:20:34,362-Speed 10501.28 samples/sec Loss 7.1057 LearningRate 0.1927 Epoch: 10 Global Step: 56670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:20:42,196-Speed 10459.57 samples/sec Loss 7.1237 LearningRate 0.1927 Epoch: 10 Global Step: 56680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:20:50,001-Speed 10497.93 samples/sec Loss 7.1327 LearningRate 0.1926 Epoch: 10 Global Step: 56690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:20:57,798-Speed 10506.97 samples/sec Loss 7.1015 LearningRate 0.1925 Epoch: 10 Global Step: 56700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:21:05,620-Speed 10473.31 samples/sec Loss 7.0983 LearningRate 0.1924 Epoch: 10 Global Step: 56710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:21:13,415-Speed 10512.13 samples/sec Loss 7.1236 LearningRate 0.1923 Epoch: 10 Global Step: 56720 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:21:21,229-Speed 10484.86 samples/sec Loss 7.0799 LearningRate 0.1922 Epoch: 10 Global Step: 56730 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:21:29,025-Speed 10508.53 samples/sec Loss 7.0848 LearningRate 0.1922 Epoch: 10 Global Step: 56740 Fp16 Grad Scale: 262144 Required: 10 hours Training: 
2022-01-16 03:21:36,827-Speed 10501.52 samples/sec Loss 7.1370 LearningRate 0.1921 Epoch: 10 Global Step: 56750 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:21:44,634-Speed 10494.71 samples/sec Loss 7.1139 LearningRate 0.1920 Epoch: 10 Global Step: 56760 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:21:52,432-Speed 10507.27 samples/sec Loss 7.1016 LearningRate 0.1919 Epoch: 10 Global Step: 56770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:22:00,237-Speed 10496.83 samples/sec Loss 7.0441 LearningRate 0.1918 Epoch: 10 Global Step: 56780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:22:08,040-Speed 10504.00 samples/sec Loss 7.0576 LearningRate 0.1918 Epoch: 10 Global Step: 56790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:22:15,837-Speed 10508.48 samples/sec Loss 7.0854 LearningRate 0.1917 Epoch: 10 Global Step: 56800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:22:23,638-Speed 10503.08 samples/sec Loss 7.1303 LearningRate 0.1916 Epoch: 10 Global Step: 56810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:22:31,448-Speed 10490.50 samples/sec Loss 7.0975 LearningRate 0.1915 Epoch: 10 Global Step: 56820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:22:39,271-Speed 10472.86 samples/sec Loss 7.0294 LearningRate 0.1914 Epoch: 10 Global Step: 56830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:22:47,095-Speed 10471.76 samples/sec Loss 7.1357 LearningRate 0.1913 Epoch: 10 Global Step: 56840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:22:54,873-Speed 10533.63 samples/sec Loss 7.0721 LearningRate 0.1913 Epoch: 10 Global Step: 56850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:23:02,689-Speed 10482.54 samples/sec Loss 7.0766 LearningRate 0.1912 Epoch: 10 Global Step: 56860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 
03:23:10,488-Speed 10504.72 samples/sec Loss 7.0867 LearningRate 0.1911 Epoch: 10 Global Step: 56870 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:23:18,288-Speed 10504.90 samples/sec Loss 7.0730 LearningRate 0.1910 Epoch: 10 Global Step: 56880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:23:26,106-Speed 10479.57 samples/sec Loss 7.0743 LearningRate 0.1909 Epoch: 10 Global Step: 56890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:23:33,912-Speed 10495.46 samples/sec Loss 7.1506 LearningRate 0.1909 Epoch: 10 Global Step: 56900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:23:41,724-Speed 10488.18 samples/sec Loss 7.1126 LearningRate 0.1908 Epoch: 10 Global Step: 56910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:23:49,502-Speed 10533.40 samples/sec Loss 7.1517 LearningRate 0.1907 Epoch: 10 Global Step: 56920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:23:57,297-Speed 10511.91 samples/sec Loss 7.1024 LearningRate 0.1906 Epoch: 10 Global Step: 56930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:24:05,098-Speed 10503.59 samples/sec Loss 7.1139 LearningRate 0.1905 Epoch: 10 Global Step: 56940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:24:12,882-Speed 10524.61 samples/sec Loss 7.1162 LearningRate 0.1904 Epoch: 10 Global Step: 56950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:24:20,707-Speed 10470.59 samples/sec Loss 7.1361 LearningRate 0.1904 Epoch: 10 Global Step: 56960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:24:28,526-Speed 10478.22 samples/sec Loss 7.0317 LearningRate 0.1903 Epoch: 10 Global Step: 56970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:24:36,340-Speed 10485.07 samples/sec Loss 7.0921 LearningRate 0.1902 Epoch: 10 Global Step: 56980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:24:44,175-Speed 
10457.36 samples/sec Loss 7.0680 LearningRate 0.1901 Epoch: 10 Global Step: 56990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:24:51,953-Speed 10534.60 samples/sec Loss 7.0443 LearningRate 0.1900 Epoch: 10 Global Step: 57000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:24:59,756-Speed 10500.00 samples/sec Loss 7.0600 LearningRate 0.1900 Epoch: 10 Global Step: 57010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:25:07,552-Speed 10509.95 samples/sec Loss 7.0826 LearningRate 0.1899 Epoch: 10 Global Step: 57020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:25:15,356-Speed 10497.73 samples/sec Loss 7.0776 LearningRate 0.1898 Epoch: 10 Global Step: 57030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:25:37,496-Speed 3700.32 samples/sec Loss 7.0811 LearningRate 0.1897 Epoch: 11 Global Step: 57040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:25:45,257-Speed 10560.68 samples/sec Loss 7.0777 LearningRate 0.1896 Epoch: 11 Global Step: 57050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:25:53,032-Speed 10538.59 samples/sec Loss 7.0473 LearningRate 0.1896 Epoch: 11 Global Step: 57060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:26:00,807-Speed 10537.77 samples/sec Loss 7.0316 LearningRate 0.1895 Epoch: 11 Global Step: 57070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:26:08,583-Speed 10539.17 samples/sec Loss 6.9979 LearningRate 0.1894 Epoch: 11 Global Step: 57080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:26:16,389-Speed 10495.44 samples/sec Loss 7.0492 LearningRate 0.1893 Epoch: 11 Global Step: 57090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:26:24,179-Speed 10517.32 samples/sec Loss 7.0739 LearningRate 0.1892 Epoch: 11 Global Step: 57100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:26:31,971-Speed 10515.01 samples/sec Loss 
7.0777 LearningRate 0.1891 Epoch: 11 Global Step: 57110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:26:39,787-Speed 10483.36 samples/sec Loss 7.0574 LearningRate 0.1891 Epoch: 11 Global Step: 57120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:26:47,589-Speed 10500.51 samples/sec Loss 7.0512 LearningRate 0.1890 Epoch: 11 Global Step: 57130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:26:55,427-Speed 10453.03 samples/sec Loss 7.0226 LearningRate 0.1889 Epoch: 11 Global Step: 57140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:27:03,211-Speed 10533.33 samples/sec Loss 7.0130 LearningRate 0.1888 Epoch: 11 Global Step: 57150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:27:11,018-Speed 10494.83 samples/sec Loss 7.0221 LearningRate 0.1887 Epoch: 11 Global Step: 57160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:27:18,881-Speed 10419.05 samples/sec Loss 7.0460 LearningRate 0.1887 Epoch: 11 Global Step: 57170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:27:26,670-Speed 10518.61 samples/sec Loss 7.0511 LearningRate 0.1886 Epoch: 11 Global Step: 57180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:27:34,437-Speed 10548.73 samples/sec Loss 7.0217 LearningRate 0.1885 Epoch: 11 Global Step: 57190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:27:42,225-Speed 10520.07 samples/sec Loss 7.0404 LearningRate 0.1884 Epoch: 11 Global Step: 57200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:27:50,007-Speed 10529.15 samples/sec Loss 7.0291 LearningRate 0.1883 Epoch: 11 Global Step: 57210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:27:57,826-Speed 10478.17 samples/sec Loss 6.9708 LearningRate 0.1883 Epoch: 11 Global Step: 57220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:28:05,635-Speed 10490.79 samples/sec Loss 7.0251 LearningRate 0.1882 
Epoch: 11 Global Step: 57230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:28:13,407-Speed 10543.17 samples/sec Loss 7.0046 LearningRate 0.1881 Epoch: 11 Global Step: 57240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:28:21,270-Speed 10419.92 samples/sec Loss 7.0267 LearningRate 0.1880 Epoch: 11 Global Step: 57250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:28:29,059-Speed 10518.14 samples/sec Loss 7.0519 LearningRate 0.1879 Epoch: 11 Global Step: 57260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:28:36,827-Speed 10546.26 samples/sec Loss 7.0070 LearningRate 0.1878 Epoch: 11 Global Step: 57270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:28:44,606-Speed 10532.66 samples/sec Loss 7.0787 LearningRate 0.1878 Epoch: 11 Global Step: 57280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:28:52,414-Speed 10494.01 samples/sec Loss 7.0759 LearningRate 0.1877 Epoch: 11 Global Step: 57290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:29:00,194-Speed 10529.80 samples/sec Loss 6.9930 LearningRate 0.1876 Epoch: 11 Global Step: 57300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:29:07,979-Speed 10523.61 samples/sec Loss 7.0270 LearningRate 0.1875 Epoch: 11 Global Step: 57310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:29:15,770-Speed 10516.33 samples/sec Loss 7.0434 LearningRate 0.1874 Epoch: 11 Global Step: 57320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:29:23,564-Speed 10512.52 samples/sec Loss 7.0302 LearningRate 0.1874 Epoch: 11 Global Step: 57330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:29:31,362-Speed 10506.25 samples/sec Loss 7.0227 LearningRate 0.1873 Epoch: 11 Global Step: 57340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:29:39,190-Speed 10468.18 samples/sec Loss 6.9638 LearningRate 0.1872 Epoch: 11 Global Step: 
57350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:29:46,978-Speed 10519.48 samples/sec Loss 7.0425 LearningRate 0.1871 Epoch: 11 Global Step: 57360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:29:54,784-Speed 10495.86 samples/sec Loss 7.0419 LearningRate 0.1870 Epoch: 11 Global Step: 57370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:30:02,587-Speed 10500.80 samples/sec Loss 7.0705 LearningRate 0.1870 Epoch: 11 Global Step: 57380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:30:10,377-Speed 10517.97 samples/sec Loss 6.9759 LearningRate 0.1869 Epoch: 11 Global Step: 57390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:30:18,178-Speed 10502.35 samples/sec Loss 7.0173 LearningRate 0.1868 Epoch: 11 Global Step: 57400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:30:25,996-Speed 10479.69 samples/sec Loss 6.9784 LearningRate 0.1867 Epoch: 11 Global Step: 57410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:30:33,867-Speed 10409.87 samples/sec Loss 7.0010 LearningRate 0.1866 Epoch: 11 Global Step: 57420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:30:41,696-Speed 10464.42 samples/sec Loss 6.9496 LearningRate 0.1866 Epoch: 11 Global Step: 57430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:30:49,523-Speed 10468.04 samples/sec Loss 7.0307 LearningRate 0.1865 Epoch: 11 Global Step: 57440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:30:57,393-Speed 10411.10 samples/sec Loss 7.0561 LearningRate 0.1864 Epoch: 11 Global Step: 57450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:31:05,248-Speed 10430.70 samples/sec Loss 7.0342 LearningRate 0.1863 Epoch: 11 Global Step: 57460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:31:13,081-Speed 10458.28 samples/sec Loss 7.0132 LearningRate 0.1862 Epoch: 11 Global Step: 57470 Fp16 Grad Scale: 
131072 Required: 10 hours Training: 2022-01-16 03:31:20,914-Speed 10459.49 samples/sec Loss 6.9904 LearningRate 0.1862 Epoch: 11 Global Step: 57480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:31:28,743-Speed 10465.67 samples/sec Loss 7.0180 LearningRate 0.1861 Epoch: 11 Global Step: 57490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:31:36,589-Speed 10442.28 samples/sec Loss 6.9837 LearningRate 0.1860 Epoch: 11 Global Step: 57500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:31:44,427-Speed 10453.48 samples/sec Loss 7.0093 LearningRate 0.1859 Epoch: 11 Global Step: 57510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:31:52,273-Speed 10442.28 samples/sec Loss 7.0101 LearningRate 0.1858 Epoch: 11 Global Step: 57520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:32:00,104-Speed 10463.60 samples/sec Loss 7.0772 LearningRate 0.1857 Epoch: 11 Global Step: 57530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:32:07,921-Speed 10480.05 samples/sec Loss 7.0033 LearningRate 0.1857 Epoch: 11 Global Step: 57540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:32:15,809-Speed 10386.85 samples/sec Loss 7.0158 LearningRate 0.1856 Epoch: 11 Global Step: 57550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:32:23,632-Speed 10473.98 samples/sec Loss 6.9949 LearningRate 0.1855 Epoch: 11 Global Step: 57560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:32:31,467-Speed 10456.94 samples/sec Loss 7.0147 LearningRate 0.1854 Epoch: 11 Global Step: 57570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:32:39,279-Speed 10486.99 samples/sec Loss 6.9665 LearningRate 0.1853 Epoch: 11 Global Step: 57580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:32:47,122-Speed 10446.16 samples/sec Loss 7.0086 LearningRate 0.1853 Epoch: 11 Global Step: 57590 Fp16 Grad Scale: 131072 Required: 10 
hours Training: 2022-01-16 03:32:54,965-Speed 10447.28 samples/sec Loss 7.0149 LearningRate 0.1852 Epoch: 11 Global Step: 57600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:33:02,790-Speed 10470.71 samples/sec Loss 6.9893 LearningRate 0.1851 Epoch: 11 Global Step: 57610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:33:10,610-Speed 10476.12 samples/sec Loss 6.9996 LearningRate 0.1850 Epoch: 11 Global Step: 57620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:33:18,457-Speed 10440.95 samples/sec Loss 6.9534 LearningRate 0.1849 Epoch: 11 Global Step: 57630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:33:26,280-Speed 10474.04 samples/sec Loss 7.0139 LearningRate 0.1849 Epoch: 11 Global Step: 57640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:33:34,104-Speed 10472.14 samples/sec Loss 6.9953 LearningRate 0.1848 Epoch: 11 Global Step: 57650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:33:41,973-Speed 10412.62 samples/sec Loss 6.9284 LearningRate 0.1847 Epoch: 11 Global Step: 57660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:33:49,797-Speed 10472.02 samples/sec Loss 6.9726 LearningRate 0.1846 Epoch: 11 Global Step: 57670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:33:57,616-Speed 10477.92 samples/sec Loss 6.9704 LearningRate 0.1845 Epoch: 11 Global Step: 57680 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:34:05,393-Speed 10534.73 samples/sec Loss 7.0081 LearningRate 0.1845 Epoch: 11 Global Step: 57690 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:34:13,181-Speed 10520.13 samples/sec Loss 6.9356 LearningRate 0.1844 Epoch: 11 Global Step: 57700 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:34:20,993-Speed 10488.95 samples/sec Loss 6.9830 LearningRate 0.1843 Epoch: 11 Global Step: 57710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 
2022-01-16 03:34:28,813-Speed 10476.91 samples/sec Loss 6.9605 LearningRate 0.1842 Epoch: 11 Global Step: 57720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:34:36,640-Speed 10467.82 samples/sec Loss 6.9846 LearningRate 0.1841 Epoch: 11 Global Step: 57730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:34:44,462-Speed 10479.65 samples/sec Loss 7.0166 LearningRate 0.1841 Epoch: 11 Global Step: 57740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:34:52,271-Speed 10492.20 samples/sec Loss 6.9421 LearningRate 0.1840 Epoch: 11 Global Step: 57750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:35:00,080-Speed 10491.92 samples/sec Loss 6.9058 LearningRate 0.1839 Epoch: 11 Global Step: 57760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:35:07,916-Speed 10456.21 samples/sec Loss 6.9238 LearningRate 0.1838 Epoch: 11 Global Step: 57770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:35:15,766-Speed 10436.35 samples/sec Loss 6.9383 LearningRate 0.1837 Epoch: 11 Global Step: 57780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:35:23,588-Speed 10474.54 samples/sec Loss 6.9727 LearningRate 0.1837 Epoch: 11 Global Step: 57790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:35:31,400-Speed 10488.53 samples/sec Loss 6.9523 LearningRate 0.1836 Epoch: 11 Global Step: 57800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:35:39,251-Speed 10435.29 samples/sec Loss 6.9550 LearningRate 0.1835 Epoch: 11 Global Step: 57810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:35:47,086-Speed 10457.71 samples/sec Loss 6.9634 LearningRate 0.1834 Epoch: 11 Global Step: 57820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:35:54,918-Speed 10460.86 samples/sec Loss 6.9730 LearningRate 0.1833 Epoch: 11 Global Step: 57830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 
03:36:02,769-Speed 10435.21 samples/sec Loss 6.9545 LearningRate 0.1833 Epoch: 11 Global Step: 57840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:36:10,612-Speed 10447.25 samples/sec Loss 6.9407 LearningRate 0.1832 Epoch: 11 Global Step: 57850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:36:18,534-Speed 10342.13 samples/sec Loss 6.9718 LearningRate 0.1831 Epoch: 11 Global Step: 57860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:36:26,340-Speed 10495.08 samples/sec Loss 6.9581 LearningRate 0.1830 Epoch: 11 Global Step: 57870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:36:34,137-Speed 10507.58 samples/sec Loss 6.9938 LearningRate 0.1829 Epoch: 11 Global Step: 57880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:36:41,944-Speed 10495.25 samples/sec Loss 6.9619 LearningRate 0.1829 Epoch: 11 Global Step: 57890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:36:49,756-Speed 10488.24 samples/sec Loss 6.9406 LearningRate 0.1828 Epoch: 11 Global Step: 57900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:36:57,560-Speed 10499.01 samples/sec Loss 6.9721 LearningRate 0.1827 Epoch: 11 Global Step: 57910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:37:05,361-Speed 10501.53 samples/sec Loss 6.9145 LearningRate 0.1826 Epoch: 11 Global Step: 57920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:37:13,201-Speed 10451.47 samples/sec Loss 6.9625 LearningRate 0.1825 Epoch: 11 Global Step: 57930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:37:21,007-Speed 10495.48 samples/sec Loss 6.9354 LearningRate 0.1825 Epoch: 11 Global Step: 57940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:37:28,816-Speed 10492.10 samples/sec Loss 6.9647 LearningRate 0.1824 Epoch: 11 Global Step: 57950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:37:36,594-Speed 
10533.46 samples/sec Loss 6.9082 LearningRate 0.1823 Epoch: 11 Global Step: 57960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:37:44,383-Speed 10518.61 samples/sec Loss 6.9175 LearningRate 0.1822 Epoch: 11 Global Step: 57970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:37:52,196-Speed 10487.52 samples/sec Loss 6.9211 LearningRate 0.1821 Epoch: 11 Global Step: 57980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:38:00,002-Speed 10494.45 samples/sec Loss 6.9619 LearningRate 0.1821 Epoch: 11 Global Step: 57990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:38:07,777-Speed 10537.19 samples/sec Loss 6.9750 LearningRate 0.1820 Epoch: 11 Global Step: 58000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:38:15,564-Speed 10521.67 samples/sec Loss 6.9001 LearningRate 0.1819 Epoch: 11 Global Step: 58010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:38:23,355-Speed 10517.69 samples/sec Loss 6.9312 LearningRate 0.1818 Epoch: 11 Global Step: 58020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:38:31,158-Speed 10498.81 samples/sec Loss 6.9348 LearningRate 0.1817 Epoch: 11 Global Step: 58030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:38:38,971-Speed 10493.36 samples/sec Loss 6.9242 LearningRate 0.1817 Epoch: 11 Global Step: 58040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:38:46,761-Speed 10518.26 samples/sec Loss 6.8676 LearningRate 0.1816 Epoch: 11 Global Step: 58050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:38:54,544-Speed 10527.42 samples/sec Loss 6.9128 LearningRate 0.1815 Epoch: 11 Global Step: 58060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:39:02,342-Speed 10507.21 samples/sec Loss 6.9361 LearningRate 0.1814 Epoch: 11 Global Step: 58070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:39:10,148-Speed 10494.71 samples/sec Loss 
6.9268 LearningRate 0.1813 Epoch: 11 Global Step: 58080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:39:17,942-Speed 10512.03 samples/sec Loss 6.9340 LearningRate 0.1813 Epoch: 11 Global Step: 58090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:39:25,723-Speed 10530.44 samples/sec Loss 6.9278 LearningRate 0.1812 Epoch: 11 Global Step: 58100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:39:33,560-Speed 10454.05 samples/sec Loss 6.9264 LearningRate 0.1811 Epoch: 11 Global Step: 58110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:39:41,366-Speed 10495.95 samples/sec Loss 6.8890 LearningRate 0.1810 Epoch: 11 Global Step: 58120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:39:49,167-Speed 10502.36 samples/sec Loss 6.8726 LearningRate 0.1809 Epoch: 11 Global Step: 58130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:39:56,958-Speed 10517.45 samples/sec Loss 6.9361 LearningRate 0.1809 Epoch: 11 Global Step: 58140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:40:04,750-Speed 10517.64 samples/sec Loss 6.9660 LearningRate 0.1808 Epoch: 11 Global Step: 58150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:40:12,540-Speed 10516.25 samples/sec Loss 6.9163 LearningRate 0.1807 Epoch: 11 Global Step: 58160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:40:20,319-Speed 10533.33 samples/sec Loss 6.9664 LearningRate 0.1806 Epoch: 11 Global Step: 58170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:40:28,104-Speed 10524.36 samples/sec Loss 6.9349 LearningRate 0.1806 Epoch: 11 Global Step: 58180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:40:35,886-Speed 10527.62 samples/sec Loss 6.9017 LearningRate 0.1805 Epoch: 11 Global Step: 58190 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:40:43,662-Speed 10537.22 samples/sec Loss 6.8819 LearningRate 
0.1804 Epoch: 11 Global Step: 58200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:40:51,489-Speed 10468.14 samples/sec Loss 6.8984 LearningRate 0.1803 Epoch: 11 Global Step: 58210 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:40:59,247-Speed 10559.76 samples/sec Loss 6.8596 LearningRate 0.1802 Epoch: 11 Global Step: 58220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:41:07,027-Speed 10531.90 samples/sec Loss 6.9574 LearningRate 0.1802 Epoch: 11 Global Step: 58230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:41:14,825-Speed 10506.68 samples/sec Loss 6.8583 LearningRate 0.1801 Epoch: 11 Global Step: 58240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:41:22,629-Speed 10498.35 samples/sec Loss 6.9181 LearningRate 0.1800 Epoch: 11 Global Step: 58250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:41:30,432-Speed 10499.66 samples/sec Loss 6.9470 LearningRate 0.1799 Epoch: 11 Global Step: 58260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:41:38,253-Speed 10476.39 samples/sec Loss 6.9109 LearningRate 0.1798 Epoch: 11 Global Step: 58270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:41:46,041-Speed 10520.64 samples/sec Loss 6.8975 LearningRate 0.1798 Epoch: 11 Global Step: 58280 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:41:53,858-Speed 10480.59 samples/sec Loss 6.9182 LearningRate 0.1797 Epoch: 11 Global Step: 58290 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:42:01,667-Speed 10494.97 samples/sec Loss 6.8503 LearningRate 0.1796 Epoch: 11 Global Step: 58300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:42:09,469-Speed 10501.63 samples/sec Loss 6.8729 LearningRate 0.1795 Epoch: 11 Global Step: 58310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:42:17,316-Speed 10441.31 samples/sec Loss 6.8780 LearningRate 0.1794 Epoch: 11 
Global Step: 58320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:42:25,138-Speed 10474.29 samples/sec Loss 6.8383 LearningRate 0.1794 Epoch: 11 Global Step: 58330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:42:32,926-Speed 10520.11 samples/sec Loss 6.8593 LearningRate 0.1793 Epoch: 11 Global Step: 58340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:42:40,729-Speed 10500.89 samples/sec Loss 6.9050 LearningRate 0.1792 Epoch: 11 Global Step: 58350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:42:48,512-Speed 10525.98 samples/sec Loss 6.8870 LearningRate 0.1791 Epoch: 11 Global Step: 58360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:42:56,311-Speed 10504.93 samples/sec Loss 6.9002 LearningRate 0.1790 Epoch: 11 Global Step: 58370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:43:04,114-Speed 10500.13 samples/sec Loss 6.8751 LearningRate 0.1790 Epoch: 11 Global Step: 58380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:43:11,937-Speed 10474.34 samples/sec Loss 6.8985 LearningRate 0.1789 Epoch: 11 Global Step: 58390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:43:19,735-Speed 10506.25 samples/sec Loss 6.9353 LearningRate 0.1788 Epoch: 11 Global Step: 58400 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:43:27,538-Speed 10499.60 samples/sec Loss 6.8688 LearningRate 0.1787 Epoch: 11 Global Step: 58410 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:43:35,331-Speed 10514.26 samples/sec Loss 6.8857 LearningRate 0.1787 Epoch: 11 Global Step: 58420 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:43:43,133-Speed 10501.46 samples/sec Loss 6.8665 LearningRate 0.1786 Epoch: 11 Global Step: 58430 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:43:50,931-Speed 10506.74 samples/sec Loss 6.7982 LearningRate 0.1785 Epoch: 11 Global Step: 58440 
Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:43:58,728-Speed 10507.15 samples/sec Loss 6.8546 LearningRate 0.1784 Epoch: 11 Global Step: 58450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:44:06,526-Speed 10507.59 samples/sec Loss 6.8460 LearningRate 0.1783 Epoch: 11 Global Step: 58460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:44:14,320-Speed 10512.62 samples/sec Loss 6.8890 LearningRate 0.1783 Epoch: 11 Global Step: 58470 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:44:22,111-Speed 10514.65 samples/sec Loss 6.8431 LearningRate 0.1782 Epoch: 11 Global Step: 58480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:44:29,887-Speed 10537.52 samples/sec Loss 6.8833 LearningRate 0.1781 Epoch: 11 Global Step: 58490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:44:37,705-Speed 10478.94 samples/sec Loss 6.8288 LearningRate 0.1780 Epoch: 11 Global Step: 58500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:44:45,506-Speed 10503.68 samples/sec Loss 6.8740 LearningRate 0.1779 Epoch: 11 Global Step: 58510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:44:53,312-Speed 10495.74 samples/sec Loss 6.8780 LearningRate 0.1779 Epoch: 11 Global Step: 58520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:45:01,138-Speed 10468.77 samples/sec Loss 6.8752 LearningRate 0.1778 Epoch: 11 Global Step: 58530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:45:08,924-Speed 10522.12 samples/sec Loss 6.8833 LearningRate 0.1777 Epoch: 11 Global Step: 58540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:45:16,713-Speed 10519.65 samples/sec Loss 6.8415 LearningRate 0.1776 Epoch: 11 Global Step: 58550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:45:24,496-Speed 10533.18 samples/sec Loss 6.8313 LearningRate 0.1775 Epoch: 11 Global Step: 58560 Fp16 Grad Scale: 
131072 Required: 10 hours Training: 2022-01-16 03:45:32,275-Speed 10531.96 samples/sec Loss 6.8418 LearningRate 0.1775 Epoch: 11 Global Step: 58570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:45:40,086-Speed 10489.94 samples/sec Loss 6.8349 LearningRate 0.1774 Epoch: 11 Global Step: 58580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:45:47,867-Speed 10529.51 samples/sec Loss 6.8051 LearningRate 0.1773 Epoch: 11 Global Step: 58590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:45:55,646-Speed 10532.34 samples/sec Loss 6.8600 LearningRate 0.1772 Epoch: 11 Global Step: 58600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:46:03,450-Speed 10498.91 samples/sec Loss 6.8867 LearningRate 0.1772 Epoch: 11 Global Step: 58610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:46:11,232-Speed 10528.00 samples/sec Loss 6.8867 LearningRate 0.1771 Epoch: 11 Global Step: 58620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:46:19,008-Speed 10536.83 samples/sec Loss 6.8248 LearningRate 0.1770 Epoch: 11 Global Step: 58630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:46:26,789-Speed 10528.65 samples/sec Loss 6.8261 LearningRate 0.1769 Epoch: 11 Global Step: 58640 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:46:34,602-Speed 10486.77 samples/sec Loss 6.8455 LearningRate 0.1768 Epoch: 11 Global Step: 58650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:46:42,402-Speed 10504.58 samples/sec Loss 6.8239 LearningRate 0.1768 Epoch: 11 Global Step: 58660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:46:50,205-Speed 10498.67 samples/sec Loss 6.7835 LearningRate 0.1767 Epoch: 11 Global Step: 58670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:46:58,021-Speed 10483.06 samples/sec Loss 6.8333 LearningRate 0.1766 Epoch: 11 Global Step: 58680 Fp16 Grad Scale: 131072 Required: 10 
hours Training: 2022-01-16 03:47:05,843-Speed 10475.05 samples/sec Loss 6.8671 LearningRate 0.1765 Epoch: 11 Global Step: 58690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:47:13,628-Speed 10523.91 samples/sec Loss 6.8317 LearningRate 0.1764 Epoch: 11 Global Step: 58700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:47:21,435-Speed 10495.14 samples/sec Loss 6.7897 LearningRate 0.1764 Epoch: 11 Global Step: 58710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:47:29,227-Speed 10513.77 samples/sec Loss 6.8033 LearningRate 0.1763 Epoch: 11 Global Step: 58720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:47:37,014-Speed 10520.94 samples/sec Loss 6.8162 LearningRate 0.1762 Epoch: 11 Global Step: 58730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:47:44,814-Speed 10505.49 samples/sec Loss 6.8543 LearningRate 0.1761 Epoch: 11 Global Step: 58740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:47:52,604-Speed 10517.47 samples/sec Loss 6.7986 LearningRate 0.1761 Epoch: 11 Global Step: 58750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:48:00,403-Speed 10505.12 samples/sec Loss 6.8032 LearningRate 0.1760 Epoch: 11 Global Step: 58760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:48:08,190-Speed 10521.55 samples/sec Loss 6.7919 LearningRate 0.1759 Epoch: 11 Global Step: 58770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:48:15,991-Speed 10502.79 samples/sec Loss 6.8143 LearningRate 0.1758 Epoch: 11 Global Step: 58780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:48:23,791-Speed 10503.52 samples/sec Loss 6.8254 LearningRate 0.1757 Epoch: 11 Global Step: 58790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:48:31,570-Speed 10535.60 samples/sec Loss 6.8148 LearningRate 0.1757 Epoch: 11 Global Step: 58800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 
2022-01-16 03:48:39,380-Speed 10490.17 samples/sec Loss 6.8027 LearningRate 0.1756 Epoch: 11 Global Step: 58810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:48:47,176-Speed 10510.44 samples/sec Loss 6.8544 LearningRate 0.1755 Epoch: 11 Global Step: 58820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:48:54,972-Speed 10510.09 samples/sec Loss 6.8276 LearningRate 0.1754 Epoch: 11 Global Step: 58830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:49:02,807-Speed 10455.95 samples/sec Loss 6.8480 LearningRate 0.1754 Epoch: 11 Global Step: 58840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:49:10,621-Speed 10485.32 samples/sec Loss 6.8315 LearningRate 0.1753 Epoch: 11 Global Step: 58850 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:49:18,481-Speed 10424.23 samples/sec Loss 6.7977 LearningRate 0.1752 Epoch: 11 Global Step: 58860 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:49:26,286-Speed 10497.42 samples/sec Loss 6.8200 LearningRate 0.1751 Epoch: 11 Global Step: 58870 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:49:34,096-Speed 10490.90 samples/sec Loss 6.7819 LearningRate 0.1750 Epoch: 11 Global Step: 58880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:49:41,933-Speed 10453.38 samples/sec Loss 6.7892 LearningRate 0.1750 Epoch: 11 Global Step: 58890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:49:49,748-Speed 10484.15 samples/sec Loss 6.8424 LearningRate 0.1749 Epoch: 11 Global Step: 58900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:49:57,564-Speed 10482.66 samples/sec Loss 6.8013 LearningRate 0.1748 Epoch: 11 Global Step: 58910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:50:05,370-Speed 10495.03 samples/sec Loss 6.7951 LearningRate 0.1747 Epoch: 11 Global Step: 58920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 
03:50:13,174-Speed 10499.22 samples/sec Loss 6.7700 LearningRate 0.1746 Epoch: 11 Global Step: 58930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:50:20,993-Speed 10478.06 samples/sec Loss 6.7776 LearningRate 0.1746 Epoch: 11 Global Step: 58940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:50:28,792-Speed 10505.55 samples/sec Loss 6.7812 LearningRate 0.1745 Epoch: 11 Global Step: 58950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:50:36,599-Speed 10494.38 samples/sec Loss 6.7870 LearningRate 0.1744 Epoch: 11 Global Step: 58960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:50:44,455-Speed 10429.10 samples/sec Loss 6.7901 LearningRate 0.1743 Epoch: 11 Global Step: 58970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:50:52,252-Speed 10508.95 samples/sec Loss 6.7723 LearningRate 0.1743 Epoch: 11 Global Step: 58980 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:51:00,052-Speed 10502.78 samples/sec Loss 6.7679 LearningRate 0.1742 Epoch: 11 Global Step: 58990 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 03:51:07,848-Speed 10509.70 samples/sec Loss 6.7902 LearningRate 0.1741 Epoch: 11 Global Step: 59000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:51:15,659-Speed 10490.74 samples/sec Loss 6.8112 LearningRate 0.1740 Epoch: 11 Global Step: 59010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:51:23,451-Speed 10514.07 samples/sec Loss 6.8199 LearningRate 0.1739 Epoch: 11 Global Step: 59020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:51:31,230-Speed 10532.22 samples/sec Loss 6.7999 LearningRate 0.1739 Epoch: 11 Global Step: 59030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:51:39,030-Speed 10503.80 samples/sec Loss 6.7917 LearningRate 0.1738 Epoch: 11 Global Step: 59040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:51:46,820-Speed 
10516.99 samples/sec Loss 6.7411 LearningRate 0.1737 Epoch: 11 Global Step: 59050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:51:54,674-Speed 10432.37 samples/sec Loss 6.8560 LearningRate 0.1736 Epoch: 11 Global Step: 59060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:52:02,488-Speed 10484.18 samples/sec Loss 6.7899 LearningRate 0.1736 Epoch: 11 Global Step: 59070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:52:10,309-Speed 10477.44 samples/sec Loss 6.7528 LearningRate 0.1735 Epoch: 11 Global Step: 59080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:52:18,111-Speed 10500.59 samples/sec Loss 6.7848 LearningRate 0.1734 Epoch: 11 Global Step: 59090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:52:25,932-Speed 10481.57 samples/sec Loss 6.7571 LearningRate 0.1733 Epoch: 11 Global Step: 59100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:52:33,729-Speed 10508.39 samples/sec Loss 6.7545 LearningRate 0.1732 Epoch: 11 Global Step: 59110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:52:41,521-Speed 10515.40 samples/sec Loss 6.7742 LearningRate 0.1732 Epoch: 11 Global Step: 59120 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:52:49,304-Speed 10526.56 samples/sec Loss 6.7610 LearningRate 0.1731 Epoch: 11 Global Step: 59130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:52:57,092-Speed 10519.97 samples/sec Loss 6.7532 LearningRate 0.1730 Epoch: 11 Global Step: 59140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:53:04,902-Speed 10491.20 samples/sec Loss 6.7326 LearningRate 0.1729 Epoch: 11 Global Step: 59150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:53:12,690-Speed 10520.59 samples/sec Loss 6.8027 LearningRate 0.1729 Epoch: 11 Global Step: 59160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:53:20,509-Speed 10478.06 samples/sec 
Loss 6.7489 LearningRate 0.1728 Epoch: 11 Global Step: 59170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:53:28,353-Speed 10445.97 samples/sec Loss 6.7381 LearningRate 0.1727 Epoch: 11 Global Step: 59180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:53:36,138-Speed 10524.19 samples/sec Loss 6.6779 LearningRate 0.1726 Epoch: 11 Global Step: 59190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:53:43,923-Speed 10525.00 samples/sec Loss 6.7521 LearningRate 0.1725 Epoch: 11 Global Step: 59200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:53:51,717-Speed 10511.08 samples/sec Loss 6.7549 LearningRate 0.1725 Epoch: 11 Global Step: 59210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:53:59,546-Speed 10464.45 samples/sec Loss 6.7252 LearningRate 0.1724 Epoch: 11 Global Step: 59220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:54:07,383-Speed 10454.47 samples/sec Loss 6.7434 LearningRate 0.1723 Epoch: 11 Global Step: 59230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:54:15,172-Speed 10523.37 samples/sec Loss 6.7729 LearningRate 0.1722 Epoch: 11 Global Step: 59240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:54:22,968-Speed 10509.25 samples/sec Loss 6.8126 LearningRate 0.1722 Epoch: 11 Global Step: 59250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:54:30,764-Speed 10510.05 samples/sec Loss 6.7671 LearningRate 0.1721 Epoch: 11 Global Step: 59260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:54:38,558-Speed 10512.10 samples/sec Loss 6.7730 LearningRate 0.1720 Epoch: 11 Global Step: 59270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:54:46,372-Speed 10485.97 samples/sec Loss 6.7545 LearningRate 0.1719 Epoch: 11 Global Step: 59280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:54:54,140-Speed 10547.16 samples/sec Loss 6.7268 LearningRate 
0.1719 Epoch: 11 Global Step: 59290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:55:01,942-Speed 10501.71 samples/sec Loss 6.7619 LearningRate 0.1718 Epoch: 11 Global Step: 59300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:55:09,724-Speed 10527.52 samples/sec Loss 6.7775 LearningRate 0.1717 Epoch: 11 Global Step: 59310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:55:17,517-Speed 10514.34 samples/sec Loss 6.7669 LearningRate 0.1716 Epoch: 11 Global Step: 59320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:55:25,297-Speed 10531.47 samples/sec Loss 6.7383 LearningRate 0.1715 Epoch: 11 Global Step: 59330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:55:33,076-Speed 10531.58 samples/sec Loss 6.7363 LearningRate 0.1715 Epoch: 11 Global Step: 59340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:55:40,874-Speed 10506.94 samples/sec Loss 6.7385 LearningRate 0.1714 Epoch: 11 Global Step: 59350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:55:48,652-Speed 10533.54 samples/sec Loss 6.7157 LearningRate 0.1713 Epoch: 11 Global Step: 59360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:55:56,455-Speed 10506.07 samples/sec Loss 6.7730 LearningRate 0.1712 Epoch: 11 Global Step: 59370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:56:04,244-Speed 10518.16 samples/sec Loss 6.7452 LearningRate 0.1712 Epoch: 11 Global Step: 59380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:56:12,029-Speed 10523.86 samples/sec Loss 6.7402 LearningRate 0.1711 Epoch: 11 Global Step: 59390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:56:19,810-Speed 10530.85 samples/sec Loss 6.7128 LearningRate 0.1710 Epoch: 11 Global Step: 59400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:56:27,608-Speed 10505.96 samples/sec Loss 6.6972 LearningRate 0.1709 Epoch: 11 Global 
Step: 59410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:56:35,391-Speed 10527.50 samples/sec Loss 6.7476 LearningRate 0.1708 Epoch: 11 Global Step: 59420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:56:43,205-Speed 10485.47 samples/sec Loss 6.7481 LearningRate 0.1708 Epoch: 11 Global Step: 59430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:56:51,026-Speed 10474.64 samples/sec Loss 6.7276 LearningRate 0.1707 Epoch: 11 Global Step: 59440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:56:58,829-Speed 10504.00 samples/sec Loss 6.6948 LearningRate 0.1706 Epoch: 11 Global Step: 59450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:57:06,647-Speed 10478.49 samples/sec Loss 6.6802 LearningRate 0.1705 Epoch: 11 Global Step: 59460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:57:14,423-Speed 10536.61 samples/sec Loss 6.6895 LearningRate 0.1705 Epoch: 11 Global Step: 59470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:57:22,207-Speed 10526.04 samples/sec Loss 6.7169 LearningRate 0.1704 Epoch: 11 Global Step: 59480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:57:30,027-Speed 10477.70 samples/sec Loss 6.7688 LearningRate 0.1703 Epoch: 11 Global Step: 59490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:57:37,816-Speed 10517.56 samples/sec Loss 6.7677 LearningRate 0.1702 Epoch: 11 Global Step: 59500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:57:45,604-Speed 10520.17 samples/sec Loss 6.7377 LearningRate 0.1702 Epoch: 11 Global Step: 59510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:57:53,399-Speed 10510.15 samples/sec Loss 6.6921 LearningRate 0.1701 Epoch: 11 Global Step: 59520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:58:01,192-Speed 10514.43 samples/sec Loss 6.7579 LearningRate 0.1700 Epoch: 11 Global Step: 59530 Fp16 Grad Scale: 
65536 Required: 10 hours Training: 2022-01-16 03:58:09,030-Speed 10452.23 samples/sec Loss 6.7088 LearningRate 0.1699 Epoch: 11 Global Step: 59540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:58:16,829-Speed 10504.61 samples/sec Loss 6.7377 LearningRate 0.1698 Epoch: 11 Global Step: 59550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:58:24,609-Speed 10531.02 samples/sec Loss 6.6959 LearningRate 0.1698 Epoch: 11 Global Step: 59560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:58:32,395-Speed 10523.47 samples/sec Loss 6.7502 LearningRate 0.1697 Epoch: 11 Global Step: 59570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 03:58:40,160-Speed 10551.21 samples/sec Loss 6.7028 LearningRate 0.1696 Epoch: 11 Global Step: 59580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:58:47,970-Speed 10491.03 samples/sec Loss 6.7429 LearningRate 0.1695 Epoch: 11 Global Step: 59590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:58:55,763-Speed 10513.92 samples/sec Loss 6.6774 LearningRate 0.1695 Epoch: 11 Global Step: 59600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:59:03,558-Speed 10510.56 samples/sec Loss 6.7037 LearningRate 0.1694 Epoch: 11 Global Step: 59610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:59:11,353-Speed 10510.56 samples/sec Loss 6.7361 LearningRate 0.1693 Epoch: 11 Global Step: 59620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:59:19,151-Speed 10506.01 samples/sec Loss 6.6915 LearningRate 0.1692 Epoch: 11 Global Step: 59630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:59:26,944-Speed 10514.08 samples/sec Loss 6.6598 LearningRate 0.1692 Epoch: 11 Global Step: 59640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:59:34,741-Speed 10508.09 samples/sec Loss 6.6865 LearningRate 0.1691 Epoch: 11 Global Step: 59650 Fp16 Grad Scale: 65536 Required: 10 hours 
Training: 2022-01-16 03:59:42,543-Speed 10501.09 samples/sec Loss 6.6846 LearningRate 0.1690 Epoch: 11 Global Step: 59660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:59:50,352-Speed 10491.41 samples/sec Loss 6.6931 LearningRate 0.1689 Epoch: 11 Global Step: 59670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 03:59:58,187-Speed 10457.56 samples/sec Loss 6.7101 LearningRate 0.1688 Epoch: 11 Global Step: 59680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:00:06,044-Speed 10428.01 samples/sec Loss 6.7191 LearningRate 0.1688 Epoch: 11 Global Step: 59690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:00:13,882-Speed 10452.53 samples/sec Loss 6.6368 LearningRate 0.1687 Epoch: 11 Global Step: 59700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:00:21,695-Speed 10486.95 samples/sec Loss 6.7041 LearningRate 0.1686 Epoch: 11 Global Step: 59710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:00:29,496-Speed 10504.79 samples/sec Loss 6.6916 LearningRate 0.1685 Epoch: 11 Global Step: 59720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:00:37,302-Speed 10494.67 samples/sec Loss 6.6843 LearningRate 0.1685 Epoch: 11 Global Step: 59730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:00:45,115-Speed 10487.43 samples/sec Loss 6.7329 LearningRate 0.1684 Epoch: 11 Global Step: 59740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:00:52,914-Speed 10504.85 samples/sec Loss 6.6796 LearningRate 0.1683 Epoch: 11 Global Step: 59750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:01:00,741-Speed 10467.54 samples/sec Loss 6.6774 LearningRate 0.1682 Epoch: 11 Global Step: 59760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:01:08,563-Speed 10474.82 samples/sec Loss 6.6714 LearningRate 0.1682 Epoch: 11 Global Step: 59770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 
04:01:16,355-Speed 10513.64 samples/sec Loss 6.6801 LearningRate 0.1681 Epoch: 11 Global Step: 59780 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 04:01:24,170-Speed 10483.83 samples/sec Loss 6.6959 LearningRate 0.1680 Epoch: 11 Global Step: 59790 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:01:32,011-Speed 10448.90 samples/sec Loss 6.6811 LearningRate 0.1679 Epoch: 11 Global Step: 59800 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:01:39,811-Speed 10504.00 samples/sec Loss 6.7129 LearningRate 0.1678 Epoch: 11 Global Step: 59810 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:01:47,640-Speed 10466.17 samples/sec Loss 6.7470 LearningRate 0.1678 Epoch: 11 Global Step: 59820 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:01:55,443-Speed 10499.55 samples/sec Loss 6.7100 LearningRate 0.1677 Epoch: 11 Global Step: 59830 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:02:03,221-Speed 10534.26 samples/sec Loss 6.6551 LearningRate 0.1676 Epoch: 11 Global Step: 59840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:02:11,033-Speed 10487.84 samples/sec Loss 6.6609 LearningRate 0.1675 Epoch: 11 Global Step: 59850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:02:18,960-Speed 10336.00 samples/sec Loss 6.6771 LearningRate 0.1675 Epoch: 11 Global Step: 59860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:02:26,774-Speed 10485.76 samples/sec Loss 6.6698 LearningRate 0.1674 Epoch: 11 Global Step: 59870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:02:34,569-Speed 10510.30 samples/sec Loss 6.7050 LearningRate 0.1673 Epoch: 11 Global Step: 59880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:02:42,368-Speed 10506.63 samples/sec Loss 6.6511 LearningRate 0.1672 Epoch: 11 Global Step: 59890 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 04:02:50,160-Speed 
10513.71 samples/sec Loss 6.6642 LearningRate 0.1672 Epoch: 11 Global Step: 59900 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 04:02:57,961-Speed 10502.57 samples/sec Loss 6.6957 LearningRate 0.1671 Epoch: 11 Global Step: 59910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:03:05,758-Speed 10508.18 samples/sec Loss 6.6835 LearningRate 0.1670 Epoch: 11 Global Step: 59920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:03:13,577-Speed 10478.39 samples/sec Loss 6.7115 LearningRate 0.1669 Epoch: 11 Global Step: 59930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:03:21,399-Speed 10475.02 samples/sec Loss 6.7007 LearningRate 0.1669 Epoch: 11 Global Step: 59940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:03:29,179-Speed 10531.51 samples/sec Loss 6.6850 LearningRate 0.1668 Epoch: 11 Global Step: 59950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:03:37,011-Speed 10459.50 samples/sec Loss 6.6529 LearningRate 0.1667 Epoch: 11 Global Step: 59960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:03:44,812-Speed 10503.47 samples/sec Loss 6.6214 LearningRate 0.1666 Epoch: 11 Global Step: 59970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:03:52,603-Speed 10516.76 samples/sec Loss 6.6482 LearningRate 0.1665 Epoch: 11 Global Step: 59980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:04:00,395-Speed 10513.50 samples/sec Loss 6.6261 LearningRate 0.1665 Epoch: 11 Global Step: 59990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:04:08,189-Speed 10512.33 samples/sec Loss 6.6901 LearningRate 0.1664 Epoch: 11 Global Step: 60000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:04:36,334-[lfw][60000]XNorm: 24.240593 Training: 2022-01-16 04:04:36,335-[lfw][60000]Accuracy-Flip: 0.99733+-0.00291 Training: 2022-01-16 04:04:36,336-[lfw][60000]Accuracy-Highest: 0.99783 Training: 
2022-01-16 04:05:08,911-[cfp_fp][60000]XNorm: 21.586307 Training: 2022-01-16 04:05:08,911-[cfp_fp][60000]Accuracy-Flip: 0.98500+-0.00448 Training: 2022-01-16 04:05:08,912-[cfp_fp][60000]Accuracy-Highest: 0.98500 Training: 2022-01-16 04:05:36,980-[agedb_30][60000]XNorm: 23.682680 Training: 2022-01-16 04:05:36,980-[agedb_30][60000]Accuracy-Flip: 0.97067+-0.00803 Training: 2022-01-16 04:05:36,981-[agedb_30][60000]Accuracy-Highest: 0.97067 Training: 2022-01-16 04:05:44,744-Speed 848.48 samples/sec Loss 6.6711 LearningRate 0.1663 Epoch: 11 Global Step: 60010 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 04:05:52,479-Speed 10593.11 samples/sec Loss 6.6805 LearningRate 0.1662 Epoch: 11 Global Step: 60020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:06:00,218-Speed 10586.34 samples/sec Loss 6.6357 LearningRate 0.1662 Epoch: 11 Global Step: 60030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:06:07,979-Speed 10557.01 samples/sec Loss 6.6487 LearningRate 0.1661 Epoch: 11 Global Step: 60040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:06:15,733-Speed 10565.79 samples/sec Loss 6.6538 LearningRate 0.1660 Epoch: 11 Global Step: 60050 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:06:23,478-Speed 10579.17 samples/sec Loss 6.6329 LearningRate 0.1659 Epoch: 11 Global Step: 60060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:06:31,273-Speed 10511.12 samples/sec Loss 6.6739 LearningRate 0.1659 Epoch: 11 Global Step: 60070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:06:39,054-Speed 10528.68 samples/sec Loss 6.6608 LearningRate 0.1658 Epoch: 11 Global Step: 60080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:06:46,839-Speed 10524.26 samples/sec Loss 6.6468 LearningRate 0.1657 Epoch: 11 Global Step: 60090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:06:54,611-Speed 10541.54 samples/sec Loss 6.6389 
LearningRate 0.1656 Epoch: 11 Global Step: 60100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:07:02,386-Speed 10538.69 samples/sec Loss 6.6255 LearningRate 0.1656 Epoch: 11 Global Step: 60110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:07:10,158-Speed 10540.69 samples/sec Loss 6.6361 LearningRate 0.1655 Epoch: 11 Global Step: 60120 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 04:07:17,961-Speed 10499.63 samples/sec Loss 6.6518 LearningRate 0.1654 Epoch: 11 Global Step: 60130 Fp16 Grad Scale: 262144 Required: 10 hours Training: 2022-01-16 04:07:25,745-Speed 10526.80 samples/sec Loss 6.6438 LearningRate 0.1653 Epoch: 11 Global Step: 60140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:07:33,514-Speed 10545.32 samples/sec Loss 6.6527 LearningRate 0.1653 Epoch: 11 Global Step: 60150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 04:07:41,273-Speed 10559.33 samples/sec Loss 6.6263 LearningRate 0.1652 Epoch: 11 Global Step: 60160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 04:07:49,042-Speed 10546.56 samples/sec Loss 6.6178 LearningRate 0.1651 Epoch: 11 Global Step: 60170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 04:07:56,809-Speed 10549.39 samples/sec Loss 6.6040 LearningRate 0.1650 Epoch: 11 Global Step: 60180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 04:08:04,602-Speed 10512.60 samples/sec Loss 6.5735 LearningRate 0.1650 Epoch: 11 Global Step: 60190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 04:08:12,463-Speed 10422.71 samples/sec Loss 6.6322 LearningRate 0.1649 Epoch: 11 Global Step: 60200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 04:08:20,312-Speed 10438.98 samples/sec Loss 6.6231 LearningRate 0.1648 Epoch: 11 Global Step: 60210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 04:08:28,076-Speed 10552.24 samples/sec Loss 6.6310 LearningRate 0.1647 Epoch: 
11 Global Step: 60220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 04:08:35,854-Speed 10534.59 samples/sec Loss 6.5671 LearningRate 0.1646 Epoch: 11 Global Step: 60230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 04:08:43,664-Speed 10490.52 samples/sec Loss 6.6084 LearningRate 0.1646 Epoch: 11 Global Step: 60240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 04:08:51,449-Speed 10523.61 samples/sec Loss 6.6350 LearningRate 0.1645 Epoch: 11 Global Step: 60250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:08:59,217-Speed 10548.06 samples/sec Loss 6.6207 LearningRate 0.1644 Epoch: 11 Global Step: 60260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:09:07,000-Speed 10525.56 samples/sec Loss 6.5924 LearningRate 0.1643 Epoch: 11 Global Step: 60270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:09:14,794-Speed 10511.80 samples/sec Loss 6.6173 LearningRate 0.1643 Epoch: 11 Global Step: 60280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:09:22,560-Speed 10550.62 samples/sec Loss 6.5799 LearningRate 0.1642 Epoch: 11 Global Step: 60290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:09:30,354-Speed 10513.05 samples/sec Loss 6.5869 LearningRate 0.1641 Epoch: 11 Global Step: 60300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:09:38,115-Speed 10556.08 samples/sec Loss 6.5874 LearningRate 0.1640 Epoch: 11 Global Step: 60310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:09:45,959-Speed 10445.22 samples/sec Loss 6.5984 LearningRate 0.1640 Epoch: 11 Global Step: 60320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:09:53,778-Speed 10478.09 samples/sec Loss 6.6354 LearningRate 0.1639 Epoch: 11 Global Step: 60330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:10:01,610-Speed 10461.86 samples/sec Loss 6.6585 LearningRate 0.1638 Epoch: 11 Global Step: 60340 
Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:10:09,370-Speed 10557.07 samples/sec Loss 6.6174 LearningRate 0.1637 Epoch: 11 Global Step: 60350 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:10:17,189-Speed 10478.11 samples/sec Loss 6.6053 LearningRate 0.1637 Epoch: 11 Global Step: 60360 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:10:24,957-Speed 10547.29 samples/sec Loss 6.5930 LearningRate 0.1636 Epoch: 11 Global Step: 60370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:10:32,725-Speed 10547.90 samples/sec Loss 6.6218 LearningRate 0.1635 Epoch: 11 Global Step: 60380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 04:10:40,551-Speed 10468.85 samples/sec Loss 6.5569 LearningRate 0.1634 Epoch: 11 Global Step: 60390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 04:10:48,331-Speed 10530.91 samples/sec Loss 6.6191 LearningRate 0.1634 Epoch: 11 Global Step: 60400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 04:10:56,104-Speed 10543.63 samples/sec Loss 6.6114 LearningRate 0.1633 Epoch: 11 Global Step: 60410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 04:11:03,869-Speed 10551.35 samples/sec Loss 6.5561 LearningRate 0.1632 Epoch: 11 Global Step: 60420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 04:11:11,636-Speed 10549.26 samples/sec Loss 6.5814 LearningRate 0.1631 Epoch: 11 Global Step: 60430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 04:11:19,413-Speed 10535.31 samples/sec Loss 6.6179 LearningRate 0.1631 Epoch: 11 Global Step: 60440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 04:11:27,185-Speed 10541.66 samples/sec Loss 6.5931 LearningRate 0.1630 Epoch: 11 Global Step: 60450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 04:11:34,964-Speed 10532.96 samples/sec Loss 6.6386 LearningRate 0.1629 Epoch: 11 Global Step: 60460 Fp16 Grad Scale: 65536 
Required: 10 hours Training: 2022-01-16 04:11:42,759-Speed 10509.52 samples/sec Loss 6.5714 LearningRate 0.1628 Epoch: 11 Global Step: 60470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-01-16 04:11:50,509-Speed 10571.61 samples/sec Loss 6.6376 LearningRate 0.1628 Epoch: 11 Global Step: 60480 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:11:58,282-Speed 10541.08 samples/sec Loss 6.5865 LearningRate 0.1627 Epoch: 11 Global Step: 60490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:12:06,059-Speed 10534.31 samples/sec Loss 6.5439 LearningRate 0.1626 Epoch: 11 Global Step: 60500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:12:13,833-Speed 10539.00 samples/sec Loss 6.5821 LearningRate 0.1625 Epoch: 11 Global Step: 60510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:12:21,596-Speed 10554.14 samples/sec Loss 6.5758 LearningRate 0.1625 Epoch: 11 Global Step: 60520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-01-16 04:12:29,370-Speed 10540.13 samples/sec Loss 6.5448 LearningRate 0.1624 Epoch: 11 Global Step: 60530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:12:37,162-Speed 10515.44 samples/sec Loss 6.5459 LearningRate 0.1623 Epoch: 11 Global Step: 60540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:12:44,936-Speed 10538.12 samples/sec Loss 6.5852 LearningRate 0.1622 Epoch: 11 Global Step: 60550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:12:52,686-Speed 10571.18 samples/sec Loss 6.5935 LearningRate 0.1622 Epoch: 11 Global Step: 60560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:13:00,438-Speed 10568.98 samples/sec Loss 6.5365 LearningRate 0.1621 Epoch: 11 Global Step: 60570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:13:08,210-Speed 10541.63 samples/sec Loss 6.5749 LearningRate 0.1620 Epoch: 11 Global Step: 60580 Fp16 Grad Scale: 131072 Required: 9 hours 
Training: 2022-01-16 04:13:16,015-Speed 10496.84 samples/sec Loss 6.5721 LearningRate 0.1619 Epoch: 11 Global Step: 60590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:13:23,809-Speed 10512.95 samples/sec Loss 6.6169 LearningRate 0.1619 Epoch: 11 Global Step: 60600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:13:31,598-Speed 10518.64 samples/sec Loss 6.5676 LearningRate 0.1618 Epoch: 11 Global Step: 60610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:13:39,444-Speed 10441.97 samples/sec Loss 6.5627 LearningRate 0.1617 Epoch: 11 Global Step: 60620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:13:47,220-Speed 10537.76 samples/sec Loss 6.5812 LearningRate 0.1616 Epoch: 11 Global Step: 60630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:13:54,991-Speed 10543.28 samples/sec Loss 6.5855 LearningRate 0.1616 Epoch: 11 Global Step: 60640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:14:02,783-Speed 10515.06 samples/sec Loss 6.5291 LearningRate 0.1615 Epoch: 11 Global Step: 60650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:14:10,565-Speed 10529.16 samples/sec Loss 6.6160 LearningRate 0.1614 Epoch: 11 Global Step: 60660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:14:18,412-Speed 10441.79 samples/sec Loss 6.6028 LearningRate 0.1613 Epoch: 11 Global Step: 60670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:14:26,198-Speed 10522.12 samples/sec Loss 6.5626 LearningRate 0.1613 Epoch: 11 Global Step: 60680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:14:33,987-Speed 10519.39 samples/sec Loss 6.5886 LearningRate 0.1612 Epoch: 11 Global Step: 60690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:14:41,786-Speed 10504.10 samples/sec Loss 6.5970 LearningRate 0.1611 Epoch: 11 Global Step: 60700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:14:49,609-Speed 
10474.14 samples/sec Loss 6.5315 LearningRate 0.1610 Epoch: 11 Global Step: 60710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:14:57,410-Speed 10503.49 samples/sec Loss 6.5593 LearningRate 0.1610 Epoch: 11 Global Step: 60720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:15:05,221-Speed 10488.12 samples/sec Loss 6.5196 LearningRate 0.1609 Epoch: 11 Global Step: 60730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:15:13,016-Speed 10510.99 samples/sec Loss 6.5845 LearningRate 0.1608 Epoch: 11 Global Step: 60740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:15:20,793-Speed 10535.46 samples/sec Loss 6.5463 LearningRate 0.1607 Epoch: 11 Global Step: 60750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:15:28,604-Speed 10489.63 samples/sec Loss 6.5314 LearningRate 0.1607 Epoch: 11 Global Step: 60760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:15:36,398-Speed 10511.74 samples/sec Loss 6.5207 LearningRate 0.1606 Epoch: 11 Global Step: 60770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:15:44,226-Speed 10466.65 samples/sec Loss 6.5586 LearningRate 0.1605 Epoch: 11 Global Step: 60780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:15:52,027-Speed 10503.62 samples/sec Loss 6.5568 LearningRate 0.1604 Epoch: 11 Global Step: 60790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:15:59,867-Speed 10453.25 samples/sec Loss 6.5072 LearningRate 0.1604 Epoch: 11 Global Step: 60800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:16:07,685-Speed 10482.52 samples/sec Loss 6.5452 LearningRate 0.1603 Epoch: 11 Global Step: 60810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:16:15,516-Speed 10467.56 samples/sec Loss 6.5397 LearningRate 0.1602 Epoch: 11 Global Step: 60820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:16:23,315-Speed 10504.33 samples/sec Loss 
6.5781 LearningRate 0.1601 Epoch: 11 Global Step: 60830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:16:31,152-Speed 10454.36 samples/sec Loss 6.5384 LearningRate 0.1601 Epoch: 11 Global Step: 60840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:16:38,944-Speed 10514.35 samples/sec Loss 6.4807 LearningRate 0.1600 Epoch: 11 Global Step: 60850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:16:46,740-Speed 10509.51 samples/sec Loss 6.5031 LearningRate 0.1599 Epoch: 11 Global Step: 60860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:16:54,518-Speed 10534.45 samples/sec Loss 6.5185 LearningRate 0.1598 Epoch: 11 Global Step: 60870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:17:02,308-Speed 10517.58 samples/sec Loss 6.5293 LearningRate 0.1598 Epoch: 11 Global Step: 60880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:17:10,088-Speed 10531.10 samples/sec Loss 6.5148 LearningRate 0.1597 Epoch: 11 Global Step: 60890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:17:17,863-Speed 10537.03 samples/sec Loss 6.5194 LearningRate 0.1596 Epoch: 11 Global Step: 60900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:17:25,648-Speed 10525.06 samples/sec Loss 6.4990 LearningRate 0.1595 Epoch: 11 Global Step: 60910 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 04:17:33,473-Speed 10470.86 samples/sec Loss 6.5033 LearningRate 0.1595 Epoch: 11 Global Step: 60920 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 04:17:41,270-Speed 10506.91 samples/sec Loss 6.5223 LearningRate 0.1594 Epoch: 11 Global Step: 60930 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 04:17:49,073-Speed 10499.84 samples/sec Loss 6.5410 LearningRate 0.1593 Epoch: 11 Global Step: 60940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:17:56,889-Speed 10483.66 samples/sec Loss 6.5112 LearningRate 0.1592 
Epoch: 11 Global Step: 60950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:18:04,674-Speed 10523.26 samples/sec Loss 6.4931 LearningRate 0.1592 Epoch: 11 Global Step: 60960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:18:12,488-Speed 10485.41 samples/sec Loss 6.4936 LearningRate 0.1591 Epoch: 11 Global Step: 60970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:18:20,264-Speed 10535.61 samples/sec Loss 6.4933 LearningRate 0.1590 Epoch: 11 Global Step: 60980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:18:28,055-Speed 10516.64 samples/sec Loss 6.5490 LearningRate 0.1589 Epoch: 11 Global Step: 60990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:18:35,853-Speed 10506.75 samples/sec Loss 6.5427 LearningRate 0.1589 Epoch: 11 Global Step: 61000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:18:43,653-Speed 10504.23 samples/sec Loss 6.5013 LearningRate 0.1588 Epoch: 11 Global Step: 61010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:18:51,423-Speed 10544.79 samples/sec Loss 6.5474 LearningRate 0.1587 Epoch: 11 Global Step: 61020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:18:59,224-Speed 10502.06 samples/sec Loss 6.5786 LearningRate 0.1586 Epoch: 11 Global Step: 61030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:19:07,045-Speed 10476.72 samples/sec Loss 6.5466 LearningRate 0.1586 Epoch: 11 Global Step: 61040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:19:14,864-Speed 10478.18 samples/sec Loss 6.4739 LearningRate 0.1585 Epoch: 11 Global Step: 61050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:19:22,673-Speed 10492.59 samples/sec Loss 6.5254 LearningRate 0.1584 Epoch: 11 Global Step: 61060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:19:30,445-Speed 10542.19 samples/sec Loss 6.5160 LearningRate 0.1583 Epoch: 11 Global Step: 61070 Fp16 Grad 
Scale: 131072 Required: 9 hours Training: 2022-01-16 04:19:38,249-Speed 10501.20 samples/sec Loss 6.5216 LearningRate 0.1583 Epoch: 11 Global Step: 61080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:19:46,026-Speed 10535.25 samples/sec Loss 6.5370 LearningRate 0.1582 Epoch: 11 Global Step: 61090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:19:53,884-Speed 10427.00 samples/sec Loss 6.5348 LearningRate 0.1581 Epoch: 11 Global Step: 61100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:20:01,686-Speed 10500.99 samples/sec Loss 6.5400 LearningRate 0.1580 Epoch: 11 Global Step: 61110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:20:09,502-Speed 10483.27 samples/sec Loss 6.4978 LearningRate 0.1580 Epoch: 11 Global Step: 61120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:20:17,355-Speed 10432.95 samples/sec Loss 6.4961 LearningRate 0.1579 Epoch: 11 Global Step: 61130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:20:25,132-Speed 10535.06 samples/sec Loss 6.4678 LearningRate 0.1578 Epoch: 11 Global Step: 61140 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 04:20:32,905-Speed 10539.82 samples/sec Loss 6.4664 LearningRate 0.1578 Epoch: 11 Global Step: 61150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:20:40,687-Speed 10527.58 samples/sec Loss 6.5021 LearningRate 0.1577 Epoch: 11 Global Step: 61160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:20:48,476-Speed 10519.34 samples/sec Loss 6.4937 LearningRate 0.1576 Epoch: 11 Global Step: 61170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:20:56,277-Speed 10502.69 samples/sec Loss 6.4579 LearningRate 0.1575 Epoch: 11 Global Step: 61180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:21:04,063-Speed 10522.49 samples/sec Loss 6.4765 LearningRate 0.1575 Epoch: 11 Global Step: 61190 Fp16 Grad Scale: 131072 Required: 9 hours 
Training: 2022-01-16 04:21:11,867-Speed 10498.92 samples/sec Loss 6.5198 LearningRate 0.1574 Epoch: 11 Global Step: 61200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:21:19,642-Speed 10537.97 samples/sec Loss 6.4577 LearningRate 0.1573 Epoch: 11 Global Step: 61210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:21:27,435-Speed 10513.30 samples/sec Loss 6.4868 LearningRate 0.1572 Epoch: 11 Global Step: 61220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:21:35,248-Speed 10486.36 samples/sec Loss 6.4948 LearningRate 0.1572 Epoch: 11 Global Step: 61230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:21:43,045-Speed 10508.90 samples/sec Loss 6.4868 LearningRate 0.1571 Epoch: 11 Global Step: 61240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:21:50,831-Speed 10523.40 samples/sec Loss 6.4966 LearningRate 0.1570 Epoch: 11 Global Step: 61250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:21:58,615-Speed 10525.73 samples/sec Loss 6.4677 LearningRate 0.1569 Epoch: 11 Global Step: 61260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:22:06,411-Speed 10508.31 samples/sec Loss 6.4428 LearningRate 0.1569 Epoch: 11 Global Step: 61270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:22:14,246-Speed 10457.42 samples/sec Loss 6.4670 LearningRate 0.1568 Epoch: 11 Global Step: 61280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:22:22,047-Speed 10502.75 samples/sec Loss 6.4637 LearningRate 0.1567 Epoch: 11 Global Step: 61290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:22:29,836-Speed 10518.94 samples/sec Loss 6.4753 LearningRate 0.1566 Epoch: 11 Global Step: 61300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:22:37,624-Speed 10520.75 samples/sec Loss 6.4726 LearningRate 0.1566 Epoch: 11 Global Step: 61310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 
04:22:45,432-Speed 10493.24 samples/sec Loss 6.4266 LearningRate 0.1565 Epoch: 11 Global Step: 61320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:22:53,214-Speed 10527.69 samples/sec Loss 6.4577 LearningRate 0.1564 Epoch: 11 Global Step: 61330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:23:01,021-Speed 10494.76 samples/sec Loss 6.4762 LearningRate 0.1563 Epoch: 11 Global Step: 61340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:23:08,813-Speed 10515.65 samples/sec Loss 6.4899 LearningRate 0.1563 Epoch: 11 Global Step: 61350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:23:16,622-Speed 10491.26 samples/sec Loss 6.4595 LearningRate 0.1562 Epoch: 11 Global Step: 61360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:23:24,433-Speed 10489.15 samples/sec Loss 6.4997 LearningRate 0.1561 Epoch: 11 Global Step: 61370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:23:32,271-Speed 10453.33 samples/sec Loss 6.4493 LearningRate 0.1560 Epoch: 11 Global Step: 61380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:23:40,077-Speed 10495.81 samples/sec Loss 6.4768 LearningRate 0.1560 Epoch: 11 Global Step: 61390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:23:47,883-Speed 10496.52 samples/sec Loss 6.4359 LearningRate 0.1559 Epoch: 11 Global Step: 61400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:23:55,700-Speed 10481.22 samples/sec Loss 6.4568 LearningRate 0.1558 Epoch: 11 Global Step: 61410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:24:03,533-Speed 10460.22 samples/sec Loss 6.3971 LearningRate 0.1558 Epoch: 11 Global Step: 61420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:24:11,305-Speed 10542.04 samples/sec Loss 6.4410 LearningRate 0.1557 Epoch: 11 Global Step: 61430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:24:19,087-Speed 10527.31 
samples/sec Loss 6.4882 LearningRate 0.1556 Epoch: 11 Global Step: 61440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:24:26,885-Speed 10508.44 samples/sec Loss 6.4861 LearningRate 0.1555 Epoch: 11 Global Step: 61450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:24:34,687-Speed 10501.79 samples/sec Loss 6.4923 LearningRate 0.1555 Epoch: 11 Global Step: 61460 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 04:24:42,474-Speed 10521.86 samples/sec Loss 6.4047 LearningRate 0.1554 Epoch: 11 Global Step: 61470 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 04:24:50,277-Speed 10500.02 samples/sec Loss 6.4715 LearningRate 0.1553 Epoch: 11 Global Step: 61480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:24:58,076-Speed 10506.59 samples/sec Loss 6.4136 LearningRate 0.1552 Epoch: 11 Global Step: 61490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:25:05,861-Speed 10524.74 samples/sec Loss 6.4369 LearningRate 0.1552 Epoch: 11 Global Step: 61500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:25:13,657-Speed 10509.98 samples/sec Loss 6.4549 LearningRate 0.1551 Epoch: 11 Global Step: 61510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:25:21,438-Speed 10529.37 samples/sec Loss 6.4632 LearningRate 0.1550 Epoch: 11 Global Step: 61520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:25:29,244-Speed 10497.49 samples/sec Loss 6.4551 LearningRate 0.1549 Epoch: 11 Global Step: 61530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:25:37,047-Speed 10500.17 samples/sec Loss 6.4496 LearningRate 0.1549 Epoch: 11 Global Step: 61540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:25:44,848-Speed 10501.46 samples/sec Loss 6.4311 LearningRate 0.1548 Epoch: 11 Global Step: 61550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:25:52,656-Speed 10493.36 samples/sec Loss 6.3986 
LearningRate 0.1547 Epoch: 11 Global Step: 61560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:26:00,443-Speed 10521.58 samples/sec Loss 6.4452 LearningRate 0.1547 Epoch: 11 Global Step: 61570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:26:08,248-Speed 10497.32 samples/sec Loss 6.4214 LearningRate 0.1546 Epoch: 11 Global Step: 61580 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 04:26:16,039-Speed 10516.72 samples/sec Loss 6.4132 LearningRate 0.1545 Epoch: 11 Global Step: 61590 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 04:26:23,923-Speed 10391.82 samples/sec Loss 6.4155 LearningRate 0.1544 Epoch: 11 Global Step: 61600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:26:31,720-Speed 10507.15 samples/sec Loss 6.4079 LearningRate 0.1544 Epoch: 11 Global Step: 61610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:26:39,511-Speed 10517.30 samples/sec Loss 6.4265 LearningRate 0.1543 Epoch: 11 Global Step: 61620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:26:47,307-Speed 10508.01 samples/sec Loss 6.4073 LearningRate 0.1542 Epoch: 11 Global Step: 61630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:26:55,092-Speed 10525.30 samples/sec Loss 6.4490 LearningRate 0.1541 Epoch: 11 Global Step: 61640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:27:02,873-Speed 10528.68 samples/sec Loss 6.4561 LearningRate 0.1541 Epoch: 11 Global Step: 61650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:27:10,659-Speed 10523.32 samples/sec Loss 6.4394 LearningRate 0.1540 Epoch: 11 Global Step: 61660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:27:18,461-Speed 10499.94 samples/sec Loss 6.4501 LearningRate 0.1539 Epoch: 11 Global Step: 61670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:27:26,241-Speed 10532.09 samples/sec Loss 6.3542 LearningRate 0.1538 Epoch: 11 
Global Step: 61680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:27:34,036-Speed 10510.18 samples/sec Loss 6.3930 LearningRate 0.1538 Epoch: 11 Global Step: 61690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:27:41,827-Speed 10516.60 samples/sec Loss 6.4234 LearningRate 0.1537 Epoch: 11 Global Step: 61700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:27:49,628-Speed 10502.73 samples/sec Loss 6.4127 LearningRate 0.1536 Epoch: 11 Global Step: 61710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:27:57,412-Speed 10525.68 samples/sec Loss 6.4690 LearningRate 0.1536 Epoch: 11 Global Step: 61720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:28:05,211-Speed 10505.01 samples/sec Loss 6.4516 LearningRate 0.1535 Epoch: 11 Global Step: 61730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:28:13,019-Speed 10493.03 samples/sec Loss 6.4235 LearningRate 0.1534 Epoch: 11 Global Step: 61740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:28:20,830-Speed 10488.78 samples/sec Loss 6.3808 LearningRate 0.1533 Epoch: 11 Global Step: 61750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:28:28,635-Speed 10498.34 samples/sec Loss 6.3927 LearningRate 0.1533 Epoch: 11 Global Step: 61760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:28:36,439-Speed 10497.92 samples/sec Loss 6.3640 LearningRate 0.1532 Epoch: 11 Global Step: 61770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:28:44,275-Speed 10455.98 samples/sec Loss 6.3798 LearningRate 0.1531 Epoch: 11 Global Step: 61780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:28:52,087-Speed 10488.58 samples/sec Loss 6.3963 LearningRate 0.1530 Epoch: 11 Global Step: 61790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:28:59,857-Speed 10544.78 samples/sec Loss 6.4183 LearningRate 0.1530 Epoch: 11 Global Step: 61800 Fp16 Grad Scale: 
131072 Required: 9 hours Training: 2022-01-16 04:29:07,664-Speed 10494.34 samples/sec Loss 6.4016 LearningRate 0.1529 Epoch: 11 Global Step: 61810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:29:15,452-Speed 10519.71 samples/sec Loss 6.4280 LearningRate 0.1528 Epoch: 11 Global Step: 61820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:29:23,252-Speed 10503.95 samples/sec Loss 6.4171 LearningRate 0.1527 Epoch: 11 Global Step: 61830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:29:31,058-Speed 10496.84 samples/sec Loss 6.3862 LearningRate 0.1527 Epoch: 11 Global Step: 61840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:29:38,851-Speed 10513.58 samples/sec Loss 6.4089 LearningRate 0.1526 Epoch: 11 Global Step: 61850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:29:46,628-Speed 10534.87 samples/sec Loss 6.3945 LearningRate 0.1525 Epoch: 11 Global Step: 61860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:29:54,430-Speed 10501.32 samples/sec Loss 6.4085 LearningRate 0.1525 Epoch: 11 Global Step: 61870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:30:02,257-Speed 10467.07 samples/sec Loss 6.3815 LearningRate 0.1524 Epoch: 11 Global Step: 61880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:30:10,061-Speed 10499.29 samples/sec Loss 6.4082 LearningRate 0.1523 Epoch: 11 Global Step: 61890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:30:17,842-Speed 10530.24 samples/sec Loss 6.3631 LearningRate 0.1522 Epoch: 11 Global Step: 61900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:30:25,624-Speed 10527.47 samples/sec Loss 6.3549 LearningRate 0.1522 Epoch: 11 Global Step: 61910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:30:33,434-Speed 10491.45 samples/sec Loss 6.3860 LearningRate 0.1521 Epoch: 11 Global Step: 61920 Fp16 Grad Scale: 131072 Required: 9 hours 
Training: 2022-01-16 04:30:41,238-Speed 10498.79 samples/sec Loss 6.3885 LearningRate 0.1520 Epoch: 11 Global Step: 61930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:30:49,036-Speed 10507.28 samples/sec Loss 6.3606 LearningRate 0.1519 Epoch: 11 Global Step: 61940 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 04:30:56,837-Speed 10503.65 samples/sec Loss 6.3753 LearningRate 0.1519 Epoch: 11 Global Step: 61950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:31:04,631-Speed 10511.34 samples/sec Loss 6.3153 LearningRate 0.1518 Epoch: 11 Global Step: 61960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:31:12,429-Speed 10507.32 samples/sec Loss 6.3582 LearningRate 0.1517 Epoch: 11 Global Step: 61970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:31:20,235-Speed 10495.25 samples/sec Loss 6.3597 LearningRate 0.1517 Epoch: 11 Global Step: 61980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:31:28,057-Speed 10478.15 samples/sec Loss 6.3702 LearningRate 0.1516 Epoch: 11 Global Step: 61990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:31:35,869-Speed 10487.79 samples/sec Loss 6.3922 LearningRate 0.1515 Epoch: 11 Global Step: 62000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:31:43,667-Speed 10512.08 samples/sec Loss 6.3377 LearningRate 0.1514 Epoch: 11 Global Step: 62010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:31:51,453-Speed 10523.86 samples/sec Loss 6.3650 LearningRate 0.1514 Epoch: 11 Global Step: 62020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:31:59,231-Speed 10534.00 samples/sec Loss 6.4043 LearningRate 0.1513 Epoch: 11 Global Step: 62030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:32:07,023-Speed 10514.37 samples/sec Loss 6.4297 LearningRate 0.1512 Epoch: 11 Global Step: 62040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:32:14,817-Speed 
10511.98 samples/sec Loss 6.3594 LearningRate 0.1511 Epoch: 11 Global Step: 62050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:32:22,602-Speed 10524.88 samples/sec Loss 6.3739 LearningRate 0.1511 Epoch: 11 Global Step: 62060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:32:30,396-Speed 10512.34 samples/sec Loss 6.3790 LearningRate 0.1510 Epoch: 11 Global Step: 62070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:32:38,201-Speed 10496.45 samples/sec Loss 6.3344 LearningRate 0.1509 Epoch: 11 Global Step: 62080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:32:45,990-Speed 10519.54 samples/sec Loss 6.3528 LearningRate 0.1509 Epoch: 11 Global Step: 62090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:32:53,777-Speed 10520.94 samples/sec Loss 6.4105 LearningRate 0.1508 Epoch: 11 Global Step: 62100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:33:01,571-Speed 10512.07 samples/sec Loss 6.3642 LearningRate 0.1507 Epoch: 11 Global Step: 62110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:33:09,372-Speed 10503.22 samples/sec Loss 6.4091 LearningRate 0.1506 Epoch: 11 Global Step: 62120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:33:17,197-Speed 10469.84 samples/sec Loss 6.3552 LearningRate 0.1506 Epoch: 11 Global Step: 62130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:33:25,094-Speed 10375.95 samples/sec Loss 6.3828 LearningRate 0.1505 Epoch: 11 Global Step: 62140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:33:32,902-Speed 10492.34 samples/sec Loss 6.3804 LearningRate 0.1504 Epoch: 11 Global Step: 62150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:33:40,710-Speed 10495.38 samples/sec Loss 6.3924 LearningRate 0.1503 Epoch: 11 Global Step: 62160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:33:48,505-Speed 10509.73 samples/sec Loss 6.3772 
LearningRate 0.1503 Epoch: 11 Global Step: 62170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:33:56,317-Speed 10488.84 samples/sec Loss 6.3149 LearningRate 0.1502 Epoch: 11 Global Step: 62180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:34:04,124-Speed 10493.50 samples/sec Loss 6.3852 LearningRate 0.1501 Epoch: 11 Global Step: 62190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:34:11,952-Speed 10467.00 samples/sec Loss 6.3338 LearningRate 0.1501 Epoch: 11 Global Step: 62200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:34:19,767-Speed 10483.39 samples/sec Loss 6.3641 LearningRate 0.1500 Epoch: 11 Global Step: 62210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:34:42,690-Speed 3573.90 samples/sec Loss 6.4050 LearningRate 0.1499 Epoch: 12 Global Step: 62220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:34:50,497-Speed 10494.53 samples/sec Loss 6.3650 LearningRate 0.1498 Epoch: 12 Global Step: 62230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:34:58,294-Speed 10508.16 samples/sec Loss 6.3505 LearningRate 0.1498 Epoch: 12 Global Step: 62240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:35:06,160-Speed 10414.97 samples/sec Loss 6.3559 LearningRate 0.1497 Epoch: 12 Global Step: 62250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:35:13,975-Speed 10484.30 samples/sec Loss 6.3614 LearningRate 0.1496 Epoch: 12 Global Step: 62260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:35:21,758-Speed 10525.94 samples/sec Loss 6.3445 LearningRate 0.1496 Epoch: 12 Global Step: 62270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:35:29,541-Speed 10527.70 samples/sec Loss 6.3362 LearningRate 0.1495 Epoch: 12 Global Step: 62280 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:35:37,320-Speed 10532.59 samples/sec Loss 6.3327 LearningRate 0.1494 Epoch: 12 
Global Step: 62290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:35:45,098-Speed 10533.73 samples/sec Loss 6.3103 LearningRate 0.1493 Epoch: 12 Global Step: 62300 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 04:35:52,922-Speed 10471.25 samples/sec Loss 6.3202 LearningRate 0.1493 Epoch: 12 Global Step: 62310 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 04:36:00,732-Speed 10491.60 samples/sec Loss 6.2826 LearningRate 0.1492 Epoch: 12 Global Step: 62320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:36:08,574-Speed 10447.96 samples/sec Loss 6.3023 LearningRate 0.1491 Epoch: 12 Global Step: 62330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:36:16,379-Speed 10497.27 samples/sec Loss 6.3563 LearningRate 0.1490 Epoch: 12 Global Step: 62340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:36:24,191-Speed 10488.75 samples/sec Loss 6.2942 LearningRate 0.1490 Epoch: 12 Global Step: 62350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:36:31,990-Speed 10505.31 samples/sec Loss 6.3123 LearningRate 0.1489 Epoch: 12 Global Step: 62360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:36:39,779-Speed 10518.91 samples/sec Loss 6.3126 LearningRate 0.1488 Epoch: 12 Global Step: 62370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:36:47,559-Speed 10530.83 samples/sec Loss 6.3167 LearningRate 0.1488 Epoch: 12 Global Step: 62380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:36:55,360-Speed 10502.89 samples/sec Loss 6.2817 LearningRate 0.1487 Epoch: 12 Global Step: 62390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:37:03,163-Speed 10499.70 samples/sec Loss 6.3263 LearningRate 0.1486 Epoch: 12 Global Step: 62400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:37:10,990-Speed 10469.20 samples/sec Loss 6.3237 LearningRate 0.1485 Epoch: 12 Global Step: 62410 Fp16 Grad 
Scale: 65536 Required: 9 hours Training: 2022-01-16 04:37:18,776-Speed 10522.76 samples/sec Loss 6.3102 LearningRate 0.1485 Epoch: 12 Global Step: 62420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:37:26,586-Speed 10489.43 samples/sec Loss 6.3012 LearningRate 0.1484 Epoch: 12 Global Step: 62430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:37:34,371-Speed 10524.66 samples/sec Loss 6.2980 LearningRate 0.1483 Epoch: 12 Global Step: 62440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:37:42,166-Speed 10514.73 samples/sec Loss 6.3183 LearningRate 0.1483 Epoch: 12 Global Step: 62450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:37:49,973-Speed 10494.04 samples/sec Loss 6.3005 LearningRate 0.1482 Epoch: 12 Global Step: 62460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:37:57,776-Speed 10499.14 samples/sec Loss 6.2830 LearningRate 0.1481 Epoch: 12 Global Step: 62470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:38:05,565-Speed 10520.51 samples/sec Loss 6.3281 LearningRate 0.1480 Epoch: 12 Global Step: 62480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:38:13,359-Speed 10513.97 samples/sec Loss 6.2731 LearningRate 0.1480 Epoch: 12 Global Step: 62490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:38:21,160-Speed 10502.36 samples/sec Loss 6.3311 LearningRate 0.1479 Epoch: 12 Global Step: 62500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:38:28,972-Speed 10486.99 samples/sec Loss 6.3207 LearningRate 0.1478 Epoch: 12 Global Step: 62510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:38:36,757-Speed 10526.09 samples/sec Loss 6.2902 LearningRate 0.1478 Epoch: 12 Global Step: 62520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:38:44,572-Speed 10483.50 samples/sec Loss 6.3002 LearningRate 0.1477 Epoch: 12 Global Step: 62530 Fp16 Grad Scale: 131072 Required: 9 hours 
Training: 2022-01-16 04:38:52,356-Speed 10525.97 samples/sec Loss 6.3025 LearningRate 0.1476 Epoch: 12 Global Step: 62540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:39:00,175-Speed 10478.30 samples/sec Loss 6.3759 LearningRate 0.1475 Epoch: 12 Global Step: 62550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:39:07,980-Speed 10498.13 samples/sec Loss 6.2805 LearningRate 0.1475 Epoch: 12 Global Step: 62560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:39:15,786-Speed 10495.84 samples/sec Loss 6.2806 LearningRate 0.1474 Epoch: 12 Global Step: 62570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:39:23,614-Speed 10465.87 samples/sec Loss 6.2548 LearningRate 0.1473 Epoch: 12 Global Step: 62580 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 04:39:31,410-Speed 10508.98 samples/sec Loss 6.2771 LearningRate 0.1472 Epoch: 12 Global Step: 62590 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 04:39:39,216-Speed 10497.30 samples/sec Loss 6.2633 LearningRate 0.1472 Epoch: 12 Global Step: 62600 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 04:39:47,007-Speed 10515.49 samples/sec Loss 6.2926 LearningRate 0.1471 Epoch: 12 Global Step: 62610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:39:54,830-Speed 10473.08 samples/sec Loss 6.2955 LearningRate 0.1470 Epoch: 12 Global Step: 62620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:40:02,662-Speed 10461.29 samples/sec Loss 6.2936 LearningRate 0.1470 Epoch: 12 Global Step: 62630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:40:10,523-Speed 10428.60 samples/sec Loss 6.2713 LearningRate 0.1469 Epoch: 12 Global Step: 62640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:40:18,329-Speed 10497.34 samples/sec Loss 6.3066 LearningRate 0.1468 Epoch: 12 Global Step: 62650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 
04:40:26,154-Speed 10470.10 samples/sec Loss 6.3096 LearningRate 0.1467 Epoch: 12 Global Step: 62660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:40:33,981-Speed 10470.31 samples/sec Loss 6.2970 LearningRate 0.1467 Epoch: 12 Global Step: 62670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:40:41,827-Speed 10442.44 samples/sec Loss 6.2683 LearningRate 0.1466 Epoch: 12 Global Step: 62680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:40:49,683-Speed 10428.55 samples/sec Loss 6.2270 LearningRate 0.1465 Epoch: 12 Global Step: 62690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:40:57,532-Speed 10439.47 samples/sec Loss 6.2574 LearningRate 0.1465 Epoch: 12 Global Step: 62700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:41:05,373-Speed 10449.87 samples/sec Loss 6.2772 LearningRate 0.1464 Epoch: 12 Global Step: 62710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:41:13,184-Speed 10488.51 samples/sec Loss 6.2903 LearningRate 0.1463 Epoch: 12 Global Step: 62720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:41:21,009-Speed 10470.69 samples/sec Loss 6.3178 LearningRate 0.1462 Epoch: 12 Global Step: 62730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:41:28,824-Speed 10483.87 samples/sec Loss 6.2894 LearningRate 0.1462 Epoch: 12 Global Step: 62740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:41:36,658-Speed 10457.82 samples/sec Loss 6.2888 LearningRate 0.1461 Epoch: 12 Global Step: 62750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:41:44,497-Speed 10452.61 samples/sec Loss 6.2740 LearningRate 0.1460 Epoch: 12 Global Step: 62760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:41:52,331-Speed 10457.57 samples/sec Loss 6.2803 LearningRate 0.1460 Epoch: 12 Global Step: 62770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:42:00,176-Speed 10444.37 
samples/sec Loss 6.2757 LearningRate 0.1459 Epoch: 12 Global Step: 62780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:42:08,005-Speed 10463.79 samples/sec Loss 6.2642 LearningRate 0.1458 Epoch: 12 Global Step: 62790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:42:15,830-Speed 10470.97 samples/sec Loss 6.2033 LearningRate 0.1457 Epoch: 12 Global Step: 62800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:42:23,666-Speed 10455.26 samples/sec Loss 6.2467 LearningRate 0.1457 Epoch: 12 Global Step: 62810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:42:31,504-Speed 10453.43 samples/sec Loss 6.2711 LearningRate 0.1456 Epoch: 12 Global Step: 62820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:42:39,392-Speed 10386.49 samples/sec Loss 6.2235 LearningRate 0.1455 Epoch: 12 Global Step: 62830 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 04:42:47,258-Speed 10415.81 samples/sec Loss 6.2386 LearningRate 0.1455 Epoch: 12 Global Step: 62840 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 04:42:55,095-Speed 10454.87 samples/sec Loss 6.2830 LearningRate 0.1454 Epoch: 12 Global Step: 62850 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 04:43:02,934-Speed 10450.45 samples/sec Loss 6.3175 LearningRate 0.1453 Epoch: 12 Global Step: 62860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:43:10,784-Speed 10437.60 samples/sec Loss 6.2526 LearningRate 0.1452 Epoch: 12 Global Step: 62870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:43:18,621-Speed 10454.88 samples/sec Loss 6.2316 LearningRate 0.1452 Epoch: 12 Global Step: 62880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:43:26,446-Speed 10469.79 samples/sec Loss 6.2482 LearningRate 0.1451 Epoch: 12 Global Step: 62890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:43:34,307-Speed 10422.03 samples/sec Loss 6.2281 
LearningRate 0.1450 Epoch: 12 Global Step: 62900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:43:42,115-Speed 10494.05 samples/sec Loss 6.2509 LearningRate 0.1450 Epoch: 12 Global Step: 62910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:43:49,953-Speed 10453.28 samples/sec Loss 6.2622 LearningRate 0.1449 Epoch: 12 Global Step: 62920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:43:57,752-Speed 10505.30 samples/sec Loss 6.2246 LearningRate 0.1448 Epoch: 12 Global Step: 62930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:44:05,556-Speed 10498.99 samples/sec Loss 6.2273 LearningRate 0.1448 Epoch: 12 Global Step: 62940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:44:13,333-Speed 10534.98 samples/sec Loss 6.2745 LearningRate 0.1447 Epoch: 12 Global Step: 62950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:44:21,147-Speed 10485.47 samples/sec Loss 6.2822 LearningRate 0.1446 Epoch: 12 Global Step: 62960 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 04:44:28,945-Speed 10506.24 samples/sec Loss 6.2118 LearningRate 0.1445 Epoch: 12 Global Step: 62970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:44:36,765-Speed 10476.58 samples/sec Loss 6.2637 LearningRate 0.1445 Epoch: 12 Global Step: 62980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:44:44,583-Speed 10480.66 samples/sec Loss 6.2823 LearningRate 0.1444 Epoch: 12 Global Step: 62990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:44:52,389-Speed 10496.23 samples/sec Loss 6.2111 LearningRate 0.1443 Epoch: 12 Global Step: 63000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:45:00,197-Speed 10491.96 samples/sec Loss 6.2363 LearningRate 0.1443 Epoch: 12 Global Step: 63010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:45:08,013-Speed 10482.14 samples/sec Loss 6.1829 LearningRate 0.1442 Epoch: 12 
Global Step: 63020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:45:15,811-Speed 10508.39 samples/sec Loss 6.2250 LearningRate 0.1441 Epoch: 12 Global Step: 63030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:45:23,594-Speed 10525.77 samples/sec Loss 6.2585 LearningRate 0.1440 Epoch: 12 Global Step: 63040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:45:31,386-Speed 10514.42 samples/sec Loss 6.1986 LearningRate 0.1440 Epoch: 12 Global Step: 63050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:45:39,186-Speed 10503.78 samples/sec Loss 6.1936 LearningRate 0.1439 Epoch: 12 Global Step: 63060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:45:46,999-Speed 10489.33 samples/sec Loss 6.1970 LearningRate 0.1438 Epoch: 12 Global Step: 63070 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 04:45:54,871-Speed 10407.30 samples/sec Loss 6.2010 LearningRate 0.1438 Epoch: 12 Global Step: 63080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:46:02,661-Speed 10517.70 samples/sec Loss 6.2527 LearningRate 0.1437 Epoch: 12 Global Step: 63090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:46:10,426-Speed 10551.78 samples/sec Loss 6.2123 LearningRate 0.1436 Epoch: 12 Global Step: 63100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:46:18,236-Speed 10489.40 samples/sec Loss 6.2054 LearningRate 0.1435 Epoch: 12 Global Step: 63110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:46:26,038-Speed 10502.45 samples/sec Loss 6.2097 LearningRate 0.1435 Epoch: 12 Global Step: 63120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:46:33,822-Speed 10525.49 samples/sec Loss 6.2475 LearningRate 0.1434 Epoch: 12 Global Step: 63130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:46:41,632-Speed 10490.56 samples/sec Loss 6.2481 LearningRate 0.1433 Epoch: 12 Global Step: 63140 Fp16 Grad 
Scale: 131072 Required: 9 hours Training: 2022-01-16 04:46:49,440-Speed 10492.65 samples/sec Loss 6.2092 LearningRate 0.1433 Epoch: 12 Global Step: 63150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:46:57,232-Speed 10515.22 samples/sec Loss 6.2260 LearningRate 0.1432 Epoch: 12 Global Step: 63160 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:47:05,013-Speed 10531.79 samples/sec Loss 6.2081 LearningRate 0.1431 Epoch: 12 Global Step: 63170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:47:12,802-Speed 10521.42 samples/sec Loss 6.1642 LearningRate 0.1431 Epoch: 12 Global Step: 63180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:47:20,588-Speed 10523.32 samples/sec Loss 6.2307 LearningRate 0.1430 Epoch: 12 Global Step: 63190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:47:28,398-Speed 10490.78 samples/sec Loss 6.1456 LearningRate 0.1429 Epoch: 12 Global Step: 63200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:47:36,210-Speed 10487.24 samples/sec Loss 6.2360 LearningRate 0.1428 Epoch: 12 Global Step: 63210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:47:44,139-Speed 10332.40 samples/sec Loss 6.2192 LearningRate 0.1428 Epoch: 12 Global Step: 63220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:47:51,935-Speed 10510.52 samples/sec Loss 6.2080 LearningRate 0.1427 Epoch: 12 Global Step: 63230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:47:59,714-Speed 10530.84 samples/sec Loss 6.1638 LearningRate 0.1426 Epoch: 12 Global Step: 63240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:48:07,515-Speed 10503.14 samples/sec Loss 6.1705 LearningRate 0.1426 Epoch: 12 Global Step: 63250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:48:15,300-Speed 10523.89 samples/sec Loss 6.1753 LearningRate 0.1425 Epoch: 12 Global Step: 63260 Fp16 Grad Scale: 65536 Required: 9 hours 
Training: 2022-01-16 04:48:23,086-Speed 10523.59 samples/sec Loss 6.2106 LearningRate 0.1424 Epoch: 12 Global Step: 63270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:48:30,884-Speed 10505.94 samples/sec Loss 6.2243 LearningRate 0.1423 Epoch: 12 Global Step: 63280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:48:38,694-Speed 10490.73 samples/sec Loss 6.1868 LearningRate 0.1423 Epoch: 12 Global Step: 63290 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:48:46,478-Speed 10526.24 samples/sec Loss 6.1886 LearningRate 0.1422 Epoch: 12 Global Step: 63300 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:48:54,255-Speed 10537.64 samples/sec Loss 6.2130 LearningRate 0.1421 Epoch: 12 Global Step: 63310 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:49:02,084-Speed 10464.92 samples/sec Loss 6.1962 LearningRate 0.1421 Epoch: 12 Global Step: 63320 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:49:09,894-Speed 10489.27 samples/sec Loss 6.2096 LearningRate 0.1420 Epoch: 12 Global Step: 63330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:49:17,703-Speed 10492.00 samples/sec Loss 6.1329 LearningRate 0.1419 Epoch: 12 Global Step: 63340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:49:25,521-Speed 10480.73 samples/sec Loss 6.1640 LearningRate 0.1419 Epoch: 12 Global Step: 63350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:49:33,305-Speed 10525.00 samples/sec Loss 6.2506 LearningRate 0.1418 Epoch: 12 Global Step: 63360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:49:41,117-Speed 10488.35 samples/sec Loss 6.1684 LearningRate 0.1417 Epoch: 12 Global Step: 63370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:49:48,910-Speed 10513.00 samples/sec Loss 6.1979 LearningRate 0.1416 Epoch: 12 Global Step: 63380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 
04:49:56,688-Speed 10533.97 samples/sec Loss 6.1906 LearningRate 0.1416 Epoch: 12 Global Step: 63390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:50:04,473-Speed 10524.65 samples/sec Loss 6.2121 LearningRate 0.1415 Epoch: 12 Global Step: 63400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:50:12,242-Speed 10545.87 samples/sec Loss 6.1724 LearningRate 0.1414 Epoch: 12 Global Step: 63410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:50:20,036-Speed 10511.42 samples/sec Loss 6.1894 LearningRate 0.1414 Epoch: 12 Global Step: 63420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:50:27,807-Speed 10543.28 samples/sec Loss 6.1814 LearningRate 0.1413 Epoch: 12 Global Step: 63430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:50:35,584-Speed 10535.15 samples/sec Loss 6.2155 LearningRate 0.1412 Epoch: 12 Global Step: 63440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:50:43,398-Speed 10484.28 samples/sec Loss 6.2167 LearningRate 0.1412 Epoch: 12 Global Step: 63450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:50:51,191-Speed 10514.60 samples/sec Loss 6.1460 LearningRate 0.1411 Epoch: 12 Global Step: 63460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:50:59,004-Speed 10486.56 samples/sec Loss 6.1414 LearningRate 0.1410 Epoch: 12 Global Step: 63470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:51:06,816-Speed 10487.46 samples/sec Loss 6.1797 LearningRate 0.1409 Epoch: 12 Global Step: 63480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:51:14,612-Speed 10509.88 samples/sec Loss 6.1418 LearningRate 0.1409 Epoch: 12 Global Step: 63490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:51:22,381-Speed 10545.83 samples/sec Loss 6.1628 LearningRate 0.1408 Epoch: 12 Global Step: 63500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:51:30,172-Speed 10515.74 samples/sec 
Loss 6.1574 LearningRate 0.1407 Epoch: 12 Global Step: 63510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:51:37,960-Speed 10520.36 samples/sec Loss 6.1872 LearningRate 0.1407 Epoch: 12 Global Step: 63520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:51:45,760-Speed 10504.10 samples/sec Loss 6.1596 LearningRate 0.1406 Epoch: 12 Global Step: 63530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:51:53,554-Speed 10512.36 samples/sec Loss 6.1895 LearningRate 0.1405 Epoch: 12 Global Step: 63540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:52:01,381-Speed 10468.24 samples/sec Loss 6.1835 LearningRate 0.1404 Epoch: 12 Global Step: 63550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:52:09,178-Speed 10507.78 samples/sec Loss 6.1553 LearningRate 0.1404 Epoch: 12 Global Step: 63560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:52:16,987-Speed 10491.91 samples/sec Loss 6.1232 LearningRate 0.1403 Epoch: 12 Global Step: 63570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:52:24,800-Speed 10486.23 samples/sec Loss 6.1247 LearningRate 0.1402 Epoch: 12 Global Step: 63580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:52:32,583-Speed 10526.83 samples/sec Loss 6.1554 LearningRate 0.1402 Epoch: 12 Global Step: 63590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:52:40,396-Speed 10487.92 samples/sec Loss 6.1185 LearningRate 0.1401 Epoch: 12 Global Step: 63600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:52:48,243-Speed 10440.61 samples/sec Loss 6.1149 LearningRate 0.1400 Epoch: 12 Global Step: 63610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:52:56,019-Speed 10536.68 samples/sec Loss 6.1751 LearningRate 0.1400 Epoch: 12 Global Step: 63620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:53:03,847-Speed 10470.18 samples/sec Loss 6.1740 LearningRate 0.1399 Epoch: 12 
Global Step: 63630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:53:11,647-Speed 10504.52 samples/sec Loss 6.1875 LearningRate 0.1398 Epoch: 12 Global Step: 63640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:53:19,415-Speed 10547.92 samples/sec Loss 6.2081 LearningRate 0.1398 Epoch: 12 Global Step: 63650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:53:27,212-Speed 10507.81 samples/sec Loss 6.1490 LearningRate 0.1397 Epoch: 12 Global Step: 63660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:53:34,993-Speed 10530.32 samples/sec Loss 6.1446 LearningRate 0.1396 Epoch: 12 Global Step: 63670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:53:42,802-Speed 10492.63 samples/sec Loss 6.1779 LearningRate 0.1395 Epoch: 12 Global Step: 63680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:53:50,578-Speed 10535.40 samples/sec Loss 6.1315 LearningRate 0.1395 Epoch: 12 Global Step: 63690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:53:58,368-Speed 10520.46 samples/sec Loss 6.1400 LearningRate 0.1394 Epoch: 12 Global Step: 63700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:54:06,176-Speed 10493.42 samples/sec Loss 6.1188 LearningRate 0.1393 Epoch: 12 Global Step: 63710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:54:13,948-Speed 10542.12 samples/sec Loss 6.1115 LearningRate 0.1393 Epoch: 12 Global Step: 63720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:54:21,733-Speed 10524.13 samples/sec Loss 6.1872 LearningRate 0.1392 Epoch: 12 Global Step: 63730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:54:29,512-Speed 10533.71 samples/sec Loss 6.1021 LearningRate 0.1391 Epoch: 12 Global Step: 63740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:54:37,344-Speed 10460.71 samples/sec Loss 6.1091 LearningRate 0.1391 Epoch: 12 Global Step: 63750 Fp16 Grad 
Scale: 131072 Required: 9 hours Training: 2022-01-16 04:54:45,141-Speed 10509.11 samples/sec Loss 6.1449 LearningRate 0.1390 Epoch: 12 Global Step: 63760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:54:52,929-Speed 10520.03 samples/sec Loss 6.1653 LearningRate 0.1389 Epoch: 12 Global Step: 63770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:55:00,753-Speed 10470.85 samples/sec Loss 6.1389 LearningRate 0.1388 Epoch: 12 Global Step: 63780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:55:08,570-Speed 10481.89 samples/sec Loss 6.1301 LearningRate 0.1388 Epoch: 12 Global Step: 63790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:55:16,425-Speed 10430.38 samples/sec Loss 6.1609 LearningRate 0.1387 Epoch: 12 Global Step: 63800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:55:24,214-Speed 10517.78 samples/sec Loss 6.1396 LearningRate 0.1386 Epoch: 12 Global Step: 63810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:55:32,009-Speed 10511.60 samples/sec Loss 6.1374 LearningRate 0.1386 Epoch: 12 Global Step: 63820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:55:39,803-Speed 10511.17 samples/sec Loss 6.1205 LearningRate 0.1385 Epoch: 12 Global Step: 63830 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 04:55:47,602-Speed 10505.74 samples/sec Loss 6.1224 LearningRate 0.1384 Epoch: 12 Global Step: 63840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:55:55,424-Speed 10481.19 samples/sec Loss 6.1392 LearningRate 0.1384 Epoch: 12 Global Step: 63850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:56:03,212-Speed 10519.63 samples/sec Loss 6.1266 LearningRate 0.1383 Epoch: 12 Global Step: 63860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:56:11,049-Speed 10454.03 samples/sec Loss 6.1209 LearningRate 0.1382 Epoch: 12 Global Step: 63870 Fp16 Grad Scale: 131072 Required: 9 hours 
Training: 2022-01-16 04:56:18,908-Speed 10425.44 samples/sec Loss 6.1032 LearningRate 0.1381 Epoch: 12 Global Step: 63880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:56:26,688-Speed 10531.24 samples/sec Loss 6.0810 LearningRate 0.1381 Epoch: 12 Global Step: 63890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:56:34,467-Speed 10532.32 samples/sec Loss 6.1386 LearningRate 0.1380 Epoch: 12 Global Step: 63900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:56:42,256-Speed 10518.44 samples/sec Loss 6.0669 LearningRate 0.1379 Epoch: 12 Global Step: 63910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:56:50,028-Speed 10543.25 samples/sec Loss 6.0648 LearningRate 0.1379 Epoch: 12 Global Step: 63920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:56:57,833-Speed 10497.35 samples/sec Loss 6.1099 LearningRate 0.1378 Epoch: 12 Global Step: 63930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:57:05,636-Speed 10500.70 samples/sec Loss 6.0695 LearningRate 0.1377 Epoch: 12 Global Step: 63940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:57:13,416-Speed 10530.98 samples/sec Loss 6.1316 LearningRate 0.1377 Epoch: 12 Global Step: 63950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:57:21,263-Speed 10441.20 samples/sec Loss 6.1365 LearningRate 0.1376 Epoch: 12 Global Step: 63960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:57:29,050-Speed 10520.63 samples/sec Loss 6.1020 LearningRate 0.1375 Epoch: 12 Global Step: 63970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:57:36,845-Speed 10510.98 samples/sec Loss 6.1292 LearningRate 0.1375 Epoch: 12 Global Step: 63980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:57:44,663-Speed 10480.37 samples/sec Loss 6.0918 LearningRate 0.1374 Epoch: 12 Global Step: 63990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:57:52,453-Speed 
10516.73 samples/sec Loss 6.0977 LearningRate 0.1373 Epoch: 12 Global Step: 64000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:58:00,245-Speed 10514.57 samples/sec Loss 6.1059 LearningRate 0.1372 Epoch: 12 Global Step: 64010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:58:08,070-Speed 10471.25 samples/sec Loss 6.0809 LearningRate 0.1372 Epoch: 12 Global Step: 64020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:58:15,855-Speed 10523.26 samples/sec Loss 6.0827 LearningRate 0.1371 Epoch: 12 Global Step: 64030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:58:23,642-Speed 10522.05 samples/sec Loss 6.0850 LearningRate 0.1370 Epoch: 12 Global Step: 64040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:58:31,486-Speed 10445.13 samples/sec Loss 6.1067 LearningRate 0.1370 Epoch: 12 Global Step: 64050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:58:39,325-Speed 10451.03 samples/sec Loss 6.0893 LearningRate 0.1369 Epoch: 12 Global Step: 64060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:58:47,123-Speed 10507.93 samples/sec Loss 6.0949 LearningRate 0.1368 Epoch: 12 Global Step: 64070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:58:54,918-Speed 10510.61 samples/sec Loss 6.0908 LearningRate 0.1368 Epoch: 12 Global Step: 64080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:59:02,743-Speed 10471.24 samples/sec Loss 6.0826 LearningRate 0.1367 Epoch: 12 Global Step: 64090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:59:10,559-Speed 10482.33 samples/sec Loss 6.1038 LearningRate 0.1366 Epoch: 12 Global Step: 64100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:59:18,353-Speed 10511.87 samples/sec Loss 6.0823 LearningRate 0.1366 Epoch: 12 Global Step: 64110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:59:26,173-Speed 10477.71 samples/sec Loss 6.0955 
LearningRate 0.1365 Epoch: 12 Global Step: 64120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:59:33,979-Speed 10496.39 samples/sec Loss 6.0912 LearningRate 0.1364 Epoch: 12 Global Step: 64130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:59:41,781-Speed 10500.26 samples/sec Loss 6.0844 LearningRate 0.1363 Epoch: 12 Global Step: 64140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 04:59:49,569-Speed 10519.87 samples/sec Loss 6.0857 LearningRate 0.1363 Epoch: 12 Global Step: 64150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 04:59:57,341-Speed 10542.50 samples/sec Loss 6.0514 LearningRate 0.1362 Epoch: 12 Global Step: 64160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:00:05,120-Speed 10533.78 samples/sec Loss 6.0375 LearningRate 0.1361 Epoch: 12 Global Step: 64170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:00:12,909-Speed 10517.32 samples/sec Loss 6.0705 LearningRate 0.1361 Epoch: 12 Global Step: 64180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:00:20,722-Speed 10486.70 samples/sec Loss 6.0105 LearningRate 0.1360 Epoch: 12 Global Step: 64190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:00:28,515-Speed 10513.44 samples/sec Loss 6.0717 LearningRate 0.1359 Epoch: 12 Global Step: 64200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:00:36,332-Speed 10483.01 samples/sec Loss 6.0596 LearningRate 0.1359 Epoch: 12 Global Step: 64210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:00:44,125-Speed 10512.76 samples/sec Loss 6.0578 LearningRate 0.1358 Epoch: 12 Global Step: 64220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:00:51,915-Speed 10517.10 samples/sec Loss 6.0119 LearningRate 0.1357 Epoch: 12 Global Step: 64230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:00:59,698-Speed 10526.99 samples/sec Loss 6.0835 LearningRate 0.1357 Epoch: 12 Global Step: 
64240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:01:07,479-Speed 10529.67 samples/sec Loss 6.0380 LearningRate 0.1356 Epoch: 12 Global Step: 64250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:01:15,272-Speed 10514.11 samples/sec Loss 6.1047 LearningRate 0.1355 Epoch: 12 Global Step: 64260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:01:23,101-Speed 10463.47 samples/sec Loss 6.0451 LearningRate 0.1355 Epoch: 12 Global Step: 64270 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:01:30,905-Speed 10499.55 samples/sec Loss 6.0857 LearningRate 0.1354 Epoch: 12 Global Step: 64280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:01:38,698-Speed 10512.67 samples/sec Loss 6.0275 LearningRate 0.1353 Epoch: 12 Global Step: 64290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:01:46,485-Speed 10521.78 samples/sec Loss 6.0298 LearningRate 0.1352 Epoch: 12 Global Step: 64300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:01:54,282-Speed 10508.39 samples/sec Loss 6.0550 LearningRate 0.1352 Epoch: 12 Global Step: 64310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:02:02,124-Speed 10447.80 samples/sec Loss 6.0070 LearningRate 0.1351 Epoch: 12 Global Step: 64320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:02:09,928-Speed 10498.25 samples/sec Loss 6.0489 LearningRate 0.1350 Epoch: 12 Global Step: 64330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:02:17,739-Speed 10489.53 samples/sec Loss 6.0755 LearningRate 0.1350 Epoch: 12 Global Step: 64340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:02:25,520-Speed 10529.89 samples/sec Loss 6.0622 LearningRate 0.1349 Epoch: 12 Global Step: 64350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:02:33,305-Speed 10523.86 samples/sec Loss 6.0592 LearningRate 0.1348 Epoch: 12 Global Step: 64360 Fp16 Grad Scale: 65536 Required: 9 
hours Training: 2022-01-16 05:02:41,123-Speed 10478.99 samples/sec Loss 6.0032 LearningRate 0.1348 Epoch: 12 Global Step: 64370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:02:48,929-Speed 10495.93 samples/sec Loss 5.9985 LearningRate 0.1347 Epoch: 12 Global Step: 64380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:02:56,744-Speed 10484.98 samples/sec Loss 6.0601 LearningRate 0.1346 Epoch: 12 Global Step: 64390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:03:04,554-Speed 10491.36 samples/sec Loss 6.0438 LearningRate 0.1346 Epoch: 12 Global Step: 64400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:03:12,364-Speed 10490.81 samples/sec Loss 6.0247 LearningRate 0.1345 Epoch: 12 Global Step: 64410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:03:20,154-Speed 10517.67 samples/sec Loss 6.0475 LearningRate 0.1344 Epoch: 12 Global Step: 64420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:03:28,006-Speed 10435.97 samples/sec Loss 6.0318 LearningRate 0.1344 Epoch: 12 Global Step: 64430 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:03:35,797-Speed 10515.52 samples/sec Loss 6.0451 LearningRate 0.1343 Epoch: 12 Global Step: 64440 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:03:43,610-Speed 10490.60 samples/sec Loss 6.0393 LearningRate 0.1342 Epoch: 12 Global Step: 64450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:03:51,390-Speed 10531.30 samples/sec Loss 6.0488 LearningRate 0.1342 Epoch: 12 Global Step: 64460 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:03:59,198-Speed 10493.44 samples/sec Loss 6.0299 LearningRate 0.1341 Epoch: 12 Global Step: 64470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:04:06,987-Speed 10519.51 samples/sec Loss 6.0530 LearningRate 0.1340 Epoch: 12 Global Step: 64480 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 
05:04:14,773-Speed 10522.37 samples/sec Loss 6.0033 LearningRate 0.1339 Epoch: 12 Global Step: 64490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:04:22,553-Speed 10534.30 samples/sec Loss 6.0535 LearningRate 0.1339 Epoch: 12 Global Step: 64500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:04:30,368-Speed 10484.46 samples/sec Loss 5.9896 LearningRate 0.1338 Epoch: 12 Global Step: 64510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:04:38,148-Speed 10530.78 samples/sec Loss 6.0428 LearningRate 0.1337 Epoch: 12 Global Step: 64520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:04:45,927-Speed 10533.37 samples/sec Loss 6.0391 LearningRate 0.1337 Epoch: 12 Global Step: 64530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:04:53,722-Speed 10510.27 samples/sec Loss 6.0098 LearningRate 0.1336 Epoch: 12 Global Step: 64540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:05:01,514-Speed 10515.20 samples/sec Loss 6.0183 LearningRate 0.1335 Epoch: 12 Global Step: 64550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:05:09,386-Speed 10406.62 samples/sec Loss 6.0582 LearningRate 0.1335 Epoch: 12 Global Step: 64560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:05:17,171-Speed 10524.59 samples/sec Loss 6.0561 LearningRate 0.1334 Epoch: 12 Global Step: 64570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:05:24,991-Speed 10477.12 samples/sec Loss 6.0004 LearningRate 0.1333 Epoch: 12 Global Step: 64580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:05:32,759-Speed 10547.64 samples/sec Loss 6.0507 LearningRate 0.1333 Epoch: 12 Global Step: 64590 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 05:05:40,550-Speed 10515.37 samples/sec Loss 5.9773 LearningRate 0.1332 Epoch: 12 Global Step: 64600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:05:48,353-Speed 10500.40 
samples/sec Loss 6.0295 LearningRate 0.1331 Epoch: 12 Global Step: 64610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:05:56,137-Speed 10524.87 samples/sec Loss 6.0029 LearningRate 0.1331 Epoch: 12 Global Step: 64620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:06:03,949-Speed 10488.40 samples/sec Loss 6.0145 LearningRate 0.1330 Epoch: 12 Global Step: 64630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:06:11,734-Speed 10524.49 samples/sec Loss 6.0100 LearningRate 0.1329 Epoch: 12 Global Step: 64640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:06:19,528-Speed 10513.43 samples/sec Loss 5.9975 LearningRate 0.1329 Epoch: 12 Global Step: 64650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:06:27,322-Speed 10511.24 samples/sec Loss 6.0047 LearningRate 0.1328 Epoch: 12 Global Step: 64660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:06:35,105-Speed 10527.61 samples/sec Loss 6.0161 LearningRate 0.1327 Epoch: 12 Global Step: 64670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:06:42,936-Speed 10462.33 samples/sec Loss 6.0165 LearningRate 0.1327 Epoch: 12 Global Step: 64680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:06:50,740-Speed 10497.96 samples/sec Loss 6.0192 LearningRate 0.1326 Epoch: 12 Global Step: 64690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:06:58,544-Speed 10499.02 samples/sec Loss 6.0114 LearningRate 0.1325 Epoch: 12 Global Step: 64700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:07:06,386-Speed 10447.74 samples/sec Loss 6.0289 LearningRate 0.1324 Epoch: 12 Global Step: 64710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:07:14,175-Speed 10518.68 samples/sec Loss 5.9980 LearningRate 0.1324 Epoch: 12 Global Step: 64720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:07:21,985-Speed 10490.12 samples/sec Loss 5.9790 LearningRate 
0.1323 Epoch: 12 Global Step: 64730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:07:29,779-Speed 10513.05 samples/sec Loss 6.0014 LearningRate 0.1322 Epoch: 12 Global Step: 64740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:07:37,578-Speed 10505.07 samples/sec Loss 6.0380 LearningRate 0.1322 Epoch: 12 Global Step: 64750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:07:45,363-Speed 10524.78 samples/sec Loss 5.9734 LearningRate 0.1321 Epoch: 12 Global Step: 64760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:07:53,177-Speed 10485.06 samples/sec Loss 5.9819 LearningRate 0.1320 Epoch: 12 Global Step: 64770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:08:00,968-Speed 10517.43 samples/sec Loss 6.0399 LearningRate 0.1320 Epoch: 12 Global Step: 64780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:08:08,783-Speed 10483.88 samples/sec Loss 6.0186 LearningRate 0.1319 Epoch: 12 Global Step: 64790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:08:16,572-Speed 10519.25 samples/sec Loss 5.9583 LearningRate 0.1318 Epoch: 12 Global Step: 64800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:08:24,366-Speed 10512.66 samples/sec Loss 6.0165 LearningRate 0.1318 Epoch: 12 Global Step: 64810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:08:32,164-Speed 10506.47 samples/sec Loss 5.9474 LearningRate 0.1317 Epoch: 12 Global Step: 64820 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 05:08:39,966-Speed 10503.51 samples/sec Loss 5.9685 LearningRate 0.1316 Epoch: 12 Global Step: 64830 Fp16 Grad Scale: 262144 Required: 9 hours Training: 2022-01-16 05:08:47,748-Speed 10528.04 samples/sec Loss 5.9828 LearningRate 0.1316 Epoch: 12 Global Step: 64840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:08:55,534-Speed 10522.90 samples/sec Loss 5.9595 LearningRate 0.1315 Epoch: 12 Global Step: 
64850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:09:03,370-Speed 10456.46 samples/sec Loss 6.0058 LearningRate 0.1314 Epoch: 12 Global Step: 64860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:09:11,152-Speed 10528.59 samples/sec Loss 5.9467 LearningRate 0.1314 Epoch: 12 Global Step: 64870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:09:18,928-Speed 10537.07 samples/sec Loss 5.9577 LearningRate 0.1313 Epoch: 12 Global Step: 64880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:09:26,731-Speed 10499.89 samples/sec Loss 5.9242 LearningRate 0.1312 Epoch: 12 Global Step: 64890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:09:34,514-Speed 10526.37 samples/sec Loss 5.9989 LearningRate 0.1312 Epoch: 12 Global Step: 64900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:09:42,295-Speed 10529.92 samples/sec Loss 6.0147 LearningRate 0.1311 Epoch: 12 Global Step: 64910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:09:50,102-Speed 10494.22 samples/sec Loss 5.9469 LearningRate 0.1310 Epoch: 12 Global Step: 64920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:09:57,923-Speed 10475.98 samples/sec Loss 5.9468 LearningRate 0.1310 Epoch: 12 Global Step: 64930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:10:05,703-Speed 10530.66 samples/sec Loss 5.9966 LearningRate 0.1309 Epoch: 12 Global Step: 64940 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:10:13,487-Speed 10525.38 samples/sec Loss 5.9747 LearningRate 0.1308 Epoch: 12 Global Step: 64950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:10:21,267-Speed 10531.17 samples/sec Loss 5.9772 LearningRate 0.1308 Epoch: 12 Global Step: 64960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-01-16 05:10:29,042-Speed 10537.42 samples/sec Loss 5.9289 LearningRate 0.1307 Epoch: 12 Global Step: 64970 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-01-16 05:10:36,837-Speed 10512.08 samples/sec Loss 5.9258 LearningRate 0.1306 Epoch: 12 Global Step: 64980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:10:44,626-Speed 10518.81 samples/sec Loss 5.9696 LearningRate 0.1306 Epoch: 12 Global Step: 64990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:10:52,425-Speed 10506.05 samples/sec Loss 5.9915 LearningRate 0.1305 Epoch: 12 Global Step: 65000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:11:00,213-Speed 10519.53 samples/sec Loss 6.0320 LearningRate 0.1304 Epoch: 12 Global Step: 65010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:11:07,998-Speed 10524.85 samples/sec Loss 5.9960 LearningRate 0.1303 Epoch: 12 Global Step: 65020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:11:15,780-Speed 10528.02 samples/sec Loss 5.9837 LearningRate 0.1303 Epoch: 12 Global Step: 65030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-01-16 05:11:23,591-Speed 10493.88 samples/sec Loss 5.9684 LearningRate 0.1302 Epoch: 12 Global Step: 65040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:11:31,375-Speed 10525.33 samples/sec Loss 5.9711 LearningRate 0.1301 Epoch: 12 Global Step: 65050 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-16 05:11:39,177-Speed 10500.87 samples/sec Loss 5.9305 LearningRate 0.1301 Epoch: 12 Global Step: 65060 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-16 05:11:46,968-Speed 10525.84 samples/sec Loss 5.9615 LearningRate 0.1300 Epoch: 12 Global Step: 65070 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-16 05:11:54,792-Speed 10472.21 samples/sec Loss 5.9423 LearningRate 0.1299 Epoch: 12 Global Step: 65080 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-16 05:12:02,590-Speed 10506.04 samples/sec Loss 5.9558 LearningRate 0.1299 Epoch: 12 Global Step: 65090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-16 
05:12:10,353-Speed 10554.65 samples/sec Loss 5.9657 LearningRate 0.1298 Epoch: 12 Global Step: 65100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-16 05:12:18,123-Speed 10544.64 samples/sec Loss 5.9846 LearningRate 0.1297 Epoch: 12 Global Step: 65110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-16 05:12:25,918-Speed 10510.68 samples/sec Loss 5.9791 LearningRate 0.1297 Epoch: 12 Global Step: 65120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-16 05:12:33,697-Speed 10532.53 samples/sec Loss 5.9658 LearningRate 0.1296 Epoch: 12 Global Step: 65130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-16 05:12:41,513-Speed 10481.82 samples/sec Loss 5.9378 LearningRate 0.1295 Epoch: 12 Global Step: 65140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-16 05:12:49,276-Speed 10555.08 samples/sec Loss 5.9593 LearningRate 0.1295 Epoch: 12 Global Step: 65150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:12:57,078-Speed 10500.90 samples/sec Loss 5.9467 LearningRate 0.1294 Epoch: 12 Global Step: 65160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:13:04,873-Speed 10510.80 samples/sec Loss 5.9186 LearningRate 0.1293 Epoch: 12 Global Step: 65170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:13:12,691-Speed 10479.66 samples/sec Loss 5.9428 LearningRate 0.1293 Epoch: 12 Global Step: 65180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:13:20,479-Speed 10520.26 samples/sec Loss 5.9184 LearningRate 0.1292 Epoch: 12 Global Step: 65190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:13:28,250-Speed 10543.32 samples/sec Loss 5.8943 LearningRate 0.1291 Epoch: 12 Global Step: 65200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:13:36,053-Speed 10499.57 samples/sec Loss 5.9552 LearningRate 0.1291 Epoch: 12 Global Step: 65210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:13:43,844-Speed 10515.89 samples/sec 
Loss 5.9885 LearningRate 0.1290 Epoch: 12 Global Step: 65220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:13:51,658-Speed 10486.66 samples/sec Loss 5.9540 LearningRate 0.1289 Epoch: 12 Global Step: 65230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:13:59,450-Speed 10514.86 samples/sec Loss 5.9206 LearningRate 0.1289 Epoch: 12 Global Step: 65240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:14:07,268-Speed 10479.61 samples/sec Loss 5.8961 LearningRate 0.1288 Epoch: 12 Global Step: 65250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:14:15,076-Speed 10493.76 samples/sec Loss 5.9172 LearningRate 0.1287 Epoch: 12 Global Step: 65260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:14:22,873-Speed 10506.84 samples/sec Loss 5.9217 LearningRate 0.1287 Epoch: 12 Global Step: 65270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:14:30,676-Speed 10501.18 samples/sec Loss 5.9323 LearningRate 0.1286 Epoch: 12 Global Step: 65280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:14:38,476-Speed 10503.61 samples/sec Loss 5.9294 LearningRate 0.1285 Epoch: 12 Global Step: 65290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:14:46,260-Speed 10526.41 samples/sec Loss 5.9465 LearningRate 0.1285 Epoch: 12 Global Step: 65300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:14:54,043-Speed 10527.30 samples/sec Loss 5.9084 LearningRate 0.1284 Epoch: 12 Global Step: 65310 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-16 05:15:01,835-Speed 10513.85 samples/sec Loss 5.9278 LearningRate 0.1283 Epoch: 12 Global Step: 65320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-16 05:15:09,618-Speed 10527.82 samples/sec Loss 5.8987 LearningRate 0.1283 Epoch: 12 Global Step: 65330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-16 05:15:17,402-Speed 10525.00 samples/sec Loss 5.9746 LearningRate 0.1282 
Epoch: 12 Global Step: 65340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-16 05:15:25,182-Speed 10531.02 samples/sec Loss 5.8508 LearningRate 0.1281 Epoch: 12 Global Step: 65350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-16 05:15:33,000-Speed 10480.02 samples/sec Loss 5.8846 LearningRate 0.1281 Epoch: 12 Global Step: 65360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-16 05:15:40,772-Speed 10541.13 samples/sec Loss 5.8985 LearningRate 0.1280 Epoch: 12 Global Step: 65370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-16 05:15:48,598-Speed 10469.41 samples/sec Loss 5.9006 LearningRate 0.1279 Epoch: 12 Global Step: 65380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-16 05:15:56,370-Speed 10542.45 samples/sec Loss 5.9523 LearningRate 0.1279 Epoch: 12 Global Step: 65390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-16 05:16:04,152-Speed 10527.14 samples/sec Loss 5.9456 LearningRate 0.1278 Epoch: 12 Global Step: 65400 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-01-16 05:16:11,952-Speed 10503.97 samples/sec Loss 5.9011 LearningRate 0.1277 Epoch: 12 Global Step: 65410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:16:19,737-Speed 10525.64 samples/sec Loss 5.8576 LearningRate 0.1277 Epoch: 12 Global Step: 65420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:16:27,537-Speed 10503.39 samples/sec Loss 5.9320 LearningRate 0.1276 Epoch: 12 Global Step: 65430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:16:35,344-Speed 10494.41 samples/sec Loss 5.9243 LearningRate 0.1275 Epoch: 12 Global Step: 65440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:16:43,168-Speed 10472.77 samples/sec Loss 5.9090 LearningRate 0.1275 Epoch: 12 Global Step: 65450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:16:50,935-Speed 10547.36 samples/sec Loss 5.9246 LearningRate 0.1274 Epoch: 12 Global Step: 65460 Fp16 Grad 
Scale: 65536 Required: 8 hours Training: 2022-01-16 05:16:58,707-Speed 10541.61 samples/sec Loss 5.8772 LearningRate 0.1273 Epoch: 12 Global Step: 65470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:17:06,512-Speed 10498.16 samples/sec Loss 5.9071 LearningRate 0.1273 Epoch: 12 Global Step: 65480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:17:14,286-Speed 10538.76 samples/sec Loss 5.9144 LearningRate 0.1272 Epoch: 12 Global Step: 65490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:17:22,072-Speed 10523.15 samples/sec Loss 5.8499 LearningRate 0.1271 Epoch: 12 Global Step: 65500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:17:29,854-Speed 10528.17 samples/sec Loss 5.8761 LearningRate 0.1271 Epoch: 12 Global Step: 65510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:17:37,667-Speed 10486.83 samples/sec Loss 5.8491 LearningRate 0.1270 Epoch: 12 Global Step: 65520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:17:45,468-Speed 10502.37 samples/sec Loss 5.8770 LearningRate 0.1269 Epoch: 12 Global Step: 65530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:17:53,272-Speed 10499.57 samples/sec Loss 5.9002 LearningRate 0.1269 Epoch: 12 Global Step: 65540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:18:01,044-Speed 10545.46 samples/sec Loss 5.9235 LearningRate 0.1268 Epoch: 12 Global Step: 65550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:18:08,859-Speed 10483.28 samples/sec Loss 5.9033 LearningRate 0.1267 Epoch: 12 Global Step: 65560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:18:16,645-Speed 10525.28 samples/sec Loss 5.8726 LearningRate 0.1267 Epoch: 12 Global Step: 65570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:18:24,437-Speed 10514.98 samples/sec Loss 5.8942 LearningRate 0.1266 Epoch: 12 Global Step: 65580 Fp16 Grad Scale: 131072 Required: 8 hours 
Training: 2022-01-16 05:18:32,230-Speed 10513.83 samples/sec Loss 5.8772 LearningRate 0.1265 Epoch: 12 Global Step: 65590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:18:40,032-Speed 10501.75 samples/sec Loss 5.8662 LearningRate 0.1265 Epoch: 12 Global Step: 65600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:18:47,833-Speed 10503.02 samples/sec Loss 5.8679 LearningRate 0.1264 Epoch: 12 Global Step: 65610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:18:55,618-Speed 10524.75 samples/sec Loss 5.8525 LearningRate 0.1263 Epoch: 12 Global Step: 65620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:19:03,417-Speed 10504.73 samples/sec Loss 5.8735 LearningRate 0.1263 Epoch: 12 Global Step: 65630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:19:11,189-Speed 10541.99 samples/sec Loss 5.8913 LearningRate 0.1262 Epoch: 12 Global Step: 65640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:19:18,946-Speed 10562.30 samples/sec Loss 5.8951 LearningRate 0.1261 Epoch: 12 Global Step: 65650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:19:26,755-Speed 10492.59 samples/sec Loss 5.8956 LearningRate 0.1261 Epoch: 12 Global Step: 65660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:19:34,546-Speed 10515.71 samples/sec Loss 5.8509 LearningRate 0.1260 Epoch: 12 Global Step: 65670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:19:42,337-Speed 10517.22 samples/sec Loss 5.8387 LearningRate 0.1259 Epoch: 12 Global Step: 65680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:19:50,141-Speed 10499.03 samples/sec Loss 5.8978 LearningRate 0.1259 Epoch: 12 Global Step: 65690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:19:57,985-Speed 10444.72 samples/sec Loss 5.8764 LearningRate 0.1258 Epoch: 12 Global Step: 65700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:20:05,786-Speed 
10502.50 samples/sec Loss 5.8714 LearningRate 0.1257 Epoch: 12 Global Step: 65710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:20:13,567-Speed 10530.30 samples/sec Loss 5.8565 LearningRate 0.1257 Epoch: 12 Global Step: 65720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:20:21,344-Speed 10535.59 samples/sec Loss 5.8254 LearningRate 0.1256 Epoch: 12 Global Step: 65730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:20:29,112-Speed 10550.86 samples/sec Loss 5.8556 LearningRate 0.1255 Epoch: 12 Global Step: 65740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:20:36,915-Speed 10500.01 samples/sec Loss 5.8356 LearningRate 0.1255 Epoch: 12 Global Step: 65750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:20:44,748-Speed 10459.25 samples/sec Loss 5.8481 LearningRate 0.1254 Epoch: 12 Global Step: 65760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:20:52,540-Speed 10516.37 samples/sec Loss 5.8451 LearningRate 0.1253 Epoch: 12 Global Step: 65770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:21:00,349-Speed 10491.39 samples/sec Loss 5.8724 LearningRate 0.1253 Epoch: 12 Global Step: 65780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:21:08,179-Speed 10463.47 samples/sec Loss 5.8834 LearningRate 0.1252 Epoch: 12 Global Step: 65790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:21:15,969-Speed 10518.52 samples/sec Loss 5.8354 LearningRate 0.1251 Epoch: 12 Global Step: 65800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:21:23,762-Speed 10511.97 samples/sec Loss 5.8355 LearningRate 0.1251 Epoch: 12 Global Step: 65810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:21:31,552-Speed 10517.60 samples/sec Loss 5.8448 LearningRate 0.1250 Epoch: 12 Global Step: 65820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:21:39,337-Speed 10525.03 samples/sec Loss 5.8511 
LearningRate 0.1249 Epoch: 12 Global Step: 65830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:21:47,156-Speed 10479.00 samples/sec Loss 5.8351 LearningRate 0.1249 Epoch: 12 Global Step: 65840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:21:54,950-Speed 10511.37 samples/sec Loss 5.8347 LearningRate 0.1248 Epoch: 12 Global Step: 65850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:22:02,745-Speed 10511.27 samples/sec Loss 5.8494 LearningRate 0.1247 Epoch: 12 Global Step: 65860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:22:10,523-Speed 10534.37 samples/sec Loss 5.8619 LearningRate 0.1247 Epoch: 12 Global Step: 65870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:22:18,315-Speed 10514.04 samples/sec Loss 5.8475 LearningRate 0.1246 Epoch: 12 Global Step: 65880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:22:26,116-Speed 10502.67 samples/sec Loss 5.8276 LearningRate 0.1245 Epoch: 12 Global Step: 65890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:22:33,907-Speed 10516.39 samples/sec Loss 5.8505 LearningRate 0.1245 Epoch: 12 Global Step: 65900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:22:41,683-Speed 10536.27 samples/sec Loss 5.8327 LearningRate 0.1244 Epoch: 12 Global Step: 65910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:22:49,466-Speed 10526.11 samples/sec Loss 5.8314 LearningRate 0.1243 Epoch: 12 Global Step: 65920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:22:57,250-Speed 10526.69 samples/sec Loss 5.8242 LearningRate 0.1243 Epoch: 12 Global Step: 65930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:23:05,060-Speed 10489.70 samples/sec Loss 5.8240 LearningRate 0.1242 Epoch: 12 Global Step: 65940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:23:12,885-Speed 10471.13 samples/sec Loss 5.8368 LearningRate 0.1242 Epoch: 12 Global Step: 
65950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:23:20,682-Speed 10508.50 samples/sec Loss 5.8275 LearningRate 0.1241 Epoch: 12 Global Step: 65960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:23:28,474-Speed 10514.89 samples/sec Loss 5.8392 LearningRate 0.1240 Epoch: 12 Global Step: 65970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:23:36,231-Speed 10561.97 samples/sec Loss 5.8416 LearningRate 0.1240 Epoch: 12 Global Step: 65980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:23:44,019-Speed 10519.68 samples/sec Loss 5.8254 LearningRate 0.1239 Epoch: 12 Global Step: 65990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:23:51,829-Speed 10490.35 samples/sec Loss 5.7885 LearningRate 0.1238 Epoch: 12 Global Step: 66000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:23:59,662-Speed 10459.87 samples/sec Loss 5.7909 LearningRate 0.1238 Epoch: 12 Global Step: 66010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:24:07,444-Speed 10528.63 samples/sec Loss 5.8041 LearningRate 0.1237 Epoch: 12 Global Step: 66020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:24:15,238-Speed 10512.00 samples/sec Loss 5.7890 LearningRate 0.1236 Epoch: 12 Global Step: 66030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:24:23,020-Speed 10527.88 samples/sec Loss 5.7890 LearningRate 0.1236 Epoch: 12 Global Step: 66040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:24:30,793-Speed 10539.86 samples/sec Loss 5.8295 LearningRate 0.1235 Epoch: 12 Global Step: 66050 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 05:24:38,632-Speed 10452.31 samples/sec Loss 5.7994 LearningRate 0.1234 Epoch: 12 Global Step: 66060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:24:46,441-Speed 10492.29 samples/sec Loss 5.8264 LearningRate 0.1234 Epoch: 12 Global Step: 66070 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-01-16 05:24:54,245-Speed 10498.29 samples/sec Loss 5.8265 LearningRate 0.1233 Epoch: 12 Global Step: 66080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:25:02,078-Speed 10461.92 samples/sec Loss 5.7776 LearningRate 0.1232 Epoch: 12 Global Step: 66090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:25:09,879-Speed 10503.03 samples/sec Loss 5.8077 LearningRate 0.1232 Epoch: 12 Global Step: 66100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:25:17,687-Speed 10493.46 samples/sec Loss 5.7849 LearningRate 0.1231 Epoch: 12 Global Step: 66110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:25:25,492-Speed 10497.56 samples/sec Loss 5.7864 LearningRate 0.1230 Epoch: 12 Global Step: 66120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:25:33,290-Speed 10506.10 samples/sec Loss 5.7867 LearningRate 0.1230 Epoch: 12 Global Step: 66130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:25:41,079-Speed 10520.23 samples/sec Loss 5.7808 LearningRate 0.1229 Epoch: 12 Global Step: 66140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:25:48,857-Speed 10532.44 samples/sec Loss 5.7935 LearningRate 0.1228 Epoch: 12 Global Step: 66150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:25:56,671-Speed 10491.45 samples/sec Loss 5.8101 LearningRate 0.1228 Epoch: 12 Global Step: 66160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:26:04,476-Speed 10498.18 samples/sec Loss 5.8387 LearningRate 0.1227 Epoch: 12 Global Step: 66170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:26:12,277-Speed 10502.34 samples/sec Loss 5.8243 LearningRate 0.1226 Epoch: 12 Global Step: 66180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:26:20,099-Speed 10473.92 samples/sec Loss 5.7999 LearningRate 0.1226 Epoch: 12 Global Step: 66190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 
05:26:27,930-Speed 10462.48 samples/sec Loss 5.7965 LearningRate 0.1225 Epoch: 12 Global Step: 66200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:26:35,782-Speed 10435.30 samples/sec Loss 5.7953 LearningRate 0.1224 Epoch: 12 Global Step: 66210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:26:43,638-Speed 10428.23 samples/sec Loss 5.8199 LearningRate 0.1224 Epoch: 12 Global Step: 66220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:26:51,426-Speed 10520.77 samples/sec Loss 5.8081 LearningRate 0.1223 Epoch: 12 Global Step: 66230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:26:59,214-Speed 10519.91 samples/sec Loss 5.7971 LearningRate 0.1223 Epoch: 12 Global Step: 66240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:27:07,010-Speed 10510.61 samples/sec Loss 5.7873 LearningRate 0.1222 Epoch: 12 Global Step: 66250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:27:14,862-Speed 10433.47 samples/sec Loss 5.7617 LearningRate 0.1221 Epoch: 12 Global Step: 66260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:27:22,694-Speed 10460.88 samples/sec Loss 5.7909 LearningRate 0.1221 Epoch: 12 Global Step: 66270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:27:30,498-Speed 10499.74 samples/sec Loss 5.8018 LearningRate 0.1220 Epoch: 12 Global Step: 66280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:27:38,321-Speed 10473.49 samples/sec Loss 5.7524 LearningRate 0.1219 Epoch: 12 Global Step: 66290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:27:46,122-Speed 10502.52 samples/sec Loss 5.7567 LearningRate 0.1219 Epoch: 12 Global Step: 66300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:27:53,936-Speed 10484.12 samples/sec Loss 5.8009 LearningRate 0.1218 Epoch: 12 Global Step: 66310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:28:01,732-Speed 10510.20 samples/sec 
Loss 5.7878 LearningRate 0.1217 Epoch: 12 Global Step: 66320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:28:09,538-Speed 10499.37 samples/sec Loss 5.8113 LearningRate 0.1217 Epoch: 12 Global Step: 66330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:28:17,358-Speed 10475.83 samples/sec Loss 5.7709 LearningRate 0.1216 Epoch: 12 Global Step: 66340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:28:25,141-Speed 10527.71 samples/sec Loss 5.8004 LearningRate 0.1215 Epoch: 12 Global Step: 66350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:28:32,923-Speed 10528.98 samples/sec Loss 5.7292 LearningRate 0.1215 Epoch: 12 Global Step: 66360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:28:40,744-Speed 10475.93 samples/sec Loss 5.7474 LearningRate 0.1214 Epoch: 12 Global Step: 66370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:28:48,541-Speed 10509.07 samples/sec Loss 5.7820 LearningRate 0.1213 Epoch: 12 Global Step: 66380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:28:56,335-Speed 10511.24 samples/sec Loss 5.7864 LearningRate 0.1213 Epoch: 12 Global Step: 66390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:29:04,132-Speed 10511.57 samples/sec Loss 5.7372 LearningRate 0.1212 Epoch: 12 Global Step: 66400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:29:11,958-Speed 10468.33 samples/sec Loss 5.7797 LearningRate 0.1211 Epoch: 12 Global Step: 66410 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 05:29:19,757-Speed 10506.07 samples/sec Loss 5.7629 LearningRate 0.1211 Epoch: 12 Global Step: 66420 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 05:29:27,560-Speed 10500.51 samples/sec Loss 5.7822 LearningRate 0.1210 Epoch: 12 Global Step: 66430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:29:35,362-Speed 10502.13 samples/sec Loss 5.7609 LearningRate 0.1209 
Epoch: 12 Global Step: 66440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:29:43,168-Speed 10495.70 samples/sec Loss 5.7569 LearningRate 0.1209 Epoch: 12 Global Step: 66450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:29:50,985-Speed 10480.97 samples/sec Loss 5.7406 LearningRate 0.1208 Epoch: 12 Global Step: 66460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:29:58,808-Speed 10472.82 samples/sec Loss 5.7423 LearningRate 0.1208 Epoch: 12 Global Step: 66470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:30:06,621-Speed 10486.84 samples/sec Loss 5.7755 LearningRate 0.1207 Epoch: 12 Global Step: 66480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:30:14,410-Speed 10519.09 samples/sec Loss 5.7315 LearningRate 0.1206 Epoch: 12 Global Step: 66490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:30:22,212-Speed 10501.43 samples/sec Loss 5.7515 LearningRate 0.1206 Epoch: 12 Global Step: 66500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:30:30,004-Speed 10516.15 samples/sec Loss 5.7518 LearningRate 0.1205 Epoch: 12 Global Step: 66510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:30:37,786-Speed 10528.01 samples/sec Loss 5.7448 LearningRate 0.1204 Epoch: 12 Global Step: 66520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:30:45,574-Speed 10519.08 samples/sec Loss 5.7742 LearningRate 0.1204 Epoch: 12 Global Step: 66530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:30:53,352-Speed 10534.05 samples/sec Loss 5.7379 LearningRate 0.1203 Epoch: 12 Global Step: 66540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:31:01,132-Speed 10532.06 samples/sec Loss 5.7643 LearningRate 0.1202 Epoch: 12 Global Step: 66550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:31:08,916-Speed 10525.50 samples/sec Loss 5.7246 LearningRate 0.1202 Epoch: 12 Global Step: 66560 Fp16 
Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:31:16,705-Speed 10518.26 samples/sec Loss 5.7234 LearningRate 0.1201 Epoch: 12 Global Step: 66570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:31:24,493-Speed 10519.45 samples/sec Loss 5.6937 LearningRate 0.1200 Epoch: 12 Global Step: 66580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:31:32,321-Speed 10467.05 samples/sec Loss 5.7509 LearningRate 0.1200 Epoch: 12 Global Step: 66590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:31:40,128-Speed 10494.31 samples/sec Loss 5.7451 LearningRate 0.1199 Epoch: 12 Global Step: 66600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:31:47,933-Speed 10497.64 samples/sec Loss 5.7601 LearningRate 0.1198 Epoch: 12 Global Step: 66610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:31:55,744-Speed 10489.87 samples/sec Loss 5.7236 LearningRate 0.1198 Epoch: 12 Global Step: 66620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:32:03,557-Speed 10486.36 samples/sec Loss 5.7508 LearningRate 0.1197 Epoch: 12 Global Step: 66630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:32:11,357-Speed 10504.66 samples/sec Loss 5.7526 LearningRate 0.1197 Epoch: 12 Global Step: 66640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:32:19,197-Speed 10449.27 samples/sec Loss 5.7213 LearningRate 0.1196 Epoch: 12 Global Step: 66650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:32:26,988-Speed 10516.68 samples/sec Loss 5.7754 LearningRate 0.1195 Epoch: 12 Global Step: 66660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:32:34,808-Speed 10475.98 samples/sec Loss 5.7418 LearningRate 0.1195 Epoch: 12 Global Step: 66670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:32:42,660-Speed 10434.52 samples/sec Loss 5.7141 LearningRate 0.1194 Epoch: 12 Global Step: 66680 Fp16 Grad Scale: 65536 Required: 8 hours 
Training: 2022-01-16 05:32:50,450-Speed 10518.35 samples/sec Loss 5.7212 LearningRate 0.1193 Epoch: 12 Global Step: 66690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:32:58,238-Speed 10520.14 samples/sec Loss 5.7429 LearningRate 0.1193 Epoch: 12 Global Step: 66700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:33:06,031-Speed 10513.73 samples/sec Loss 5.7092 LearningRate 0.1192 Epoch: 12 Global Step: 66710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:33:13,886-Speed 10431.10 samples/sec Loss 5.7236 LearningRate 0.1191 Epoch: 12 Global Step: 66720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:33:21,683-Speed 10508.72 samples/sec Loss 5.7312 LearningRate 0.1191 Epoch: 12 Global Step: 66730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:33:29,482-Speed 10504.87 samples/sec Loss 5.7234 LearningRate 0.1190 Epoch: 12 Global Step: 66740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:33:37,272-Speed 10517.28 samples/sec Loss 5.7121 LearningRate 0.1189 Epoch: 12 Global Step: 66750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:33:45,081-Speed 10491.45 samples/sec Loss 5.7109 LearningRate 0.1189 Epoch: 12 Global Step: 66760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:33:52,940-Speed 10427.14 samples/sec Loss 5.6824 LearningRate 0.1188 Epoch: 12 Global Step: 66770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:34:00,728-Speed 10520.02 samples/sec Loss 5.7137 LearningRate 0.1188 Epoch: 12 Global Step: 66780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:34:08,544-Speed 10482.49 samples/sec Loss 5.7181 LearningRate 0.1187 Epoch: 12 Global Step: 66790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:34:16,338-Speed 10511.21 samples/sec Loss 5.6659 LearningRate 0.1186 Epoch: 12 Global Step: 66800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 
05:34:24,143-Speed 10497.69 samples/sec Loss 5.7456 LearningRate 0.1186 Epoch: 12 Global Step: 66810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:34:31,932-Speed 10518.76 samples/sec Loss 5.7176 LearningRate 0.1185 Epoch: 12 Global Step: 66820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:34:39,751-Speed 10477.64 samples/sec Loss 5.7327 LearningRate 0.1184 Epoch: 12 Global Step: 66830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:34:47,602-Speed 10438.33 samples/sec Loss 5.7380 LearningRate 0.1184 Epoch: 12 Global Step: 66840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:34:55,432-Speed 10464.10 samples/sec Loss 5.6792 LearningRate 0.1183 Epoch: 12 Global Step: 66850 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 05:35:03,269-Speed 10454.16 samples/sec Loss 5.7019 LearningRate 0.1182 Epoch: 12 Global Step: 66860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:35:11,079-Speed 10490.02 samples/sec Loss 5.7119 LearningRate 0.1182 Epoch: 12 Global Step: 66870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:35:18,889-Speed 10491.85 samples/sec Loss 5.7074 LearningRate 0.1181 Epoch: 12 Global Step: 66880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:35:26,697-Speed 10492.14 samples/sec Loss 5.6783 LearningRate 0.1180 Epoch: 12 Global Step: 66890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:35:34,511-Speed 10485.57 samples/sec Loss 5.7233 LearningRate 0.1180 Epoch: 12 Global Step: 66900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:35:42,304-Speed 10513.27 samples/sec Loss 5.6741 LearningRate 0.1179 Epoch: 12 Global Step: 66910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:35:50,104-Speed 10504.56 samples/sec Loss 5.6999 LearningRate 0.1179 Epoch: 12 Global Step: 66920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:35:57,941-Speed 10454.45 
samples/sec Loss 5.7001 LearningRate 0.1178 Epoch: 12 Global Step: 66930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:36:05,715-Speed 10538.78 samples/sec Loss 5.7036 LearningRate 0.1177 Epoch: 12 Global Step: 66940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:36:13,521-Speed 10496.72 samples/sec Loss 5.6696 LearningRate 0.1177 Epoch: 12 Global Step: 66950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:36:21,328-Speed 10494.23 samples/sec Loss 5.6465 LearningRate 0.1176 Epoch: 12 Global Step: 66960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:36:29,120-Speed 10515.40 samples/sec Loss 5.6730 LearningRate 0.1175 Epoch: 12 Global Step: 66970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:36:36,914-Speed 10511.29 samples/sec Loss 5.7069 LearningRate 0.1175 Epoch: 12 Global Step: 66980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:36:44,698-Speed 10524.60 samples/sec Loss 5.6824 LearningRate 0.1174 Epoch: 12 Global Step: 66990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:36:52,523-Speed 10471.85 samples/sec Loss 5.6850 LearningRate 0.1173 Epoch: 12 Global Step: 67000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:37:00,300-Speed 10535.14 samples/sec Loss 5.6618 LearningRate 0.1173 Epoch: 12 Global Step: 67010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:37:08,088-Speed 10518.82 samples/sec Loss 5.6896 LearningRate 0.1172 Epoch: 12 Global Step: 67020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:37:15,867-Speed 10532.63 samples/sec Loss 5.6609 LearningRate 0.1171 Epoch: 12 Global Step: 67030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:37:23,670-Speed 10499.98 samples/sec Loss 5.6577 LearningRate 0.1171 Epoch: 12 Global Step: 67040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:37:31,465-Speed 10512.17 samples/sec Loss 5.6717 LearningRate 
0.1170 Epoch: 12 Global Step: 67050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:37:39,233-Speed 10546.53 samples/sec Loss 5.6498 LearningRate 0.1170 Epoch: 12 Global Step: 67060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:37:47,018-Speed 10524.63 samples/sec Loss 5.6953 LearningRate 0.1169 Epoch: 12 Global Step: 67070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:37:54,835-Speed 10481.70 samples/sec Loss 5.6551 LearningRate 0.1168 Epoch: 12 Global Step: 67080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:38:02,640-Speed 10497.26 samples/sec Loss 5.6753 LearningRate 0.1168 Epoch: 12 Global Step: 67090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:38:10,433-Speed 10513.37 samples/sec Loss 5.6673 LearningRate 0.1167 Epoch: 12 Global Step: 67100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:38:18,273-Speed 10450.60 samples/sec Loss 5.6868 LearningRate 0.1166 Epoch: 12 Global Step: 67110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:38:26,062-Speed 10519.70 samples/sec Loss 5.6818 LearningRate 0.1166 Epoch: 12 Global Step: 67120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:38:33,864-Speed 10500.73 samples/sec Loss 5.6081 LearningRate 0.1165 Epoch: 12 Global Step: 67130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:38:41,648-Speed 10525.35 samples/sec Loss 5.6611 LearningRate 0.1164 Epoch: 12 Global Step: 67140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:38:49,439-Speed 10516.93 samples/sec Loss 5.6914 LearningRate 0.1164 Epoch: 12 Global Step: 67150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:38:57,240-Speed 10502.57 samples/sec Loss 5.7050 LearningRate 0.1163 Epoch: 12 Global Step: 67160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:39:05,041-Speed 10502.39 samples/sec Loss 5.6571 LearningRate 0.1163 Epoch: 12 Global Step: 67170 
Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:39:12,836-Speed 10509.98 samples/sec Loss 5.6428 LearningRate 0.1162 Epoch: 12 Global Step: 67180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:39:20,625-Speed 10519.60 samples/sec Loss 5.6855 LearningRate 0.1161 Epoch: 12 Global Step: 67190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:39:28,417-Speed 10513.97 samples/sec Loss 5.6514 LearningRate 0.1161 Epoch: 12 Global Step: 67200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:39:36,233-Speed 10483.50 samples/sec Loss 5.6893 LearningRate 0.1160 Epoch: 12 Global Step: 67210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:39:44,072-Speed 10451.42 samples/sec Loss 5.6482 LearningRate 0.1159 Epoch: 12 Global Step: 67220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:39:51,917-Speed 10444.00 samples/sec Loss 5.6230 LearningRate 0.1159 Epoch: 12 Global Step: 67230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:39:59,737-Speed 10476.61 samples/sec Loss 5.6768 LearningRate 0.1158 Epoch: 12 Global Step: 67240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:40:07,555-Speed 10479.73 samples/sec Loss 5.6560 LearningRate 0.1157 Epoch: 12 Global Step: 67250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:40:15,390-Speed 10457.29 samples/sec Loss 5.6370 LearningRate 0.1157 Epoch: 12 Global Step: 67260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:40:23,236-Speed 10443.48 samples/sec Loss 5.6622 LearningRate 0.1156 Epoch: 12 Global Step: 67270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:40:31,050-Speed 10484.93 samples/sec Loss 5.6241 LearningRate 0.1156 Epoch: 12 Global Step: 67280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:40:38,862-Speed 10488.03 samples/sec Loss 5.6697 LearningRate 0.1155 Epoch: 12 Global Step: 67290 Fp16 Grad Scale: 131072 Required: 8 hours 
Training: 2022-01-16 05:40:46,679-Speed 10482.02 samples/sec Loss 5.6654 LearningRate 0.1154 Epoch: 12 Global Step: 67300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:40:54,469-Speed 10519.24 samples/sec Loss 5.6528 LearningRate 0.1154 Epoch: 12 Global Step: 67310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:41:02,280-Speed 10488.57 samples/sec Loss 5.6461 LearningRate 0.1153 Epoch: 12 Global Step: 67320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:41:10,088-Speed 10493.18 samples/sec Loss 5.6395 LearningRate 0.1152 Epoch: 12 Global Step: 67330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:41:17,870-Speed 10528.36 samples/sec Loss 5.6430 LearningRate 0.1152 Epoch: 12 Global Step: 67340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:41:25,671-Speed 10502.71 samples/sec Loss 5.6217 LearningRate 0.1151 Epoch: 12 Global Step: 67350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:41:33,435-Speed 10552.20 samples/sec Loss 5.6255 LearningRate 0.1150 Epoch: 12 Global Step: 67360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:41:41,211-Speed 10535.95 samples/sec Loss 5.6623 LearningRate 0.1150 Epoch: 12 Global Step: 67370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:41:48,995-Speed 10526.26 samples/sec Loss 5.6492 LearningRate 0.1149 Epoch: 12 Global Step: 67380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:41:56,805-Speed 10490.92 samples/sec Loss 5.6729 LearningRate 0.1149 Epoch: 12 Global Step: 67390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:42:04,620-Speed 10484.32 samples/sec Loss 5.6423 LearningRate 0.1148 Epoch: 12 Global Step: 67400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:42:27,292-Speed 3613.38 samples/sec Loss 5.6380 LearningRate 0.1147 Epoch: 13 Global Step: 67410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:42:35,098-Speed 
10497.15 samples/sec Loss 5.6487 LearningRate 0.1147 Epoch: 13 Global Step: 67420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:42:42,878-Speed 10533.27 samples/sec Loss 5.6219 LearningRate 0.1146 Epoch: 13 Global Step: 67430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:42:50,678-Speed 10504.96 samples/sec Loss 5.6204 LearningRate 0.1145 Epoch: 13 Global Step: 67440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:42:58,486-Speed 10492.14 samples/sec Loss 5.5890 LearningRate 0.1145 Epoch: 13 Global Step: 67450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:43:06,294-Speed 10496.66 samples/sec Loss 5.6230 LearningRate 0.1144 Epoch: 13 Global Step: 67460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:43:14,109-Speed 10485.23 samples/sec Loss 5.6228 LearningRate 0.1144 Epoch: 13 Global Step: 67470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:43:21,930-Speed 10475.67 samples/sec Loss 5.5924 LearningRate 0.1143 Epoch: 13 Global Step: 67480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:43:29,755-Speed 10471.31 samples/sec Loss 5.6022 LearningRate 0.1142 Epoch: 13 Global Step: 67490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:43:37,584-Speed 10464.34 samples/sec Loss 5.6021 LearningRate 0.1142 Epoch: 13 Global Step: 67500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:43:45,395-Speed 10488.61 samples/sec Loss 5.6528 LearningRate 0.1141 Epoch: 13 Global Step: 67510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:43:53,217-Speed 10474.66 samples/sec Loss 5.5783 LearningRate 0.1140 Epoch: 13 Global Step: 67520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:44:01,032-Speed 10484.02 samples/sec Loss 5.5591 LearningRate 0.1140 Epoch: 13 Global Step: 67530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:44:08,812-Speed 10530.92 samples/sec Loss 5.5711 
LearningRate 0.1139 Epoch: 13 Global Step: 67540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:44:16,595-Speed 10527.13 samples/sec Loss 5.5767 LearningRate 0.1138 Epoch: 13 Global Step: 67550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:44:24,373-Speed 10534.99 samples/sec Loss 5.6157 LearningRate 0.1138 Epoch: 13 Global Step: 67560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:44:32,155-Speed 10528.25 samples/sec Loss 5.6418 LearningRate 0.1137 Epoch: 13 Global Step: 67570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:44:39,914-Speed 10559.32 samples/sec Loss 5.5973 LearningRate 0.1137 Epoch: 13 Global Step: 67580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:44:47,674-Speed 10559.32 samples/sec Loss 5.6295 LearningRate 0.1136 Epoch: 13 Global Step: 67590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:44:55,424-Speed 10570.45 samples/sec Loss 5.5674 LearningRate 0.1135 Epoch: 13 Global Step: 67600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:45:03,248-Speed 10472.16 samples/sec Loss 5.5792 LearningRate 0.1135 Epoch: 13 Global Step: 67610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:45:11,038-Speed 10517.41 samples/sec Loss 5.5671 LearningRate 0.1134 Epoch: 13 Global Step: 67620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:45:18,824-Speed 10523.19 samples/sec Loss 5.5907 LearningRate 0.1133 Epoch: 13 Global Step: 67630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:45:26,626-Speed 10500.83 samples/sec Loss 5.5700 LearningRate 0.1133 Epoch: 13 Global Step: 67640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:45:34,448-Speed 10474.87 samples/sec Loss 5.5764 LearningRate 0.1132 Epoch: 13 Global Step: 67650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:45:42,228-Speed 10531.00 samples/sec Loss 5.6142 LearningRate 0.1132 Epoch: 13 Global 
Step: 67660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:45:50,007-Speed 10532.12 samples/sec Loss 5.5938 LearningRate 0.1131 Epoch: 13 Global Step: 67670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:45:57,824-Speed 10481.64 samples/sec Loss 5.5983 LearningRate 0.1130 Epoch: 13 Global Step: 67680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:46:05,598-Speed 10539.17 samples/sec Loss 5.6174 LearningRate 0.1130 Epoch: 13 Global Step: 67690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:46:13,396-Speed 10505.86 samples/sec Loss 5.5872 LearningRate 0.1129 Epoch: 13 Global Step: 67700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:46:21,185-Speed 10520.74 samples/sec Loss 5.6021 LearningRate 0.1128 Epoch: 13 Global Step: 67710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:46:28,973-Speed 10519.39 samples/sec Loss 5.6066 LearningRate 0.1128 Epoch: 13 Global Step: 67720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:46:36,755-Speed 10527.59 samples/sec Loss 5.6031 LearningRate 0.1127 Epoch: 13 Global Step: 67730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:46:44,540-Speed 10525.53 samples/sec Loss 5.6013 LearningRate 0.1127 Epoch: 13 Global Step: 67740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:46:52,308-Speed 10546.78 samples/sec Loss 5.5957 LearningRate 0.1126 Epoch: 13 Global Step: 67750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:47:00,088-Speed 10530.65 samples/sec Loss 5.5784 LearningRate 0.1125 Epoch: 13 Global Step: 67760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:47:07,867-Speed 10531.28 samples/sec Loss 5.5734 LearningRate 0.1125 Epoch: 13 Global Step: 67770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:47:15,650-Speed 10528.39 samples/sec Loss 5.5983 LearningRate 0.1124 Epoch: 13 Global Step: 67780 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-01-16 05:47:23,442-Speed 10514.90 samples/sec Loss 5.5792 LearningRate 0.1123 Epoch: 13 Global Step: 67790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:47:31,250-Speed 10492.17 samples/sec Loss 5.5326 LearningRate 0.1123 Epoch: 13 Global Step: 67800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:47:39,074-Speed 10471.98 samples/sec Loss 5.6108 LearningRate 0.1122 Epoch: 13 Global Step: 67810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:47:46,902-Speed 10467.22 samples/sec Loss 5.5589 LearningRate 0.1122 Epoch: 13 Global Step: 67820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:47:54,737-Speed 10457.70 samples/sec Loss 5.5602 LearningRate 0.1121 Epoch: 13 Global Step: 67830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:48:02,574-Speed 10454.32 samples/sec Loss 5.5794 LearningRate 0.1120 Epoch: 13 Global Step: 67840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:48:10,386-Speed 10488.52 samples/sec Loss 5.5598 LearningRate 0.1120 Epoch: 13 Global Step: 67850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:48:18,229-Speed 10446.33 samples/sec Loss 5.5607 LearningRate 0.1119 Epoch: 13 Global Step: 67860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:48:26,056-Speed 10468.34 samples/sec Loss 5.5704 LearningRate 0.1118 Epoch: 13 Global Step: 67870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:48:33,872-Speed 10482.91 samples/sec Loss 5.6049 LearningRate 0.1118 Epoch: 13 Global Step: 67880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:48:41,699-Speed 10467.26 samples/sec Loss 5.6046 LearningRate 0.1117 Epoch: 13 Global Step: 67890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:48:49,522-Speed 10473.93 samples/sec Loss 5.5960 LearningRate 0.1117 Epoch: 13 Global Step: 67900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 
2022-01-16 05:48:57,368-Speed 10442.38 samples/sec Loss 5.5549 LearningRate 0.1116 Epoch: 13 Global Step: 67910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:49:05,203-Speed 10456.39 samples/sec Loss 5.5145 LearningRate 0.1115 Epoch: 13 Global Step: 67920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:49:13,022-Speed 10489.74 samples/sec Loss 5.5636 LearningRate 0.1115 Epoch: 13 Global Step: 67930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:49:20,841-Speed 10483.57 samples/sec Loss 5.5689 LearningRate 0.1114 Epoch: 13 Global Step: 67940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:49:28,707-Speed 10415.23 samples/sec Loss 5.5136 LearningRate 0.1113 Epoch: 13 Global Step: 67950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:49:36,549-Speed 10447.86 samples/sec Loss 5.5461 LearningRate 0.1113 Epoch: 13 Global Step: 67960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:49:44,403-Speed 10432.34 samples/sec Loss 5.5246 LearningRate 0.1112 Epoch: 13 Global Step: 67970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:49:52,326-Speed 10341.63 samples/sec Loss 5.5204 LearningRate 0.1112 Epoch: 13 Global Step: 67980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:50:00,165-Speed 10451.34 samples/sec Loss 5.5519 LearningRate 0.1111 Epoch: 13 Global Step: 67990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:50:08,027-Speed 10420.19 samples/sec Loss 5.5299 LearningRate 0.1110 Epoch: 13 Global Step: 68000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:50:15,868-Speed 10449.84 samples/sec Loss 5.5030 LearningRate 0.1110 Epoch: 13 Global Step: 68010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:50:23,693-Speed 10469.99 samples/sec Loss 5.5545 LearningRate 0.1109 Epoch: 13 Global Step: 68020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:50:31,499-Speed 
10495.96 samples/sec Loss 5.5342 LearningRate 0.1108 Epoch: 13 Global Step: 68030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:50:39,320-Speed 10476.11 samples/sec Loss 5.5475 LearningRate 0.1108 Epoch: 13 Global Step: 68040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:50:47,126-Speed 10495.87 samples/sec Loss 5.5434 LearningRate 0.1107 Epoch: 13 Global Step: 68050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:50:54,926-Speed 10503.67 samples/sec Loss 5.5333 LearningRate 0.1107 Epoch: 13 Global Step: 68060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:51:02,731-Speed 10498.37 samples/sec Loss 5.5067 LearningRate 0.1106 Epoch: 13 Global Step: 68070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:51:10,541-Speed 10489.91 samples/sec Loss 5.5123 LearningRate 0.1105 Epoch: 13 Global Step: 68080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:51:18,354-Speed 10485.64 samples/sec Loss 5.5190 LearningRate 0.1105 Epoch: 13 Global Step: 68090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:51:26,200-Speed 10442.29 samples/sec Loss 5.5125 LearningRate 0.1104 Epoch: 13 Global Step: 68100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:51:34,005-Speed 10497.52 samples/sec Loss 5.5182 LearningRate 0.1103 Epoch: 13 Global Step: 68110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:51:41,859-Speed 10431.50 samples/sec Loss 5.4770 LearningRate 0.1103 Epoch: 13 Global Step: 68120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:51:49,656-Speed 10508.22 samples/sec Loss 5.5428 LearningRate 0.1102 Epoch: 13 Global Step: 68130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:51:57,490-Speed 10459.14 samples/sec Loss 5.5447 LearningRate 0.1102 Epoch: 13 Global Step: 68140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:52:05,303-Speed 10486.01 samples/sec Loss 5.4964 
LearningRate 0.1101 Epoch: 13 Global Step: 68150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:52:13,141-Speed 10453.68 samples/sec Loss 5.5468 LearningRate 0.1100 Epoch: 13 Global Step: 68160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:52:20,960-Speed 10480.67 samples/sec Loss 5.5267 LearningRate 0.1100 Epoch: 13 Global Step: 68170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:52:28,757-Speed 10507.20 samples/sec Loss 5.5511 LearningRate 0.1099 Epoch: 13 Global Step: 68180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:52:36,559-Speed 10501.70 samples/sec Loss 5.5174 LearningRate 0.1098 Epoch: 13 Global Step: 68190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:52:44,352-Speed 10513.21 samples/sec Loss 5.4954 LearningRate 0.1098 Epoch: 13 Global Step: 68200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:52:52,137-Speed 10523.93 samples/sec Loss 5.4581 LearningRate 0.1097 Epoch: 13 Global Step: 68210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:52:59,955-Speed 10480.12 samples/sec Loss 5.4985 LearningRate 0.1097 Epoch: 13 Global Step: 68220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:53:07,756-Speed 10502.47 samples/sec Loss 5.5437 LearningRate 0.1096 Epoch: 13 Global Step: 68230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:53:15,548-Speed 10515.12 samples/sec Loss 5.5371 LearningRate 0.1095 Epoch: 13 Global Step: 68240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:53:23,365-Speed 10482.18 samples/sec Loss 5.5042 LearningRate 0.1095 Epoch: 13 Global Step: 68250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:53:31,169-Speed 10497.84 samples/sec Loss 5.4712 LearningRate 0.1094 Epoch: 13 Global Step: 68260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:53:38,939-Speed 10545.37 samples/sec Loss 5.5349 LearningRate 0.1094 Epoch: 13 Global Step: 
68270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:53:46,720-Speed 10529.14 samples/sec Loss 5.5037 LearningRate 0.1093 Epoch: 13 Global Step: 68280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:53:54,516-Speed 10509.57 samples/sec Loss 5.5069 LearningRate 0.1092 Epoch: 13 Global Step: 68290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:54:02,292-Speed 10535.84 samples/sec Loss 5.5319 LearningRate 0.1092 Epoch: 13 Global Step: 68300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:54:10,067-Speed 10537.09 samples/sec Loss 5.4944 LearningRate 0.1091 Epoch: 13 Global Step: 68310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:54:17,883-Speed 10483.75 samples/sec Loss 5.4849 LearningRate 0.1090 Epoch: 13 Global Step: 68320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:54:25,709-Speed 10468.65 samples/sec Loss 5.4904 LearningRate 0.1090 Epoch: 13 Global Step: 68330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:54:33,497-Speed 10521.10 samples/sec Loss 5.4816 LearningRate 0.1089 Epoch: 13 Global Step: 68340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:54:41,290-Speed 10514.67 samples/sec Loss 5.4622 LearningRate 0.1089 Epoch: 13 Global Step: 68350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:54:49,075-Speed 10522.96 samples/sec Loss 5.5262 LearningRate 0.1088 Epoch: 13 Global Step: 68360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:54:56,864-Speed 10519.06 samples/sec Loss 5.4779 LearningRate 0.1087 Epoch: 13 Global Step: 68370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:55:04,667-Speed 10500.21 samples/sec Loss 5.4574 LearningRate 0.1087 Epoch: 13 Global Step: 68380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:55:12,461-Speed 10513.09 samples/sec Loss 5.4743 LearningRate 0.1086 Epoch: 13 Global Step: 68390 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-01-16 05:55:20,266-Speed 10497.06 samples/sec Loss 5.5233 LearningRate 0.1086 Epoch: 13 Global Step: 68400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:55:28,079-Speed 10485.33 samples/sec Loss 5.4692 LearningRate 0.1085 Epoch: 13 Global Step: 68410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:55:35,895-Speed 10484.99 samples/sec Loss 5.4528 LearningRate 0.1084 Epoch: 13 Global Step: 68420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:55:43,716-Speed 10476.76 samples/sec Loss 5.4381 LearningRate 0.1084 Epoch: 13 Global Step: 68430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:55:51,533-Speed 10480.30 samples/sec Loss 5.4599 LearningRate 0.1083 Epoch: 13 Global Step: 68440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:55:59,361-Speed 10468.96 samples/sec Loss 5.4856 LearningRate 0.1082 Epoch: 13 Global Step: 68450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:56:07,171-Speed 10490.27 samples/sec Loss 5.4774 LearningRate 0.1082 Epoch: 13 Global Step: 68460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:56:14,966-Speed 10511.06 samples/sec Loss 5.4666 LearningRate 0.1081 Epoch: 13 Global Step: 68470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:56:22,757-Speed 10516.46 samples/sec Loss 5.4503 LearningRate 0.1081 Epoch: 13 Global Step: 68480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:56:30,548-Speed 10515.07 samples/sec Loss 5.5015 LearningRate 0.1080 Epoch: 13 Global Step: 68490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:56:38,395-Speed 10441.92 samples/sec Loss 5.4460 LearningRate 0.1079 Epoch: 13 Global Step: 68500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:56:46,178-Speed 10527.91 samples/sec Loss 5.4598 LearningRate 0.1079 Epoch: 13 Global Step: 68510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 
2022-01-16 05:56:53,961-Speed 10527.05 samples/sec Loss 5.4953 LearningRate 0.1078 Epoch: 13 Global Step: 68520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:57:01,810-Speed 10436.93 samples/sec Loss 5.5119 LearningRate 0.1078 Epoch: 13 Global Step: 68530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:57:09,620-Speed 10491.33 samples/sec Loss 5.4743 LearningRate 0.1077 Epoch: 13 Global Step: 68540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:57:17,404-Speed 10526.52 samples/sec Loss 5.4891 LearningRate 0.1076 Epoch: 13 Global Step: 68550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:57:25,228-Speed 10470.77 samples/sec Loss 5.4549 LearningRate 0.1076 Epoch: 13 Global Step: 68560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:57:33,017-Speed 10519.53 samples/sec Loss 5.4801 LearningRate 0.1075 Epoch: 13 Global Step: 68570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:57:40,810-Speed 10513.90 samples/sec Loss 5.4597 LearningRate 0.1074 Epoch: 13 Global Step: 68580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:57:48,601-Speed 10515.60 samples/sec Loss 5.4748 LearningRate 0.1074 Epoch: 13 Global Step: 68590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:57:56,397-Speed 10510.48 samples/sec Loss 5.4666 LearningRate 0.1073 Epoch: 13 Global Step: 68600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:58:04,194-Speed 10506.92 samples/sec Loss 5.4897 LearningRate 0.1073 Epoch: 13 Global Step: 68610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:58:11,978-Speed 10529.24 samples/sec Loss 5.4568 LearningRate 0.1072 Epoch: 13 Global Step: 68620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:58:19,774-Speed 10509.19 samples/sec Loss 5.4754 LearningRate 0.1071 Epoch: 13 Global Step: 68630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:58:27,561-Speed 10521.73 
samples/sec Loss 5.4213 LearningRate 0.1071 Epoch: 13 Global Step: 68640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:58:35,359-Speed 10506.00 samples/sec Loss 5.4779 LearningRate 0.1070 Epoch: 13 Global Step: 68650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:58:43,191-Speed 10461.68 samples/sec Loss 5.4171 LearningRate 0.1070 Epoch: 13 Global Step: 68660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:58:50,992-Speed 10502.38 samples/sec Loss 5.4499 LearningRate 0.1069 Epoch: 13 Global Step: 68670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 05:58:58,791-Speed 10506.01 samples/sec Loss 5.4313 LearningRate 0.1068 Epoch: 13 Global Step: 68680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:59:06,612-Speed 10474.72 samples/sec Loss 5.4746 LearningRate 0.1068 Epoch: 13 Global Step: 68690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:59:14,412-Speed 10503.85 samples/sec Loss 5.4542 LearningRate 0.1067 Epoch: 13 Global Step: 68700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:59:22,207-Speed 10511.65 samples/sec Loss 5.4360 LearningRate 0.1067 Epoch: 13 Global Step: 68710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:59:30,014-Speed 10494.32 samples/sec Loss 5.4351 LearningRate 0.1066 Epoch: 13 Global Step: 68720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:59:37,806-Speed 10514.44 samples/sec Loss 5.4682 LearningRate 0.1065 Epoch: 13 Global Step: 68730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:59:45,591-Speed 10524.93 samples/sec Loss 5.4283 LearningRate 0.1065 Epoch: 13 Global Step: 68740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 05:59:53,407-Speed 10481.67 samples/sec Loss 5.4388 LearningRate 0.1064 Epoch: 13 Global Step: 68750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:00:01,203-Speed 10509.72 samples/sec Loss 5.4357 LearningRate 
0.1063 Epoch: 13 Global Step: 68760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:00:09,010-Speed 10494.53 samples/sec Loss 5.3798 LearningRate 0.1063 Epoch: 13 Global Step: 68770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:00:16,842-Speed 10461.94 samples/sec Loss 5.4327 LearningRate 0.1062 Epoch: 13 Global Step: 68780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:00:24,659-Speed 10481.17 samples/sec Loss 5.4597 LearningRate 0.1062 Epoch: 13 Global Step: 68790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:00:32,468-Speed 10490.68 samples/sec Loss 5.4207 LearningRate 0.1061 Epoch: 13 Global Step: 68800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:00:40,289-Speed 10476.19 samples/sec Loss 5.4613 LearningRate 0.1060 Epoch: 13 Global Step: 68810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:00:48,094-Speed 10497.94 samples/sec Loss 5.4559 LearningRate 0.1060 Epoch: 13 Global Step: 68820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:00:55,894-Speed 10503.06 samples/sec Loss 5.3935 LearningRate 0.1059 Epoch: 13 Global Step: 68830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:01:03,694-Speed 10504.94 samples/sec Loss 5.4204 LearningRate 0.1059 Epoch: 13 Global Step: 68840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:01:11,506-Speed 10486.21 samples/sec Loss 5.4144 LearningRate 0.1058 Epoch: 13 Global Step: 68850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:01:19,331-Speed 10471.42 samples/sec Loss 5.4371 LearningRate 0.1057 Epoch: 13 Global Step: 68860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:01:27,134-Speed 10500.40 samples/sec Loss 5.4135 LearningRate 0.1057 Epoch: 13 Global Step: 68870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:01:34,985-Speed 10435.27 samples/sec Loss 5.4554 LearningRate 0.1056 Epoch: 13 Global Step: 
68880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:01:42,767-Speed 10528.46 samples/sec Loss 5.3818 LearningRate 0.1056 Epoch: 13 Global Step: 68890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:01:50,594-Speed 10467.17 samples/sec Loss 5.4575 LearningRate 0.1055 Epoch: 13 Global Step: 68900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:01:58,427-Speed 10460.91 samples/sec Loss 5.4134 LearningRate 0.1054 Epoch: 13 Global Step: 68910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:02:06,229-Speed 10499.76 samples/sec Loss 5.4417 LearningRate 0.1054 Epoch: 13 Global Step: 68920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:02:14,036-Speed 10495.14 samples/sec Loss 5.4127 LearningRate 0.1053 Epoch: 13 Global Step: 68930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:02:21,834-Speed 10507.34 samples/sec Loss 5.4312 LearningRate 0.1053 Epoch: 13 Global Step: 68940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:02:29,629-Speed 10510.84 samples/sec Loss 5.4286 LearningRate 0.1052 Epoch: 13 Global Step: 68950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:02:37,414-Speed 10523.98 samples/sec Loss 5.4271 LearningRate 0.1051 Epoch: 13 Global Step: 68960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:02:45,250-Speed 10455.37 samples/sec Loss 5.4439 LearningRate 0.1051 Epoch: 13 Global Step: 68970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:02:53,106-Speed 10429.76 samples/sec Loss 5.4363 LearningRate 0.1050 Epoch: 13 Global Step: 68980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:03:00,926-Speed 10477.21 samples/sec Loss 5.4698 LearningRate 0.1050 Epoch: 13 Global Step: 68990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:03:08,736-Speed 10489.36 samples/sec Loss 5.4142 LearningRate 0.1049 Epoch: 13 Global Step: 69000 Fp16 Grad Scale: 65536 Required: 8 
hours Training: 2022-01-16 06:03:16,573-Speed 10455.10 samples/sec Loss 5.3955 LearningRate 0.1048 Epoch: 13 Global Step: 69010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:03:24,353-Speed 10531.51 samples/sec Loss 5.3994 LearningRate 0.1048 Epoch: 13 Global Step: 69020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:03:32,148-Speed 10510.92 samples/sec Loss 5.3819 LearningRate 0.1047 Epoch: 13 Global Step: 69030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:03:39,942-Speed 10511.56 samples/sec Loss 5.3519 LearningRate 0.1046 Epoch: 13 Global Step: 69040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:03:47,771-Speed 10465.18 samples/sec Loss 5.3640 LearningRate 0.1046 Epoch: 13 Global Step: 69050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:03:55,565-Speed 10512.07 samples/sec Loss 5.4215 LearningRate 0.1045 Epoch: 13 Global Step: 69060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:04:03,361-Speed 10508.80 samples/sec Loss 5.4297 LearningRate 0.1045 Epoch: 13 Global Step: 69070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:04:11,154-Speed 10513.17 samples/sec Loss 5.3675 LearningRate 0.1044 Epoch: 13 Global Step: 69080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:04:18,946-Speed 10514.92 samples/sec Loss 5.3721 LearningRate 0.1043 Epoch: 13 Global Step: 69090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:04:26,745-Speed 10505.00 samples/sec Loss 5.3933 LearningRate 0.1043 Epoch: 13 Global Step: 69100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:04:34,572-Speed 10474.94 samples/sec Loss 5.4090 LearningRate 0.1042 Epoch: 13 Global Step: 69110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:04:42,389-Speed 10480.61 samples/sec Loss 5.4210 LearningRate 0.1042 Epoch: 13 Global Step: 69120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 
06:04:50,187-Speed 10508.54 samples/sec Loss 5.3665 LearningRate 0.1041 Epoch: 13 Global Step: 69130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:04:58,007-Speed 10478.50 samples/sec Loss 5.4019 LearningRate 0.1040 Epoch: 13 Global Step: 69140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:05:05,819-Speed 10487.88 samples/sec Loss 5.3847 LearningRate 0.1040 Epoch: 13 Global Step: 69150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:05:13,628-Speed 10492.52 samples/sec Loss 5.3484 LearningRate 0.1039 Epoch: 13 Global Step: 69160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:05:21,406-Speed 10534.01 samples/sec Loss 5.3587 LearningRate 0.1039 Epoch: 13 Global Step: 69170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:05:29,208-Speed 10500.30 samples/sec Loss 5.3741 LearningRate 0.1038 Epoch: 13 Global Step: 69180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:05:37,066-Speed 10426.57 samples/sec Loss 5.3723 LearningRate 0.1037 Epoch: 13 Global Step: 69190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:05:44,874-Speed 10493.33 samples/sec Loss 5.4028 LearningRate 0.1037 Epoch: 13 Global Step: 69200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:05:52,653-Speed 10532.55 samples/sec Loss 5.3453 LearningRate 0.1036 Epoch: 13 Global Step: 69210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:06:00,459-Speed 10495.44 samples/sec Loss 5.3894 LearningRate 0.1036 Epoch: 13 Global Step: 69220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:06:08,234-Speed 10537.56 samples/sec Loss 5.3858 LearningRate 0.1035 Epoch: 13 Global Step: 69230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:06:16,021-Speed 10521.59 samples/sec Loss 5.4031 LearningRate 0.1034 Epoch: 13 Global Step: 69240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:06:23,814-Speed 10513.15 samples/sec 
Loss 5.3759 LearningRate 0.1034 Epoch: 13 Global Step: 69250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:06:31,588-Speed 10539.24 samples/sec Loss 5.3459 LearningRate 0.1033 Epoch: 13 Global Step: 69260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:06:39,388-Speed 10504.34 samples/sec Loss 5.3855 LearningRate 0.1033 Epoch: 13 Global Step: 69270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:06:47,188-Speed 10504.12 samples/sec Loss 5.3742 LearningRate 0.1032 Epoch: 13 Global Step: 69280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:06:55,015-Speed 10467.32 samples/sec Loss 5.3480 LearningRate 0.1031 Epoch: 13 Global Step: 69290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:07:02,825-Speed 10491.49 samples/sec Loss 5.3565 LearningRate 0.1031 Epoch: 13 Global Step: 69300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:07:10,605-Speed 10530.95 samples/sec Loss 5.4100 LearningRate 0.1030 Epoch: 13 Global Step: 69310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:07:18,426-Speed 10475.89 samples/sec Loss 5.3598 LearningRate 0.1030 Epoch: 13 Global Step: 69320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:07:26,212-Speed 10523.77 samples/sec Loss 5.3750 LearningRate 0.1029 Epoch: 13 Global Step: 69330 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 06:07:34,009-Speed 10507.49 samples/sec Loss 5.3248 LearningRate 0.1028 Epoch: 13 Global Step: 69340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:07:41,813-Speed 10498.83 samples/sec Loss 5.3439 LearningRate 0.1028 Epoch: 13 Global Step: 69350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:07:49,610-Speed 10507.60 samples/sec Loss 5.3673 LearningRate 0.1027 Epoch: 13 Global Step: 69360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:07:57,493-Speed 10393.83 samples/sec Loss 5.3674 LearningRate 0.1027 
Epoch: 13 Global Step: 69370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:08:05,286-Speed 10513.80 samples/sec Loss 5.3399 LearningRate 0.1026 Epoch: 13 Global Step: 69380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:08:13,102-Speed 10482.54 samples/sec Loss 5.2941 LearningRate 0.1025 Epoch: 13 Global Step: 69390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:08:20,905-Speed 10499.37 samples/sec Loss 5.3526 LearningRate 0.1025 Epoch: 13 Global Step: 69400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:08:28,713-Speed 10493.85 samples/sec Loss 5.3382 LearningRate 0.1024 Epoch: 13 Global Step: 69410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:08:36,513-Speed 10504.03 samples/sec Loss 5.3653 LearningRate 0.1024 Epoch: 13 Global Step: 69420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:08:44,305-Speed 10514.11 samples/sec Loss 5.3272 LearningRate 0.1023 Epoch: 13 Global Step: 69430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:08:52,127-Speed 10474.64 samples/sec Loss 5.3726 LearningRate 0.1022 Epoch: 13 Global Step: 69440 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-01-16 06:08:59,931-Speed 10499.13 samples/sec Loss 5.3563 LearningRate 0.1022 Epoch: 13 Global Step: 69450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:09:07,727-Speed 10509.61 samples/sec Loss 5.3724 LearningRate 0.1021 Epoch: 13 Global Step: 69460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:09:15,541-Speed 10485.21 samples/sec Loss 5.2990 LearningRate 0.1021 Epoch: 13 Global Step: 69470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:09:23,325-Speed 10525.29 samples/sec Loss 5.3425 LearningRate 0.1020 Epoch: 13 Global Step: 69480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-01-16 06:09:31,109-Speed 10524.93 samples/sec Loss 5.2990 LearningRate 0.1019 Epoch: 13 Global Step: 69490 
Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:09:38,955-Speed 10442.69 samples/sec Loss 5.3258 LearningRate 0.1019 Epoch: 13 Global Step: 69500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:09:46,755-Speed 10504.09 samples/sec Loss 5.3229 LearningRate 0.1018 Epoch: 13 Global Step: 69510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:09:54,557-Speed 10501.62 samples/sec Loss 5.3030 LearningRate 0.1018 Epoch: 13 Global Step: 69520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:10:02,381-Speed 10472.00 samples/sec Loss 5.3092 LearningRate 0.1017 Epoch: 13 Global Step: 69530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:10:10,158-Speed 10534.58 samples/sec Loss 5.3220 LearningRate 0.1017 Epoch: 13 Global Step: 69540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:10:17,978-Speed 10476.51 samples/sec Loss 5.3144 LearningRate 0.1016 Epoch: 13 Global Step: 69550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-01-16 06:10:25,790-Speed 10489.50 samples/sec Loss 5.3237 LearningRate 0.1015 Epoch: 13 Global Step: 69560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:10:33,587-Speed 10507.14 samples/sec Loss 5.3326 LearningRate 0.1015 Epoch: 13 Global Step: 69570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:10:41,449-Speed 10420.73 samples/sec Loss 5.3324 LearningRate 0.1014 Epoch: 13 Global Step: 69580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:10:49,249-Speed 10505.17 samples/sec Loss 5.2811 LearningRate 0.1014 Epoch: 13 Global Step: 69590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:10:57,076-Speed 10466.98 samples/sec Loss 5.3061 LearningRate 0.1013 Epoch: 13 Global Step: 69600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:11:04,855-Speed 10533.32 samples/sec Loss 5.3435 LearningRate 0.1012 Epoch: 13 Global Step: 69610 Fp16 Grad Scale: 65536 Required: 7 hours 
Training: 2022-01-16 06:11:12,655-Speed 10504.55 samples/sec Loss 5.3233 LearningRate 0.1012 Epoch: 13 Global Step: 69620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:11:20,476-Speed 10475.55 samples/sec Loss 5.3016 LearningRate 0.1011 Epoch: 13 Global Step: 69630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:11:28,325-Speed 10438.40 samples/sec Loss 5.3056 LearningRate 0.1011 Epoch: 13 Global Step: 69640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:11:36,141-Speed 10482.29 samples/sec Loss 5.2926 LearningRate 0.1010 Epoch: 13 Global Step: 69650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:11:43,964-Speed 10473.55 samples/sec Loss 5.2954 LearningRate 0.1009 Epoch: 13 Global Step: 69660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:11:51,777-Speed 10486.08 samples/sec Loss 5.2798 LearningRate 0.1009 Epoch: 13 Global Step: 69670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:11:59,587-Speed 10490.86 samples/sec Loss 5.3089 LearningRate 0.1008 Epoch: 13 Global Step: 69680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:12:07,385-Speed 10507.20 samples/sec Loss 5.3585 LearningRate 0.1008 Epoch: 13 Global Step: 69690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:12:15,179-Speed 10511.47 samples/sec Loss 5.3364 LearningRate 0.1007 Epoch: 13 Global Step: 69700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:12:22,995-Speed 10482.68 samples/sec Loss 5.2966 LearningRate 0.1006 Epoch: 13 Global Step: 69710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:12:30,817-Speed 10473.69 samples/sec Loss 5.2822 LearningRate 0.1006 Epoch: 13 Global Step: 69720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:12:38,618-Speed 10503.37 samples/sec Loss 5.2869 LearningRate 0.1005 Epoch: 13 Global Step: 69730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:12:46,402-Speed 
10525.31 samples/sec Loss 5.3096 LearningRate 0.1005 Epoch: 13 Global Step: 69740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:12:54,224-Speed 10474.01 samples/sec Loss 5.3434 LearningRate 0.1004 Epoch: 13 Global Step: 69750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:13:02,016-Speed 10515.29 samples/sec Loss 5.3344 LearningRate 0.1003 Epoch: 13 Global Step: 69760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:13:09,856-Speed 10450.77 samples/sec Loss 5.3374 LearningRate 0.1003 Epoch: 13 Global Step: 69770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:13:17,671-Speed 10483.84 samples/sec Loss 5.3117 LearningRate 0.1002 Epoch: 13 Global Step: 69780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:13:25,461-Speed 10517.88 samples/sec Loss 5.2762 LearningRate 0.1002 Epoch: 13 Global Step: 69790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:13:33,279-Speed 10480.19 samples/sec Loss 5.2856 LearningRate 0.1001 Epoch: 13 Global Step: 69800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:13:41,088-Speed 10491.89 samples/sec Loss 5.2500 LearningRate 0.1000 Epoch: 13 Global Step: 69810 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 06:13:48,909-Speed 10474.69 samples/sec Loss 5.3334 LearningRate 0.1000 Epoch: 13 Global Step: 69820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:13:56,709-Speed 10504.59 samples/sec Loss 5.2597 LearningRate 0.0999 Epoch: 13 Global Step: 69830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:14:04,508-Speed 10505.17 samples/sec Loss 5.2802 LearningRate 0.0999 Epoch: 13 Global Step: 69840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:14:12,332-Speed 10471.80 samples/sec Loss 5.3152 LearningRate 0.0998 Epoch: 13 Global Step: 69850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:14:20,127-Speed 10509.46 samples/sec Loss 
5.2958 LearningRate 0.0998 Epoch: 13 Global Step: 69860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:14:27,937-Speed 10491.74 samples/sec Loss 5.2960 LearningRate 0.0997 Epoch: 13 Global Step: 69870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:14:35,742-Speed 10497.67 samples/sec Loss 5.2774 LearningRate 0.0996 Epoch: 13 Global Step: 69880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:14:43,530-Speed 10518.93 samples/sec Loss 5.2489 LearningRate 0.0996 Epoch: 13 Global Step: 69890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:14:51,349-Speed 10479.45 samples/sec Loss 5.3146 LearningRate 0.0995 Epoch: 13 Global Step: 69900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:14:59,167-Speed 10479.37 samples/sec Loss 5.2797 LearningRate 0.0995 Epoch: 13 Global Step: 69910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:15:06,952-Speed 10523.97 samples/sec Loss 5.2931 LearningRate 0.0994 Epoch: 13 Global Step: 69920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:15:14,753-Speed 10502.59 samples/sec Loss 5.2639 LearningRate 0.0993 Epoch: 13 Global Step: 69930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:15:22,547-Speed 10512.21 samples/sec Loss 5.2749 LearningRate 0.0993 Epoch: 13 Global Step: 69940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:15:30,394-Speed 10441.96 samples/sec Loss 5.2759 LearningRate 0.0992 Epoch: 13 Global Step: 69950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:15:38,198-Speed 10497.35 samples/sec Loss 5.3047 LearningRate 0.0992 Epoch: 13 Global Step: 69960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:15:46,017-Speed 10479.24 samples/sec Loss 5.2873 LearningRate 0.0991 Epoch: 13 Global Step: 69970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:15:53,841-Speed 10472.85 samples/sec Loss 5.2943 LearningRate 0.0990 Epoch: 13 
Global Step: 69980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:16:01,638-Speed 10508.03 samples/sec Loss 5.2893 LearningRate 0.0990 Epoch: 13 Global Step: 69990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:16:09,442-Speed 10497.93 samples/sec Loss 5.2927 LearningRate 0.0989 Epoch: 13 Global Step: 70000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:16:37,631-[lfw][70000]XNorm: 24.195342 Training: 2022-01-16 06:16:37,632-[lfw][70000]Accuracy-Flip: 0.99783+-0.00269 Training: 2022-01-16 06:16:37,633-[lfw][70000]Accuracy-Highest: 0.99783 Training: 2022-01-16 06:17:10,348-[cfp_fp][70000]XNorm: 21.559299 Training: 2022-01-16 06:17:10,349-[cfp_fp][70000]Accuracy-Flip: 0.98700+-0.00536 Training: 2022-01-16 06:17:10,349-[cfp_fp][70000]Accuracy-Highest: 0.98700 Training: 2022-01-16 06:17:38,163-[agedb_30][70000]XNorm: 23.580380 Training: 2022-01-16 06:17:38,164-[agedb_30][70000]Accuracy-Flip: 0.97667+-0.00500 Training: 2022-01-16 06:17:38,164-[agedb_30][70000]Accuracy-Highest: 0.97667 Training: 2022-01-16 06:17:45,920-Speed 849.12 samples/sec Loss 5.2332 LearningRate 0.0989 Epoch: 13 Global Step: 70010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:17:53,650-Speed 10599.18 samples/sec Loss 5.2801 LearningRate 0.0988 Epoch: 13 Global Step: 70020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:18:01,395-Speed 10577.56 samples/sec Loss 5.2409 LearningRate 0.0988 Epoch: 13 Global Step: 70030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:18:09,151-Speed 10567.45 samples/sec Loss 5.2373 LearningRate 0.0987 Epoch: 13 Global Step: 70040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:18:16,918-Speed 10549.60 samples/sec Loss 5.2341 LearningRate 0.0986 Epoch: 13 Global Step: 70050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:18:24,687-Speed 10545.82 samples/sec Loss 5.2561 LearningRate 0.0986 Epoch: 13 Global Step: 70060 Fp16 Grad 
Scale: 65536 Required: 7 hours Training: 2022-01-16 06:18:32,481-Speed 10512.71 samples/sec Loss 5.2528 LearningRate 0.0985 Epoch: 13 Global Step: 70070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:18:40,277-Speed 10509.22 samples/sec Loss 5.2593 LearningRate 0.0985 Epoch: 13 Global Step: 70080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:18:48,057-Speed 10530.43 samples/sec Loss 5.2665 LearningRate 0.0984 Epoch: 13 Global Step: 70090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:18:55,815-Speed 10560.93 samples/sec Loss 5.2714 LearningRate 0.0983 Epoch: 13 Global Step: 70100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:19:03,648-Speed 10460.59 samples/sec Loss 5.2613 LearningRate 0.0983 Epoch: 13 Global Step: 70110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:19:11,403-Speed 10564.90 samples/sec Loss 5.2310 LearningRate 0.0982 Epoch: 13 Global Step: 70120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:19:19,173-Speed 10552.83 samples/sec Loss 5.2395 LearningRate 0.0982 Epoch: 13 Global Step: 70130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:19:26,966-Speed 10513.11 samples/sec Loss 5.2580 LearningRate 0.0981 Epoch: 13 Global Step: 70140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:19:34,750-Speed 10525.01 samples/sec Loss 5.2394 LearningRate 0.0981 Epoch: 13 Global Step: 70150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:19:42,500-Speed 10572.18 samples/sec Loss 5.2410 LearningRate 0.0980 Epoch: 13 Global Step: 70160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:19:50,281-Speed 10530.50 samples/sec Loss 5.2310 LearningRate 0.0979 Epoch: 13 Global Step: 70170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:19:58,073-Speed 10514.90 samples/sec Loss 5.2709 LearningRate 0.0979 Epoch: 13 Global Step: 70180 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-01-16 06:20:05,858-Speed 10523.84 samples/sec Loss 5.2517 LearningRate 0.0978 Epoch: 13 Global Step: 70190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:20:13,626-Speed 10546.92 samples/sec Loss 5.2402 LearningRate 0.0978 Epoch: 13 Global Step: 70200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:20:21,384-Speed 10561.54 samples/sec Loss 5.2394 LearningRate 0.0977 Epoch: 13 Global Step: 70210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:20:29,168-Speed 10525.06 samples/sec Loss 5.2651 LearningRate 0.0976 Epoch: 13 Global Step: 70220 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 06:20:36,973-Speed 10498.39 samples/sec Loss 5.2609 LearningRate 0.0976 Epoch: 13 Global Step: 70230 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 06:20:44,800-Speed 10467.89 samples/sec Loss 5.2576 LearningRate 0.0975 Epoch: 13 Global Step: 70240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:20:52,606-Speed 10494.66 samples/sec Loss 5.2773 LearningRate 0.0975 Epoch: 13 Global Step: 70250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:21:00,421-Speed 10484.13 samples/sec Loss 5.2451 LearningRate 0.0974 Epoch: 13 Global Step: 70260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:21:08,246-Speed 10470.01 samples/sec Loss 5.2248 LearningRate 0.0973 Epoch: 13 Global Step: 70270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:21:16,072-Speed 10472.78 samples/sec Loss 5.2646 LearningRate 0.0973 Epoch: 13 Global Step: 70280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:21:23,896-Speed 10471.71 samples/sec Loss 5.2443 LearningRate 0.0972 Epoch: 13 Global Step: 70290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:21:31,708-Speed 10487.85 samples/sec Loss 5.2774 LearningRate 0.0972 Epoch: 13 Global Step: 70300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 
06:21:39,535-Speed 10468.45 samples/sec Loss 5.2085 LearningRate 0.0971 Epoch: 13 Global Step: 70310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:21:47,299-Speed 10552.19 samples/sec Loss 5.1932 LearningRate 0.0971 Epoch: 13 Global Step: 70320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:21:55,106-Speed 10494.76 samples/sec Loss 5.2204 LearningRate 0.0970 Epoch: 13 Global Step: 70330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:22:02,890-Speed 10529.18 samples/sec Loss 5.2262 LearningRate 0.0969 Epoch: 13 Global Step: 70340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:22:10,679-Speed 10519.10 samples/sec Loss 5.2270 LearningRate 0.0969 Epoch: 13 Global Step: 70350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:22:18,461-Speed 10528.57 samples/sec Loss 5.2280 LearningRate 0.0968 Epoch: 13 Global Step: 70360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:22:26,223-Speed 10554.41 samples/sec Loss 5.2306 LearningRate 0.0968 Epoch: 13 Global Step: 70370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:22:34,009-Speed 10524.10 samples/sec Loss 5.2193 LearningRate 0.0967 Epoch: 13 Global Step: 70380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:22:41,779-Speed 10544.85 samples/sec Loss 5.1937 LearningRate 0.0967 Epoch: 13 Global Step: 70390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:22:49,546-Speed 10547.92 samples/sec Loss 5.2266 LearningRate 0.0966 Epoch: 13 Global Step: 70400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:22:57,327-Speed 10529.31 samples/sec Loss 5.1991 LearningRate 0.0965 Epoch: 13 Global Step: 70410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:23:05,116-Speed 10518.96 samples/sec Loss 5.2008 LearningRate 0.0965 Epoch: 13 Global Step: 70420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:23:12,927-Speed 10489.27 
samples/sec Loss 5.2252 LearningRate 0.0964 Epoch: 13 Global Step: 70430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:23:20,689-Speed 10554.96 samples/sec Loss 5.1821 LearningRate 0.0964 Epoch: 13 Global Step: 70440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:23:28,453-Speed 10552.18 samples/sec Loss 5.1902 LearningRate 0.0963 Epoch: 13 Global Step: 70450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:23:36,226-Speed 10541.11 samples/sec Loss 5.1999 LearningRate 0.0962 Epoch: 13 Global Step: 70460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:23:44,029-Speed 10500.86 samples/sec Loss 5.2031 LearningRate 0.0962 Epoch: 13 Global Step: 70470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:23:51,799-Speed 10543.66 samples/sec Loss 5.2532 LearningRate 0.0961 Epoch: 13 Global Step: 70480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:23:59,583-Speed 10525.92 samples/sec Loss 5.2066 LearningRate 0.0961 Epoch: 13 Global Step: 70490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:24:07,363-Speed 10530.49 samples/sec Loss 5.2173 LearningRate 0.0960 Epoch: 13 Global Step: 70500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:24:15,161-Speed 10507.79 samples/sec Loss 5.2023 LearningRate 0.0960 Epoch: 13 Global Step: 70510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:24:22,935-Speed 10538.39 samples/sec Loss 5.1809 LearningRate 0.0959 Epoch: 13 Global Step: 70520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:24:30,774-Speed 10452.24 samples/sec Loss 5.1864 LearningRate 0.0958 Epoch: 13 Global Step: 70530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:24:38,611-Speed 10455.63 samples/sec Loss 5.1917 LearningRate 0.0958 Epoch: 13 Global Step: 70540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:24:46,431-Speed 10476.48 samples/sec Loss 5.1752 LearningRate 
0.0957 Epoch: 13 Global Step: 70550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:24:54,232-Speed 10503.42 samples/sec Loss 5.1625 LearningRate 0.0957 Epoch: 13 Global Step: 70560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:25:02,011-Speed 10531.79 samples/sec Loss 5.1749 LearningRate 0.0956 Epoch: 13 Global Step: 70570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:25:09,774-Speed 10554.77 samples/sec Loss 5.2231 LearningRate 0.0956 Epoch: 13 Global Step: 70580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:25:17,539-Speed 10550.76 samples/sec Loss 5.2065 LearningRate 0.0955 Epoch: 13 Global Step: 70590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:25:25,300-Speed 10556.37 samples/sec Loss 5.1758 LearningRate 0.0954 Epoch: 13 Global Step: 70600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:25:33,077-Speed 10535.10 samples/sec Loss 5.2035 LearningRate 0.0954 Epoch: 13 Global Step: 70610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:25:40,845-Speed 10548.09 samples/sec Loss 5.1661 LearningRate 0.0953 Epoch: 13 Global Step: 70620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:25:48,614-Speed 10544.78 samples/sec Loss 5.1959 LearningRate 0.0953 Epoch: 13 Global Step: 70630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:25:56,405-Speed 10516.65 samples/sec Loss 5.1741 LearningRate 0.0952 Epoch: 13 Global Step: 70640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:26:04,193-Speed 10520.26 samples/sec Loss 5.1699 LearningRate 0.0951 Epoch: 13 Global Step: 70650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:26:11,960-Speed 10547.77 samples/sec Loss 5.1927 LearningRate 0.0951 Epoch: 13 Global Step: 70660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:26:19,754-Speed 10511.41 samples/sec Loss 5.1222 LearningRate 0.0950 Epoch: 13 Global Step: 70670 Fp16 
Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:26:27,540-Speed 10523.77 samples/sec Loss 5.1627 LearningRate 0.0950 Epoch: 13 Global Step: 70680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:26:35,299-Speed 10559.45 samples/sec Loss 5.1746 LearningRate 0.0949 Epoch: 13 Global Step: 70690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:26:43,095-Speed 10508.61 samples/sec Loss 5.1795 LearningRate 0.0949 Epoch: 13 Global Step: 70700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:26:50,895-Speed 10505.34 samples/sec Loss 5.1435 LearningRate 0.0948 Epoch: 13 Global Step: 70710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:26:58,688-Speed 10512.63 samples/sec Loss 5.1587 LearningRate 0.0947 Epoch: 13 Global Step: 70720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:27:06,527-Speed 10451.96 samples/sec Loss 5.1685 LearningRate 0.0947 Epoch: 13 Global Step: 70730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:27:14,334-Speed 10494.77 samples/sec Loss 5.1620 LearningRate 0.0946 Epoch: 13 Global Step: 70740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:27:22,113-Speed 10531.45 samples/sec Loss 5.1974 LearningRate 0.0946 Epoch: 13 Global Step: 70750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:27:29,927-Speed 10484.80 samples/sec Loss 5.1667 LearningRate 0.0945 Epoch: 13 Global Step: 70760 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 06:27:37,693-Speed 10551.42 samples/sec Loss 5.1641 LearningRate 0.0945 Epoch: 13 Global Step: 70770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:27:45,507-Speed 10486.76 samples/sec Loss 5.2051 LearningRate 0.0944 Epoch: 13 Global Step: 70780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:27:53,323-Speed 10482.35 samples/sec Loss 5.1111 LearningRate 0.0943 Epoch: 13 Global Step: 70790 Fp16 Grad Scale: 131072 Required: 7 
hours Training: 2022-01-16 06:28:01,113-Speed 10516.94 samples/sec Loss 5.1506 LearningRate 0.0943 Epoch: 13 Global Step: 70800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:28:08,899-Speed 10523.17 samples/sec Loss 5.1959 LearningRate 0.0942 Epoch: 13 Global Step: 70810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:28:16,715-Speed 10483.83 samples/sec Loss 5.1476 LearningRate 0.0942 Epoch: 13 Global Step: 70820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:28:24,507-Speed 10513.48 samples/sec Loss 5.1247 LearningRate 0.0941 Epoch: 13 Global Step: 70830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:28:32,328-Speed 10477.20 samples/sec Loss 5.1744 LearningRate 0.0941 Epoch: 13 Global Step: 70840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:28:40,102-Speed 10539.31 samples/sec Loss 5.1526 LearningRate 0.0940 Epoch: 13 Global Step: 70850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:28:47,909-Speed 10494.24 samples/sec Loss 5.1490 LearningRate 0.0939 Epoch: 13 Global Step: 70860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:28:55,680-Speed 10542.21 samples/sec Loss 5.1384 LearningRate 0.0939 Epoch: 13 Global Step: 70870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:29:03,465-Speed 10525.15 samples/sec Loss 5.1158 LearningRate 0.0938 Epoch: 13 Global Step: 70880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:29:11,248-Speed 10527.61 samples/sec Loss 5.1660 LearningRate 0.0938 Epoch: 13 Global Step: 70890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:29:19,032-Speed 10524.58 samples/sec Loss 5.1342 LearningRate 0.0937 Epoch: 13 Global Step: 70900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:29:26,828-Speed 10509.19 samples/sec Loss 5.1131 LearningRate 0.0937 Epoch: 13 Global Step: 70910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 
06:29:34,632-Speed 10498.34 samples/sec Loss 5.1248 LearningRate 0.0936 Epoch: 13 Global Step: 70920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:29:42,426-Speed 10512.03 samples/sec Loss 5.1755 LearningRate 0.0935 Epoch: 13 Global Step: 70930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:29:50,239-Speed 10486.99 samples/sec Loss 5.1520 LearningRate 0.0935 Epoch: 13 Global Step: 70940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:29:58,005-Speed 10549.55 samples/sec Loss 5.1929 LearningRate 0.0934 Epoch: 13 Global Step: 70950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:30:05,781-Speed 10536.87 samples/sec Loss 5.1492 LearningRate 0.0934 Epoch: 13 Global Step: 70960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:30:13,613-Speed 10460.47 samples/sec Loss 5.1570 LearningRate 0.0933 Epoch: 13 Global Step: 70970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:30:21,424-Speed 10489.46 samples/sec Loss 5.1131 LearningRate 0.0933 Epoch: 13 Global Step: 70980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:30:29,195-Speed 10542.36 samples/sec Loss 5.1529 LearningRate 0.0932 Epoch: 13 Global Step: 70990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:30:37,002-Speed 10494.82 samples/sec Loss 5.1229 LearningRate 0.0931 Epoch: 13 Global Step: 71000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:30:44,802-Speed 10504.08 samples/sec Loss 5.1032 LearningRate 0.0931 Epoch: 13 Global Step: 71010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:30:52,584-Speed 10527.86 samples/sec Loss 5.1424 LearningRate 0.0930 Epoch: 13 Global Step: 71020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:31:00,370-Speed 10523.18 samples/sec Loss 5.1355 LearningRate 0.0930 Epoch: 13 Global Step: 71030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:31:08,146-Speed 10536.65 samples/sec 
Loss 5.1496 LearningRate 0.0929 Epoch: 13 Global Step: 71040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:31:15,954-Speed 10493.70 samples/sec Loss 5.1078 LearningRate 0.0929 Epoch: 13 Global Step: 71050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:31:23,803-Speed 10437.95 samples/sec Loss 5.0982 LearningRate 0.0928 Epoch: 13 Global Step: 71060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:31:31,590-Speed 10521.58 samples/sec Loss 5.1527 LearningRate 0.0927 Epoch: 13 Global Step: 71070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:31:39,369-Speed 10532.88 samples/sec Loss 5.1728 LearningRate 0.0927 Epoch: 13 Global Step: 71080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:31:47,149-Speed 10531.85 samples/sec Loss 5.1203 LearningRate 0.0926 Epoch: 13 Global Step: 71090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:31:54,909-Speed 10557.77 samples/sec Loss 5.1581 LearningRate 0.0926 Epoch: 13 Global Step: 71100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:32:02,692-Speed 10526.51 samples/sec Loss 5.0702 LearningRate 0.0925 Epoch: 13 Global Step: 71110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:32:10,464-Speed 10542.13 samples/sec Loss 5.0852 LearningRate 0.0925 Epoch: 13 Global Step: 71120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:32:18,244-Speed 10531.72 samples/sec Loss 5.1099 LearningRate 0.0924 Epoch: 13 Global Step: 71130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:32:26,030-Speed 10522.42 samples/sec Loss 5.1266 LearningRate 0.0923 Epoch: 13 Global Step: 71140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:32:33,839-Speed 10490.60 samples/sec Loss 5.0968 LearningRate 0.0923 Epoch: 13 Global Step: 71150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:32:41,638-Speed 10506.74 samples/sec Loss 5.1239 LearningRate 0.0922 Epoch: 
13 Global Step: 71160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:32:49,412-Speed 10539.39 samples/sec Loss 5.0935 LearningRate 0.0922 Epoch: 13 Global Step: 71170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:32:57,210-Speed 10505.57 samples/sec Loss 5.0954 LearningRate 0.0921 Epoch: 13 Global Step: 71180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:33:05,038-Speed 10466.24 samples/sec Loss 5.1087 LearningRate 0.0921 Epoch: 13 Global Step: 71190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:33:12,806-Speed 10547.23 samples/sec Loss 5.0701 LearningRate 0.0920 Epoch: 13 Global Step: 71200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:33:20,595-Speed 10520.20 samples/sec Loss 5.0854 LearningRate 0.0919 Epoch: 13 Global Step: 71210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:33:28,445-Speed 10436.90 samples/sec Loss 5.0980 LearningRate 0.0919 Epoch: 13 Global Step: 71220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:33:36,241-Speed 10509.68 samples/sec Loss 5.0848 LearningRate 0.0918 Epoch: 13 Global Step: 71230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:33:44,031-Speed 10518.73 samples/sec Loss 5.1365 LearningRate 0.0918 Epoch: 13 Global Step: 71240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:33:51,835-Speed 10498.01 samples/sec Loss 5.1130 LearningRate 0.0917 Epoch: 13 Global Step: 71250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:33:59,618-Speed 10526.66 samples/sec Loss 5.0834 LearningRate 0.0917 Epoch: 13 Global Step: 71260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:34:07,500-Speed 10394.32 samples/sec Loss 5.0706 LearningRate 0.0916 Epoch: 13 Global Step: 71270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:34:15,293-Speed 10514.00 samples/sec Loss 5.1239 LearningRate 0.0916 Epoch: 13 Global Step: 71280 Fp16 Grad Scale: 
65536 Required: 7 hours Training: 2022-01-16 06:34:23,083-Speed 10517.77 samples/sec Loss 5.1162 LearningRate 0.0915 Epoch: 13 Global Step: 71290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:34:30,860-Speed 10535.19 samples/sec Loss 5.0812 LearningRate 0.0914 Epoch: 13 Global Step: 71300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:34:38,637-Speed 10535.09 samples/sec Loss 5.0823 LearningRate 0.0914 Epoch: 13 Global Step: 71310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:34:46,441-Speed 10499.73 samples/sec Loss 5.0708 LearningRate 0.0913 Epoch: 13 Global Step: 71320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:34:54,245-Speed 10497.88 samples/sec Loss 5.0830 LearningRate 0.0913 Epoch: 13 Global Step: 71330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:35:02,039-Speed 10511.26 samples/sec Loss 5.0908 LearningRate 0.0912 Epoch: 13 Global Step: 71340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:35:09,838-Speed 10506.65 samples/sec Loss 5.0440 LearningRate 0.0912 Epoch: 13 Global Step: 71350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:35:17,630-Speed 10514.97 samples/sec Loss 5.0862 LearningRate 0.0911 Epoch: 13 Global Step: 71360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:35:25,413-Speed 10526.68 samples/sec Loss 5.0681 LearningRate 0.0910 Epoch: 13 Global Step: 71370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:35:33,212-Speed 10506.69 samples/sec Loss 5.0493 LearningRate 0.0910 Epoch: 13 Global Step: 71380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:35:41,030-Speed 10479.10 samples/sec Loss 5.0858 LearningRate 0.0909 Epoch: 13 Global Step: 71390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:35:48,844-Speed 10492.45 samples/sec Loss 5.1105 LearningRate 0.0909 Epoch: 13 Global Step: 71400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-01-16 06:35:56,677-Speed 10460.21 samples/sec Loss 5.0795 LearningRate 0.0908 Epoch: 13 Global Step: 71410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:36:04,506-Speed 10464.81 samples/sec Loss 5.0643 LearningRate 0.0908 Epoch: 13 Global Step: 71420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:36:12,314-Speed 10493.32 samples/sec Loss 5.0893 LearningRate 0.0907 Epoch: 13 Global Step: 71430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:36:20,111-Speed 10507.69 samples/sec Loss 5.0855 LearningRate 0.0907 Epoch: 13 Global Step: 71440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:36:27,897-Speed 10523.55 samples/sec Loss 5.0792 LearningRate 0.0906 Epoch: 13 Global Step: 71450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:36:35,699-Speed 10501.05 samples/sec Loss 5.0650 LearningRate 0.0905 Epoch: 13 Global Step: 71460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:36:43,510-Speed 10488.98 samples/sec Loss 5.0875 LearningRate 0.0905 Epoch: 13 Global Step: 71470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:36:51,372-Speed 10420.25 samples/sec Loss 5.0693 LearningRate 0.0904 Epoch: 13 Global Step: 71480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:36:59,189-Speed 10481.66 samples/sec Loss 5.0542 LearningRate 0.0904 Epoch: 13 Global Step: 71490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:37:07,014-Speed 10469.93 samples/sec Loss 5.0438 LearningRate 0.0903 Epoch: 13 Global Step: 71500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:37:14,813-Speed 10505.00 samples/sec Loss 5.0711 LearningRate 0.0903 Epoch: 13 Global Step: 71510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:37:22,629-Speed 10483.21 samples/sec Loss 5.0496 LearningRate 0.0902 Epoch: 13 Global Step: 71520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:37:30,413-Speed 
10524.76 samples/sec Loss 5.0279 LearningRate 0.0901 Epoch: 13 Global Step: 71530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:37:38,210-Speed 10507.79 samples/sec Loss 5.0558 LearningRate 0.0901 Epoch: 13 Global Step: 71540 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 06:37:46,016-Speed 10496.27 samples/sec Loss 5.0569 LearningRate 0.0900 Epoch: 13 Global Step: 71550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:37:53,809-Speed 10513.81 samples/sec Loss 5.0379 LearningRate 0.0900 Epoch: 13 Global Step: 71560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:38:01,608-Speed 10507.99 samples/sec Loss 5.0743 LearningRate 0.0899 Epoch: 13 Global Step: 71570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:38:09,408-Speed 10504.17 samples/sec Loss 5.0037 LearningRate 0.0899 Epoch: 13 Global Step: 71580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:38:17,217-Speed 10492.29 samples/sec Loss 5.0613 LearningRate 0.0898 Epoch: 13 Global Step: 71590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:38:24,999-Speed 10527.86 samples/sec Loss 5.0060 LearningRate 0.0898 Epoch: 13 Global Step: 71600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:38:32,779-Speed 10532.10 samples/sec Loss 5.0437 LearningRate 0.0897 Epoch: 13 Global Step: 71610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:38:40,570-Speed 10515.09 samples/sec Loss 5.0008 LearningRate 0.0896 Epoch: 13 Global Step: 71620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:38:48,383-Speed 10486.91 samples/sec Loss 5.0541 LearningRate 0.0896 Epoch: 13 Global Step: 71630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:38:56,188-Speed 10497.68 samples/sec Loss 5.0518 LearningRate 0.0895 Epoch: 13 Global Step: 71640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:39:03,967-Speed 10534.28 samples/sec Loss 5.0693 
LearningRate 0.0895 Epoch: 13 Global Step: 71650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:39:11,759-Speed 10514.91 samples/sec Loss 5.0190 LearningRate 0.0894 Epoch: 13 Global Step: 71660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:39:19,572-Speed 10485.74 samples/sec Loss 5.0320 LearningRate 0.0894 Epoch: 13 Global Step: 71670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:39:27,356-Speed 10526.34 samples/sec Loss 5.0529 LearningRate 0.0893 Epoch: 13 Global Step: 71680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:39:35,156-Speed 10504.47 samples/sec Loss 5.0164 LearningRate 0.0893 Epoch: 13 Global Step: 71690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:39:42,947-Speed 10515.42 samples/sec Loss 5.0414 LearningRate 0.0892 Epoch: 13 Global Step: 71700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:39:50,745-Speed 10506.68 samples/sec Loss 5.0150 LearningRate 0.0891 Epoch: 13 Global Step: 71710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:39:58,538-Speed 10513.97 samples/sec Loss 5.0060 LearningRate 0.0891 Epoch: 13 Global Step: 71720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:40:06,330-Speed 10514.33 samples/sec Loss 5.0391 LearningRate 0.0890 Epoch: 13 Global Step: 71730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:40:14,122-Speed 10514.70 samples/sec Loss 5.0335 LearningRate 0.0890 Epoch: 13 Global Step: 71740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:40:21,906-Speed 10525.60 samples/sec Loss 5.0068 LearningRate 0.0889 Epoch: 13 Global Step: 71750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:40:29,690-Speed 10525.13 samples/sec Loss 4.9859 LearningRate 0.0889 Epoch: 13 Global Step: 71760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:40:37,460-Speed 10551.81 samples/sec Loss 4.9771 LearningRate 0.0888 Epoch: 13 
Global Step: 71770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-16 06:40:45,232-Speed 10542.95 samples/sec Loss 5.0609 LearningRate 0.0887 Epoch: 13 Global Step: 71780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-16 06:40:53,031-Speed 10505.61 samples/sec Loss 5.0197 LearningRate 0.0887 Epoch: 13 Global Step: 71790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-16 06:41:00,843-Speed 10490.35 samples/sec Loss 5.0300 LearningRate 0.0886 Epoch: 13 Global Step: 71800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-16 06:41:08,629-Speed 10523.62 samples/sec Loss 5.0220 LearningRate 0.0886 Epoch: 13 Global Step: 71810 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-16 06:41:16,413-Speed 10525.94 samples/sec Loss 5.0238 LearningRate 0.0885 Epoch: 13 Global Step: 71820 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-16 06:41:24,227-Speed 10485.42 samples/sec Loss 5.0269 LearningRate 0.0885 Epoch: 13 Global Step: 71830 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-16 06:41:32,057-Speed 10463.92 samples/sec Loss 5.0203 LearningRate 0.0884 Epoch: 13 Global Step: 71840 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-16 06:41:39,874-Speed 10481.85 samples/sec Loss 5.0440 LearningRate 0.0884 Epoch: 13 Global Step: 71850 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-16 06:41:47,699-Speed 10471.11 samples/sec Loss 5.0236 LearningRate 0.0883 Epoch: 13 Global Step: 71860 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-01-16 06:41:55,482-Speed 10526.98 samples/sec Loss 4.9915 LearningRate 0.0882 Epoch: 13 Global Step: 71870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:42:03,262-Speed 10530.39 samples/sec Loss 5.0474 LearningRate 0.0882 Epoch: 13 Global Step: 71880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:42:11,054-Speed 10514.76 samples/sec Loss 5.0074 LearningRate 0.0881 Epoch: 13 Global Step: 71890 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-01-16 06:42:18,922-Speed 10413.05 samples/sec Loss 4.9878 LearningRate 0.0881 Epoch: 13 Global Step: 71900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:42:26,743-Speed 10479.25 samples/sec Loss 4.9981 LearningRate 0.0880 Epoch: 13 Global Step: 71910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:42:34,533-Speed 10515.88 samples/sec Loss 5.0042 LearningRate 0.0880 Epoch: 13 Global Step: 71920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:42:42,325-Speed 10515.38 samples/sec Loss 4.9972 LearningRate 0.0879 Epoch: 13 Global Step: 71930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:42:50,121-Speed 10510.02 samples/sec Loss 5.0176 LearningRate 0.0879 Epoch: 13 Global Step: 71940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:42:57,950-Speed 10464.99 samples/sec Loss 4.9774 LearningRate 0.0878 Epoch: 13 Global Step: 71950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:43:05,748-Speed 10505.66 samples/sec Loss 4.9944 LearningRate 0.0878 Epoch: 13 Global Step: 71960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:43:13,544-Speed 10510.46 samples/sec Loss 4.9781 LearningRate 0.0877 Epoch: 13 Global Step: 71970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:43:21,311-Speed 10547.90 samples/sec Loss 5.0158 LearningRate 0.0876 Epoch: 13 Global Step: 71980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:43:29,118-Speed 10494.16 samples/sec Loss 4.9966 LearningRate 0.0876 Epoch: 13 Global Step: 71990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:43:36,904-Speed 10524.46 samples/sec Loss 4.9998 LearningRate 0.0875 Epoch: 13 Global Step: 72000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:43:44,704-Speed 10510.16 samples/sec Loss 4.9621 LearningRate 0.0875 Epoch: 13 Global Step: 72010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 
06:43:52,508-Speed 10498.42 samples/sec Loss 5.0106 LearningRate 0.0874 Epoch: 13 Global Step: 72020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:44:00,320-Speed 10487.70 samples/sec Loss 5.0107 LearningRate 0.0874 Epoch: 13 Global Step: 72030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:44:08,116-Speed 10509.44 samples/sec Loss 4.9871 LearningRate 0.0873 Epoch: 13 Global Step: 72040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:44:15,892-Speed 10540.22 samples/sec Loss 4.9791 LearningRate 0.0873 Epoch: 13 Global Step: 72050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:44:23,678-Speed 10522.86 samples/sec Loss 4.9976 LearningRate 0.0872 Epoch: 13 Global Step: 72060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:44:31,451-Speed 10540.00 samples/sec Loss 4.9931 LearningRate 0.0871 Epoch: 13 Global Step: 72070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:44:39,233-Speed 10529.14 samples/sec Loss 5.0247 LearningRate 0.0871 Epoch: 13 Global Step: 72080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:44:47,023-Speed 10516.85 samples/sec Loss 5.0086 LearningRate 0.0870 Epoch: 13 Global Step: 72090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:44:54,823-Speed 10504.48 samples/sec Loss 4.9751 LearningRate 0.0870 Epoch: 13 Global Step: 72100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:45:02,680-Speed 10427.25 samples/sec Loss 4.9847 LearningRate 0.0869 Epoch: 13 Global Step: 72110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:45:10,493-Speed 10486.00 samples/sec Loss 4.9585 LearningRate 0.0869 Epoch: 13 Global Step: 72120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:45:18,270-Speed 10536.09 samples/sec Loss 4.9959 LearningRate 0.0868 Epoch: 13 Global Step: 72130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:45:26,046-Speed 10535.89 samples/sec 
Loss 4.9565 LearningRate 0.0868 Epoch: 13 Global Step: 72140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:45:33,856-Speed 10489.71 samples/sec Loss 4.9868 LearningRate 0.0867 Epoch: 13 Global Step: 72150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:45:41,682-Speed 10469.32 samples/sec Loss 4.9486 LearningRate 0.0866 Epoch: 13 Global Step: 72160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:45:49,462-Speed 10531.25 samples/sec Loss 4.9771 LearningRate 0.0866 Epoch: 13 Global Step: 72170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:45:57,280-Speed 10479.70 samples/sec Loss 4.9770 LearningRate 0.0865 Epoch: 13 Global Step: 72180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:46:05,072-Speed 10517.26 samples/sec Loss 4.9784 LearningRate 0.0865 Epoch: 13 Global Step: 72190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:46:12,856-Speed 10525.62 samples/sec Loss 4.9557 LearningRate 0.0864 Epoch: 13 Global Step: 72200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:46:20,651-Speed 10511.04 samples/sec Loss 4.9269 LearningRate 0.0864 Epoch: 13 Global Step: 72210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:46:28,461-Speed 10489.51 samples/sec Loss 4.9456 LearningRate 0.0863 Epoch: 13 Global Step: 72220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:46:36,252-Speed 10516.63 samples/sec Loss 4.9677 LearningRate 0.0863 Epoch: 13 Global Step: 72230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:46:44,064-Speed 10488.33 samples/sec Loss 4.9285 LearningRate 0.0862 Epoch: 13 Global Step: 72240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:46:51,863-Speed 10505.14 samples/sec Loss 4.9605 LearningRate 0.0862 Epoch: 13 Global Step: 72250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:46:59,691-Speed 10468.94 samples/sec Loss 4.9323 LearningRate 0.0861 Epoch: 13 
Global Step: 72260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:47:07,490-Speed 10507.20 samples/sec Loss 4.9811 LearningRate 0.0860 Epoch: 13 Global Step: 72270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:47:15,306-Speed 10482.25 samples/sec Loss 4.9567 LearningRate 0.0860 Epoch: 13 Global Step: 72280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:47:23,122-Speed 10482.11 samples/sec Loss 4.9707 LearningRate 0.0859 Epoch: 13 Global Step: 72290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:47:30,923-Speed 10501.55 samples/sec Loss 4.9842 LearningRate 0.0859 Epoch: 13 Global Step: 72300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:47:38,722-Speed 10506.67 samples/sec Loss 4.9711 LearningRate 0.0858 Epoch: 13 Global Step: 72310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:47:46,514-Speed 10515.03 samples/sec Loss 4.9386 LearningRate 0.0858 Epoch: 13 Global Step: 72320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:47:54,334-Speed 10475.77 samples/sec Loss 4.9136 LearningRate 0.0857 Epoch: 13 Global Step: 72330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:48:02,133-Speed 10505.81 samples/sec Loss 4.9418 LearningRate 0.0857 Epoch: 13 Global Step: 72340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:48:09,943-Speed 10491.24 samples/sec Loss 4.9593 LearningRate 0.0856 Epoch: 13 Global Step: 72350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:48:17,757-Speed 10484.69 samples/sec Loss 4.9410 LearningRate 0.0856 Epoch: 13 Global Step: 72360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:48:25,601-Speed 10444.29 samples/sec Loss 4.9307 LearningRate 0.0855 Epoch: 13 Global Step: 72370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:48:33,392-Speed 10515.89 samples/sec Loss 4.9246 LearningRate 0.0854 Epoch: 13 Global Step: 72380 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-01-16 06:48:41,176-Speed 10525.99 samples/sec Loss 4.9564 LearningRate 0.0854 Epoch: 13 Global Step: 72390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:48:48,965-Speed 10519.30 samples/sec Loss 4.9599 LearningRate 0.0853 Epoch: 13 Global Step: 72400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:48:56,761-Speed 10509.60 samples/sec Loss 4.9433 LearningRate 0.0853 Epoch: 13 Global Step: 72410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:49:04,568-Speed 10493.50 samples/sec Loss 4.8929 LearningRate 0.0852 Epoch: 13 Global Step: 72420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:49:12,351-Speed 10528.00 samples/sec Loss 4.9139 LearningRate 0.0852 Epoch: 13 Global Step: 72430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:49:20,145-Speed 10517.33 samples/sec Loss 4.9232 LearningRate 0.0851 Epoch: 13 Global Step: 72440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:49:27,947-Speed 10500.28 samples/sec Loss 4.9502 LearningRate 0.0851 Epoch: 13 Global Step: 72450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:49:35,733-Speed 10522.84 samples/sec Loss 4.9349 LearningRate 0.0850 Epoch: 13 Global Step: 72460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:49:43,506-Speed 10541.78 samples/sec Loss 4.9526 LearningRate 0.0850 Epoch: 13 Global Step: 72470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:49:51,305-Speed 10504.66 samples/sec Loss 4.9200 LearningRate 0.0849 Epoch: 13 Global Step: 72480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:49:59,115-Speed 10490.00 samples/sec Loss 4.9407 LearningRate 0.0848 Epoch: 13 Global Step: 72490 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-01-16 06:50:06,927-Speed 10487.68 samples/sec Loss 4.9731 LearningRate 0.0848 Epoch: 13 Global Step: 72500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-01-16 06:50:14,763-Speed 10455.91 samples/sec Loss 4.9388 LearningRate 0.0847 Epoch: 13 Global Step: 72510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:50:22,540-Speed 10534.57 samples/sec Loss 4.9458 LearningRate 0.0847 Epoch: 13 Global Step: 72520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:50:30,344-Speed 10499.14 samples/sec Loss 4.8657 LearningRate 0.0846 Epoch: 13 Global Step: 72530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:50:38,134-Speed 10516.48 samples/sec Loss 4.9116 LearningRate 0.0846 Epoch: 13 Global Step: 72540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:50:45,916-Speed 10528.36 samples/sec Loss 4.9212 LearningRate 0.0845 Epoch: 13 Global Step: 72550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:50:53,709-Speed 10513.38 samples/sec Loss 4.9612 LearningRate 0.0845 Epoch: 13 Global Step: 72560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:51:01,476-Speed 10548.66 samples/sec Loss 4.9459 LearningRate 0.0844 Epoch: 13 Global Step: 72570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:51:09,270-Speed 10512.90 samples/sec Loss 4.9123 LearningRate 0.0844 Epoch: 13 Global Step: 72580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:51:32,516-Speed 3524.09 samples/sec Loss 4.9469 LearningRate 0.0843 Epoch: 14 Global Step: 72590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:51:40,276-Speed 10559.16 samples/sec Loss 4.9391 LearningRate 0.0842 Epoch: 14 Global Step: 72600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:51:48,080-Speed 10498.94 samples/sec Loss 4.8938 LearningRate 0.0842 Epoch: 14 Global Step: 72610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:51:55,855-Speed 10540.50 samples/sec Loss 4.8671 LearningRate 0.0841 Epoch: 14 Global Step: 72620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:52:03,633-Speed 10533.78 
samples/sec Loss 4.8702 LearningRate 0.0841 Epoch: 14 Global Step: 72630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:52:11,384-Speed 10569.17 samples/sec Loss 4.8750 LearningRate 0.0840 Epoch: 14 Global Step: 72640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:52:19,187-Speed 10500.93 samples/sec Loss 4.9102 LearningRate 0.0840 Epoch: 14 Global Step: 72650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:52:26,959-Speed 10542.49 samples/sec Loss 4.8967 LearningRate 0.0839 Epoch: 14 Global Step: 72660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:52:34,720-Speed 10555.63 samples/sec Loss 4.8839 LearningRate 0.0839 Epoch: 14 Global Step: 72670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:52:42,515-Speed 10511.19 samples/sec Loss 4.8719 LearningRate 0.0838 Epoch: 14 Global Step: 72680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:52:50,307-Speed 10514.80 samples/sec Loss 4.8891 LearningRate 0.0838 Epoch: 14 Global Step: 72690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:52:58,082-Speed 10537.27 samples/sec Loss 4.8811 LearningRate 0.0837 Epoch: 14 Global Step: 72700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:53:05,859-Speed 10534.08 samples/sec Loss 4.8624 LearningRate 0.0836 Epoch: 14 Global Step: 72710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:53:13,649-Speed 10517.64 samples/sec Loss 4.8436 LearningRate 0.0836 Epoch: 14 Global Step: 72720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:53:21,427-Speed 10534.83 samples/sec Loss 4.8924 LearningRate 0.0835 Epoch: 14 Global Step: 72730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:53:29,241-Speed 10485.04 samples/sec Loss 4.8293 LearningRate 0.0835 Epoch: 14 Global Step: 72740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:53:37,025-Speed 10525.03 samples/sec Loss 4.8494 
LearningRate 0.0834 Epoch: 14 Global Step: 72750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:53:44,799-Speed 10539.29 samples/sec Loss 4.8787 LearningRate 0.0834 Epoch: 14 Global Step: 72760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:53:52,580-Speed 10529.44 samples/sec Loss 4.8932 LearningRate 0.0833 Epoch: 14 Global Step: 72770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:54:00,371-Speed 10516.64 samples/sec Loss 4.8808 LearningRate 0.0833 Epoch: 14 Global Step: 72780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:54:08,138-Speed 10547.68 samples/sec Loss 4.8602 LearningRate 0.0832 Epoch: 14 Global Step: 72790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:54:15,911-Speed 10540.81 samples/sec Loss 4.8914 LearningRate 0.0832 Epoch: 14 Global Step: 72800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:54:23,677-Speed 10550.50 samples/sec Loss 4.8469 LearningRate 0.0831 Epoch: 14 Global Step: 72810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:54:31,447-Speed 10544.61 samples/sec Loss 4.8523 LearningRate 0.0831 Epoch: 14 Global Step: 72820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:54:39,228-Speed 10528.47 samples/sec Loss 4.8826 LearningRate 0.0830 Epoch: 14 Global Step: 72830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:54:47,008-Speed 10531.79 samples/sec Loss 4.8952 LearningRate 0.0829 Epoch: 14 Global Step: 72840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:54:54,793-Speed 10523.94 samples/sec Loss 4.8835 LearningRate 0.0829 Epoch: 14 Global Step: 72850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:55:02,573-Speed 10531.21 samples/sec Loss 4.8926 LearningRate 0.0828 Epoch: 14 Global Step: 72860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:55:10,363-Speed 10517.05 samples/sec Loss 4.8860 LearningRate 0.0828 Epoch: 14 Global 
Step: 72870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:55:18,158-Speed 10510.70 samples/sec Loss 4.9020 LearningRate 0.0827 Epoch: 14 Global Step: 72880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:55:25,937-Speed 10532.88 samples/sec Loss 4.8622 LearningRate 0.0827 Epoch: 14 Global Step: 72890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:55:33,706-Speed 10546.07 samples/sec Loss 4.8759 LearningRate 0.0826 Epoch: 14 Global Step: 72900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:55:41,482-Speed 10534.77 samples/sec Loss 4.8712 LearningRate 0.0826 Epoch: 14 Global Step: 72910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:55:49,249-Speed 10549.72 samples/sec Loss 4.8483 LearningRate 0.0825 Epoch: 14 Global Step: 72920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:55:57,021-Speed 10541.54 samples/sec Loss 4.8555 LearningRate 0.0825 Epoch: 14 Global Step: 72930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:56:04,790-Speed 10545.54 samples/sec Loss 4.8498 LearningRate 0.0824 Epoch: 14 Global Step: 72940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:56:12,587-Speed 10507.90 samples/sec Loss 4.8565 LearningRate 0.0824 Epoch: 14 Global Step: 72950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:56:20,417-Speed 10464.17 samples/sec Loss 4.8286 LearningRate 0.0823 Epoch: 14 Global Step: 72960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:56:28,251-Speed 10458.97 samples/sec Loss 4.8736 LearningRate 0.0823 Epoch: 14 Global Step: 72970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:56:36,069-Speed 10479.61 samples/sec Loss 4.8905 LearningRate 0.0822 Epoch: 14 Global Step: 72980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:56:43,881-Speed 10486.67 samples/sec Loss 4.8887 LearningRate 0.0821 Epoch: 14 Global Step: 72990 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-01-16 06:56:51,697-Speed 10483.34 samples/sec Loss 4.8757 LearningRate 0.0821 Epoch: 14 Global Step: 73000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:56:59,521-Speed 10470.83 samples/sec Loss 4.8267 LearningRate 0.0820 Epoch: 14 Global Step: 73010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:57:07,368-Speed 10441.41 samples/sec Loss 4.8662 LearningRate 0.0820 Epoch: 14 Global Step: 73020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:57:15,227-Speed 10425.24 samples/sec Loss 4.8377 LearningRate 0.0819 Epoch: 14 Global Step: 73030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:57:23,102-Speed 10404.41 samples/sec Loss 4.8530 LearningRate 0.0819 Epoch: 14 Global Step: 73040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:57:30,924-Speed 10474.59 samples/sec Loss 4.8377 LearningRate 0.0818 Epoch: 14 Global Step: 73050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:57:38,775-Speed 10435.25 samples/sec Loss 4.8821 LearningRate 0.0818 Epoch: 14 Global Step: 73060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:57:46,593-Speed 10480.38 samples/sec Loss 4.8420 LearningRate 0.0817 Epoch: 14 Global Step: 73070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:57:54,438-Speed 10444.18 samples/sec Loss 4.8642 LearningRate 0.0817 Epoch: 14 Global Step: 73080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:58:02,244-Speed 10494.73 samples/sec Loss 4.8736 LearningRate 0.0816 Epoch: 14 Global Step: 73090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:58:10,063-Speed 10479.18 samples/sec Loss 4.8442 LearningRate 0.0816 Epoch: 14 Global Step: 73100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:58:17,881-Speed 10480.65 samples/sec Loss 4.8093 LearningRate 0.0815 Epoch: 14 Global Step: 73110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-01-16 06:58:25,700-Speed 10478.35 samples/sec Loss 4.7942 LearningRate 0.0814 Epoch: 14 Global Step: 73120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:58:33,510-Speed 10490.25 samples/sec Loss 4.8054 LearningRate 0.0814 Epoch: 14 Global Step: 73130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:58:41,336-Speed 10469.47 samples/sec Loss 4.8295 LearningRate 0.0813 Epoch: 14 Global Step: 73140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 06:58:49,168-Speed 10460.19 samples/sec Loss 4.8391 LearningRate 0.0813 Epoch: 14 Global Step: 73150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:58:57,000-Speed 10462.14 samples/sec Loss 4.7924 LearningRate 0.0812 Epoch: 14 Global Step: 73160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:59:04,833-Speed 10459.40 samples/sec Loss 4.8519 LearningRate 0.0812 Epoch: 14 Global Step: 73170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:59:12,653-Speed 10477.07 samples/sec Loss 4.8641 LearningRate 0.0811 Epoch: 14 Global Step: 73180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:59:20,474-Speed 10475.37 samples/sec Loss 4.7960 LearningRate 0.0811 Epoch: 14 Global Step: 73190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:59:28,297-Speed 10473.52 samples/sec Loss 4.8172 LearningRate 0.0810 Epoch: 14 Global Step: 73200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:59:36,093-Speed 10509.22 samples/sec Loss 4.8093 LearningRate 0.0810 Epoch: 14 Global Step: 73210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:59:43,902-Speed 10495.00 samples/sec Loss 4.8316 LearningRate 0.0809 Epoch: 14 Global Step: 73220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:59:51,711-Speed 10491.52 samples/sec Loss 4.8802 LearningRate 0.0809 Epoch: 14 Global Step: 73230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 06:59:59,517-Speed 10496.27 
samples/sec Loss 4.8232 LearningRate 0.0808 Epoch: 14 Global Step: 73240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:00:07,302-Speed 10522.74 samples/sec Loss 4.8139 LearningRate 0.0808 Epoch: 14 Global Step: 73250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:00:15,115-Speed 10489.13 samples/sec Loss 4.7796 LearningRate 0.0807 Epoch: 14 Global Step: 73260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:00:22,901-Speed 10522.36 samples/sec Loss 4.8291 LearningRate 0.0807 Epoch: 14 Global Step: 73270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:00:30,688-Speed 10521.46 samples/sec Loss 4.8058 LearningRate 0.0806 Epoch: 14 Global Step: 73280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:00:38,510-Speed 10474.51 samples/sec Loss 4.8023 LearningRate 0.0805 Epoch: 14 Global Step: 73290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:00:46,322-Speed 10487.49 samples/sec Loss 4.8153 LearningRate 0.0805 Epoch: 14 Global Step: 73300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:00:54,152-Speed 10463.05 samples/sec Loss 4.8104 LearningRate 0.0804 Epoch: 14 Global Step: 73310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:01:01,968-Speed 10483.22 samples/sec Loss 4.8428 LearningRate 0.0804 Epoch: 14 Global Step: 73320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:01:09,788-Speed 10476.64 samples/sec Loss 4.7981 LearningRate 0.0803 Epoch: 14 Global Step: 73330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:01:17,604-Speed 10483.12 samples/sec Loss 4.8409 LearningRate 0.0803 Epoch: 14 Global Step: 73340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:01:25,438-Speed 10457.90 samples/sec Loss 4.8111 LearningRate 0.0802 Epoch: 14 Global Step: 73350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:01:33,265-Speed 10467.98 samples/sec Loss 4.7786 LearningRate 
0.0802 Epoch: 14 Global Step: 73360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:01:41,065-Speed 10503.19 samples/sec Loss 4.8208 LearningRate 0.0801 Epoch: 14 Global Step: 73370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:01:48,864-Speed 10506.61 samples/sec Loss 4.7953 LearningRate 0.0801 Epoch: 14 Global Step: 73380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:01:56,652-Speed 10519.60 samples/sec Loss 4.7710 LearningRate 0.0800 Epoch: 14 Global Step: 73390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:02:04,458-Speed 10495.82 samples/sec Loss 4.7074 LearningRate 0.0800 Epoch: 14 Global Step: 73400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:02:12,247-Speed 10518.97 samples/sec Loss 4.7437 LearningRate 0.0799 Epoch: 14 Global Step: 73410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:02:20,042-Speed 10513.91 samples/sec Loss 4.7953 LearningRate 0.0799 Epoch: 14 Global Step: 73420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:02:27,826-Speed 10524.65 samples/sec Loss 4.7921 LearningRate 0.0798 Epoch: 14 Global Step: 73430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:02:35,598-Speed 10541.97 samples/sec Loss 4.7622 LearningRate 0.0798 Epoch: 14 Global Step: 73440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:02:43,416-Speed 10480.30 samples/sec Loss 4.7980 LearningRate 0.0797 Epoch: 14 Global Step: 73450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:02:51,239-Speed 10473.60 samples/sec Loss 4.7841 LearningRate 0.0796 Epoch: 14 Global Step: 73460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:02:59,055-Speed 10481.66 samples/sec Loss 4.7922 LearningRate 0.0796 Epoch: 14 Global Step: 73470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:03:06,836-Speed 10530.54 samples/sec Loss 4.8175 LearningRate 0.0795 Epoch: 14 Global Step: 73480 Fp16 
Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:03:14,616-Speed 10530.25 samples/sec Loss 4.7660 LearningRate 0.0795 Epoch: 14 Global Step: 73490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:03:22,409-Speed 10514.82 samples/sec Loss 4.8010 LearningRate 0.0794 Epoch: 14 Global Step: 73500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:03:30,234-Speed 10469.84 samples/sec Loss 4.7685 LearningRate 0.0794 Epoch: 14 Global Step: 73510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:03:38,025-Speed 10516.40 samples/sec Loss 4.7785 LearningRate 0.0793 Epoch: 14 Global Step: 73520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:03:45,823-Speed 10507.30 samples/sec Loss 4.7494 LearningRate 0.0793 Epoch: 14 Global Step: 73530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:03:53,624-Speed 10502.25 samples/sec Loss 4.8039 LearningRate 0.0792 Epoch: 14 Global Step: 73540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:04:01,449-Speed 10470.49 samples/sec Loss 4.7684 LearningRate 0.0792 Epoch: 14 Global Step: 73550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:04:09,253-Speed 10499.63 samples/sec Loss 4.7692 LearningRate 0.0791 Epoch: 14 Global Step: 73560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:04:17,058-Speed 10496.47 samples/sec Loss 4.8061 LearningRate 0.0791 Epoch: 14 Global Step: 73570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:04:24,860-Speed 10501.21 samples/sec Loss 4.7638 LearningRate 0.0790 Epoch: 14 Global Step: 73580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:04:32,632-Speed 10541.62 samples/sec Loss 4.7630 LearningRate 0.0790 Epoch: 14 Global Step: 73590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:04:40,461-Speed 10465.72 samples/sec Loss 4.7376 LearningRate 0.0789 Epoch: 14 Global Step: 73600 Fp16 Grad Scale: 131072 Required: 7 
hours Training: 2022-01-16 07:04:48,263-Speed 10500.38 samples/sec Loss 4.7668 LearningRate 0.0789 Epoch: 14 Global Step: 73610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:04:56,032-Speed 10545.84 samples/sec Loss 4.7645 LearningRate 0.0788 Epoch: 14 Global Step: 73620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:05:03,851-Speed 10478.53 samples/sec Loss 4.7585 LearningRate 0.0788 Epoch: 14 Global Step: 73630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:05:11,714-Speed 10423.22 samples/sec Loss 4.7517 LearningRate 0.0787 Epoch: 14 Global Step: 73640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:05:19,599-Speed 10390.33 samples/sec Loss 4.7371 LearningRate 0.0786 Epoch: 14 Global Step: 73650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:05:27,386-Speed 10521.95 samples/sec Loss 4.7638 LearningRate 0.0786 Epoch: 14 Global Step: 73660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:05:35,169-Speed 10526.14 samples/sec Loss 4.7489 LearningRate 0.0785 Epoch: 14 Global Step: 73670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:05:42,977-Speed 10493.37 samples/sec Loss 4.7441 LearningRate 0.0785 Epoch: 14 Global Step: 73680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:05:50,772-Speed 10511.34 samples/sec Loss 4.7749 LearningRate 0.0784 Epoch: 14 Global Step: 73690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:05:58,592-Speed 10476.30 samples/sec Loss 4.7483 LearningRate 0.0784 Epoch: 14 Global Step: 73700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:06:06,418-Speed 10469.11 samples/sec Loss 4.7572 LearningRate 0.0783 Epoch: 14 Global Step: 73710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:06:14,220-Speed 10501.78 samples/sec Loss 4.7602 LearningRate 0.0783 Epoch: 14 Global Step: 73720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 
07:06:21,997-Speed 10534.72 samples/sec Loss 4.7528 LearningRate 0.0782 Epoch: 14 Global Step: 73730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:06:29,770-Speed 10539.93 samples/sec Loss 4.7742 LearningRate 0.0782 Epoch: 14 Global Step: 73740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:06:37,572-Speed 10501.91 samples/sec Loss 4.7642 LearningRate 0.0781 Epoch: 14 Global Step: 73750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:06:45,381-Speed 10491.84 samples/sec Loss 4.7070 LearningRate 0.0781 Epoch: 14 Global Step: 73760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:06:53,170-Speed 10518.88 samples/sec Loss 4.7430 LearningRate 0.0780 Epoch: 14 Global Step: 73770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:07:00,972-Speed 10512.94 samples/sec Loss 4.7376 LearningRate 0.0780 Epoch: 14 Global Step: 73780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:07:08,749-Speed 10535.28 samples/sec Loss 4.7380 LearningRate 0.0779 Epoch: 14 Global Step: 73790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:07:16,535-Speed 10523.35 samples/sec Loss 4.7697 LearningRate 0.0779 Epoch: 14 Global Step: 73800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:07:24,323-Speed 10518.67 samples/sec Loss 4.8024 LearningRate 0.0778 Epoch: 14 Global Step: 73810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:07:32,138-Speed 10483.77 samples/sec Loss 4.7486 LearningRate 0.0778 Epoch: 14 Global Step: 73820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:07:39,946-Speed 10493.83 samples/sec Loss 4.6983 LearningRate 0.0777 Epoch: 14 Global Step: 73830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:07:47,723-Speed 10534.94 samples/sec Loss 4.7076 LearningRate 0.0777 Epoch: 14 Global Step: 73840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:07:55,526-Speed 10499.15 
samples/sec Loss 4.7509 LearningRate 0.0776 Epoch: 14 Global Step: 73850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:08:03,344-Speed 10480.13 samples/sec Loss 4.7374 LearningRate 0.0776 Epoch: 14 Global Step: 73860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:08:11,129-Speed 10524.58 samples/sec Loss 4.7181 LearningRate 0.0775 Epoch: 14 Global Step: 73870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:08:18,946-Speed 10482.86 samples/sec Loss 4.7146 LearningRate 0.0774 Epoch: 14 Global Step: 73880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:08:26,742-Speed 10508.88 samples/sec Loss 4.6995 LearningRate 0.0774 Epoch: 14 Global Step: 73890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:08:34,535-Speed 10513.40 samples/sec Loss 4.7125 LearningRate 0.0773 Epoch: 14 Global Step: 73900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:08:42,333-Speed 10507.37 samples/sec Loss 4.7021 LearningRate 0.0773 Epoch: 14 Global Step: 73910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:08:50,140-Speed 10495.77 samples/sec Loss 4.7368 LearningRate 0.0772 Epoch: 14 Global Step: 73920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:08:57,948-Speed 10493.48 samples/sec Loss 4.7162 LearningRate 0.0772 Epoch: 14 Global Step: 73930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:09:05,753-Speed 10498.78 samples/sec Loss 4.7211 LearningRate 0.0771 Epoch: 14 Global Step: 73940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:09:13,545-Speed 10513.01 samples/sec Loss 4.7220 LearningRate 0.0771 Epoch: 14 Global Step: 73950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:09:21,321-Speed 10537.03 samples/sec Loss 4.7108 LearningRate 0.0770 Epoch: 14 Global Step: 73960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:09:29,170-Speed 10439.02 samples/sec Loss 4.6963 
LearningRate 0.0770 Epoch: 14 Global Step: 73970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:09:36,978-Speed 10493.59 samples/sec Loss 4.6720 LearningRate 0.0769 Epoch: 14 Global Step: 73980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:09:44,787-Speed 10492.59 samples/sec Loss 4.6983 LearningRate 0.0769 Epoch: 14 Global Step: 73990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:09:52,614-Speed 10466.75 samples/sec Loss 4.6990 LearningRate 0.0768 Epoch: 14 Global Step: 74000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:10:00,405-Speed 10517.57 samples/sec Loss 4.6846 LearningRate 0.0768 Epoch: 14 Global Step: 74010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:10:08,208-Speed 10499.53 samples/sec Loss 4.7162 LearningRate 0.0767 Epoch: 14 Global Step: 74020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:10:16,036-Speed 10466.27 samples/sec Loss 4.7319 LearningRate 0.0767 Epoch: 14 Global Step: 74030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:10:23,836-Speed 10503.13 samples/sec Loss 4.7381 LearningRate 0.0766 Epoch: 14 Global Step: 74040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:10:31,630-Speed 10512.03 samples/sec Loss 4.6748 LearningRate 0.0766 Epoch: 14 Global Step: 74050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:10:39,438-Speed 10494.28 samples/sec Loss 4.6989 LearningRate 0.0765 Epoch: 14 Global Step: 74060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:10:47,227-Speed 10517.67 samples/sec Loss 4.7145 LearningRate 0.0765 Epoch: 14 Global Step: 74070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:10:55,004-Speed 10541.30 samples/sec Loss 4.6984 LearningRate 0.0764 Epoch: 14 Global Step: 74080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:11:02,787-Speed 10526.48 samples/sec Loss 4.7128 LearningRate 0.0764 Epoch: 14 Global 
Step: 74090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-01-16 07:11:10,592-Speed 10497.18 samples/sec Loss 4.7474 LearningRate 0.0763 Epoch: 14 Global Step: 74100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:11:18,391-Speed 10504.97 samples/sec Loss 4.6992 LearningRate 0.0763 Epoch: 14 Global Step: 74110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:11:26,200-Speed 10491.97 samples/sec Loss 4.7049 LearningRate 0.0762 Epoch: 14 Global Step: 74120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:11:33,987-Speed 10522.06 samples/sec Loss 4.6947 LearningRate 0.0762 Epoch: 14 Global Step: 74130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-01-16 07:11:41,781-Speed 10512.29 samples/sec Loss 4.6868 LearningRate 0.0761 Epoch: 14 Global Step: 74140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:11:49,594-Speed 10486.71 samples/sec Loss 4.6814 LearningRate 0.0761 Epoch: 14 Global Step: 74150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:11:57,411-Speed 10480.51 samples/sec Loss 4.6785 LearningRate 0.0760 Epoch: 14 Global Step: 74160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:12:05,246-Speed 10456.68 samples/sec Loss 4.7334 LearningRate 0.0759 Epoch: 14 Global Step: 74170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:12:13,041-Speed 10510.93 samples/sec Loss 4.6826 LearningRate 0.0759 Epoch: 14 Global Step: 74180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:12:20,849-Speed 10492.98 samples/sec Loss 4.6550 LearningRate 0.0758 Epoch: 14 Global Step: 74190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:12:28,635-Speed 10523.40 samples/sec Loss 4.6571 LearningRate 0.0758 Epoch: 14 Global Step: 74200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:12:36,460-Speed 10470.15 samples/sec Loss 4.6672 LearningRate 0.0757 Epoch: 14 Global Step: 74210 Fp16 Grad Scale: 
131072 Required: 6 hours Training: 2022-01-16 07:12:44,248-Speed 10520.32 samples/sec Loss 4.6928 LearningRate 0.0757 Epoch: 14 Global Step: 74220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:12:52,050-Speed 10501.76 samples/sec Loss 4.7244 LearningRate 0.0756 Epoch: 14 Global Step: 74230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:12:59,846-Speed 10509.59 samples/sec Loss 4.6723 LearningRate 0.0756 Epoch: 14 Global Step: 74240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:13:07,630-Speed 10524.68 samples/sec Loss 4.7083 LearningRate 0.0755 Epoch: 14 Global Step: 74250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:13:15,406-Speed 10536.02 samples/sec Loss 4.6814 LearningRate 0.0755 Epoch: 14 Global Step: 74260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:13:23,195-Speed 10519.09 samples/sec Loss 4.6447 LearningRate 0.0754 Epoch: 14 Global Step: 74270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:13:30,998-Speed 10499.79 samples/sec Loss 4.6175 LearningRate 0.0754 Epoch: 14 Global Step: 74280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:13:38,787-Speed 10519.21 samples/sec Loss 4.6375 LearningRate 0.0753 Epoch: 14 Global Step: 74290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:13:46,553-Speed 10551.15 samples/sec Loss 4.6444 LearningRate 0.0753 Epoch: 14 Global Step: 74300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:13:54,366-Speed 10485.70 samples/sec Loss 4.6421 LearningRate 0.0752 Epoch: 14 Global Step: 74310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:14:02,182-Speed 10481.85 samples/sec Loss 4.6340 LearningRate 0.0752 Epoch: 14 Global Step: 74320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:14:09,969-Speed 10521.75 samples/sec Loss 4.6754 LearningRate 0.0751 Epoch: 14 Global Step: 74330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-01-16 07:14:17,807-Speed 10453.43 samples/sec Loss 4.6517 LearningRate 0.0751 Epoch: 14 Global Step: 74340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:14:25,613-Speed 10495.58 samples/sec Loss 4.6640 LearningRate 0.0750 Epoch: 14 Global Step: 74350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:14:33,449-Speed 10455.31 samples/sec Loss 4.6218 LearningRate 0.0750 Epoch: 14 Global Step: 74360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:14:41,304-Speed 10431.80 samples/sec Loss 4.6624 LearningRate 0.0749 Epoch: 14 Global Step: 74370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:14:49,090-Speed 10522.44 samples/sec Loss 4.6695 LearningRate 0.0749 Epoch: 14 Global Step: 74380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:14:56,875-Speed 10525.06 samples/sec Loss 4.6721 LearningRate 0.0748 Epoch: 14 Global Step: 74390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:15:04,665-Speed 10516.02 samples/sec Loss 4.6904 LearningRate 0.0748 Epoch: 14 Global Step: 74400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:15:12,452-Speed 10522.62 samples/sec Loss 4.6611 LearningRate 0.0747 Epoch: 14 Global Step: 74410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:15:20,242-Speed 10516.43 samples/sec Loss 4.6704 LearningRate 0.0747 Epoch: 14 Global Step: 74420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:15:28,042-Speed 10504.98 samples/sec Loss 4.6399 LearningRate 0.0746 Epoch: 14 Global Step: 74430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:15:35,833-Speed 10516.00 samples/sec Loss 4.6505 LearningRate 0.0746 Epoch: 14 Global Step: 74440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:15:43,608-Speed 10536.94 samples/sec Loss 4.6448 LearningRate 0.0745 Epoch: 14 Global Step: 74450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:15:51,418-Speed 10493.91 
samples/sec Loss 4.6039 LearningRate 0.0745 Epoch: 14 Global Step: 74460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:15:59,192-Speed 10538.36 samples/sec Loss 4.6397 LearningRate 0.0744 Epoch: 14 Global Step: 74470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:16:06,972-Speed 10530.94 samples/sec Loss 4.6565 LearningRate 0.0744 Epoch: 14 Global Step: 74480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:16:14,768-Speed 10509.49 samples/sec Loss 4.6599 LearningRate 0.0743 Epoch: 14 Global Step: 74490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:16:22,596-Speed 10467.27 samples/sec Loss 4.6626 LearningRate 0.0743 Epoch: 14 Global Step: 74500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:16:30,419-Speed 10477.53 samples/sec Loss 4.6417 LearningRate 0.0742 Epoch: 14 Global Step: 74510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:16:38,226-Speed 10495.17 samples/sec Loss 4.6287 LearningRate 0.0742 Epoch: 14 Global Step: 74520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:16:46,061-Speed 10456.04 samples/sec Loss 4.6534 LearningRate 0.0741 Epoch: 14 Global Step: 74530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:16:53,869-Speed 10494.53 samples/sec Loss 4.6492 LearningRate 0.0741 Epoch: 14 Global Step: 74540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:17:01,645-Speed 10535.34 samples/sec Loss 4.5901 LearningRate 0.0740 Epoch: 14 Global Step: 74550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:17:09,426-Speed 10530.38 samples/sec Loss 4.6235 LearningRate 0.0740 Epoch: 14 Global Step: 74560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:17:17,204-Speed 10534.08 samples/sec Loss 4.6175 LearningRate 0.0739 Epoch: 14 Global Step: 74570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:17:24,983-Speed 10533.12 samples/sec Loss 4.6544 LearningRate 
0.0739 Epoch: 14 Global Step: 74580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:17:32,779-Speed 10508.95 samples/sec Loss 4.6454 LearningRate 0.0738 Epoch: 14 Global Step: 74590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:17:40,592-Speed 10503.31 samples/sec Loss 4.6495 LearningRate 0.0738 Epoch: 14 Global Step: 74600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:17:48,398-Speed 10495.63 samples/sec Loss 4.6031 LearningRate 0.0737 Epoch: 14 Global Step: 74610 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-16 07:17:56,179-Speed 10529.72 samples/sec Loss 4.6663 LearningRate 0.0736 Epoch: 14 Global Step: 74620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:18:03,953-Speed 10538.68 samples/sec Loss 4.6424 LearningRate 0.0736 Epoch: 14 Global Step: 74630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:18:11,745-Speed 10514.98 samples/sec Loss 4.6318 LearningRate 0.0735 Epoch: 14 Global Step: 74640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:18:19,517-Speed 10542.82 samples/sec Loss 4.6347 LearningRate 0.0735 Epoch: 14 Global Step: 74650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:18:27,298-Speed 10528.78 samples/sec Loss 4.6225 LearningRate 0.0734 Epoch: 14 Global Step: 74660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:18:35,100-Speed 10501.01 samples/sec Loss 4.6396 LearningRate 0.0734 Epoch: 14 Global Step: 74670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:18:42,902-Speed 10501.17 samples/sec Loss 4.6339 LearningRate 0.0733 Epoch: 14 Global Step: 74680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:18:50,669-Speed 10548.51 samples/sec Loss 4.5678 LearningRate 0.0733 Epoch: 14 Global Step: 74690 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:18:58,464-Speed 10511.18 samples/sec Loss 4.6140 LearningRate 0.0732 Epoch: 14 Global Step: 74700 
Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:19:06,272-Speed 10493.09 samples/sec Loss 4.5846 LearningRate 0.0732 Epoch: 14 Global Step: 74710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:19:14,062-Speed 10517.25 samples/sec Loss 4.6001 LearningRate 0.0731 Epoch: 14 Global Step: 74720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:19:21,834-Speed 10545.84 samples/sec Loss 4.6261 LearningRate 0.0731 Epoch: 14 Global Step: 74730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:19:29,604-Speed 10544.89 samples/sec Loss 4.6058 LearningRate 0.0730 Epoch: 14 Global Step: 74740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:19:37,399-Speed 10514.01 samples/sec Loss 4.5853 LearningRate 0.0730 Epoch: 14 Global Step: 74750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:19:45,183-Speed 10525.39 samples/sec Loss 4.6234 LearningRate 0.0729 Epoch: 14 Global Step: 74760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:19:52,972-Speed 10518.38 samples/sec Loss 4.6147 LearningRate 0.0729 Epoch: 14 Global Step: 74770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:20:00,754-Speed 10528.56 samples/sec Loss 4.5677 LearningRate 0.0728 Epoch: 14 Global Step: 74780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:20:08,559-Speed 10497.85 samples/sec Loss 4.6336 LearningRate 0.0728 Epoch: 14 Global Step: 74790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:20:16,402-Speed 10446.08 samples/sec Loss 4.5752 LearningRate 0.0727 Epoch: 14 Global Step: 74800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:20:24,195-Speed 10513.79 samples/sec Loss 4.5813 LearningRate 0.0727 Epoch: 14 Global Step: 74810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:20:31,983-Speed 10519.61 samples/sec Loss 4.5896 LearningRate 0.0726 Epoch: 14 Global Step: 74820 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-01-16 07:20:39,761-Speed 10534.52 samples/sec Loss 4.5728 LearningRate 0.0726 Epoch: 14 Global Step: 74830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:20:47,614-Speed 10432.46 samples/sec Loss 4.5843 LearningRate 0.0725 Epoch: 14 Global Step: 74840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:20:55,411-Speed 10507.79 samples/sec Loss 4.5994 LearningRate 0.0725 Epoch: 14 Global Step: 74850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:21:03,182-Speed 10542.23 samples/sec Loss 4.6023 LearningRate 0.0724 Epoch: 14 Global Step: 74860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:21:10,970-Speed 10522.10 samples/sec Loss 4.5894 LearningRate 0.0724 Epoch: 14 Global Step: 74870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:21:18,727-Speed 10561.48 samples/sec Loss 4.5858 LearningRate 0.0723 Epoch: 14 Global Step: 74880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:21:26,501-Speed 10539.42 samples/sec Loss 4.5649 LearningRate 0.0723 Epoch: 14 Global Step: 74890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:21:34,283-Speed 10528.59 samples/sec Loss 4.5644 LearningRate 0.0722 Epoch: 14 Global Step: 74900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:21:42,070-Speed 10526.37 samples/sec Loss 4.6080 LearningRate 0.0722 Epoch: 14 Global Step: 74910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:21:49,858-Speed 10520.67 samples/sec Loss 4.5853 LearningRate 0.0721 Epoch: 14 Global Step: 74920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:21:57,660-Speed 10500.10 samples/sec Loss 4.5719 LearningRate 0.0721 Epoch: 14 Global Step: 74930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:22:05,446-Speed 10523.54 samples/sec Loss 4.5804 LearningRate 0.0720 Epoch: 14 Global Step: 74940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 
07:22:13,233-Speed 10521.36 samples/sec Loss 4.6030 LearningRate 0.0720 Epoch: 14 Global Step: 74950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:22:21,062-Speed 10465.26 samples/sec Loss 4.6020 LearningRate 0.0719 Epoch: 14 Global Step: 74960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:22:28,860-Speed 10505.97 samples/sec Loss 4.5647 LearningRate 0.0719 Epoch: 14 Global Step: 74970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:22:36,656-Speed 10515.48 samples/sec Loss 4.5539 LearningRate 0.0718 Epoch: 14 Global Step: 74980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:22:44,449-Speed 10519.00 samples/sec Loss 4.5435 LearningRate 0.0718 Epoch: 14 Global Step: 74990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:22:52,237-Speed 10520.11 samples/sec Loss 4.5734 LearningRate 0.0717 Epoch: 14 Global Step: 75000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:23:00,022-Speed 10523.85 samples/sec Loss 4.5342 LearningRate 0.0717 Epoch: 14 Global Step: 75010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:23:07,818-Speed 10508.78 samples/sec Loss 4.5967 LearningRate 0.0716 Epoch: 14 Global Step: 75020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:23:15,609-Speed 10517.08 samples/sec Loss 4.6034 LearningRate 0.0716 Epoch: 14 Global Step: 75030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:23:23,402-Speed 10513.60 samples/sec Loss 4.5656 LearningRate 0.0715 Epoch: 14 Global Step: 75040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:23:31,218-Speed 10481.84 samples/sec Loss 4.5738 LearningRate 0.0715 Epoch: 14 Global Step: 75050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:23:39,026-Speed 10492.98 samples/sec Loss 4.5710 LearningRate 0.0714 Epoch: 14 Global Step: 75060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:23:46,838-Speed 10488.53 samples/sec 
Loss 4.5396 LearningRate 0.0714 Epoch: 14 Global Step: 75070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:23:54,623-Speed 10524.27 samples/sec Loss 4.6115 LearningRate 0.0713 Epoch: 14 Global Step: 75080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:24:02,419-Speed 10509.08 samples/sec Loss 4.5201 LearningRate 0.0713 Epoch: 14 Global Step: 75090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:24:10,265-Speed 10441.86 samples/sec Loss 4.5265 LearningRate 0.0712 Epoch: 14 Global Step: 75100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:24:18,059-Speed 10512.78 samples/sec Loss 4.5574 LearningRate 0.0712 Epoch: 14 Global Step: 75110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:24:25,846-Speed 10521.56 samples/sec Loss 4.5457 LearningRate 0.0711 Epoch: 14 Global Step: 75120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:24:33,660-Speed 10484.30 samples/sec Loss 4.5705 LearningRate 0.0711 Epoch: 14 Global Step: 75130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:24:41,473-Speed 10486.97 samples/sec Loss 4.5859 LearningRate 0.0710 Epoch: 14 Global Step: 75140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:24:49,254-Speed 10530.30 samples/sec Loss 4.5781 LearningRate 0.0710 Epoch: 14 Global Step: 75150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:24:57,050-Speed 10508.73 samples/sec Loss 4.5596 LearningRate 0.0709 Epoch: 14 Global Step: 75160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:25:04,842-Speed 10515.19 samples/sec Loss 4.5725 LearningRate 0.0709 Epoch: 14 Global Step: 75170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:25:12,661-Speed 10478.09 samples/sec Loss 4.5604 LearningRate 0.0708 Epoch: 14 Global Step: 75180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:25:20,466-Speed 10498.78 samples/sec Loss 4.5846 LearningRate 0.0708 Epoch: 14 
Global Step: 75190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:25:28,238-Speed 10541.41 samples/sec Loss 4.5565 LearningRate 0.0707 Epoch: 14 Global Step: 75200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:25:36,022-Speed 10525.63 samples/sec Loss 4.5471 LearningRate 0.0707 Epoch: 14 Global Step: 75210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:25:43,803-Speed 10528.94 samples/sec Loss 4.5430 LearningRate 0.0706 Epoch: 14 Global Step: 75220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:25:51,589-Speed 10522.97 samples/sec Loss 4.5303 LearningRate 0.0706 Epoch: 14 Global Step: 75230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:25:59,380-Speed 10516.29 samples/sec Loss 4.5205 LearningRate 0.0705 Epoch: 14 Global Step: 75240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:26:07,184-Speed 10498.20 samples/sec Loss 4.5119 LearningRate 0.0705 Epoch: 14 Global Step: 75250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:26:14,986-Speed 10501.86 samples/sec Loss 4.5372 LearningRate 0.0704 Epoch: 14 Global Step: 75260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:26:22,770-Speed 10525.68 samples/sec Loss 4.5306 LearningRate 0.0704 Epoch: 14 Global Step: 75270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:26:30,583-Speed 10485.91 samples/sec Loss 4.5874 LearningRate 0.0703 Epoch: 14 Global Step: 75280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:26:38,400-Speed 10482.08 samples/sec Loss 4.4766 LearningRate 0.0703 Epoch: 14 Global Step: 75290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:26:46,176-Speed 10536.13 samples/sec Loss 4.5001 LearningRate 0.0702 Epoch: 14 Global Step: 75300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:26:53,992-Speed 10481.67 samples/sec Loss 4.5204 LearningRate 0.0702 Epoch: 14 Global Step: 75310 Fp16 Grad Scale: 
131072 Required: 6 hours Training: 2022-01-16 07:27:01,795-Speed 10499.49 samples/sec Loss 4.5580 LearningRate 0.0701 Epoch: 14 Global Step: 75320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:27:09,604-Speed 10492.81 samples/sec Loss 4.5613 LearningRate 0.0701 Epoch: 14 Global Step: 75330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:27:17,417-Speed 10485.98 samples/sec Loss 4.5051 LearningRate 0.0700 Epoch: 14 Global Step: 75340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:27:25,215-Speed 10507.59 samples/sec Loss 4.5275 LearningRate 0.0700 Epoch: 14 Global Step: 75350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:27:33,022-Speed 10493.36 samples/sec Loss 4.5226 LearningRate 0.0699 Epoch: 14 Global Step: 75360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:27:40,797-Speed 10538.92 samples/sec Loss 4.5055 LearningRate 0.0699 Epoch: 14 Global Step: 75370 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:27:48,565-Speed 10546.60 samples/sec Loss 4.4991 LearningRate 0.0698 Epoch: 14 Global Step: 75380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:27:56,355-Speed 10517.74 samples/sec Loss 4.5611 LearningRate 0.0698 Epoch: 14 Global Step: 75390 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:28:04,161-Speed 10495.28 samples/sec Loss 4.5115 LearningRate 0.0697 Epoch: 14 Global Step: 75400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:28:11,938-Speed 10536.31 samples/sec Loss 4.5066 LearningRate 0.0697 Epoch: 14 Global Step: 75410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:28:19,751-Speed 10485.20 samples/sec Loss 4.5301 LearningRate 0.0697 Epoch: 14 Global Step: 75420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:28:27,566-Speed 10484.09 samples/sec Loss 4.5095 LearningRate 0.0696 Epoch: 14 Global Step: 75430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 
2022-01-16 07:28:35,362-Speed 10509.08 samples/sec Loss 4.5352 LearningRate 0.0696 Epoch: 14 Global Step: 75440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:28:43,191-Speed 10466.05 samples/sec Loss 4.5041 LearningRate 0.0695 Epoch: 14 Global Step: 75450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:28:50,977-Speed 10523.06 samples/sec Loss 4.5361 LearningRate 0.0695 Epoch: 14 Global Step: 75460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:28:58,791-Speed 10484.70 samples/sec Loss 4.4982 LearningRate 0.0694 Epoch: 14 Global Step: 75470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:29:06,587-Speed 10510.06 samples/sec Loss 4.5056 LearningRate 0.0694 Epoch: 14 Global Step: 75480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:29:14,407-Speed 10477.04 samples/sec Loss 4.4706 LearningRate 0.0693 Epoch: 14 Global Step: 75490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:29:22,218-Speed 10488.06 samples/sec Loss 4.5316 LearningRate 0.0693 Epoch: 14 Global Step: 75500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:29:30,035-Speed 10481.16 samples/sec Loss 4.5276 LearningRate 0.0692 Epoch: 14 Global Step: 75510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:29:37,816-Speed 10529.71 samples/sec Loss 4.5413 LearningRate 0.0692 Epoch: 14 Global Step: 75520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:29:45,627-Speed 10490.63 samples/sec Loss 4.5347 LearningRate 0.0691 Epoch: 14 Global Step: 75530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:29:53,431-Speed 10497.03 samples/sec Loss 4.5194 LearningRate 0.0691 Epoch: 14 Global Step: 75540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:30:01,203-Speed 10541.99 samples/sec Loss 4.4952 LearningRate 0.0690 Epoch: 14 Global Step: 75550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:30:08,973-Speed 10544.17 
samples/sec Loss 4.4754 LearningRate 0.0690 Epoch: 14 Global Step: 75560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:30:16,820-Speed 10441.32 samples/sec Loss 4.5391 LearningRate 0.0689 Epoch: 14 Global Step: 75570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:30:24,662-Speed 10446.83 samples/sec Loss 4.4925 LearningRate 0.0689 Epoch: 14 Global Step: 75580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:30:32,488-Speed 10469.93 samples/sec Loss 4.5122 LearningRate 0.0688 Epoch: 14 Global Step: 75590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:30:40,279-Speed 10515.79 samples/sec Loss 4.5014 LearningRate 0.0688 Epoch: 14 Global Step: 75600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:30:48,069-Speed 10517.96 samples/sec Loss 4.4871 LearningRate 0.0687 Epoch: 14 Global Step: 75610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:30:55,927-Speed 10426.40 samples/sec Loss 4.4993 LearningRate 0.0687 Epoch: 14 Global Step: 75620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:31:03,768-Speed 10449.00 samples/sec Loss 4.4881 LearningRate 0.0686 Epoch: 14 Global Step: 75630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:31:11,563-Speed 10510.01 samples/sec Loss 4.4877 LearningRate 0.0686 Epoch: 14 Global Step: 75640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:31:19,378-Speed 10484.77 samples/sec Loss 4.5050 LearningRate 0.0685 Epoch: 14 Global Step: 75650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:31:27,238-Speed 10423.36 samples/sec Loss 4.4637 LearningRate 0.0685 Epoch: 14 Global Step: 75660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:31:35,058-Speed 10477.55 samples/sec Loss 4.5052 LearningRate 0.0684 Epoch: 14 Global Step: 75670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:31:42,876-Speed 10479.28 samples/sec Loss 4.4766 LearningRate 
0.0684 Epoch: 14 Global Step: 75680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:31:50,669-Speed 10514.44 samples/sec Loss 4.5031 LearningRate 0.0683 Epoch: 14 Global Step: 75690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:31:58,495-Speed 10469.16 samples/sec Loss 4.4715 LearningRate 0.0683 Epoch: 14 Global Step: 75700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:32:06,296-Speed 10501.44 samples/sec Loss 4.4833 LearningRate 0.0682 Epoch: 14 Global Step: 75710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:32:14,085-Speed 10518.53 samples/sec Loss 4.4700 LearningRate 0.0682 Epoch: 14 Global Step: 75720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:32:21,899-Speed 10485.81 samples/sec Loss 4.4487 LearningRate 0.0681 Epoch: 14 Global Step: 75730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:32:29,709-Speed 10493.95 samples/sec Loss 4.4832 LearningRate 0.0681 Epoch: 14 Global Step: 75740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:32:37,512-Speed 10500.75 samples/sec Loss 4.5085 LearningRate 0.0680 Epoch: 14 Global Step: 75750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:32:45,316-Speed 10497.89 samples/sec Loss 4.4872 LearningRate 0.0680 Epoch: 14 Global Step: 75760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:32:53,131-Speed 10484.87 samples/sec Loss 4.4562 LearningRate 0.0679 Epoch: 14 Global Step: 75770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:33:00,927-Speed 10509.77 samples/sec Loss 4.4595 LearningRate 0.0679 Epoch: 14 Global Step: 75780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:33:08,765-Speed 10453.27 samples/sec Loss 4.4665 LearningRate 0.0678 Epoch: 14 Global Step: 75790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:33:16,609-Speed 10445.02 samples/sec Loss 4.4630 LearningRate 0.0678 Epoch: 14 Global Step: 75800 Fp16 
Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:33:24,422-Speed 10485.83 samples/sec Loss 4.4653 LearningRate 0.0677 Epoch: 14 Global Step: 75810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:33:32,217-Speed 10510.94 samples/sec Loss 4.4160 LearningRate 0.0677 Epoch: 14 Global Step: 75820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:33:40,010-Speed 10513.37 samples/sec Loss 4.4446 LearningRate 0.0676 Epoch: 14 Global Step: 75830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:33:47,790-Speed 10530.12 samples/sec Loss 4.4636 LearningRate 0.0676 Epoch: 14 Global Step: 75840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:33:55,620-Speed 10464.18 samples/sec Loss 4.4893 LearningRate 0.0675 Epoch: 14 Global Step: 75850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:34:03,477-Speed 10427.84 samples/sec Loss 4.4208 LearningRate 0.0675 Epoch: 14 Global Step: 75860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:34:11,295-Speed 10480.25 samples/sec Loss 4.4672 LearningRate 0.0675 Epoch: 14 Global Step: 75870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:34:19,086-Speed 10515.50 samples/sec Loss 4.4611 LearningRate 0.0674 Epoch: 14 Global Step: 75880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:34:26,887-Speed 10502.50 samples/sec Loss 4.4247 LearningRate 0.0674 Epoch: 14 Global Step: 75890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:34:34,654-Speed 10548.00 samples/sec Loss 4.4367 LearningRate 0.0673 Epoch: 14 Global Step: 75900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:34:42,438-Speed 10525.75 samples/sec Loss 4.4648 LearningRate 0.0673 Epoch: 14 Global Step: 75910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:34:50,226-Speed 10521.49 samples/sec Loss 4.4381 LearningRate 0.0672 Epoch: 14 Global Step: 75920 Fp16 Grad Scale: 65536 Required: 6 
hours Training: 2022-01-16 07:34:58,011-Speed 10523.18 samples/sec Loss 4.3900 LearningRate 0.0672 Epoch: 14 Global Step: 75930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:35:05,788-Speed 10535.63 samples/sec Loss 4.4570 LearningRate 0.0671 Epoch: 14 Global Step: 75940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:35:13,599-Speed 10488.95 samples/sec Loss 4.4027 LearningRate 0.0671 Epoch: 14 Global Step: 75950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:35:21,404-Speed 10497.24 samples/sec Loss 4.3731 LearningRate 0.0670 Epoch: 14 Global Step: 75960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:35:29,214-Speed 10491.07 samples/sec Loss 4.4453 LearningRate 0.0670 Epoch: 14 Global Step: 75970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:35:37,001-Speed 10524.11 samples/sec Loss 4.4215 LearningRate 0.0669 Epoch: 14 Global Step: 75980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:35:44,784-Speed 10526.45 samples/sec Loss 4.4488 LearningRate 0.0669 Epoch: 14 Global Step: 75990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:35:52,573-Speed 10518.69 samples/sec Loss 4.4658 LearningRate 0.0668 Epoch: 14 Global Step: 76000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:36:00,355-Speed 10528.72 samples/sec Loss 4.4220 LearningRate 0.0668 Epoch: 14 Global Step: 76010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:36:08,148-Speed 10513.62 samples/sec Loss 4.4488 LearningRate 0.0667 Epoch: 14 Global Step: 76020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:36:15,918-Speed 10544.41 samples/sec Loss 4.3900 LearningRate 0.0667 Epoch: 14 Global Step: 76030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:36:23,734-Speed 10483.69 samples/sec Loss 4.4002 LearningRate 0.0666 Epoch: 14 Global Step: 76040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 
07:36:31,562-Speed 10471.86 samples/sec Loss 4.4399 LearningRate 0.0666 Epoch: 14 Global Step: 76050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:36:39,402-Speed 10454.98 samples/sec Loss 4.4367 LearningRate 0.0665 Epoch: 14 Global Step: 76060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:36:47,255-Speed 10433.90 samples/sec Loss 4.4289 LearningRate 0.0665 Epoch: 14 Global Step: 76070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:36:55,053-Speed 10505.63 samples/sec Loss 4.4405 LearningRate 0.0664 Epoch: 14 Global Step: 76080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:37:02,862-Speed 10492.60 samples/sec Loss 4.4186 LearningRate 0.0664 Epoch: 14 Global Step: 76090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:37:10,655-Speed 10513.60 samples/sec Loss 4.4212 LearningRate 0.0663 Epoch: 14 Global Step: 76100 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-01-16 07:37:18,453-Speed 10505.84 samples/sec Loss 4.3923 LearningRate 0.0663 Epoch: 14 Global Step: 76110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:37:26,272-Speed 10479.08 samples/sec Loss 4.4037 LearningRate 0.0662 Epoch: 14 Global Step: 76120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:37:34,097-Speed 10469.90 samples/sec Loss 4.4153 LearningRate 0.0662 Epoch: 14 Global Step: 76130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:37:41,900-Speed 10500.58 samples/sec Loss 4.4256 LearningRate 0.0661 Epoch: 14 Global Step: 76140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:37:49,710-Speed 10490.44 samples/sec Loss 4.4284 LearningRate 0.0661 Epoch: 14 Global Step: 76150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:37:57,490-Speed 10536.27 samples/sec Loss 4.4138 LearningRate 0.0661 Epoch: 14 Global Step: 76160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:38:05,281-Speed 10516.44 
samples/sec Loss 4.3932 LearningRate 0.0660 Epoch: 14 Global Step: 76170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:38:13,075-Speed 10511.76 samples/sec Loss 4.3835 LearningRate 0.0660 Epoch: 14 Global Step: 76180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:38:20,863-Speed 10520.96 samples/sec Loss 4.4104 LearningRate 0.0659 Epoch: 14 Global Step: 76190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:38:28,677-Speed 10484.03 samples/sec Loss 4.4367 LearningRate 0.0659 Epoch: 14 Global Step: 76200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:38:36,508-Speed 10462.83 samples/sec Loss 4.3986 LearningRate 0.0658 Epoch: 14 Global Step: 76210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:38:44,363-Speed 10431.40 samples/sec Loss 4.4179 LearningRate 0.0658 Epoch: 14 Global Step: 76220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:38:52,177-Speed 10484.92 samples/sec Loss 4.3799 LearningRate 0.0657 Epoch: 14 Global Step: 76230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:38:59,986-Speed 10491.73 samples/sec Loss 4.3854 LearningRate 0.0657 Epoch: 14 Global Step: 76240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:39:07,774-Speed 10519.75 samples/sec Loss 4.3560 LearningRate 0.0656 Epoch: 14 Global Step: 76250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:39:15,566-Speed 10515.75 samples/sec Loss 4.4141 LearningRate 0.0656 Epoch: 14 Global Step: 76260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:39:23,363-Speed 10507.10 samples/sec Loss 4.3862 LearningRate 0.0655 Epoch: 14 Global Step: 76270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:39:31,150-Speed 10522.12 samples/sec Loss 4.3836 LearningRate 0.0655 Epoch: 14 Global Step: 76280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:39:38,957-Speed 10496.38 samples/sec Loss 4.4113 LearningRate 
0.0654 Epoch: 14 Global Step: 76290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:39:46,788-Speed 10464.78 samples/sec Loss 4.4113 LearningRate 0.0654 Epoch: 14 Global Step: 76300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:39:54,604-Speed 10482.46 samples/sec Loss 4.4022 LearningRate 0.0653 Epoch: 14 Global Step: 76310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:40:02,398-Speed 10512.72 samples/sec Loss 4.3880 LearningRate 0.0653 Epoch: 14 Global Step: 76320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:40:10,200-Speed 10501.27 samples/sec Loss 4.3828 LearningRate 0.0652 Epoch: 14 Global Step: 76330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:40:17,976-Speed 10537.29 samples/sec Loss 4.3828 LearningRate 0.0652 Epoch: 14 Global Step: 76340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:40:25,763-Speed 10520.09 samples/sec Loss 4.3551 LearningRate 0.0651 Epoch: 14 Global Step: 76350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:40:33,578-Speed 10483.73 samples/sec Loss 4.3801 LearningRate 0.0651 Epoch: 14 Global Step: 76360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:40:41,393-Speed 10485.25 samples/sec Loss 4.3864 LearningRate 0.0650 Epoch: 14 Global Step: 76370 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-01-16 07:40:49,206-Speed 10486.19 samples/sec Loss 4.3872 LearningRate 0.0650 Epoch: 14 Global Step: 76380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:40:56,993-Speed 10520.97 samples/sec Loss 4.3719 LearningRate 0.0650 Epoch: 14 Global Step: 76390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:41:04,816-Speed 10472.06 samples/sec Loss 4.3771 LearningRate 0.0649 Epoch: 14 Global Step: 76400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:41:12,597-Speed 10530.97 samples/sec Loss 4.3796 LearningRate 0.0649 Epoch: 14 Global Step: 76410 Fp16 
Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:41:20,355-Speed 10561.40 samples/sec Loss 4.3885 LearningRate 0.0648 Epoch: 14 Global Step: 76420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:41:28,186-Speed 10462.08 samples/sec Loss 4.3917 LearningRate 0.0648 Epoch: 14 Global Step: 76430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:41:35,970-Speed 10525.60 samples/sec Loss 4.3541 LearningRate 0.0647 Epoch: 14 Global Step: 76440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:41:43,751-Speed 10529.94 samples/sec Loss 4.3901 LearningRate 0.0647 Epoch: 14 Global Step: 76450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:41:51,562-Speed 10489.23 samples/sec Loss 4.3530 LearningRate 0.0646 Epoch: 14 Global Step: 76460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:41:59,389-Speed 10467.94 samples/sec Loss 4.3201 LearningRate 0.0646 Epoch: 14 Global Step: 76470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:42:07,222-Speed 10459.79 samples/sec Loss 4.3531 LearningRate 0.0645 Epoch: 14 Global Step: 76480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:42:15,026-Speed 10498.34 samples/sec Loss 4.3321 LearningRate 0.0645 Epoch: 14 Global Step: 76490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:42:22,826-Speed 10503.75 samples/sec Loss 4.3485 LearningRate 0.0644 Epoch: 14 Global Step: 76500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:42:30,645-Speed 10478.43 samples/sec Loss 4.3716 LearningRate 0.0644 Epoch: 14 Global Step: 76510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:42:38,418-Speed 10539.75 samples/sec Loss 4.3419 LearningRate 0.0643 Epoch: 14 Global Step: 76520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:42:46,241-Speed 10473.90 samples/sec Loss 4.3424 LearningRate 0.0643 Epoch: 14 Global Step: 76530 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-01-16 07:42:54,040-Speed 10505.54 samples/sec Loss 4.3578 LearningRate 0.0642 Epoch: 14 Global Step: 76540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:43:01,861-Speed 10476.16 samples/sec Loss 4.2993 LearningRate 0.0642 Epoch: 14 Global Step: 76550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:43:09,673-Speed 10487.27 samples/sec Loss 4.3698 LearningRate 0.0641 Epoch: 14 Global Step: 76560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:43:17,477-Speed 10498.54 samples/sec Loss 4.3612 LearningRate 0.0641 Epoch: 14 Global Step: 76570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:43:25,285-Speed 10492.87 samples/sec Loss 4.3875 LearningRate 0.0641 Epoch: 14 Global Step: 76580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:43:33,102-Speed 10482.05 samples/sec Loss 4.3465 LearningRate 0.0640 Epoch: 14 Global Step: 76590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:43:40,884-Speed 10527.97 samples/sec Loss 4.3516 LearningRate 0.0640 Epoch: 14 Global Step: 76600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:43:48,679-Speed 10511.54 samples/sec Loss 4.3669 LearningRate 0.0639 Epoch: 14 Global Step: 76610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:43:56,465-Speed 10522.42 samples/sec Loss 4.3470 LearningRate 0.0639 Epoch: 14 Global Step: 76620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:44:04,245-Speed 10531.55 samples/sec Loss 4.3332 LearningRate 0.0638 Epoch: 14 Global Step: 76630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:44:12,029-Speed 10524.97 samples/sec Loss 4.3418 LearningRate 0.0638 Epoch: 14 Global Step: 76640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:44:19,820-Speed 10516.12 samples/sec Loss 4.3302 LearningRate 0.0637 Epoch: 14 Global Step: 76650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:44:27,615-Speed 
10509.51 samples/sec Loss 4.3442 LearningRate 0.0637 Epoch: 14 Global Step: 76660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:44:35,399-Speed 10526.52 samples/sec Loss 4.3611 LearningRate 0.0636 Epoch: 14 Global Step: 76670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:44:43,178-Speed 10531.94 samples/sec Loss 4.3289 LearningRate 0.0636 Epoch: 14 Global Step: 76680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:44:50,973-Speed 10510.22 samples/sec Loss 4.3422 LearningRate 0.0635 Epoch: 14 Global Step: 76690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:44:58,764-Speed 10515.99 samples/sec Loss 4.3411 LearningRate 0.0635 Epoch: 14 Global Step: 76700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:45:06,553-Speed 10519.09 samples/sec Loss 4.3493 LearningRate 0.0634 Epoch: 14 Global Step: 76710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:45:14,325-Speed 10541.95 samples/sec Loss 4.3609 LearningRate 0.0634 Epoch: 14 Global Step: 76720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:45:22,108-Speed 10526.43 samples/sec Loss 4.3570 LearningRate 0.0633 Epoch: 14 Global Step: 76730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:45:29,969-Speed 10421.45 samples/sec Loss 4.3464 LearningRate 0.0633 Epoch: 14 Global Step: 76740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:45:37,782-Speed 10488.37 samples/sec Loss 4.3408 LearningRate 0.0632 Epoch: 14 Global Step: 76750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:45:45,560-Speed 10532.60 samples/sec Loss 4.3290 LearningRate 0.0632 Epoch: 14 Global Step: 76760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:45:53,327-Speed 10548.93 samples/sec Loss 4.3409 LearningRate 0.0632 Epoch: 14 Global Step: 76770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:46:01,141-Speed 10486.18 samples/sec Loss 4.3545 
LearningRate 0.0631 Epoch: 14 Global Step: 76780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:46:08,952-Speed 10488.32 samples/sec Loss 4.3215 LearningRate 0.0631 Epoch: 14 Global Step: 76790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:46:16,773-Speed 10475.66 samples/sec Loss 4.3039 LearningRate 0.0630 Epoch: 14 Global Step: 76800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:46:24,576-Speed 10500.04 samples/sec Loss 4.3397 LearningRate 0.0630 Epoch: 14 Global Step: 76810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:46:32,365-Speed 10519.79 samples/sec Loss 4.2709 LearningRate 0.0629 Epoch: 14 Global Step: 76820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:46:40,159-Speed 10511.87 samples/sec Loss 4.3335 LearningRate 0.0629 Epoch: 14 Global Step: 76830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:46:47,963-Speed 10498.45 samples/sec Loss 4.3389 LearningRate 0.0628 Epoch: 14 Global Step: 76840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:46:55,760-Speed 10506.49 samples/sec Loss 4.3275 LearningRate 0.0628 Epoch: 14 Global Step: 76850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:47:03,552-Speed 10516.45 samples/sec Loss 4.3233 LearningRate 0.0627 Epoch: 14 Global Step: 76860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:47:11,348-Speed 10509.54 samples/sec Loss 4.3004 LearningRate 0.0627 Epoch: 14 Global Step: 76870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:47:19,134-Speed 10521.74 samples/sec Loss 4.2626 LearningRate 0.0626 Epoch: 14 Global Step: 76880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:47:26,918-Speed 10530.47 samples/sec Loss 4.3035 LearningRate 0.0626 Epoch: 14 Global Step: 76890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:47:34,707-Speed 10519.15 samples/sec Loss 4.2876 LearningRate 0.0625 Epoch: 14 Global 
Step: 76900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:47:42,504-Speed 10509.03 samples/sec Loss 4.3370 LearningRate 0.0625 Epoch: 14 Global Step: 76910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:47:50,329-Speed 10475.02 samples/sec Loss 4.3219 LearningRate 0.0625 Epoch: 14 Global Step: 76920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:47:58,109-Speed 10530.85 samples/sec Loss 4.3249 LearningRate 0.0624 Epoch: 14 Global Step: 76930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:48:05,893-Speed 10524.47 samples/sec Loss 4.2553 LearningRate 0.0624 Epoch: 14 Global Step: 76940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:48:13,682-Speed 10519.18 samples/sec Loss 4.3025 LearningRate 0.0623 Epoch: 14 Global Step: 76950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:48:21,471-Speed 10519.99 samples/sec Loss 4.3049 LearningRate 0.0623 Epoch: 14 Global Step: 76960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:48:29,247-Speed 10535.72 samples/sec Loss 4.2981 LearningRate 0.0622 Epoch: 14 Global Step: 76970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:48:37,028-Speed 10530.57 samples/sec Loss 4.2853 LearningRate 0.0622 Epoch: 14 Global Step: 76980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:48:44,816-Speed 10520.31 samples/sec Loss 4.3074 LearningRate 0.0621 Epoch: 14 Global Step: 76990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:48:52,595-Speed 10531.01 samples/sec Loss 4.2818 LearningRate 0.0621 Epoch: 14 Global Step: 77000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:49:00,382-Speed 10522.03 samples/sec Loss 4.2849 LearningRate 0.0620 Epoch: 14 Global Step: 77010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:49:08,181-Speed 10506.51 samples/sec Loss 4.2719 LearningRate 0.0620 Epoch: 14 Global Step: 77020 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-01-16 07:49:15,971-Speed 10517.28 samples/sec Loss 4.2802 LearningRate 0.0619 Epoch: 14 Global Step: 77030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:49:23,823-Speed 10435.15 samples/sec Loss 4.2898 LearningRate 0.0619 Epoch: 14 Global Step: 77040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:49:31,623-Speed 10503.82 samples/sec Loss 4.3113 LearningRate 0.0618 Epoch: 14 Global Step: 77050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:49:39,432-Speed 10492.47 samples/sec Loss 4.2782 LearningRate 0.0618 Epoch: 14 Global Step: 77060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:49:47,235-Speed 10499.31 samples/sec Loss 4.3200 LearningRate 0.0618 Epoch: 14 Global Step: 77070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:49:55,078-Speed 10447.00 samples/sec Loss 4.2845 LearningRate 0.0617 Epoch: 14 Global Step: 77080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:50:02,905-Speed 10467.45 samples/sec Loss 4.2647 LearningRate 0.0617 Epoch: 14 Global Step: 77090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:50:10,696-Speed 10515.63 samples/sec Loss 4.2857 LearningRate 0.0616 Epoch: 14 Global Step: 77100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:50:18,508-Speed 10488.99 samples/sec Loss 4.2839 LearningRate 0.0616 Epoch: 14 Global Step: 77110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:50:26,294-Speed 10522.55 samples/sec Loss 4.2635 LearningRate 0.0615 Epoch: 14 Global Step: 77120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:50:34,080-Speed 10522.13 samples/sec Loss 4.2631 LearningRate 0.0615 Epoch: 14 Global Step: 77130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:50:41,866-Speed 10522.44 samples/sec Loss 4.2780 LearningRate 0.0614 Epoch: 14 Global Step: 77140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 
07:50:49,669-Speed 10500.63 samples/sec Loss 4.2741 LearningRate 0.0614 Epoch: 14 Global Step: 77150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:50:57,457-Speed 10519.91 samples/sec Loss 4.2296 LearningRate 0.0613 Epoch: 14 Global Step: 77160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:51:05,248-Speed 10515.99 samples/sec Loss 4.2717 LearningRate 0.0613 Epoch: 14 Global Step: 77170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:51:13,054-Speed 10495.64 samples/sec Loss 4.2741 LearningRate 0.0612 Epoch: 14 Global Step: 77180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:51:20,850-Speed 10509.04 samples/sec Loss 4.2787 LearningRate 0.0612 Epoch: 14 Global Step: 77190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:51:28,643-Speed 10514.26 samples/sec Loss 4.2442 LearningRate 0.0612 Epoch: 14 Global Step: 77200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:51:36,468-Speed 10470.23 samples/sec Loss 4.2379 LearningRate 0.0611 Epoch: 14 Global Step: 77210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:51:44,295-Speed 10468.29 samples/sec Loss 4.2788 LearningRate 0.0611 Epoch: 14 Global Step: 77220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:51:52,123-Speed 10466.19 samples/sec Loss 4.2544 LearningRate 0.0610 Epoch: 14 Global Step: 77230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:51:59,923-Speed 10503.88 samples/sec Loss 4.2577 LearningRate 0.0610 Epoch: 14 Global Step: 77240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:52:07,708-Speed 10524.54 samples/sec Loss 4.2472 LearningRate 0.0609 Epoch: 14 Global Step: 77250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:52:15,531-Speed 10473.90 samples/sec Loss 4.2449 LearningRate 0.0609 Epoch: 14 Global Step: 77260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:52:23,342-Speed 10489.34 samples/sec 
Loss 4.2776 LearningRate 0.0608 Epoch: 14 Global Step: 77270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:52:31,135-Speed 10512.34 samples/sec Loss 4.2608 LearningRate 0.0608 Epoch: 14 Global Step: 77280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:52:38,908-Speed 10540.68 samples/sec Loss 4.2519 LearningRate 0.0607 Epoch: 14 Global Step: 77290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:52:46,714-Speed 10496.05 samples/sec Loss 4.2679 LearningRate 0.0607 Epoch: 14 Global Step: 77300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:52:54,531-Speed 10482.58 samples/sec Loss 4.2289 LearningRate 0.0606 Epoch: 14 Global Step: 77310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:53:02,332-Speed 10502.77 samples/sec Loss 4.2316 LearningRate 0.0606 Epoch: 14 Global Step: 77320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:53:10,138-Speed 10495.74 samples/sec Loss 4.2442 LearningRate 0.0606 Epoch: 14 Global Step: 77330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:53:17,959-Speed 10474.86 samples/sec Loss 4.2348 LearningRate 0.0605 Epoch: 14 Global Step: 77340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:53:25,757-Speed 10506.62 samples/sec Loss 4.2803 LearningRate 0.0605 Epoch: 14 Global Step: 77350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:53:33,545-Speed 10520.85 samples/sec Loss 4.2742 LearningRate 0.0604 Epoch: 14 Global Step: 77360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:53:41,360-Speed 10483.86 samples/sec Loss 4.2473 LearningRate 0.0604 Epoch: 14 Global Step: 77370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:53:49,207-Speed 10440.60 samples/sec Loss 4.2623 LearningRate 0.0603 Epoch: 14 Global Step: 77380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:53:56,999-Speed 10516.00 samples/sec Loss 4.2457 LearningRate 0.0603 
Epoch: 14 Global Step: 77390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:54:04,822-Speed 10472.24 samples/sec Loss 4.2570 LearningRate 0.0602 Epoch: 14 Global Step: 77400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:54:12,645-Speed 10473.02 samples/sec Loss 4.2224 LearningRate 0.0602 Epoch: 14 Global Step: 77410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:54:20,517-Speed 10408.53 samples/sec Loss 4.2451 LearningRate 0.0601 Epoch: 14 Global Step: 77420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:54:28,330-Speed 10486.65 samples/sec Loss 4.1834 LearningRate 0.0601 Epoch: 14 Global Step: 77430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:54:36,138-Speed 10492.67 samples/sec Loss 4.2125 LearningRate 0.0600 Epoch: 14 Global Step: 77440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:54:43,942-Speed 10498.54 samples/sec Loss 4.2629 LearningRate 0.0600 Epoch: 14 Global Step: 77450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:54:51,740-Speed 10507.98 samples/sec Loss 4.2191 LearningRate 0.0600 Epoch: 14 Global Step: 77460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:54:59,519-Speed 10533.01 samples/sec Loss 4.2462 LearningRate 0.0599 Epoch: 14 Global Step: 77470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:55:07,294-Speed 10536.46 samples/sec Loss 4.2275 LearningRate 0.0599 Epoch: 14 Global Step: 77480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:55:15,085-Speed 10516.06 samples/sec Loss 4.1991 LearningRate 0.0598 Epoch: 14 Global Step: 77490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:55:22,882-Speed 10508.39 samples/sec Loss 4.2055 LearningRate 0.0598 Epoch: 14 Global Step: 77500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:55:30,678-Speed 10510.54 samples/sec Loss 4.1805 LearningRate 0.0597 Epoch: 14 Global Step: 77510 Fp16 Grad 
Scale: 65536 Required: 6 hours Training: 2022-01-16 07:55:38,460-Speed 10526.89 samples/sec Loss 4.2163 LearningRate 0.0597 Epoch: 14 Global Step: 77520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:55:46,237-Speed 10534.40 samples/sec Loss 4.1957 LearningRate 0.0596 Epoch: 14 Global Step: 77530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:55:54,072-Speed 10457.47 samples/sec Loss 4.2441 LearningRate 0.0596 Epoch: 14 Global Step: 77540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:56:01,868-Speed 10509.79 samples/sec Loss 4.1762 LearningRate 0.0595 Epoch: 14 Global Step: 77550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:56:09,641-Speed 10539.57 samples/sec Loss 4.2297 LearningRate 0.0595 Epoch: 14 Global Step: 77560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:56:17,419-Speed 10534.87 samples/sec Loss 4.2404 LearningRate 0.0595 Epoch: 14 Global Step: 77570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:56:25,199-Speed 10531.95 samples/sec Loss 4.2395 LearningRate 0.0594 Epoch: 14 Global Step: 77580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:56:32,981-Speed 10527.82 samples/sec Loss 4.2202 LearningRate 0.0594 Epoch: 14 Global Step: 77590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:56:40,790-Speed 10492.26 samples/sec Loss 4.2020 LearningRate 0.0593 Epoch: 14 Global Step: 77600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:56:48,586-Speed 10509.07 samples/sec Loss 4.1711 LearningRate 0.0593 Epoch: 14 Global Step: 77610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:56:56,398-Speed 10487.64 samples/sec Loss 4.2094 LearningRate 0.0592 Epoch: 14 Global Step: 77620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:57:04,215-Speed 10485.69 samples/sec Loss 4.2333 LearningRate 0.0592 Epoch: 14 Global Step: 77630 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-01-16 07:57:11,991-Speed 10536.75 samples/sec Loss 4.2183 LearningRate 0.0591 Epoch: 14 Global Step: 77640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:57:19,782-Speed 10515.33 samples/sec Loss 4.2040 LearningRate 0.0591 Epoch: 14 Global Step: 77650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:57:27,573-Speed 10517.45 samples/sec Loss 4.1665 LearningRate 0.0590 Epoch: 14 Global Step: 77660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:57:35,353-Speed 10530.94 samples/sec Loss 4.1809 LearningRate 0.0590 Epoch: 14 Global Step: 77670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:57:43,141-Speed 10520.27 samples/sec Loss 4.1685 LearningRate 0.0590 Epoch: 14 Global Step: 77680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:57:50,936-Speed 10510.61 samples/sec Loss 4.2309 LearningRate 0.0589 Epoch: 14 Global Step: 77690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:57:58,720-Speed 10525.42 samples/sec Loss 4.2279 LearningRate 0.0589 Epoch: 14 Global Step: 77700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:58:06,501-Speed 10529.39 samples/sec Loss 4.1888 LearningRate 0.0588 Epoch: 14 Global Step: 77710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:58:14,298-Speed 10507.08 samples/sec Loss 4.2296 LearningRate 0.0588 Epoch: 14 Global Step: 77720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:58:22,152-Speed 10432.57 samples/sec Loss 4.2259 LearningRate 0.0587 Epoch: 14 Global Step: 77730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:58:29,965-Speed 10486.88 samples/sec Loss 4.2046 LearningRate 0.0587 Epoch: 14 Global Step: 77740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:58:37,770-Speed 10497.50 samples/sec Loss 4.2123 LearningRate 0.0586 Epoch: 14 Global Step: 77750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:58:45,562-Speed 
10514.61 samples/sec Loss 4.1742 LearningRate 0.0586 Epoch: 14 Global Step: 77760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:58:53,381-Speed 10479.06 samples/sec Loss 4.1924 LearningRate 0.0585 Epoch: 14 Global Step: 77770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:59:16,015-Speed 3619.41 samples/sec Loss 4.1958 LearningRate 0.0585 Epoch: 15 Global Step: 77780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:59:23,773-Speed 10561.38 samples/sec Loss 4.1777 LearningRate 0.0585 Epoch: 15 Global Step: 77790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 07:59:31,526-Speed 10567.40 samples/sec Loss 4.2187 LearningRate 0.0584 Epoch: 15 Global Step: 77800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:59:39,278-Speed 10569.69 samples/sec Loss 4.1509 LearningRate 0.0584 Epoch: 15 Global Step: 77810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:59:47,044-Speed 10548.84 samples/sec Loss 4.1587 LearningRate 0.0583 Epoch: 15 Global Step: 77820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 07:59:54,850-Speed 10497.29 samples/sec Loss 4.1704 LearningRate 0.0583 Epoch: 15 Global Step: 77830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:00:02,622-Speed 10541.74 samples/sec Loss 4.1956 LearningRate 0.0582 Epoch: 15 Global Step: 77840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:00:10,397-Speed 10536.86 samples/sec Loss 4.1794 LearningRate 0.0582 Epoch: 15 Global Step: 77850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:00:18,197-Speed 10505.16 samples/sec Loss 4.1592 LearningRate 0.0581 Epoch: 15 Global Step: 77860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:00:25,994-Speed 10508.05 samples/sec Loss 4.1525 LearningRate 0.0581 Epoch: 15 Global Step: 77870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:00:33,785-Speed 10516.31 samples/sec Loss 4.1803 
LearningRate 0.0581 Epoch: 15 Global Step: 77880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:00:41,568-Speed 10526.15 samples/sec Loss 4.1120 LearningRate 0.0580 Epoch: 15 Global Step: 77890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:00:49,396-Speed 10466.97 samples/sec Loss 4.1452 LearningRate 0.0580 Epoch: 15 Global Step: 77900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:00:57,197-Speed 10502.46 samples/sec Loss 4.1417 LearningRate 0.0579 Epoch: 15 Global Step: 77910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:01:05,058-Speed 10422.77 samples/sec Loss 4.0973 LearningRate 0.0579 Epoch: 15 Global Step: 77920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:01:12,895-Speed 10453.28 samples/sec Loss 4.1404 LearningRate 0.0578 Epoch: 15 Global Step: 77930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:01:20,677-Speed 10529.04 samples/sec Loss 4.1623 LearningRate 0.0578 Epoch: 15 Global Step: 77940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:01:28,482-Speed 10496.90 samples/sec Loss 4.1376 LearningRate 0.0577 Epoch: 15 Global Step: 77950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:01:36,284-Speed 10501.41 samples/sec Loss 4.1628 LearningRate 0.0577 Epoch: 15 Global Step: 77960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:01:44,077-Speed 10512.37 samples/sec Loss 4.1496 LearningRate 0.0576 Epoch: 15 Global Step: 77970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:01:51,842-Speed 10551.40 samples/sec Loss 4.1688 LearningRate 0.0576 Epoch: 15 Global Step: 77980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:01:59,673-Speed 10463.80 samples/sec Loss 4.1561 LearningRate 0.0576 Epoch: 15 Global Step: 77990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:02:07,459-Speed 10521.68 samples/sec Loss 4.1575 LearningRate 0.0575 Epoch: 15 Global Step: 
78000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 08:02:15,276-Speed 10480.61 samples/sec Loss 4.1601 LearningRate 0.0575 Epoch: 15 Global Step: 78010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 08:02:23,078-Speed 10502.62 samples/sec Loss 4.1479 LearningRate 0.0574 Epoch: 15 Global Step: 78020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 08:02:30,874-Speed 10509.20 samples/sec Loss 4.1421 LearningRate 0.0574 Epoch: 15 Global Step: 78030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:02:38,693-Speed 10477.84 samples/sec Loss 4.1476 LearningRate 0.0573 Epoch: 15 Global Step: 78040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:02:46,500-Speed 10495.04 samples/sec Loss 4.1755 LearningRate 0.0573 Epoch: 15 Global Step: 78050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:02:54,289-Speed 10519.86 samples/sec Loss 4.1545 LearningRate 0.0572 Epoch: 15 Global Step: 78060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:03:02,088-Speed 10505.07 samples/sec Loss 4.1631 LearningRate 0.0572 Epoch: 15 Global Step: 78070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:03:09,880-Speed 10515.23 samples/sec Loss 4.1405 LearningRate 0.0572 Epoch: 15 Global Step: 78080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:03:17,675-Speed 10510.05 samples/sec Loss 4.1471 LearningRate 0.0571 Epoch: 15 Global Step: 78090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:03:25,484-Speed 10493.33 samples/sec Loss 4.1539 LearningRate 0.0571 Epoch: 15 Global Step: 78100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:03:33,301-Speed 10480.81 samples/sec Loss 4.1476 LearningRate 0.0570 Epoch: 15 Global Step: 78110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:03:41,086-Speed 10524.92 samples/sec Loss 4.1275 LearningRate 0.0570 Epoch: 15 Global Step: 78120 Fp16 Grad Scale: 65536 Required: 6 
hours Training: 2022-01-16 08:03:48,881-Speed 10510.99 samples/sec Loss 4.1194 LearningRate 0.0569 Epoch: 15 Global Step: 78130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 08:03:56,690-Speed 10492.10 samples/sec Loss 4.1250 LearningRate 0.0569 Epoch: 15 Global Step: 78140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 08:04:04,510-Speed 10477.55 samples/sec Loss 4.0985 LearningRate 0.0568 Epoch: 15 Global Step: 78150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:04:12,314-Speed 10498.90 samples/sec Loss 4.1299 LearningRate 0.0568 Epoch: 15 Global Step: 78160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:04:20,127-Speed 10485.85 samples/sec Loss 4.0832 LearningRate 0.0568 Epoch: 15 Global Step: 78170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:04:27,937-Speed 10490.61 samples/sec Loss 4.1295 LearningRate 0.0567 Epoch: 15 Global Step: 78180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:04:35,769-Speed 10461.53 samples/sec Loss 4.1011 LearningRate 0.0567 Epoch: 15 Global Step: 78190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:04:43,603-Speed 10463.78 samples/sec Loss 4.1181 LearningRate 0.0566 Epoch: 15 Global Step: 78200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:04:51,442-Speed 10451.58 samples/sec Loss 4.1344 LearningRate 0.0566 Epoch: 15 Global Step: 78210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:04:59,276-Speed 10459.57 samples/sec Loss 4.1392 LearningRate 0.0565 Epoch: 15 Global Step: 78220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:05:07,138-Speed 10419.69 samples/sec Loss 4.1595 LearningRate 0.0565 Epoch: 15 Global Step: 78230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:05:14,967-Speed 10466.17 samples/sec Loss 4.1395 LearningRate 0.0564 Epoch: 15 Global Step: 78240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 
08:05:22,791-Speed 10471.94 samples/sec Loss 4.1348 LearningRate 0.0564 Epoch: 15 Global Step: 78250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 08:05:30,646-Speed 10429.99 samples/sec Loss 4.0785 LearningRate 0.0564 Epoch: 15 Global Step: 78260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 08:05:38,509-Speed 10419.32 samples/sec Loss 4.0961 LearningRate 0.0563 Epoch: 15 Global Step: 78270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:05:46,367-Speed 10427.56 samples/sec Loss 4.1129 LearningRate 0.0563 Epoch: 15 Global Step: 78280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:05:54,244-Speed 10401.16 samples/sec Loss 4.0963 LearningRate 0.0562 Epoch: 15 Global Step: 78290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:06:02,061-Speed 10482.20 samples/sec Loss 4.0838 LearningRate 0.0562 Epoch: 15 Global Step: 78300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:06:09,872-Speed 10489.47 samples/sec Loss 4.1250 LearningRate 0.0561 Epoch: 15 Global Step: 78310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:06:17,668-Speed 10510.02 samples/sec Loss 4.1253 LearningRate 0.0561 Epoch: 15 Global Step: 78320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:06:25,491-Speed 10472.34 samples/sec Loss 4.1223 LearningRate 0.0560 Epoch: 15 Global Step: 78330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:06:33,325-Speed 10457.97 samples/sec Loss 4.1300 LearningRate 0.0560 Epoch: 15 Global Step: 78340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:06:41,136-Speed 10489.05 samples/sec Loss 4.1259 LearningRate 0.0560 Epoch: 15 Global Step: 78350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:06:48,950-Speed 10485.42 samples/sec Loss 4.1130 LearningRate 0.0559 Epoch: 15 Global Step: 78360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:06:56,767-Speed 10480.27 samples/sec 
Loss 4.1089 LearningRate 0.0559 Epoch: 15 Global Step: 78370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 08:07:04,596-Speed 10469.65 samples/sec Loss 4.1276 LearningRate 0.0558 Epoch: 15 Global Step: 78380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 08:07:12,406-Speed 10490.10 samples/sec Loss 4.0904 LearningRate 0.0558 Epoch: 15 Global Step: 78390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 08:07:20,210-Speed 10499.40 samples/sec Loss 4.0440 LearningRate 0.0557 Epoch: 15 Global Step: 78400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 08:07:28,005-Speed 10509.96 samples/sec Loss 4.0895 LearningRate 0.0557 Epoch: 15 Global Step: 78410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 08:07:35,789-Speed 10525.71 samples/sec Loss 4.0592 LearningRate 0.0556 Epoch: 15 Global Step: 78420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 08:07:43,594-Speed 10497.36 samples/sec Loss 4.0606 LearningRate 0.0556 Epoch: 15 Global Step: 78430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 08:07:51,393-Speed 10509.20 samples/sec Loss 4.0705 LearningRate 0.0556 Epoch: 15 Global Step: 78440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 08:07:59,173-Speed 10530.30 samples/sec Loss 4.0792 LearningRate 0.0555 Epoch: 15 Global Step: 78450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:08:06,965-Speed 10516.12 samples/sec Loss 4.0677 LearningRate 0.0555 Epoch: 15 Global Step: 78460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:08:14,786-Speed 10477.48 samples/sec Loss 4.0843 LearningRate 0.0554 Epoch: 15 Global Step: 78470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:08:22,575-Speed 10518.97 samples/sec Loss 4.0635 LearningRate 0.0554 Epoch: 15 Global Step: 78480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:08:30,367-Speed 10514.36 samples/sec Loss 4.0781 LearningRate 0.0553 
Epoch: 15 Global Step: 78490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:08:38,159-Speed 10514.55 samples/sec Loss 4.0901 LearningRate 0.0553 Epoch: 15 Global Step: 78500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:08:45,957-Speed 10506.40 samples/sec Loss 4.1280 LearningRate 0.0553 Epoch: 15 Global Step: 78510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:08:53,749-Speed 10515.73 samples/sec Loss 4.0678 LearningRate 0.0552 Epoch: 15 Global Step: 78520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:09:01,539-Speed 10517.21 samples/sec Loss 4.0681 LearningRate 0.0552 Epoch: 15 Global Step: 78530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:09:09,363-Speed 10471.05 samples/sec Loss 4.0755 LearningRate 0.0551 Epoch: 15 Global Step: 78540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:09:17,160-Speed 10508.92 samples/sec Loss 4.0587 LearningRate 0.0551 Epoch: 15 Global Step: 78550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 08:09:24,957-Speed 10508.45 samples/sec Loss 4.0778 LearningRate 0.0550 Epoch: 15 Global Step: 78560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 08:09:32,745-Speed 10519.68 samples/sec Loss 4.0799 LearningRate 0.0550 Epoch: 15 Global Step: 78570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-01-16 08:09:40,505-Speed 10558.35 samples/sec Loss 4.0414 LearningRate 0.0549 Epoch: 15 Global Step: 78580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:09:48,290-Speed 10524.22 samples/sec Loss 4.0648 LearningRate 0.0549 Epoch: 15 Global Step: 78590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:09:56,078-Speed 10524.40 samples/sec Loss 4.0718 LearningRate 0.0549 Epoch: 15 Global Step: 78600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:10:03,869-Speed 10516.29 samples/sec Loss 4.0708 LearningRate 0.0548 Epoch: 15 Global Step: 78610 Fp16 Grad 
Scale: 65536 Required: 6 hours Training: 2022-01-16 08:10:11,673-Speed 10498.44 samples/sec Loss 4.0436 LearningRate 0.0548 Epoch: 15 Global Step: 78620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:10:19,457-Speed 10525.31 samples/sec Loss 4.0733 LearningRate 0.0547 Epoch: 15 Global Step: 78630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:10:27,260-Speed 10500.59 samples/sec Loss 4.0683 LearningRate 0.0547 Epoch: 15 Global Step: 78640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:10:35,050-Speed 10516.04 samples/sec Loss 4.0861 LearningRate 0.0546 Epoch: 15 Global Step: 78650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:10:42,835-Speed 10524.87 samples/sec Loss 4.0547 LearningRate 0.0546 Epoch: 15 Global Step: 78660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-01-16 08:10:50,688-Speed 10433.94 samples/sec Loss 4.0704 LearningRate 0.0546 Epoch: 15 Global Step: 78670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:10:58,481-Speed 10512.26 samples/sec Loss 4.0343 LearningRate 0.0545 Epoch: 15 Global Step: 78680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:11:06,288-Speed 10494.76 samples/sec Loss 4.0484 LearningRate 0.0545 Epoch: 15 Global Step: 78690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:11:14,086-Speed 10510.23 samples/sec Loss 4.0505 LearningRate 0.0544 Epoch: 15 Global Step: 78700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:11:21,905-Speed 10477.52 samples/sec Loss 4.0734 LearningRate 0.0544 Epoch: 15 Global Step: 78710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:11:29,719-Speed 10485.22 samples/sec Loss 4.0983 LearningRate 0.0543 Epoch: 15 Global Step: 78720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:11:37,532-Speed 10486.85 samples/sec Loss 4.0531 LearningRate 0.0543 Epoch: 15 Global Step: 78730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-01-16 08:11:45,323-Speed 10516.02 samples/sec Loss 4.0624 LearningRate 0.0542 Epoch: 15 Global Step: 78740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:11:53,105-Speed 10527.69 samples/sec Loss 3.9916 LearningRate 0.0542 Epoch: 15 Global Step: 78750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:12:00,944-Speed 10451.64 samples/sec Loss 3.9848 LearningRate 0.0542 Epoch: 15 Global Step: 78760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:12:08,737-Speed 10514.61 samples/sec Loss 4.0329 LearningRate 0.0541 Epoch: 15 Global Step: 78770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:12:16,528-Speed 10516.73 samples/sec Loss 4.0329 LearningRate 0.0541 Epoch: 15 Global Step: 78780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:12:24,351-Speed 10473.85 samples/sec Loss 4.0108 LearningRate 0.0540 Epoch: 15 Global Step: 78790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:12:32,161-Speed 10490.70 samples/sec Loss 4.0393 LearningRate 0.0540 Epoch: 15 Global Step: 78800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:12:40,040-Speed 10398.11 samples/sec Loss 4.0389 LearningRate 0.0539 Epoch: 15 Global Step: 78810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:12:47,860-Speed 10476.89 samples/sec Loss 4.0280 LearningRate 0.0539 Epoch: 15 Global Step: 78820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:12:55,678-Speed 10480.25 samples/sec Loss 4.0256 LearningRate 0.0539 Epoch: 15 Global Step: 78830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:13:03,509-Speed 10461.82 samples/sec Loss 4.0172 LearningRate 0.0538 Epoch: 15 Global Step: 78840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:13:11,306-Speed 10508.62 samples/sec Loss 4.0175 LearningRate 0.0538 Epoch: 15 Global Step: 78850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:13:19,128-Speed 10475.02 
samples/sec Loss 4.0017 LearningRate 0.0537 Epoch: 15 Global Step: 78860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:13:26,950-Speed 10474.49 samples/sec Loss 4.0207 LearningRate 0.0537 Epoch: 15 Global Step: 78870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:13:34,772-Speed 10474.62 samples/sec Loss 4.0677 LearningRate 0.0536 Epoch: 15 Global Step: 78880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:13:42,575-Speed 10500.13 samples/sec Loss 4.0437 LearningRate 0.0536 Epoch: 15 Global Step: 78890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:13:50,399-Speed 10474.89 samples/sec Loss 4.0665 LearningRate 0.0536 Epoch: 15 Global Step: 78900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:13:58,210-Speed 10489.15 samples/sec Loss 4.0443 LearningRate 0.0535 Epoch: 15 Global Step: 78910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:14:06,041-Speed 10462.97 samples/sec Loss 4.0336 LearningRate 0.0535 Epoch: 15 Global Step: 78920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:14:13,847-Speed 10496.19 samples/sec Loss 4.0511 LearningRate 0.0534 Epoch: 15 Global Step: 78930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:14:21,694-Speed 10441.54 samples/sec Loss 4.0378 LearningRate 0.0534 Epoch: 15 Global Step: 78940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:14:29,555-Speed 10422.25 samples/sec Loss 3.9995 LearningRate 0.0533 Epoch: 15 Global Step: 78950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:14:37,365-Speed 10493.61 samples/sec Loss 4.0238 LearningRate 0.0533 Epoch: 15 Global Step: 78960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:14:45,166-Speed 10503.77 samples/sec Loss 4.0305 LearningRate 0.0533 Epoch: 15 Global Step: 78970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:14:52,989-Speed 10471.68 samples/sec Loss 4.0398 LearningRate 
0.0532 Epoch: 15 Global Step: 78980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:15:00,801-Speed 10488.07 samples/sec Loss 4.0232 LearningRate 0.0532 Epoch: 15 Global Step: 78990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:15:08,646-Speed 10443.76 samples/sec Loss 4.0229 LearningRate 0.0531 Epoch: 15 Global Step: 79000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:15:16,461-Speed 10487.81 samples/sec Loss 4.0114 LearningRate 0.0531 Epoch: 15 Global Step: 79010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:15:24,277-Speed 10481.49 samples/sec Loss 4.0137 LearningRate 0.0530 Epoch: 15 Global Step: 79020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:15:32,110-Speed 10460.23 samples/sec Loss 3.9885 LearningRate 0.0530 Epoch: 15 Global Step: 79030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:15:39,925-Speed 10483.76 samples/sec Loss 3.9930 LearningRate 0.0529 Epoch: 15 Global Step: 79040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:15:47,713-Speed 10520.95 samples/sec Loss 4.0123 LearningRate 0.0529 Epoch: 15 Global Step: 79050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:15:55,513-Speed 10502.89 samples/sec Loss 3.9926 LearningRate 0.0529 Epoch: 15 Global Step: 79060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:16:03,307-Speed 10512.10 samples/sec Loss 4.0204 LearningRate 0.0528 Epoch: 15 Global Step: 79070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:16:11,109-Speed 10501.15 samples/sec Loss 4.0038 LearningRate 0.0528 Epoch: 15 Global Step: 79080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:16:18,929-Speed 10477.34 samples/sec Loss 3.9780 LearningRate 0.0527 Epoch: 15 Global Step: 79090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:16:26,764-Speed 10456.81 samples/sec Loss 3.9963 LearningRate 0.0527 Epoch: 15 Global Step: 79100 Fp16 
Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:16:34,576-Speed 10487.14 samples/sec Loss 4.0140 LearningRate 0.0526 Epoch: 15 Global Step: 79110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:16:42,397-Speed 10476.08 samples/sec Loss 3.9790 LearningRate 0.0526 Epoch: 15 Global Step: 79120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:16:50,198-Speed 10503.59 samples/sec Loss 3.9874 LearningRate 0.0526 Epoch: 15 Global Step: 79130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:16:58,004-Speed 10495.21 samples/sec Loss 3.9775 LearningRate 0.0525 Epoch: 15 Global Step: 79140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:17:05,784-Speed 10531.26 samples/sec Loss 3.9908 LearningRate 0.0525 Epoch: 15 Global Step: 79150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:17:13,571-Speed 10522.16 samples/sec Loss 3.9793 LearningRate 0.0524 Epoch: 15 Global Step: 79160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:17:21,369-Speed 10507.01 samples/sec Loss 4.0238 LearningRate 0.0524 Epoch: 15 Global Step: 79170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:17:29,206-Speed 10452.98 samples/sec Loss 3.9964 LearningRate 0.0523 Epoch: 15 Global Step: 79180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:17:36,992-Speed 10522.93 samples/sec Loss 3.9999 LearningRate 0.0523 Epoch: 15 Global Step: 79190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:17:44,799-Speed 10495.71 samples/sec Loss 3.9734 LearningRate 0.0523 Epoch: 15 Global Step: 79200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:17:52,594-Speed 10510.93 samples/sec Loss 3.9909 LearningRate 0.0522 Epoch: 15 Global Step: 79210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:18:00,405-Speed 10488.73 samples/sec Loss 3.9526 LearningRate 0.0522 Epoch: 15 Global Step: 79220 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-01-16 08:18:08,212-Speed 10494.35 samples/sec Loss 3.9399 LearningRate 0.0521 Epoch: 15 Global Step: 79230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:18:16,002-Speed 10518.11 samples/sec Loss 3.9839 LearningRate 0.0521 Epoch: 15 Global Step: 79240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:18:23,795-Speed 10512.58 samples/sec Loss 3.9368 LearningRate 0.0521 Epoch: 15 Global Step: 79250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:18:31,600-Speed 10497.16 samples/sec Loss 4.0137 LearningRate 0.0520 Epoch: 15 Global Step: 79260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:18:39,380-Speed 10530.33 samples/sec Loss 3.9947 LearningRate 0.0520 Epoch: 15 Global Step: 79270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:18:47,166-Speed 10523.15 samples/sec Loss 3.9568 LearningRate 0.0519 Epoch: 15 Global Step: 79280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:18:54,970-Speed 10499.14 samples/sec Loss 3.9927 LearningRate 0.0519 Epoch: 15 Global Step: 79290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:19:02,776-Speed 10495.65 samples/sec Loss 3.9593 LearningRate 0.0518 Epoch: 15 Global Step: 79300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:19:10,579-Speed 10500.37 samples/sec Loss 3.9829 LearningRate 0.0518 Epoch: 15 Global Step: 79310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:19:18,352-Speed 10541.90 samples/sec Loss 3.9720 LearningRate 0.0518 Epoch: 15 Global Step: 79320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:19:26,154-Speed 10502.20 samples/sec Loss 3.9483 LearningRate 0.0517 Epoch: 15 Global Step: 79330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:19:33,957-Speed 10499.47 samples/sec Loss 3.9737 LearningRate 0.0517 Epoch: 15 Global Step: 79340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:19:41,758-Speed 
10502.81 samples/sec Loss 3.9525 LearningRate 0.0516 Epoch: 15 Global Step: 79350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:19:49,569-Speed 10489.48 samples/sec Loss 3.9843 LearningRate 0.0516 Epoch: 15 Global Step: 79360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:19:57,368-Speed 10504.65 samples/sec Loss 3.9631 LearningRate 0.0515 Epoch: 15 Global Step: 79370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:20:05,209-Speed 10449.64 samples/sec Loss 3.9537 LearningRate 0.0515 Epoch: 15 Global Step: 79380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:20:13,057-Speed 10444.35 samples/sec Loss 3.9469 LearningRate 0.0515 Epoch: 15 Global Step: 79390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:20:20,861-Speed 10498.73 samples/sec Loss 3.9751 LearningRate 0.0514 Epoch: 15 Global Step: 79400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:20:28,676-Speed 10484.56 samples/sec Loss 3.9758 LearningRate 0.0514 Epoch: 15 Global Step: 79410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:20:36,504-Speed 10464.91 samples/sec Loss 3.9436 LearningRate 0.0513 Epoch: 15 Global Step: 79420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:20:44,305-Speed 10503.10 samples/sec Loss 3.9378 LearningRate 0.0513 Epoch: 15 Global Step: 79430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:20:52,120-Speed 10483.87 samples/sec Loss 3.9524 LearningRate 0.0512 Epoch: 15 Global Step: 79440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:20:59,936-Speed 10482.22 samples/sec Loss 3.9343 LearningRate 0.0512 Epoch: 15 Global Step: 79450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:21:07,766-Speed 10463.83 samples/sec Loss 3.9390 LearningRate 0.0512 Epoch: 15 Global Step: 79460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:21:15,577-Speed 10490.18 samples/sec Loss 3.9358 
LearningRate 0.0511 Epoch: 15 Global Step: 79470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:21:23,373-Speed 10508.51 samples/sec Loss 3.9527 LearningRate 0.0511 Epoch: 15 Global Step: 79480 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-01-16 08:21:31,167-Speed 10519.26 samples/sec Loss 3.9494 LearningRate 0.0510 Epoch: 15 Global Step: 79490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:21:38,960-Speed 10513.02 samples/sec Loss 3.9412 LearningRate 0.0510 Epoch: 15 Global Step: 79500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:21:46,773-Speed 10485.96 samples/sec Loss 3.9420 LearningRate 0.0509 Epoch: 15 Global Step: 79510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:21:54,585-Speed 10487.74 samples/sec Loss 3.9258 LearningRate 0.0509 Epoch: 15 Global Step: 79520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:22:02,387-Speed 10501.50 samples/sec Loss 3.9130 LearningRate 0.0509 Epoch: 15 Global Step: 79530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:22:10,178-Speed 10516.78 samples/sec Loss 3.9434 LearningRate 0.0508 Epoch: 15 Global Step: 79540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:22:17,986-Speed 10493.13 samples/sec Loss 3.9598 LearningRate 0.0508 Epoch: 15 Global Step: 79550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:22:25,796-Speed 10490.21 samples/sec Loss 3.9356 LearningRate 0.0507 Epoch: 15 Global Step: 79560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:22:33,575-Speed 10532.68 samples/sec Loss 3.8879 LearningRate 0.0507 Epoch: 15 Global Step: 79570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:22:41,385-Speed 10490.28 samples/sec Loss 3.9382 LearningRate 0.0507 Epoch: 15 Global Step: 79580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:22:49,181-Speed 10509.62 samples/sec Loss 3.9066 LearningRate 0.0506 Epoch: 15 Global 
Step: 79590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:22:56,987-Speed 10495.96 samples/sec Loss 3.9274 LearningRate 0.0506 Epoch: 15 Global Step: 79600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-16 08:23:04,851-Speed 10421.72 samples/sec Loss 3.9415 LearningRate 0.0505 Epoch: 15 Global Step: 79610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-16 08:23:12,673-Speed 10473.89 samples/sec Loss 3.9252 LearningRate 0.0505 Epoch: 15 Global Step: 79620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-16 08:23:20,477-Speed 10499.05 samples/sec Loss 3.9276 LearningRate 0.0504 Epoch: 15 Global Step: 79630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-16 08:23:28,281-Speed 10499.65 samples/sec Loss 3.9292 LearningRate 0.0504 Epoch: 15 Global Step: 79640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-16 08:23:36,091-Speed 10490.95 samples/sec Loss 3.9129 LearningRate 0.0504 Epoch: 15 Global Step: 79650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-16 08:23:43,913-Speed 10473.90 samples/sec Loss 3.9046 LearningRate 0.0503 Epoch: 15 Global Step: 79660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-16 08:23:51,708-Speed 10511.35 samples/sec Loss 3.9350 LearningRate 0.0503 Epoch: 15 Global Step: 79670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-16 08:23:59,491-Speed 10525.99 samples/sec Loss 3.9087 LearningRate 0.0502 Epoch: 15 Global Step: 79680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-16 08:24:07,271-Speed 10532.28 samples/sec Loss 3.9090 LearningRate 0.0502 Epoch: 15 Global Step: 79690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-01-16 08:24:15,082-Speed 10488.39 samples/sec Loss 3.8747 LearningRate 0.0502 Epoch: 15 Global Step: 79700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:24:22,897-Speed 10484.18 samples/sec Loss 3.9152 LearningRate 0.0501 Epoch: 15 Global Step: 79710 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-01-16 08:24:30,705-Speed 10495.09 samples/sec Loss 3.9269 LearningRate 0.0501 Epoch: 15 Global Step: 79720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:24:38,547-Speed 10446.63 samples/sec Loss 3.9023 LearningRate 0.0500 Epoch: 15 Global Step: 79730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:24:46,363-Speed 10483.10 samples/sec Loss 3.9029 LearningRate 0.0500 Epoch: 15 Global Step: 79740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:24:54,144-Speed 10528.41 samples/sec Loss 3.9128 LearningRate 0.0499 Epoch: 15 Global Step: 79750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:25:01,945-Speed 10503.49 samples/sec Loss 3.8808 LearningRate 0.0499 Epoch: 15 Global Step: 79760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:25:09,766-Speed 10475.41 samples/sec Loss 3.9013 LearningRate 0.0499 Epoch: 15 Global Step: 79770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:25:17,567-Speed 10502.88 samples/sec Loss 3.8749 LearningRate 0.0498 Epoch: 15 Global Step: 79780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:25:25,373-Speed 10495.83 samples/sec Loss 3.8793 LearningRate 0.0498 Epoch: 15 Global Step: 79790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:25:33,169-Speed 10510.60 samples/sec Loss 3.8810 LearningRate 0.0497 Epoch: 15 Global Step: 79800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:25:40,985-Speed 10486.16 samples/sec Loss 3.9076 LearningRate 0.0497 Epoch: 15 Global Step: 79810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:25:48,776-Speed 10515.32 samples/sec Loss 3.8968 LearningRate 0.0497 Epoch: 15 Global Step: 79820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:25:56,547-Speed 10543.99 samples/sec Loss 3.9209 LearningRate 0.0496 Epoch: 15 Global Step: 79830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 
08:26:04,339-Speed 10515.11 samples/sec Loss 3.8858 LearningRate 0.0496 Epoch: 15 Global Step: 79840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:26:12,141-Speed 10503.17 samples/sec Loss 3.9125 LearningRate 0.0495 Epoch: 15 Global Step: 79850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:26:19,961-Speed 10476.68 samples/sec Loss 3.8818 LearningRate 0.0495 Epoch: 15 Global Step: 79860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:26:27,761-Speed 10504.85 samples/sec Loss 3.8928 LearningRate 0.0494 Epoch: 15 Global Step: 79870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:26:35,588-Speed 10470.77 samples/sec Loss 3.8892 LearningRate 0.0494 Epoch: 15 Global Step: 79880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:26:43,417-Speed 10465.23 samples/sec Loss 3.8482 LearningRate 0.0494 Epoch: 15 Global Step: 79890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:26:51,217-Speed 10503.82 samples/sec Loss 3.8622 LearningRate 0.0493 Epoch: 15 Global Step: 79900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:26:59,022-Speed 10497.38 samples/sec Loss 3.8573 LearningRate 0.0493 Epoch: 15 Global Step: 79910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:27:06,821-Speed 10504.91 samples/sec Loss 3.8775 LearningRate 0.0492 Epoch: 15 Global Step: 79920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:27:14,640-Speed 10479.54 samples/sec Loss 3.8663 LearningRate 0.0492 Epoch: 15 Global Step: 79930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:27:22,408-Speed 10547.98 samples/sec Loss 3.9348 LearningRate 0.0492 Epoch: 15 Global Step: 79940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:27:30,261-Speed 10433.28 samples/sec Loss 3.8792 LearningRate 0.0491 Epoch: 15 Global Step: 79950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:27:38,094-Speed 10460.29 
samples/sec Loss 3.8578 LearningRate 0.0491 Epoch: 15 Global Step: 79960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:27:45,933-Speed 10452.69 samples/sec Loss 3.8475 LearningRate 0.0490 Epoch: 15 Global Step: 79970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:27:53,744-Speed 10488.85 samples/sec Loss 3.8606 LearningRate 0.0490 Epoch: 15 Global Step: 79980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:28:01,560-Speed 10481.26 samples/sec Loss 3.8720 LearningRate 0.0489 Epoch: 15 Global Step: 79990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:28:09,386-Speed 10468.87 samples/sec Loss 3.8733 LearningRate 0.0489 Epoch: 15 Global Step: 80000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:28:36,733-[lfw][80000]XNorm: 22.776289 Training: 2022-01-16 08:28:36,734-[lfw][80000]Accuracy-Flip: 0.99783+-0.00289 Training: 2022-01-16 08:28:36,734-[lfw][80000]Accuracy-Highest: 0.99783 Training: 2022-01-16 08:29:09,832-[cfp_fp][80000]XNorm: 20.435738 Training: 2022-01-16 08:29:09,833-[cfp_fp][80000]Accuracy-Flip: 0.99129+-0.00364 Training: 2022-01-16 08:29:09,833-[cfp_fp][80000]Accuracy-Highest: 0.99129 Training: 2022-01-16 08:29:37,972-[agedb_30][80000]XNorm: 22.371951 Training: 2022-01-16 08:29:37,972-[agedb_30][80000]Accuracy-Flip: 0.97950+-0.00495 Training: 2022-01-16 08:29:37,973-[agedb_30][80000]Accuracy-Highest: 0.97950 Training: 2022-01-16 08:29:45,718-Speed 850.41 samples/sec Loss 3.8905 LearningRate 0.0489 Epoch: 15 Global Step: 80010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:29:53,512-Speed 10512.95 samples/sec Loss 3.8427 LearningRate 0.0488 Epoch: 15 Global Step: 80020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:30:01,292-Speed 10530.72 samples/sec Loss 3.8742 LearningRate 0.0488 Epoch: 15 Global Step: 80030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:30:09,052-Speed 10557.88 samples/sec Loss 3.8619 
LearningRate 0.0487 Epoch: 15 Global Step: 80040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:30:16,846-Speed 10512.15 samples/sec Loss 3.8878 LearningRate 0.0487 Epoch: 15 Global Step: 80050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:30:24,669-Speed 10473.70 samples/sec Loss 3.8689 LearningRate 0.0487 Epoch: 15 Global Step: 80060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:30:32,488-Speed 10478.73 samples/sec Loss 3.8256 LearningRate 0.0486 Epoch: 15 Global Step: 80070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:30:40,258-Speed 10543.96 samples/sec Loss 3.8570 LearningRate 0.0486 Epoch: 15 Global Step: 80080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:30:48,018-Speed 10557.99 samples/sec Loss 3.8517 LearningRate 0.0485 Epoch: 15 Global Step: 80090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:30:55,804-Speed 10522.58 samples/sec Loss 3.8599 LearningRate 0.0485 Epoch: 15 Global Step: 80100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:31:03,595-Speed 10516.38 samples/sec Loss 3.8615 LearningRate 0.0485 Epoch: 15 Global Step: 80110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:31:11,379-Speed 10526.26 samples/sec Loss 3.8226 LearningRate 0.0484 Epoch: 15 Global Step: 80120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:31:19,166-Speed 10521.49 samples/sec Loss 3.8242 LearningRate 0.0484 Epoch: 15 Global Step: 80130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:31:27,012-Speed 10443.15 samples/sec Loss 3.8500 LearningRate 0.0483 Epoch: 15 Global Step: 80140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:31:34,789-Speed 10535.34 samples/sec Loss 3.8597 LearningRate 0.0483 Epoch: 15 Global Step: 80150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:31:42,570-Speed 10530.06 samples/sec Loss 3.8768 LearningRate 0.0482 Epoch: 15 Global 
Step: 80160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:31:50,366-Speed 10509.32 samples/sec Loss 3.8559 LearningRate 0.0482 Epoch: 15 Global Step: 80170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:31:58,169-Speed 10500.87 samples/sec Loss 3.8299 LearningRate 0.0482 Epoch: 15 Global Step: 80180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:32:06,003-Speed 10458.27 samples/sec Loss 3.8507 LearningRate 0.0481 Epoch: 15 Global Step: 80190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:32:13,807-Speed 10500.29 samples/sec Loss 3.8271 LearningRate 0.0481 Epoch: 15 Global Step: 80200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:32:21,600-Speed 10513.82 samples/sec Loss 3.8337 LearningRate 0.0480 Epoch: 15 Global Step: 80210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:32:29,443-Speed 10447.35 samples/sec Loss 3.8257 LearningRate 0.0480 Epoch: 15 Global Step: 80220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:32:37,309-Speed 10415.68 samples/sec Loss 3.8409 LearningRate 0.0480 Epoch: 15 Global Step: 80230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:32:45,104-Speed 10510.74 samples/sec Loss 3.8649 LearningRate 0.0479 Epoch: 15 Global Step: 80240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:32:52,871-Speed 10548.95 samples/sec Loss 3.8346 LearningRate 0.0479 Epoch: 15 Global Step: 80250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:33:00,670-Speed 10506.87 samples/sec Loss 3.8459 LearningRate 0.0478 Epoch: 15 Global Step: 80260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:33:08,452-Speed 10528.35 samples/sec Loss 3.8368 LearningRate 0.0478 Epoch: 15 Global Step: 80270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:33:16,253-Speed 10501.69 samples/sec Loss 3.8201 LearningRate 0.0478 Epoch: 15 Global Step: 80280 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-01-16 08:33:24,051-Speed 10506.82 samples/sec Loss 3.7984 LearningRate 0.0477 Epoch: 15 Global Step: 80290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:33:31,823-Speed 10542.28 samples/sec Loss 3.8287 LearningRate 0.0477 Epoch: 15 Global Step: 80300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:33:39,627-Speed 10497.49 samples/sec Loss 3.8521 LearningRate 0.0476 Epoch: 15 Global Step: 80310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:33:47,422-Speed 10511.66 samples/sec Loss 3.8485 LearningRate 0.0476 Epoch: 15 Global Step: 80320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:33:55,195-Speed 10539.86 samples/sec Loss 3.8234 LearningRate 0.0476 Epoch: 15 Global Step: 80330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:34:02,974-Speed 10534.12 samples/sec Loss 3.8489 LearningRate 0.0475 Epoch: 15 Global Step: 80340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:34:10,771-Speed 10508.59 samples/sec Loss 3.8269 LearningRate 0.0475 Epoch: 15 Global Step: 80350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:34:18,612-Speed 10450.14 samples/sec Loss 3.8334 LearningRate 0.0474 Epoch: 15 Global Step: 80360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:34:26,400-Speed 10519.86 samples/sec Loss 3.8252 LearningRate 0.0474 Epoch: 15 Global Step: 80370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:34:34,194-Speed 10512.37 samples/sec Loss 3.8368 LearningRate 0.0473 Epoch: 15 Global Step: 80380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:34:41,984-Speed 10518.33 samples/sec Loss 3.8399 LearningRate 0.0473 Epoch: 15 Global Step: 80390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:34:49,779-Speed 10510.58 samples/sec Loss 3.7935 LearningRate 0.0473 Epoch: 15 Global Step: 80400 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-01-16 08:34:57,591-Speed 10488.79 samples/sec Loss 3.7854 LearningRate 0.0472 Epoch: 15 Global Step: 80410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:35:05,371-Speed 10532.35 samples/sec Loss 3.7830 LearningRate 0.0472 Epoch: 15 Global Step: 80420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:35:13,171-Speed 10502.61 samples/sec Loss 3.7960 LearningRate 0.0471 Epoch: 15 Global Step: 80430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:35:20,939-Speed 10546.75 samples/sec Loss 3.7856 LearningRate 0.0471 Epoch: 15 Global Step: 80440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:35:28,731-Speed 10516.07 samples/sec Loss 3.7980 LearningRate 0.0471 Epoch: 15 Global Step: 80450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:35:36,509-Speed 10532.82 samples/sec Loss 3.8138 LearningRate 0.0470 Epoch: 15 Global Step: 80460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:35:44,302-Speed 10512.60 samples/sec Loss 3.7870 LearningRate 0.0470 Epoch: 15 Global Step: 80470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:35:52,079-Speed 10536.70 samples/sec Loss 3.7669 LearningRate 0.0469 Epoch: 15 Global Step: 80480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:35:59,856-Speed 10534.49 samples/sec Loss 3.7939 LearningRate 0.0469 Epoch: 15 Global Step: 80490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:36:07,665-Speed 10492.66 samples/sec Loss 3.8067 LearningRate 0.0469 Epoch: 15 Global Step: 80500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:36:15,459-Speed 10511.61 samples/sec Loss 3.7799 LearningRate 0.0468 Epoch: 15 Global Step: 80510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:36:23,252-Speed 10513.26 samples/sec Loss 3.8071 LearningRate 0.0468 Epoch: 15 Global Step: 80520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:36:31,034-Speed 
10528.88 samples/sec Loss 3.7773 LearningRate 0.0467 Epoch: 15 Global Step: 80530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:36:38,825-Speed 10516.58 samples/sec Loss 3.7667 LearningRate 0.0467 Epoch: 15 Global Step: 80540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:36:46,617-Speed 10514.59 samples/sec Loss 3.8025 LearningRate 0.0467 Epoch: 15 Global Step: 80550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:36:54,413-Speed 10509.54 samples/sec Loss 3.7552 LearningRate 0.0466 Epoch: 15 Global Step: 80560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:37:02,193-Speed 10531.04 samples/sec Loss 3.7857 LearningRate 0.0466 Epoch: 15 Global Step: 80570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:37:09,991-Speed 10507.46 samples/sec Loss 3.8021 LearningRate 0.0465 Epoch: 15 Global Step: 80580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:37:17,779-Speed 10520.00 samples/sec Loss 3.7853 LearningRate 0.0465 Epoch: 15 Global Step: 80590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:37:25,553-Speed 10538.65 samples/sec Loss 3.8172 LearningRate 0.0465 Epoch: 15 Global Step: 80600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:37:33,368-Speed 10484.39 samples/sec Loss 3.8026 LearningRate 0.0464 Epoch: 15 Global Step: 80610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:37:41,151-Speed 10528.03 samples/sec Loss 3.7706 LearningRate 0.0464 Epoch: 15 Global Step: 80620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:37:48,933-Speed 10528.59 samples/sec Loss 3.7930 LearningRate 0.0463 Epoch: 15 Global Step: 80630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:37:56,745-Speed 10487.44 samples/sec Loss 3.8168 LearningRate 0.0463 Epoch: 15 Global Step: 80640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:38:04,548-Speed 10499.81 samples/sec Loss 3.8106 
LearningRate 0.0463 Epoch: 15 Global Step: 80650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:38:12,380-Speed 10461.76 samples/sec Loss 3.7980 LearningRate 0.0462 Epoch: 15 Global Step: 80660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:38:20,168-Speed 10519.39 samples/sec Loss 3.7755 LearningRate 0.0462 Epoch: 15 Global Step: 80670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:38:27,992-Speed 10471.01 samples/sec Loss 3.8030 LearningRate 0.0461 Epoch: 15 Global Step: 80680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:38:35,779-Speed 10522.46 samples/sec Loss 3.7760 LearningRate 0.0461 Epoch: 15 Global Step: 80690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:38:43,583-Speed 10498.52 samples/sec Loss 3.7663 LearningRate 0.0461 Epoch: 15 Global Step: 80700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:38:51,384-Speed 10501.17 samples/sec Loss 3.7539 LearningRate 0.0460 Epoch: 15 Global Step: 80710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:38:59,220-Speed 10456.55 samples/sec Loss 3.7406 LearningRate 0.0460 Epoch: 15 Global Step: 80720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:39:07,064-Speed 10445.73 samples/sec Loss 3.7579 LearningRate 0.0459 Epoch: 15 Global Step: 80730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:39:14,919-Speed 10430.49 samples/sec Loss 3.7900 LearningRate 0.0459 Epoch: 15 Global Step: 80740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:39:22,723-Speed 10497.37 samples/sec Loss 3.7727 LearningRate 0.0459 Epoch: 15 Global Step: 80750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:39:30,518-Speed 10511.54 samples/sec Loss 3.7955 LearningRate 0.0458 Epoch: 15 Global Step: 80760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:39:38,318-Speed 10505.06 samples/sec Loss 3.7411 LearningRate 0.0458 Epoch: 15 Global 
Step: 80770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:39:46,134-Speed 10482.34 samples/sec Loss 3.7723 LearningRate 0.0457 Epoch: 15 Global Step: 80780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:39:53,942-Speed 10494.61 samples/sec Loss 3.7507 LearningRate 0.0457 Epoch: 15 Global Step: 80790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:40:01,757-Speed 10483.87 samples/sec Loss 3.7725 LearningRate 0.0457 Epoch: 15 Global Step: 80800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:40:09,567-Speed 10490.89 samples/sec Loss 3.7736 LearningRate 0.0456 Epoch: 15 Global Step: 80810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:40:17,387-Speed 10479.59 samples/sec Loss 3.7859 LearningRate 0.0456 Epoch: 15 Global Step: 80820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:40:25,188-Speed 10501.76 samples/sec Loss 3.7758 LearningRate 0.0455 Epoch: 15 Global Step: 80830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:40:32,981-Speed 10514.95 samples/sec Loss 3.7467 LearningRate 0.0455 Epoch: 15 Global Step: 80840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:40:40,784-Speed 10499.88 samples/sec Loss 3.7182 LearningRate 0.0455 Epoch: 15 Global Step: 80850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:40:48,581-Speed 10508.45 samples/sec Loss 3.7558 LearningRate 0.0454 Epoch: 15 Global Step: 80860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:40:56,381-Speed 10502.78 samples/sec Loss 3.7772 LearningRate 0.0454 Epoch: 15 Global Step: 80870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:41:04,195-Speed 10485.55 samples/sec Loss 3.7304 LearningRate 0.0453 Epoch: 15 Global Step: 80880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:41:11,995-Speed 10503.90 samples/sec Loss 3.7458 LearningRate 0.0453 Epoch: 15 Global Step: 80890 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-01-16 08:41:19,809-Speed 10484.65 samples/sec Loss 3.7577 LearningRate 0.0453 Epoch: 15 Global Step: 80900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:41:27,597-Speed 10524.40 samples/sec Loss 3.7575 LearningRate 0.0452 Epoch: 15 Global Step: 80910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:41:35,370-Speed 10539.59 samples/sec Loss 3.7585 LearningRate 0.0452 Epoch: 15 Global Step: 80920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:41:43,161-Speed 10516.06 samples/sec Loss 3.7172 LearningRate 0.0451 Epoch: 15 Global Step: 80930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:41:50,970-Speed 10490.99 samples/sec Loss 3.7290 LearningRate 0.0451 Epoch: 15 Global Step: 80940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:41:58,800-Speed 10464.07 samples/sec Loss 3.7423 LearningRate 0.0451 Epoch: 15 Global Step: 80950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:42:06,593-Speed 10512.93 samples/sec Loss 3.7183 LearningRate 0.0450 Epoch: 15 Global Step: 80960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:42:14,394-Speed 10503.59 samples/sec Loss 3.7511 LearningRate 0.0450 Epoch: 15 Global Step: 80970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:42:22,182-Speed 10520.69 samples/sec Loss 3.7402 LearningRate 0.0449 Epoch: 15 Global Step: 80980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:42:29,981-Speed 10504.99 samples/sec Loss 3.7458 LearningRate 0.0449 Epoch: 15 Global Step: 80990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:42:37,767-Speed 10524.05 samples/sec Loss 3.7145 LearningRate 0.0449 Epoch: 15 Global Step: 81000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:42:45,552-Speed 10524.49 samples/sec Loss 3.7183 LearningRate 0.0448 Epoch: 15 Global Step: 81010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 
08:42:53,372-Speed 10476.68 samples/sec Loss 3.7057 LearningRate 0.0448 Epoch: 15 Global Step: 81020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:43:01,172-Speed 10503.45 samples/sec Loss 3.7129 LearningRate 0.0447 Epoch: 15 Global Step: 81030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:43:08,975-Speed 10501.32 samples/sec Loss 3.7542 LearningRate 0.0447 Epoch: 15 Global Step: 81040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:43:16,789-Speed 10486.67 samples/sec Loss 3.7101 LearningRate 0.0447 Epoch: 15 Global Step: 81050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:43:24,653-Speed 10418.73 samples/sec Loss 3.7340 LearningRate 0.0446 Epoch: 15 Global Step: 81060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:43:32,453-Speed 10503.43 samples/sec Loss 3.7085 LearningRate 0.0446 Epoch: 15 Global Step: 81070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:43:40,268-Speed 10487.82 samples/sec Loss 3.7223 LearningRate 0.0445 Epoch: 15 Global Step: 81080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:43:48,082-Speed 10486.08 samples/sec Loss 3.7135 LearningRate 0.0445 Epoch: 15 Global Step: 81090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:43:55,882-Speed 10503.34 samples/sec Loss 3.6979 LearningRate 0.0445 Epoch: 15 Global Step: 81100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:44:03,697-Speed 10483.22 samples/sec Loss 3.7087 LearningRate 0.0444 Epoch: 15 Global Step: 81110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:44:11,490-Speed 10514.32 samples/sec Loss 3.7244 LearningRate 0.0444 Epoch: 15 Global Step: 81120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:44:19,277-Speed 10521.33 samples/sec Loss 3.7159 LearningRate 0.0443 Epoch: 15 Global Step: 81130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:44:27,067-Speed 10517.61 samples/sec 
Loss 3.6615 LearningRate 0.0443 Epoch: 15 Global Step: 81140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:44:34,850-Speed 10526.51 samples/sec Loss 3.6969 LearningRate 0.0443 Epoch: 15 Global Step: 81150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:44:42,658-Speed 10493.40 samples/sec Loss 3.7080 LearningRate 0.0442 Epoch: 15 Global Step: 81160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:44:50,459-Speed 10502.50 samples/sec Loss 3.6911 LearningRate 0.0442 Epoch: 15 Global Step: 81170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:44:58,241-Speed 10528.13 samples/sec Loss 3.7046 LearningRate 0.0442 Epoch: 15 Global Step: 81180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:45:06,044-Speed 10500.81 samples/sec Loss 3.7168 LearningRate 0.0441 Epoch: 15 Global Step: 81190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:45:13,874-Speed 10463.13 samples/sec Loss 3.6782 LearningRate 0.0441 Epoch: 15 Global Step: 81200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:45:21,658-Speed 10526.05 samples/sec Loss 3.7294 LearningRate 0.0440 Epoch: 15 Global Step: 81210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:45:29,446-Speed 10520.74 samples/sec Loss 3.6860 LearningRate 0.0440 Epoch: 15 Global Step: 81220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:45:37,237-Speed 10515.68 samples/sec Loss 3.6945 LearningRate 0.0440 Epoch: 15 Global Step: 81230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:45:45,047-Speed 10490.14 samples/sec Loss 3.6940 LearningRate 0.0439 Epoch: 15 Global Step: 81240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:45:52,859-Speed 10487.53 samples/sec Loss 3.6818 LearningRate 0.0439 Epoch: 15 Global Step: 81250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:46:00,666-Speed 10494.50 samples/sec Loss 3.7047 LearningRate 0.0438 
Epoch: 15 Global Step: 81260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:46:08,459-Speed 10513.66 samples/sec Loss 3.7504 LearningRate 0.0438 Epoch: 15 Global Step: 81270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:46:16,255-Speed 10510.68 samples/sec Loss 3.7158 LearningRate 0.0438 Epoch: 15 Global Step: 81280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:46:24,044-Speed 10518.45 samples/sec Loss 3.6635 LearningRate 0.0437 Epoch: 15 Global Step: 81290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:46:31,824-Speed 10531.12 samples/sec Loss 3.6951 LearningRate 0.0437 Epoch: 15 Global Step: 81300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:46:39,603-Speed 10532.83 samples/sec Loss 3.6493 LearningRate 0.0436 Epoch: 15 Global Step: 81310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:46:47,394-Speed 10515.30 samples/sec Loss 3.6916 LearningRate 0.0436 Epoch: 15 Global Step: 81320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:46:55,183-Speed 10519.67 samples/sec Loss 3.7077 LearningRate 0.0436 Epoch: 15 Global Step: 81330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:47:03,003-Speed 10476.53 samples/sec Loss 3.7018 LearningRate 0.0435 Epoch: 15 Global Step: 81340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:47:10,797-Speed 10512.50 samples/sec Loss 3.6828 LearningRate 0.0435 Epoch: 15 Global Step: 81350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:47:18,582-Speed 10523.27 samples/sec Loss 3.6897 LearningRate 0.0434 Epoch: 15 Global Step: 81360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:47:26,389-Speed 10494.95 samples/sec Loss 3.6633 LearningRate 0.0434 Epoch: 15 Global Step: 81370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:47:34,181-Speed 10515.45 samples/sec Loss 3.6658 LearningRate 0.0434 Epoch: 15 Global Step: 81380 Fp16 Grad 
Scale: 65536 Required: 5 hours Training: 2022-01-16 08:47:41,978-Speed 10507.48 samples/sec Loss 3.6619 LearningRate 0.0433 Epoch: 15 Global Step: 81390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:47:49,783-Speed 10498.03 samples/sec Loss 3.6933 LearningRate 0.0433 Epoch: 15 Global Step: 81400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:47:57,574-Speed 10516.94 samples/sec Loss 3.7001 LearningRate 0.0433 Epoch: 15 Global Step: 81410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:48:05,434-Speed 10422.90 samples/sec Loss 3.6975 LearningRate 0.0432 Epoch: 15 Global Step: 81420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:48:13,277-Speed 10446.96 samples/sec Loss 3.6349 LearningRate 0.0432 Epoch: 15 Global Step: 81430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:48:21,088-Speed 10490.44 samples/sec Loss 3.6706 LearningRate 0.0431 Epoch: 15 Global Step: 81440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:48:28,893-Speed 10497.61 samples/sec Loss 3.6434 LearningRate 0.0431 Epoch: 15 Global Step: 81450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:48:36,686-Speed 10514.25 samples/sec Loss 3.6576 LearningRate 0.0431 Epoch: 15 Global Step: 81460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:48:44,491-Speed 10498.26 samples/sec Loss 3.6712 LearningRate 0.0430 Epoch: 15 Global Step: 81470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:48:52,315-Speed 10470.65 samples/sec Loss 3.6730 LearningRate 0.0430 Epoch: 15 Global Step: 81480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:49:00,145-Speed 10464.24 samples/sec Loss 3.6845 LearningRate 0.0429 Epoch: 15 Global Step: 81490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:49:07,991-Speed 10443.57 samples/sec Loss 3.6451 LearningRate 0.0429 Epoch: 15 Global Step: 81500 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-01-16 08:49:15,778-Speed 10522.04 samples/sec Loss 3.6735 LearningRate 0.0429 Epoch: 15 Global Step: 81510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:49:23,564-Speed 10522.56 samples/sec Loss 3.6425 LearningRate 0.0428 Epoch: 15 Global Step: 81520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:49:31,376-Speed 10489.98 samples/sec Loss 3.6686 LearningRate 0.0428 Epoch: 15 Global Step: 81530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:49:39,168-Speed 10513.86 samples/sec Loss 3.6409 LearningRate 0.0428 Epoch: 15 Global Step: 81540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:49:46,974-Speed 10497.24 samples/sec Loss 3.6422 LearningRate 0.0427 Epoch: 15 Global Step: 81550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:49:54,780-Speed 10494.30 samples/sec Loss 3.6544 LearningRate 0.0427 Epoch: 15 Global Step: 81560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:50:02,572-Speed 10515.60 samples/sec Loss 3.6013 LearningRate 0.0426 Epoch: 15 Global Step: 81570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:50:10,356-Speed 10525.68 samples/sec Loss 3.6389 LearningRate 0.0426 Epoch: 15 Global Step: 81580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:50:18,184-Speed 10466.30 samples/sec Loss 3.6498 LearningRate 0.0426 Epoch: 15 Global Step: 81590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:50:26,018-Speed 10457.94 samples/sec Loss 3.6471 LearningRate 0.0425 Epoch: 15 Global Step: 81600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:50:33,813-Speed 10511.20 samples/sec Loss 3.6252 LearningRate 0.0425 Epoch: 15 Global Step: 81610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:50:41,630-Speed 10487.47 samples/sec Loss 3.6422 LearningRate 0.0424 Epoch: 15 Global Step: 81620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:50:49,435-Speed 
10497.00 samples/sec Loss 3.6317 LearningRate 0.0424 Epoch: 15 Global Step: 81630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:50:57,211-Speed 10537.78 samples/sec Loss 3.6097 LearningRate 0.0424 Epoch: 15 Global Step: 81640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:51:05,011-Speed 10503.75 samples/sec Loss 3.6516 LearningRate 0.0423 Epoch: 15 Global Step: 81650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:51:12,824-Speed 10486.24 samples/sec Loss 3.6419 LearningRate 0.0423 Epoch: 15 Global Step: 81660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:51:20,609-Speed 10524.15 samples/sec Loss 3.6417 LearningRate 0.0422 Epoch: 15 Global Step: 81670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:51:28,401-Speed 10515.47 samples/sec Loss 3.5997 LearningRate 0.0422 Epoch: 15 Global Step: 81680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:51:36,187-Speed 10522.22 samples/sec Loss 3.6154 LearningRate 0.0422 Epoch: 15 Global Step: 81690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:51:43,953-Speed 10550.45 samples/sec Loss 3.6719 LearningRate 0.0421 Epoch: 15 Global Step: 81700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:51:51,761-Speed 10495.10 samples/sec Loss 3.6033 LearningRate 0.0421 Epoch: 15 Global Step: 81710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:51:59,556-Speed 10511.24 samples/sec Loss 3.6218 LearningRate 0.0421 Epoch: 15 Global Step: 81720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:52:07,358-Speed 10500.27 samples/sec Loss 3.6403 LearningRate 0.0420 Epoch: 15 Global Step: 81730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:52:15,140-Speed 10528.93 samples/sec Loss 3.6451 LearningRate 0.0420 Epoch: 15 Global Step: 81740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:52:22,935-Speed 10510.34 samples/sec Loss 3.6420 
LearningRate 0.0419 Epoch: 15 Global Step: 81750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:52:30,723-Speed 10519.71 samples/sec Loss 3.6223 LearningRate 0.0419 Epoch: 15 Global Step: 81760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:52:38,518-Speed 10511.38 samples/sec Loss 3.6064 LearningRate 0.0419 Epoch: 15 Global Step: 81770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:52:46,315-Speed 10508.61 samples/sec Loss 3.6292 LearningRate 0.0418 Epoch: 15 Global Step: 81780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:52:54,121-Speed 10495.59 samples/sec Loss 3.6261 LearningRate 0.0418 Epoch: 15 Global Step: 81790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:53:01,911-Speed 10518.22 samples/sec Loss 3.6257 LearningRate 0.0418 Epoch: 15 Global Step: 81800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:53:09,726-Speed 10483.22 samples/sec Loss 3.6193 LearningRate 0.0417 Epoch: 15 Global Step: 81810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:53:17,609-Speed 10394.60 samples/sec Loss 3.6458 LearningRate 0.0417 Epoch: 15 Global Step: 81820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:53:25,438-Speed 10465.72 samples/sec Loss 3.6043 LearningRate 0.0416 Epoch: 15 Global Step: 81830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:53:33,235-Speed 10506.39 samples/sec Loss 3.6049 LearningRate 0.0416 Epoch: 15 Global Step: 81840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:53:41,064-Speed 10465.79 samples/sec Loss 3.5904 LearningRate 0.0416 Epoch: 15 Global Step: 81850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:53:48,877-Speed 10486.55 samples/sec Loss 3.6164 LearningRate 0.0415 Epoch: 15 Global Step: 81860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:53:56,706-Speed 10465.25 samples/sec Loss 3.6221 LearningRate 0.0415 Epoch: 15 Global 
Step: 81870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:54:04,491-Speed 10524.40 samples/sec Loss 3.6101 LearningRate 0.0414 Epoch: 15 Global Step: 81880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:54:12,322-Speed 10461.49 samples/sec Loss 3.6016 LearningRate 0.0414 Epoch: 15 Global Step: 81890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:54:20,144-Speed 10475.63 samples/sec Loss 3.6184 LearningRate 0.0414 Epoch: 15 Global Step: 81900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:54:27,942-Speed 10506.45 samples/sec Loss 3.6330 LearningRate 0.0413 Epoch: 15 Global Step: 81910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:54:35,747-Speed 10496.14 samples/sec Loss 3.5893 LearningRate 0.0413 Epoch: 15 Global Step: 81920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:54:43,562-Speed 10484.15 samples/sec Loss 3.5789 LearningRate 0.0413 Epoch: 15 Global Step: 81930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:54:51,372-Speed 10496.76 samples/sec Loss 3.6065 LearningRate 0.0412 Epoch: 15 Global Step: 81940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:54:59,183-Speed 10489.02 samples/sec Loss 3.5975 LearningRate 0.0412 Epoch: 15 Global Step: 81950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:55:06,987-Speed 10499.95 samples/sec Loss 3.5870 LearningRate 0.0411 Epoch: 15 Global Step: 81960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:55:14,775-Speed 10519.57 samples/sec Loss 3.5925 LearningRate 0.0411 Epoch: 15 Global Step: 81970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:55:22,573-Speed 10506.80 samples/sec Loss 3.5934 LearningRate 0.0411 Epoch: 15 Global Step: 81980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:55:30,368-Speed 10509.97 samples/sec Loss 3.5853 LearningRate 0.0410 Epoch: 15 Global Step: 81990 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-01-16 08:55:38,151-Speed 10527.19 samples/sec Loss 3.6086 LearningRate 0.0410 Epoch: 15 Global Step: 82000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:55:45,952-Speed 10502.40 samples/sec Loss 3.5697 LearningRate 0.0410 Epoch: 15 Global Step: 82010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:55:53,763-Speed 10489.45 samples/sec Loss 3.5719 LearningRate 0.0409 Epoch: 15 Global Step: 82020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:56:01,587-Speed 10472.35 samples/sec Loss 3.5907 LearningRate 0.0409 Epoch: 15 Global Step: 82030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:56:09,389-Speed 10501.79 samples/sec Loss 3.5649 LearningRate 0.0408 Epoch: 15 Global Step: 82040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:56:17,201-Speed 10487.73 samples/sec Loss 3.5913 LearningRate 0.0408 Epoch: 15 Global Step: 82050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:56:24,998-Speed 10507.57 samples/sec Loss 3.5755 LearningRate 0.0408 Epoch: 15 Global Step: 82060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:56:32,814-Speed 10481.20 samples/sec Loss 3.5940 LearningRate 0.0407 Epoch: 15 Global Step: 82070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:56:40,609-Speed 10511.53 samples/sec Loss 3.5708 LearningRate 0.0407 Epoch: 15 Global Step: 82080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:56:48,404-Speed 10511.03 samples/sec Loss 3.5446 LearningRate 0.0407 Epoch: 15 Global Step: 82090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:56:56,231-Speed 10467.47 samples/sec Loss 3.5632 LearningRate 0.0406 Epoch: 15 Global Step: 82100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:57:04,036-Speed 10497.38 samples/sec Loss 3.5532 LearningRate 0.0406 Epoch: 15 Global Step: 82110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 
08:57:11,844-Speed 10493.37 samples/sec Loss 3.5843 LearningRate 0.0405 Epoch: 15 Global Step: 82120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:57:19,638-Speed 10512.26 samples/sec Loss 3.5657 LearningRate 0.0405 Epoch: 15 Global Step: 82130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:57:27,458-Speed 10476.02 samples/sec Loss 3.5538 LearningRate 0.0405 Epoch: 15 Global Step: 82140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:57:35,243-Speed 10525.35 samples/sec Loss 3.5665 LearningRate 0.0404 Epoch: 15 Global Step: 82150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:57:43,021-Speed 10538.69 samples/sec Loss 3.5796 LearningRate 0.0404 Epoch: 15 Global Step: 82160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:57:50,824-Speed 10498.65 samples/sec Loss 3.6057 LearningRate 0.0404 Epoch: 15 Global Step: 82170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:57:58,644-Speed 10477.21 samples/sec Loss 3.5643 LearningRate 0.0403 Epoch: 15 Global Step: 82180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:58:06,446-Speed 10501.24 samples/sec Loss 3.5645 LearningRate 0.0403 Epoch: 15 Global Step: 82190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:58:14,279-Speed 10460.68 samples/sec Loss 3.5189 LearningRate 0.0402 Epoch: 15 Global Step: 82200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:58:22,066-Speed 10520.73 samples/sec Loss 3.6055 LearningRate 0.0402 Epoch: 15 Global Step: 82210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:58:29,860-Speed 10512.21 samples/sec Loss 3.5593 LearningRate 0.0402 Epoch: 15 Global Step: 82220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:58:37,647-Speed 10521.33 samples/sec Loss 3.5601 LearningRate 0.0401 Epoch: 15 Global Step: 82230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:58:45,440-Speed 10514.41 
samples/sec Loss 3.5912 LearningRate 0.0401 Epoch: 15 Global Step: 82240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:58:53,218-Speed 10532.34 samples/sec Loss 3.5600 LearningRate 0.0401 Epoch: 15 Global Step: 82250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:59:01,002-Speed 10525.82 samples/sec Loss 3.5495 LearningRate 0.0400 Epoch: 15 Global Step: 82260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:59:08,792-Speed 10518.18 samples/sec Loss 3.5585 LearningRate 0.0400 Epoch: 15 Global Step: 82270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:59:16,594-Speed 10500.65 samples/sec Loss 3.5654 LearningRate 0.0399 Epoch: 15 Global Step: 82280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:59:24,413-Speed 10478.29 samples/sec Loss 3.5608 LearningRate 0.0399 Epoch: 15 Global Step: 82290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:59:32,252-Speed 10452.43 samples/sec Loss 3.5655 LearningRate 0.0399 Epoch: 15 Global Step: 82300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:59:40,064-Speed 10487.04 samples/sec Loss 3.5577 LearningRate 0.0398 Epoch: 15 Global Step: 82310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 08:59:47,852-Speed 10520.62 samples/sec Loss 3.5369 LearningRate 0.0398 Epoch: 15 Global Step: 82320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 08:59:55,647-Speed 10510.03 samples/sec Loss 3.5125 LearningRate 0.0398 Epoch: 15 Global Step: 82330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:00:03,448-Speed 10503.40 samples/sec Loss 3.5604 LearningRate 0.0397 Epoch: 15 Global Step: 82340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:00:11,282-Speed 10458.48 samples/sec Loss 3.5546 LearningRate 0.0397 Epoch: 15 Global Step: 82350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:00:19,143-Speed 10422.37 samples/sec Loss 3.5145 LearningRate 
0.0396 Epoch: 15 Global Step: 82360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:00:26,987-Speed 10445.20 samples/sec Loss 3.5650 LearningRate 0.0396 Epoch: 15 Global Step: 82370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:00:34,793-Speed 10496.05 samples/sec Loss 3.5435 LearningRate 0.0396 Epoch: 15 Global Step: 82380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:00:42,594-Speed 10502.76 samples/sec Loss 3.5124 LearningRate 0.0395 Epoch: 15 Global Step: 82390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:00:50,385-Speed 10516.80 samples/sec Loss 3.5253 LearningRate 0.0395 Epoch: 15 Global Step: 82400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:00:58,210-Speed 10469.38 samples/sec Loss 3.5106 LearningRate 0.0395 Epoch: 15 Global Step: 82410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:01:06,032-Speed 10475.48 samples/sec Loss 3.5177 LearningRate 0.0394 Epoch: 15 Global Step: 82420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 09:01:13,841-Speed 10491.11 samples/sec Loss 3.5276 LearningRate 0.0394 Epoch: 15 Global Step: 82430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 09:01:21,646-Speed 10497.81 samples/sec Loss 3.5563 LearningRate 0.0393 Epoch: 15 Global Step: 82440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 09:01:29,436-Speed 10518.39 samples/sec Loss 3.5473 LearningRate 0.0393 Epoch: 15 Global Step: 82450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 09:01:37,222-Speed 10522.54 samples/sec Loss 3.4926 LearningRate 0.0393 Epoch: 15 Global Step: 82460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 09:01:45,005-Speed 10527.45 samples/sec Loss 3.4913 LearningRate 0.0392 Epoch: 15 Global Step: 82470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 09:01:52,785-Speed 10531.40 samples/sec Loss 3.4950 LearningRate 0.0392 Epoch: 15 Global Step: 82480 
Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:02:00,588-Speed 10500.28 samples/sec Loss 3.4935 LearningRate 0.0392 Epoch: 15 Global Step: 82490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:02:08,407-Speed 10478.95 samples/sec Loss 3.5227 LearningRate 0.0391 Epoch: 15 Global Step: 82500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:02:16,182-Speed 10536.23 samples/sec Loss 3.5171 LearningRate 0.0391 Epoch: 15 Global Step: 82510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:02:23,977-Speed 10511.29 samples/sec Loss 3.5143 LearningRate 0.0390 Epoch: 15 Global Step: 82520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:02:31,753-Speed 10535.66 samples/sec Loss 3.5382 LearningRate 0.0390 Epoch: 15 Global Step: 82530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:02:39,538-Speed 10524.58 samples/sec Loss 3.5253 LearningRate 0.0390 Epoch: 15 Global Step: 82540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:02:47,351-Speed 10486.11 samples/sec Loss 3.5331 LearningRate 0.0389 Epoch: 15 Global Step: 82550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:02:55,177-Speed 10469.35 samples/sec Loss 3.5327 LearningRate 0.0389 Epoch: 15 Global Step: 82560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:03:02,955-Speed 10533.22 samples/sec Loss 3.5166 LearningRate 0.0389 Epoch: 15 Global Step: 82570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:03:10,723-Speed 10548.63 samples/sec Loss 3.5104 LearningRate 0.0388 Epoch: 15 Global Step: 82580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:03:18,529-Speed 10495.61 samples/sec Loss 3.5142 LearningRate 0.0388 Epoch: 15 Global Step: 82590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:03:26,333-Speed 10497.67 samples/sec Loss 3.4867 LearningRate 0.0388 Epoch: 15 Global Step: 82600 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-01-16 09:03:34,127-Speed 10514.28 samples/sec Loss 3.5084 LearningRate 0.0387 Epoch: 15 Global Step: 82610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:03:41,920-Speed 10513.89 samples/sec Loss 3.4875 LearningRate 0.0387 Epoch: 15 Global Step: 82620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:03:49,728-Speed 10492.70 samples/sec Loss 3.5066 LearningRate 0.0386 Epoch: 15 Global Step: 82630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:03:57,523-Speed 10510.93 samples/sec Loss 3.4960 LearningRate 0.0386 Epoch: 15 Global Step: 82640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:04:05,315-Speed 10514.09 samples/sec Loss 3.4968 LearningRate 0.0386 Epoch: 15 Global Step: 82650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:04:13,108-Speed 10513.84 samples/sec Loss 3.5092 LearningRate 0.0385 Epoch: 15 Global Step: 82660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:04:20,896-Speed 10519.64 samples/sec Loss 3.4783 LearningRate 0.0385 Epoch: 15 Global Step: 82670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:04:28,696-Speed 10504.22 samples/sec Loss 3.5128 LearningRate 0.0385 Epoch: 15 Global Step: 82680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 09:04:36,485-Speed 10519.17 samples/sec Loss 3.4867 LearningRate 0.0384 Epoch: 15 Global Step: 82690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 09:04:44,284-Speed 10505.10 samples/sec Loss 3.5052 LearningRate 0.0384 Epoch: 15 Global Step: 82700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 09:04:52,079-Speed 10510.85 samples/sec Loss 3.4973 LearningRate 0.0384 Epoch: 15 Global Step: 82710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 09:04:59,878-Speed 10505.28 samples/sec Loss 3.5092 LearningRate 0.0383 Epoch: 15 Global Step: 82720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 
09:05:07,683-Speed 10497.43 samples/sec Loss 3.4696 LearningRate 0.0383 Epoch: 15 Global Step: 82730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 09:05:15,468-Speed 10524.84 samples/sec Loss 3.4832 LearningRate 0.0382 Epoch: 15 Global Step: 82740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:05:23,270-Speed 10500.30 samples/sec Loss 3.4925 LearningRate 0.0382 Epoch: 15 Global Step: 82750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:05:31,076-Speed 10496.36 samples/sec Loss 3.4837 LearningRate 0.0382 Epoch: 15 Global Step: 82760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:05:38,865-Speed 10518.93 samples/sec Loss 3.4871 LearningRate 0.0381 Epoch: 15 Global Step: 82770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:05:46,657-Speed 10515.22 samples/sec Loss 3.4878 LearningRate 0.0381 Epoch: 15 Global Step: 82780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:05:54,463-Speed 10495.85 samples/sec Loss 3.4754 LearningRate 0.0381 Epoch: 15 Global Step: 82790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:06:02,299-Speed 10456.29 samples/sec Loss 3.4740 LearningRate 0.0380 Epoch: 15 Global Step: 82800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:06:10,113-Speed 10486.50 samples/sec Loss 3.4834 LearningRate 0.0380 Epoch: 15 Global Step: 82810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:06:17,894-Speed 10530.06 samples/sec Loss 3.4868 LearningRate 0.0379 Epoch: 15 Global Step: 82820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:06:25,724-Speed 10462.67 samples/sec Loss 3.4470 LearningRate 0.0379 Epoch: 15 Global Step: 82830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:06:33,523-Speed 10505.81 samples/sec Loss 3.4708 LearningRate 0.0379 Epoch: 15 Global Step: 82840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 09:06:41,363-Speed 10450.37 samples/sec 
Loss 3.4911 LearningRate 0.0378 Epoch: 15 Global Step: 82850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 09:06:49,183-Speed 10477.73 samples/sec Loss 3.4717 LearningRate 0.0378 Epoch: 15 Global Step: 82860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 09:06:56,974-Speed 10515.81 samples/sec Loss 3.4537 LearningRate 0.0378 Epoch: 15 Global Step: 82870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 09:07:04,771-Speed 10508.72 samples/sec Loss 3.4731 LearningRate 0.0377 Epoch: 15 Global Step: 82880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:07:12,570-Speed 10505.16 samples/sec Loss 3.4578 LearningRate 0.0377 Epoch: 15 Global Step: 82890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:07:20,355-Speed 10523.85 samples/sec Loss 3.4822 LearningRate 0.0377 Epoch: 15 Global Step: 82900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:07:28,147-Speed 10516.20 samples/sec Loss 3.4539 LearningRate 0.0376 Epoch: 15 Global Step: 82910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:07:35,953-Speed 10494.83 samples/sec Loss 3.4604 LearningRate 0.0376 Epoch: 15 Global Step: 82920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:07:43,754-Speed 10501.88 samples/sec Loss 3.4982 LearningRate 0.0376 Epoch: 15 Global Step: 82930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:07:51,554-Speed 10505.02 samples/sec Loss 3.4631 LearningRate 0.0375 Epoch: 15 Global Step: 82940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:07:59,370-Speed 10482.97 samples/sec Loss 3.4939 LearningRate 0.0375 Epoch: 15 Global Step: 82950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:08:22,166-Speed 3593.74 samples/sec Loss 3.4383 LearningRate 0.0374 Epoch: 16 Global Step: 82960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:08:29,922-Speed 10563.20 samples/sec Loss 3.4692 LearningRate 0.0374 Epoch: 16 
Global Step: 82970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:08:37,690-Speed 10547.57 samples/sec Loss 3.4382 LearningRate 0.0374 Epoch: 16 Global Step: 82980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:08:45,437-Speed 10574.91 samples/sec Loss 3.4489 LearningRate 0.0373 Epoch: 16 Global Step: 82990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:08:53,188-Speed 10570.16 samples/sec Loss 3.4567 LearningRate 0.0373 Epoch: 16 Global Step: 83000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:09:00,958-Speed 10545.18 samples/sec Loss 3.4499 LearningRate 0.0373 Epoch: 16 Global Step: 83010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:09:08,724-Speed 10549.39 samples/sec Loss 3.4274 LearningRate 0.0372 Epoch: 16 Global Step: 83020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:09:16,508-Speed 10526.59 samples/sec Loss 3.4461 LearningRate 0.0372 Epoch: 16 Global Step: 83030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:09:24,306-Speed 10505.84 samples/sec Loss 3.4673 LearningRate 0.0372 Epoch: 16 Global Step: 83040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:09:32,101-Speed 10511.14 samples/sec Loss 3.4445 LearningRate 0.0371 Epoch: 16 Global Step: 83050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:09:39,873-Speed 10541.61 samples/sec Loss 3.4175 LearningRate 0.0371 Epoch: 16 Global Step: 83060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:09:47,659-Speed 10523.46 samples/sec Loss 3.4337 LearningRate 0.0370 Epoch: 16 Global Step: 83070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:09:55,454-Speed 10510.78 samples/sec Loss 3.4295 LearningRate 0.0370 Epoch: 16 Global Step: 83080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 09:10:03,228-Speed 10540.09 samples/sec Loss 3.4475 LearningRate 0.0370 Epoch: 16 Global Step: 83090 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-01-16 09:10:11,011-Speed 10527.14 samples/sec Loss 3.4010 LearningRate 0.0369 Epoch: 16 Global Step: 83100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:10:18,791-Speed 10529.36 samples/sec Loss 3.4176 LearningRate 0.0369 Epoch: 16 Global Step: 83110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:10:26,594-Speed 10500.63 samples/sec Loss 3.4205 LearningRate 0.0369 Epoch: 16 Global Step: 83120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:10:34,383-Speed 10518.66 samples/sec Loss 3.4148 LearningRate 0.0368 Epoch: 16 Global Step: 83130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:10:42,181-Speed 10507.37 samples/sec Loss 3.3880 LearningRate 0.0368 Epoch: 16 Global Step: 83140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:10:49,959-Speed 10533.14 samples/sec Loss 3.4061 LearningRate 0.0368 Epoch: 16 Global Step: 83150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:10:57,746-Speed 10521.14 samples/sec Loss 3.4323 LearningRate 0.0367 Epoch: 16 Global Step: 83160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:11:05,534-Speed 10520.46 samples/sec Loss 3.4157 LearningRate 0.0367 Epoch: 16 Global Step: 83170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:11:13,299-Speed 10550.91 samples/sec Loss 3.4353 LearningRate 0.0367 Epoch: 16 Global Step: 83180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:11:21,075-Speed 10536.22 samples/sec Loss 3.4268 LearningRate 0.0366 Epoch: 16 Global Step: 83190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-01-16 09:11:28,842-Speed 10548.96 samples/sec Loss 3.3846 LearningRate 0.0366 Epoch: 16 Global Step: 83200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:11:36,631-Speed 10522.24 samples/sec Loss 3.4121 LearningRate 0.0365 Epoch: 16 Global Step: 83210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 
09:11:44,422-Speed 10515.35 samples/sec Loss 3.4253 LearningRate 0.0365 Epoch: 16 Global Step: 83220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-01-16 09:11:52,196-Speed 10539.88 samples/sec Loss 3.4164 LearningRate 0.0365 Epoch: 16 Global Step: 83230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:11:59,983-Speed 10521.83 samples/sec Loss 3.3996 LearningRate 0.0364 Epoch: 16 Global Step: 83240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:12:07,752-Speed 10545.79 samples/sec Loss 3.3922 LearningRate 0.0364 Epoch: 16 Global Step: 83250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:12:15,554-Speed 10502.19 samples/sec Loss 3.4259 LearningRate 0.0364 Epoch: 16 Global Step: 83260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:12:23,328-Speed 10538.87 samples/sec Loss 3.3995 LearningRate 0.0363 Epoch: 16 Global Step: 83270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:12:31,107-Speed 10533.12 samples/sec Loss 3.4443 LearningRate 0.0363 Epoch: 16 Global Step: 83280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:12:38,871-Speed 10552.22 samples/sec Loss 3.4318 LearningRate 0.0363 Epoch: 16 Global Step: 83290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:12:46,659-Speed 10520.03 samples/sec Loss 3.3967 LearningRate 0.0362 Epoch: 16 Global Step: 83300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:12:54,442-Speed 10527.66 samples/sec Loss 3.4152 LearningRate 0.0362 Epoch: 16 Global Step: 83310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:13:02,220-Speed 10536.36 samples/sec Loss 3.4038 LearningRate 0.0362 Epoch: 16 Global Step: 83320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:13:10,039-Speed 10478.38 samples/sec Loss 3.4206 LearningRate 0.0361 Epoch: 16 Global Step: 83330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:13:17,838-Speed 10505.09 samples/sec 
Loss 3.4451 LearningRate 0.0361 Epoch: 16 Global Step: 83340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:13:25,646-Speed 10492.73 samples/sec Loss 3.4426 LearningRate 0.0360 Epoch: 16 Global Step: 83350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:13:33,456-Speed 10493.45 samples/sec Loss 3.4282 LearningRate 0.0360 Epoch: 16 Global Step: 83360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:13:41,286-Speed 10463.76 samples/sec Loss 3.4085 LearningRate 0.0360 Epoch: 16 Global Step: 83370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:13:49,091-Speed 10497.09 samples/sec Loss 3.3957 LearningRate 0.0359 Epoch: 16 Global Step: 83380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:13:56,903-Speed 10487.06 samples/sec Loss 3.3726 LearningRate 0.0359 Epoch: 16 Global Step: 83390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:14:04,742-Speed 10451.59 samples/sec Loss 3.3608 LearningRate 0.0359 Epoch: 16 Global Step: 83400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:14:12,556-Speed 10486.13 samples/sec Loss 3.3807 LearningRate 0.0358 Epoch: 16 Global Step: 83410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:14:20,364-Speed 10492.26 samples/sec Loss 3.3782 LearningRate 0.0358 Epoch: 16 Global Step: 83420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:14:28,184-Speed 10477.84 samples/sec Loss 3.4156 LearningRate 0.0358 Epoch: 16 Global Step: 83430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:14:36,017-Speed 10459.19 samples/sec Loss 3.3838 LearningRate 0.0357 Epoch: 16 Global Step: 83440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:14:43,850-Speed 10460.17 samples/sec Loss 3.3891 LearningRate 0.0357 Epoch: 16 Global Step: 83450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:14:51,696-Speed 10441.57 samples/sec Loss 3.3788 LearningRate 0.0357 Epoch: 
16 Global Step: 83460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:14:59,579-Speed 10393.37 samples/sec Loss 3.3625 LearningRate 0.0356 Epoch: 16 Global Step: 83470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:15:07,409-Speed 10464.55 samples/sec Loss 3.3636 LearningRate 0.0356 Epoch: 16 Global Step: 83480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:15:15,250-Speed 10449.17 samples/sec Loss 3.3802 LearningRate 0.0356 Epoch: 16 Global Step: 83490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:15:23,089-Speed 10451.10 samples/sec Loss 3.3983 LearningRate 0.0355 Epoch: 16 Global Step: 83500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:15:30,932-Speed 10446.12 samples/sec Loss 3.3882 LearningRate 0.0355 Epoch: 16 Global Step: 83510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:15:38,783-Speed 10436.75 samples/sec Loss 3.3918 LearningRate 0.0354 Epoch: 16 Global Step: 83520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:15:46,640-Speed 10429.24 samples/sec Loss 3.3457 LearningRate 0.0354 Epoch: 16 Global Step: 83530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:15:54,499-Speed 10425.24 samples/sec Loss 3.3657 LearningRate 0.0354 Epoch: 16 Global Step: 83540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:16:02,362-Speed 10421.22 samples/sec Loss 3.3685 LearningRate 0.0353 Epoch: 16 Global Step: 83550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:16:10,200-Speed 10452.88 samples/sec Loss 3.3543 LearningRate 0.0353 Epoch: 16 Global Step: 83560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:16:18,046-Speed 10442.41 samples/sec Loss 3.3690 LearningRate 0.0353 Epoch: 16 Global Step: 83570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:16:25,871-Speed 10471.12 samples/sec Loss 3.3640 LearningRate 0.0352 Epoch: 16 Global Step: 83580 Fp16 Grad Scale: 
65536 Required: 4 hours Training: 2022-01-16 09:16:33,723-Speed 10433.42 samples/sec Loss 3.3496 LearningRate 0.0352 Epoch: 16 Global Step: 83590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:16:41,579-Speed 10430.18 samples/sec Loss 3.3840 LearningRate 0.0352 Epoch: 16 Global Step: 83600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:16:49,410-Speed 10462.68 samples/sec Loss 3.3310 LearningRate 0.0351 Epoch: 16 Global Step: 83610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:16:57,262-Speed 10434.50 samples/sec Loss 3.3267 LearningRate 0.0351 Epoch: 16 Global Step: 83620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:17:05,123-Speed 10421.71 samples/sec Loss 3.3925 LearningRate 0.0351 Epoch: 16 Global Step: 83630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:17:12,992-Speed 10412.40 samples/sec Loss 3.3432 LearningRate 0.0350 Epoch: 16 Global Step: 83640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:17:20,844-Speed 10434.08 samples/sec Loss 3.3468 LearningRate 0.0350 Epoch: 16 Global Step: 83650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:17:28,676-Speed 10460.62 samples/sec Loss 3.3735 LearningRate 0.0350 Epoch: 16 Global Step: 83660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:17:36,523-Speed 10439.70 samples/sec Loss 3.3417 LearningRate 0.0349 Epoch: 16 Global Step: 83670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:17:44,364-Speed 10449.76 samples/sec Loss 3.3474 LearningRate 0.0349 Epoch: 16 Global Step: 83680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:17:52,210-Speed 10443.62 samples/sec Loss 3.3429 LearningRate 0.0349 Epoch: 16 Global Step: 83690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:18:00,042-Speed 10459.68 samples/sec Loss 3.3785 LearningRate 0.0348 Epoch: 16 Global Step: 83700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-01-16 09:18:07,850-Speed 10494.21 samples/sec Loss 3.3596 LearningRate 0.0348 Epoch: 16 Global Step: 83710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:18:15,694-Speed 10446.01 samples/sec Loss 3.3432 LearningRate 0.0347 Epoch: 16 Global Step: 83720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:18:23,508-Speed 10483.66 samples/sec Loss 3.3067 LearningRate 0.0347 Epoch: 16 Global Step: 83730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:18:31,315-Speed 10494.76 samples/sec Loss 3.3450 LearningRate 0.0347 Epoch: 16 Global Step: 83740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:18:39,129-Speed 10486.05 samples/sec Loss 3.3161 LearningRate 0.0346 Epoch: 16 Global Step: 83750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:18:46,946-Speed 10480.67 samples/sec Loss 3.3189 LearningRate 0.0346 Epoch: 16 Global Step: 83760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:18:54,743-Speed 10508.80 samples/sec Loss 3.3403 LearningRate 0.0346 Epoch: 16 Global Step: 83770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:19:02,563-Speed 10475.75 samples/sec Loss 3.3466 LearningRate 0.0345 Epoch: 16 Global Step: 83780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:19:10,364-Speed 10502.63 samples/sec Loss 3.3323 LearningRate 0.0345 Epoch: 16 Global Step: 83790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:19:18,182-Speed 10480.26 samples/sec Loss 3.3280 LearningRate 0.0345 Epoch: 16 Global Step: 83800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:19:25,997-Speed 10483.52 samples/sec Loss 3.3465 LearningRate 0.0344 Epoch: 16 Global Step: 83810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:19:33,796-Speed 10505.72 samples/sec Loss 3.3335 LearningRate 0.0344 Epoch: 16 Global Step: 83820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:19:41,596-Speed 10503.45 
samples/sec Loss 3.3359 LearningRate 0.0344 Epoch: 16 Global Step: 83830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:19:49,405-Speed 10492.63 samples/sec Loss 3.3400 LearningRate 0.0343 Epoch: 16 Global Step: 83840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:19:57,210-Speed 10497.50 samples/sec Loss 3.3202 LearningRate 0.0343 Epoch: 16 Global Step: 83850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:20:05,024-Speed 10485.34 samples/sec Loss 3.3234 LearningRate 0.0343 Epoch: 16 Global Step: 83860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:20:12,868-Speed 10444.30 samples/sec Loss 3.3088 LearningRate 0.0342 Epoch: 16 Global Step: 83870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:20:20,705-Speed 10454.88 samples/sec Loss 3.3268 LearningRate 0.0342 Epoch: 16 Global Step: 83880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:20:28,509-Speed 10497.59 samples/sec Loss 3.3550 LearningRate 0.0342 Epoch: 16 Global Step: 83890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:20:36,318-Speed 10492.96 samples/sec Loss 3.3369 LearningRate 0.0341 Epoch: 16 Global Step: 83900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:20:44,131-Speed 10485.93 samples/sec Loss 3.3279 LearningRate 0.0341 Epoch: 16 Global Step: 83910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:20:51,959-Speed 10467.17 samples/sec Loss 3.3147 LearningRate 0.0341 Epoch: 16 Global Step: 83920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:20:59,764-Speed 10497.46 samples/sec Loss 3.3078 LearningRate 0.0340 Epoch: 16 Global Step: 83930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:21:07,582-Speed 10479.67 samples/sec Loss 3.3156 LearningRate 0.0340 Epoch: 16 Global Step: 83940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:21:15,386-Speed 10498.38 samples/sec Loss 3.3183 LearningRate 
0.0339 Epoch: 16 Global Step: 83950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:21:23,179-Speed 10515.33 samples/sec Loss 3.3109 LearningRate 0.0339 Epoch: 16 Global Step: 83960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:21:30,975-Speed 10509.12 samples/sec Loss 3.3094 LearningRate 0.0339 Epoch: 16 Global Step: 83970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:21:38,800-Speed 10470.01 samples/sec Loss 3.3217 LearningRate 0.0338 Epoch: 16 Global Step: 83980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:21:46,603-Speed 10500.97 samples/sec Loss 3.2865 LearningRate 0.0338 Epoch: 16 Global Step: 83990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:21:54,418-Speed 10482.98 samples/sec Loss 3.2974 LearningRate 0.0338 Epoch: 16 Global Step: 84000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:22:02,200-Speed 10528.51 samples/sec Loss 3.2964 LearningRate 0.0337 Epoch: 16 Global Step: 84010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:22:09,995-Speed 10510.00 samples/sec Loss 3.2877 LearningRate 0.0337 Epoch: 16 Global Step: 84020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:22:17,791-Speed 10510.03 samples/sec Loss 3.2967 LearningRate 0.0337 Epoch: 16 Global Step: 84030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:22:25,571-Speed 10530.42 samples/sec Loss 3.3288 LearningRate 0.0336 Epoch: 16 Global Step: 84040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:22:33,349-Speed 10533.48 samples/sec Loss 3.3127 LearningRate 0.0336 Epoch: 16 Global Step: 84050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:22:41,141-Speed 10515.86 samples/sec Loss 3.3219 LearningRate 0.0336 Epoch: 16 Global Step: 84060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:22:48,945-Speed 10498.54 samples/sec Loss 3.3041 LearningRate 0.0335 Epoch: 16 Global Step: 84070 
Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:22:56,748-Speed 10499.75 samples/sec Loss 3.3230 LearningRate 0.0335 Epoch: 16 Global Step: 84080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:23:04,541-Speed 10514.40 samples/sec Loss 3.2888 LearningRate 0.0335 Epoch: 16 Global Step: 84090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:23:12,336-Speed 10511.07 samples/sec Loss 3.3109 LearningRate 0.0334 Epoch: 16 Global Step: 84100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:23:20,134-Speed 10506.58 samples/sec Loss 3.2862 LearningRate 0.0334 Epoch: 16 Global Step: 84110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:23:27,951-Speed 10480.77 samples/sec Loss 3.2766 LearningRate 0.0334 Epoch: 16 Global Step: 84120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:23:35,756-Speed 10498.79 samples/sec Loss 3.2916 LearningRate 0.0333 Epoch: 16 Global Step: 84130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:23:43,542-Speed 10522.20 samples/sec Loss 3.3078 LearningRate 0.0333 Epoch: 16 Global Step: 84140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:23:51,348-Speed 10495.61 samples/sec Loss 3.2741 LearningRate 0.0333 Epoch: 16 Global Step: 84150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:23:59,162-Speed 10485.07 samples/sec Loss 3.2856 LearningRate 0.0332 Epoch: 16 Global Step: 84160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:24:06,980-Speed 10480.27 samples/sec Loss 3.2859 LearningRate 0.0332 Epoch: 16 Global Step: 84170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:24:14,795-Speed 10483.85 samples/sec Loss 3.2875 LearningRate 0.0332 Epoch: 16 Global Step: 84180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:24:22,573-Speed 10534.39 samples/sec Loss 3.2836 LearningRate 0.0331 Epoch: 16 Global Step: 84190 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-01-16 09:24:30,371-Speed 10505.80 samples/sec Loss 3.3000 LearningRate 0.0331 Epoch: 16 Global Step: 84200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:24:38,158-Speed 10522.99 samples/sec Loss 3.2715 LearningRate 0.0331 Epoch: 16 Global Step: 84210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:24:45,946-Speed 10519.69 samples/sec Loss 3.2528 LearningRate 0.0330 Epoch: 16 Global Step: 84220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:24:53,738-Speed 10515.15 samples/sec Loss 3.3035 LearningRate 0.0330 Epoch: 16 Global Step: 84230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:25:01,531-Speed 10512.60 samples/sec Loss 3.2522 LearningRate 0.0330 Epoch: 16 Global Step: 84240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:25:09,332-Speed 10503.68 samples/sec Loss 3.2621 LearningRate 0.0329 Epoch: 16 Global Step: 84250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:25:17,135-Speed 10499.22 samples/sec Loss 3.2808 LearningRate 0.0329 Epoch: 16 Global Step: 84260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:25:24,922-Speed 10521.60 samples/sec Loss 3.2557 LearningRate 0.0329 Epoch: 16 Global Step: 84270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:25:32,718-Speed 10509.51 samples/sec Loss 3.2769 LearningRate 0.0328 Epoch: 16 Global Step: 84280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:25:40,503-Speed 10524.14 samples/sec Loss 3.2694 LearningRate 0.0328 Epoch: 16 Global Step: 84290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:25:48,329-Speed 10469.00 samples/sec Loss 3.2925 LearningRate 0.0328 Epoch: 16 Global Step: 84300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:25:56,127-Speed 10507.38 samples/sec Loss 3.2615 LearningRate 0.0327 Epoch: 16 Global Step: 84310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:26:03,925-Speed 
10505.62 samples/sec Loss 3.2643 LearningRate 0.0327 Epoch: 16 Global Step: 84320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:26:11,741-Speed 10483.09 samples/sec Loss 3.2885 LearningRate 0.0327 Epoch: 16 Global Step: 84330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:26:19,572-Speed 10461.83 samples/sec Loss 3.2519 LearningRate 0.0326 Epoch: 16 Global Step: 84340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:26:27,377-Speed 10497.65 samples/sec Loss 3.2635 LearningRate 0.0326 Epoch: 16 Global Step: 84350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:26:35,153-Speed 10535.98 samples/sec Loss 3.2567 LearningRate 0.0326 Epoch: 16 Global Step: 84360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:26:42,932-Speed 10533.09 samples/sec Loss 3.2575 LearningRate 0.0325 Epoch: 16 Global Step: 84370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:26:50,728-Speed 10509.10 samples/sec Loss 3.2356 LearningRate 0.0325 Epoch: 16 Global Step: 84380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:26:58,534-Speed 10496.40 samples/sec Loss 3.2390 LearningRate 0.0325 Epoch: 16 Global Step: 84390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:27:06,336-Speed 10501.04 samples/sec Loss 3.2387 LearningRate 0.0324 Epoch: 16 Global Step: 84400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:27:14,136-Speed 10504.51 samples/sec Loss 3.2424 LearningRate 0.0324 Epoch: 16 Global Step: 84410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:27:21,939-Speed 10500.21 samples/sec Loss 3.2343 LearningRate 0.0324 Epoch: 16 Global Step: 84420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:27:29,728-Speed 10518.50 samples/sec Loss 3.2509 LearningRate 0.0323 Epoch: 16 Global Step: 84430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:27:37,564-Speed 10454.75 samples/sec Loss 3.2644 
LearningRate 0.0323 Epoch: 16 Global Step: 84440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:27:45,350-Speed 10523.17 samples/sec Loss 3.2644 LearningRate 0.0323 Epoch: 16 Global Step: 84450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:27:53,142-Speed 10515.76 samples/sec Loss 3.2624 LearningRate 0.0322 Epoch: 16 Global Step: 84460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:28:00,934-Speed 10515.01 samples/sec Loss 3.2404 LearningRate 0.0322 Epoch: 16 Global Step: 84470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:28:08,736-Speed 10500.41 samples/sec Loss 3.2223 LearningRate 0.0322 Epoch: 16 Global Step: 84480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:28:16,539-Speed 10499.37 samples/sec Loss 3.2365 LearningRate 0.0321 Epoch: 16 Global Step: 84490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:28:24,353-Speed 10485.84 samples/sec Loss 3.2460 LearningRate 0.0321 Epoch: 16 Global Step: 84500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:28:32,150-Speed 10508.04 samples/sec Loss 3.2405 LearningRate 0.0320 Epoch: 16 Global Step: 84510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:28:39,976-Speed 10468.14 samples/sec Loss 3.2289 LearningRate 0.0320 Epoch: 16 Global Step: 84520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:28:47,789-Speed 10486.24 samples/sec Loss 3.2500 LearningRate 0.0320 Epoch: 16 Global Step: 84530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:28:55,557-Speed 10547.87 samples/sec Loss 3.2582 LearningRate 0.0319 Epoch: 16 Global Step: 84540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:29:03,351-Speed 10511.30 samples/sec Loss 3.2318 LearningRate 0.0319 Epoch: 16 Global Step: 84550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:29:11,141-Speed 10518.39 samples/sec Loss 3.2569 LearningRate 0.0319 Epoch: 16 Global Step: 
84560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:29:18,963-Speed 10474.30 samples/sec Loss 3.2003 LearningRate 0.0318 Epoch: 16 Global Step: 84570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:29:26,743-Speed 10531.79 samples/sec Loss 3.2236 LearningRate 0.0318 Epoch: 16 Global Step: 84580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:29:34,533-Speed 10516.41 samples/sec Loss 3.2360 LearningRate 0.0318 Epoch: 16 Global Step: 84590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:29:42,342-Speed 10492.31 samples/sec Loss 3.2295 LearningRate 0.0317 Epoch: 16 Global Step: 84600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:29:50,166-Speed 10471.93 samples/sec Loss 3.1992 LearningRate 0.0317 Epoch: 16 Global Step: 84610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:29:57,985-Speed 10478.99 samples/sec Loss 3.2023 LearningRate 0.0317 Epoch: 16 Global Step: 84620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:30:05,781-Speed 10509.02 samples/sec Loss 3.2018 LearningRate 0.0316 Epoch: 16 Global Step: 84630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:30:13,587-Speed 10495.93 samples/sec Loss 3.2198 LearningRate 0.0316 Epoch: 16 Global Step: 84640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:30:21,367-Speed 10530.60 samples/sec Loss 3.2237 LearningRate 0.0316 Epoch: 16 Global Step: 84650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:30:29,139-Speed 10543.07 samples/sec Loss 3.1977 LearningRate 0.0316 Epoch: 16 Global Step: 84660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:30:36,967-Speed 10466.83 samples/sec Loss 3.2227 LearningRate 0.0315 Epoch: 16 Global Step: 84670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:30:44,769-Speed 10501.67 samples/sec Loss 3.2070 LearningRate 0.0315 Epoch: 16 Global Step: 84680 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-01-16 09:30:52,559-Speed 10517.86 samples/sec Loss 3.2284 LearningRate 0.0315 Epoch: 16 Global Step: 84690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:31:00,340-Speed 10530.21 samples/sec Loss 3.2265 LearningRate 0.0314 Epoch: 16 Global Step: 84700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:31:08,119-Speed 10530.54 samples/sec Loss 3.2240 LearningRate 0.0314 Epoch: 16 Global Step: 84710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:31:15,906-Speed 10521.78 samples/sec Loss 3.1771 LearningRate 0.0314 Epoch: 16 Global Step: 84720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:31:23,689-Speed 10528.20 samples/sec Loss 3.2001 LearningRate 0.0313 Epoch: 16 Global Step: 84730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:31:31,486-Speed 10507.98 samples/sec Loss 3.2058 LearningRate 0.0313 Epoch: 16 Global Step: 84740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:31:39,289-Speed 10500.38 samples/sec Loss 3.2015 LearningRate 0.0313 Epoch: 16 Global Step: 84750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:31:47,130-Speed 10449.12 samples/sec Loss 3.1930 LearningRate 0.0312 Epoch: 16 Global Step: 84760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:31:54,939-Speed 10492.24 samples/sec Loss 3.1996 LearningRate 0.0312 Epoch: 16 Global Step: 84770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:32:02,756-Speed 10481.97 samples/sec Loss 3.2463 LearningRate 0.0312 Epoch: 16 Global Step: 84780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:32:10,552-Speed 10508.16 samples/sec Loss 3.2002 LearningRate 0.0311 Epoch: 16 Global Step: 84790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:32:18,401-Speed 10438.14 samples/sec Loss 3.2048 LearningRate 0.0311 Epoch: 16 Global Step: 84800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 
09:32:26,226-Speed 10470.43 samples/sec Loss 3.2073 LearningRate 0.0311 Epoch: 16 Global Step: 84810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:32:34,028-Speed 10502.46 samples/sec Loss 3.1992 LearningRate 0.0310 Epoch: 16 Global Step: 84820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:32:41,827-Speed 10504.58 samples/sec Loss 3.1921 LearningRate 0.0310 Epoch: 16 Global Step: 84830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:32:49,645-Speed 10479.81 samples/sec Loss 3.1777 LearningRate 0.0310 Epoch: 16 Global Step: 84840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:32:57,458-Speed 10485.78 samples/sec Loss 3.1925 LearningRate 0.0309 Epoch: 16 Global Step: 84850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:33:05,257-Speed 10506.68 samples/sec Loss 3.1921 LearningRate 0.0309 Epoch: 16 Global Step: 84860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:33:13,052-Speed 10509.66 samples/sec Loss 3.1866 LearningRate 0.0309 Epoch: 16 Global Step: 84870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:33:20,839-Speed 10521.11 samples/sec Loss 3.1803 LearningRate 0.0308 Epoch: 16 Global Step: 84880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:33:28,632-Speed 10513.61 samples/sec Loss 3.1620 LearningRate 0.0308 Epoch: 16 Global Step: 84890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:33:36,433-Speed 10503.63 samples/sec Loss 3.1800 LearningRate 0.0308 Epoch: 16 Global Step: 84900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:33:44,225-Speed 10513.28 samples/sec Loss 3.1844 LearningRate 0.0307 Epoch: 16 Global Step: 84910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:33:52,040-Speed 10486.59 samples/sec Loss 3.1630 LearningRate 0.0307 Epoch: 16 Global Step: 84920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:33:59,833-Speed 10514.48 samples/sec 
Loss 3.1367 LearningRate 0.0307 Epoch: 16 Global Step: 84930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:34:07,640-Speed 10493.42 samples/sec Loss 3.1599 LearningRate 0.0306 Epoch: 16 Global Step: 84940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:34:15,446-Speed 10496.66 samples/sec Loss 3.1565 LearningRate 0.0306 Epoch: 16 Global Step: 84950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:34:23,257-Speed 10488.89 samples/sec Loss 3.1867 LearningRate 0.0306 Epoch: 16 Global Step: 84960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:34:31,051-Speed 10513.30 samples/sec Loss 3.1637 LearningRate 0.0305 Epoch: 16 Global Step: 84970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:34:38,850-Speed 10505.37 samples/sec Loss 3.1989 LearningRate 0.0305 Epoch: 16 Global Step: 84980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:34:46,647-Speed 10506.97 samples/sec Loss 3.1657 LearningRate 0.0305 Epoch: 16 Global Step: 84990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:34:54,434-Speed 10521.23 samples/sec Loss 3.1589 LearningRate 0.0304 Epoch: 16 Global Step: 85000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:35:02,254-Speed 10482.20 samples/sec Loss 3.1794 LearningRate 0.0304 Epoch: 16 Global Step: 85010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:35:10,079-Speed 10469.79 samples/sec Loss 3.1633 LearningRate 0.0304 Epoch: 16 Global Step: 85020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:35:17,884-Speed 10496.53 samples/sec Loss 3.1485 LearningRate 0.0303 Epoch: 16 Global Step: 85030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:35:25,689-Speed 10497.38 samples/sec Loss 3.1336 LearningRate 0.0303 Epoch: 16 Global Step: 85040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:35:33,484-Speed 10510.53 samples/sec Loss 3.1480 LearningRate 0.0303 Epoch: 16 
Global Step: 85050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:35:41,300-Speed 10482.94 samples/sec Loss 3.1662 LearningRate 0.0302 Epoch: 16 Global Step: 85060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:35:49,097-Speed 10508.23 samples/sec Loss 3.1405 LearningRate 0.0302 Epoch: 16 Global Step: 85070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:35:56,911-Speed 10484.13 samples/sec Loss 3.1416 LearningRate 0.0302 Epoch: 16 Global Step: 85080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:36:04,725-Speed 10488.62 samples/sec Loss 3.1660 LearningRate 0.0301 Epoch: 16 Global Step: 85090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:36:12,533-Speed 10492.33 samples/sec Loss 3.1594 LearningRate 0.0301 Epoch: 16 Global Step: 85100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:36:20,338-Speed 10497.50 samples/sec Loss 3.1601 LearningRate 0.0301 Epoch: 16 Global Step: 85110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:36:28,145-Speed 10494.99 samples/sec Loss 3.1525 LearningRate 0.0300 Epoch: 16 Global Step: 85120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:36:35,974-Speed 10465.48 samples/sec Loss 3.1504 LearningRate 0.0300 Epoch: 16 Global Step: 85130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:36:43,760-Speed 10522.80 samples/sec Loss 3.1360 LearningRate 0.0300 Epoch: 16 Global Step: 85140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:36:51,564-Speed 10500.92 samples/sec Loss 3.1602 LearningRate 0.0299 Epoch: 16 Global Step: 85150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:36:59,369-Speed 10503.23 samples/sec Loss 3.1538 LearningRate 0.0299 Epoch: 16 Global Step: 85160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:37:07,147-Speed 10534.26 samples/sec Loss 3.1328 LearningRate 0.0299 Epoch: 16 Global Step: 85170 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-01-16 09:37:14,938-Speed 10516.14 samples/sec Loss 3.1325 LearningRate 0.0298 Epoch: 16 Global Step: 85180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:37:22,713-Speed 10538.15 samples/sec Loss 3.1604 LearningRate 0.0298 Epoch: 16 Global Step: 85190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:37:30,542-Speed 10464.00 samples/sec Loss 3.1545 LearningRate 0.0298 Epoch: 16 Global Step: 85200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:37:38,383-Speed 10450.74 samples/sec Loss 3.1482 LearningRate 0.0298 Epoch: 16 Global Step: 85210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:37:46,181-Speed 10506.19 samples/sec Loss 3.1538 LearningRate 0.0297 Epoch: 16 Global Step: 85220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:37:53,994-Speed 10485.13 samples/sec Loss 3.1378 LearningRate 0.0297 Epoch: 16 Global Step: 85230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:38:01,813-Speed 10478.59 samples/sec Loss 3.1058 LearningRate 0.0297 Epoch: 16 Global Step: 85240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:38:09,614-Speed 10503.92 samples/sec Loss 3.1249 LearningRate 0.0296 Epoch: 16 Global Step: 85250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:38:17,404-Speed 10516.90 samples/sec Loss 3.1089 LearningRate 0.0296 Epoch: 16 Global Step: 85260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:38:25,195-Speed 10519.16 samples/sec Loss 3.1226 LearningRate 0.0296 Epoch: 16 Global Step: 85270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:38:32,992-Speed 10508.19 samples/sec Loss 3.1069 LearningRate 0.0295 Epoch: 16 Global Step: 85280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:38:40,835-Speed 10447.74 samples/sec Loss 3.1259 LearningRate 0.0295 Epoch: 16 Global Step: 85290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 
09:38:48,644-Speed 10491.78 samples/sec Loss 3.1364 LearningRate 0.0295 Epoch: 16 Global Step: 85300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:38:56,454-Speed 10489.91 samples/sec Loss 3.1472 LearningRate 0.0294 Epoch: 16 Global Step: 85310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:39:04,299-Speed 10443.07 samples/sec Loss 3.1279 LearningRate 0.0294 Epoch: 16 Global Step: 85320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:39:12,127-Speed 10467.47 samples/sec Loss 3.1406 LearningRate 0.0294 Epoch: 16 Global Step: 85330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:39:19,943-Speed 10486.73 samples/sec Loss 3.1509 LearningRate 0.0293 Epoch: 16 Global Step: 85340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:39:27,810-Speed 10413.16 samples/sec Loss 3.1415 LearningRate 0.0293 Epoch: 16 Global Step: 85350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:39:35,602-Speed 10515.58 samples/sec Loss 3.1262 LearningRate 0.0293 Epoch: 16 Global Step: 85360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:39:43,417-Speed 10483.94 samples/sec Loss 3.1162 LearningRate 0.0292 Epoch: 16 Global Step: 85370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:39:51,221-Speed 10498.00 samples/sec Loss 3.1098 LearningRate 0.0292 Epoch: 16 Global Step: 85380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:39:59,034-Speed 10487.64 samples/sec Loss 3.1097 LearningRate 0.0292 Epoch: 16 Global Step: 85390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:40:06,818-Speed 10524.61 samples/sec Loss 3.1449 LearningRate 0.0291 Epoch: 16 Global Step: 85400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:40:14,618-Speed 10506.94 samples/sec Loss 3.1056 LearningRate 0.0291 Epoch: 16 Global Step: 85410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:40:22,453-Speed 10458.00 samples/sec 
Loss 3.0964 LearningRate 0.0291 Epoch: 16 Global Step: 85420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:40:30,267-Speed 10485.25 samples/sec Loss 3.1075 LearningRate 0.0290 Epoch: 16 Global Step: 85430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:40:38,057-Speed 10517.92 samples/sec Loss 3.0993 LearningRate 0.0290 Epoch: 16 Global Step: 85440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:40:45,861-Speed 10498.25 samples/sec Loss 3.1041 LearningRate 0.0290 Epoch: 16 Global Step: 85450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:40:53,674-Speed 10486.76 samples/sec Loss 3.1133 LearningRate 0.0290 Epoch: 16 Global Step: 85460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:41:01,481-Speed 10494.69 samples/sec Loss 3.0962 LearningRate 0.0289 Epoch: 16 Global Step: 85470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:41:09,297-Speed 10481.83 samples/sec Loss 3.0980 LearningRate 0.0289 Epoch: 16 Global Step: 85480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:41:17,107-Speed 10490.70 samples/sec Loss 3.0912 LearningRate 0.0289 Epoch: 16 Global Step: 85490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:41:24,911-Speed 10499.35 samples/sec Loss 3.1082 LearningRate 0.0288 Epoch: 16 Global Step: 85500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:41:32,704-Speed 10513.00 samples/sec Loss 3.1243 LearningRate 0.0288 Epoch: 16 Global Step: 85510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:41:40,491-Speed 10521.11 samples/sec Loss 3.1023 LearningRate 0.0288 Epoch: 16 Global Step: 85520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:41:48,293-Speed 10500.82 samples/sec Loss 3.1271 LearningRate 0.0287 Epoch: 16 Global Step: 85530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:41:56,105-Speed 10488.83 samples/sec Loss 3.0951 LearningRate 0.0287 
Epoch: 16 Global Step: 85540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:42:03,891-Speed 10522.87 samples/sec Loss 3.1131 LearningRate 0.0287 Epoch: 16 Global Step: 85550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:42:11,705-Speed 10484.33 samples/sec Loss 3.0978 LearningRate 0.0286 Epoch: 16 Global Step: 85560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:42:19,473-Speed 10547.82 samples/sec Loss 3.0651 LearningRate 0.0286 Epoch: 16 Global Step: 85570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:42:27,285-Speed 10487.54 samples/sec Loss 3.0937 LearningRate 0.0286 Epoch: 16 Global Step: 85580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:42:35,079-Speed 10512.33 samples/sec Loss 3.0824 LearningRate 0.0285 Epoch: 16 Global Step: 85590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:42:42,870-Speed 10515.52 samples/sec Loss 3.0751 LearningRate 0.0285 Epoch: 16 Global Step: 85600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:42:50,660-Speed 10517.64 samples/sec Loss 3.0830 LearningRate 0.0285 Epoch: 16 Global Step: 85610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:42:58,445-Speed 10523.92 samples/sec Loss 3.0727 LearningRate 0.0284 Epoch: 16 Global Step: 85620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:43:06,235-Speed 10518.29 samples/sec Loss 3.0624 LearningRate 0.0284 Epoch: 16 Global Step: 85630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:43:14,023-Speed 10520.66 samples/sec Loss 3.0688 LearningRate 0.0284 Epoch: 16 Global Step: 85640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:43:21,803-Speed 10529.93 samples/sec Loss 3.0671 LearningRate 0.0284 Epoch: 16 Global Step: 85650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:43:29,616-Speed 10487.39 samples/sec Loss 3.0942 LearningRate 0.0283 Epoch: 16 Global Step: 85660 Fp16 Grad 
Scale: 65536 Required: 4 hours Training: 2022-01-16 09:43:37,392-Speed 10536.03 samples/sec Loss 3.0844 LearningRate 0.0283 Epoch: 16 Global Step: 85670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:43:45,183-Speed 10515.69 samples/sec Loss 3.0434 LearningRate 0.0283 Epoch: 16 Global Step: 85680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:43:52,985-Speed 10501.47 samples/sec Loss 3.0887 LearningRate 0.0282 Epoch: 16 Global Step: 85690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:44:00,770-Speed 10524.78 samples/sec Loss 3.0690 LearningRate 0.0282 Epoch: 16 Global Step: 85700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:44:08,588-Speed 10480.15 samples/sec Loss 3.0870 LearningRate 0.0282 Epoch: 16 Global Step: 85710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:44:16,380-Speed 10514.49 samples/sec Loss 3.0801 LearningRate 0.0281 Epoch: 16 Global Step: 85720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:44:24,161-Speed 10529.93 samples/sec Loss 3.0731 LearningRate 0.0281 Epoch: 16 Global Step: 85730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:44:31,938-Speed 10535.14 samples/sec Loss 3.0978 LearningRate 0.0281 Epoch: 16 Global Step: 85740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:44:39,732-Speed 10512.05 samples/sec Loss 3.0545 LearningRate 0.0280 Epoch: 16 Global Step: 85750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:44:47,513-Speed 10529.54 samples/sec Loss 3.0458 LearningRate 0.0280 Epoch: 16 Global Step: 85760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:44:55,306-Speed 10512.02 samples/sec Loss 3.1056 LearningRate 0.0280 Epoch: 16 Global Step: 85770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:45:03,093-Speed 10522.42 samples/sec Loss 3.0977 LearningRate 0.0279 Epoch: 16 Global Step: 85780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 
2022-01-16 09:45:10,876-Speed 10525.97 samples/sec Loss 3.0620 LearningRate 0.0279 Epoch: 16 Global Step: 85790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:45:18,681-Speed 10497.08 samples/sec Loss 3.0835 LearningRate 0.0279 Epoch: 16 Global Step: 85800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:45:26,484-Speed 10500.84 samples/sec Loss 3.0712 LearningRate 0.0279 Epoch: 16 Global Step: 85810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:45:34,260-Speed 10541.84 samples/sec Loss 3.0554 LearningRate 0.0278 Epoch: 16 Global Step: 85820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:45:42,043-Speed 10532.07 samples/sec Loss 3.0469 LearningRate 0.0278 Epoch: 16 Global Step: 85830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:45:49,824-Speed 10529.53 samples/sec Loss 3.0332 LearningRate 0.0278 Epoch: 16 Global Step: 85840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:45:57,642-Speed 10480.06 samples/sec Loss 3.0488 LearningRate 0.0277 Epoch: 16 Global Step: 85850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:46:05,433-Speed 10515.80 samples/sec Loss 3.0597 LearningRate 0.0277 Epoch: 16 Global Step: 85860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:46:13,226-Speed 10514.22 samples/sec Loss 3.0605 LearningRate 0.0277 Epoch: 16 Global Step: 85870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:46:21,009-Speed 10526.61 samples/sec Loss 3.0626 LearningRate 0.0276 Epoch: 16 Global Step: 85880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:46:28,800-Speed 10515.21 samples/sec Loss 3.0667 LearningRate 0.0276 Epoch: 16 Global Step: 85890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:46:36,581-Speed 10530.17 samples/sec Loss 3.0426 LearningRate 0.0276 Epoch: 16 Global Step: 85900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:46:44,374-Speed 10513.01 
samples/sec Loss 3.0514 LearningRate 0.0275 Epoch: 16 Global Step: 85910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:46:52,152-Speed 10534.01 samples/sec Loss 3.0509 LearningRate 0.0275 Epoch: 16 Global Step: 85920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:46:59,935-Speed 10526.61 samples/sec Loss 3.0789 LearningRate 0.0275 Epoch: 16 Global Step: 85930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:47:07,719-Speed 10525.46 samples/sec Loss 3.0649 LearningRate 0.0274 Epoch: 16 Global Step: 85940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:47:15,532-Speed 10485.82 samples/sec Loss 3.0613 LearningRate 0.0274 Epoch: 16 Global Step: 85950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:47:23,340-Speed 10493.85 samples/sec Loss 3.0211 LearningRate 0.0274 Epoch: 16 Global Step: 85960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:47:31,147-Speed 10493.92 samples/sec Loss 3.0247 LearningRate 0.0274 Epoch: 16 Global Step: 85970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:47:38,941-Speed 10512.06 samples/sec Loss 3.0454 LearningRate 0.0273 Epoch: 16 Global Step: 85980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:47:46,747-Speed 10496.43 samples/sec Loss 3.0226 LearningRate 0.0273 Epoch: 16 Global Step: 85990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:47:54,536-Speed 10518.87 samples/sec Loss 3.0236 LearningRate 0.0273 Epoch: 16 Global Step: 86000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:48:02,353-Speed 10481.94 samples/sec Loss 3.0392 LearningRate 0.0272 Epoch: 16 Global Step: 86010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:48:10,168-Speed 10484.05 samples/sec Loss 3.0079 LearningRate 0.0272 Epoch: 16 Global Step: 86020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:48:17,963-Speed 10510.63 samples/sec Loss 3.0577 LearningRate 
0.0272 Epoch: 16 Global Step: 86030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:48:25,744-Speed 10529.77 samples/sec Loss 3.0192 LearningRate 0.0271 Epoch: 16 Global Step: 86040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:48:33,566-Speed 10473.53 samples/sec Loss 3.0256 LearningRate 0.0271 Epoch: 16 Global Step: 86050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:48:41,387-Speed 10477.51 samples/sec Loss 3.0345 LearningRate 0.0271 Epoch: 16 Global Step: 86060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:48:49,221-Speed 10458.23 samples/sec Loss 3.0358 LearningRate 0.0270 Epoch: 16 Global Step: 86070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:48:57,047-Speed 10469.32 samples/sec Loss 3.0340 LearningRate 0.0270 Epoch: 16 Global Step: 86080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:49:04,868-Speed 10475.20 samples/sec Loss 3.0208 LearningRate 0.0270 Epoch: 16 Global Step: 86090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:49:12,638-Speed 10543.40 samples/sec Loss 2.9830 LearningRate 0.0270 Epoch: 16 Global Step: 86100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:49:20,421-Speed 10528.27 samples/sec Loss 3.0356 LearningRate 0.0269 Epoch: 16 Global Step: 86110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:49:28,208-Speed 10521.08 samples/sec Loss 3.0143 LearningRate 0.0269 Epoch: 16 Global Step: 86120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:49:36,004-Speed 10510.24 samples/sec Loss 3.0030 LearningRate 0.0269 Epoch: 16 Global Step: 86130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:49:43,763-Speed 10559.14 samples/sec Loss 2.9737 LearningRate 0.0268 Epoch: 16 Global Step: 86140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:49:51,544-Speed 10531.50 samples/sec Loss 3.0049 LearningRate 0.0268 Epoch: 16 Global Step: 86150 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:49:59,328-Speed 10524.81 samples/sec Loss 2.9847 LearningRate 0.0268 Epoch: 16 Global Step: 86160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:50:07,117-Speed 10519.45 samples/sec Loss 3.0256 LearningRate 0.0267 Epoch: 16 Global Step: 86170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:50:14,912-Speed 10510.55 samples/sec Loss 2.9954 LearningRate 0.0267 Epoch: 16 Global Step: 86180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:50:22,696-Speed 10524.48 samples/sec Loss 3.0110 LearningRate 0.0267 Epoch: 16 Global Step: 86190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:50:30,477-Speed 10530.14 samples/sec Loss 2.9891 LearningRate 0.0266 Epoch: 16 Global Step: 86200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:50:38,305-Speed 10467.11 samples/sec Loss 3.0046 LearningRate 0.0266 Epoch: 16 Global Step: 86210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:50:46,109-Speed 10498.05 samples/sec Loss 3.0305 LearningRate 0.0266 Epoch: 16 Global Step: 86220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:50:53,913-Speed 10498.99 samples/sec Loss 3.0017 LearningRate 0.0266 Epoch: 16 Global Step: 86230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:51:01,707-Speed 10511.53 samples/sec Loss 2.9915 LearningRate 0.0265 Epoch: 16 Global Step: 86240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:51:09,475-Speed 10547.69 samples/sec Loss 2.9802 LearningRate 0.0265 Epoch: 16 Global Step: 86250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:51:17,260-Speed 10523.74 samples/sec Loss 2.9774 LearningRate 0.0265 Epoch: 16 Global Step: 86260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:51:25,034-Speed 10537.96 samples/sec Loss 3.0012 LearningRate 0.0264 Epoch: 16 Global Step: 86270 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-01-16 09:51:32,827-Speed 10513.72 samples/sec Loss 2.9864 LearningRate 0.0264 Epoch: 16 Global Step: 86280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:51:40,630-Speed 10505.28 samples/sec Loss 3.0044 LearningRate 0.0264 Epoch: 16 Global Step: 86290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:51:48,428-Speed 10507.89 samples/sec Loss 2.9720 LearningRate 0.0263 Epoch: 16 Global Step: 86300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:51:56,239-Speed 10487.83 samples/sec Loss 2.9978 LearningRate 0.0263 Epoch: 16 Global Step: 86310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:52:04,084-Speed 10443.81 samples/sec Loss 2.9621 LearningRate 0.0263 Epoch: 16 Global Step: 86320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:52:11,895-Speed 10490.60 samples/sec Loss 2.9658 LearningRate 0.0263 Epoch: 16 Global Step: 86330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:52:19,738-Speed 10446.19 samples/sec Loss 2.9875 LearningRate 0.0262 Epoch: 16 Global Step: 86340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:52:27,543-Speed 10496.17 samples/sec Loss 2.9860 LearningRate 0.0262 Epoch: 16 Global Step: 86350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:52:35,330-Speed 10521.46 samples/sec Loss 2.9840 LearningRate 0.0262 Epoch: 16 Global Step: 86360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:52:43,099-Speed 10546.85 samples/sec Loss 2.9982 LearningRate 0.0261 Epoch: 16 Global Step: 86370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:52:50,889-Speed 10517.50 samples/sec Loss 2.9831 LearningRate 0.0261 Epoch: 16 Global Step: 86380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:52:58,691-Speed 10501.17 samples/sec Loss 3.0005 LearningRate 0.0261 Epoch: 16 Global Step: 86390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:53:06,472-Speed 
10532.49 samples/sec Loss 2.9625 LearningRate 0.0260 Epoch: 16 Global Step: 86400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:53:14,255-Speed 10527.91 samples/sec Loss 3.0002 LearningRate 0.0260 Epoch: 16 Global Step: 86410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:53:22,040-Speed 10523.60 samples/sec Loss 3.0037 LearningRate 0.0260 Epoch: 16 Global Step: 86420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:53:29,889-Speed 10437.61 samples/sec Loss 2.9454 LearningRate 0.0260 Epoch: 16 Global Step: 86430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:53:37,695-Speed 10499.17 samples/sec Loss 2.9668 LearningRate 0.0259 Epoch: 16 Global Step: 86440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:53:45,510-Speed 10483.42 samples/sec Loss 2.9694 LearningRate 0.0259 Epoch: 16 Global Step: 86450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:53:53,354-Speed 10445.44 samples/sec Loss 2.9675 LearningRate 0.0259 Epoch: 16 Global Step: 86460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:54:01,179-Speed 10471.10 samples/sec Loss 2.9654 LearningRate 0.0258 Epoch: 16 Global Step: 86470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:54:08,983-Speed 10498.23 samples/sec Loss 2.9505 LearningRate 0.0258 Epoch: 16 Global Step: 86480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:54:16,789-Speed 10495.92 samples/sec Loss 2.9702 LearningRate 0.0258 Epoch: 16 Global Step: 86490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:54:24,619-Speed 10463.36 samples/sec Loss 2.9564 LearningRate 0.0257 Epoch: 16 Global Step: 86500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:54:32,408-Speed 10519.32 samples/sec Loss 2.9561 LearningRate 0.0257 Epoch: 16 Global Step: 86510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:54:40,215-Speed 10495.04 samples/sec Loss 2.9509 
LearningRate 0.0257 Epoch: 16 Global Step: 86520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 09:54:48,007-Speed 10514.12 samples/sec Loss 3.0091 LearningRate 0.0257 Epoch: 16 Global Step: 86530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:54:55,853-Speed 10443.18 samples/sec Loss 2.9661 LearningRate 0.0256 Epoch: 16 Global Step: 86540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:55:03,661-Speed 10492.85 samples/sec Loss 2.9621 LearningRate 0.0256 Epoch: 16 Global Step: 86550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:55:11,469-Speed 10493.97 samples/sec Loss 2.9604 LearningRate 0.0256 Epoch: 16 Global Step: 86560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:55:19,309-Speed 10449.73 samples/sec Loss 2.9802 LearningRate 0.0255 Epoch: 16 Global Step: 86570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:55:27,103-Speed 10512.42 samples/sec Loss 2.9497 LearningRate 0.0255 Epoch: 16 Global Step: 86580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:55:34,889-Speed 10523.38 samples/sec Loss 2.9264 LearningRate 0.0255 Epoch: 16 Global Step: 86590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:55:42,669-Speed 10531.36 samples/sec Loss 2.9226 LearningRate 0.0254 Epoch: 16 Global Step: 86600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:55:50,445-Speed 10536.05 samples/sec Loss 2.9464 LearningRate 0.0254 Epoch: 16 Global Step: 86610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:55:58,220-Speed 10537.75 samples/sec Loss 2.9431 LearningRate 0.0254 Epoch: 16 Global Step: 86620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:56:06,006-Speed 10522.49 samples/sec Loss 2.9430 LearningRate 0.0254 Epoch: 16 Global Step: 86630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:56:13,818-Speed 10488.34 samples/sec Loss 2.9013 LearningRate 0.0253 Epoch: 16 Global Step: 
86640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:56:21,604-Speed 10523.20 samples/sec Loss 2.9530 LearningRate 0.0253 Epoch: 16 Global Step: 86650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:56:29,408-Speed 10498.59 samples/sec Loss 2.9189 LearningRate 0.0253 Epoch: 16 Global Step: 86660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:56:37,212-Speed 10498.77 samples/sec Loss 2.9857 LearningRate 0.0252 Epoch: 16 Global Step: 86670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:56:45,005-Speed 10513.14 samples/sec Loss 2.9511 LearningRate 0.0252 Epoch: 16 Global Step: 86680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:56:52,789-Speed 10525.33 samples/sec Loss 2.9320 LearningRate 0.0252 Epoch: 16 Global Step: 86690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:57:00,610-Speed 10475.78 samples/sec Loss 2.9258 LearningRate 0.0251 Epoch: 16 Global Step: 86700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:57:08,424-Speed 10485.68 samples/sec Loss 2.9511 LearningRate 0.0251 Epoch: 16 Global Step: 86710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:57:16,230-Speed 10495.63 samples/sec Loss 2.9312 LearningRate 0.0251 Epoch: 16 Global Step: 86720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:57:24,020-Speed 10516.91 samples/sec Loss 2.9158 LearningRate 0.0251 Epoch: 16 Global Step: 86730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:57:31,846-Speed 10469.62 samples/sec Loss 2.9253 LearningRate 0.0250 Epoch: 16 Global Step: 86740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:57:39,661-Speed 10483.21 samples/sec Loss 2.8981 LearningRate 0.0250 Epoch: 16 Global Step: 86750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:57:47,459-Speed 10506.61 samples/sec Loss 2.9211 LearningRate 0.0250 Epoch: 16 Global Step: 86760 Fp16 Grad Scale: 32768 Required: 4 
hours Training: 2022-01-16 09:57:55,265-Speed 10496.23 samples/sec Loss 2.9290 LearningRate 0.0249 Epoch: 16 Global Step: 86770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:58:03,071-Speed 10495.94 samples/sec Loss 2.9185 LearningRate 0.0249 Epoch: 16 Global Step: 86780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:58:10,864-Speed 10513.97 samples/sec Loss 2.9131 LearningRate 0.0249 Epoch: 16 Global Step: 86790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:58:18,682-Speed 10479.36 samples/sec Loss 2.9194 LearningRate 0.0248 Epoch: 16 Global Step: 86800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:58:26,484-Speed 10502.47 samples/sec Loss 2.9161 LearningRate 0.0248 Epoch: 16 Global Step: 86810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:58:34,304-Speed 10477.23 samples/sec Loss 2.9265 LearningRate 0.0248 Epoch: 16 Global Step: 86820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 09:58:42,145-Speed 10447.98 samples/sec Loss 2.8967 LearningRate 0.0248 Epoch: 16 Global Step: 86830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:58:49,962-Speed 10481.61 samples/sec Loss 2.9234 LearningRate 0.0247 Epoch: 16 Global Step: 86840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:58:57,767-Speed 10497.20 samples/sec Loss 2.8972 LearningRate 0.0247 Epoch: 16 Global Step: 86850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:59:05,570-Speed 10500.75 samples/sec Loss 2.8989 LearningRate 0.0247 Epoch: 16 Global Step: 86860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:59:13,366-Speed 10508.27 samples/sec Loss 2.9003 LearningRate 0.0246 Epoch: 16 Global Step: 86870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:59:21,174-Speed 10494.45 samples/sec Loss 2.9178 LearningRate 0.0246 Epoch: 16 Global Step: 86880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 
09:59:28,989-Speed 10483.24 samples/sec Loss 2.9234 LearningRate 0.0246 Epoch: 16 Global Step: 86890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:59:36,763-Speed 10539.68 samples/sec Loss 2.9098 LearningRate 0.0246 Epoch: 16 Global Step: 86900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:59:44,582-Speed 10479.06 samples/sec Loss 2.9010 LearningRate 0.0245 Epoch: 16 Global Step: 86910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 09:59:52,376-Speed 10510.76 samples/sec Loss 2.9159 LearningRate 0.0245 Epoch: 16 Global Step: 86920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:00:00,163-Speed 10521.76 samples/sec Loss 2.8777 LearningRate 0.0245 Epoch: 16 Global Step: 86930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 10:00:07,934-Speed 10543.71 samples/sec Loss 2.8936 LearningRate 0.0244 Epoch: 16 Global Step: 86940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:00:15,722-Speed 10520.10 samples/sec Loss 2.9160 LearningRate 0.0244 Epoch: 16 Global Step: 86950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:00:23,511-Speed 10518.16 samples/sec Loss 2.9066 LearningRate 0.0244 Epoch: 16 Global Step: 86960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:00:31,304-Speed 10513.55 samples/sec Loss 2.8962 LearningRate 0.0244 Epoch: 16 Global Step: 86970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:00:39,120-Speed 10483.62 samples/sec Loss 2.8987 LearningRate 0.0243 Epoch: 16 Global Step: 86980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:00:46,925-Speed 10496.93 samples/sec Loss 2.8961 LearningRate 0.0243 Epoch: 16 Global Step: 86990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:00:54,755-Speed 10464.25 samples/sec Loss 2.9015 LearningRate 0.0243 Epoch: 16 Global Step: 87000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:01:02,552-Speed 10508.55 samples/sec 
Loss 2.9002 LearningRate 0.0242 Epoch: 16 Global Step: 87010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:01:10,352-Speed 10503.85 samples/sec Loss 2.9128 LearningRate 0.0242 Epoch: 16 Global Step: 87020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:01:18,167-Speed 10484.49 samples/sec Loss 2.8904 LearningRate 0.0242 Epoch: 16 Global Step: 87030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:01:25,981-Speed 10485.17 samples/sec Loss 2.9218 LearningRate 0.0241 Epoch: 16 Global Step: 87040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 10:01:33,765-Speed 10526.82 samples/sec Loss 2.8869 LearningRate 0.0241 Epoch: 16 Global Step: 87050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 10:01:41,573-Speed 10492.40 samples/sec Loss 2.8606 LearningRate 0.0241 Epoch: 16 Global Step: 87060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:01:49,422-Speed 10440.07 samples/sec Loss 2.9029 LearningRate 0.0241 Epoch: 16 Global Step: 87070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:01:57,247-Speed 10469.35 samples/sec Loss 2.8804 LearningRate 0.0240 Epoch: 16 Global Step: 87080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:02:05,055-Speed 10494.11 samples/sec Loss 2.8868 LearningRate 0.0240 Epoch: 16 Global Step: 87090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:02:12,845-Speed 10519.77 samples/sec Loss 2.8905 LearningRate 0.0240 Epoch: 16 Global Step: 87100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:02:20,632-Speed 10521.34 samples/sec Loss 2.8783 LearningRate 0.0239 Epoch: 16 Global Step: 87110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:02:28,417-Speed 10524.66 samples/sec Loss 2.8980 LearningRate 0.0239 Epoch: 16 Global Step: 87120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:02:36,244-Speed 10468.14 samples/sec Loss 2.8862 LearningRate 0.0239 Epoch: 16 
Global Step: 87130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:02:44,040-Speed 10508.87 samples/sec Loss 2.8554 LearningRate 0.0239 Epoch: 16 Global Step: 87140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:02:51,882-Speed 10446.74 samples/sec Loss 2.8552 LearningRate 0.0238 Epoch: 16 Global Step: 87150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:02:59,709-Speed 10468.38 samples/sec Loss 2.8735 LearningRate 0.0238 Epoch: 16 Global Step: 87160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 10:03:07,529-Speed 10477.46 samples/sec Loss 2.8673 LearningRate 0.0238 Epoch: 16 Global Step: 87170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:03:15,361-Speed 10460.11 samples/sec Loss 2.8933 LearningRate 0.0237 Epoch: 16 Global Step: 87180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:03:23,185-Speed 10472.69 samples/sec Loss 2.8559 LearningRate 0.0237 Epoch: 16 Global Step: 87190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:03:30,990-Speed 10498.18 samples/sec Loss 2.8761 LearningRate 0.0237 Epoch: 16 Global Step: 87200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:03:38,787-Speed 10507.58 samples/sec Loss 2.8718 LearningRate 0.0237 Epoch: 16 Global Step: 87210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:03:46,597-Speed 10489.67 samples/sec Loss 2.8664 LearningRate 0.0236 Epoch: 16 Global Step: 87220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:03:54,422-Speed 10474.61 samples/sec Loss 2.8477 LearningRate 0.0236 Epoch: 16 Global Step: 87230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:04:02,235-Speed 10488.27 samples/sec Loss 2.8699 LearningRate 0.0236 Epoch: 16 Global Step: 87240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 10:04:10,038-Speed 10499.05 samples/sec Loss 2.8400 LearningRate 0.0235 Epoch: 16 Global Step: 87250 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-01-16 10:04:17,884-Speed 10442.10 samples/sec Loss 2.8915 LearningRate 0.0235 Epoch: 16 Global Step: 87260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 10:04:25,720-Speed 10455.77 samples/sec Loss 2.8701 LearningRate 0.0235 Epoch: 16 Global Step: 87270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 10:04:33,502-Speed 10530.11 samples/sec Loss 2.8728 LearningRate 0.0235 Epoch: 16 Global Step: 87280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 10:04:41,296-Speed 10511.94 samples/sec Loss 2.8755 LearningRate 0.0234 Epoch: 16 Global Step: 87290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 10:04:49,095-Speed 10504.85 samples/sec Loss 2.8720 LearningRate 0.0234 Epoch: 16 Global Step: 87300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 10:04:56,897-Speed 10502.26 samples/sec Loss 2.8383 LearningRate 0.0234 Epoch: 16 Global Step: 87310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 10:05:04,684-Speed 10521.31 samples/sec Loss 2.8133 LearningRate 0.0233 Epoch: 16 Global Step: 87320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 10:05:12,501-Speed 10481.56 samples/sec Loss 2.8652 LearningRate 0.0233 Epoch: 16 Global Step: 87330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-01-16 10:05:20,309-Speed 10492.68 samples/sec Loss 2.8215 LearningRate 0.0233 Epoch: 16 Global Step: 87340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:05:28,210-Speed 10369.96 samples/sec Loss 2.8465 LearningRate 0.0233 Epoch: 16 Global Step: 87350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:05:36,007-Speed 10508.43 samples/sec Loss 2.8578 LearningRate 0.0232 Epoch: 16 Global Step: 87360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:05:43,818-Speed 10491.18 samples/sec Loss 2.8438 LearningRate 0.0232 Epoch: 16 Global Step: 87370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 
10:05:51,624-Speed 10494.99 samples/sec Loss 2.8375 LearningRate 0.0232 Epoch: 16 Global Step: 87380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:05:59,418-Speed 10512.29 samples/sec Loss 2.8453 LearningRate 0.0231 Epoch: 16 Global Step: 87390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:06:07,227-Speed 10492.58 samples/sec Loss 2.8410 LearningRate 0.0231 Epoch: 16 Global Step: 87400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:06:15,045-Speed 10478.42 samples/sec Loss 2.8401 LearningRate 0.0231 Epoch: 16 Global Step: 87410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:06:22,894-Speed 10439.78 samples/sec Loss 2.8240 LearningRate 0.0231 Epoch: 16 Global Step: 87420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:06:30,728-Speed 10457.17 samples/sec Loss 2.8341 LearningRate 0.0230 Epoch: 16 Global Step: 87430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:06:38,530-Speed 10501.61 samples/sec Loss 2.8560 LearningRate 0.0230 Epoch: 16 Global Step: 87440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 10:06:46,336-Speed 10496.57 samples/sec Loss 2.8359 LearningRate 0.0230 Epoch: 16 Global Step: 87450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:06:54,131-Speed 10511.80 samples/sec Loss 2.8578 LearningRate 0.0229 Epoch: 16 Global Step: 87460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:07:01,940-Speed 10492.11 samples/sec Loss 2.8581 LearningRate 0.0229 Epoch: 16 Global Step: 87470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:07:09,734-Speed 10510.85 samples/sec Loss 2.8207 LearningRate 0.0229 Epoch: 16 Global Step: 87480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:07:17,533-Speed 10504.90 samples/sec Loss 2.8292 LearningRate 0.0229 Epoch: 16 Global Step: 87490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:07:25,350-Speed 10481.59 samples/sec 
Loss 2.8379 LearningRate 0.0228 Epoch: 16 Global Step: 87500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:07:33,134-Speed 10526.42 samples/sec Loss 2.8009 LearningRate 0.0228 Epoch: 16 Global Step: 87510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:07:40,964-Speed 10463.57 samples/sec Loss 2.8227 LearningRate 0.0228 Epoch: 16 Global Step: 87520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:07:48,803-Speed 10452.13 samples/sec Loss 2.8192 LearningRate 0.0227 Epoch: 16 Global Step: 87530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:07:56,613-Speed 10490.54 samples/sec Loss 2.8225 LearningRate 0.0227 Epoch: 16 Global Step: 87540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:08:04,458-Speed 10443.83 samples/sec Loss 2.8258 LearningRate 0.0227 Epoch: 16 Global Step: 87550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 10:08:12,283-Speed 10470.70 samples/sec Loss 2.8135 LearningRate 0.0227 Epoch: 16 Global Step: 87560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 10:08:20,084-Speed 10502.57 samples/sec Loss 2.7968 LearningRate 0.0226 Epoch: 16 Global Step: 87570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 10:08:27,903-Speed 10479.27 samples/sec Loss 2.8390 LearningRate 0.0226 Epoch: 16 Global Step: 87580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:08:35,709-Speed 10495.70 samples/sec Loss 2.7973 LearningRate 0.0226 Epoch: 16 Global Step: 87590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:08:43,519-Speed 10491.10 samples/sec Loss 2.7957 LearningRate 0.0226 Epoch: 16 Global Step: 87600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:08:51,336-Speed 10480.56 samples/sec Loss 2.8151 LearningRate 0.0225 Epoch: 16 Global Step: 87610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:08:59,153-Speed 10481.34 samples/sec Loss 2.8152 LearningRate 0.0225 Epoch: 
16 Global Step: 87620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:09:06,957-Speed 10498.66 samples/sec Loss 2.8006 LearningRate 0.0225 Epoch: 16 Global Step: 87630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:09:14,748-Speed 10515.31 samples/sec Loss 2.8180 LearningRate 0.0224 Epoch: 16 Global Step: 87640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:09:22,546-Speed 10506.91 samples/sec Loss 2.8054 LearningRate 0.0224 Epoch: 16 Global Step: 87650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:09:30,334-Speed 10521.19 samples/sec Loss 2.8006 LearningRate 0.0224 Epoch: 16 Global Step: 87660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:09:38,133-Speed 10506.09 samples/sec Loss 2.8132 LearningRate 0.0224 Epoch: 16 Global Step: 87670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:09:45,942-Speed 10491.83 samples/sec Loss 2.8074 LearningRate 0.0223 Epoch: 16 Global Step: 87680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-01-16 10:09:53,749-Speed 10494.08 samples/sec Loss 2.7963 LearningRate 0.0223 Epoch: 16 Global Step: 87690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:10:01,564-Speed 10484.97 samples/sec Loss 2.7932 LearningRate 0.0223 Epoch: 16 Global Step: 87700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:10:09,369-Speed 10497.14 samples/sec Loss 2.8178 LearningRate 0.0222 Epoch: 16 Global Step: 87710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:10:17,153-Speed 10524.91 samples/sec Loss 2.7855 LearningRate 0.0222 Epoch: 16 Global Step: 87720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:10:24,962-Speed 10492.43 samples/sec Loss 2.7935 LearningRate 0.0222 Epoch: 16 Global Step: 87730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:10:32,752-Speed 10517.28 samples/sec Loss 2.7695 LearningRate 0.0222 Epoch: 16 Global Step: 87740 Fp16 Grad Scale: 
65536 Required: 4 hours Training: 2022-01-16 10:10:40,561-Speed 10492.66 samples/sec Loss 2.7894 LearningRate 0.0221 Epoch: 16 Global Step: 87750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:10:48,387-Speed 10468.40 samples/sec Loss 2.8214 LearningRate 0.0221 Epoch: 16 Global Step: 87760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-01-16 10:10:56,182-Speed 10516.09 samples/sec Loss 2.8041 LearningRate 0.0221 Epoch: 16 Global Step: 87770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:11:03,987-Speed 10498.18 samples/sec Loss 2.7909 LearningRate 0.0220 Epoch: 16 Global Step: 87780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:11:11,767-Speed 10531.44 samples/sec Loss 2.7754 LearningRate 0.0220 Epoch: 16 Global Step: 87790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:11:19,566-Speed 10503.93 samples/sec Loss 2.7827 LearningRate 0.0220 Epoch: 16 Global Step: 87800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:11:27,369-Speed 10499.89 samples/sec Loss 2.7944 LearningRate 0.0220 Epoch: 16 Global Step: 87810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:11:35,199-Speed 10464.77 samples/sec Loss 2.7906 LearningRate 0.0219 Epoch: 16 Global Step: 87820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:11:42,992-Speed 10513.93 samples/sec Loss 2.7931 LearningRate 0.0219 Epoch: 16 Global Step: 87830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:11:50,796-Speed 10498.25 samples/sec Loss 2.7938 LearningRate 0.0219 Epoch: 16 Global Step: 87840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:11:58,572-Speed 10536.55 samples/sec Loss 2.8035 LearningRate 0.0219 Epoch: 16 Global Step: 87850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:12:06,377-Speed 10501.35 samples/sec Loss 2.7905 LearningRate 0.0218 Epoch: 16 Global Step: 87860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-01-16 10:12:14,178-Speed 10502.46 samples/sec Loss 2.7837 LearningRate 0.0218 Epoch: 16 Global Step: 87870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:12:21,998-Speed 10477.50 samples/sec Loss 2.7903 LearningRate 0.0218 Epoch: 16 Global Step: 87880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:12:29,793-Speed 10510.78 samples/sec Loss 2.7663 LearningRate 0.0217 Epoch: 16 Global Step: 87890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:12:37,590-Speed 10507.40 samples/sec Loss 2.7909 LearningRate 0.0217 Epoch: 16 Global Step: 87900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:12:45,389-Speed 10506.02 samples/sec Loss 2.8026 LearningRate 0.0217 Epoch: 16 Global Step: 87910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:12:53,206-Speed 10486.19 samples/sec Loss 2.7649 LearningRate 0.0217 Epoch: 16 Global Step: 87920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:13:00,988-Speed 10527.33 samples/sec Loss 2.7726 LearningRate 0.0216 Epoch: 16 Global Step: 87930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:13:08,809-Speed 10477.06 samples/sec Loss 2.7778 LearningRate 0.0216 Epoch: 16 Global Step: 87940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:13:16,639-Speed 10464.09 samples/sec Loss 2.7658 LearningRate 0.0216 Epoch: 16 Global Step: 87950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:13:24,432-Speed 10513.15 samples/sec Loss 2.7621 LearningRate 0.0216 Epoch: 16 Global Step: 87960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:13:32,214-Speed 10527.57 samples/sec Loss 2.7608 LearningRate 0.0215 Epoch: 16 Global Step: 87970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:13:39,990-Speed 10537.29 samples/sec Loss 2.7835 LearningRate 0.0215 Epoch: 16 Global Step: 87980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 10:13:47,779-Speed 10519.17 
samples/sec Loss 2.7762 LearningRate 0.0215 Epoch: 16 Global Step: 87990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 10:13:55,592-Speed 10486.70 samples/sec Loss 2.7422 LearningRate 0.0214 Epoch: 16 Global Step: 88000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 10:14:03,395-Speed 10499.28 samples/sec Loss 2.7756 LearningRate 0.0214 Epoch: 16 Global Step: 88010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:14:11,170-Speed 10538.17 samples/sec Loss 2.7700 LearningRate 0.0214 Epoch: 16 Global Step: 88020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:14:18,971-Speed 10502.68 samples/sec Loss 2.7593 LearningRate 0.0214 Epoch: 16 Global Step: 88030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:14:26,787-Speed 10482.51 samples/sec Loss 2.7541 LearningRate 0.0213 Epoch: 16 Global Step: 88040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:14:34,589-Speed 10501.49 samples/sec Loss 2.7587 LearningRate 0.0213 Epoch: 16 Global Step: 88050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:14:42,422-Speed 10459.47 samples/sec Loss 2.7839 LearningRate 0.0213 Epoch: 16 Global Step: 88060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:14:50,226-Speed 10498.36 samples/sec Loss 2.7471 LearningRate 0.0213 Epoch: 16 Global Step: 88070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:14:58,020-Speed 10512.11 samples/sec Loss 2.7617 LearningRate 0.0212 Epoch: 16 Global Step: 88080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:15:05,805-Speed 10524.40 samples/sec Loss 2.7181 LearningRate 0.0212 Epoch: 16 Global Step: 88090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:15:13,606-Speed 10501.36 samples/sec Loss 2.7524 LearningRate 0.0212 Epoch: 16 Global Step: 88100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:15:21,413-Speed 10494.91 samples/sec Loss 2.7562 LearningRate 
0.0211 Epoch: 16 Global Step: 88110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 10:15:29,270-Speed 10428.55 samples/sec Loss 2.7553 LearningRate 0.0211 Epoch: 16 Global Step: 88120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 10:15:37,102-Speed 10461.06 samples/sec Loss 2.7560 LearningRate 0.0211 Epoch: 16 Global Step: 88130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:15:44,887-Speed 10523.67 samples/sec Loss 2.7777 LearningRate 0.0211 Epoch: 16 Global Step: 88140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:16:07,405-Speed 3638.07 samples/sec Loss 2.8004 LearningRate 0.0210 Epoch: 17 Global Step: 88150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:16:15,171-Speed 10550.69 samples/sec Loss 2.7609 LearningRate 0.0210 Epoch: 17 Global Step: 88160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:16:22,944-Speed 10540.73 samples/sec Loss 2.7338 LearningRate 0.0210 Epoch: 17 Global Step: 88170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:16:30,736-Speed 10515.97 samples/sec Loss 2.7665 LearningRate 0.0210 Epoch: 17 Global Step: 88180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:16:38,521-Speed 10522.95 samples/sec Loss 2.7015 LearningRate 0.0209 Epoch: 17 Global Step: 88190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:16:46,331-Speed 10491.80 samples/sec Loss 2.7395 LearningRate 0.0209 Epoch: 17 Global Step: 88200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:16:54,115-Speed 10525.17 samples/sec Loss 2.7075 LearningRate 0.0209 Epoch: 17 Global Step: 88210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:17:01,946-Speed 10462.19 samples/sec Loss 2.7362 LearningRate 0.0208 Epoch: 17 Global Step: 88220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:17:09,730-Speed 10526.30 samples/sec Loss 2.7108 LearningRate 0.0208 Epoch: 17 Global Step: 88230 Fp16 
Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 10:17:17,561-Speed 10462.32 samples/sec Loss 2.7311 LearningRate 0.0208 Epoch: 17 Global Step: 88240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 10:17:25,360-Speed 10505.22 samples/sec Loss 2.7251 LearningRate 0.0208 Epoch: 17 Global Step: 88250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 10:17:33,183-Speed 10473.16 samples/sec Loss 2.7281 LearningRate 0.0207 Epoch: 17 Global Step: 88260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:17:40,988-Speed 10498.19 samples/sec Loss 2.7197 LearningRate 0.0207 Epoch: 17 Global Step: 88270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:17:48,830-Speed 10447.82 samples/sec Loss 2.7130 LearningRate 0.0207 Epoch: 17 Global Step: 88280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:17:56,616-Speed 10522.66 samples/sec Loss 2.7086 LearningRate 0.0207 Epoch: 17 Global Step: 88290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:18:04,409-Speed 10513.60 samples/sec Loss 2.7242 LearningRate 0.0206 Epoch: 17 Global Step: 88300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:18:12,221-Speed 10488.64 samples/sec Loss 2.7311 LearningRate 0.0206 Epoch: 17 Global Step: 88310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:18:19,996-Speed 10537.64 samples/sec Loss 2.7004 LearningRate 0.0206 Epoch: 17 Global Step: 88320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:18:27,779-Speed 10526.32 samples/sec Loss 2.7070 LearningRate 0.0205 Epoch: 17 Global Step: 88330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:18:35,577-Speed 10506.75 samples/sec Loss 2.7118 LearningRate 0.0205 Epoch: 17 Global Step: 88340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:18:43,387-Speed 10491.19 samples/sec Loss 2.7128 LearningRate 0.0205 Epoch: 17 Global Step: 88350 Fp16 Grad Scale: 32768 Required: 3 hours 
Training: 2022-01-16 10:18:51,184-Speed 10507.88 samples/sec Loss 2.7002 LearningRate 0.0205 Epoch: 17 Global Step: 88360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:18:59,025-Speed 10448.74 samples/sec Loss 2.7060 LearningRate 0.0204 Epoch: 17 Global Step: 88370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:19:06,846-Speed 10474.84 samples/sec Loss 2.7021 LearningRate 0.0204 Epoch: 17 Global Step: 88380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:19:14,633-Speed 10522.74 samples/sec Loss 2.7095 LearningRate 0.0204 Epoch: 17 Global Step: 88390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:19:22,408-Speed 10539.43 samples/sec Loss 2.7101 LearningRate 0.0204 Epoch: 17 Global Step: 88400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:19:30,225-Speed 10482.03 samples/sec Loss 2.6870 LearningRate 0.0203 Epoch: 17 Global Step: 88410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:19:38,003-Speed 10533.06 samples/sec Loss 2.7248 LearningRate 0.0203 Epoch: 17 Global Step: 88420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:19:45,783-Speed 10531.39 samples/sec Loss 2.7151 LearningRate 0.0203 Epoch: 17 Global Step: 88430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:19:53,582-Speed 10504.85 samples/sec Loss 2.6873 LearningRate 0.0203 Epoch: 17 Global Step: 88440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:20:01,385-Speed 10499.65 samples/sec Loss 2.7308 LearningRate 0.0202 Epoch: 17 Global Step: 88450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:20:09,223-Speed 10452.67 samples/sec Loss 2.6989 LearningRate 0.0202 Epoch: 17 Global Step: 88460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:20:17,050-Speed 10467.82 samples/sec Loss 2.7239 LearningRate 0.0202 Epoch: 17 Global Step: 88470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:20:24,873-Speed 
10472.73 samples/sec Loss 2.7095 LearningRate 0.0201 Epoch: 17 Global Step: 88480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:20:32,724-Speed 10435.84 samples/sec Loss 2.7003 LearningRate 0.0201 Epoch: 17 Global Step: 88490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:20:40,550-Speed 10469.53 samples/sec Loss 2.7013 LearningRate 0.0201 Epoch: 17 Global Step: 88500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:20:48,410-Speed 10424.12 samples/sec Loss 2.6644 LearningRate 0.0201 Epoch: 17 Global Step: 88510 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 10:20:56,250-Speed 10450.53 samples/sec Loss 2.7002 LearningRate 0.0200 Epoch: 17 Global Step: 88520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:21:04,086-Speed 10455.30 samples/sec Loss 2.7143 LearningRate 0.0200 Epoch: 17 Global Step: 88530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:21:11,931-Speed 10444.01 samples/sec Loss 2.6696 LearningRate 0.0200 Epoch: 17 Global Step: 88540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:21:19,781-Speed 10437.27 samples/sec Loss 2.6830 LearningRate 0.0200 Epoch: 17 Global Step: 88550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:21:27,609-Speed 10465.62 samples/sec Loss 2.6995 LearningRate 0.0199 Epoch: 17 Global Step: 88560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:21:35,442-Speed 10459.33 samples/sec Loss 2.6702 LearningRate 0.0199 Epoch: 17 Global Step: 88570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:21:43,276-Speed 10458.46 samples/sec Loss 2.6744 LearningRate 0.0199 Epoch: 17 Global Step: 88580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:21:51,093-Speed 10481.25 samples/sec Loss 2.6966 LearningRate 0.0199 Epoch: 17 Global Step: 88590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:21:58,944-Speed 10436.11 samples/sec Loss 2.6845 
LearningRate 0.0198 Epoch: 17 Global Step: 88600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:22:06,763-Speed 10478.30 samples/sec Loss 2.6880 LearningRate 0.0198 Epoch: 17 Global Step: 88610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:22:14,594-Speed 10462.96 samples/sec Loss 2.6906 LearningRate 0.0198 Epoch: 17 Global Step: 88620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:22:22,426-Speed 10461.48 samples/sec Loss 2.6817 LearningRate 0.0198 Epoch: 17 Global Step: 88630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:22:30,272-Speed 10441.98 samples/sec Loss 2.6844 LearningRate 0.0197 Epoch: 17 Global Step: 88640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:22:38,125-Speed 10433.79 samples/sec Loss 2.6728 LearningRate 0.0197 Epoch: 17 Global Step: 88650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:22:45,988-Speed 10419.28 samples/sec Loss 2.6934 LearningRate 0.0197 Epoch: 17 Global Step: 88660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:22:53,816-Speed 10466.15 samples/sec Loss 2.6678 LearningRate 0.0196 Epoch: 17 Global Step: 88670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:23:01,673-Speed 10427.43 samples/sec Loss 2.6816 LearningRate 0.0196 Epoch: 17 Global Step: 88680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:23:09,523-Speed 10438.18 samples/sec Loss 2.6788 LearningRate 0.0196 Epoch: 17 Global Step: 88690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:23:17,360-Speed 10454.30 samples/sec Loss 2.6754 LearningRate 0.0196 Epoch: 17 Global Step: 88700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:23:25,228-Speed 10414.38 samples/sec Loss 2.6715 LearningRate 0.0195 Epoch: 17 Global Step: 88710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:23:33,069-Speed 10449.16 samples/sec Loss 2.6495 LearningRate 0.0195 Epoch: 17 Global Step: 
88720 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 10:23:40,903-Speed 10459.43 samples/sec Loss 2.6562 LearningRate 0.0195 Epoch: 17 Global Step: 88730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:23:48,747-Speed 10444.91 samples/sec Loss 2.6848 LearningRate 0.0195 Epoch: 17 Global Step: 88740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:23:56,593-Speed 10441.92 samples/sec Loss 2.6345 LearningRate 0.0194 Epoch: 17 Global Step: 88750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:24:04,422-Speed 10465.23 samples/sec Loss 2.6582 LearningRate 0.0194 Epoch: 17 Global Step: 88760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:24:12,241-Speed 10477.87 samples/sec Loss 2.6580 LearningRate 0.0194 Epoch: 17 Global Step: 88770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:24:20,080-Speed 10452.21 samples/sec Loss 2.6945 LearningRate 0.0194 Epoch: 17 Global Step: 88780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:24:27,939-Speed 10423.85 samples/sec Loss 2.6460 LearningRate 0.0193 Epoch: 17 Global Step: 88790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:24:35,784-Speed 10446.40 samples/sec Loss 2.6284 LearningRate 0.0193 Epoch: 17 Global Step: 88800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:24:43,650-Speed 10415.53 samples/sec Loss 2.6326 LearningRate 0.0193 Epoch: 17 Global Step: 88810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:24:51,477-Speed 10468.47 samples/sec Loss 2.6216 LearningRate 0.0193 Epoch: 17 Global Step: 88820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:24:59,276-Speed 10503.71 samples/sec Loss 2.6619 LearningRate 0.0192 Epoch: 17 Global Step: 88830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:25:07,110-Speed 10459.28 samples/sec Loss 2.6581 LearningRate 0.0192 Epoch: 17 Global Step: 88840 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-01-16 10:25:14,952-Speed 10447.23 samples/sec Loss 2.6541 LearningRate 0.0192 Epoch: 17 Global Step: 88850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:25:22,781-Speed 10465.08 samples/sec Loss 2.6486 LearningRate 0.0192 Epoch: 17 Global Step: 88860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:25:30,646-Speed 10417.34 samples/sec Loss 2.6172 LearningRate 0.0191 Epoch: 17 Global Step: 88870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:25:38,486-Speed 10450.41 samples/sec Loss 2.6659 LearningRate 0.0191 Epoch: 17 Global Step: 88880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:25:46,311-Speed 10470.79 samples/sec Loss 2.6536 LearningRate 0.0191 Epoch: 17 Global Step: 88890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:25:54,123-Speed 10488.23 samples/sec Loss 2.6362 LearningRate 0.0191 Epoch: 17 Global Step: 88900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:26:01,903-Speed 10531.73 samples/sec Loss 2.6340 LearningRate 0.0190 Epoch: 17 Global Step: 88910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:26:09,674-Speed 10542.41 samples/sec Loss 2.6323 LearningRate 0.0190 Epoch: 17 Global Step: 88920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:26:17,478-Speed 10498.43 samples/sec Loss 2.6260 LearningRate 0.0190 Epoch: 17 Global Step: 88930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:26:25,275-Speed 10508.22 samples/sec Loss 2.6299 LearningRate 0.0189 Epoch: 17 Global Step: 88940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:26:33,053-Speed 10533.78 samples/sec Loss 2.6149 LearningRate 0.0189 Epoch: 17 Global Step: 88950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:26:40,839-Speed 10522.44 samples/sec Loss 2.6421 LearningRate 0.0189 Epoch: 17 Global Step: 88960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 
10:26:48,627-Speed 10519.18 samples/sec Loss 2.6404 LearningRate 0.0189 Epoch: 17 Global Step: 88970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 10:26:56,442-Speed 10484.86 samples/sec Loss 2.6360 LearningRate 0.0188 Epoch: 17 Global Step: 88980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 10:27:04,223-Speed 10529.60 samples/sec Loss 2.6433 LearningRate 0.0188 Epoch: 17 Global Step: 88990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:27:12,014-Speed 10516.51 samples/sec Loss 2.6081 LearningRate 0.0188 Epoch: 17 Global Step: 89000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:27:19,792-Speed 10533.14 samples/sec Loss 2.6282 LearningRate 0.0188 Epoch: 17 Global Step: 89010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:27:27,604-Speed 10487.73 samples/sec Loss 2.6523 LearningRate 0.0187 Epoch: 17 Global Step: 89020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:27:35,408-Speed 10498.35 samples/sec Loss 2.6237 LearningRate 0.0187 Epoch: 17 Global Step: 89030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:27:43,208-Speed 10504.91 samples/sec Loss 2.6193 LearningRate 0.0187 Epoch: 17 Global Step: 89040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:27:51,010-Speed 10500.99 samples/sec Loss 2.6176 LearningRate 0.0187 Epoch: 17 Global Step: 89050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:27:58,805-Speed 10510.81 samples/sec Loss 2.6288 LearningRate 0.0186 Epoch: 17 Global Step: 89060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:28:06,606-Speed 10502.49 samples/sec Loss 2.6244 LearningRate 0.0186 Epoch: 17 Global Step: 89070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:28:14,410-Speed 10498.09 samples/sec Loss 2.5954 LearningRate 0.0186 Epoch: 17 Global Step: 89080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:28:22,221-Speed 10489.69 samples/sec 
Loss 2.6184 LearningRate 0.0186 Epoch: 17 Global Step: 89090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:28:30,028-Speed 10495.05 samples/sec Loss 2.6232 LearningRate 0.0185 Epoch: 17 Global Step: 89100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:28:37,839-Speed 10488.81 samples/sec Loss 2.6114 LearningRate 0.0185 Epoch: 17 Global Step: 89110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:28:45,661-Speed 10473.27 samples/sec Loss 2.6311 LearningRate 0.0185 Epoch: 17 Global Step: 89120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:28:53,458-Speed 10508.22 samples/sec Loss 2.5935 LearningRate 0.0185 Epoch: 17 Global Step: 89130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:29:01,285-Speed 10468.65 samples/sec Loss 2.5981 LearningRate 0.0184 Epoch: 17 Global Step: 89140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:29:09,136-Speed 10435.14 samples/sec Loss 2.6073 LearningRate 0.0184 Epoch: 17 Global Step: 89150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:29:16,900-Speed 10552.64 samples/sec Loss 2.5943 LearningRate 0.0184 Epoch: 17 Global Step: 89160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:29:24,680-Speed 10530.26 samples/sec Loss 2.5902 LearningRate 0.0184 Epoch: 17 Global Step: 89170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:29:32,470-Speed 10518.75 samples/sec Loss 2.5705 LearningRate 0.0183 Epoch: 17 Global Step: 89180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:29:40,246-Speed 10537.09 samples/sec Loss 2.6106 LearningRate 0.0183 Epoch: 17 Global Step: 89190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:29:48,062-Speed 10481.90 samples/sec Loss 2.5871 LearningRate 0.0183 Epoch: 17 Global Step: 89200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:29:55,844-Speed 10527.82 samples/sec Loss 2.6005 LearningRate 0.0183 Epoch: 17 
Global Step: 89210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:30:03,648-Speed 10499.30 samples/sec Loss 2.5942 LearningRate 0.0182 Epoch: 17 Global Step: 89220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:30:11,469-Speed 10476.13 samples/sec Loss 2.6099 LearningRate 0.0182 Epoch: 17 Global Step: 89230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:30:19,269-Speed 10502.66 samples/sec Loss 2.6080 LearningRate 0.0182 Epoch: 17 Global Step: 89240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:30:27,074-Speed 10498.01 samples/sec Loss 2.5784 LearningRate 0.0182 Epoch: 17 Global Step: 89250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:30:34,882-Speed 10492.77 samples/sec Loss 2.6141 LearningRate 0.0181 Epoch: 17 Global Step: 89260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:30:42,672-Speed 10517.03 samples/sec Loss 2.6155 LearningRate 0.0181 Epoch: 17 Global Step: 89270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:30:50,478-Speed 10495.57 samples/sec Loss 2.6125 LearningRate 0.0181 Epoch: 17 Global Step: 89280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:30:58,276-Speed 10508.18 samples/sec Loss 2.6256 LearningRate 0.0181 Epoch: 17 Global Step: 89290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:31:06,102-Speed 10468.37 samples/sec Loss 2.5870 LearningRate 0.0180 Epoch: 17 Global Step: 89300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:31:13,893-Speed 10515.90 samples/sec Loss 2.5892 LearningRate 0.0180 Epoch: 17 Global Step: 89310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 10:31:21,709-Speed 10482.30 samples/sec Loss 2.5982 LearningRate 0.0180 Epoch: 17 Global Step: 89320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 10:31:29,498-Speed 10519.65 samples/sec Loss 2.6049 LearningRate 0.0180 Epoch: 17 Global Step: 89330 Fp16 Grad Scale: 
131072 Required: 3 hours Training: 2022-01-16 10:31:37,277-Speed 10531.60 samples/sec Loss 2.6287 LearningRate 0.0179 Epoch: 17 Global Step: 89340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:31:45,104-Speed 10467.31 samples/sec Loss 2.5736 LearningRate 0.0179 Epoch: 17 Global Step: 89350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:31:52,921-Speed 10481.37 samples/sec Loss 2.5643 LearningRate 0.0179 Epoch: 17 Global Step: 89360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:32:00,709-Speed 10521.00 samples/sec Loss 2.5794 LearningRate 0.0179 Epoch: 17 Global Step: 89370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:32:08,491-Speed 10527.95 samples/sec Loss 2.5813 LearningRate 0.0178 Epoch: 17 Global Step: 89380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:32:16,315-Speed 10472.03 samples/sec Loss 2.5953 LearningRate 0.0178 Epoch: 17 Global Step: 89390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:32:24,127-Speed 10487.35 samples/sec Loss 2.5587 LearningRate 0.0178 Epoch: 17 Global Step: 89400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:32:31,931-Speed 10499.71 samples/sec Loss 2.5688 LearningRate 0.0178 Epoch: 17 Global Step: 89410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:32:39,723-Speed 10514.91 samples/sec Loss 2.5672 LearningRate 0.0177 Epoch: 17 Global Step: 89420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:32:47,508-Speed 10523.53 samples/sec Loss 2.5727 LearningRate 0.0177 Epoch: 17 Global Step: 89430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:32:55,304-Speed 10508.19 samples/sec Loss 2.5831 LearningRate 0.0177 Epoch: 17 Global Step: 89440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 10:33:03,099-Speed 10519.54 samples/sec Loss 2.5777 LearningRate 0.0177 Epoch: 17 Global Step: 89450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-01-16 10:33:10,894-Speed 10510.89 samples/sec Loss 2.5455 LearningRate 0.0176 Epoch: 17 Global Step: 89460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:33:18,717-Speed 10472.65 samples/sec Loss 2.5879 LearningRate 0.0176 Epoch: 17 Global Step: 89470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:33:26,516-Speed 10505.47 samples/sec Loss 2.5576 LearningRate 0.0176 Epoch: 17 Global Step: 89480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:33:34,323-Speed 10495.86 samples/sec Loss 2.5791 LearningRate 0.0176 Epoch: 17 Global Step: 89490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:33:42,147-Speed 10471.40 samples/sec Loss 2.5559 LearningRate 0.0175 Epoch: 17 Global Step: 89500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:33:49,941-Speed 10512.83 samples/sec Loss 2.5448 LearningRate 0.0175 Epoch: 17 Global Step: 89510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:33:57,731-Speed 10517.79 samples/sec Loss 2.5435 LearningRate 0.0175 Epoch: 17 Global Step: 89520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:34:05,523-Speed 10514.03 samples/sec Loss 2.5586 LearningRate 0.0175 Epoch: 17 Global Step: 89530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:34:13,308-Speed 10524.91 samples/sec Loss 2.5649 LearningRate 0.0174 Epoch: 17 Global Step: 89540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:34:21,095-Speed 10521.46 samples/sec Loss 2.5650 LearningRate 0.0174 Epoch: 17 Global Step: 89550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:34:28,923-Speed 10465.37 samples/sec Loss 2.5557 LearningRate 0.0174 Epoch: 17 Global Step: 89560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:34:36,773-Speed 10444.15 samples/sec Loss 2.5565 LearningRate 0.0174 Epoch: 17 Global Step: 89570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:34:44,568-Speed 10511.48 
samples/sec Loss 2.5623 LearningRate 0.0173 Epoch: 17 Global Step: 89580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:34:52,363-Speed 10510.43 samples/sec Loss 2.5637 LearningRate 0.0173 Epoch: 17 Global Step: 89590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:35:00,195-Speed 10460.70 samples/sec Loss 2.5655 LearningRate 0.0173 Epoch: 17 Global Step: 89600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:35:07,997-Speed 10501.98 samples/sec Loss 2.5583 LearningRate 0.0173 Epoch: 17 Global Step: 89610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:35:15,818-Speed 10475.04 samples/sec Loss 2.5468 LearningRate 0.0172 Epoch: 17 Global Step: 89620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:35:23,633-Speed 10483.75 samples/sec Loss 2.5680 LearningRate 0.0172 Epoch: 17 Global Step: 89630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:35:31,422-Speed 10519.44 samples/sec Loss 2.5487 LearningRate 0.0172 Epoch: 17 Global Step: 89640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:35:39,275-Speed 10432.57 samples/sec Loss 2.5520 LearningRate 0.0172 Epoch: 17 Global Step: 89650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:35:47,082-Speed 10495.54 samples/sec Loss 2.5936 LearningRate 0.0171 Epoch: 17 Global Step: 89660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:35:54,935-Speed 10432.99 samples/sec Loss 2.5657 LearningRate 0.0171 Epoch: 17 Global Step: 89670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:36:02,726-Speed 10515.58 samples/sec Loss 2.5365 LearningRate 0.0171 Epoch: 17 Global Step: 89680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:36:10,520-Speed 10511.22 samples/sec Loss 2.5481 LearningRate 0.0171 Epoch: 17 Global Step: 89690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:36:18,359-Speed 10452.48 samples/sec Loss 2.5289 LearningRate 0.0170 
Epoch: 17 Global Step: 89700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:36:26,144-Speed 10527.60 samples/sec Loss 2.5340 LearningRate 0.0170 Epoch: 17 Global Step: 89710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:36:33,951-Speed 10494.90 samples/sec Loss 2.5408 LearningRate 0.0170 Epoch: 17 Global Step: 89720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:36:41,775-Speed 10470.93 samples/sec Loss 2.5543 LearningRate 0.0170 Epoch: 17 Global Step: 89730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:36:49,586-Speed 10490.75 samples/sec Loss 2.5539 LearningRate 0.0169 Epoch: 17 Global Step: 89740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:36:57,370-Speed 10524.46 samples/sec Loss 2.4892 LearningRate 0.0169 Epoch: 17 Global Step: 89750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:37:05,195-Speed 10470.40 samples/sec Loss 2.5505 LearningRate 0.0169 Epoch: 17 Global Step: 89760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:37:12,982-Speed 10522.53 samples/sec Loss 2.5319 LearningRate 0.0169 Epoch: 17 Global Step: 89770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:37:20,774-Speed 10514.31 samples/sec Loss 2.5387 LearningRate 0.0169 Epoch: 17 Global Step: 89780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:37:28,602-Speed 10466.05 samples/sec Loss 2.5361 LearningRate 0.0168 Epoch: 17 Global Step: 89790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:37:36,405-Speed 10504.41 samples/sec Loss 2.5235 LearningRate 0.0168 Epoch: 17 Global Step: 89800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:37:44,191-Speed 10523.28 samples/sec Loss 2.5221 LearningRate 0.0168 Epoch: 17 Global Step: 89810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:37:52,005-Speed 10484.63 samples/sec Loss 2.5140 LearningRate 0.0168 Epoch: 17 Global Step: 89820 Fp16 Grad 
Scale: 32768 Required: 3 hours Training: 2022-01-16 10:37:59,799-Speed 10512.83 samples/sec Loss 2.5509 LearningRate 0.0167 Epoch: 17 Global Step: 89830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:38:07,610-Speed 10488.54 samples/sec Loss 2.5069 LearningRate 0.0167 Epoch: 17 Global Step: 89840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:38:15,428-Speed 10479.56 samples/sec Loss 2.5158 LearningRate 0.0167 Epoch: 17 Global Step: 89850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:38:23,236-Speed 10493.46 samples/sec Loss 2.5190 LearningRate 0.0167 Epoch: 17 Global Step: 89860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:38:31,039-Speed 10500.46 samples/sec Loss 2.4876 LearningRate 0.0166 Epoch: 17 Global Step: 89870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:38:38,839-Speed 10504.30 samples/sec Loss 2.5198 LearningRate 0.0166 Epoch: 17 Global Step: 89880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:38:46,644-Speed 10496.73 samples/sec Loss 2.5063 LearningRate 0.0166 Epoch: 17 Global Step: 89890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:38:54,467-Speed 10473.57 samples/sec Loss 2.5145 LearningRate 0.0166 Epoch: 17 Global Step: 89900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:39:02,266-Speed 10505.05 samples/sec Loss 2.5193 LearningRate 0.0165 Epoch: 17 Global Step: 89910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:39:10,070-Speed 10498.72 samples/sec Loss 2.5317 LearningRate 0.0165 Epoch: 17 Global Step: 89920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:39:17,878-Speed 10492.80 samples/sec Loss 2.5299 LearningRate 0.0165 Epoch: 17 Global Step: 89930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:39:25,690-Speed 10488.21 samples/sec Loss 2.5409 LearningRate 0.0165 Epoch: 17 Global Step: 89940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-01-16 10:39:33,474-Speed 10525.46 samples/sec Loss 2.5200 LearningRate 0.0164 Epoch: 17 Global Step: 89950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:39:41,305-Speed 10462.21 samples/sec Loss 2.5019 LearningRate 0.0164 Epoch: 17 Global Step: 89960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:39:49,129-Speed 10471.74 samples/sec Loss 2.4828 LearningRate 0.0164 Epoch: 17 Global Step: 89970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:39:56,925-Speed 10508.97 samples/sec Loss 2.5088 LearningRate 0.0164 Epoch: 17 Global Step: 89980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:40:04,747-Speed 10478.75 samples/sec Loss 2.4963 LearningRate 0.0163 Epoch: 17 Global Step: 89990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:40:12,575-Speed 10465.61 samples/sec Loss 2.5064 LearningRate 0.0163 Epoch: 17 Global Step: 90000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:40:40,215-[lfw][90000]XNorm: 23.583664 Training: 2022-01-16 10:40:40,216-[lfw][90000]Accuracy-Flip: 0.99783+-0.00248 Training: 2022-01-16 10:40:40,216-[lfw][90000]Accuracy-Highest: 0.99783 Training: 2022-01-16 10:41:12,288-[cfp_fp][90000]XNorm: 21.463575 Training: 2022-01-16 10:41:12,288-[cfp_fp][90000]Accuracy-Flip: 0.99257+-0.00393 Training: 2022-01-16 10:41:12,288-[cfp_fp][90000]Accuracy-Highest: 0.99257 Training: 2022-01-16 10:41:39,973-[agedb_30][90000]XNorm: 23.186239 Training: 2022-01-16 10:41:39,974-[agedb_30][90000]Accuracy-Flip: 0.98083+-0.00569 Training: 2022-01-16 10:41:39,974-[agedb_30][90000]Accuracy-Highest: 0.98083 Training: 2022-01-16 10:41:47,722-Speed 861.01 samples/sec Loss 2.5005 LearningRate 0.0163 Epoch: 17 Global Step: 90010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:41:55,454-Speed 10596.87 samples/sec Loss 2.5265 LearningRate 0.0163 Epoch: 17 Global Step: 90020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:42:03,209-Speed 
10564.52 samples/sec Loss 2.4929 LearningRate 0.0162 Epoch: 17 Global Step: 90030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:42:10,974-Speed 10551.70 samples/sec Loss 2.5398 LearningRate 0.0162 Epoch: 17 Global Step: 90040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:42:18,768-Speed 10511.61 samples/sec Loss 2.5043 LearningRate 0.0162 Epoch: 17 Global Step: 90050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 10:42:26,577-Speed 10491.83 samples/sec Loss 2.4913 LearningRate 0.0162 Epoch: 17 Global Step: 90060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:42:34,345-Speed 10547.27 samples/sec Loss 2.4706 LearningRate 0.0162 Epoch: 17 Global Step: 90070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:42:42,103-Speed 10560.11 samples/sec Loss 2.4965 LearningRate 0.0161 Epoch: 17 Global Step: 90080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:42:49,886-Speed 10527.12 samples/sec Loss 2.4993 LearningRate 0.0161 Epoch: 17 Global Step: 90090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:42:57,635-Speed 10573.37 samples/sec Loss 2.4939 LearningRate 0.0161 Epoch: 17 Global Step: 90100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:43:05,392-Speed 10562.48 samples/sec Loss 2.4855 LearningRate 0.0161 Epoch: 17 Global Step: 90110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:43:13,158-Speed 10549.60 samples/sec Loss 2.4633 LearningRate 0.0160 Epoch: 17 Global Step: 90120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:43:20,939-Speed 10529.41 samples/sec Loss 2.4618 LearningRate 0.0160 Epoch: 17 Global Step: 90130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:43:28,767-Speed 10466.98 samples/sec Loss 2.4715 LearningRate 0.0160 Epoch: 17 Global Step: 90140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:43:36,562-Speed 10511.43 samples/sec Loss 2.4439 
LearningRate 0.0160 Epoch: 17 Global Step: 90150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:43:44,332-Speed 10543.06 samples/sec Loss 2.4765 LearningRate 0.0159 Epoch: 17 Global Step: 90160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:43:52,109-Speed 10534.94 samples/sec Loss 2.5049 LearningRate 0.0159 Epoch: 17 Global Step: 90170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:43:59,890-Speed 10530.36 samples/sec Loss 2.4925 LearningRate 0.0159 Epoch: 17 Global Step: 90180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:44:07,660-Speed 10544.56 samples/sec Loss 2.5037 LearningRate 0.0159 Epoch: 17 Global Step: 90190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:44:15,436-Speed 10536.72 samples/sec Loss 2.4735 LearningRate 0.0158 Epoch: 17 Global Step: 90200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:44:23,223-Speed 10521.18 samples/sec Loss 2.4702 LearningRate 0.0158 Epoch: 17 Global Step: 90210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:44:30,981-Speed 10564.62 samples/sec Loss 2.4772 LearningRate 0.0158 Epoch: 17 Global Step: 90220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:44:38,752-Speed 10543.24 samples/sec Loss 2.4805 LearningRate 0.0158 Epoch: 17 Global Step: 90230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:44:46,541-Speed 10517.65 samples/sec Loss 2.4544 LearningRate 0.0158 Epoch: 17 Global Step: 90240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:44:54,312-Speed 10544.25 samples/sec Loss 2.4370 LearningRate 0.0157 Epoch: 17 Global Step: 90250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:45:02,089-Speed 10535.65 samples/sec Loss 2.4481 LearningRate 0.0157 Epoch: 17 Global Step: 90260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:45:09,880-Speed 10515.91 samples/sec Loss 2.4508 LearningRate 0.0157 Epoch: 17 Global Step: 
90270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:45:17,700-Speed 10477.28 samples/sec Loss 2.4495 LearningRate 0.0157 Epoch: 17 Global Step: 90280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:45:25,475-Speed 10538.24 samples/sec Loss 2.4576 LearningRate 0.0156 Epoch: 17 Global Step: 90290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:45:33,242-Speed 10549.01 samples/sec Loss 2.4547 LearningRate 0.0156 Epoch: 17 Global Step: 90300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:45:41,007-Speed 10550.83 samples/sec Loss 2.4670 LearningRate 0.0156 Epoch: 17 Global Step: 90310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:45:48,772-Speed 10552.18 samples/sec Loss 2.4722 LearningRate 0.0156 Epoch: 17 Global Step: 90320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:45:56,550-Speed 10533.81 samples/sec Loss 2.4543 LearningRate 0.0155 Epoch: 17 Global Step: 90330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:46:04,317-Speed 10548.44 samples/sec Loss 2.4587 LearningRate 0.0155 Epoch: 17 Global Step: 90340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:46:12,085-Speed 10547.66 samples/sec Loss 2.4196 LearningRate 0.0155 Epoch: 17 Global Step: 90350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:46:19,867-Speed 10527.07 samples/sec Loss 2.4605 LearningRate 0.0155 Epoch: 17 Global Step: 90360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:46:27,655-Speed 10520.72 samples/sec Loss 2.4683 LearningRate 0.0155 Epoch: 17 Global Step: 90370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:46:35,418-Speed 10556.87 samples/sec Loss 2.4651 LearningRate 0.0154 Epoch: 17 Global Step: 90380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:46:43,188-Speed 10544.52 samples/sec Loss 2.4618 LearningRate 0.0154 Epoch: 17 Global Step: 90390 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-01-16 10:46:50,977-Speed 10517.76 samples/sec Loss 2.4609 LearningRate 0.0154 Epoch: 17 Global Step: 90400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:46:58,765-Speed 10520.67 samples/sec Loss 2.4615 LearningRate 0.0154 Epoch: 17 Global Step: 90410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:47:06,568-Speed 10500.28 samples/sec Loss 2.4220 LearningRate 0.0153 Epoch: 17 Global Step: 90420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:47:14,334-Speed 10549.81 samples/sec Loss 2.4423 LearningRate 0.0153 Epoch: 17 Global Step: 90430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:47:22,120-Speed 10522.26 samples/sec Loss 2.4345 LearningRate 0.0153 Epoch: 17 Global Step: 90440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:47:29,901-Speed 10530.43 samples/sec Loss 2.4155 LearningRate 0.0153 Epoch: 17 Global Step: 90450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:47:37,665-Speed 10552.52 samples/sec Loss 2.4472 LearningRate 0.0152 Epoch: 17 Global Step: 90460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:47:45,455-Speed 10517.47 samples/sec Loss 2.4297 LearningRate 0.0152 Epoch: 17 Global Step: 90470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:47:53,252-Speed 10508.06 samples/sec Loss 2.4344 LearningRate 0.0152 Epoch: 17 Global Step: 90480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:48:01,050-Speed 10506.94 samples/sec Loss 2.4362 LearningRate 0.0152 Epoch: 17 Global Step: 90490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:48:08,873-Speed 10472.99 samples/sec Loss 2.4427 LearningRate 0.0151 Epoch: 17 Global Step: 90500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 10:48:16,675-Speed 10501.30 samples/sec Loss 2.4245 LearningRate 0.0151 Epoch: 17 Global Step: 90510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 
10:48:24,463-Speed 10520.22 samples/sec Loss 2.4347 LearningRate 0.0151 Epoch: 17 Global Step: 90520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:48:32,234-Speed 10543.91 samples/sec Loss 2.4355 LearningRate 0.0151 Epoch: 17 Global Step: 90530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:48:39,998-Speed 10551.55 samples/sec Loss 2.4275 LearningRate 0.0151 Epoch: 17 Global Step: 90540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:48:47,757-Speed 10559.63 samples/sec Loss 2.4354 LearningRate 0.0150 Epoch: 17 Global Step: 90550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:48:55,531-Speed 10539.71 samples/sec Loss 2.4562 LearningRate 0.0150 Epoch: 17 Global Step: 90560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:49:03,321-Speed 10517.54 samples/sec Loss 2.4325 LearningRate 0.0150 Epoch: 17 Global Step: 90570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:49:11,103-Speed 10528.16 samples/sec Loss 2.4200 LearningRate 0.0150 Epoch: 17 Global Step: 90580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:49:18,877-Speed 10537.63 samples/sec Loss 2.4386 LearningRate 0.0149 Epoch: 17 Global Step: 90590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:49:26,658-Speed 10530.94 samples/sec Loss 2.4231 LearningRate 0.0149 Epoch: 17 Global Step: 90600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:49:34,428-Speed 10543.92 samples/sec Loss 2.4148 LearningRate 0.0149 Epoch: 17 Global Step: 90610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 10:49:42,188-Speed 10558.15 samples/sec Loss 2.4298 LearningRate 0.0149 Epoch: 17 Global Step: 90620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 10:49:49,970-Speed 10528.27 samples/sec Loss 2.4190 LearningRate 0.0149 Epoch: 17 Global Step: 90630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 10:49:57,743-Speed 10541.07 samples/sec 
Loss 2.4292 LearningRate 0.0148 Epoch: 17 Global Step: 90640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:50:05,532-Speed 10518.39 samples/sec Loss 2.4282 LearningRate 0.0148 Epoch: 17 Global Step: 90650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:50:13,330-Speed 10505.81 samples/sec Loss 2.4092 LearningRate 0.0148 Epoch: 17 Global Step: 90660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:50:21,103-Speed 10540.90 samples/sec Loss 2.4246 LearningRate 0.0148 Epoch: 17 Global Step: 90670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:50:28,895-Speed 10514.15 samples/sec Loss 2.4038 LearningRate 0.0147 Epoch: 17 Global Step: 90680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:50:36,662-Speed 10551.61 samples/sec Loss 2.4094 LearningRate 0.0147 Epoch: 17 Global Step: 90690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:50:44,438-Speed 10535.56 samples/sec Loss 2.4257 LearningRate 0.0147 Epoch: 17 Global Step: 90700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:50:52,213-Speed 10543.47 samples/sec Loss 2.4169 LearningRate 0.0147 Epoch: 17 Global Step: 90710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:51:00,003-Speed 10518.52 samples/sec Loss 2.4411 LearningRate 0.0146 Epoch: 17 Global Step: 90720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:51:07,782-Speed 10532.12 samples/sec Loss 2.4229 LearningRate 0.0146 Epoch: 17 Global Step: 90730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:51:15,593-Speed 10488.85 samples/sec Loss 2.4146 LearningRate 0.0146 Epoch: 17 Global Step: 90740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 10:51:23,379-Speed 10523.37 samples/sec Loss 2.4229 LearningRate 0.0146 Epoch: 17 Global Step: 90750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 10:51:31,194-Speed 10484.43 samples/sec Loss 2.4350 LearningRate 0.0146 Epoch: 17 
Global Step: 90760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 10:51:38,994-Speed 10503.59 samples/sec Loss 2.4040 LearningRate 0.0145 Epoch: 17 Global Step: 90770 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 10:51:46,777-Speed 10526.18 samples/sec Loss 2.4053 LearningRate 0.0145 Epoch: 17 Global Step: 90780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:51:54,626-Speed 10438.15 samples/sec Loss 2.4238 LearningRate 0.0145 Epoch: 17 Global Step: 90790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:52:02,405-Speed 10532.23 samples/sec Loss 2.4210 LearningRate 0.0145 Epoch: 17 Global Step: 90800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:52:10,201-Speed 10509.67 samples/sec Loss 2.4014 LearningRate 0.0144 Epoch: 17 Global Step: 90810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:52:17,975-Speed 10539.27 samples/sec Loss 2.4064 LearningRate 0.0144 Epoch: 17 Global Step: 90820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:52:25,755-Speed 10532.28 samples/sec Loss 2.3903 LearningRate 0.0144 Epoch: 17 Global Step: 90830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:52:33,539-Speed 10524.82 samples/sec Loss 2.3538 LearningRate 0.0144 Epoch: 17 Global Step: 90840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:52:41,335-Speed 10509.93 samples/sec Loss 2.4091 LearningRate 0.0144 Epoch: 17 Global Step: 90850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:52:49,108-Speed 10540.49 samples/sec Loss 2.3830 LearningRate 0.0143 Epoch: 17 Global Step: 90860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:52:56,898-Speed 10517.70 samples/sec Loss 2.3966 LearningRate 0.0143 Epoch: 17 Global Step: 90870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:53:04,691-Speed 10512.40 samples/sec Loss 2.4218 LearningRate 0.0143 Epoch: 17 Global Step: 90880 Fp16 Grad Scale: 
131072 Required: 3 hours Training: 2022-01-16 10:53:12,484-Speed 10519.34 samples/sec Loss 2.3581 LearningRate 0.0143 Epoch: 17 Global Step: 90890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:53:20,295-Speed 10489.96 samples/sec Loss 2.3728 LearningRate 0.0142 Epoch: 17 Global Step: 90900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:53:28,094-Speed 10505.04 samples/sec Loss 2.3779 LearningRate 0.0142 Epoch: 17 Global Step: 90910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:53:35,877-Speed 10527.06 samples/sec Loss 2.4117 LearningRate 0.0142 Epoch: 17 Global Step: 90920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:53:43,670-Speed 10512.93 samples/sec Loss 2.3899 LearningRate 0.0142 Epoch: 17 Global Step: 90930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:53:51,438-Speed 10547.45 samples/sec Loss 2.4034 LearningRate 0.0142 Epoch: 17 Global Step: 90940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:53:59,214-Speed 10537.17 samples/sec Loss 2.3710 LearningRate 0.0141 Epoch: 17 Global Step: 90950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:54:06,990-Speed 10534.80 samples/sec Loss 2.3849 LearningRate 0.0141 Epoch: 17 Global Step: 90960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:54:14,816-Speed 10468.92 samples/sec Loss 2.3912 LearningRate 0.0141 Epoch: 17 Global Step: 90970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:54:22,604-Speed 10520.72 samples/sec Loss 2.4018 LearningRate 0.0141 Epoch: 17 Global Step: 90980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:54:30,384-Speed 10531.48 samples/sec Loss 2.3606 LearningRate 0.0140 Epoch: 17 Global Step: 90990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:54:38,170-Speed 10522.77 samples/sec Loss 2.3863 LearningRate 0.0140 Epoch: 17 Global Step: 91000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-01-16 10:54:45,987-Speed 10480.06 samples/sec Loss 2.3933 LearningRate 0.0140 Epoch: 17 Global Step: 91010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:54:53,757-Speed 10545.10 samples/sec Loss 2.3663 LearningRate 0.0140 Epoch: 17 Global Step: 91020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:55:01,535-Speed 10533.91 samples/sec Loss 2.3863 LearningRate 0.0140 Epoch: 17 Global Step: 91030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:55:09,308-Speed 10540.55 samples/sec Loss 2.3856 LearningRate 0.0139 Epoch: 17 Global Step: 91040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:55:17,075-Speed 10547.90 samples/sec Loss 2.3639 LearningRate 0.0139 Epoch: 17 Global Step: 91050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:55:24,865-Speed 10518.38 samples/sec Loss 2.3855 LearningRate 0.0139 Epoch: 17 Global Step: 91060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:55:32,632-Speed 10548.21 samples/sec Loss 2.3716 LearningRate 0.0139 Epoch: 17 Global Step: 91070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:55:40,406-Speed 10537.98 samples/sec Loss 2.3636 LearningRate 0.0138 Epoch: 17 Global Step: 91080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:55:48,213-Speed 10502.07 samples/sec Loss 2.3905 LearningRate 0.0138 Epoch: 17 Global Step: 91090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:55:55,990-Speed 10535.31 samples/sec Loss 2.3595 LearningRate 0.0138 Epoch: 17 Global Step: 91100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:56:03,758-Speed 10546.39 samples/sec Loss 2.3572 LearningRate 0.0138 Epoch: 17 Global Step: 91110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:56:11,526-Speed 10547.01 samples/sec Loss 2.3571 LearningRate 0.0138 Epoch: 17 Global Step: 91120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:56:19,362-Speed 10456.43 
samples/sec Loss 2.3783 LearningRate 0.0137 Epoch: 17 Global Step: 91130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:56:27,156-Speed 10512.69 samples/sec Loss 2.3731 LearningRate 0.0137 Epoch: 17 Global Step: 91140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:56:34,936-Speed 10529.98 samples/sec Loss 2.3519 LearningRate 0.0137 Epoch: 17 Global Step: 91150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:56:42,733-Speed 10508.51 samples/sec Loss 2.3559 LearningRate 0.0137 Epoch: 17 Global Step: 91160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:56:50,519-Speed 10522.16 samples/sec Loss 2.3444 LearningRate 0.0136 Epoch: 17 Global Step: 91170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:56:58,315-Speed 10509.75 samples/sec Loss 2.3629 LearningRate 0.0136 Epoch: 17 Global Step: 91180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:57:06,124-Speed 10492.44 samples/sec Loss 2.3670 LearningRate 0.0136 Epoch: 17 Global Step: 91190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:57:13,909-Speed 10524.13 samples/sec Loss 2.3488 LearningRate 0.0136 Epoch: 17 Global Step: 91200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:57:21,690-Speed 10530.55 samples/sec Loss 2.3448 LearningRate 0.0136 Epoch: 17 Global Step: 91210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:57:29,527-Speed 10454.60 samples/sec Loss 2.3266 LearningRate 0.0135 Epoch: 17 Global Step: 91220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:57:37,306-Speed 10532.06 samples/sec Loss 2.3607 LearningRate 0.0135 Epoch: 17 Global Step: 91230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:57:45,108-Speed 10501.08 samples/sec Loss 2.3476 LearningRate 0.0135 Epoch: 17 Global Step: 91240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:57:52,893-Speed 10525.04 samples/sec Loss 2.3607 LearningRate 0.0135 
Epoch: 17 Global Step: 91250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:58:00,676-Speed 10527.33 samples/sec Loss 2.3256 LearningRate 0.0135 Epoch: 17 Global Step: 91260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:58:08,464-Speed 10520.07 samples/sec Loss 2.3611 LearningRate 0.0134 Epoch: 17 Global Step: 91270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:58:16,260-Speed 10510.84 samples/sec Loss 2.3506 LearningRate 0.0134 Epoch: 17 Global Step: 91280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:58:24,031-Speed 10542.69 samples/sec Loss 2.3497 LearningRate 0.0134 Epoch: 17 Global Step: 91290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:58:31,822-Speed 10517.49 samples/sec Loss 2.3509 LearningRate 0.0134 Epoch: 17 Global Step: 91300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:58:39,671-Speed 10438.37 samples/sec Loss 2.3411 LearningRate 0.0133 Epoch: 17 Global Step: 91310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:58:47,481-Speed 10489.18 samples/sec Loss 2.3209 LearningRate 0.0133 Epoch: 17 Global Step: 91320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 10:58:55,288-Speed 10494.30 samples/sec Loss 2.3354 LearningRate 0.0133 Epoch: 17 Global Step: 91330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:59:03,082-Speed 10512.63 samples/sec Loss 2.3388 LearningRate 0.0133 Epoch: 17 Global Step: 91340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:59:10,892-Speed 10490.46 samples/sec Loss 2.3409 LearningRate 0.0133 Epoch: 17 Global Step: 91350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:59:18,719-Speed 10467.83 samples/sec Loss 2.3325 LearningRate 0.0132 Epoch: 17 Global Step: 91360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:59:26,516-Speed 10508.91 samples/sec Loss 2.3121 LearningRate 0.0132 Epoch: 17 Global Step: 91370 Fp16 Grad 
Scale: 65536 Required: 3 hours Training: 2022-01-16 10:59:34,305-Speed 10519.67 samples/sec Loss 2.3270 LearningRate 0.0132 Epoch: 17 Global Step: 91380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:59:42,078-Speed 10539.76 samples/sec Loss 2.3307 LearningRate 0.0132 Epoch: 17 Global Step: 91390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:59:49,851-Speed 10541.80 samples/sec Loss 2.3346 LearningRate 0.0132 Epoch: 17 Global Step: 91400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 10:59:57,639-Speed 10519.17 samples/sec Loss 2.3384 LearningRate 0.0131 Epoch: 17 Global Step: 91410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:00:05,416-Speed 10534.42 samples/sec Loss 2.3028 LearningRate 0.0131 Epoch: 17 Global Step: 91420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:00:13,211-Speed 10511.00 samples/sec Loss 2.3207 LearningRate 0.0131 Epoch: 17 Global Step: 91430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:00:21,013-Speed 10501.97 samples/sec Loss 2.3374 LearningRate 0.0131 Epoch: 17 Global Step: 91440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:00:28,799-Speed 10523.00 samples/sec Loss 2.3072 LearningRate 0.0130 Epoch: 17 Global Step: 91450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:00:36,597-Speed 10506.17 samples/sec Loss 2.3361 LearningRate 0.0130 Epoch: 17 Global Step: 91460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:00:44,425-Speed 10469.51 samples/sec Loss 2.3025 LearningRate 0.0130 Epoch: 17 Global Step: 91470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:00:52,204-Speed 10533.47 samples/sec Loss 2.3430 LearningRate 0.0130 Epoch: 17 Global Step: 91480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:01:00,026-Speed 10472.97 samples/sec Loss 2.2978 LearningRate 0.0130 Epoch: 17 Global Step: 91490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-01-16 11:01:07,818-Speed 10514.21 samples/sec Loss 2.2976 LearningRate 0.0129 Epoch: 17 Global Step: 91500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:01:15,602-Speed 10526.42 samples/sec Loss 2.3223 LearningRate 0.0129 Epoch: 17 Global Step: 91510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:01:23,388-Speed 10523.93 samples/sec Loss 2.3260 LearningRate 0.0129 Epoch: 17 Global Step: 91520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:01:31,206-Speed 10479.13 samples/sec Loss 2.3302 LearningRate 0.0129 Epoch: 17 Global Step: 91530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 11:01:38,997-Speed 10519.35 samples/sec Loss 2.3109 LearningRate 0.0129 Epoch: 17 Global Step: 91540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:01:46,798-Speed 10502.74 samples/sec Loss 2.3278 LearningRate 0.0128 Epoch: 17 Global Step: 91550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:01:54,628-Speed 10463.57 samples/sec Loss 2.3301 LearningRate 0.0128 Epoch: 17 Global Step: 91560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:02:02,418-Speed 10517.56 samples/sec Loss 2.2973 LearningRate 0.0128 Epoch: 17 Global Step: 91570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:02:10,221-Speed 10498.91 samples/sec Loss 2.2991 LearningRate 0.0128 Epoch: 17 Global Step: 91580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:02:18,003-Speed 10529.16 samples/sec Loss 2.2943 LearningRate 0.0127 Epoch: 17 Global Step: 91590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:02:25,794-Speed 10516.68 samples/sec Loss 2.2933 LearningRate 0.0127 Epoch: 17 Global Step: 91600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:02:33,584-Speed 10517.17 samples/sec Loss 2.3046 LearningRate 0.0127 Epoch: 17 Global Step: 91610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:02:41,381-Speed 10508.21 
samples/sec Loss 2.3164 LearningRate 0.0127 Epoch: 17 Global Step: 91620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:02:49,173-Speed 10518.58 samples/sec Loss 2.3016 LearningRate 0.0127 Epoch: 17 Global Step: 91630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:02:56,998-Speed 10469.58 samples/sec Loss 2.3053 LearningRate 0.0126 Epoch: 17 Global Step: 91640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:03:04,774-Speed 10537.24 samples/sec Loss 2.3216 LearningRate 0.0126 Epoch: 17 Global Step: 91650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:03:12,564-Speed 10519.87 samples/sec Loss 2.2818 LearningRate 0.0126 Epoch: 17 Global Step: 91660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:03:20,408-Speed 10446.31 samples/sec Loss 2.2999 LearningRate 0.0126 Epoch: 17 Global Step: 91670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:03:28,203-Speed 10509.71 samples/sec Loss 2.2950 LearningRate 0.0126 Epoch: 17 Global Step: 91680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:03:36,003-Speed 10502.97 samples/sec Loss 2.2739 LearningRate 0.0125 Epoch: 17 Global Step: 91690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:03:43,824-Speed 10476.65 samples/sec Loss 2.3099 LearningRate 0.0125 Epoch: 17 Global Step: 91700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:03:51,604-Speed 10531.71 samples/sec Loss 2.3116 LearningRate 0.0125 Epoch: 17 Global Step: 91710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:03:59,463-Speed 10424.03 samples/sec Loss 2.3037 LearningRate 0.0125 Epoch: 17 Global Step: 91720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:04:07,265-Speed 10501.16 samples/sec Loss 2.3256 LearningRate 0.0125 Epoch: 17 Global Step: 91730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:04:15,088-Speed 10474.00 samples/sec Loss 2.2874 LearningRate 0.0124 
Epoch: 17 Global Step: 91740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 11:04:22,865-Speed 10535.72 samples/sec Loss 2.2800 LearningRate 0.0124 Epoch: 17 Global Step: 91750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:04:30,654-Speed 10519.72 samples/sec Loss 2.2733 LearningRate 0.0124 Epoch: 17 Global Step: 91760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:04:38,449-Speed 10510.30 samples/sec Loss 2.3020 LearningRate 0.0124 Epoch: 17 Global Step: 91770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:04:46,260-Speed 10488.23 samples/sec Loss 2.2669 LearningRate 0.0124 Epoch: 17 Global Step: 91780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:04:54,072-Speed 10488.35 samples/sec Loss 2.2822 LearningRate 0.0123 Epoch: 17 Global Step: 91790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:05:01,873-Speed 10502.93 samples/sec Loss 2.2735 LearningRate 0.0123 Epoch: 17 Global Step: 91800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:05:09,686-Speed 10486.06 samples/sec Loss 2.2835 LearningRate 0.0123 Epoch: 17 Global Step: 91810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:05:17,479-Speed 10514.01 samples/sec Loss 2.2644 LearningRate 0.0123 Epoch: 17 Global Step: 91820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:05:25,274-Speed 10511.15 samples/sec Loss 2.2823 LearningRate 0.0122 Epoch: 17 Global Step: 91830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:05:33,046-Speed 10541.52 samples/sec Loss 2.2654 LearningRate 0.0122 Epoch: 17 Global Step: 91840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:05:40,850-Speed 10498.79 samples/sec Loss 2.2807 LearningRate 0.0122 Epoch: 17 Global Step: 91850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:05:48,641-Speed 10515.77 samples/sec Loss 2.2799 LearningRate 0.0122 Epoch: 17 Global Step: 91860 Fp16 Grad 
Scale: 32768 Required: 3 hours Training: 2022-01-16 11:05:56,452-Speed 10488.04 samples/sec Loss 2.2713 LearningRate 0.0122 Epoch: 17 Global Step: 91870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:06:04,248-Speed 10510.20 samples/sec Loss 2.2435 LearningRate 0.0121 Epoch: 17 Global Step: 91880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:06:12,079-Speed 10462.09 samples/sec Loss 2.2765 LearningRate 0.0121 Epoch: 17 Global Step: 91890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:06:19,889-Speed 10491.28 samples/sec Loss 2.2696 LearningRate 0.0121 Epoch: 17 Global Step: 91900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:06:27,697-Speed 10492.22 samples/sec Loss 2.2596 LearningRate 0.0121 Epoch: 17 Global Step: 91910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:06:35,510-Speed 10487.18 samples/sec Loss 2.2754 LearningRate 0.0121 Epoch: 17 Global Step: 91920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:06:43,304-Speed 10512.76 samples/sec Loss 2.2652 LearningRate 0.0120 Epoch: 17 Global Step: 91930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:06:51,134-Speed 10464.16 samples/sec Loss 2.2747 LearningRate 0.0120 Epoch: 17 Global Step: 91940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:06:58,951-Speed 10481.70 samples/sec Loss 2.2975 LearningRate 0.0120 Epoch: 17 Global Step: 91950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:07:06,752-Speed 10504.89 samples/sec Loss 2.2415 LearningRate 0.0120 Epoch: 17 Global Step: 91960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:07:14,567-Speed 10483.72 samples/sec Loss 2.2483 LearningRate 0.0120 Epoch: 17 Global Step: 91970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:07:22,410-Speed 10446.22 samples/sec Loss 2.2473 LearningRate 0.0119 Epoch: 17 Global Step: 91980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-01-16 11:07:30,230-Speed 10477.82 samples/sec Loss 2.2547 LearningRate 0.0119 Epoch: 17 Global Step: 91990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-01-16 11:07:38,012-Speed 10527.53 samples/sec Loss 2.2695 LearningRate 0.0119 Epoch: 17 Global Step: 92000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:07:45,818-Speed 10495.92 samples/sec Loss 2.2307 LearningRate 0.0119 Epoch: 17 Global Step: 92010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:07:53,664-Speed 10444.54 samples/sec Loss 2.2702 LearningRate 0.0119 Epoch: 17 Global Step: 92020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-01-16 11:08:01,445-Speed 10530.04 samples/sec Loss 2.2510 LearningRate 0.0118 Epoch: 17 Global Step: 92030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:08:09,246-Speed 10501.92 samples/sec Loss 2.2517 LearningRate 0.0118 Epoch: 17 Global Step: 92040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:08:17,069-Speed 10474.52 samples/sec Loss 2.2462 LearningRate 0.0118 Epoch: 17 Global Step: 92050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:08:24,924-Speed 10430.53 samples/sec Loss 2.1995 LearningRate 0.0118 Epoch: 17 Global Step: 92060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:08:32,723-Speed 10506.21 samples/sec Loss 2.2429 LearningRate 0.0118 Epoch: 17 Global Step: 92070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:08:40,528-Speed 10499.14 samples/sec Loss 2.2357 LearningRate 0.0117 Epoch: 17 Global Step: 92080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:08:48,310-Speed 10527.10 samples/sec Loss 2.2412 LearningRate 0.0117 Epoch: 17 Global Step: 92090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:08:56,103-Speed 10513.14 samples/sec Loss 2.2484 LearningRate 0.0117 Epoch: 17 Global Step: 92100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:09:03,886-Speed 10527.02 
samples/sec Loss 2.2583 LearningRate 0.0117 Epoch: 17 Global Step: 92110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:09:11,665-Speed 10533.07 samples/sec Loss 2.2286 LearningRate 0.0117 Epoch: 17 Global Step: 92120 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-16 11:09:19,468-Speed 10499.18 samples/sec Loss 2.2502 LearningRate 0.0116 Epoch: 17 Global Step: 92130 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-16 11:09:27,286-Speed 10479.76 samples/sec Loss 2.2365 LearningRate 0.0116 Epoch: 17 Global Step: 92140 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-16 11:09:35,073-Speed 10522.73 samples/sec Loss 2.2294 LearningRate 0.0116 Epoch: 17 Global Step: 92150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-16 11:09:42,863-Speed 10525.20 samples/sec Loss 2.2417 LearningRate 0.0116 Epoch: 17 Global Step: 92160 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-16 11:09:50,645-Speed 10528.06 samples/sec Loss 2.2514 LearningRate 0.0116 Epoch: 17 Global Step: 92170 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-16 11:09:58,427-Speed 10527.39 samples/sec Loss 2.2252 LearningRate 0.0115 Epoch: 17 Global Step: 92180 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-16 11:10:06,242-Speed 10483.99 samples/sec Loss 2.2405 LearningRate 0.0115 Epoch: 17 Global Step: 92190 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-16 11:10:14,068-Speed 10469.67 samples/sec Loss 2.2529 LearningRate 0.0115 Epoch: 17 Global Step: 92200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-16 11:10:21,891-Speed 10472.65 samples/sec Loss 2.2504 LearningRate 0.0115 Epoch: 17 Global Step: 92210 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-01-16 11:10:29,695-Speed 10498.51 samples/sec Loss 2.2364 LearningRate 0.0115 Epoch: 17 Global Step: 92220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:10:37,499-Speed 10501.87 samples/sec Loss 2.2282 LearningRate 0.0114 
Epoch: 17 Global Step: 92230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:10:45,306-Speed 10495.58 samples/sec Loss 2.2491 LearningRate 0.0114 Epoch: 17 Global Step: 92240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:10:53,091-Speed 10524.17 samples/sec Loss 2.2179 LearningRate 0.0114 Epoch: 17 Global Step: 92250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:11:00,879-Speed 10519.66 samples/sec Loss 2.2171 LearningRate 0.0114 Epoch: 17 Global Step: 92260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:11:08,674-Speed 10510.95 samples/sec Loss 2.2442 LearningRate 0.0114 Epoch: 17 Global Step: 92270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:11:16,513-Speed 10451.76 samples/sec Loss 2.2326 LearningRate 0.0113 Epoch: 17 Global Step: 92280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:11:24,327-Speed 10484.85 samples/sec Loss 2.2577 LearningRate 0.0113 Epoch: 17 Global Step: 92290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:11:32,130-Speed 10499.90 samples/sec Loss 2.2497 LearningRate 0.0113 Epoch: 17 Global Step: 92300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:11:39,936-Speed 10496.42 samples/sec Loss 2.2217 LearningRate 0.0113 Epoch: 17 Global Step: 92310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-01-16 11:11:47,729-Speed 10513.65 samples/sec Loss 2.2043 LearningRate 0.0113 Epoch: 17 Global Step: 92320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:11:55,532-Speed 10499.57 samples/sec Loss 2.1998 LearningRate 0.0112 Epoch: 17 Global Step: 92330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:12:03,335-Speed 10500.66 samples/sec Loss 2.1954 LearningRate 0.0112 Epoch: 17 Global Step: 92340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:12:11,127-Speed 10515.19 samples/sec Loss 2.2285 LearningRate 0.0112 Epoch: 17 Global Step: 92350 Fp16 Grad 
Scale: 65536 Required: 2 hours Training: 2022-01-16 11:12:18,924-Speed 10507.17 samples/sec Loss 2.2133 LearningRate 0.0112 Epoch: 17 Global Step: 92360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:12:26,725-Speed 10503.02 samples/sec Loss 2.2218 LearningRate 0.0112 Epoch: 17 Global Step: 92370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:12:34,524-Speed 10504.62 samples/sec Loss 2.2196 LearningRate 0.0111 Epoch: 17 Global Step: 92380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:12:42,322-Speed 10507.37 samples/sec Loss 2.2062 LearningRate 0.0111 Epoch: 17 Global Step: 92390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:12:50,121-Speed 10504.89 samples/sec Loss 2.2213 LearningRate 0.0111 Epoch: 17 Global Step: 92400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:12:57,912-Speed 10515.23 samples/sec Loss 2.2206 LearningRate 0.0111 Epoch: 17 Global Step: 92410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:13:05,696-Speed 10532.86 samples/sec Loss 2.2126 LearningRate 0.0111 Epoch: 17 Global Step: 92420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:13:13,487-Speed 10517.95 samples/sec Loss 2.2088 LearningRate 0.0110 Epoch: 17 Global Step: 92430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:13:21,289-Speed 10500.35 samples/sec Loss 2.2191 LearningRate 0.0110 Epoch: 17 Global Step: 92440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:13:29,099-Speed 10490.06 samples/sec Loss 2.2109 LearningRate 0.0110 Epoch: 17 Global Step: 92450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:13:36,965-Speed 10415.58 samples/sec Loss 2.2099 LearningRate 0.0110 Epoch: 17 Global Step: 92460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:13:44,881-Speed 10350.70 samples/sec Loss 2.1875 LearningRate 0.0110 Epoch: 17 Global Step: 92470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-01-16 11:13:52,679-Speed 10506.57 samples/sec Loss 2.1948 LearningRate 0.0109 Epoch: 17 Global Step: 92480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:14:00,478-Speed 10504.53 samples/sec Loss 2.1542 LearningRate 0.0109 Epoch: 17 Global Step: 92490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:14:08,251-Speed 10540.72 samples/sec Loss 2.1847 LearningRate 0.0109 Epoch: 17 Global Step: 92500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:14:16,048-Speed 10508.38 samples/sec Loss 2.1954 LearningRate 0.0109 Epoch: 17 Global Step: 92510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:14:23,831-Speed 10527.89 samples/sec Loss 2.1696 LearningRate 0.0109 Epoch: 17 Global Step: 92520 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-16 11:14:31,671-Speed 10451.82 samples/sec Loss 2.2333 LearningRate 0.0108 Epoch: 17 Global Step: 92530 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-16 11:14:39,471-Speed 10503.85 samples/sec Loss 2.1789 LearningRate 0.0108 Epoch: 17 Global Step: 92540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:14:47,289-Speed 10479.80 samples/sec Loss 2.2024 LearningRate 0.0108 Epoch: 17 Global Step: 92550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:14:55,109-Speed 10478.02 samples/sec Loss 2.2085 LearningRate 0.0108 Epoch: 17 Global Step: 92560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:15:02,901-Speed 10513.72 samples/sec Loss 2.2072 LearningRate 0.0108 Epoch: 17 Global Step: 92570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:15:10,739-Speed 10454.19 samples/sec Loss 2.1880 LearningRate 0.0107 Epoch: 17 Global Step: 92580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:15:18,534-Speed 10511.16 samples/sec Loss 2.2152 LearningRate 0.0107 Epoch: 17 Global Step: 92590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:15:26,313-Speed 10532.39 
samples/sec Loss 2.1938 LearningRate 0.0107 Epoch: 17 Global Step: 92600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:15:34,091-Speed 10535.68 samples/sec Loss 2.1933 LearningRate 0.0107 Epoch: 17 Global Step: 92610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:15:41,897-Speed 10495.19 samples/sec Loss 2.1880 LearningRate 0.0107 Epoch: 17 Global Step: 92620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:15:49,696-Speed 10506.25 samples/sec Loss 2.1987 LearningRate 0.0106 Epoch: 17 Global Step: 92630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:15:57,501-Speed 10496.60 samples/sec Loss 2.2085 LearningRate 0.0106 Epoch: 17 Global Step: 92640 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-16 11:16:05,311-Speed 10490.06 samples/sec Loss 2.1870 LearningRate 0.0106 Epoch: 17 Global Step: 92650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:16:13,102-Speed 10517.90 samples/sec Loss 2.1790 LearningRate 0.0106 Epoch: 17 Global Step: 92660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:16:20,887-Speed 10524.27 samples/sec Loss 2.1739 LearningRate 0.0106 Epoch: 17 Global Step: 92670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:16:28,648-Speed 10556.06 samples/sec Loss 2.1753 LearningRate 0.0106 Epoch: 17 Global Step: 92680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:16:36,451-Speed 10501.18 samples/sec Loss 2.1547 LearningRate 0.0105 Epoch: 17 Global Step: 92690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:16:44,294-Speed 10446.14 samples/sec Loss 2.1541 LearningRate 0.0105 Epoch: 17 Global Step: 92700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:16:52,071-Speed 10535.59 samples/sec Loss 2.1863 LearningRate 0.0105 Epoch: 17 Global Step: 92710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:16:59,865-Speed 10511.46 samples/sec Loss 2.1723 LearningRate 
0.0105 Epoch: 17 Global Step: 92720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:17:07,664-Speed 10505.12 samples/sec Loss 2.1621 LearningRate 0.0105 Epoch: 17 Global Step: 92730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:17:15,448-Speed 10525.15 samples/sec Loss 2.1768 LearningRate 0.0104 Epoch: 17 Global Step: 92740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:17:23,255-Speed 10494.52 samples/sec Loss 2.1501 LearningRate 0.0104 Epoch: 17 Global Step: 92750 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-16 11:17:31,063-Speed 10492.67 samples/sec Loss 2.1764 LearningRate 0.0104 Epoch: 17 Global Step: 92760 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-16 11:17:38,837-Speed 10540.34 samples/sec Loss 2.1677 LearningRate 0.0104 Epoch: 17 Global Step: 92770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:17:46,609-Speed 10542.25 samples/sec Loss 2.1678 LearningRate 0.0104 Epoch: 17 Global Step: 92780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:17:54,393-Speed 10525.09 samples/sec Loss 2.1805 LearningRate 0.0103 Epoch: 17 Global Step: 92790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:18:02,194-Speed 10502.32 samples/sec Loss 2.2089 LearningRate 0.0103 Epoch: 17 Global Step: 92800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:18:09,987-Speed 10513.24 samples/sec Loss 2.1762 LearningRate 0.0103 Epoch: 17 Global Step: 92810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:18:17,760-Speed 10539.92 samples/sec Loss 2.1509 LearningRate 0.0103 Epoch: 17 Global Step: 92820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:18:25,533-Speed 10540.43 samples/sec Loss 2.1656 LearningRate 0.0103 Epoch: 17 Global Step: 92830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:18:33,321-Speed 10520.92 samples/sec Loss 2.1577 LearningRate 0.0102 Epoch: 17 Global Step: 92840 Fp16 
Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:18:41,120-Speed 10504.94 samples/sec Loss 2.1634 LearningRate 0.0102 Epoch: 17 Global Step: 92850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:18:48,926-Speed 10495.65 samples/sec Loss 2.1512 LearningRate 0.0102 Epoch: 17 Global Step: 92860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:18:56,737-Speed 10489.65 samples/sec Loss 2.1776 LearningRate 0.0102 Epoch: 17 Global Step: 92870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:19:04,527-Speed 10517.64 samples/sec Loss 2.1581 LearningRate 0.0102 Epoch: 17 Global Step: 92880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:19:12,315-Speed 10520.23 samples/sec Loss 2.1469 LearningRate 0.0102 Epoch: 17 Global Step: 92890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:19:20,090-Speed 10537.63 samples/sec Loss 2.1510 LearningRate 0.0101 Epoch: 17 Global Step: 92900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:19:27,898-Speed 10495.75 samples/sec Loss 2.1429 LearningRate 0.0101 Epoch: 17 Global Step: 92910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:19:35,683-Speed 10523.69 samples/sec Loss 2.1239 LearningRate 0.0101 Epoch: 17 Global Step: 92920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:19:43,488-Speed 10497.83 samples/sec Loss 2.1598 LearningRate 0.0101 Epoch: 17 Global Step: 92930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:19:51,336-Speed 10439.81 samples/sec Loss 2.1643 LearningRate 0.0101 Epoch: 17 Global Step: 92940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:19:59,116-Speed 10531.76 samples/sec Loss 2.1499 LearningRate 0.0100 Epoch: 17 Global Step: 92950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:20:06,913-Speed 10508.73 samples/sec Loss 2.1476 LearningRate 0.0100 Epoch: 17 Global Step: 92960 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-01-16 11:20:14,699-Speed 10523.56 samples/sec Loss 2.1533 LearningRate 0.0100 Epoch: 17 Global Step: 92970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:20:22,510-Speed 10491.00 samples/sec Loss 2.1480 LearningRate 0.0100 Epoch: 17 Global Step: 92980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:20:30,299-Speed 10519.45 samples/sec Loss 2.1509 LearningRate 0.0100 Epoch: 17 Global Step: 92990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:20:38,103-Speed 10498.16 samples/sec Loss 2.1199 LearningRate 0.0099 Epoch: 17 Global Step: 93000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:20:45,931-Speed 10466.54 samples/sec Loss 2.1419 LearningRate 0.0099 Epoch: 17 Global Step: 93010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:20:53,712-Speed 10530.42 samples/sec Loss 2.1442 LearningRate 0.0099 Epoch: 17 Global Step: 93020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:21:01,501-Speed 10518.33 samples/sec Loss 2.1410 LearningRate 0.0099 Epoch: 17 Global Step: 93030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:21:09,316-Speed 10484.58 samples/sec Loss 2.1573 LearningRate 0.0099 Epoch: 17 Global Step: 93040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:21:17,117-Speed 10502.82 samples/sec Loss 2.1267 LearningRate 0.0099 Epoch: 17 Global Step: 93050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:21:24,926-Speed 10491.13 samples/sec Loss 2.1380 LearningRate 0.0098 Epoch: 17 Global Step: 93060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:21:32,719-Speed 10513.26 samples/sec Loss 2.1329 LearningRate 0.0098 Epoch: 17 Global Step: 93070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:21:40,510-Speed 10516.84 samples/sec Loss 2.1295 LearningRate 0.0098 Epoch: 17 Global Step: 93080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:21:48,290-Speed 
10530.60 samples/sec Loss 2.1066 LearningRate 0.0098 Epoch: 17 Global Step: 93090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:21:56,096-Speed 10496.01 samples/sec Loss 2.1256 LearningRate 0.0098 Epoch: 17 Global Step: 93100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:22:03,908-Speed 10487.57 samples/sec Loss 2.1451 LearningRate 0.0097 Epoch: 17 Global Step: 93110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:22:11,730-Speed 10479.34 samples/sec Loss 2.1371 LearningRate 0.0097 Epoch: 17 Global Step: 93120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:22:19,546-Speed 10482.56 samples/sec Loss 2.1343 LearningRate 0.0097 Epoch: 17 Global Step: 93130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:22:27,368-Speed 10473.80 samples/sec Loss 2.1203 LearningRate 0.0097 Epoch: 17 Global Step: 93140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:22:35,161-Speed 10514.34 samples/sec Loss 2.1358 LearningRate 0.0097 Epoch: 17 Global Step: 93150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:22:42,943-Speed 10528.69 samples/sec Loss 2.1232 LearningRate 0.0097 Epoch: 17 Global Step: 93160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:22:50,762-Speed 10477.82 samples/sec Loss 2.1465 LearningRate 0.0096 Epoch: 17 Global Step: 93170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:22:58,543-Speed 10528.65 samples/sec Loss 2.1363 LearningRate 0.0096 Epoch: 17 Global Step: 93180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:23:06,351-Speed 10494.07 samples/sec Loss 2.1483 LearningRate 0.0096 Epoch: 17 Global Step: 93190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:23:14,133-Speed 10528.00 samples/sec Loss 2.1195 LearningRate 0.0096 Epoch: 17 Global Step: 93200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:23:21,953-Speed 10477.27 samples/sec Loss 2.1149 
LearningRate 0.0096 Epoch: 17 Global Step: 93210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:23:29,746-Speed 10513.67 samples/sec Loss 2.1238 LearningRate 0.0095 Epoch: 17 Global Step: 93220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:23:37,534-Speed 10520.16 samples/sec Loss 2.1201 LearningRate 0.0095 Epoch: 17 Global Step: 93230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:23:45,323-Speed 10519.36 samples/sec Loss 2.1114 LearningRate 0.0095 Epoch: 17 Global Step: 93240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:23:53,141-Speed 10479.35 samples/sec Loss 2.1032 LearningRate 0.0095 Epoch: 17 Global Step: 93250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:24:00,972-Speed 10462.26 samples/sec Loss 2.1176 LearningRate 0.0095 Epoch: 17 Global Step: 93260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:24:08,779-Speed 10495.36 samples/sec Loss 2.1173 LearningRate 0.0095 Epoch: 17 Global Step: 93270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:24:16,588-Speed 10492.06 samples/sec Loss 2.1155 LearningRate 0.0094 Epoch: 17 Global Step: 93280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:24:24,388-Speed 10503.94 samples/sec Loss 2.1178 LearningRate 0.0094 Epoch: 17 Global Step: 93290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:24:32,177-Speed 10518.93 samples/sec Loss 2.1351 LearningRate 0.0094 Epoch: 17 Global Step: 93300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:24:39,988-Speed 10490.20 samples/sec Loss 2.1333 LearningRate 0.0094 Epoch: 17 Global Step: 93310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:24:47,793-Speed 10497.70 samples/sec Loss 2.1126 LearningRate 0.0094 Epoch: 17 Global Step: 93320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:25:10,502-Speed 3607.46 samples/sec Loss 2.1130 LearningRate 0.0093 Epoch: 18 Global Step: 
93330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:25:18,278-Speed 10536.71 samples/sec Loss 2.1021 LearningRate 0.0093 Epoch: 18 Global Step: 93340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:25:26,048-Speed 10546.45 samples/sec Loss 2.1271 LearningRate 0.0093 Epoch: 18 Global Step: 93350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:25:33,831-Speed 10525.62 samples/sec Loss 2.0976 LearningRate 0.0093 Epoch: 18 Global Step: 93360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:25:41,616-Speed 10524.40 samples/sec Loss 2.1074 LearningRate 0.0093 Epoch: 18 Global Step: 93370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:25:49,386-Speed 10544.41 samples/sec Loss 2.0954 LearningRate 0.0093 Epoch: 18 Global Step: 93380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:25:57,172-Speed 10522.41 samples/sec Loss 2.0894 LearningRate 0.0092 Epoch: 18 Global Step: 93390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:26:04,963-Speed 10516.23 samples/sec Loss 2.1136 LearningRate 0.0092 Epoch: 18 Global Step: 93400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:26:12,782-Speed 10478.84 samples/sec Loss 2.1019 LearningRate 0.0092 Epoch: 18 Global Step: 93410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:26:20,592-Speed 10490.55 samples/sec Loss 2.0913 LearningRate 0.0092 Epoch: 18 Global Step: 93420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:26:28,391-Speed 10505.51 samples/sec Loss 2.0710 LearningRate 0.0092 Epoch: 18 Global Step: 93430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:26:36,170-Speed 10531.06 samples/sec Loss 2.0746 LearningRate 0.0091 Epoch: 18 Global Step: 93440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:26:43,954-Speed 10527.36 samples/sec Loss 2.1291 LearningRate 0.0091 Epoch: 18 Global Step: 93450 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-01-16 11:26:51,740-Speed 10522.27 samples/sec Loss 2.1121 LearningRate 0.0091 Epoch: 18 Global Step: 93460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:26:59,529-Speed 10522.24 samples/sec Loss 2.0729 LearningRate 0.0091 Epoch: 18 Global Step: 93470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:27:07,321-Speed 10515.19 samples/sec Loss 2.0894 LearningRate 0.0091 Epoch: 18 Global Step: 93480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:27:15,102-Speed 10533.14 samples/sec Loss 2.0664 LearningRate 0.0091 Epoch: 18 Global Step: 93490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:27:22,854-Speed 10568.60 samples/sec Loss 2.0738 LearningRate 0.0090 Epoch: 18 Global Step: 93500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:27:30,633-Speed 10534.94 samples/sec Loss 2.0648 LearningRate 0.0090 Epoch: 18 Global Step: 93510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:27:38,401-Speed 10546.80 samples/sec Loss 2.0879 LearningRate 0.0090 Epoch: 18 Global Step: 93520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:27:46,171-Speed 10544.88 samples/sec Loss 2.0858 LearningRate 0.0090 Epoch: 18 Global Step: 93530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:27:53,989-Speed 10479.26 samples/sec Loss 2.0904 LearningRate 0.0090 Epoch: 18 Global Step: 93540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:28:01,783-Speed 10512.01 samples/sec Loss 2.0726 LearningRate 0.0089 Epoch: 18 Global Step: 93550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:28:09,580-Speed 10507.54 samples/sec Loss 2.0837 LearningRate 0.0089 Epoch: 18 Global Step: 93560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:28:17,381-Speed 10504.19 samples/sec Loss 2.0841 LearningRate 0.0089 Epoch: 18 Global Step: 93570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 
11:28:25,157-Speed 10536.96 samples/sec Loss 2.0925 LearningRate 0.0089 Epoch: 18 Global Step: 93580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:28:32,953-Speed 10508.47 samples/sec Loss 2.0831 LearningRate 0.0089 Epoch: 18 Global Step: 93590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:28:40,728-Speed 10537.38 samples/sec Loss 2.0909 LearningRate 0.0089 Epoch: 18 Global Step: 93600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:28:48,549-Speed 10476.01 samples/sec Loss 2.0783 LearningRate 0.0088 Epoch: 18 Global Step: 93610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:28:56,333-Speed 10526.87 samples/sec Loss 2.0778 LearningRate 0.0088 Epoch: 18 Global Step: 93620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:29:04,104-Speed 10542.24 samples/sec Loss 2.1039 LearningRate 0.0088 Epoch: 18 Global Step: 93630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:29:11,896-Speed 10515.28 samples/sec Loss 2.0587 LearningRate 0.0088 Epoch: 18 Global Step: 93640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:29:19,662-Speed 10548.64 samples/sec Loss 2.0644 LearningRate 0.0088 Epoch: 18 Global Step: 93650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:29:27,444-Speed 10528.59 samples/sec Loss 2.0835 LearningRate 0.0088 Epoch: 18 Global Step: 93660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:29:35,222-Speed 10533.53 samples/sec Loss 2.0833 LearningRate 0.0087 Epoch: 18 Global Step: 93670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:29:42,982-Speed 10558.91 samples/sec Loss 2.0629 LearningRate 0.0087 Epoch: 18 Global Step: 93680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:29:50,751-Speed 10544.93 samples/sec Loss 2.0697 LearningRate 0.0087 Epoch: 18 Global Step: 93690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:29:58,534-Speed 10527.65 samples/sec 
Loss 2.0906 LearningRate 0.0087 Epoch: 18 Global Step: 93700 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-16 11:30:06,310-Speed 10536.54 samples/sec Loss 2.0619 LearningRate 0.0087 Epoch: 18 Global Step: 93710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:30:14,099-Speed 10517.91 samples/sec Loss 2.0752 LearningRate 0.0087 Epoch: 18 Global Step: 93720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:30:21,887-Speed 10520.21 samples/sec Loss 2.0645 LearningRate 0.0086 Epoch: 18 Global Step: 93730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:30:29,698-Speed 10489.89 samples/sec Loss 2.0543 LearningRate 0.0086 Epoch: 18 Global Step: 93740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:30:37,514-Speed 10482.69 samples/sec Loss 2.0682 LearningRate 0.0086 Epoch: 18 Global Step: 93750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:30:45,367-Speed 10432.82 samples/sec Loss 2.0666 LearningRate 0.0086 Epoch: 18 Global Step: 93760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:30:53,148-Speed 10530.04 samples/sec Loss 2.0455 LearningRate 0.0086 Epoch: 18 Global Step: 93770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:31:01,000-Speed 10433.65 samples/sec Loss 2.0712 LearningRate 0.0085 Epoch: 18 Global Step: 93780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:31:08,882-Speed 10394.97 samples/sec Loss 2.0533 LearningRate 0.0085 Epoch: 18 Global Step: 93790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:31:16,712-Speed 10463.97 samples/sec Loss 2.0528 LearningRate 0.0085 Epoch: 18 Global Step: 93800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:31:24,564-Speed 10434.84 samples/sec Loss 2.0795 LearningRate 0.0085 Epoch: 18 Global Step: 93810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:31:32,389-Speed 10471.84 samples/sec Loss 2.0688 LearningRate 0.0085 Epoch: 18 
Global Step: 93820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:31:40,212-Speed 10474.31 samples/sec Loss 2.0485 LearningRate 0.0085 Epoch: 18 Global Step: 93830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:31:48,041-Speed 10465.08 samples/sec Loss 2.0509 LearningRate 0.0084 Epoch: 18 Global Step: 93840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:31:55,908-Speed 10413.05 samples/sec Loss 2.0567 LearningRate 0.0084 Epoch: 18 Global Step: 93850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:32:03,735-Speed 10468.71 samples/sec Loss 2.0637 LearningRate 0.0084 Epoch: 18 Global Step: 93860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:32:11,556-Speed 10476.03 samples/sec Loss 2.0585 LearningRate 0.0084 Epoch: 18 Global Step: 93870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:32:19,414-Speed 10425.78 samples/sec Loss 2.0586 LearningRate 0.0084 Epoch: 18 Global Step: 93880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:32:27,250-Speed 10456.51 samples/sec Loss 2.0450 LearningRate 0.0084 Epoch: 18 Global Step: 93890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:32:35,077-Speed 10467.24 samples/sec Loss 2.0346 LearningRate 0.0083 Epoch: 18 Global Step: 93900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:32:42,887-Speed 10491.05 samples/sec Loss 2.0345 LearningRate 0.0083 Epoch: 18 Global Step: 93910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:32:50,685-Speed 10510.97 samples/sec Loss 2.0511 LearningRate 0.0083 Epoch: 18 Global Step: 93920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:32:58,533-Speed 10439.34 samples/sec Loss 2.0862 LearningRate 0.0083 Epoch: 18 Global Step: 93930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:33:06,337-Speed 10499.98 samples/sec Loss 2.0272 LearningRate 0.0083 Epoch: 18 Global Step: 93940 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-01-16 11:33:14,125-Speed 10519.09 samples/sec Loss 2.0523 LearningRate 0.0083 Epoch: 18 Global Step: 93950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:33:21,929-Speed 10499.49 samples/sec Loss 2.0679 LearningRate 0.0082 Epoch: 18 Global Step: 93960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:33:29,748-Speed 10477.71 samples/sec Loss 2.0141 LearningRate 0.0082 Epoch: 18 Global Step: 93970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:33:37,557-Speed 10493.24 samples/sec Loss 2.0520 LearningRate 0.0082 Epoch: 18 Global Step: 93980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:33:45,355-Speed 10505.60 samples/sec Loss 2.0028 LearningRate 0.0082 Epoch: 18 Global Step: 93990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:33:53,148-Speed 10513.62 samples/sec Loss 2.0538 LearningRate 0.0082 Epoch: 18 Global Step: 94000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:34:00,973-Speed 10470.15 samples/sec Loss 2.0272 LearningRate 0.0082 Epoch: 18 Global Step: 94010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:34:08,758-Speed 10525.69 samples/sec Loss 2.0288 LearningRate 0.0081 Epoch: 18 Global Step: 94020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:34:16,536-Speed 10532.84 samples/sec Loss 2.0444 LearningRate 0.0081 Epoch: 18 Global Step: 94030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:34:24,332-Speed 10509.12 samples/sec Loss 2.0153 LearningRate 0.0081 Epoch: 18 Global Step: 94040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:34:32,134-Speed 10503.85 samples/sec Loss 2.0122 LearningRate 0.0081 Epoch: 18 Global Step: 94050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:34:39,949-Speed 10485.23 samples/sec Loss 2.0263 LearningRate 0.0081 Epoch: 18 Global Step: 94060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 
11:34:47,769-Speed 10475.81 samples/sec Loss 2.0258 LearningRate 0.0081 Epoch: 18 Global Step: 94070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:34:55,573-Speed 10506.66 samples/sec Loss 2.0054 LearningRate 0.0080 Epoch: 18 Global Step: 94080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:35:03,395-Speed 10474.88 samples/sec Loss 2.0190 LearningRate 0.0080 Epoch: 18 Global Step: 94090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:35:11,193-Speed 10506.56 samples/sec Loss 2.0191 LearningRate 0.0080 Epoch: 18 Global Step: 94100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:35:18,966-Speed 10540.17 samples/sec Loss 2.0036 LearningRate 0.0080 Epoch: 18 Global Step: 94110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:35:26,752-Speed 10523.30 samples/sec Loss 2.0009 LearningRate 0.0080 Epoch: 18 Global Step: 94120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:35:34,521-Speed 10545.12 samples/sec Loss 2.0321 LearningRate 0.0080 Epoch: 18 Global Step: 94130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:35:42,314-Speed 10514.33 samples/sec Loss 2.0483 LearningRate 0.0079 Epoch: 18 Global Step: 94140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:35:50,103-Speed 10518.86 samples/sec Loss 2.0227 LearningRate 0.0079 Epoch: 18 Global Step: 94150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:35:57,905-Speed 10500.53 samples/sec Loss 2.0338 LearningRate 0.0079 Epoch: 18 Global Step: 94160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:36:05,724-Speed 10479.11 samples/sec Loss 2.0053 LearningRate 0.0079 Epoch: 18 Global Step: 94170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:36:13,515-Speed 10515.44 samples/sec Loss 2.0114 LearningRate 0.0079 Epoch: 18 Global Step: 94180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:36:21,306-Speed 10516.71 samples/sec 
Loss 1.9967 LearningRate 0.0079 Epoch: 18 Global Step: 94190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:36:29,110-Speed 10497.85 samples/sec Loss 2.0130 LearningRate 0.0078 Epoch: 18 Global Step: 94200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:36:36,912-Speed 10501.96 samples/sec Loss 2.0242 LearningRate 0.0078 Epoch: 18 Global Step: 94210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:36:44,717-Speed 10497.05 samples/sec Loss 2.0110 LearningRate 0.0078 Epoch: 18 Global Step: 94220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:36:52,513-Speed 10508.51 samples/sec Loss 2.0145 LearningRate 0.0078 Epoch: 18 Global Step: 94230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:37:00,295-Speed 10528.63 samples/sec Loss 2.0008 LearningRate 0.0078 Epoch: 18 Global Step: 94240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:37:08,122-Speed 10468.36 samples/sec Loss 1.9943 LearningRate 0.0078 Epoch: 18 Global Step: 94250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:37:15,920-Speed 10506.22 samples/sec Loss 2.0103 LearningRate 0.0077 Epoch: 18 Global Step: 94260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:37:23,735-Speed 10483.01 samples/sec Loss 2.0032 LearningRate 0.0077 Epoch: 18 Global Step: 94270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:37:31,548-Speed 10487.87 samples/sec Loss 2.0026 LearningRate 0.0077 Epoch: 18 Global Step: 94280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:37:39,347-Speed 10505.33 samples/sec Loss 1.9840 LearningRate 0.0077 Epoch: 18 Global Step: 94290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:37:47,161-Speed 10486.84 samples/sec Loss 2.0045 LearningRate 0.0077 Epoch: 18 Global Step: 94300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:37:54,964-Speed 10499.93 samples/sec Loss 1.9948 LearningRate 0.0077 Epoch: 18 
Global Step: 94310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:38:02,768-Speed 10498.31 samples/sec Loss 2.0020 LearningRate 0.0076 Epoch: 18 Global Step: 94320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:38:10,552-Speed 10526.57 samples/sec Loss 1.9984 LearningRate 0.0076 Epoch: 18 Global Step: 94330 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-16 11:38:18,352-Speed 10503.84 samples/sec Loss 1.9950 LearningRate 0.0076 Epoch: 18 Global Step: 94340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:38:26,169-Speed 10481.04 samples/sec Loss 1.9827 LearningRate 0.0076 Epoch: 18 Global Step: 94350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:38:33,952-Speed 10526.44 samples/sec Loss 2.0102 LearningRate 0.0076 Epoch: 18 Global Step: 94360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:38:41,755-Speed 10500.15 samples/sec Loss 1.9867 LearningRate 0.0076 Epoch: 18 Global Step: 94370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:38:49,542-Speed 10521.38 samples/sec Loss 1.9835 LearningRate 0.0075 Epoch: 18 Global Step: 94380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:38:57,336-Speed 10511.36 samples/sec Loss 1.9953 LearningRate 0.0075 Epoch: 18 Global Step: 94390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:39:05,169-Speed 10460.36 samples/sec Loss 1.9808 LearningRate 0.0075 Epoch: 18 Global Step: 94400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:39:12,964-Speed 10511.38 samples/sec Loss 1.9988 LearningRate 0.0075 Epoch: 18 Global Step: 94410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:39:20,743-Speed 10532.39 samples/sec Loss 1.9890 LearningRate 0.0075 Epoch: 18 Global Step: 94420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:39:28,535-Speed 10514.66 samples/sec Loss 1.9804 LearningRate 0.0075 Epoch: 18 Global Step: 94430 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-16 11:39:36,321-Speed 10522.44 samples/sec Loss 1.9934 LearningRate 0.0074 Epoch: 18 Global Step: 94440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:39:44,116-Speed 10511.13 samples/sec Loss 2.0018 LearningRate 0.0074 Epoch: 18 Global Step: 94450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:39:51,908-Speed 10515.28 samples/sec Loss 1.9910 LearningRate 0.0074 Epoch: 18 Global Step: 94460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:39:59,709-Speed 10503.02 samples/sec Loss 1.9868 LearningRate 0.0074 Epoch: 18 Global Step: 94470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:40:07,501-Speed 10515.37 samples/sec Loss 1.9911 LearningRate 0.0074 Epoch: 18 Global Step: 94480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:40:15,288-Speed 10519.91 samples/sec Loss 1.9867 LearningRate 0.0074 Epoch: 18 Global Step: 94490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:40:23,078-Speed 10518.02 samples/sec Loss 1.9901 LearningRate 0.0073 Epoch: 18 Global Step: 94500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:40:30,876-Speed 10507.79 samples/sec Loss 1.9818 LearningRate 0.0073 Epoch: 18 Global Step: 94510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:40:38,678-Speed 10500.11 samples/sec Loss 1.9850 LearningRate 0.0073 Epoch: 18 Global Step: 94520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:40:46,478-Speed 10504.08 samples/sec Loss 1.9755 LearningRate 0.0073 Epoch: 18 Global Step: 94530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:40:54,252-Speed 10539.35 samples/sec Loss 1.9689 LearningRate 0.0073 Epoch: 18 Global Step: 94540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:41:02,052-Speed 10504.22 samples/sec Loss 1.9699 LearningRate 0.0073 Epoch: 18 Global Step: 94550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 
11:41:09,839-Speed 10520.67 samples/sec Loss 1.9959 LearningRate 0.0073 Epoch: 18 Global Step: 94560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:41:17,612-Speed 10541.71 samples/sec Loss 1.9801 LearningRate 0.0072 Epoch: 18 Global Step: 94570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:41:25,420-Speed 10493.03 samples/sec Loss 1.9561 LearningRate 0.0072 Epoch: 18 Global Step: 94580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:41:33,200-Speed 10530.52 samples/sec Loss 1.9867 LearningRate 0.0072 Epoch: 18 Global Step: 94590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:41:41,006-Speed 10496.97 samples/sec Loss 1.9706 LearningRate 0.0072 Epoch: 18 Global Step: 94600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:41:48,808-Speed 10500.98 samples/sec Loss 1.9798 LearningRate 0.0072 Epoch: 18 Global Step: 94610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:41:56,624-Speed 10484.64 samples/sec Loss 1.9660 LearningRate 0.0072 Epoch: 18 Global Step: 94620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:42:04,464-Speed 10450.77 samples/sec Loss 1.9917 LearningRate 0.0071 Epoch: 18 Global Step: 94630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:42:12,301-Speed 10455.10 samples/sec Loss 1.9685 LearningRate 0.0071 Epoch: 18 Global Step: 94640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:42:20,099-Speed 10506.10 samples/sec Loss 1.9710 LearningRate 0.0071 Epoch: 18 Global Step: 94650 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-16 11:42:27,886-Speed 10525.19 samples/sec Loss 1.9808 LearningRate 0.0071 Epoch: 18 Global Step: 94660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:42:35,668-Speed 10528.95 samples/sec Loss 1.9569 LearningRate 0.0071 Epoch: 18 Global Step: 94670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:42:43,455-Speed 10521.85 samples/sec 
Loss 1.9499 LearningRate 0.0071 Epoch: 18 Global Step: 94680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:42:51,250-Speed 10513.27 samples/sec Loss 1.9595 LearningRate 0.0070 Epoch: 18 Global Step: 94690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:42:59,073-Speed 10474.22 samples/sec Loss 1.9897 LearningRate 0.0070 Epoch: 18 Global Step: 94700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:43:06,858-Speed 10523.55 samples/sec Loss 1.9620 LearningRate 0.0070 Epoch: 18 Global Step: 94710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:43:14,628-Speed 10544.72 samples/sec Loss 1.9789 LearningRate 0.0070 Epoch: 18 Global Step: 94720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:43:22,420-Speed 10514.63 samples/sec Loss 1.9440 LearningRate 0.0070 Epoch: 18 Global Step: 94730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:43:30,208-Speed 10519.88 samples/sec Loss 1.9717 LearningRate 0.0070 Epoch: 18 Global Step: 94740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:43:38,002-Speed 10512.24 samples/sec Loss 1.9525 LearningRate 0.0070 Epoch: 18 Global Step: 94750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:43:45,791-Speed 10519.21 samples/sec Loss 1.9715 LearningRate 0.0069 Epoch: 18 Global Step: 94760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:43:53,643-Speed 10434.61 samples/sec Loss 1.9557 LearningRate 0.0069 Epoch: 18 Global Step: 94770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:44:01,452-Speed 10490.80 samples/sec Loss 1.9845 LearningRate 0.0069 Epoch: 18 Global Step: 94780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:44:09,275-Speed 10476.49 samples/sec Loss 1.9543 LearningRate 0.0069 Epoch: 18 Global Step: 94790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:44:17,100-Speed 10469.88 samples/sec Loss 1.9624 LearningRate 0.0069 Epoch: 18 
Global Step: 94800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:44:24,907-Speed 10494.35 samples/sec Loss 1.9868 LearningRate 0.0069 Epoch: 18 Global Step: 94810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:44:32,697-Speed 10518.50 samples/sec Loss 1.9364 LearningRate 0.0068 Epoch: 18 Global Step: 94820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:44:40,497-Speed 10503.56 samples/sec Loss 1.9481 LearningRate 0.0068 Epoch: 18 Global Step: 94830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:44:48,272-Speed 10537.21 samples/sec Loss 1.9384 LearningRate 0.0068 Epoch: 18 Global Step: 94840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:44:56,116-Speed 10445.22 samples/sec Loss 1.9467 LearningRate 0.0068 Epoch: 18 Global Step: 94850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:45:03,913-Speed 10517.96 samples/sec Loss 1.9616 LearningRate 0.0068 Epoch: 18 Global Step: 94860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:45:11,714-Speed 10509.09 samples/sec Loss 1.9314 LearningRate 0.0068 Epoch: 18 Global Step: 94870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:45:19,517-Speed 10499.78 samples/sec Loss 1.9440 LearningRate 0.0068 Epoch: 18 Global Step: 94880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:45:27,311-Speed 10512.31 samples/sec Loss 1.9531 LearningRate 0.0067 Epoch: 18 Global Step: 94890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:45:35,109-Speed 10506.48 samples/sec Loss 1.9365 LearningRate 0.0067 Epoch: 18 Global Step: 94900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:45:42,926-Speed 10481.64 samples/sec Loss 1.9478 LearningRate 0.0067 Epoch: 18 Global Step: 94910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:45:50,713-Speed 10521.31 samples/sec Loss 1.9517 LearningRate 0.0067 Epoch: 18 Global Step: 94920 Fp16 Grad Scale: 131072 
Required: 2 hours Training: 2022-01-16 11:45:58,502-Speed 10517.55 samples/sec Loss 1.9705 LearningRate 0.0067 Epoch: 18 Global Step: 94930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:46:06,288-Speed 10523.17 samples/sec Loss 1.9399 LearningRate 0.0067 Epoch: 18 Global Step: 94940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:46:14,160-Speed 10408.40 samples/sec Loss 1.9361 LearningRate 0.0066 Epoch: 18 Global Step: 94950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:46:21,998-Speed 10458.52 samples/sec Loss 1.9596 LearningRate 0.0066 Epoch: 18 Global Step: 94960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:46:29,780-Speed 10528.36 samples/sec Loss 1.9125 LearningRate 0.0066 Epoch: 18 Global Step: 94970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:46:37,570-Speed 10528.29 samples/sec Loss 1.9387 LearningRate 0.0066 Epoch: 18 Global Step: 94980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:46:45,350-Speed 10530.54 samples/sec Loss 1.9421 LearningRate 0.0066 Epoch: 18 Global Step: 94990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:46:53,193-Speed 10446.53 samples/sec Loss 1.9261 LearningRate 0.0066 Epoch: 18 Global Step: 95000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:47:01,000-Speed 10494.42 samples/sec Loss 1.9441 LearningRate 0.0066 Epoch: 18 Global Step: 95010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:47:08,795-Speed 10510.58 samples/sec Loss 1.9397 LearningRate 0.0065 Epoch: 18 Global Step: 95020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:47:16,613-Speed 10481.03 samples/sec Loss 1.9245 LearningRate 0.0065 Epoch: 18 Global Step: 95030 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-16 11:47:24,450-Speed 10457.95 samples/sec Loss 1.9199 LearningRate 0.0065 Epoch: 18 Global Step: 95040 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-16 
11:47:32,286-Speed 10454.86 samples/sec Loss 1.9138 LearningRate 0.0065 Epoch: 18 Global Step: 95050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:47:40,086-Speed 10504.25 samples/sec Loss 1.9248 LearningRate 0.0065 Epoch: 18 Global Step: 95060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:47:47,878-Speed 10515.02 samples/sec Loss 1.9239 LearningRate 0.0065 Epoch: 18 Global Step: 95070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:47:55,681-Speed 10499.90 samples/sec Loss 1.9360 LearningRate 0.0065 Epoch: 18 Global Step: 95080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:48:03,476-Speed 10510.07 samples/sec Loss 1.9227 LearningRate 0.0064 Epoch: 18 Global Step: 95090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:48:11,328-Speed 10434.83 samples/sec Loss 1.9164 LearningRate 0.0064 Epoch: 18 Global Step: 95100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:48:19,153-Speed 10470.41 samples/sec Loss 1.9129 LearningRate 0.0064 Epoch: 18 Global Step: 95110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:48:26,966-Speed 10485.85 samples/sec Loss 1.9264 LearningRate 0.0064 Epoch: 18 Global Step: 95120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:48:34,791-Speed 10471.12 samples/sec Loss 1.9366 LearningRate 0.0064 Epoch: 18 Global Step: 95130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:48:42,604-Speed 10485.89 samples/sec Loss 1.9365 LearningRate 0.0064 Epoch: 18 Global Step: 95140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:48:50,376-Speed 10542.41 samples/sec Loss 1.9236 LearningRate 0.0063 Epoch: 18 Global Step: 95150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:48:58,141-Speed 10551.22 samples/sec Loss 1.9256 LearningRate 0.0063 Epoch: 18 Global Step: 95160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:49:05,955-Speed 10484.91 samples/sec 
Loss 1.9198 LearningRate 0.0063 Epoch: 18 Global Step: 95170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:49:13,738-Speed 10526.08 samples/sec Loss 1.8868 LearningRate 0.0063 Epoch: 18 Global Step: 95180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:49:21,547-Speed 10493.47 samples/sec Loss 1.9208 LearningRate 0.0063 Epoch: 18 Global Step: 95190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:49:29,323-Speed 10535.67 samples/sec Loss 1.8966 LearningRate 0.0063 Epoch: 18 Global Step: 95200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:49:37,117-Speed 10512.59 samples/sec Loss 1.9209 LearningRate 0.0063 Epoch: 18 Global Step: 95210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:49:44,902-Speed 10524.14 samples/sec Loss 1.9209 LearningRate 0.0062 Epoch: 18 Global Step: 95220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:49:52,680-Speed 10533.17 samples/sec Loss 1.9200 LearningRate 0.0062 Epoch: 18 Global Step: 95230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:50:00,467-Speed 10520.90 samples/sec Loss 1.9306 LearningRate 0.0062 Epoch: 18 Global Step: 95240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:50:08,301-Speed 10459.07 samples/sec Loss 1.9111 LearningRate 0.0062 Epoch: 18 Global Step: 95250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:50:16,142-Speed 10449.12 samples/sec Loss 1.9175 LearningRate 0.0062 Epoch: 18 Global Step: 95260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:50:23,935-Speed 10513.30 samples/sec Loss 1.9235 LearningRate 0.0062 Epoch: 18 Global Step: 95270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:50:31,736-Speed 10502.46 samples/sec Loss 1.9147 LearningRate 0.0062 Epoch: 18 Global Step: 95280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:50:39,533-Speed 10508.05 samples/sec Loss 1.9170 LearningRate 0.0061 Epoch: 18 
Global Step: 95290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:50:47,340-Speed 10494.17 samples/sec Loss 1.9045 LearningRate 0.0061 Epoch: 18 Global Step: 95300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:50:55,106-Speed 10549.98 samples/sec Loss 1.8929 LearningRate 0.0061 Epoch: 18 Global Step: 95310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:51:02,903-Speed 10508.45 samples/sec Loss 1.8839 LearningRate 0.0061 Epoch: 18 Global Step: 95320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:51:10,694-Speed 10515.34 samples/sec Loss 1.9298 LearningRate 0.0061 Epoch: 18 Global Step: 95330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:51:18,496-Speed 10501.78 samples/sec Loss 1.8918 LearningRate 0.0061 Epoch: 18 Global Step: 95340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:51:26,306-Speed 10490.53 samples/sec Loss 1.8896 LearningRate 0.0061 Epoch: 18 Global Step: 95350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:51:34,124-Speed 10480.42 samples/sec Loss 1.9016 LearningRate 0.0060 Epoch: 18 Global Step: 95360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:51:41,911-Speed 10521.48 samples/sec Loss 1.8830 LearningRate 0.0060 Epoch: 18 Global Step: 95370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:51:49,736-Speed 10470.30 samples/sec Loss 1.8744 LearningRate 0.0060 Epoch: 18 Global Step: 95380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:51:57,522-Speed 10522.48 samples/sec Loss 1.8994 LearningRate 0.0060 Epoch: 18 Global Step: 95390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:52:05,301-Speed 10531.67 samples/sec Loss 1.9155 LearningRate 0.0060 Epoch: 18 Global Step: 95400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:52:13,109-Speed 10493.72 samples/sec Loss 1.8857 LearningRate 0.0060 Epoch: 18 Global Step: 95410 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-16 11:52:20,895-Speed 10524.30 samples/sec Loss 1.8951 LearningRate 0.0060 Epoch: 18 Global Step: 95420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:52:28,676-Speed 10528.64 samples/sec Loss 1.8903 LearningRate 0.0059 Epoch: 18 Global Step: 95430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:52:36,461-Speed 10523.02 samples/sec Loss 1.9031 LearningRate 0.0059 Epoch: 18 Global Step: 95440 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-16 11:52:44,253-Speed 10515.02 samples/sec Loss 1.8737 LearningRate 0.0059 Epoch: 18 Global Step: 95450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:52:52,057-Speed 10498.74 samples/sec Loss 1.8880 LearningRate 0.0059 Epoch: 18 Global Step: 95460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:52:59,857-Speed 10503.96 samples/sec Loss 1.8750 LearningRate 0.0059 Epoch: 18 Global Step: 95470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:53:07,662-Speed 10496.78 samples/sec Loss 1.8643 LearningRate 0.0059 Epoch: 18 Global Step: 95480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:53:15,450-Speed 10520.97 samples/sec Loss 1.8731 LearningRate 0.0058 Epoch: 18 Global Step: 95490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:53:23,242-Speed 10514.40 samples/sec Loss 1.8916 LearningRate 0.0058 Epoch: 18 Global Step: 95500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:53:31,046-Speed 10498.10 samples/sec Loss 1.9026 LearningRate 0.0058 Epoch: 18 Global Step: 95510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:53:38,878-Speed 10461.26 samples/sec Loss 1.8786 LearningRate 0.0058 Epoch: 18 Global Step: 95520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:53:46,659-Speed 10530.44 samples/sec Loss 1.8664 LearningRate 0.0058 Epoch: 18 Global Step: 95530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 
11:53:54,434-Speed 10537.65 samples/sec Loss 1.8691 LearningRate 0.0058 Epoch: 18 Global Step: 95540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:54:02,221-Speed 10521.22 samples/sec Loss 1.8754 LearningRate 0.0058 Epoch: 18 Global Step: 95550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:54:10,028-Speed 10494.78 samples/sec Loss 1.8831 LearningRate 0.0058 Epoch: 18 Global Step: 95560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:54:17,871-Speed 10446.31 samples/sec Loss 1.8881 LearningRate 0.0057 Epoch: 18 Global Step: 95570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:54:25,656-Speed 10524.09 samples/sec Loss 1.8788 LearningRate 0.0057 Epoch: 18 Global Step: 95580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:54:33,457-Speed 10502.19 samples/sec Loss 1.8818 LearningRate 0.0057 Epoch: 18 Global Step: 95590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:54:41,244-Speed 10522.97 samples/sec Loss 1.9130 LearningRate 0.0057 Epoch: 18 Global Step: 95600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:54:49,021-Speed 10535.29 samples/sec Loss 1.8676 LearningRate 0.0057 Epoch: 18 Global Step: 95610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:54:56,798-Speed 10534.64 samples/sec Loss 1.8717 LearningRate 0.0057 Epoch: 18 Global Step: 95620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:55:04,589-Speed 10515.77 samples/sec Loss 1.8747 LearningRate 0.0057 Epoch: 18 Global Step: 95630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:55:12,395-Speed 10496.62 samples/sec Loss 1.8705 LearningRate 0.0056 Epoch: 18 Global Step: 95640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:55:20,198-Speed 10500.49 samples/sec Loss 1.8843 LearningRate 0.0056 Epoch: 18 Global Step: 95650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:55:28,054-Speed 10428.28 samples/sec 
Loss 1.8775 LearningRate 0.0056 Epoch: 18 Global Step: 95660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:55:35,882-Speed 10465.85 samples/sec Loss 1.8744 LearningRate 0.0056 Epoch: 18 Global Step: 95670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:55:43,686-Speed 10499.89 samples/sec Loss 1.8505 LearningRate 0.0056 Epoch: 18 Global Step: 95680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:55:51,493-Speed 10494.40 samples/sec Loss 1.8694 LearningRate 0.0056 Epoch: 18 Global Step: 95690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:55:59,298-Speed 10496.96 samples/sec Loss 1.8725 LearningRate 0.0056 Epoch: 18 Global Step: 95700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:56:07,081-Speed 10527.97 samples/sec Loss 1.8870 LearningRate 0.0055 Epoch: 18 Global Step: 95710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:56:14,878-Speed 10508.63 samples/sec Loss 1.8684 LearningRate 0.0055 Epoch: 18 Global Step: 95720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:56:22,676-Speed 10505.82 samples/sec Loss 1.8937 LearningRate 0.0055 Epoch: 18 Global Step: 95730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:56:30,452-Speed 10536.11 samples/sec Loss 1.8676 LearningRate 0.0055 Epoch: 18 Global Step: 95740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:56:38,234-Speed 10528.29 samples/sec Loss 1.8897 LearningRate 0.0055 Epoch: 18 Global Step: 95750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:56:46,013-Speed 10533.05 samples/sec Loss 1.8731 LearningRate 0.0055 Epoch: 18 Global Step: 95760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:56:53,795-Speed 10527.95 samples/sec Loss 1.8512 LearningRate 0.0055 Epoch: 18 Global Step: 95770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:57:01,597-Speed 10501.22 samples/sec Loss 1.8600 LearningRate 0.0054 Epoch: 18 
Global Step: 95780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:57:09,442-Speed 10444.13 samples/sec Loss 1.8707 LearningRate 0.0054 Epoch: 18 Global Step: 95790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 11:57:17,232-Speed 10517.44 samples/sec Loss 1.8497 LearningRate 0.0054 Epoch: 18 Global Step: 95800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:57:25,020-Speed 10520.13 samples/sec Loss 1.8347 LearningRate 0.0054 Epoch: 18 Global Step: 95810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:57:32,810-Speed 10517.23 samples/sec Loss 1.8765 LearningRate 0.0054 Epoch: 18 Global Step: 95820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:57:40,586-Speed 10537.48 samples/sec Loss 1.8498 LearningRate 0.0054 Epoch: 18 Global Step: 95830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:57:48,368-Speed 10527.59 samples/sec Loss 1.8420 LearningRate 0.0054 Epoch: 18 Global Step: 95840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:57:56,147-Speed 10531.83 samples/sec Loss 1.8411 LearningRate 0.0053 Epoch: 18 Global Step: 95850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:58:03,925-Speed 10533.50 samples/sec Loss 1.8598 LearningRate 0.0053 Epoch: 18 Global Step: 95860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:58:11,705-Speed 10532.04 samples/sec Loss 1.8385 LearningRate 0.0053 Epoch: 18 Global Step: 95870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:58:19,500-Speed 10510.83 samples/sec Loss 1.8649 LearningRate 0.0053 Epoch: 18 Global Step: 95880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:58:27,280-Speed 10530.45 samples/sec Loss 1.8431 LearningRate 0.0053 Epoch: 18 Global Step: 95890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:58:35,081-Speed 10505.19 samples/sec Loss 1.8532 LearningRate 0.0053 Epoch: 18 Global Step: 95900 Fp16 Grad Scale: 131072 
Required: 2 hours Training: 2022-01-16 11:58:42,904-Speed 10474.70 samples/sec Loss 1.8423 LearningRate 0.0053 Epoch: 18 Global Step: 95910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:58:50,673-Speed 10544.51 samples/sec Loss 1.8521 LearningRate 0.0053 Epoch: 18 Global Step: 95920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:58:58,453-Speed 10530.62 samples/sec Loss 1.8624 LearningRate 0.0052 Epoch: 18 Global Step: 95930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:59:06,257-Speed 10499.94 samples/sec Loss 1.8361 LearningRate 0.0052 Epoch: 18 Global Step: 95940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:59:14,058-Speed 10502.31 samples/sec Loss 1.8567 LearningRate 0.0052 Epoch: 18 Global Step: 95950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:59:21,848-Speed 10517.47 samples/sec Loss 1.8578 LearningRate 0.0052 Epoch: 18 Global Step: 95960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:59:29,622-Speed 10544.55 samples/sec Loss 1.8303 LearningRate 0.0052 Epoch: 18 Global Step: 95970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:59:37,419-Speed 10508.22 samples/sec Loss 1.8422 LearningRate 0.0052 Epoch: 18 Global Step: 95980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:59:45,209-Speed 10517.78 samples/sec Loss 1.8446 LearningRate 0.0052 Epoch: 18 Global Step: 95990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 11:59:53,018-Speed 10494.26 samples/sec Loss 1.8311 LearningRate 0.0051 Epoch: 18 Global Step: 96000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:00:00,821-Speed 10504.43 samples/sec Loss 1.8422 LearningRate 0.0051 Epoch: 18 Global Step: 96010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:00:08,616-Speed 10516.65 samples/sec Loss 1.8293 LearningRate 0.0051 Epoch: 18 Global Step: 96020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 
12:00:16,389-Speed 10539.61 samples/sec Loss 1.8369 LearningRate 0.0051 Epoch: 18 Global Step: 96030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:00:24,178-Speed 10519.74 samples/sec Loss 1.8292 LearningRate 0.0051 Epoch: 18 Global Step: 96040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:00:31,988-Speed 10491.07 samples/sec Loss 1.8497 LearningRate 0.0051 Epoch: 18 Global Step: 96050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:00:39,776-Speed 10520.16 samples/sec Loss 1.8379 LearningRate 0.0051 Epoch: 18 Global Step: 96060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:00:47,555-Speed 10532.77 samples/sec Loss 1.8294 LearningRate 0.0051 Epoch: 18 Global Step: 96070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:00:55,335-Speed 10530.92 samples/sec Loss 1.8272 LearningRate 0.0050 Epoch: 18 Global Step: 96080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:01:03,115-Speed 10531.33 samples/sec Loss 1.8430 LearningRate 0.0050 Epoch: 18 Global Step: 96090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:01:10,903-Speed 10519.49 samples/sec Loss 1.8433 LearningRate 0.0050 Epoch: 18 Global Step: 96100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:01:18,698-Speed 10510.89 samples/sec Loss 1.8422 LearningRate 0.0050 Epoch: 18 Global Step: 96110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:01:26,499-Speed 10501.63 samples/sec Loss 1.8496 LearningRate 0.0050 Epoch: 18 Global Step: 96120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:01:34,321-Speed 10475.63 samples/sec Loss 1.8481 LearningRate 0.0050 Epoch: 18 Global Step: 96130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:01:42,183-Speed 10422.27 samples/sec Loss 1.8407 LearningRate 0.0050 Epoch: 18 Global Step: 96140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:01:49,972-Speed 10517.13 samples/sec 
Loss 1.8324 LearningRate 0.0049 Epoch: 18 Global Step: 96150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:01:57,778-Speed 10497.56 samples/sec Loss 1.8455 LearningRate 0.0049 Epoch: 18 Global Step: 96160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:02:05,561-Speed 10527.01 samples/sec Loss 1.8345 LearningRate 0.0049 Epoch: 18 Global Step: 96170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:02:13,350-Speed 10518.22 samples/sec Loss 1.8341 LearningRate 0.0049 Epoch: 18 Global Step: 96180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:02:21,139-Speed 10519.14 samples/sec Loss 1.8323 LearningRate 0.0049 Epoch: 18 Global Step: 96190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:02:28,921-Speed 10528.49 samples/sec Loss 1.8172 LearningRate 0.0049 Epoch: 18 Global Step: 96200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:02:36,712-Speed 10515.42 samples/sec Loss 1.8204 LearningRate 0.0049 Epoch: 18 Global Step: 96210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:02:44,536-Speed 10471.81 samples/sec Loss 1.8203 LearningRate 0.0049 Epoch: 18 Global Step: 96220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:02:52,315-Speed 10532.75 samples/sec Loss 1.8165 LearningRate 0.0048 Epoch: 18 Global Step: 96230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:03:00,113-Speed 10507.24 samples/sec Loss 1.8250 LearningRate 0.0048 Epoch: 18 Global Step: 96240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:03:07,914-Speed 10503.40 samples/sec Loss 1.8296 LearningRate 0.0048 Epoch: 18 Global Step: 96250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:03:15,699-Speed 10524.57 samples/sec Loss 1.8035 LearningRate 0.0048 Epoch: 18 Global Step: 96260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:03:23,511-Speed 10487.69 samples/sec Loss 1.8191 LearningRate 0.0048 Epoch: 18 
Global Step: 96270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:03:31,294-Speed 10527.17 samples/sec Loss 1.8497 LearningRate 0.0048 Epoch: 18 Global Step: 96280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:03:39,055-Speed 10556.36 samples/sec Loss 1.8259 LearningRate 0.0048 Epoch: 18 Global Step: 96290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:03:46,823-Speed 10546.71 samples/sec Loss 1.8229 LearningRate 0.0048 Epoch: 18 Global Step: 96300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:03:54,599-Speed 10539.65 samples/sec Loss 1.8195 LearningRate 0.0047 Epoch: 18 Global Step: 96310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:04:02,421-Speed 10475.24 samples/sec Loss 1.8071 LearningRate 0.0047 Epoch: 18 Global Step: 96320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:04:10,218-Speed 10507.33 samples/sec Loss 1.8017 LearningRate 0.0047 Epoch: 18 Global Step: 96330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:04:17,995-Speed 10539.40 samples/sec Loss 1.8267 LearningRate 0.0047 Epoch: 18 Global Step: 96340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:04:25,783-Speed 10520.84 samples/sec Loss 1.7946 LearningRate 0.0047 Epoch: 18 Global Step: 96350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:04:33,581-Speed 10508.02 samples/sec Loss 1.8064 LearningRate 0.0047 Epoch: 18 Global Step: 96360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:04:41,395-Speed 10484.52 samples/sec Loss 1.8147 LearningRate 0.0047 Epoch: 18 Global Step: 96370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:04:49,185-Speed 10516.87 samples/sec Loss 1.8148 LearningRate 0.0046 Epoch: 18 Global Step: 96380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:04:56,970-Speed 10525.14 samples/sec Loss 1.7920 LearningRate 0.0046 Epoch: 18 Global Step: 96390 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-01-16 12:05:04,786-Speed 10482.08 samples/sec Loss 1.7928 LearningRate 0.0046 Epoch: 18 Global Step: 96400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:05:12,576-Speed 10517.75 samples/sec Loss 1.8061 LearningRate 0.0046 Epoch: 18 Global Step: 96410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:05:20,367-Speed 10515.01 samples/sec Loss 1.8019 LearningRate 0.0046 Epoch: 18 Global Step: 96420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:05:28,217-Speed 10438.29 samples/sec Loss 1.8140 LearningRate 0.0046 Epoch: 18 Global Step: 96430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:05:36,004-Speed 10521.73 samples/sec Loss 1.8164 LearningRate 0.0046 Epoch: 18 Global Step: 96440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:05:43,816-Speed 10487.55 samples/sec Loss 1.8277 LearningRate 0.0046 Epoch: 18 Global Step: 96450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:05:51,612-Speed 10508.88 samples/sec Loss 1.8026 LearningRate 0.0045 Epoch: 18 Global Step: 96460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:05:59,397-Speed 10527.09 samples/sec Loss 1.8034 LearningRate 0.0045 Epoch: 18 Global Step: 96470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:06:07,189-Speed 10515.00 samples/sec Loss 1.7755 LearningRate 0.0045 Epoch: 18 Global Step: 96480 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-01-16 12:06:14,991-Speed 10500.78 samples/sec Loss 1.8160 LearningRate 0.0045 Epoch: 18 Global Step: 96490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:06:22,807-Speed 10482.68 samples/sec Loss 1.7819 LearningRate 0.0045 Epoch: 18 Global Step: 96500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:06:30,614-Speed 10497.47 samples/sec Loss 1.7903 LearningRate 0.0045 Epoch: 18 Global Step: 96510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 
12:06:38,436-Speed 10475.84 samples/sec Loss 1.7878 LearningRate 0.0045 Epoch: 18 Global Step: 96520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:06:46,227-Speed 10516.22 samples/sec Loss 1.7973 LearningRate 0.0045 Epoch: 18 Global Step: 96530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:06:54,044-Speed 10480.48 samples/sec Loss 1.7853 LearningRate 0.0044 Epoch: 18 Global Step: 96540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:07:01,832-Speed 10520.63 samples/sec Loss 1.7932 LearningRate 0.0044 Epoch: 18 Global Step: 96550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:07:09,636-Speed 10499.22 samples/sec Loss 1.7783 LearningRate 0.0044 Epoch: 18 Global Step: 96560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:07:17,434-Speed 10507.18 samples/sec Loss 1.7862 LearningRate 0.0044 Epoch: 18 Global Step: 96570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:07:25,225-Speed 10516.14 samples/sec Loss 1.7840 LearningRate 0.0044 Epoch: 18 Global Step: 96580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:07:33,026-Speed 10502.26 samples/sec Loss 1.7858 LearningRate 0.0044 Epoch: 18 Global Step: 96590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:07:40,860-Speed 10458.48 samples/sec Loss 1.7941 LearningRate 0.0044 Epoch: 18 Global Step: 96600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:07:48,676-Speed 10482.19 samples/sec Loss 1.7854 LearningRate 0.0044 Epoch: 18 Global Step: 96610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:07:56,480-Speed 10499.99 samples/sec Loss 1.7747 LearningRate 0.0043 Epoch: 18 Global Step: 96620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:08:04,276-Speed 10509.80 samples/sec Loss 1.7834 LearningRate 0.0043 Epoch: 18 Global Step: 96630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:08:12,055-Speed 10531.90 samples/sec 
Loss 1.7893 LearningRate 0.0043 Epoch: 18 Global Step: 96640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:08:19,839-Speed 10525.69 samples/sec Loss 1.7806 LearningRate 0.0043 Epoch: 18 Global Step: 96650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:08:27,630-Speed 10515.79 samples/sec Loss 1.7727 LearningRate 0.0043 Epoch: 18 Global Step: 96660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:08:35,413-Speed 10527.25 samples/sec Loss 1.7897 LearningRate 0.0043 Epoch: 18 Global Step: 96670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:08:43,207-Speed 10513.35 samples/sec Loss 1.7707 LearningRate 0.0043 Epoch: 18 Global Step: 96680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:08:51,011-Speed 10498.19 samples/sec Loss 1.7935 LearningRate 0.0043 Epoch: 18 Global Step: 96690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:08:58,793-Speed 10528.25 samples/sec Loss 1.7752 LearningRate 0.0042 Epoch: 18 Global Step: 96700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:09:06,634-Speed 10449.07 samples/sec Loss 1.7642 LearningRate 0.0042 Epoch: 18 Global Step: 96710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:09:14,424-Speed 10517.94 samples/sec Loss 1.8018 LearningRate 0.0042 Epoch: 18 Global Step: 96720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:09:22,234-Speed 10490.83 samples/sec Loss 1.7846 LearningRate 0.0042 Epoch: 18 Global Step: 96730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:09:30,068-Speed 10457.17 samples/sec Loss 1.8029 LearningRate 0.0042 Epoch: 18 Global Step: 96740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:09:37,877-Speed 10492.08 samples/sec Loss 1.7908 LearningRate 0.0042 Epoch: 18 Global Step: 96750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:09:45,676-Speed 10505.57 samples/sec Loss 1.7945 LearningRate 0.0042 Epoch: 18 
Global Step: 96760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:09:53,479-Speed 10499.81 samples/sec Loss 1.7603 LearningRate 0.0042 Epoch: 18 Global Step: 96770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:10:01,276-Speed 10508.34 samples/sec Loss 1.7809 LearningRate 0.0042 Epoch: 18 Global Step: 96780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:10:09,074-Speed 10506.72 samples/sec Loss 1.7691 LearningRate 0.0041 Epoch: 18 Global Step: 96790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:10:16,890-Speed 10483.25 samples/sec Loss 1.7585 LearningRate 0.0041 Epoch: 18 Global Step: 96800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:10:24,700-Speed 10490.70 samples/sec Loss 1.7793 LearningRate 0.0041 Epoch: 18 Global Step: 96810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:10:32,527-Speed 10467.55 samples/sec Loss 1.7753 LearningRate 0.0041 Epoch: 18 Global Step: 96820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:10:40,316-Speed 10518.76 samples/sec Loss 1.7603 LearningRate 0.0041 Epoch: 18 Global Step: 96830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:10:48,109-Speed 10519.22 samples/sec Loss 1.7747 LearningRate 0.0041 Epoch: 18 Global Step: 96840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-01-16 12:10:55,939-Speed 10464.72 samples/sec Loss 1.7716 LearningRate 0.0041 Epoch: 18 Global Step: 96850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-01-16 12:11:03,729-Speed 10516.67 samples/sec Loss 1.7743 LearningRate 0.0041 Epoch: 18 Global Step: 96860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:11:11,519-Speed 10517.86 samples/sec Loss 1.7786 LearningRate 0.0040 Epoch: 18 Global Step: 96870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:11:19,315-Speed 10509.64 samples/sec Loss 1.7527 LearningRate 0.0040 Epoch: 18 Global Step: 96880 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-01-16 12:11:27,154-Speed 10451.42 samples/sec Loss 1.7677 LearningRate 0.0040 Epoch: 18 Global Step: 96890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:11:34,970-Speed 10482.57 samples/sec Loss 1.7727 LearningRate 0.0040 Epoch: 18 Global Step: 96900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:11:42,755-Speed 10523.86 samples/sec Loss 1.7499 LearningRate 0.0040 Epoch: 18 Global Step: 96910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:11:50,638-Speed 10501.49 samples/sec Loss 1.7703 LearningRate 0.0040 Epoch: 18 Global Step: 96920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:11:58,431-Speed 10513.54 samples/sec Loss 1.7790 LearningRate 0.0040 Epoch: 18 Global Step: 96930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:12:06,271-Speed 10450.10 samples/sec Loss 1.7619 LearningRate 0.0040 Epoch: 18 Global Step: 96940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:12:14,058-Speed 10521.10 samples/sec Loss 1.7573 LearningRate 0.0040 Epoch: 18 Global Step: 96950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:12:21,887-Speed 10465.93 samples/sec Loss 1.7609 LearningRate 0.0039 Epoch: 18 Global Step: 96960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:12:29,711-Speed 10470.62 samples/sec Loss 1.7756 LearningRate 0.0039 Epoch: 18 Global Step: 96970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:12:37,520-Speed 10491.88 samples/sec Loss 1.7497 LearningRate 0.0039 Epoch: 18 Global Step: 96980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:12:45,310-Speed 10517.52 samples/sec Loss 1.7712 LearningRate 0.0039 Epoch: 18 Global Step: 96990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:12:53,109-Speed 10505.46 samples/sec Loss 1.7352 LearningRate 0.0039 Epoch: 18 Global Step: 97000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 
12:13:00,905-Speed 10509.39 samples/sec Loss 1.7463 LearningRate 0.0039 Epoch: 18 Global Step: 97010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:13:08,695-Speed 10517.84 samples/sec Loss 1.7569 LearningRate 0.0039 Epoch: 18 Global Step: 97020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:13:16,527-Speed 10461.69 samples/sec Loss 1.7535 LearningRate 0.0039 Epoch: 18 Global Step: 97030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:13:24,323-Speed 10508.66 samples/sec Loss 1.7630 LearningRate 0.0038 Epoch: 18 Global Step: 97040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:13:32,123-Speed 10503.81 samples/sec Loss 1.7435 LearningRate 0.0038 Epoch: 18 Global Step: 97050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:13:39,942-Speed 10478.49 samples/sec Loss 1.7850 LearningRate 0.0038 Epoch: 18 Global Step: 97060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:13:47,739-Speed 10507.60 samples/sec Loss 1.7498 LearningRate 0.0038 Epoch: 18 Global Step: 97070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:13:55,543-Speed 10499.15 samples/sec Loss 1.7471 LearningRate 0.0038 Epoch: 18 Global Step: 97080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:14:03,338-Speed 10510.49 samples/sec Loss 1.7454 LearningRate 0.0038 Epoch: 18 Global Step: 97090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:14:11,150-Speed 10487.52 samples/sec Loss 1.7321 LearningRate 0.0038 Epoch: 18 Global Step: 97100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:14:18,953-Speed 10502.89 samples/sec Loss 1.7578 LearningRate 0.0038 Epoch: 18 Global Step: 97110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:14:26,756-Speed 10499.03 samples/sec Loss 1.7427 LearningRate 0.0038 Epoch: 18 Global Step: 97120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:14:34,532-Speed 10537.42 samples/sec 
Loss 1.7591 LearningRate 0.0037 Epoch: 18 Global Step: 97130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:14:42,312-Speed 10531.12 samples/sec Loss 1.7476 LearningRate 0.0037 Epoch: 18 Global Step: 97140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:14:50,079-Speed 10548.48 samples/sec Loss 1.7335 LearningRate 0.0037 Epoch: 18 Global Step: 97150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:14:57,868-Speed 10517.98 samples/sec Loss 1.7458 LearningRate 0.0037 Epoch: 18 Global Step: 97160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:15:05,674-Speed 10509.45 samples/sec Loss 1.7220 LearningRate 0.0037 Epoch: 18 Global Step: 97170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:15:13,476-Speed 10515.39 samples/sec Loss 1.7464 LearningRate 0.0037 Epoch: 18 Global Step: 97180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:15:21,294-Speed 10479.38 samples/sec Loss 1.7290 LearningRate 0.0037 Epoch: 18 Global Step: 97190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:15:29,107-Speed 10497.50 samples/sec Loss 1.7418 LearningRate 0.0037 Epoch: 18 Global Step: 97200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:15:36,924-Speed 10499.88 samples/sec Loss 1.7304 LearningRate 0.0037 Epoch: 18 Global Step: 97210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:15:44,697-Speed 10556.90 samples/sec Loss 1.7301 LearningRate 0.0036 Epoch: 18 Global Step: 97220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:15:52,487-Speed 10554.39 samples/sec Loss 1.7517 LearningRate 0.0036 Epoch: 18 Global Step: 97230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:16:00,281-Speed 10511.31 samples/sec Loss 1.7466 LearningRate 0.0036 Epoch: 18 Global Step: 97240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:16:08,093-Speed 10487.70 samples/sec Loss 1.7337 LearningRate 0.0036 Epoch: 18 
Global Step: 97250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:16:15,897-Speed 10499.07 samples/sec Loss 1.7359 LearningRate 0.0036 Epoch: 18 Global Step: 97260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:16:23,740-Speed 10468.49 samples/sec Loss 1.7219 LearningRate 0.0036 Epoch: 18 Global Step: 97270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:16:31,558-Speed 10480.19 samples/sec Loss 1.7383 LearningRate 0.0036 Epoch: 18 Global Step: 97280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:16:39,380-Speed 10473.95 samples/sec Loss 1.7482 LearningRate 0.0036 Epoch: 18 Global Step: 97290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:16:47,179-Speed 10519.56 samples/sec Loss 1.7364 LearningRate 0.0035 Epoch: 18 Global Step: 97300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:16:55,185-Speed 10548.36 samples/sec Loss 1.7429 LearningRate 0.0035 Epoch: 18 Global Step: 97310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:17:02,976-Speed 10516.31 samples/sec Loss 1.7182 LearningRate 0.0035 Epoch: 18 Global Step: 97320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:17:10,766-Speed 10528.52 samples/sec Loss 1.7222 LearningRate 0.0035 Epoch: 18 Global Step: 97330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:17:18,542-Speed 10547.60 samples/sec Loss 1.7210 LearningRate 0.0035 Epoch: 18 Global Step: 97340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:17:26,334-Speed 10513.94 samples/sec Loss 1.7340 LearningRate 0.0035 Epoch: 18 Global Step: 97350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:17:34,145-Speed 10501.25 samples/sec Loss 1.7529 LearningRate 0.0035 Epoch: 18 Global Step: 97360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:17:41,941-Speed 10524.43 samples/sec Loss 1.7333 LearningRate 0.0035 Epoch: 18 Global Step: 97370 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-01-16 12:17:49,729-Speed 10520.63 samples/sec Loss 1.7442 LearningRate 0.0035 Epoch: 18 Global Step: 97380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:17:57,509-Speed 10530.54 samples/sec Loss 1.7478 LearningRate 0.0035 Epoch: 18 Global Step: 97390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:18:05,304-Speed 10521.55 samples/sec Loss 1.7510 LearningRate 0.0034 Epoch: 18 Global Step: 97400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:18:13,088-Speed 10534.17 samples/sec Loss 1.7257 LearningRate 0.0034 Epoch: 18 Global Step: 97410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:18:20,863-Speed 10548.53 samples/sec Loss 1.7308 LearningRate 0.0034 Epoch: 18 Global Step: 97420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:18:28,638-Speed 10537.47 samples/sec Loss 1.7215 LearningRate 0.0034 Epoch: 18 Global Step: 97430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:18:36,423-Speed 10540.65 samples/sec Loss 1.7195 LearningRate 0.0034 Epoch: 18 Global Step: 97440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:18:44,223-Speed 10507.72 samples/sec Loss 1.7324 LearningRate 0.0034 Epoch: 18 Global Step: 97450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:18:52,032-Speed 10520.91 samples/sec Loss 1.7218 LearningRate 0.0034 Epoch: 18 Global Step: 97460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:18:59,822-Speed 10517.18 samples/sec Loss 1.7159 LearningRate 0.0034 Epoch: 18 Global Step: 97470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:19:07,702-Speed 10397.19 samples/sec Loss 1.7291 LearningRate 0.0034 Epoch: 18 Global Step: 97480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:19:15,504-Speed 10522.93 samples/sec Loss 1.6995 LearningRate 0.0033 Epoch: 18 Global Step: 97490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 
12:19:23,298-Speed 10518.97 samples/sec Loss 1.7020 LearningRate 0.0033 Epoch: 18 Global Step: 97500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:19:31,102-Speed 10497.99 samples/sec Loss 1.7066 LearningRate 0.0033 Epoch: 18 Global Step: 97510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:19:38,897-Speed 10510.63 samples/sec Loss 1.7046 LearningRate 0.0033 Epoch: 18 Global Step: 97520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:19:46,698-Speed 10518.22 samples/sec Loss 1.7119 LearningRate 0.0033 Epoch: 18 Global Step: 97530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:19:54,483-Speed 10536.30 samples/sec Loss 1.7118 LearningRate 0.0033 Epoch: 18 Global Step: 97540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:20:02,277-Speed 10511.26 samples/sec Loss 1.7237 LearningRate 0.0033 Epoch: 18 Global Step: 97550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:20:10,108-Speed 10480.86 samples/sec Loss 1.7136 LearningRate 0.0033 Epoch: 18 Global Step: 97560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:20:17,885-Speed 10543.20 samples/sec Loss 1.7237 LearningRate 0.0033 Epoch: 18 Global Step: 97570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:20:25,679-Speed 10524.34 samples/sec Loss 1.7029 LearningRate 0.0032 Epoch: 18 Global Step: 97580 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-16 12:20:33,449-Speed 10544.36 samples/sec Loss 1.6984 LearningRate 0.0032 Epoch: 18 Global Step: 97590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:20:41,257-Speed 10526.36 samples/sec Loss 1.7308 LearningRate 0.0032 Epoch: 18 Global Step: 97600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:20:49,036-Speed 10531.55 samples/sec Loss 1.7262 LearningRate 0.0032 Epoch: 18 Global Step: 97610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:20:56,837-Speed 10517.57 samples/sec 
Loss 1.7161 LearningRate 0.0032 Epoch: 18 Global Step: 97620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:21:04,650-Speed 10499.29 samples/sec Loss 1.7058 LearningRate 0.0032 Epoch: 18 Global Step: 97630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:21:12,462-Speed 10486.78 samples/sec Loss 1.7119 LearningRate 0.0032 Epoch: 18 Global Step: 97640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:21:20,247-Speed 10524.24 samples/sec Loss 1.6938 LearningRate 0.0032 Epoch: 18 Global Step: 97650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:21:28,029-Speed 10540.71 samples/sec Loss 1.6875 LearningRate 0.0032 Epoch: 18 Global Step: 97660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:21:35,827-Speed 10522.77 samples/sec Loss 1.7025 LearningRate 0.0032 Epoch: 18 Global Step: 97670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:21:43,629-Speed 10501.30 samples/sec Loss 1.7033 LearningRate 0.0031 Epoch: 18 Global Step: 97680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:21:51,421-Speed 10520.88 samples/sec Loss 1.6884 LearningRate 0.0031 Epoch: 18 Global Step: 97690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:21:59,240-Speed 10478.76 samples/sec Loss 1.7102 LearningRate 0.0031 Epoch: 18 Global Step: 97700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:22:07,051-Speed 10488.64 samples/sec Loss 1.6888 LearningRate 0.0031 Epoch: 18 Global Step: 97710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:22:14,855-Speed 10498.81 samples/sec Loss 1.6967 LearningRate 0.0031 Epoch: 18 Global Step: 97720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:22:22,659-Speed 10498.98 samples/sec Loss 1.7211 LearningRate 0.0031 Epoch: 18 Global Step: 97730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:22:30,471-Speed 10486.99 samples/sec Loss 1.7108 LearningRate 0.0031 Epoch: 18 
Global Step: 97740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:22:38,279-Speed 10493.68 samples/sec Loss 1.6850 LearningRate 0.0031 Epoch: 18 Global Step: 97750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:22:46,067-Speed 10520.92 samples/sec Loss 1.7141 LearningRate 0.0031 Epoch: 18 Global Step: 97760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:22:53,862-Speed 10510.01 samples/sec Loss 1.7166 LearningRate 0.0030 Epoch: 18 Global Step: 97770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:23:01,634-Speed 10541.81 samples/sec Loss 1.6981 LearningRate 0.0030 Epoch: 18 Global Step: 97780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:23:09,427-Speed 10514.02 samples/sec Loss 1.7145 LearningRate 0.0030 Epoch: 18 Global Step: 97790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:23:17,229-Speed 10501.05 samples/sec Loss 1.6897 LearningRate 0.0030 Epoch: 18 Global Step: 97800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:23:25,023-Speed 10512.14 samples/sec Loss 1.7219 LearningRate 0.0030 Epoch: 18 Global Step: 97810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:23:32,839-Speed 10482.78 samples/sec Loss 1.7243 LearningRate 0.0030 Epoch: 18 Global Step: 97820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:23:40,626-Speed 10522.38 samples/sec Loss 1.6971 LearningRate 0.0030 Epoch: 18 Global Step: 97830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:23:48,452-Speed 10468.13 samples/sec Loss 1.6826 LearningRate 0.0030 Epoch: 18 Global Step: 97840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:23:56,294-Speed 10447.46 samples/sec Loss 1.7021 LearningRate 0.0030 Epoch: 18 Global Step: 97850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:24:04,088-Speed 10511.48 samples/sec Loss 1.6765 LearningRate 0.0030 Epoch: 18 Global Step: 97860 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-16 12:24:11,895-Speed 10495.07 samples/sec Loss 1.6816 LearningRate 0.0029 Epoch: 18 Global Step: 97870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:24:19,702-Speed 10493.90 samples/sec Loss 1.6911 LearningRate 0.0029 Epoch: 18 Global Step: 97880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:24:27,519-Speed 10482.18 samples/sec Loss 1.6985 LearningRate 0.0029 Epoch: 18 Global Step: 97890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:24:35,305-Speed 10527.69 samples/sec Loss 1.6767 LearningRate 0.0029 Epoch: 18 Global Step: 97900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:24:43,092-Speed 10520.61 samples/sec Loss 1.6941 LearningRate 0.0029 Epoch: 18 Global Step: 97910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:24:50,883-Speed 10516.62 samples/sec Loss 1.6908 LearningRate 0.0029 Epoch: 18 Global Step: 97920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:24:58,715-Speed 10461.12 samples/sec Loss 1.6771 LearningRate 0.0029 Epoch: 18 Global Step: 97930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:25:06,761-Speed 10501.34 samples/sec Loss 1.6821 LearningRate 0.0029 Epoch: 18 Global Step: 97940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:25:14,536-Speed 10537.38 samples/sec Loss 1.6765 LearningRate 0.0029 Epoch: 18 Global Step: 97950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:25:22,334-Speed 10513.85 samples/sec Loss 1.6694 LearningRate 0.0029 Epoch: 18 Global Step: 97960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:25:30,127-Speed 10512.95 samples/sec Loss 1.6949 LearningRate 0.0028 Epoch: 18 Global Step: 97970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:25:37,954-Speed 10468.11 samples/sec Loss 1.6936 LearningRate 0.0028 Epoch: 18 Global Step: 97980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 
12:25:45,761-Speed 10551.76 samples/sec Loss 1.7013 LearningRate 0.0028 Epoch: 18 Global Step: 97990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:25:53,552-Speed 10515.65 samples/sec Loss 1.6909 LearningRate 0.0028 Epoch: 18 Global Step: 98000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:26:01,349-Speed 10507.29 samples/sec Loss 1.6726 LearningRate 0.0028 Epoch: 18 Global Step: 98010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:26:09,156-Speed 10495.73 samples/sec Loss 1.6765 LearningRate 0.0028 Epoch: 18 Global Step: 98020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:26:16,967-Speed 10492.59 samples/sec Loss 1.6726 LearningRate 0.0028 Epoch: 18 Global Step: 98030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:26:24,754-Speed 10522.28 samples/sec Loss 1.6807 LearningRate 0.0028 Epoch: 18 Global Step: 98040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:26:32,544-Speed 10517.75 samples/sec Loss 1.6868 LearningRate 0.0028 Epoch: 18 Global Step: 98050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:26:40,326-Speed 10527.32 samples/sec Loss 1.6871 LearningRate 0.0028 Epoch: 18 Global Step: 98060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:26:48,158-Speed 10461.60 samples/sec Loss 1.6984 LearningRate 0.0027 Epoch: 18 Global Step: 98070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:26:55,924-Speed 10549.78 samples/sec Loss 1.6800 LearningRate 0.0027 Epoch: 18 Global Step: 98080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:27:03,721-Speed 10507.95 samples/sec Loss 1.6654 LearningRate 0.0027 Epoch: 18 Global Step: 98090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:27:11,499-Speed 10537.66 samples/sec Loss 1.6746 LearningRate 0.0027 Epoch: 18 Global Step: 98100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:27:19,288-Speed 10519.36 samples/sec 
Loss 1.7022 LearningRate 0.0027 Epoch: 18 Global Step: 98110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:27:27,090-Speed 10501.35 samples/sec Loss 1.6690 LearningRate 0.0027 Epoch: 18 Global Step: 98120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:27:34,872-Speed 10526.68 samples/sec Loss 1.6729 LearningRate 0.0027 Epoch: 18 Global Step: 98130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:27:42,689-Speed 10531.91 samples/sec Loss 1.6830 LearningRate 0.0027 Epoch: 18 Global Step: 98140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:27:50,527-Speed 10454.06 samples/sec Loss 1.6694 LearningRate 0.0027 Epoch: 18 Global Step: 98150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:27:58,310-Speed 10526.68 samples/sec Loss 1.6783 LearningRate 0.0027 Epoch: 18 Global Step: 98160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:28:06,115-Speed 10496.49 samples/sec Loss 1.6593 LearningRate 0.0026 Epoch: 18 Global Step: 98170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:28:13,974-Speed 10425.69 samples/sec Loss 1.6640 LearningRate 0.0026 Epoch: 18 Global Step: 98180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:28:21,785-Speed 10490.64 samples/sec Loss 1.6686 LearningRate 0.0026 Epoch: 18 Global Step: 98190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:28:29,569-Speed 10524.26 samples/sec Loss 1.6706 LearningRate 0.0026 Epoch: 18 Global Step: 98200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:28:37,352-Speed 10529.07 samples/sec Loss 1.6808 LearningRate 0.0026 Epoch: 18 Global Step: 98210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:28:45,240-Speed 10385.79 samples/sec Loss 1.6532 LearningRate 0.0026 Epoch: 18 Global Step: 98220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:28:53,030-Speed 10517.97 samples/sec Loss 1.6800 LearningRate 0.0026 Epoch: 18 
Global Step: 98230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:29:00,827-Speed 10507.88 samples/sec Loss 1.6731 LearningRate 0.0026 Epoch: 18 Global Step: 98240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:29:08,637-Speed 10490.58 samples/sec Loss 1.6623 LearningRate 0.0026 Epoch: 18 Global Step: 98250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:29:16,431-Speed 10513.71 samples/sec Loss 1.6726 LearningRate 0.0026 Epoch: 18 Global Step: 98260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:29:24,254-Speed 10528.12 samples/sec Loss 1.6610 LearningRate 0.0026 Epoch: 18 Global Step: 98270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:29:32,042-Speed 10519.83 samples/sec Loss 1.6734 LearningRate 0.0025 Epoch: 18 Global Step: 98280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:29:39,871-Speed 10464.92 samples/sec Loss 1.6677 LearningRate 0.0025 Epoch: 18 Global Step: 98290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:29:47,662-Speed 10517.34 samples/sec Loss 1.6772 LearningRate 0.0025 Epoch: 18 Global Step: 98300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:29:55,441-Speed 10532.62 samples/sec Loss 1.6763 LearningRate 0.0025 Epoch: 18 Global Step: 98310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:30:03,233-Speed 10513.15 samples/sec Loss 1.6675 LearningRate 0.0025 Epoch: 18 Global Step: 98320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:30:11,029-Speed 10510.42 samples/sec Loss 1.6726 LearningRate 0.0025 Epoch: 18 Global Step: 98330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:30:18,799-Speed 10544.94 samples/sec Loss 1.6616 LearningRate 0.0025 Epoch: 18 Global Step: 98340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:30:26,598-Speed 10505.20 samples/sec Loss 1.6751 LearningRate 0.0025 Epoch: 18 Global Step: 98350 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-16 12:30:34,430-Speed 10464.08 samples/sec Loss 1.6724 LearningRate 0.0025 Epoch: 18 Global Step: 98360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:30:42,237-Speed 10493.55 samples/sec Loss 1.6672 LearningRate 0.0025 Epoch: 18 Global Step: 98370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:30:50,067-Speed 10463.48 samples/sec Loss 1.6762 LearningRate 0.0024 Epoch: 18 Global Step: 98380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:30:58,162-Speed 10465.89 samples/sec Loss 1.6370 LearningRate 0.0024 Epoch: 18 Global Step: 98390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:31:05,987-Speed 10471.59 samples/sec Loss 1.6586 LearningRate 0.0024 Epoch: 18 Global Step: 98400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:31:13,852-Speed 10417.05 samples/sec Loss 1.6691 LearningRate 0.0024 Epoch: 18 Global Step: 98410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:31:21,673-Speed 10481.72 samples/sec Loss 1.6722 LearningRate 0.0024 Epoch: 18 Global Step: 98420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:31:29,477-Speed 10499.09 samples/sec Loss 1.6334 LearningRate 0.0024 Epoch: 18 Global Step: 98430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:31:37,306-Speed 10465.06 samples/sec Loss 1.6436 LearningRate 0.0024 Epoch: 18 Global Step: 98440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:31:45,110-Speed 10497.87 samples/sec Loss 1.6498 LearningRate 0.0024 Epoch: 18 Global Step: 98450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:31:52,922-Speed 10488.69 samples/sec Loss 1.6613 LearningRate 0.0024 Epoch: 18 Global Step: 98460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:32:00,751-Speed 10464.97 samples/sec Loss 1.6512 LearningRate 0.0024 Epoch: 18 Global Step: 98470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 
12:32:08,579-Speed 10467.17 samples/sec Loss 1.6622 LearningRate 0.0024 Epoch: 18 Global Step: 98480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:32:16,403-Speed 10470.92 samples/sec Loss 1.6486 LearningRate 0.0023 Epoch: 18 Global Step: 98490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:32:24,238-Speed 10460.69 samples/sec Loss 1.6617 LearningRate 0.0023 Epoch: 18 Global Step: 98500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:32:32,077-Speed 10452.27 samples/sec Loss 1.6664 LearningRate 0.0023 Epoch: 18 Global Step: 98510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:33:00,044-Speed 2929.73 samples/sec Loss 1.6659 LearningRate 0.0023 Epoch: 19 Global Step: 98520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:33:07,808-Speed 10553.19 samples/sec Loss 1.6732 LearningRate 0.0023 Epoch: 19 Global Step: 98530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:33:15,587-Speed 10531.48 samples/sec Loss 1.6466 LearningRate 0.0023 Epoch: 19 Global Step: 98540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:33:23,362-Speed 10537.66 samples/sec Loss 1.6474 LearningRate 0.0023 Epoch: 19 Global Step: 98550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:33:31,130-Speed 10548.48 samples/sec Loss 1.6584 LearningRate 0.0023 Epoch: 19 Global Step: 98560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:33:38,905-Speed 10536.93 samples/sec Loss 1.6512 LearningRate 0.0023 Epoch: 19 Global Step: 98570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:33:46,689-Speed 10525.71 samples/sec Loss 1.6320 LearningRate 0.0023 Epoch: 19 Global Step: 98580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:33:54,463-Speed 10538.38 samples/sec Loss 1.6495 LearningRate 0.0023 Epoch: 19 Global Step: 98590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:34:02,237-Speed 10540.04 samples/sec Loss 
1.6454 LearningRate 0.0023 Epoch: 19 Global Step: 98600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:34:10,074-Speed 10544.17 samples/sec Loss 1.6502 LearningRate 0.0022 Epoch: 19 Global Step: 98610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:34:17,878-Speed 10498.70 samples/sec Loss 1.6330 LearningRate 0.0022 Epoch: 19 Global Step: 98620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:34:25,675-Speed 10507.81 samples/sec Loss 1.6435 LearningRate 0.0022 Epoch: 19 Global Step: 98630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:34:33,497-Speed 10475.86 samples/sec Loss 1.6449 LearningRate 0.0022 Epoch: 19 Global Step: 98640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:34:41,394-Speed 10521.78 samples/sec Loss 1.6375 LearningRate 0.0022 Epoch: 19 Global Step: 98650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:34:49,193-Speed 10504.10 samples/sec Loss 1.6393 LearningRate 0.0022 Epoch: 19 Global Step: 98660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:34:56,980-Speed 10521.96 samples/sec Loss 1.6401 LearningRate 0.0022 Epoch: 19 Global Step: 98670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:35:04,779-Speed 10510.58 samples/sec Loss 1.6332 LearningRate 0.0022 Epoch: 19 Global Step: 98680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:35:12,566-Speed 10521.39 samples/sec Loss 1.6262 LearningRate 0.0022 Epoch: 19 Global Step: 98690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:35:20,346-Speed 10530.65 samples/sec Loss 1.6113 LearningRate 0.0022 Epoch: 19 Global Step: 98700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:35:28,121-Speed 10537.20 samples/sec Loss 1.6318 LearningRate 0.0022 Epoch: 19 Global Step: 98710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:35:35,911-Speed 10531.20 samples/sec Loss 1.6368 LearningRate 0.0021 Epoch: 19 Global 
Step: 98720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:35:43,700-Speed 10518.56 samples/sec Loss 1.6314 LearningRate 0.0021 Epoch: 19 Global Step: 98730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:35:51,501-Speed 10502.92 samples/sec Loss 1.6429 LearningRate 0.0021 Epoch: 19 Global Step: 98740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:35:59,269-Speed 10548.38 samples/sec Loss 1.6492 LearningRate 0.0021 Epoch: 19 Global Step: 98750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:36:07,037-Speed 10548.31 samples/sec Loss 1.6374 LearningRate 0.0021 Epoch: 19 Global Step: 98760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:36:14,814-Speed 10535.87 samples/sec Loss 1.6416 LearningRate 0.0021 Epoch: 19 Global Step: 98770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:36:22,612-Speed 10506.54 samples/sec Loss 1.6377 LearningRate 0.0021 Epoch: 19 Global Step: 98780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:36:30,438-Speed 10469.90 samples/sec Loss 1.6305 LearningRate 0.0021 Epoch: 19 Global Step: 98790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:36:38,232-Speed 10513.25 samples/sec Loss 1.6194 LearningRate 0.0021 Epoch: 19 Global Step: 98800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:36:46,017-Speed 10523.02 samples/sec Loss 1.6271 LearningRate 0.0021 Epoch: 19 Global Step: 98810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:36:53,798-Speed 10530.63 samples/sec Loss 1.6490 LearningRate 0.0021 Epoch: 19 Global Step: 98820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:37:01,635-Speed 10529.99 samples/sec Loss 1.6316 LearningRate 0.0021 Epoch: 19 Global Step: 98830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:37:09,415-Speed 10530.25 samples/sec Loss 1.6431 LearningRate 0.0020 Epoch: 19 Global Step: 98840 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-16 12:37:17,203-Speed 10520.69 samples/sec Loss 1.6432 LearningRate 0.0020 Epoch: 19 Global Step: 98850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:37:24,976-Speed 10540.59 samples/sec Loss 1.6175 LearningRate 0.0020 Epoch: 19 Global Step: 98860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:37:32,769-Speed 10511.91 samples/sec Loss 1.6288 LearningRate 0.0020 Epoch: 19 Global Step: 98870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:37:40,569-Speed 10504.50 samples/sec Loss 1.6114 LearningRate 0.0020 Epoch: 19 Global Step: 98880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:37:48,355-Speed 10522.82 samples/sec Loss 1.6352 LearningRate 0.0020 Epoch: 19 Global Step: 98890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:37:56,140-Speed 10523.60 samples/sec Loss 1.6366 LearningRate 0.0020 Epoch: 19 Global Step: 98900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:38:03,931-Speed 10516.85 samples/sec Loss 1.6370 LearningRate 0.0020 Epoch: 19 Global Step: 98910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:38:11,762-Speed 10463.23 samples/sec Loss 1.6246 LearningRate 0.0020 Epoch: 19 Global Step: 98920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:38:19,601-Speed 10451.39 samples/sec Loss 1.6336 LearningRate 0.0020 Epoch: 19 Global Step: 98930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:38:27,443-Speed 10449.33 samples/sec Loss 1.6109 LearningRate 0.0020 Epoch: 19 Global Step: 98940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:38:35,304-Speed 10422.77 samples/sec Loss 1.6159 LearningRate 0.0020 Epoch: 19 Global Step: 98950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:38:43,164-Speed 10424.09 samples/sec Loss 1.6358 LearningRate 0.0019 Epoch: 19 Global Step: 98960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 
12:38:51,045-Speed 10395.93 samples/sec Loss 1.6232 LearningRate 0.0019 Epoch: 19 Global Step: 98970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:38:58,895-Speed 10436.84 samples/sec Loss 1.6211 LearningRate 0.0019 Epoch: 19 Global Step: 98980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:39:06,742-Speed 10441.93 samples/sec Loss 1.6312 LearningRate 0.0019 Epoch: 19 Global Step: 98990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:39:14,580-Speed 10452.85 samples/sec Loss 1.6217 LearningRate 0.0019 Epoch: 19 Global Step: 99000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:39:22,434-Speed 10431.05 samples/sec Loss 1.6288 LearningRate 0.0019 Epoch: 19 Global Step: 99010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:39:30,276-Speed 10448.25 samples/sec Loss 1.6063 LearningRate 0.0019 Epoch: 19 Global Step: 99020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:39:38,117-Speed 10456.25 samples/sec Loss 1.6251 LearningRate 0.0019 Epoch: 19 Global Step: 99030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:39:45,931-Speed 10484.83 samples/sec Loss 1.6212 LearningRate 0.0019 Epoch: 19 Global Step: 99040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:39:53,790-Speed 10424.50 samples/sec Loss 1.6048 LearningRate 0.0019 Epoch: 19 Global Step: 99050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:40:01,626-Speed 10456.13 samples/sec Loss 1.5989 LearningRate 0.0019 Epoch: 19 Global Step: 99060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:40:09,475-Speed 10437.58 samples/sec Loss 1.6192 LearningRate 0.0019 Epoch: 19 Global Step: 99070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:40:17,297-Speed 10475.51 samples/sec Loss 1.6216 LearningRate 0.0018 Epoch: 19 Global Step: 99080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:40:25,165-Speed 10411.97 samples/sec 
Loss 1.5978 LearningRate 0.0018 Epoch: 19 Global Step: 99090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:40:32,994-Speed 10464.96 samples/sec Loss 1.5963 LearningRate 0.0018 Epoch: 19 Global Step: 99100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:40:40,844-Speed 10437.66 samples/sec Loss 1.6176 LearningRate 0.0018 Epoch: 19 Global Step: 99110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:40:48,702-Speed 10426.58 samples/sec Loss 1.6238 LearningRate 0.0018 Epoch: 19 Global Step: 99120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:40:56,529-Speed 10468.51 samples/sec Loss 1.6480 LearningRate 0.0018 Epoch: 19 Global Step: 99130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:41:04,412-Speed 10461.97 samples/sec Loss 1.6316 LearningRate 0.0018 Epoch: 19 Global Step: 99140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:41:12,238-Speed 10468.57 samples/sec Loss 1.5966 LearningRate 0.0018 Epoch: 19 Global Step: 99150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:41:20,057-Speed 10478.65 samples/sec Loss 1.6142 LearningRate 0.0018 Epoch: 19 Global Step: 99160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:41:27,893-Speed 10456.84 samples/sec Loss 1.6255 LearningRate 0.0018 Epoch: 19 Global Step: 99170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:41:35,734-Speed 10448.70 samples/sec Loss 1.6156 LearningRate 0.0018 Epoch: 19 Global Step: 99180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:41:43,561-Speed 10466.91 samples/sec Loss 1.6059 LearningRate 0.0018 Epoch: 19 Global Step: 99190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:41:51,442-Speed 10396.77 samples/sec Loss 1.6196 LearningRate 0.0018 Epoch: 19 Global Step: 99200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:41:59,401-Speed 10504.82 samples/sec Loss 1.5750 LearningRate 0.0017 Epoch: 19 
Global Step: 99210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:42:07,256-Speed 10429.39 samples/sec Loss 1.6009 LearningRate 0.0017 Epoch: 19 Global Step: 99220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:42:16,297-Speed 10458.17 samples/sec Loss 1.6054 LearningRate 0.0017 Epoch: 19 Global Step: 99230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:42:24,123-Speed 10469.39 samples/sec Loss 1.5915 LearningRate 0.0017 Epoch: 19 Global Step: 99240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:42:31,943-Speed 10476.16 samples/sec Loss 1.5876 LearningRate 0.0017 Epoch: 19 Global Step: 99250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:42:39,776-Speed 10460.50 samples/sec Loss 1.6096 LearningRate 0.0017 Epoch: 19 Global Step: 99260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:42:47,640-Speed 10418.93 samples/sec Loss 1.6325 LearningRate 0.0017 Epoch: 19 Global Step: 99270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:42:55,449-Speed 10491.47 samples/sec Loss 1.5825 LearningRate 0.0017 Epoch: 19 Global Step: 99280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:43:03,289-Speed 10450.69 samples/sec Loss 1.5855 LearningRate 0.0017 Epoch: 19 Global Step: 99290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:43:11,119-Speed 10464.47 samples/sec Loss 1.5879 LearningRate 0.0017 Epoch: 19 Global Step: 99300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:43:18,967-Speed 10438.83 samples/sec Loss 1.6112 LearningRate 0.0017 Epoch: 19 Global Step: 99310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:43:26,809-Speed 10447.52 samples/sec Loss 1.6012 LearningRate 0.0017 Epoch: 19 Global Step: 99320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:43:34,636-Speed 10476.47 samples/sec Loss 1.6218 LearningRate 0.0017 Epoch: 19 Global Step: 99330 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-01-16 12:43:42,458-Speed 10473.71 samples/sec Loss 1.6045 LearningRate 0.0016 Epoch: 19 Global Step: 99340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:43:50,269-Speed 10490.06 samples/sec Loss 1.5968 LearningRate 0.0016 Epoch: 19 Global Step: 99350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:43:58,051-Speed 10528.04 samples/sec Loss 1.5761 LearningRate 0.0016 Epoch: 19 Global Step: 99360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:44:05,833-Speed 10529.06 samples/sec Loss 1.6012 LearningRate 0.0016 Epoch: 19 Global Step: 99370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:44:13,632-Speed 10505.06 samples/sec Loss 1.6155 LearningRate 0.0016 Epoch: 19 Global Step: 99380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:44:21,414-Speed 10527.92 samples/sec Loss 1.6171 LearningRate 0.0016 Epoch: 19 Global Step: 99390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:44:29,216-Speed 10501.19 samples/sec Loss 1.6105 LearningRate 0.0016 Epoch: 19 Global Step: 99400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:44:37,024-Speed 10492.76 samples/sec Loss 1.5906 LearningRate 0.0016 Epoch: 19 Global Step: 99410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:44:44,838-Speed 10485.56 samples/sec Loss 1.5865 LearningRate 0.0016 Epoch: 19 Global Step: 99420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:44:52,646-Speed 10493.95 samples/sec Loss 1.6088 LearningRate 0.0016 Epoch: 19 Global Step: 99430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:45:00,436-Speed 10517.18 samples/sec Loss 1.5913 LearningRate 0.0016 Epoch: 19 Global Step: 99440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:45:08,247-Speed 10489.47 samples/sec Loss 1.6031 LearningRate 0.0016 Epoch: 19 Global Step: 99450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 
12:45:16,052-Speed 10497.77 samples/sec Loss 1.5878 LearningRate 0.0016 Epoch: 19 Global Step: 99460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:45:23,864-Speed 10488.79 samples/sec Loss 1.5858 LearningRate 0.0015 Epoch: 19 Global Step: 99470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:45:33,231-Speed 10532.78 samples/sec Loss 1.5899 LearningRate 0.0015 Epoch: 19 Global Step: 99480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:45:41,031-Speed 10503.59 samples/sec Loss 1.5940 LearningRate 0.0015 Epoch: 19 Global Step: 99490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:45:48,830-Speed 10504.94 samples/sec Loss 1.5769 LearningRate 0.0015 Epoch: 19 Global Step: 99500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:45:56,636-Speed 10496.81 samples/sec Loss 1.6082 LearningRate 0.0015 Epoch: 19 Global Step: 99510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:46:04,437-Speed 10502.66 samples/sec Loss 1.5942 LearningRate 0.0015 Epoch: 19 Global Step: 99520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:46:12,230-Speed 10512.74 samples/sec Loss 1.5845 LearningRate 0.0015 Epoch: 19 Global Step: 99530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:46:20,018-Speed 10520.54 samples/sec Loss 1.5882 LearningRate 0.0015 Epoch: 19 Global Step: 99540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:46:27,813-Speed 10509.96 samples/sec Loss 1.5830 LearningRate 0.0015 Epoch: 19 Global Step: 99550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:46:35,610-Speed 10508.83 samples/sec Loss 1.5950 LearningRate 0.0015 Epoch: 19 Global Step: 99560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:46:43,402-Speed 10513.82 samples/sec Loss 1.5822 LearningRate 0.0015 Epoch: 19 Global Step: 99570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:46:51,183-Speed 10530.65 samples/sec 
Loss 1.5924 LearningRate 0.0015 Epoch: 19 Global Step: 99580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:46:58,973-Speed 10516.00 samples/sec Loss 1.5763 LearningRate 0.0015 Epoch: 19 Global Step: 99590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:47:06,845-Speed 10492.45 samples/sec Loss 1.5969 LearningRate 0.0015 Epoch: 19 Global Step: 99600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:47:14,686-Speed 10450.09 samples/sec Loss 1.5910 LearningRate 0.0014 Epoch: 19 Global Step: 99610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:47:22,478-Speed 10514.68 samples/sec Loss 1.5792 LearningRate 0.0014 Epoch: 19 Global Step: 99620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:47:30,269-Speed 10516.05 samples/sec Loss 1.6032 LearningRate 0.0014 Epoch: 19 Global Step: 99630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:47:38,052-Speed 10527.30 samples/sec Loss 1.5922 LearningRate 0.0014 Epoch: 19 Global Step: 99640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:47:45,983-Speed 10523.13 samples/sec Loss 1.5784 LearningRate 0.0014 Epoch: 19 Global Step: 99650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:47:53,786-Speed 10500.20 samples/sec Loss 1.5924 LearningRate 0.0014 Epoch: 19 Global Step: 99660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:48:01,591-Speed 10496.41 samples/sec Loss 1.5842 LearningRate 0.0014 Epoch: 19 Global Step: 99670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:48:09,384-Speed 10514.16 samples/sec Loss 1.5789 LearningRate 0.0014 Epoch: 19 Global Step: 99680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:48:17,185-Speed 10502.98 samples/sec Loss 1.5739 LearningRate 0.0014 Epoch: 19 Global Step: 99690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:48:25,000-Speed 10484.23 samples/sec Loss 1.5860 LearningRate 0.0014 Epoch: 19 
Global Step: 99700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:48:33,076-Speed 10533.84 samples/sec Loss 1.5754 LearningRate 0.0014 Epoch: 19 Global Step: 99710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:48:40,874-Speed 10506.26 samples/sec Loss 1.5871 LearningRate 0.0014 Epoch: 19 Global Step: 99720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:48:48,695-Speed 10475.22 samples/sec Loss 1.5828 LearningRate 0.0014 Epoch: 19 Global Step: 99730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:48:58,188-Speed 10488.97 samples/sec Loss 1.5746 LearningRate 0.0014 Epoch: 19 Global Step: 99740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:49:05,974-Speed 10532.22 samples/sec Loss 1.5777 LearningRate 0.0013 Epoch: 19 Global Step: 99750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:49:13,753-Speed 10531.90 samples/sec Loss 1.5885 LearningRate 0.0013 Epoch: 19 Global Step: 99760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:49:21,540-Speed 10520.79 samples/sec Loss 1.5819 LearningRate 0.0013 Epoch: 19 Global Step: 99770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:49:29,323-Speed 10527.82 samples/sec Loss 1.5748 LearningRate 0.0013 Epoch: 19 Global Step: 99780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:49:37,113-Speed 10516.95 samples/sec Loss 1.5852 LearningRate 0.0013 Epoch: 19 Global Step: 99790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:49:44,931-Speed 10480.86 samples/sec Loss 1.5620 LearningRate 0.0013 Epoch: 19 Global Step: 99800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:49:52,719-Speed 10518.41 samples/sec Loss 1.5926 LearningRate 0.0013 Epoch: 19 Global Step: 99810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:50:00,512-Speed 10514.96 samples/sec Loss 1.5594 LearningRate 0.0013 Epoch: 19 Global Step: 99820 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-01-16 12:50:08,313-Speed 10502.10 samples/sec Loss 1.5687 LearningRate 0.0013 Epoch: 19 Global Step: 99830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:50:16,092-Speed 10533.37 samples/sec Loss 1.5615 LearningRate 0.0013 Epoch: 19 Global Step: 99840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:50:23,960-Speed 10412.38 samples/sec Loss 1.5777 LearningRate 0.0013 Epoch: 19 Global Step: 99850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:50:31,743-Speed 10527.08 samples/sec Loss 1.5987 LearningRate 0.0013 Epoch: 19 Global Step: 99860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:50:39,556-Speed 10486.41 samples/sec Loss 1.5997 LearningRate 0.0013 Epoch: 19 Global Step: 99870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:50:47,341-Speed 10524.70 samples/sec Loss 1.5657 LearningRate 0.0013 Epoch: 19 Global Step: 99880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:50:55,138-Speed 10507.76 samples/sec Loss 1.5765 LearningRate 0.0013 Epoch: 19 Global Step: 99890 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-16 12:51:02,931-Speed 10513.70 samples/sec Loss 1.5820 LearningRate 0.0012 Epoch: 19 Global Step: 99900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:51:10,714-Speed 10526.37 samples/sec Loss 1.5915 LearningRate 0.0012 Epoch: 19 Global Step: 99910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:51:18,531-Speed 10482.51 samples/sec Loss 1.5588 LearningRate 0.0012 Epoch: 19 Global Step: 99920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:51:26,361-Speed 10464.12 samples/sec Loss 1.5926 LearningRate 0.0012 Epoch: 19 Global Step: 99930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:51:34,169-Speed 10494.41 samples/sec Loss 1.5753 LearningRate 0.0012 Epoch: 19 Global Step: 99940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 
12:51:41,942-Speed 10540.03 samples/sec Loss 1.5873 LearningRate 0.0012 Epoch: 19 Global Step: 99950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:51:49,746-Speed 10508.21 samples/sec Loss 1.5665 LearningRate 0.0012 Epoch: 19 Global Step: 99960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:51:57,535-Speed 10518.95 samples/sec Loss 1.5669 LearningRate 0.0012 Epoch: 19 Global Step: 99970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:52:05,383-Speed 10440.05 samples/sec Loss 1.5713 LearningRate 0.0012 Epoch: 19 Global Step: 99980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:52:13,190-Speed 10493.78 samples/sec Loss 1.5690 LearningRate 0.0012 Epoch: 19 Global Step: 99990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:52:20,996-Speed 10496.03 samples/sec Loss 1.5840 LearningRate 0.0012 Epoch: 19 Global Step: 100000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:52:48,849-[lfw][100000]XNorm: 23.508723 Training: 2022-01-16 12:52:48,849-[lfw][100000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-01-16 12:52:48,850-[lfw][100000]Accuracy-Highest: 0.99817 Training: 2022-01-16 12:53:21,017-[cfp_fp][100000]XNorm: 21.667794 Training: 2022-01-16 12:53:21,018-[cfp_fp][100000]Accuracy-Flip: 0.99243+-0.00350 Training: 2022-01-16 12:53:21,018-[cfp_fp][100000]Accuracy-Highest: 0.99257 Training: 2022-01-16 12:53:49,284-[agedb_30][100000]XNorm: 23.007740 Training: 2022-01-16 12:53:49,285-[agedb_30][100000]Accuracy-Flip: 0.98083+-0.00638 Training: 2022-01-16 12:53:49,285-[agedb_30][100000]Accuracy-Highest: 0.98083 Training: 2022-01-16 12:53:57,034-Speed 853.01 samples/sec Loss 1.5561 LearningRate 0.0012 Epoch: 19 Global Step: 100010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:54:04,745-Speed 10625.24 samples/sec Loss 1.5865 LearningRate 0.0012 Epoch: 19 Global Step: 100020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:54:12,475-Speed 
10599.23 samples/sec Loss 1.5903 LearningRate 0.0012 Epoch: 19 Global Step: 100030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:54:20,225-Speed 10571.53 samples/sec Loss 1.5743 LearningRate 0.0012 Epoch: 19 Global Step: 100040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:54:27,993-Speed 10547.99 samples/sec Loss 1.5639 LearningRate 0.0011 Epoch: 19 Global Step: 100050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:54:35,782-Speed 10517.91 samples/sec Loss 1.5509 LearningRate 0.0011 Epoch: 19 Global Step: 100060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:54:43,552-Speed 10545.72 samples/sec Loss 1.5774 LearningRate 0.0011 Epoch: 19 Global Step: 100070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:54:51,307-Speed 10564.05 samples/sec Loss 1.5535 LearningRate 0.0011 Epoch: 19 Global Step: 100080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:54:59,080-Speed 10541.25 samples/sec Loss 1.5794 LearningRate 0.0011 Epoch: 19 Global Step: 100090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:55:06,833-Speed 10568.00 samples/sec Loss 1.5663 LearningRate 0.0011 Epoch: 19 Global Step: 100100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:55:14,595-Speed 10554.39 samples/sec Loss 1.5630 LearningRate 0.0011 Epoch: 19 Global Step: 100110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:55:22,373-Speed 10534.47 samples/sec Loss 1.5778 LearningRate 0.0011 Epoch: 19 Global Step: 100120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:55:30,154-Speed 10530.73 samples/sec Loss 1.5595 LearningRate 0.0011 Epoch: 19 Global Step: 100130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:55:37,918-Speed 10552.67 samples/sec Loss 1.5760 LearningRate 0.0011 Epoch: 19 Global Step: 100140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:55:45,694-Speed 10537.02 samples/sec Loss 
1.5600 LearningRate 0.0011 Epoch: 19 Global Step: 100150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:55:53,455-Speed 10559.21 samples/sec Loss 1.5730 LearningRate 0.0011 Epoch: 19 Global Step: 100160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:56:01,213-Speed 10560.58 samples/sec Loss 1.5401 LearningRate 0.0011 Epoch: 19 Global Step: 100170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:56:09,054-Speed 10449.30 samples/sec Loss 1.5560 LearningRate 0.0011 Epoch: 19 Global Step: 100180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:56:16,788-Speed 10594.34 samples/sec Loss 1.5571 LearningRate 0.0011 Epoch: 19 Global Step: 100190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:56:24,527-Speed 10586.46 samples/sec Loss 1.5768 LearningRate 0.0011 Epoch: 19 Global Step: 100200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:56:32,283-Speed 10562.91 samples/sec Loss 1.5628 LearningRate 0.0011 Epoch: 19 Global Step: 100210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:56:40,053-Speed 10547.23 samples/sec Loss 1.5582 LearningRate 0.0010 Epoch: 19 Global Step: 100220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:56:47,808-Speed 10565.44 samples/sec Loss 1.5716 LearningRate 0.0010 Epoch: 19 Global Step: 100230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:56:55,562-Speed 10566.19 samples/sec Loss 1.5617 LearningRate 0.0010 Epoch: 19 Global Step: 100240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:57:03,309-Speed 10575.33 samples/sec Loss 1.5387 LearningRate 0.0010 Epoch: 19 Global Step: 100250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:57:11,054-Speed 10578.99 samples/sec Loss 1.5609 LearningRate 0.0010 Epoch: 19 Global Step: 100260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:57:18,802-Speed 10574.73 samples/sec Loss 1.5314 LearningRate 0.0010 
Epoch: 19 Global Step: 100270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:57:26,574-Speed 10542.74 samples/sec Loss 1.5797 LearningRate 0.0010 Epoch: 19 Global Step: 100280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:57:34,334-Speed 10557.28 samples/sec Loss 1.5385 LearningRate 0.0010 Epoch: 19 Global Step: 100290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:57:42,109-Speed 10537.15 samples/sec Loss 1.5486 LearningRate 0.0010 Epoch: 19 Global Step: 100300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:57:49,917-Speed 10493.92 samples/sec Loss 1.5630 LearningRate 0.0010 Epoch: 19 Global Step: 100310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:57:57,712-Speed 10510.89 samples/sec Loss 1.5460 LearningRate 0.0010 Epoch: 19 Global Step: 100320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:58:05,472-Speed 10556.78 samples/sec Loss 1.5685 LearningRate 0.0010 Epoch: 19 Global Step: 100330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:58:13,248-Speed 10538.22 samples/sec Loss 1.5718 LearningRate 0.0010 Epoch: 19 Global Step: 100340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 12:58:21,012-Speed 10557.28 samples/sec Loss 1.5629 LearningRate 0.0010 Epoch: 19 Global Step: 100350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:58:28,775-Speed 10553.10 samples/sec Loss 1.5355 LearningRate 0.0010 Epoch: 19 Global Step: 100360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:58:36,541-Speed 10549.56 samples/sec Loss 1.5440 LearningRate 0.0010 Epoch: 19 Global Step: 100370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:58:44,327-Speed 10523.10 samples/sec Loss 1.5714 LearningRate 0.0009 Epoch: 19 Global Step: 100380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:58:52,106-Speed 10532.43 samples/sec Loss 1.5518 LearningRate 0.0009 Epoch: 19 Global Step: 100390 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:58:59,882-Speed 10536.06 samples/sec Loss 1.5693 LearningRate 0.0009 Epoch: 19 Global Step: 100400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:59:07,686-Speed 10498.46 samples/sec Loss 1.5540 LearningRate 0.0009 Epoch: 19 Global Step: 100410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:59:15,450-Speed 10554.13 samples/sec Loss 1.5583 LearningRate 0.0009 Epoch: 19 Global Step: 100420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:59:23,209-Speed 10558.18 samples/sec Loss 1.5580 LearningRate 0.0009 Epoch: 19 Global Step: 100430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:59:30,974-Speed 10551.61 samples/sec Loss 1.5391 LearningRate 0.0009 Epoch: 19 Global Step: 100440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:59:38,751-Speed 10535.43 samples/sec Loss 1.5454 LearningRate 0.0009 Epoch: 19 Global Step: 100450 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-16 12:59:46,514-Speed 10554.71 samples/sec Loss 1.5405 LearningRate 0.0009 Epoch: 19 Global Step: 100460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 12:59:54,366-Speed 10434.49 samples/sec Loss 1.5513 LearningRate 0.0009 Epoch: 19 Global Step: 100470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:00:02,120-Speed 10566.09 samples/sec Loss 1.5518 LearningRate 0.0009 Epoch: 19 Global Step: 100480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:00:09,890-Speed 10544.84 samples/sec Loss 1.5346 LearningRate 0.0009 Epoch: 19 Global Step: 100490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:00:17,698-Speed 10492.60 samples/sec Loss 1.5326 LearningRate 0.0009 Epoch: 19 Global Step: 100500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:00:25,472-Speed 10539.00 samples/sec Loss 1.5187 LearningRate 0.0009 Epoch: 19 Global Step: 100510 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-16 13:00:33,264-Speed 10515.57 samples/sec Loss 1.5265 LearningRate 0.0009 Epoch: 19 Global Step: 100520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:00:41,034-Speed 10548.34 samples/sec Loss 1.5420 LearningRate 0.0009 Epoch: 19 Global Step: 100530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:00:48,786-Speed 10571.84 samples/sec Loss 1.5390 LearningRate 0.0009 Epoch: 19 Global Step: 100540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:00:56,554-Speed 10546.40 samples/sec Loss 1.5517 LearningRate 0.0009 Epoch: 19 Global Step: 100550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:01:04,314-Speed 10558.75 samples/sec Loss 1.5409 LearningRate 0.0008 Epoch: 19 Global Step: 100560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:01:12,107-Speed 10514.04 samples/sec Loss 1.5628 LearningRate 0.0008 Epoch: 19 Global Step: 100570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:01:19,886-Speed 10533.37 samples/sec Loss 1.5592 LearningRate 0.0008 Epoch: 19 Global Step: 100580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:01:27,677-Speed 10515.77 samples/sec Loss 1.5325 LearningRate 0.0008 Epoch: 19 Global Step: 100590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:01:35,441-Speed 10555.26 samples/sec Loss 1.5339 LearningRate 0.0008 Epoch: 19 Global Step: 100600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:01:43,229-Speed 10521.31 samples/sec Loss 1.5361 LearningRate 0.0008 Epoch: 19 Global Step: 100610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:01:50,992-Speed 10553.92 samples/sec Loss 1.5380 LearningRate 0.0008 Epoch: 19 Global Step: 100620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:01:58,753-Speed 10556.36 samples/sec Loss 1.5555 LearningRate 0.0008 Epoch: 19 Global Step: 100630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-01-16 13:02:06,520-Speed 10548.91 samples/sec Loss 1.5310 LearningRate 0.0008 Epoch: 19 Global Step: 100640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:02:14,285-Speed 10553.00 samples/sec Loss 1.5451 LearningRate 0.0008 Epoch: 19 Global Step: 100650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:02:22,056-Speed 10544.54 samples/sec Loss 1.5431 LearningRate 0.0008 Epoch: 19 Global Step: 100660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:02:29,829-Speed 10539.25 samples/sec Loss 1.5467 LearningRate 0.0008 Epoch: 19 Global Step: 100670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:02:37,596-Speed 10549.81 samples/sec Loss 1.5070 LearningRate 0.0008 Epoch: 19 Global Step: 100680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:02:45,416-Speed 10476.88 samples/sec Loss 1.5411 LearningRate 0.0008 Epoch: 19 Global Step: 100690 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-16 13:02:53,200-Speed 10525.63 samples/sec Loss 1.5589 LearningRate 0.0008 Epoch: 19 Global Step: 100700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:03:00,990-Speed 10516.43 samples/sec Loss 1.5524 LearningRate 0.0008 Epoch: 19 Global Step: 100710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:03:08,763-Speed 10541.04 samples/sec Loss 1.5470 LearningRate 0.0008 Epoch: 19 Global Step: 100720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:03:16,568-Speed 10497.82 samples/sec Loss 1.5309 LearningRate 0.0008 Epoch: 19 Global Step: 100730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:03:24,362-Speed 10511.30 samples/sec Loss 1.5439 LearningRate 0.0008 Epoch: 19 Global Step: 100740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:03:32,140-Speed 10534.35 samples/sec Loss 1.5482 LearningRate 0.0007 Epoch: 19 Global Step: 100750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:03:39,924-Speed 
10525.65 samples/sec Loss 1.5387 LearningRate 0.0007 Epoch: 19 Global Step: 100760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:03:47,718-Speed 10511.04 samples/sec Loss 1.5367 LearningRate 0.0007 Epoch: 19 Global Step: 100770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:03:55,498-Speed 10535.01 samples/sec Loss 1.5298 LearningRate 0.0007 Epoch: 19 Global Step: 100780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:04:03,264-Speed 10549.45 samples/sec Loss 1.5298 LearningRate 0.0007 Epoch: 19 Global Step: 100790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:04:11,065-Speed 10502.75 samples/sec Loss 1.5311 LearningRate 0.0007 Epoch: 19 Global Step: 100800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:04:18,852-Speed 10528.86 samples/sec Loss 1.5300 LearningRate 0.0007 Epoch: 19 Global Step: 100810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:04:26,629-Speed 10538.51 samples/sec Loss 1.5245 LearningRate 0.0007 Epoch: 19 Global Step: 100820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:04:34,423-Speed 10512.80 samples/sec Loss 1.5366 LearningRate 0.0007 Epoch: 19 Global Step: 100830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:04:42,209-Speed 10522.31 samples/sec Loss 1.5493 LearningRate 0.0007 Epoch: 19 Global Step: 100840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:04:50,007-Speed 10507.77 samples/sec Loss 1.5266 LearningRate 0.0007 Epoch: 19 Global Step: 100850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:04:57,772-Speed 10551.47 samples/sec Loss 1.5282 LearningRate 0.0007 Epoch: 19 Global Step: 100860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:05:05,567-Speed 10510.61 samples/sec Loss 1.5468 LearningRate 0.0007 Epoch: 19 Global Step: 100870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:05:13,347-Speed 10531.51 samples/sec Loss 
1.5377 LearningRate 0.0007 Epoch: 19 Global Step: 100880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:05:21,138-Speed 10516.12 samples/sec Loss 1.5296 LearningRate 0.0007 Epoch: 19 Global Step: 100890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:05:28,938-Speed 10503.47 samples/sec Loss 1.5398 LearningRate 0.0007 Epoch: 19 Global Step: 100900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:05:36,714-Speed 10536.65 samples/sec Loss 1.5487 LearningRate 0.0007 Epoch: 19 Global Step: 100910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:05:44,501-Speed 10523.52 samples/sec Loss 1.5322 LearningRate 0.0007 Epoch: 19 Global Step: 100920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:05:52,300-Speed 10505.53 samples/sec Loss 1.5351 LearningRate 0.0007 Epoch: 19 Global Step: 100930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:06:00,067-Speed 10548.73 samples/sec Loss 1.5424 LearningRate 0.0007 Epoch: 19 Global Step: 100940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:06:07,842-Speed 10538.16 samples/sec Loss 1.5456 LearningRate 0.0006 Epoch: 19 Global Step: 100950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:06:15,617-Speed 10537.68 samples/sec Loss 1.5339 LearningRate 0.0006 Epoch: 19 Global Step: 100960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:06:23,398-Speed 10528.91 samples/sec Loss 1.5279 LearningRate 0.0006 Epoch: 19 Global Step: 100970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:06:31,197-Speed 10506.53 samples/sec Loss 1.5271 LearningRate 0.0006 Epoch: 19 Global Step: 100980 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-16 13:06:38,978-Speed 10529.16 samples/sec Loss 1.5333 LearningRate 0.0006 Epoch: 19 Global Step: 100990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:06:46,757-Speed 10532.58 samples/sec Loss 1.5106 LearningRate 0.0006 
Epoch: 19 Global Step: 101000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:06:54,547-Speed 10516.53 samples/sec Loss 1.5399 LearningRate 0.0006 Epoch: 19 Global Step: 101010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:07:02,338-Speed 10517.08 samples/sec Loss 1.5282 LearningRate 0.0006 Epoch: 19 Global Step: 101020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:07:10,143-Speed 10496.05 samples/sec Loss 1.5266 LearningRate 0.0006 Epoch: 19 Global Step: 101030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:07:17,929-Speed 10523.17 samples/sec Loss 1.5167 LearningRate 0.0006 Epoch: 19 Global Step: 101040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:07:25,771-Speed 10448.56 samples/sec Loss 1.5102 LearningRate 0.0006 Epoch: 19 Global Step: 101050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:07:33,540-Speed 10546.19 samples/sec Loss 1.5279 LearningRate 0.0006 Epoch: 19 Global Step: 101060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:07:41,328-Speed 10518.87 samples/sec Loss 1.5323 LearningRate 0.0006 Epoch: 19 Global Step: 101070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:07:49,114-Speed 10523.84 samples/sec Loss 1.5328 LearningRate 0.0006 Epoch: 19 Global Step: 101080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:07:56,936-Speed 10475.28 samples/sec Loss 1.5426 LearningRate 0.0006 Epoch: 19 Global Step: 101090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:08:04,719-Speed 10525.45 samples/sec Loss 1.5487 LearningRate 0.0006 Epoch: 19 Global Step: 101100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:08:12,593-Speed 10406.49 samples/sec Loss 1.5500 LearningRate 0.0006 Epoch: 19 Global Step: 101110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:08:20,377-Speed 10524.98 samples/sec Loss 1.5531 LearningRate 0.0006 Epoch: 19 Global Step: 101120 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:08:28,222-Speed 10444.41 samples/sec Loss 1.5195 LearningRate 0.0006 Epoch: 19 Global Step: 101130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:08:35,997-Speed 10537.55 samples/sec Loss 1.5223 LearningRate 0.0006 Epoch: 19 Global Step: 101140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:08:43,809-Speed 10488.49 samples/sec Loss 1.5293 LearningRate 0.0006 Epoch: 19 Global Step: 101150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:08:51,571-Speed 10555.87 samples/sec Loss 1.5147 LearningRate 0.0006 Epoch: 19 Global Step: 101160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:08:59,362-Speed 10521.36 samples/sec Loss 1.5261 LearningRate 0.0005 Epoch: 19 Global Step: 101170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:09:07,174-Speed 10487.92 samples/sec Loss 1.5155 LearningRate 0.0005 Epoch: 19 Global Step: 101180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:09:15,009-Speed 10458.67 samples/sec Loss 1.5114 LearningRate 0.0005 Epoch: 19 Global Step: 101190 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-01-16 13:09:22,790-Speed 10529.44 samples/sec Loss 1.5196 LearningRate 0.0005 Epoch: 19 Global Step: 101200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:09:30,559-Speed 10546.05 samples/sec Loss 1.5277 LearningRate 0.0005 Epoch: 19 Global Step: 101210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:09:38,332-Speed 10540.87 samples/sec Loss 1.5190 LearningRate 0.0005 Epoch: 19 Global Step: 101220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:09:46,139-Speed 10493.75 samples/sec Loss 1.5085 LearningRate 0.0005 Epoch: 19 Global Step: 101230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:09:53,929-Speed 10517.72 samples/sec Loss 1.5042 LearningRate 0.0005 Epoch: 19 Global Step: 101240 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-01-16 13:10:01,720-Speed 10516.69 samples/sec Loss 1.5467 LearningRate 0.0005 Epoch: 19 Global Step: 101250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:10:09,544-Speed 10472.27 samples/sec Loss 1.5127 LearningRate 0.0005 Epoch: 19 Global Step: 101260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:10:17,345-Speed 10503.30 samples/sec Loss 1.5314 LearningRate 0.0005 Epoch: 19 Global Step: 101270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:10:25,140-Speed 10510.70 samples/sec Loss 1.5342 LearningRate 0.0005 Epoch: 19 Global Step: 101280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:10:32,936-Speed 10509.95 samples/sec Loss 1.5303 LearningRate 0.0005 Epoch: 19 Global Step: 101290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:10:40,731-Speed 10511.09 samples/sec Loss 1.5133 LearningRate 0.0005 Epoch: 19 Global Step: 101300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:10:48,513-Speed 10528.27 samples/sec Loss 1.5244 LearningRate 0.0005 Epoch: 19 Global Step: 101310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:10:56,338-Speed 10470.13 samples/sec Loss 1.5378 LearningRate 0.0005 Epoch: 19 Global Step: 101320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:11:04,150-Speed 10488.04 samples/sec Loss 1.5226 LearningRate 0.0005 Epoch: 19 Global Step: 101330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:11:11,936-Speed 10524.00 samples/sec Loss 1.5371 LearningRate 0.0005 Epoch: 19 Global Step: 101340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-01-16 13:11:19,724-Speed 10519.18 samples/sec Loss 1.5102 LearningRate 0.0005 Epoch: 19 Global Step: 101350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:11:27,514-Speed 10517.83 samples/sec Loss 1.5049 LearningRate 0.0005 Epoch: 19 Global Step: 101360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-01-16 13:11:35,323-Speed 10492.85 samples/sec Loss 1.4977 LearningRate 0.0005 Epoch: 19 Global Step: 101370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:11:43,140-Speed 10480.97 samples/sec Loss 1.5183 LearningRate 0.0005 Epoch: 19 Global Step: 101380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:11:50,938-Speed 10506.99 samples/sec Loss 1.5084 LearningRate 0.0005 Epoch: 19 Global Step: 101390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:11:58,732-Speed 10511.73 samples/sec Loss 1.5176 LearningRate 0.0005 Epoch: 19 Global Step: 101400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-01-16 13:12:06,511-Speed 10532.27 samples/sec Loss 1.5149 LearningRate 0.0004 Epoch: 19 Global Step: 101410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:12:14,297-Speed 10526.42 samples/sec Loss 1.5279 LearningRate 0.0004 Epoch: 19 Global Step: 101420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:12:22,076-Speed 10532.43 samples/sec Loss 1.5397 LearningRate 0.0004 Epoch: 19 Global Step: 101430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:12:29,883-Speed 10495.80 samples/sec Loss 1.5200 LearningRate 0.0004 Epoch: 19 Global Step: 101440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:12:37,669-Speed 10522.46 samples/sec Loss 1.5155 LearningRate 0.0004 Epoch: 19 Global Step: 101450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:12:45,472-Speed 10500.77 samples/sec Loss 1.5067 LearningRate 0.0004 Epoch: 19 Global Step: 101460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:12:53,246-Speed 10541.73 samples/sec Loss 1.5250 LearningRate 0.0004 Epoch: 19 Global Step: 101470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:13:01,087-Speed 10449.71 samples/sec Loss 1.5178 LearningRate 0.0004 Epoch: 19 Global Step: 101480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:13:08,875-Speed 
10520.11 samples/sec Loss 1.4996 LearningRate 0.0004 Epoch: 19 Global Step: 101490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:13:16,674-Speed 10507.01 samples/sec Loss 1.5155 LearningRate 0.0004 Epoch: 19 Global Step: 101500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:13:24,445-Speed 10542.73 samples/sec Loss 1.5188 LearningRate 0.0004 Epoch: 19 Global Step: 101510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:13:32,240-Speed 10510.93 samples/sec Loss 1.5225 LearningRate 0.0004 Epoch: 19 Global Step: 101520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:13:40,024-Speed 10525.22 samples/sec Loss 1.5257 LearningRate 0.0004 Epoch: 19 Global Step: 101530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:13:47,798-Speed 10540.12 samples/sec Loss 1.5060 LearningRate 0.0004 Epoch: 19 Global Step: 101540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:13:55,584-Speed 10521.97 samples/sec Loss 1.5180 LearningRate 0.0004 Epoch: 19 Global Step: 101550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:14:03,356-Speed 10541.50 samples/sec Loss 1.5231 LearningRate 0.0004 Epoch: 19 Global Step: 101560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:14:11,185-Speed 10465.47 samples/sec Loss 1.4838 LearningRate 0.0004 Epoch: 19 Global Step: 101570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:14:18,992-Speed 10493.86 samples/sec Loss 1.5065 LearningRate 0.0004 Epoch: 19 Global Step: 101580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:14:26,779-Speed 10522.82 samples/sec Loss 1.5294 LearningRate 0.0004 Epoch: 19 Global Step: 101590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:14:34,548-Speed 10545.12 samples/sec Loss 1.4758 LearningRate 0.0004 Epoch: 19 Global Step: 101600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:14:42,359-Speed 10488.19 samples/sec Loss 
1.5095 LearningRate 0.0004 Epoch: 19 Global Step: 101610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:14:50,168-Speed 10492.86 samples/sec Loss 1.5097 LearningRate 0.0004 Epoch: 19 Global Step: 101620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:14:57,942-Speed 10538.72 samples/sec Loss 1.5237 LearningRate 0.0004 Epoch: 19 Global Step: 101630 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-16 13:15:05,738-Speed 10511.06 samples/sec Loss 1.5161 LearningRate 0.0004 Epoch: 19 Global Step: 101640 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-16 13:15:13,533-Speed 10510.83 samples/sec Loss 1.5157 LearningRate 0.0004 Epoch: 19 Global Step: 101650 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-16 13:15:21,328-Speed 10511.28 samples/sec Loss 1.5026 LearningRate 0.0004 Epoch: 19 Global Step: 101660 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-16 13:15:29,118-Speed 10516.79 samples/sec Loss 1.5198 LearningRate 0.0004 Epoch: 19 Global Step: 101670 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-16 13:15:36,904-Speed 10523.87 samples/sec Loss 1.5163 LearningRate 0.0003 Epoch: 19 Global Step: 101680 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-16 13:15:44,673-Speed 10545.04 samples/sec Loss 1.5004 LearningRate 0.0003 Epoch: 19 Global Step: 101690 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-16 13:15:52,511-Speed 10454.91 samples/sec Loss 1.5174 LearningRate 0.0003 Epoch: 19 Global Step: 101700 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-16 13:16:00,292-Speed 10528.14 samples/sec Loss 1.5016 LearningRate 0.0003 Epoch: 19 Global Step: 101710 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-16 13:16:08,078-Speed 10527.19 samples/sec Loss 1.5084 LearningRate 0.0003 Epoch: 19 Global Step: 101720 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-01-16 13:16:15,870-Speed 10514.96 samples/sec Loss 1.5016 LearningRate 0.0003 
Epoch: 19 Global Step: 101730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:16:23,641-Speed 10544.82 samples/sec Loss 1.4833 LearningRate 0.0003 Epoch: 19 Global Step: 101740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:16:31,429-Speed 10519.37 samples/sec Loss 1.4877 LearningRate 0.0003 Epoch: 19 Global Step: 101750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:16:39,226-Speed 10508.40 samples/sec Loss 1.5301 LearningRate 0.0003 Epoch: 19 Global Step: 101760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:16:47,014-Speed 10519.80 samples/sec Loss 1.5126 LearningRate 0.0003 Epoch: 19 Global Step: 101770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:16:54,797-Speed 10527.03 samples/sec Loss 1.5062 LearningRate 0.0003 Epoch: 19 Global Step: 101780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:17:02,598-Speed 10502.93 samples/sec Loss 1.5015 LearningRate 0.0003 Epoch: 19 Global Step: 101790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:17:10,419-Speed 10475.38 samples/sec Loss 1.5074 LearningRate 0.0003 Epoch: 19 Global Step: 101800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:17:18,200-Speed 10529.46 samples/sec Loss 1.5185 LearningRate 0.0003 Epoch: 19 Global Step: 101810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:17:25,969-Speed 10545.22 samples/sec Loss 1.5141 LearningRate 0.0003 Epoch: 19 Global Step: 101820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:17:33,756-Speed 10521.94 samples/sec Loss 1.4962 LearningRate 0.0003 Epoch: 19 Global Step: 101830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:17:41,525-Speed 10546.48 samples/sec Loss 1.5270 LearningRate 0.0003 Epoch: 19 Global Step: 101840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:17:49,335-Speed 10490.17 samples/sec Loss 1.5045 LearningRate 0.0003 Epoch: 19 Global Step: 101850 
Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:17:57,128-Speed 10514.51 samples/sec Loss 1.5020 LearningRate 0.0003 Epoch: 19 Global Step: 101860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:18:04,937-Speed 10491.57 samples/sec Loss 1.4920 LearningRate 0.0003 Epoch: 19 Global Step: 101870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:18:12,749-Speed 10488.97 samples/sec Loss 1.4940 LearningRate 0.0003 Epoch: 19 Global Step: 101880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:18:20,563-Speed 10485.66 samples/sec Loss 1.5025 LearningRate 0.0003 Epoch: 19 Global Step: 101890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:18:28,359-Speed 10510.05 samples/sec Loss 1.5034 LearningRate 0.0003 Epoch: 19 Global Step: 101900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:18:36,169-Speed 10490.40 samples/sec Loss 1.5205 LearningRate 0.0003 Epoch: 19 Global Step: 101910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:18:43,959-Speed 10518.50 samples/sec Loss 1.4964 LearningRate 0.0003 Epoch: 19 Global Step: 101920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:18:51,746-Speed 10529.51 samples/sec Loss 1.5049 LearningRate 0.0003 Epoch: 19 Global Step: 101930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:18:59,569-Speed 10473.11 samples/sec Loss 1.5227 LearningRate 0.0003 Epoch: 19 Global Step: 101940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:19:07,383-Speed 10485.37 samples/sec Loss 1.4834 LearningRate 0.0003 Epoch: 19 Global Step: 101950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:19:15,201-Speed 10479.48 samples/sec Loss 1.5090 LearningRate 0.0003 Epoch: 19 Global Step: 101960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:19:23,001-Speed 10504.70 samples/sec Loss 1.4971 LearningRate 0.0003 Epoch: 19 Global Step: 101970 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-01-16 13:19:30,835-Speed 10458.51 samples/sec Loss 1.5050 LearningRate 0.0003 Epoch: 19 Global Step: 101980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:19:38,686-Speed 10434.56 samples/sec Loss 1.5329 LearningRate 0.0002 Epoch: 19 Global Step: 101990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:19:46,477-Speed 10515.94 samples/sec Loss 1.4883 LearningRate 0.0002 Epoch: 19 Global Step: 102000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:19:54,243-Speed 10554.52 samples/sec Loss 1.5003 LearningRate 0.0002 Epoch: 19 Global Step: 102010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:20:02,043-Speed 10504.02 samples/sec Loss 1.4985 LearningRate 0.0002 Epoch: 19 Global Step: 102020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:20:09,855-Speed 10487.16 samples/sec Loss 1.4951 LearningRate 0.0002 Epoch: 19 Global Step: 102030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:20:17,644-Speed 10522.37 samples/sec Loss 1.5103 LearningRate 0.0002 Epoch: 19 Global Step: 102040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:20:25,424-Speed 10533.13 samples/sec Loss 1.4928 LearningRate 0.0002 Epoch: 19 Global Step: 102050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:20:33,217-Speed 10513.55 samples/sec Loss 1.5239 LearningRate 0.0002 Epoch: 19 Global Step: 102060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:20:41,005-Speed 10520.90 samples/sec Loss 1.4869 LearningRate 0.0002 Epoch: 19 Global Step: 102070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:20:48,795-Speed 10516.96 samples/sec Loss 1.5033 LearningRate 0.0002 Epoch: 19 Global Step: 102080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:20:56,617-Speed 10473.42 samples/sec Loss 1.5108 LearningRate 0.0002 Epoch: 19 Global Step: 102090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 
2022-01-16 13:21:04,429-Speed 10492.98 samples/sec Loss 1.4929 LearningRate 0.0002 Epoch: 19 Global Step: 102100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:21:12,237-Speed 10493.64 samples/sec Loss 1.4946 LearningRate 0.0002 Epoch: 19 Global Step: 102110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:21:20,059-Speed 10473.05 samples/sec Loss 1.5218 LearningRate 0.0002 Epoch: 19 Global Step: 102120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:21:27,863-Speed 10501.60 samples/sec Loss 1.5076 LearningRate 0.0002 Epoch: 19 Global Step: 102130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:21:35,703-Speed 10453.66 samples/sec Loss 1.5014 LearningRate 0.0002 Epoch: 19 Global Step: 102140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:21:43,516-Speed 10487.40 samples/sec Loss 1.4876 LearningRate 0.0002 Epoch: 19 Global Step: 102150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:21:51,339-Speed 10472.68 samples/sec Loss 1.4948 LearningRate 0.0002 Epoch: 19 Global Step: 102160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:21:59,143-Speed 10498.90 samples/sec Loss 1.4881 LearningRate 0.0002 Epoch: 19 Global Step: 102170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:22:06,940-Speed 10507.58 samples/sec Loss 1.4732 LearningRate 0.0002 Epoch: 19 Global Step: 102180 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-16 13:22:14,717-Speed 10535.06 samples/sec Loss 1.5031 LearningRate 0.0002 Epoch: 19 Global Step: 102190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:22:22,488-Speed 10543.30 samples/sec Loss 1.4902 LearningRate 0.0002 Epoch: 19 Global Step: 102200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:22:30,256-Speed 10547.14 samples/sec Loss 1.4932 LearningRate 0.0002 Epoch: 19 Global Step: 102210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:22:38,087-Speed 
10469.33 samples/sec Loss 1.4888 LearningRate 0.0002 Epoch: 19 Global Step: 102220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:22:45,893-Speed 10495.88 samples/sec Loss 1.4972 LearningRate 0.0002 Epoch: 19 Global Step: 102230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:22:53,692-Speed 10505.54 samples/sec Loss 1.5006 LearningRate 0.0002 Epoch: 19 Global Step: 102240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:23:01,495-Speed 10500.71 samples/sec Loss 1.4736 LearningRate 0.0002 Epoch: 19 Global Step: 102250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:23:09,291-Speed 10508.72 samples/sec Loss 1.5021 LearningRate 0.0002 Epoch: 19 Global Step: 102260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:23:17,098-Speed 10495.99 samples/sec Loss 1.5076 LearningRate 0.0002 Epoch: 19 Global Step: 102270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:23:24,888-Speed 10516.71 samples/sec Loss 1.5011 LearningRate 0.0002 Epoch: 19 Global Step: 102280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:23:32,668-Speed 10530.77 samples/sec Loss 1.4962 LearningRate 0.0002 Epoch: 19 Global Step: 102290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:23:40,466-Speed 10507.38 samples/sec Loss 1.4766 LearningRate 0.0002 Epoch: 19 Global Step: 102300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:23:48,297-Speed 10461.72 samples/sec Loss 1.4812 LearningRate 0.0002 Epoch: 19 Global Step: 102310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:23:56,095-Speed 10507.30 samples/sec Loss 1.5075 LearningRate 0.0002 Epoch: 19 Global Step: 102320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:24:03,890-Speed 10509.23 samples/sec Loss 1.4907 LearningRate 0.0002 Epoch: 19 Global Step: 102330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:24:11,683-Speed 10513.71 samples/sec Loss 
1.5069 LearningRate 0.0002 Epoch: 19 Global Step: 102340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:24:19,475-Speed 10515.37 samples/sec Loss 1.4961 LearningRate 0.0002 Epoch: 19 Global Step: 102350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:24:27,253-Speed 10533.64 samples/sec Loss 1.4873 LearningRate 0.0002 Epoch: 19 Global Step: 102360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:24:35,051-Speed 10507.16 samples/sec Loss 1.4886 LearningRate 0.0001 Epoch: 19 Global Step: 102370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:24:42,827-Speed 10536.48 samples/sec Loss 1.4907 LearningRate 0.0001 Epoch: 19 Global Step: 102380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:24:50,636-Speed 10492.34 samples/sec Loss 1.4761 LearningRate 0.0001 Epoch: 19 Global Step: 102390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:24:58,424-Speed 10520.47 samples/sec Loss 1.4914 LearningRate 0.0001 Epoch: 19 Global Step: 102400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:25:06,210-Speed 10523.42 samples/sec Loss 1.4980 LearningRate 0.0001 Epoch: 19 Global Step: 102410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:25:13,986-Speed 10537.43 samples/sec Loss 1.4915 LearningRate 0.0001 Epoch: 19 Global Step: 102420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:25:21,760-Speed 10538.07 samples/sec Loss 1.4806 LearningRate 0.0001 Epoch: 19 Global Step: 102430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:25:29,545-Speed 10525.17 samples/sec Loss 1.5069 LearningRate 0.0001 Epoch: 19 Global Step: 102440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:25:37,316-Speed 10543.17 samples/sec Loss 1.4951 LearningRate 0.0001 Epoch: 19 Global Step: 102450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:25:45,108-Speed 10513.51 samples/sec Loss 1.4956 LearningRate 0.0001 
Epoch: 19 Global Step: 102460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:25:52,915-Speed 10495.28 samples/sec Loss 1.4929 LearningRate 0.0001 Epoch: 19 Global Step: 102470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:26:00,724-Speed 10491.43 samples/sec Loss 1.4951 LearningRate 0.0001 Epoch: 19 Global Step: 102480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:26:08,570-Speed 10443.33 samples/sec Loss 1.4822 LearningRate 0.0001 Epoch: 19 Global Step: 102490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:26:16,366-Speed 10508.80 samples/sec Loss 1.5029 LearningRate 0.0001 Epoch: 19 Global Step: 102500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:26:24,160-Speed 10511.98 samples/sec Loss 1.5059 LearningRate 0.0001 Epoch: 19 Global Step: 102510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:26:31,953-Speed 10515.11 samples/sec Loss 1.4899 LearningRate 0.0001 Epoch: 19 Global Step: 102520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:26:39,727-Speed 10538.20 samples/sec Loss 1.4880 LearningRate 0.0001 Epoch: 19 Global Step: 102530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:26:47,514-Speed 10521.81 samples/sec Loss 1.5000 LearningRate 0.0001 Epoch: 19 Global Step: 102540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:26:55,304-Speed 10522.13 samples/sec Loss 1.5015 LearningRate 0.0001 Epoch: 19 Global Step: 102550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:27:03,091-Speed 10521.60 samples/sec Loss 1.4851 LearningRate 0.0001 Epoch: 19 Global Step: 102560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:27:10,890-Speed 10505.59 samples/sec Loss 1.5006 LearningRate 0.0001 Epoch: 19 Global Step: 102570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:27:18,678-Speed 10519.62 samples/sec Loss 1.4834 LearningRate 0.0001 Epoch: 19 Global Step: 102580 
Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:27:26,467-Speed 10518.95 samples/sec Loss 1.4933 LearningRate 0.0001 Epoch: 19 Global Step: 102590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:27:34,234-Speed 10551.42 samples/sec Loss 1.4767 LearningRate 0.0001 Epoch: 19 Global Step: 102600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:27:42,042-Speed 10493.95 samples/sec Loss 1.4827 LearningRate 0.0001 Epoch: 19 Global Step: 102610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:27:49,830-Speed 10519.37 samples/sec Loss 1.4824 LearningRate 0.0001 Epoch: 19 Global Step: 102620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:27:57,618-Speed 10521.73 samples/sec Loss 1.4874 LearningRate 0.0001 Epoch: 19 Global Step: 102630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:28:05,418-Speed 10503.11 samples/sec Loss 1.4814 LearningRate 0.0001 Epoch: 19 Global Step: 102640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:28:13,216-Speed 10506.61 samples/sec Loss 1.4965 LearningRate 0.0001 Epoch: 19 Global Step: 102650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:28:21,072-Speed 10429.19 samples/sec Loss 1.4758 LearningRate 0.0001 Epoch: 19 Global Step: 102660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:28:28,878-Speed 10496.23 samples/sec Loss 1.4749 LearningRate 0.0001 Epoch: 19 Global Step: 102670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:28:36,694-Speed 10483.56 samples/sec Loss 1.4859 LearningRate 0.0001 Epoch: 19 Global Step: 102680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:28:44,493-Speed 10503.74 samples/sec Loss 1.4965 LearningRate 0.0001 Epoch: 19 Global Step: 102690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:28:52,293-Speed 10504.36 samples/sec Loss 1.4879 LearningRate 0.0001 Epoch: 19 Global Step: 102700 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-01-16 13:29:00,094-Speed 10502.60 samples/sec Loss 1.4827 LearningRate 0.0001 Epoch: 19 Global Step: 102710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:29:07,875-Speed 10530.53 samples/sec Loss 1.4824 LearningRate 0.0001 Epoch: 19 Global Step: 102720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:29:15,673-Speed 10505.70 samples/sec Loss 1.4867 LearningRate 0.0001 Epoch: 19 Global Step: 102730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:29:23,452-Speed 10533.16 samples/sec Loss 1.4957 LearningRate 0.0001 Epoch: 19 Global Step: 102740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:29:31,240-Speed 10520.45 samples/sec Loss 1.4750 LearningRate 0.0001 Epoch: 19 Global Step: 102750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:29:39,024-Speed 10525.75 samples/sec Loss 1.4922 LearningRate 0.0001 Epoch: 19 Global Step: 102760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:29:46,814-Speed 10517.04 samples/sec Loss 1.4826 LearningRate 0.0001 Epoch: 19 Global Step: 102770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:29:54,612-Speed 10506.37 samples/sec Loss 1.4974 LearningRate 0.0001 Epoch: 19 Global Step: 102780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:30:02,399-Speed 10521.83 samples/sec Loss 1.4940 LearningRate 0.0001 Epoch: 19 Global Step: 102790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:30:10,189-Speed 10516.92 samples/sec Loss 1.4993 LearningRate 0.0001 Epoch: 19 Global Step: 102800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:30:17,982-Speed 10512.63 samples/sec Loss 1.4921 LearningRate 0.0001 Epoch: 19 Global Step: 102810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:30:25,786-Speed 10499.49 samples/sec Loss 1.4786 LearningRate 0.0001 Epoch: 19 Global Step: 102820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 
2022-01-16 13:30:33,582-Speed 10509.13 samples/sec Loss 1.4767 LearningRate 0.0001 Epoch: 19 Global Step: 102830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:30:41,364-Speed 10527.73 samples/sec Loss 1.4765 LearningRate 0.0001 Epoch: 19 Global Step: 102840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:30:49,139-Speed 10538.80 samples/sec Loss 1.4866 LearningRate 0.0001 Epoch: 19 Global Step: 102850 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-16 13:30:56,920-Speed 10529.75 samples/sec Loss 1.4921 LearningRate 0.0001 Epoch: 19 Global Step: 102860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:31:04,758-Speed 10455.50 samples/sec Loss 1.4790 LearningRate 0.0001 Epoch: 19 Global Step: 102870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:31:12,547-Speed 10519.45 samples/sec Loss 1.4968 LearningRate 0.0001 Epoch: 19 Global Step: 102880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:31:20,347-Speed 10505.07 samples/sec Loss 1.4966 LearningRate 0.0001 Epoch: 19 Global Step: 102890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:31:28,138-Speed 10515.25 samples/sec Loss 1.4854 LearningRate 0.0001 Epoch: 19 Global Step: 102900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:31:35,924-Speed 10523.47 samples/sec Loss 1.4895 LearningRate 0.0001 Epoch: 19 Global Step: 102910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:31:43,704-Speed 10533.31 samples/sec Loss 1.4908 LearningRate 0.0001 Epoch: 19 Global Step: 102920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:31:51,492-Speed 10519.73 samples/sec Loss 1.4898 LearningRate 0.0000 Epoch: 19 Global Step: 102930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:31:59,284-Speed 10516.48 samples/sec Loss 1.4835 LearningRate 0.0000 Epoch: 19 Global Step: 102940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:32:07,058-Speed 
10539.47 samples/sec Loss 1.5050 LearningRate 0.0000 Epoch: 19 Global Step: 102950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:32:14,847-Speed 10518.99 samples/sec Loss 1.4818 LearningRate 0.0000 Epoch: 19 Global Step: 102960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:32:22,660-Speed 10487.19 samples/sec Loss 1.4898 LearningRate 0.0000 Epoch: 19 Global Step: 102970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:32:30,465-Speed 10497.03 samples/sec Loss 1.4967 LearningRate 0.0000 Epoch: 19 Global Step: 102980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:32:38,253-Speed 10521.69 samples/sec Loss 1.4994 LearningRate 0.0000 Epoch: 19 Global Step: 102990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:32:46,040-Speed 10520.37 samples/sec Loss 1.4724 LearningRate 0.0000 Epoch: 19 Global Step: 103000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:32:53,829-Speed 10523.51 samples/sec Loss 1.4843 LearningRate 0.0000 Epoch: 19 Global Step: 103010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:33:01,628-Speed 10505.20 samples/sec Loss 1.4762 LearningRate 0.0000 Epoch: 19 Global Step: 103020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:33:09,413-Speed 10524.15 samples/sec Loss 1.4677 LearningRate 0.0000 Epoch: 19 Global Step: 103030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:33:17,195-Speed 10528.46 samples/sec Loss 1.4792 LearningRate 0.0000 Epoch: 19 Global Step: 103040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:33:25,021-Speed 10470.35 samples/sec Loss 1.4755 LearningRate 0.0000 Epoch: 19 Global Step: 103050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:33:32,809-Speed 10519.88 samples/sec Loss 1.4875 LearningRate 0.0000 Epoch: 19 Global Step: 103060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:33:40,606-Speed 10508.78 samples/sec Loss 
1.5104 LearningRate 0.0000 Epoch: 19 Global Step: 103070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:33:48,430-Speed 10471.21 samples/sec Loss 1.5020 LearningRate 0.0000 Epoch: 19 Global Step: 103080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:33:56,246-Speed 10482.87 samples/sec Loss 1.5014 LearningRate 0.0000 Epoch: 19 Global Step: 103090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:34:04,053-Speed 10495.38 samples/sec Loss 1.4766 LearningRate 0.0000 Epoch: 19 Global Step: 103100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:34:11,842-Speed 10519.46 samples/sec Loss 1.4781 LearningRate 0.0000 Epoch: 19 Global Step: 103110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:34:19,624-Speed 10527.72 samples/sec Loss 1.4950 LearningRate 0.0000 Epoch: 19 Global Step: 103120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:34:27,427-Speed 10500.68 samples/sec Loss 1.4790 LearningRate 0.0000 Epoch: 19 Global Step: 103130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:34:35,245-Speed 10479.61 samples/sec Loss 1.4812 LearningRate 0.0000 Epoch: 19 Global Step: 103140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:34:43,049-Speed 10498.98 samples/sec Loss 1.4776 LearningRate 0.0000 Epoch: 19 Global Step: 103150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:34:50,872-Speed 10473.53 samples/sec Loss 1.4862 LearningRate 0.0000 Epoch: 19 Global Step: 103160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:34:58,697-Speed 10471.22 samples/sec Loss 1.4803 LearningRate 0.0000 Epoch: 19 Global Step: 103170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:35:06,525-Speed 10466.16 samples/sec Loss 1.4822 LearningRate 0.0000 Epoch: 19 Global Step: 103180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:35:14,322-Speed 10508.60 samples/sec Loss 1.4830 LearningRate 0.0000 
Epoch: 19 Global Step: 103190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:35:22,112-Speed 10517.35 samples/sec Loss 1.4935 LearningRate 0.0000 Epoch: 19 Global Step: 103200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:35:29,930-Speed 10480.51 samples/sec Loss 1.4564 LearningRate 0.0000 Epoch: 19 Global Step: 103210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:35:37,744-Speed 10484.23 samples/sec Loss 1.4787 LearningRate 0.0000 Epoch: 19 Global Step: 103220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:35:45,541-Speed 10511.45 samples/sec Loss 1.4762 LearningRate 0.0000 Epoch: 19 Global Step: 103230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:35:53,349-Speed 10497.30 samples/sec Loss 1.4901 LearningRate 0.0000 Epoch: 19 Global Step: 103240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:36:01,143-Speed 10512.15 samples/sec Loss 1.4965 LearningRate 0.0000 Epoch: 19 Global Step: 103250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:36:08,941-Speed 10506.72 samples/sec Loss 1.4916 LearningRate 0.0000 Epoch: 19 Global Step: 103260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:36:16,726-Speed 10523.82 samples/sec Loss 1.4891 LearningRate 0.0000 Epoch: 19 Global Step: 103270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:36:24,511-Speed 10524.90 samples/sec Loss 1.4506 LearningRate 0.0000 Epoch: 19 Global Step: 103280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:36:32,315-Speed 10498.33 samples/sec Loss 1.4768 LearningRate 0.0000 Epoch: 19 Global Step: 103290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:36:40,106-Speed 10516.46 samples/sec Loss 1.4685 LearningRate 0.0000 Epoch: 19 Global Step: 103300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:36:47,884-Speed 10533.22 samples/sec Loss 1.4752 LearningRate 0.0000 Epoch: 19 Global Step: 103310 
Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:36:55,661-Speed 10535.09 samples/sec Loss 1.4830 LearningRate 0.0000 Epoch: 19 Global Step: 103320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:37:03,454-Speed 10513.00 samples/sec Loss 1.4783 LearningRate 0.0000 Epoch: 19 Global Step: 103330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:37:11,249-Speed 10510.89 samples/sec Loss 1.4681 LearningRate 0.0000 Epoch: 19 Global Step: 103340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:37:19,033-Speed 10525.20 samples/sec Loss 1.4808 LearningRate 0.0000 Epoch: 19 Global Step: 103350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:37:26,823-Speed 10518.76 samples/sec Loss 1.4886 LearningRate 0.0000 Epoch: 19 Global Step: 103360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:37:34,604-Speed 10529.00 samples/sec Loss 1.4816 LearningRate 0.0000 Epoch: 19 Global Step: 103370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:37:42,438-Speed 10458.73 samples/sec Loss 1.4730 LearningRate 0.0000 Epoch: 19 Global Step: 103380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:37:50,243-Speed 10496.13 samples/sec Loss 1.4782 LearningRate 0.0000 Epoch: 19 Global Step: 103390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:37:58,054-Speed 10489.80 samples/sec Loss 1.5001 LearningRate 0.0000 Epoch: 19 Global Step: 103400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:38:05,845-Speed 10515.90 samples/sec Loss 1.5069 LearningRate 0.0000 Epoch: 19 Global Step: 103410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:38:13,681-Speed 10456.15 samples/sec Loss 1.4843 LearningRate 0.0000 Epoch: 19 Global Step: 103420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:38:21,501-Speed 10477.27 samples/sec Loss 1.4958 LearningRate 0.0000 Epoch: 19 Global Step: 103430 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-01-16 13:38:29,322-Speed 10475.97 samples/sec Loss 1.4883 LearningRate 0.0000 Epoch: 19 Global Step: 103440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:38:37,117-Speed 10510.04 samples/sec Loss 1.4793 LearningRate 0.0000 Epoch: 19 Global Step: 103450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:38:44,894-Speed 10536.15 samples/sec Loss 1.4842 LearningRate 0.0000 Epoch: 19 Global Step: 103460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:38:52,682-Speed 10520.58 samples/sec Loss 1.4741 LearningRate 0.0000 Epoch: 19 Global Step: 103470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:39:00,494-Speed 10486.32 samples/sec Loss 1.4744 LearningRate 0.0000 Epoch: 19 Global Step: 103480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:39:08,304-Speed 10490.69 samples/sec Loss 1.5020 LearningRate 0.0000 Epoch: 19 Global Step: 103490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:39:16,110-Speed 10495.10 samples/sec Loss 1.4779 LearningRate 0.0000 Epoch: 19 Global Step: 103500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-01-16 13:39:23,890-Speed 10535.50 samples/sec Loss 1.4748 LearningRate 0.0000 Epoch: 19 Global Step: 103510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:39:31,712-Speed 10473.90 samples/sec Loss 1.4903 LearningRate 0.0000 Epoch: 19 Global Step: 103520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:39:39,499-Speed 10522.50 samples/sec Loss 1.4805 LearningRate 0.0000 Epoch: 19 Global Step: 103530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:39:47,301-Speed 10503.47 samples/sec Loss 1.4746 LearningRate 0.0000 Epoch: 19 Global Step: 103540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:39:55,108-Speed 10494.73 samples/sec Loss 1.4676 LearningRate 0.0000 Epoch: 19 Global Step: 103550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 
2022-01-16 13:40:02,906-Speed 10506.29 samples/sec Loss 1.4933 LearningRate 0.0000 Epoch: 19 Global Step: 103560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:40:10,747-Speed 10449.03 samples/sec Loss 1.4835 LearningRate 0.0000 Epoch: 19 Global Step: 103570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:40:18,581-Speed 10459.34 samples/sec Loss 1.4966 LearningRate 0.0000 Epoch: 19 Global Step: 103580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:40:26,380-Speed 10505.04 samples/sec Loss 1.4674 LearningRate 0.0000 Epoch: 19 Global Step: 103590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:40:34,202-Speed 10473.47 samples/sec Loss 1.4757 LearningRate 0.0000 Epoch: 19 Global Step: 103600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:40:42,004-Speed 10501.70 samples/sec Loss 1.4692 LearningRate 0.0000 Epoch: 19 Global Step: 103610 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-01-16 13:40:49,803-Speed 10505.25 samples/sec Loss 1.4987 LearningRate 0.0000 Epoch: 19 Global Step: 103620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:40:57,603-Speed 10504.47 samples/sec Loss 1.5092 LearningRate 0.0000 Epoch: 19 Global Step: 103630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:41:05,402-Speed 10505.40 samples/sec Loss 1.4787 LearningRate 0.0000 Epoch: 19 Global Step: 103640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:41:13,186-Speed 10525.35 samples/sec Loss 1.4806 LearningRate 0.0000 Epoch: 19 Global Step: 103650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:41:20,969-Speed 10527.44 samples/sec Loss 1.4747 LearningRate 0.0000 Epoch: 19 Global Step: 103660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:41:28,757-Speed 10519.85 samples/sec Loss 1.4651 LearningRate 0.0000 Epoch: 19 Global Step: 103670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-01-16 13:41:36,581-Speed 
10471.65 samples/sec Loss 1.4816 LearningRate 0.0000 Epoch: 19 Global Step: 103680 Fp16 Grad Scale: 65536 Required: -0 hours Training: 2022-01-16 13:41:44,374-Speed 10512.75 samples/sec Loss 1.4793 LearningRate 0.0000 Epoch: 19 Global Step: 103690 Fp16 Grad Scale: 65536 Required: -0 hours