Training: 2022-01-15 14:51:43,310-rank_id: 0
Training: 2022-01-15 14:52:04,169-: loss                     cosface
Training: 2022-01-15 14:52:04,170-: network                  r50
Training: 2022-01-15 14:52:04,170-: resume                   False
Training: 2022-01-15 14:52:04,170-: output                   work_dirs/webface42m_r50_lr01_pfc02_bs8k_16gpus
Training: 2022-01-15 14:52:04,170-: embedding_size           512
Training: 2022-01-15 14:52:04,170-: sample_rate              0.2
Training: 2022-01-15 14:52:04,171-: fp16                     True
Training: 2022-01-15 14:52:04,171-: momentum                 0.9
Training: 2022-01-15 14:52:04,171-: weight_decay             0.0005
Training: 2022-01-15 14:52:04,171-: batch_size               512
Training: 2022-01-15 14:52:04,171-: lr                       0.6
Training: 2022-01-15 14:52:04,171-: dali                     True
Training: 2022-01-15 14:52:04,171-: verbose                  10000
Training: 2022-01-15 14:52:04,171-: frequent                 10
Training: 2022-01-15 14:52:04,171-: score                    None
Training: 2022-01-15 14:52:04,171-: rec                      /train_tmp/WebFace42M
Training: 2022-01-15 14:52:04,171-: num_classes              2059906
Training: 2022-01-15 14:52:04,171-: num_image                42474557
Training: 2022-01-15 14:52:04,171-: num_epoch                20
Training: 2022-01-15 14:52:04,171-: warmup_epoch             4
Training: 2022-01-15 14:52:04,171-: val_targets              ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-01-15 14:52:04,171-: warmup_step              20736
Training: 2022-01-15 14:52:04,171-: total_step               103680
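The derived values warmup_step and total_step in the config dump above can be reproduced from the other fields. The sketch below does that and checks the LearningRate column against a linear warmup. It makes three assumptions, none of which are logged directly: 16 GPUs (inferred from the output directory name "bs8k_16gpus"), a per-GPU batch_size of 512, and an incomplete final batch that is dropped.

```python
# Values copied from the config dump; WORLD_SIZE is an assumption
# inferred from the output directory name (..._bs8k_16gpus).
NUM_IMAGE = 42474557
BATCH_SIZE = 512      # per-GPU batch size
WORLD_SIZE = 16       # assumed number of GPUs (not logged directly)
NUM_EPOCH = 20
WARMUP_EPOCH = 4
BASE_LR = 0.6

global_batch = BATCH_SIZE * WORLD_SIZE            # samples per optimizer step
steps_per_epoch = NUM_IMAGE // global_batch       # assumes the last partial batch is dropped
warmup_step = steps_per_epoch * WARMUP_EPOCH
total_step = steps_per_epoch * NUM_EPOCH


def warmup_lr(step: int) -> float:
    """Linear warmup from 0 to BASE_LR over warmup_step steps (assumed form)."""
    return BASE_LR * step / warmup_step


if __name__ == "__main__":
    print(global_batch, steps_per_epoch, warmup_step, total_step)
    for step in (20, 30, 1870):
        print(step, f"{warmup_lr(step):.4f}")
```

This prints 8192 5184 20736 103680, which matches the logged warmup_step and total_step. It then prints 0.0006, 0.0009 and 0.0541 for steps 20, 30 and 1870, matching the LearningRate column at those steps. A global batch of 8192 samples per step, logged every 10 steps at roughly 7.8 s intervals, is also consistent with the reported speed of about 10,500 samples/sec.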
Training: 2022-01-15 14:53:11,153-Reducer buckets have been rebuilt in this iteration.
Training: 2022-01-15 14:53:26,109-Speed 10538.89 samples/sec   Loss 42.4980   LearningRate 0.0006   Epoch: 0   Global Step: 20   Fp16 Grad Scale: 16384   Required: 32 hours
Training: 2022-01-15 14:53:33,934-Speed 10472.32 samples/sec   Loss 42.5048   LearningRate 0.0009   Epoch: 0   Global Step: 30   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-01-15 14:53:41,750-Speed 10482.47 samples/sec   Loss 42.4908   LearningRate 0.0012   Epoch: 0   Global Step: 40   Fp16 Grad Scale: 16384   Required: 28 hours
Training: 2022-01-15 14:53:49,556-Speed 10496.88 samples/sec   Loss 42.4908   LearningRate 0.0014   Epoch: 0   Global Step: 50   Fp16 Grad Scale: 16384   Required: 27 hours
Training: 2022-01-15 14:53:57,384-Speed 10468.01 samples/sec   Loss 42.4911   LearningRate 0.0017   Epoch: 0   Global Step: 60   Fp16 Grad Scale: 16384   Required: 26 hours
Training: 2022-01-15 14:54:05,202-Speed 10479.91 samples/sec   Loss 42.4814   LearningRate 0.0020   Epoch: 0   Global Step: 70   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-01-15 14:54:12,998-Speed 10509.65 samples/sec   Loss 42.4677   LearningRate 0.0023   Epoch: 0   Global Step: 80   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-01-15 14:54:20,785-Speed 10522.36 samples/sec   Loss 42.4678   LearningRate 0.0026   Epoch: 0   Global Step: 90   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-01-15 14:54:28,610-Speed 10471.12 samples/sec   Loss 42.4525   LearningRate 0.0029   Epoch: 0   Global Step: 100   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-01-15 14:54:36,485-Speed 10404.45 samples/sec   Loss 42.4509   LearningRate 0.0032   Epoch: 0   Global Step: 110   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-15 14:54:44,363-Speed 10400.99 samples/sec   Loss 42.4429   LearningRate 0.0035   Epoch: 0   Global Step: 120   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-15 14:54:52,274-Speed 10357.30 samples/sec   Loss 42.4529   LearningRate 0.0038   Epoch: 0   Global Step: 130   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-15 14:55:00,077-Speed 10500.45 samples/sec   Loss 42.4124   LearningRate 0.0041   Epoch: 0   Global Step: 140   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-15 14:55:07,869-Speed 10514.96 samples/sec   Loss 42.4071   LearningRate 0.0043   Epoch: 0   Global Step: 150   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-15 14:55:15,684-Speed 10482.92 samples/sec   Loss 42.3922   LearningRate 0.0046   Epoch: 0   Global Step: 160   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-15 14:55:23,491-Speed 10495.91 samples/sec   Loss 42.3490   LearningRate 0.0049   Epoch: 0   Global Step: 170   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-15 14:55:31,291-Speed 10504.49 samples/sec   Loss 42.3321   LearningRate 0.0052   Epoch: 0   Global Step: 180   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-15 14:55:39,073-Speed 10529.22 samples/sec   Loss 42.2759   LearningRate 0.0055   Epoch: 0   Global Step: 190   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-15 14:55:46,882-Speed 10495.15 samples/sec   Loss 42.2436   LearningRate 0.0058   Epoch: 0   Global Step: 200   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-15 14:55:54,796-Speed 10352.03 samples/sec   Loss 42.1667   LearningRate 0.0061   Epoch: 0   Global Step: 210   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:56:02,611-Speed 10484.63 samples/sec   Loss 42.1119   LearningRate 0.0064   Epoch: 0   Global Step: 220   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:56:10,376-Speed 10552.01 samples/sec   Loss 42.0450   LearningRate 0.0067   Epoch: 0   Global Step: 230   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:56:18,138-Speed 10555.45 samples/sec   Loss 41.9606   LearningRate 0.0069   Epoch: 0   Global Step: 240   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:56:25,926-Speed 10522.16 samples/sec   Loss 41.8878   LearningRate 0.0072   Epoch: 0   Global Step: 250   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:56:33,708-Speed 10527.91 samples/sec   Loss 41.8050   LearningRate 0.0075   Epoch: 0   Global Step: 260   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:56:41,496-Speed 10521.71 samples/sec   Loss 41.7209   LearningRate 0.0078   Epoch: 0   Global Step: 270   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:56:49,259-Speed 10555.28 samples/sec   Loss 41.6181   LearningRate 0.0081   Epoch: 0   Global Step: 280   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:56:57,047-Speed 10519.80 samples/sec   Loss 41.5033   LearningRate 0.0084   Epoch: 0   Global Step: 290   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:57:04,801-Speed 10565.80 samples/sec   Loss 41.4112   LearningRate 0.0087   Epoch: 0   Global Step: 300   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:57:12,567-Speed 10551.28 samples/sec   Loss 41.3446   LearningRate 0.0090   Epoch: 0   Global Step: 310   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:57:20,332-Speed 10552.23 samples/sec   Loss 41.2324   LearningRate 0.0093   Epoch: 0   Global Step: 320   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:57:28,113-Speed 10529.74 samples/sec   Loss 41.1418   LearningRate 0.0095   Epoch: 0   Global Step: 330   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:57:35,899-Speed 10521.81 samples/sec   Loss 41.0666   LearningRate 0.0098   Epoch: 0   Global Step: 340   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:57:43,663-Speed 10553.62 samples/sec   Loss 40.9541   LearningRate 0.0101   Epoch: 0   Global Step: 350   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:57:51,459-Speed 10510.38 samples/sec   Loss 40.8803   LearningRate 0.0104   Epoch: 0   Global Step: 360   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:57:59,217-Speed 10560.87 samples/sec   Loss 40.7930   LearningRate 0.0107   Epoch: 0   Global Step: 370   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:58:06,973-Speed 10563.53 samples/sec   Loss 40.6926   LearningRate 0.0110   Epoch: 0   Global Step: 380   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:58:14,750-Speed 10535.93 samples/sec   Loss 40.6391   LearningRate 0.0113   Epoch: 0   Global Step: 390   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:58:22,511-Speed 10556.58 samples/sec   Loss 40.5538   LearningRate 0.0116   Epoch: 0   Global Step: 400   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:58:30,258-Speed 10576.90 samples/sec   Loss 40.4857   LearningRate 0.0119   Epoch: 0   Global Step: 410   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:58:38,013-Speed 10565.16 samples/sec   Loss 40.3895   LearningRate 0.0122   Epoch: 0   Global Step: 420   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:58:45,792-Speed 10536.41 samples/sec   Loss 40.3293   LearningRate 0.0124   Epoch: 0   Global Step: 430   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:58:53,563-Speed 10543.34 samples/sec   Loss 40.2559   LearningRate 0.0127   Epoch: 0   Global Step: 440   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:59:01,336-Speed 10542.08 samples/sec   Loss 40.1917   LearningRate 0.0130   Epoch: 0   Global Step: 450   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:59:09,126-Speed 10517.69 samples/sec   Loss 40.0976   LearningRate 0.0133   Epoch: 0   Global Step: 460   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:59:16,903-Speed 10541.06 samples/sec   Loss 40.0501   LearningRate 0.0136   Epoch: 0   Global Step: 470   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:59:24,696-Speed 10513.07 samples/sec   Loss 39.9828   LearningRate 0.0139   Epoch: 0   Global Step: 480   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:59:32,499-Speed 10500.02 samples/sec   Loss 39.9063   LearningRate 0.0142   Epoch: 0   Global Step: 490   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:59:40,285-Speed 10522.96 samples/sec   Loss 39.8531   LearningRate 0.0145   Epoch: 0   Global Step: 500   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:59:48,077-Speed 10516.17 samples/sec   Loss 39.7836   LearningRate 0.0148   Epoch: 0   Global Step: 510   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 14:59:55,866-Speed 10518.50 samples/sec   Loss 39.7285   LearningRate 0.0150   Epoch: 0   Global Step: 520   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 15:00:03,648-Speed 10528.19 samples/sec   Loss 39.6652   LearningRate 0.0153   Epoch: 0   Global Step: 530   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 15:00:11,423-Speed 10538.79 samples/sec   Loss 39.6111   LearningRate 0.0156   Epoch: 0   Global Step: 540   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 15:00:19,186-Speed 10555.47 samples/sec   Loss 39.5778   LearningRate 0.0159   Epoch: 0   Global Step: 550   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 15:00:26,966-Speed 10530.43 samples/sec   Loss 39.4876   LearningRate 0.0162   Epoch: 0   Global Step: 560   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 15:00:34,733-Speed 10549.76 samples/sec   Loss 39.4386   LearningRate 0.0165   Epoch: 0   Global Step: 570   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 15:00:42,534-Speed 10502.25 samples/sec   Loss 39.4006   LearningRate 0.0168   Epoch: 0   Global Step: 580   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 15:00:50,334-Speed 10504.16 samples/sec   Loss 39.3441   LearningRate 0.0171   Epoch: 0   Global Step: 590   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 15:00:58,156-Speed 10475.69 samples/sec   Loss 39.2938   LearningRate 0.0174   Epoch: 0   Global Step: 600   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 15:01:05,936-Speed 10530.61 samples/sec   Loss 39.2431   LearningRate 0.0177   Epoch: 0   Global Step: 610   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 15:01:13,722-Speed 10523.59 samples/sec   Loss 39.2166   LearningRate 0.0179   Epoch: 0   Global Step: 620   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-15 15:01:21,519-Speed 10510.90 samples/sec   Loss 39.1755   LearningRate 0.0182   Epoch: 0   Global Step: 630   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-15 15:01:29,292-Speed 10540.97 samples/sec   Loss 39.1499   LearningRate 0.0185   Epoch: 0   Global Step: 640   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-15 15:01:37,060-Speed 10547.51 samples/sec   Loss 39.0989   LearningRate 0.0188   Epoch: 0   Global Step: 650   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-15 15:01:44,822-Speed 10555.47 samples/sec   Loss 39.0727   LearningRate 0.0191   Epoch: 0   Global Step: 660   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-15 15:01:52,592-Speed 10544.66 samples/sec   Loss 39.0130   LearningRate 0.0194   Epoch: 0   Global Step: 670   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-15 15:02:00,356-Speed 10553.55 samples/sec   Loss 38.9975   LearningRate 0.0197   Epoch: 0   Global Step: 680   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-15 15:02:08,117-Speed 10556.65 samples/sec   Loss 38.9499   LearningRate 0.0200   Epoch: 0   Global Step: 690   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-15 15:02:15,896-Speed 10533.14 samples/sec   Loss 38.9410   LearningRate 0.0203   Epoch: 0   Global Step: 700   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-15 15:02:23,690-Speed 10512.85 samples/sec   Loss 38.8997   LearningRate 0.0205   Epoch: 0   Global Step: 710   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-15 15:02:31,466-Speed 10536.29 samples/sec   Loss 38.8655   LearningRate 0.0208   Epoch: 0   Global Step: 720   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 15:02:39,220-Speed 10571.25 samples/sec   Loss 38.8419   LearningRate 0.0211   Epoch: 0   Global Step: 730   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 15:02:47,033-Speed 10485.95 samples/sec   Loss 38.8172   LearningRate 0.0214   Epoch: 0   Global Step: 740   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 15:02:54,798-Speed 10552.53 samples/sec   Loss 38.7806   LearningRate 0.0217   Epoch: 0   Global Step: 750   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-15 15:03:02,547-Speed 10581.24 samples/sec   Loss 38.7802   LearningRate 0.0220   Epoch: 0   Global Step: 760   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-15 15:03:10,323-Speed 10536.87 samples/sec   Loss 38.7691   LearningRate 0.0223   Epoch: 0   Global Step: 770   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-15 15:03:18,092-Speed 10546.67 samples/sec   Loss 38.7287   LearningRate 0.0226   Epoch: 0   Global Step: 780   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-15 15:03:25,872-Speed 10534.40 samples/sec   Loss 38.7063   LearningRate 0.0229   Epoch: 0   Global Step: 790   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-15 15:03:33,682-Speed 10490.84 samples/sec   Loss 38.6898   LearningRate 0.0231   Epoch: 0   Global Step: 800   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-15 15:03:41,477-Speed 10511.84 samples/sec   Loss 38.6850   LearningRate 0.0234   Epoch: 0   Global Step: 810   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-15 15:03:49,280-Speed 10501.17 samples/sec   Loss 38.6583   LearningRate 0.0237   Epoch: 0   Global Step: 820   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-15 15:03:57,146-Speed 10416.54 samples/sec   Loss 38.6302   LearningRate 0.0240   Epoch: 0   Global Step: 830   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-15 15:04:04,948-Speed 10501.66 samples/sec   Loss 38.6152   LearningRate 0.0243   Epoch: 0   Global Step: 840   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-15 15:04:12,711-Speed 10554.32 samples/sec   Loss 38.5989   LearningRate 0.0246   Epoch: 0   Global Step: 850   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:04:20,501-Speed 10517.37 samples/sec   Loss 38.5868   LearningRate 0.0249   Epoch: 0   Global Step: 860   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:04:28,325-Speed 10472.15 samples/sec   Loss 38.5808   LearningRate 0.0252   Epoch: 0   Global Step: 870   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:04:36,132-Speed 10496.18 samples/sec   Loss 38.5699   LearningRate 0.0255   Epoch: 0   Global Step: 880   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:04:43,921-Speed 10519.62 samples/sec   Loss 38.5524   LearningRate 0.0258   Epoch: 0   Global Step: 890   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:04:51,712-Speed 10516.82 samples/sec   Loss 38.5357   LearningRate 0.0260   Epoch: 0   Global Step: 900   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:04:59,536-Speed 10473.58 samples/sec   Loss 38.5317   LearningRate 0.0263   Epoch: 0   Global Step: 910   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:05:07,347-Speed 10493.15 samples/sec   Loss 38.5047   LearningRate 0.0266   Epoch: 0   Global Step: 920   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:05:15,152-Speed 10497.44 samples/sec   Loss 38.4897   LearningRate 0.0269   Epoch: 0   Global Step: 930   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:05:22,969-Speed 10481.83 samples/sec   Loss 38.4940   LearningRate 0.0272   Epoch: 0   Global Step: 940   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:05:30,784-Speed 10485.88 samples/sec   Loss 38.4776   LearningRate 0.0275   Epoch: 0   Global Step: 950   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:05:38,599-Speed 10485.32 samples/sec   Loss 38.4629   LearningRate 0.0278   Epoch: 0   Global Step: 960   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:05:46,375-Speed 10537.75 samples/sec   Loss 38.4521   LearningRate 0.0281   Epoch: 0   Global Step: 970   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:05:54,209-Speed 10457.97 samples/sec   Loss 38.4495   LearningRate 0.0284   Epoch: 0   Global Step: 980   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:06:02,010-Speed 10504.63 samples/sec   Loss 38.4249   LearningRate 0.0286   Epoch: 0   Global Step: 990   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:06:09,826-Speed 10483.87 samples/sec   Loss 38.4208   LearningRate 0.0289   Epoch: 0   Global Step: 1000   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:06:17,685-Speed 10425.80 samples/sec   Loss 38.4126   LearningRate 0.0292   Epoch: 0   Global Step: 1010   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:06:25,478-Speed 10513.39 samples/sec   Loss 38.4139   LearningRate 0.0295   Epoch: 0   Global Step: 1020   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:06:33,277-Speed 10505.19 samples/sec   Loss 38.4050   LearningRate 0.0298   Epoch: 0   Global Step: 1030   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:06:41,097-Speed 10477.67 samples/sec   Loss 38.3907   LearningRate 0.0301   Epoch: 0   Global Step: 1040   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:06:48,887-Speed 10518.07 samples/sec   Loss 38.3610   LearningRate 0.0304   Epoch: 0   Global Step: 1050   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:06:56,718-Speed 10462.31 samples/sec   Loss 38.3688   LearningRate 0.0307   Epoch: 0   Global Step: 1060   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:07:04,538-Speed 10477.88 samples/sec   Loss 38.3434   LearningRate 0.0310   Epoch: 0   Global Step: 1070   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:07:12,355-Speed 10481.34 samples/sec   Loss 38.3405   LearningRate 0.0312   Epoch: 0   Global Step: 1080   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:07:20,193-Speed 10454.06 samples/sec   Loss 38.3381   LearningRate 0.0315   Epoch: 0   Global Step: 1090   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-01-15 15:07:28,019-Speed 10470.22 samples/sec   Loss 38.3337   LearningRate 0.0318   Epoch: 0   Global Step: 1100   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-01-15 15:07:35,853-Speed 10458.44 samples/sec   Loss 38.3181   LearningRate 0.0321   Epoch: 0   Global Step: 1110   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-01-15 15:07:43,646-Speed 10514.32 samples/sec   Loss 38.3201   LearningRate 0.0324   Epoch: 0   Global Step: 1120   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-01-15 15:07:51,433-Speed 10521.97 samples/sec   Loss 38.3203   LearningRate 0.0327   Epoch: 0   Global Step: 1130   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-01-15 15:07:59,224-Speed 10516.49 samples/sec   Loss 38.2987   LearningRate 0.0330   Epoch: 0   Global Step: 1140   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-01-15 15:08:07,021-Speed 10508.10 samples/sec   Loss 38.3164   LearningRate 0.0333   Epoch: 0   Global Step: 1150   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-01-15 15:08:14,799-Speed 10533.34 samples/sec   Loss 38.2953   LearningRate 0.0336   Epoch: 0   Global Step: 1160   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-01-15 15:08:22,592-Speed 10514.28 samples/sec   Loss 38.2887   LearningRate 0.0339   Epoch: 0   Global Step: 1170   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-01-15 15:08:30,414-Speed 10475.18 samples/sec   Loss 38.2872   LearningRate 0.0341   Epoch: 0   Global Step: 1180   Fp16 Grad Scale: 16384   Required: 22 hours
Training: 2022-01-15 15:08:38,219-Speed 10499.28 samples/sec   Loss 38.2896   LearningRate 0.0344   Epoch: 0   Global Step: 1190   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:08:46,047-Speed 10466.41 samples/sec   Loss 38.2829   LearningRate 0.0347   Epoch: 0   Global Step: 1200   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:08:53,833-Speed 10523.04 samples/sec   Loss 38.2741   LearningRate 0.0350   Epoch: 0   Global Step: 1210   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:09:01,658-Speed 10470.89 samples/sec   Loss 38.2550   LearningRate 0.0353   Epoch: 0   Global Step: 1220   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:09:09,481-Speed 10474.09 samples/sec   Loss 38.2580   LearningRate 0.0356   Epoch: 0   Global Step: 1230   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:09:17,279-Speed 10506.92 samples/sec   Loss 38.2631   LearningRate 0.0359   Epoch: 0   Global Step: 1240   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:09:25,064-Speed 10524.49 samples/sec   Loss 38.2589   LearningRate 0.0362   Epoch: 0   Global Step: 1250   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:09:32,883-Speed 10478.86 samples/sec   Loss 38.2594   LearningRate 0.0365   Epoch: 0   Global Step: 1260   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:09:40,717-Speed 10459.69 samples/sec   Loss 38.2590   LearningRate 0.0367   Epoch: 0   Global Step: 1270   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:09:48,561-Speed 10444.52 samples/sec   Loss 38.2660   LearningRate 0.0370   Epoch: 0   Global Step: 1280   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:09:56,401-Speed 10451.40 samples/sec   Loss 38.2441   LearningRate 0.0373   Epoch: 0   Global Step: 1290   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:10:04,218-Speed 10480.84 samples/sec   Loss 38.2531   LearningRate 0.0376   Epoch: 0   Global Step: 1300   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:10:12,055-Speed 10455.44 samples/sec   Loss 38.2634   LearningRate 0.0379   Epoch: 0   Global Step: 1310   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:10:19,905-Speed 10438.29 samples/sec   Loss 38.2514   LearningRate 0.0382   Epoch: 0   Global Step: 1320   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:10:27,719-Speed 10484.41 samples/sec   Loss 38.2451   LearningRate 0.0385   Epoch: 0   Global Step: 1330   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:10:35,561-Speed 10448.89 samples/sec   Loss 38.2466   LearningRate 0.0388   Epoch: 0   Global Step: 1340   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:10:43,407-Speed 10442.84 samples/sec   Loss 38.2262   LearningRate 0.0391   Epoch: 0   Global Step: 1350   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:10:51,242-Speed 10456.23 samples/sec   Loss 38.2228   LearningRate 0.0394   Epoch: 0   Global Step: 1360   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:10:59,067-Speed 10470.81 samples/sec   Loss 38.2284   LearningRate 0.0396   Epoch: 0   Global Step: 1370   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:11:06,914-Speed 10440.98 samples/sec   Loss 38.2449   LearningRate 0.0399   Epoch: 0   Global Step: 1380   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:11:14,746-Speed 10460.76 samples/sec   Loss 38.2194   LearningRate 0.0402   Epoch: 0   Global Step: 1390   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:11:22,577-Speed 10463.14 samples/sec   Loss 38.2370   LearningRate 0.0405   Epoch: 0   Global Step: 1400   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:11:30,388-Speed 10490.00 samples/sec   Loss 38.2302   LearningRate 0.0408   Epoch: 0   Global Step: 1410   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:11:38,166-Speed 10535.38 samples/sec   Loss 38.2242   LearningRate 0.0411   Epoch: 0   Global Step: 1420   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:11:45,977-Speed 10492.37 samples/sec   Loss 38.2461   LearningRate 0.0414   Epoch: 0   Global Step: 1430   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:11:53,777-Speed 10503.71 samples/sec   Loss 38.2388   LearningRate 0.0417   Epoch: 0   Global Step: 1440   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:12:01,586-Speed 10496.64 samples/sec   Loss 38.2325   LearningRate 0.0420   Epoch: 0   Global Step: 1450   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:12:09,373-Speed 10522.35 samples/sec   Loss 38.2465   LearningRate 0.0422   Epoch: 0   Global Step: 1460   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:12:17,194-Speed 10476.19 samples/sec   Loss 38.2373   LearningRate 0.0425   Epoch: 0   Global Step: 1470   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:12:25,025-Speed 10463.27 samples/sec   Loss 38.2213   LearningRate 0.0428   Epoch: 0   Global Step: 1480   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:12:32,836-Speed 10489.71 samples/sec   Loss 38.2317   LearningRate 0.0431   Epoch: 0   Global Step: 1490   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:12:40,625-Speed 10519.25 samples/sec   Loss 38.2165   LearningRate 0.0434   Epoch: 0   Global Step: 1500   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:12:48,418-Speed 10514.16 samples/sec   Loss 38.2158   LearningRate 0.0437   Epoch: 0   Global Step: 1510   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:12:56,192-Speed 10537.78 samples/sec   Loss 38.2174   LearningRate 0.0440   Epoch: 0   Global Step: 1520   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:13:03,982-Speed 10517.37 samples/sec   Loss 38.2420   LearningRate 0.0443   Epoch: 0   Global Step: 1530   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:13:11,787-Speed 10497.41 samples/sec   Loss 38.2211   LearningRate 0.0446   Epoch: 0   Global Step: 1540   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:13:19,604-Speed 10481.47 samples/sec   Loss 38.2172   LearningRate 0.0448   Epoch: 0   Global Step: 1550   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:13:27,405-Speed 10502.79 samples/sec   Loss 38.2099   LearningRate 0.0451   Epoch: 0   Global Step: 1560   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:13:35,264-Speed 10425.89 samples/sec   Loss 38.2204   LearningRate 0.0454   Epoch: 0   Global Step: 1570   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:13:43,069-Speed 10496.95 samples/sec   Loss 38.2153   LearningRate 0.0457   Epoch: 0   Global Step: 1580   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:13:50,899-Speed 10463.62 samples/sec   Loss 38.2126   LearningRate 0.0460   Epoch: 0   Global Step: 1590   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:13:58,688-Speed 10519.78 samples/sec   Loss 38.2188   LearningRate 0.0463   Epoch: 0   Global Step: 1600   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:14:06,485-Speed 10508.26 samples/sec   Loss 38.2258   LearningRate 0.0466   Epoch: 0   Global Step: 1610   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:14:14,342-Speed 10426.95 samples/sec   Loss 38.2314   LearningRate 0.0469   Epoch: 0   Global Step: 1620   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:14:22,165-Speed 10473.64 samples/sec   Loss 38.2210   LearningRate 0.0472   Epoch: 0   Global Step: 1630   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:14:29,995-Speed 10465.60 samples/sec   Loss 38.2138   LearningRate 0.0475   Epoch: 0   Global Step: 1640   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:14:37,814-Speed 10478.86 samples/sec   Loss 38.2127   LearningRate 0.0477   Epoch: 0   Global Step: 1650   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:14:45,603-Speed 10520.92 samples/sec   Loss 38.2195   LearningRate 0.0480   Epoch: 0   Global Step: 1660   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:14:53,401-Speed 10507.92 samples/sec   Loss 38.1962   LearningRate 0.0483   Epoch: 0   Global Step: 1670   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:15:01,200-Speed 10506.55 samples/sec   Loss 38.2052   LearningRate 0.0486   Epoch: 0   Global Step: 1680   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:15:08,990-Speed 10518.17 samples/sec   Loss 38.2062   LearningRate 0.0489   Epoch: 0   Global Step: 1690   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:15:16,792-Speed 10501.04 samples/sec   Loss 38.2024   LearningRate 0.0492   Epoch: 0   Global Step: 1700   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:15:24,608-Speed 10481.50 samples/sec   Loss 38.2109   LearningRate 0.0495   Epoch: 0   Global Step: 1710   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:15:32,405-Speed 10508.49 samples/sec   Loss 38.1828   LearningRate 0.0498   Epoch: 0   Global Step: 1720   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:15:40,192-Speed 10522.60 samples/sec   Loss 38.1993   LearningRate 0.0501   Epoch: 0   Global Step: 1730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:15:47,972-Speed 10529.90 samples/sec   Loss 38.1804   LearningRate 0.0503   Epoch: 0   Global Step: 1740   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:15:55,779-Speed 10493.89 samples/sec   Loss 38.1848   LearningRate 0.0506   Epoch: 0   Global Step: 1750   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:16:03,576-Speed 10508.88 samples/sec   Loss 38.1711   LearningRate 0.0509   Epoch: 0   Global Step: 1760   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:16:11,365-Speed 10517.99 samples/sec   Loss 38.1686   LearningRate 0.0512   Epoch: 0   Global Step: 1770   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:16:19,182-Speed 10480.77 samples/sec   Loss 38.1582   LearningRate 0.0515   Epoch: 0   Global Step: 1780   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:16:26,978-Speed 10509.48 samples/sec   Loss 38.1432   LearningRate 0.0518   Epoch: 0   Global Step: 1790   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:16:34,801-Speed 10474.21 samples/sec   Loss 38.1435   LearningRate 0.0521   Epoch: 0   Global Step: 1800   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:16:42,583-Speed 10527.10 samples/sec   Loss 38.1528   LearningRate 0.0524   Epoch: 0   Global Step: 1810   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:16:50,387-Speed 10499.77 samples/sec   Loss 38.1379   LearningRate 0.0527   Epoch: 0   Global Step: 1820   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:16:58,187-Speed 10505.29 samples/sec   Loss 38.1374   LearningRate 0.0530   Epoch: 0   Global Step: 1830   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:17:05,974-Speed 10522.03 samples/sec   Loss 38.1258   LearningRate 0.0532   Epoch: 0   Global Step: 1840   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:17:13,782-Speed 10493.58 samples/sec   Loss 38.1100   LearningRate 0.0535   Epoch: 0   Global Step: 1850   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:17:21,576-Speed 10512.19 samples/sec   Loss 38.0913   LearningRate 0.0538   Epoch: 0   Global Step: 1860   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:17:29,404-Speed 10468.36 samples/sec   Loss 38.0832   LearningRate 0.0541   Epoch: 0   Global Step: 1870   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:17:37,227-Speed 10472.51 samples/sec   Loss 38.0898   LearningRate 0.0544   Epoch: 0   Global Step: 1880   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:17:45,027-Speed 10505.35 samples/sec   Loss 38.0614   LearningRate 0.0547   Epoch: 0   Global Step: 1890   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:17:52,823-Speed 10509.05 samples/sec   Loss 38.0515   LearningRate 0.0550   Epoch: 0   Global Step: 1900   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:18:00,615-Speed 10520.08 samples/sec   Loss 38.0361   LearningRate 0.0553   Epoch: 0   Global Step: 1910   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:18:08,430-Speed 10484.27 samples/sec   Loss 38.0311   LearningRate 0.0556   Epoch: 0   Global Step: 1920   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:18:16,231-Speed 10502.84 samples/sec   Loss 38.0174   LearningRate 0.0558   Epoch: 0   Global Step: 1930   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:18:24,025-Speed 10511.93 samples/sec   Loss 37.9984   LearningRate 0.0561   Epoch: 0   Global Step: 1940   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:18:31,847-Speed 10474.63 samples/sec   Loss 38.0002   LearningRate 0.0564   Epoch: 0   Global Step: 1950   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:18:39,679-Speed 10461.09 samples/sec   Loss 37.9825   LearningRate 0.0567   Epoch: 0   Global Step: 1960   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:18:47,472-Speed 10513.76 samples/sec   Loss 37.9691   LearningRate 0.0570   Epoch: 0   Global Step: 1970   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:18:55,260-Speed 10519.54 samples/sec   Loss 37.9501   LearningRate 0.0573   Epoch: 0   Global Step: 1980   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:19:03,064-Speed 10499.98 samples/sec   Loss 37.9256   LearningRate 0.0576   Epoch: 0   Global Step: 1990   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:19:10,867-Speed 10500.10 samples/sec   Loss 37.9307   LearningRate 0.0579   Epoch: 0   Global Step: 2000   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:19:18,704-Speed 10460.23 samples/sec   Loss 37.8758   LearningRate 0.0582   Epoch: 0   Global Step: 2010   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:19:26,510-Speed 10495.70 samples/sec   Loss 37.8726   LearningRate 0.0584   Epoch: 0   Global Step: 2020   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:19:34,325-Speed 10485.33 samples/sec   Loss 37.8509   LearningRate 0.0587   Epoch: 0   Global Step: 2030   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:19:42,114-Speed 10519.45 samples/sec   Loss 37.8367   LearningRate 0.0590   Epoch: 0   Global Step: 2040   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:19:49,918-Speed 10498.36 samples/sec   Loss 37.8342   LearningRate 0.0593   Epoch: 0   Global Step: 2050   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:19:57,739-Speed 10476.44 samples/sec   Loss 37.7983   LearningRate 0.0596   Epoch: 0   Global Step: 2060   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:20:05,579-Speed 10449.43 samples/sec   Loss 37.7672   LearningRate 0.0599   Epoch: 0   Global Step: 2070   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:20:13,373-Speed 10513.28 samples/sec   Loss 37.7556   LearningRate 0.0602   Epoch: 0   Global Step: 2080   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:20:21,192-Speed 10479.21 samples/sec   Loss 37.7510   LearningRate 0.0605   Epoch: 0   Global Step: 2090   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:20:29,076-Speed 10391.54 samples/sec   Loss 37.6926   LearningRate 0.0608   Epoch: 0   Global Step: 2100   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:20:36,894-Speed 10479.97 samples/sec   Loss 37.6947   LearningRate 0.0611   Epoch: 0   Global Step: 2110   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:20:44,756-Speed 10422.46 samples/sec   Loss 37.6644   LearningRate 0.0613   Epoch: 0   Global Step: 2120   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:20:52,589-Speed 10462.30 samples/sec   Loss 37.6445   LearningRate 0.0616   Epoch: 0   Global Step: 2130   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:21:00,455-Speed 10415.41 samples/sec   Loss 37.6335   LearningRate 0.0619   Epoch: 0   Global Step: 2140   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:21:08,276-Speed 10477.50 samples/sec   Loss 37.6002   LearningRate 0.0622   Epoch: 0   Global Step: 2150   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:21:16,089-Speed 10486.26 samples/sec   Loss 37.5671   LearningRate 0.0625   Epoch: 0   Global Step: 2160   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:21:23,907-Speed 10480.15 samples/sec   Loss 37.5524   LearningRate 0.0628   Epoch: 0   Global Step: 2170   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:21:31,726-Speed 10480.34 samples/sec   Loss 37.5635   LearningRate 0.0631   Epoch: 0   Global Step: 2180   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:21:39,593-Speed 10414.93 samples/sec   Loss 37.5049   LearningRate 0.0634   Epoch: 0   Global Step: 2190   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:21:47,412-Speed 10480.15 samples/sec   Loss 37.4907   LearningRate 0.0637   Epoch: 0   Global Step: 2200   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:21:55,216-Speed 10499.33 samples/sec   Loss 37.4874   LearningRate 0.0639   Epoch: 0   Global Step: 2210   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:22:03,039-Speed 10475.94 samples/sec   Loss 37.4474   LearningRate 0.0642   Epoch: 0   Global Step: 2220   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:22:10,832-Speed 10515.31 samples/sec   Loss 37.4129   LearningRate 0.0645   Epoch: 0   Global Step: 2230   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:22:18,643-Speed 10489.27 samples/sec   Loss 37.3853   LearningRate 0.0648   Epoch: 0   Global Step: 2240   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:22:26,453-Speed 10491.88 samples/sec   Loss 37.3638   LearningRate 0.0651   Epoch: 0   Global Step: 2250   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:22:34,245-Speed 10516.23 samples/sec   Loss 37.3398   LearningRate 0.0654   Epoch: 0   Global Step: 2260   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:22:42,013-Speed 10546.88 samples/sec   Loss 37.3354   LearningRate 0.0657   Epoch: 0   Global Step: 2270   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:22:49,816-Speed 10500.21 samples/sec   Loss 37.2706   LearningRate 0.0660   Epoch: 0   Global Step: 2280   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:22:57,643-Speed 10467.79 samples/sec   Loss 37.2561   LearningRate 0.0663   Epoch: 0   Global Step: 2290   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:23:05,447-Speed 10498.59 samples/sec   Loss 37.2284   LearningRate 0.0666   Epoch: 0   Global Step: 2300   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:23:13,284-Speed 10455.31 samples/sec   Loss 37.1947   LearningRate 0.0668   Epoch: 0   Global Step: 2310   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:23:21,089-Speed 10502.41 samples/sec   Loss 37.1694   LearningRate 0.0671   Epoch: 0   Global Step: 2320   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:23:28,883-Speed 10512.93 samples/sec   Loss 37.1521   LearningRate 0.0674   Epoch: 0   Global Step: 2330   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:23:36,697-Speed 10486.03 samples/sec   Loss 37.1284   LearningRate 0.0677   Epoch: 0   Global Step: 2340   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:23:44,511-Speed 10486.82 samples/sec   Loss 37.0817   LearningRate 0.0680   Epoch: 0   Global Step: 2350   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:23:52,323-Speed 10487.49 samples/sec   Loss 37.0449   LearningRate 0.0683   Epoch: 0   Global Step: 2360   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:24:00,103-Speed 10533.22 samples/sec   Loss 37.0263   LearningRate 0.0686   Epoch: 0   Global Step: 2370   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:24:07,914-Speed 10489.56 samples/sec   Loss 37.0215   LearningRate 0.0689   Epoch: 0   Global Step: 2380   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:24:15,737-Speed 10472.67 samples/sec   Loss 36.9717   LearningRate 0.0692   Epoch: 0   Global Step: 2390   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:24:23,545-Speed 10494.06 samples/sec   Loss 36.9356   LearningRate 0.0694   Epoch: 0   Global Step: 2400   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:24:31,349-Speed 10499.00 samples/sec   Loss 36.8997   LearningRate 0.0697   Epoch: 0   Global Step: 2410   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:24:39,151-Speed 10502.02 samples/sec   Loss 36.9067   LearningRate 0.0700   Epoch: 0   Global Step: 2420   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:24:46,918-Speed 10548.83 samples/sec   Loss 36.8396   LearningRate 0.0703   Epoch: 0   Global Step: 2430   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:24:54,702-Speed 10525.76 samples/sec   Loss 36.8042   LearningRate 0.0706   Epoch: 0   Global Step: 2440   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:25:02,530-Speed 10467.31 samples/sec   Loss 36.7970   LearningRate 0.0709   Epoch: 0   Global Step: 2450   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:25:10,348-Speed 10479.66 samples/sec   Loss 36.7402   LearningRate 0.0712   Epoch: 0   Global Step: 2460   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:25:18,135-Speed 10522.50 samples/sec   Loss 36.7494   LearningRate 0.0715   Epoch: 0   Global Step: 2470   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:25:25,950-Speed 10484.56 samples/sec   Loss 36.6892   LearningRate 0.0718   Epoch: 0   Global Step: 2480   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:25:33,741-Speed 10517.48 samples/sec   Loss 36.6800   LearningRate 0.0720   Epoch: 0   Global Step: 2490   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:25:41,530-Speed 10519.84 samples/sec   Loss 36.6308   LearningRate 0.0723   Epoch: 0   Global Step: 2500   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:25:49,346-Speed 10482.19 samples/sec   Loss 36.5907   LearningRate 0.0726   Epoch: 0   Global Step: 2510   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:25:57,138-Speed 10516.35 samples/sec   Loss 36.5710   LearningRate 0.0729   Epoch: 0   Global Step: 2520   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:26:04,959-Speed 10475.49 samples/sec   Loss 36.5179   LearningRate 0.0732   Epoch: 0   Global Step: 2530   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:26:12,750-Speed 10516.80 samples/sec   Loss 36.4886   LearningRate 0.0735   Epoch: 0   Global Step: 2540   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:26:20,565-Speed 10484.18 samples/sec   Loss 36.4848   LearningRate 0.0738   Epoch: 0   Global Step: 2550   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:26:28,367-Speed 10502.45 samples/sec   Loss 36.4192   LearningRate 0.0741   Epoch: 0   Global Step: 2560   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:26:36,137-Speed 10544.85 samples/sec   Loss 36.3733   LearningRate 0.0744   Epoch: 0   Global Step: 2570   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:26:43,925-Speed 10520.58 samples/sec   Loss 36.3664   LearningRate 0.0747   Epoch: 0   Global Step: 2580   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:26:51,737-Speed 10488.58 samples/sec   Loss 36.3128   LearningRate 0.0749   Epoch: 0   Global Step: 2590   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:26:59,540-Speed 10500.14 samples/sec   Loss 36.2821   LearningRate 0.0752   Epoch: 0   Global Step: 2600   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:27:07,341-Speed 10502.79 samples/sec   Loss 36.2552   LearningRate 0.0755   Epoch: 0   Global Step: 2610   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:27:15,156-Speed 10483.95 samples/sec   Loss 36.2089   LearningRate 0.0758   Epoch: 0   Global Step: 2620   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:27:22,998-Speed 10446.35 samples/sec   Loss 36.2069   LearningRate 0.0761   Epoch: 0   Global Step: 2630   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:27:30,905-Speed 10362.62 samples/sec   Loss 36.1370   LearningRate 0.0764   Epoch: 0   Global Step: 2640   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:27:38,719-Speed 10485.85 samples/sec   Loss 36.0996   LearningRate 0.0767   Epoch: 0   Global Step: 2650   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:27:46,534-Speed 10482.96 samples/sec   Loss 36.0538   LearningRate 0.0770   Epoch: 0   Global Step: 2660   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:27:54,379-Speed 10445.79 samples/sec   Loss 36.0217   LearningRate 0.0773   Epoch: 0   Global Step: 2670   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:28:02,305-Speed 10339.66 samples/sec   Loss 36.0343   LearningRate 0.0775   Epoch: 0   Global Step: 2680   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:28:10,122-Speed 10481.53 samples/sec   Loss 35.9923   LearningRate 0.0778   Epoch: 0   Global Step: 2690   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:28:17,963-Speed 10449.58 samples/sec   Loss 35.9154   LearningRate 0.0781   Epoch: 0   Global Step: 2700   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:28:25,801-Speed 10452.70 samples/sec   Loss 35.9000   LearningRate 0.0784   Epoch: 0   Global Step: 2710   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:28:33,653-Speed 10436.45 samples/sec   Loss 35.8745   LearningRate 0.0787   Epoch: 0   Global Step: 2720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:28:41,465-Speed 10488.95 samples/sec   Loss 35.7826   LearningRate 0.0790   Epoch: 0   Global Step: 2730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:28:49,239-Speed 10539.32 samples/sec   Loss 35.7663   LearningRate 0.0793   Epoch: 0   Global Step: 2740   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:28:57,019-Speed 10531.22 samples/sec   Loss 35.7742   LearningRate 0.0796   Epoch: 0   Global Step: 2750   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:29:04,816-Speed 10507.66 samples/sec   Loss 35.6836   LearningRate 0.0799   Epoch: 0   Global Step: 2760   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:29:12,597-Speed 10530.61 samples/sec   Loss 35.6889   LearningRate 0.0802   Epoch: 0   Global Step: 2770   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:29:20,388-Speed 10517.57 samples/sec   Loss 35.6139   LearningRate 0.0804   Epoch: 0   Global Step: 2780   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:29:28,179-Speed 10515.48 samples/sec   Loss 35.6009   LearningRate 0.0807   Epoch: 0   Global Step: 2790   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:29:35,968-Speed 10518.82 samples/sec   Loss 35.5053   LearningRate 0.0810   Epoch: 0   Global Step: 2800   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:29:43,806-Speed 10456.53 samples/sec   Loss 35.4856   LearningRate 0.0813   Epoch: 0   Global Step: 2810   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:29:51,638-Speed 10462.49 samples/sec   Loss 35.4605   LearningRate 0.0816   Epoch: 0   Global Step: 2820   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:29:59,442-Speed 10498.56 samples/sec   Loss 35.4022   LearningRate 0.0819   Epoch: 0   Global Step: 2830   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:30:07,206-Speed 10554.12 samples/sec   Loss 35.3554   LearningRate 0.0822   Epoch: 0   Global Step: 2840   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:30:14,990-Speed 10531.96 samples/sec   Loss 35.3110   LearningRate 0.0825   Epoch: 0   Global Step: 2850   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:30:22,811-Speed 10476.64 samples/sec   Loss 35.2568   LearningRate 0.0828   Epoch: 0   Global Step: 2860   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:30:30,626-Speed 10484.59 samples/sec   Loss 35.2275   LearningRate 0.0830   Epoch: 0   Global Step: 2870   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:30:38,461-Speed 10457.73 samples/sec   Loss 35.1963   LearningRate 0.0833   Epoch: 0   Global Step: 2880   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:30:46,302-Speed 10449.24 samples/sec   Loss 35.1357   LearningRate 0.0836   Epoch: 0   Global Step: 2890   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:30:54,085-Speed 10527.99 samples/sec   Loss 35.0861   LearningRate 0.0839   Epoch: 0   Global Step: 2900   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:31:01,875-Speed 10517.28 samples/sec   Loss 35.0436   LearningRate 0.0842   Epoch: 0   Global Step: 2910   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:31:09,666-Speed 10516.98 samples/sec   Loss 34.9981   LearningRate 0.0845   Epoch: 0   Global Step: 2920   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:31:17,435-Speed 10545.17 samples/sec   Loss 34.9496   LearningRate 0.0848   Epoch: 0   Global Step: 2930   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:31:25,245-Speed 10492.11 samples/sec   Loss 34.9343   LearningRate 0.0851   Epoch: 0   Global Step: 2940   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:31:33,058-Speed 10488.17 samples/sec   Loss 34.8839   LearningRate 0.0854   Epoch: 0   Global Step: 2950   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:31:40,865-Speed 10495.47 samples/sec   Loss 34.8213   LearningRate 0.0856   Epoch: 0   Global Step: 2960   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:31:48,669-Speed 10500.00 samples/sec   Loss 34.7541   LearningRate 0.0859   Epoch: 0   Global Step: 2970   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:31:56,517-Speed 10444.70 samples/sec   Loss 34.7399   LearningRate 0.0862   Epoch: 0   Global Step: 2980   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:32:04,341-Speed 10473.00 samples/sec   Loss 34.7122   LearningRate 0.0865   Epoch: 0   Global Step: 2990   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:32:12,142-Speed 10507.68 samples/sec   Loss 34.6536   LearningRate 0.0868   Epoch: 0   Global Step: 3000   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:32:19,944-Speed 10502.26 samples/sec   Loss 34.5941   LearningRate 0.0871   Epoch: 0   Global Step: 3010   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:32:27,770-Speed 10470.14 samples/sec   Loss 34.5121   LearningRate 0.0874   Epoch: 0   Global Step: 3020   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:32:35,562-Speed 10514.26 samples/sec   Loss 34.4775   LearningRate 0.0877   Epoch: 0   Global Step: 3030   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:32:43,356-Speed 10512.73 samples/sec   Loss 34.4509   LearningRate 0.0880   Epoch: 0   Global Step: 3040   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:32:51,160-Speed 10499.57 samples/sec   Loss 34.3857   LearningRate 0.0883   Epoch: 0   Global Step: 3050   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:32:58,977-Speed 10483.51 samples/sec   Loss 34.3196   LearningRate 0.0885   Epoch: 0   Global Step: 3060   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:33:06,812-Speed 10457.88 samples/sec   Loss 34.2863   LearningRate 0.0888   Epoch: 0   Global Step: 3070   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:33:14,619-Speed 10495.16 samples/sec   Loss 34.2116   LearningRate 0.0891   Epoch: 0   Global Step: 3080   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:33:22,464-Speed 10444.87 samples/sec   Loss 34.2060   LearningRate 0.0894   Epoch: 0   Global Step: 3090   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:33:30,316-Speed 10435.27 samples/sec   Loss 34.1100   LearningRate 0.0897   Epoch: 0   Global Step: 3100   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:33:38,127-Speed 10490.78 samples/sec   Loss 34.0533   LearningRate 0.0900   Epoch: 0   Global Step: 3110   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:33:45,939-Speed 10488.72 samples/sec   Loss 34.0500   LearningRate 0.0903   Epoch: 0   Global Step: 3120   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:33:53,745-Speed 10496.52 samples/sec   Loss 33.9818   LearningRate 0.0906   Epoch: 0   Global Step: 3130   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:34:01,541-Speed 10510.39 samples/sec   Loss 33.9360   LearningRate 0.0909   Epoch: 0   Global Step: 3140   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:34:09,363-Speed 10474.12 samples/sec   Loss 33.8508   LearningRate 0.0911   Epoch: 0   Global Step: 3150   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:34:17,164-Speed 10502.07 samples/sec   Loss 33.8342   LearningRate 0.0914   Epoch: 0   Global Step: 3160   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:34:24,967-Speed 10501.04 samples/sec   Loss 33.7779   LearningRate 0.0917   Epoch: 0   Global Step: 3170   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:34:32,766-Speed 10509.29 samples/sec   Loss 33.7098   LearningRate 0.0920   Epoch: 0   Global Step: 3180   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:34:40,581-Speed 10484.72 samples/sec   Loss 33.6542   LearningRate 0.0923   Epoch: 0   Global Step: 3190   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:34:48,375-Speed 10511.01 samples/sec   Loss 33.6219   LearningRate 0.0926   Epoch: 0   Global Step: 3200   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:34:56,159-Speed 10527.03 samples/sec   Loss 33.5585   LearningRate 0.0929   Epoch: 0   Global Step: 3210   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:35:03,932-Speed 10541.08 samples/sec   Loss 33.4974   LearningRate 0.0932   Epoch: 0   Global Step: 3220   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:35:11,739-Speed 10496.01 samples/sec   Loss 33.4497   LearningRate 0.0935   Epoch: 0   Global Step: 3230   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:35:19,517-Speed 10534.23 samples/sec   Loss 33.4034   LearningRate 0.0938   Epoch: 0   Global Step: 3240   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:35:27,333-Speed 10482.94 samples/sec   Loss 33.3321   LearningRate 0.0940   Epoch: 0   Global Step: 3250   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:35:35,124-Speed 10516.70 samples/sec   Loss 33.2796   LearningRate 0.0943   Epoch: 0   Global Step: 3260   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:35:42,900-Speed 10540.48 samples/sec   Loss 33.2219   LearningRate 0.0946   Epoch: 0   Global Step: 3270   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:35:50,686-Speed 10527.31 samples/sec   Loss 33.2085   LearningRate 0.0949   Epoch: 0   Global Step: 3280   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:35:58,466-Speed 10531.65 samples/sec   Loss 33.1250   LearningRate 0.0952   Epoch: 0   Global Step: 3290   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:36:06,261-Speed 10511.97 samples/sec   Loss 33.0747   LearningRate 0.0955   Epoch: 0   Global Step: 3300   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:36:14,086-Speed 10471.75 samples/sec   Loss 33.0011   LearningRate 0.0958   Epoch: 0   Global Step: 3310   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:36:21,877-Speed 10516.99 samples/sec   Loss 32.9442   LearningRate 0.0961   Epoch: 0   Global Step: 3320   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:36:29,677-Speed 10504.55 samples/sec   Loss 32.9245   LearningRate 0.0964   Epoch: 0   Global Step: 3330   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:36:37,474-Speed 10509.25 samples/sec   Loss 32.8794   LearningRate 0.0966   Epoch: 0   Global Step: 3340   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:36:45,302-Speed 10466.73 samples/sec   Loss 32.7904   LearningRate 0.0969   Epoch: 0   Global Step: 3350   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:36:53,106-Speed 10499.43 samples/sec   Loss 32.7350   LearningRate 0.0972   Epoch: 0   Global Step: 3360   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:37:00,916-Speed 10491.81 samples/sec   Loss 32.7126   LearningRate 0.0975   Epoch: 0   Global Step: 3370   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:37:08,720-Speed 10500.33 samples/sec   Loss 32.6052   LearningRate 0.0978   Epoch: 0   Global Step: 3380   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:37:16,513-Speed 10514.29 samples/sec   Loss 32.5599   LearningRate 0.0981   Epoch: 0   Global Step: 3390   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:37:24,327-Speed 10486.17 samples/sec   Loss 32.5151   LearningRate 0.0984   Epoch: 0   Global Step: 3400   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:37:32,145-Speed 10480.00 samples/sec   Loss 32.4110   LearningRate 0.0987   Epoch: 0   Global Step: 3410   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:37:39,952-Speed 10495.50 samples/sec   Loss 32.4042   LearningRate 0.0990   Epoch: 0   Global Step: 3420   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:37:47,769-Speed 10481.67 samples/sec   Loss 32.3661   LearningRate 0.0992   Epoch: 0   Global Step: 3430   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:37:55,580-Speed 10493.29 samples/sec   Loss 32.2395   LearningRate 0.0995   Epoch: 0   Global Step: 3440   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:38:03,364-Speed 10526.15 samples/sec   Loss 32.2077   LearningRate 0.0998   Epoch: 0   Global Step: 3450   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:38:11,214-Speed 10436.93 samples/sec   Loss 32.1870   LearningRate 0.1001   Epoch: 0   Global Step: 3460   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:38:18,995-Speed 10530.28 samples/sec   Loss 32.0700   LearningRate 0.1004   Epoch: 0   Global Step: 3470   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:38:26,826-Speed 10467.89 samples/sec   Loss 32.0359   LearningRate 0.1007   Epoch: 0   Global Step: 3480   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:38:34,619-Speed 10513.45 samples/sec   Loss 31.9767   LearningRate 0.1010   Epoch: 0   Global Step: 3490   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:38:42,418-Speed 10505.80 samples/sec   Loss 31.9094   LearningRate 0.1013   Epoch: 0   Global Step: 3500   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:38:50,201-Speed 10528.22 samples/sec   Loss 31.8932   LearningRate 0.1016   Epoch: 0   Global Step: 3510   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:38:57,987-Speed 10523.75 samples/sec   Loss 31.8085   LearningRate 0.1019   Epoch: 0   Global Step: 3520   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:39:05,765-Speed 10535.46 samples/sec   Loss 31.7308   LearningRate 0.1021   Epoch: 0   Global Step: 3530   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:39:13,562-Speed 10509.49 samples/sec   Loss 31.6538   LearningRate 0.1024   Epoch: 0   Global Step: 3540   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:39:21,372-Speed 10490.85 samples/sec   Loss 31.5949   LearningRate 0.1027   Epoch: 0   Global Step: 3550   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:39:29,213-Speed 10450.21 samples/sec   Loss 31.5747   LearningRate 0.1030   Epoch: 0   Global Step: 3560   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:39:37,035-Speed 10474.34 samples/sec   Loss 31.4318   LearningRate 0.1033   Epoch: 0   Global Step: 3570   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:39:44,822-Speed 10521.69 samples/sec   Loss 31.4191   LearningRate 0.1036   Epoch: 0   Global Step: 3580   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:39:52,654-Speed 10462.61 samples/sec   Loss 31.3333   LearningRate 0.1039   Epoch: 0   Global Step: 3590   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:40:00,499-Speed 10443.65 samples/sec   Loss 31.2756   LearningRate 0.1042   Epoch: 0   Global Step: 3600   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:40:08,268-Speed 10546.84 samples/sec   Loss 31.2849   LearningRate 0.1045   Epoch: 0   Global Step: 3610   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:40:16,082-Speed 10485.18 samples/sec   Loss 31.1584   LearningRate 0.1047   Epoch: 0   Global Step: 3620   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:40:23,887-Speed 10497.70 samples/sec   Loss 31.1386   LearningRate 0.1050   Epoch: 0   Global Step: 3630   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:40:31,691-Speed 10497.67 samples/sec   Loss 31.1046   LearningRate 0.1053   Epoch: 0   Global Step: 3640   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:40:39,522-Speed 10463.22 samples/sec   Loss 31.0417   LearningRate 0.1056   Epoch: 0   Global Step: 3650   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:40:47,306-Speed 10526.62 samples/sec   Loss 30.9521   LearningRate 0.1059   Epoch: 0   Global Step: 3660   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:40:55,098-Speed 10514.12 samples/sec   Loss 30.8901   LearningRate 0.1062   Epoch: 0   Global Step: 3670   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:41:02,929-Speed 10464.51 samples/sec   Loss 30.8120   LearningRate 0.1065   Epoch: 0   Global Step: 3680   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:41:10,714-Speed 10524.27 samples/sec   Loss 30.7770   LearningRate 0.1068   Epoch: 0   Global Step: 3690   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:41:18,522-Speed 10492.70 samples/sec   Loss 30.7184   LearningRate 0.1071   Epoch: 0   Global Step: 3700   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:41:26,308-Speed 10524.30 samples/sec   Loss 30.5893   LearningRate 0.1073   Epoch: 0   Global Step: 3710   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:41:34,115-Speed 10494.15 samples/sec   Loss 30.6073   LearningRate 0.1076   Epoch: 0   Global Step: 3720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:41:41,942-Speed 10468.96 samples/sec   Loss 30.5308   LearningRate 0.1079   Epoch: 0   Global Step: 3730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:41:49,788-Speed 10443.70 samples/sec   Loss 30.4862   LearningRate 0.1082   Epoch: 0   Global Step: 3740   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:41:57,577-Speed 10518.82 samples/sec   Loss 30.3750   LearningRate 0.1085   Epoch: 0   Global Step: 3750   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:42:05,377-Speed 10504.17 samples/sec   Loss 30.3702   LearningRate 0.1088   Epoch: 0   Global Step: 3760   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:42:13,202-Speed 10472.70 samples/sec   Loss 30.2370   LearningRate 0.1091   Epoch: 0   Global Step: 3770   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:42:21,010-Speed 10493.14 samples/sec   Loss 30.2190   LearningRate 0.1094   Epoch: 0   Global Step: 3780   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:42:28,841-Speed 10463.30 samples/sec   Loss 30.1498   LearningRate 0.1097   Epoch: 0   Global Step: 3790   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:42:36,639-Speed 10506.50 samples/sec   Loss 30.1364   LearningRate 0.1100   Epoch: 0   Global Step: 3800   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:42:44,420-Speed 10531.01 samples/sec   Loss 29.9757   LearningRate 0.1102   Epoch: 0   Global Step: 3810   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:42:52,207-Speed 10522.71 samples/sec   Loss 29.9074   LearningRate 0.1105   Epoch: 0   Global Step: 3820   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:42:59,989-Speed 10529.29 samples/sec   Loss 29.8381   LearningRate 0.1108   Epoch: 0   Global Step: 3830   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:43:07,807-Speed 10479.28 samples/sec   Loss 29.8431   LearningRate 0.1111   Epoch: 0   Global Step: 3840   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:43:15,618-Speed 10489.77 samples/sec   Loss 29.7514   LearningRate 0.1114   Epoch: 0   Global Step: 3850   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:43:23,432-Speed 10486.11 samples/sec   Loss 29.6768   LearningRate 0.1117   Epoch: 0   Global Step: 3860   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:43:31,230-Speed 10505.53 samples/sec   Loss 29.5710   LearningRate 0.1120   Epoch: 0   Global Step: 3870   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:43:38,996-Speed 10550.25 samples/sec   Loss 29.5396   LearningRate 0.1123   Epoch: 0   Global Step: 3880   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:43:46,790-Speed 10513.25 samples/sec   Loss 29.5692   LearningRate 0.1126   Epoch: 0   Global Step: 3890   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:43:54,626-Speed 10461.16 samples/sec   Loss 29.4409   LearningRate 0.1128   Epoch: 0   Global Step: 3900   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:44:02,428-Speed 10502.38 samples/sec   Loss 29.3718   LearningRate 0.1131   Epoch: 0   Global Step: 3910   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:44:10,248-Speed 10476.53 samples/sec   Loss 29.2817   LearningRate 0.1134   Epoch: 0   Global Step: 3920   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:44:18,055-Speed 10495.07 samples/sec   Loss 29.2354   LearningRate 0.1137   Epoch: 0   Global Step: 3930   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:44:25,910-Speed 10431.11 samples/sec   Loss 29.2058   LearningRate 0.1140   Epoch: 0   Global Step: 3940   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:44:33,708-Speed 10506.54 samples/sec   Loss 29.1715   LearningRate 0.1143   Epoch: 0   Global Step: 3950   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:44:41,498-Speed 10518.29 samples/sec   Loss 29.0271   LearningRate 0.1146   Epoch: 0   Global Step: 3960   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:44:49,312-Speed 10485.44 samples/sec   Loss 29.0186   LearningRate 0.1149   Epoch: 0   Global Step: 3970   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:44:57,099-Speed 10521.55 samples/sec   Loss 28.9325   LearningRate 0.1152   Epoch: 0   Global Step: 3980   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:45:04,896-Speed 10507.92 samples/sec   Loss 28.8363   LearningRate 0.1155   Epoch: 0   Global Step: 3990   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:45:12,692-Speed 10510.17 samples/sec   Loss 28.8304   LearningRate 0.1157   Epoch: 0   Global Step: 4000   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:45:20,470-Speed 10534.97 samples/sec   Loss 28.7399   LearningRate 0.1160   Epoch: 0   Global Step: 4010   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:45:28,251-Speed 10529.90 samples/sec   Loss 28.6998   LearningRate 0.1163   Epoch: 0   Global Step: 4020   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:45:36,065-Speed 10484.95 samples/sec   Loss 28.5734   LearningRate 0.1166   Epoch: 0   Global Step: 4030   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:45:43,843-Speed 10534.28 samples/sec   Loss 28.4916   LearningRate 0.1169   Epoch: 0   Global Step: 4040   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:45:51,656-Speed 10487.04 samples/sec   Loss 28.3875   LearningRate 0.1172   Epoch: 0   Global Step: 4050   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:45:59,424-Speed 10547.00 samples/sec   Loss 28.4237   LearningRate 0.1175   Epoch: 0   Global Step: 4060   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:46:07,234-Speed 10490.63 samples/sec   Loss 28.2989   LearningRate 0.1178   Epoch: 0   Global Step: 4070   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:46:15,031-Speed 10509.23 samples/sec   Loss 28.2516   LearningRate 0.1181   Epoch: 0   Global Step: 4080   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:46:22,820-Speed 10519.97 samples/sec   Loss 28.1823   LearningRate 0.1183   Epoch: 0   Global Step: 4090   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:46:30,614-Speed 10512.40 samples/sec   Loss 28.0971   LearningRate 0.1186   Epoch: 0   Global Step: 4100   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:46:38,420-Speed 10496.29 samples/sec   Loss 27.9701   LearningRate 0.1189   Epoch: 0   Global Step: 4110   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:46:46,248-Speed 10467.39 samples/sec   Loss 27.9133   LearningRate 0.1192   Epoch: 0   Global Step: 4120   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:46:54,033-Speed 10524.84 samples/sec   Loss 27.8962   LearningRate 0.1195   Epoch: 0   Global Step: 4130   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:47:01,811-Speed 10534.79 samples/sec   Loss 27.8093   LearningRate 0.1198   Epoch: 0   Global Step: 4140   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:47:11,082-Speed 8837.86 samples/sec   Loss 27.8495   LearningRate 0.1201   Epoch: 0   Global Step: 4150   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:47:18,873-Speed 10516.92 samples/sec   Loss 27.7178   LearningRate 0.1204   Epoch: 0   Global Step: 4160   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:47:26,680-Speed 10495.77 samples/sec   Loss 27.6475   LearningRate 0.1207   Epoch: 0   Global Step: 4170   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:47:34,477-Speed 10508.21 samples/sec   Loss 27.5682   LearningRate 0.1209   Epoch: 0   Global Step: 4180   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:47:42,291-Speed 10486.78 samples/sec   Loss 27.4922   LearningRate 0.1212   Epoch: 0   Global Step: 4190   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:47:50,101-Speed 10496.89 samples/sec   Loss 27.4057   LearningRate 0.1215   Epoch: 0   Global Step: 4200   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:47:57,894-Speed 10513.42 samples/sec   Loss 27.3779   LearningRate 0.1218   Epoch: 0   Global Step: 4210   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:48:05,679-Speed 10525.71 samples/sec   Loss 27.2868   LearningRate 0.1221   Epoch: 0   Global Step: 4220   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:48:13,462-Speed 10527.01 samples/sec   Loss 27.2341   LearningRate 0.1224   Epoch: 0   Global Step: 4230   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:48:21,249-Speed 10522.57 samples/sec   Loss 27.1557   LearningRate 0.1227   Epoch: 0   Global Step: 4240   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:48:29,031-Speed 10527.91 samples/sec   Loss 27.0681   LearningRate 0.1230   Epoch: 0   Global Step: 4250   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:48:36,813-Speed 10527.69 samples/sec   Loss 26.9960   LearningRate 0.1233   Epoch: 0   Global Step: 4260   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:48:44,623-Speed 10497.33 samples/sec   Loss 26.9321   LearningRate 0.1236   Epoch: 0   Global Step: 4270   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:48:52,436-Speed 10487.22 samples/sec   Loss 26.9226   LearningRate 0.1238   Epoch: 0   Global Step: 4280   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:49:00,251-Speed 10483.92 samples/sec   Loss 26.8117   LearningRate 0.1241   Epoch: 0   Global Step: 4290   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:49:08,043-Speed 10515.39 samples/sec   Loss 26.7446   LearningRate 0.1244   Epoch: 0   Global Step: 4300   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:49:15,889-Speed 10443.33 samples/sec   Loss 26.6975   LearningRate 0.1247   Epoch: 0   Global Step: 4310   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:49:23,684-Speed 10509.89 samples/sec   Loss 26.6354   LearningRate 0.1250   Epoch: 0   Global Step: 4320   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:49:31,503-Speed 10479.33 samples/sec   Loss 26.5283   LearningRate 0.1253   Epoch: 0   Global Step: 4330   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:49:39,347-Speed 10445.27 samples/sec   Loss 26.4475   LearningRate 0.1256   Epoch: 0   Global Step: 4340   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:49:47,140-Speed 10513.90 samples/sec   Loss 26.3016   LearningRate 0.1259   Epoch: 0   Global Step: 4350   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:49:54,930-Speed 10517.61 samples/sec   Loss 26.3724   LearningRate 0.1262   Epoch: 0   Global Step: 4360   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:50:02,731-Speed 10504.05 samples/sec   Loss 26.3121   LearningRate 0.1264   Epoch: 0   Global Step: 4370   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:50:10,538-Speed 10494.97 samples/sec   Loss 26.2437   LearningRate 0.1267   Epoch: 0   Global Step: 4380   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:50:18,333-Speed 10510.04 samples/sec   Loss 26.1366   LearningRate 0.1270   Epoch: 0   Global Step: 4390   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:50:26,100-Speed 10549.70 samples/sec   Loss 26.0627   LearningRate 0.1273   Epoch: 0   Global Step: 4400   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:50:33,911-Speed 10489.65 samples/sec   Loss 26.0746   LearningRate 0.1276   Epoch: 0   Global Step: 4410   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:50:41,722-Speed 10489.51 samples/sec   Loss 25.9536   LearningRate 0.1279   Epoch: 0   Global Step: 4420   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:50:49,515-Speed 10513.88 samples/sec   Loss 25.8462   LearningRate 0.1282   Epoch: 0   Global Step: 4430   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:50:57,326-Speed 10488.66 samples/sec   Loss 25.8146   LearningRate 0.1285   Epoch: 0   Global Step: 4440   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:51:05,161-Speed 10458.43 samples/sec   Loss 25.7183   LearningRate 0.1288   Epoch: 0   Global Step: 4450   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:51:12,963-Speed 10501.40 samples/sec   Loss 25.6395   LearningRate 0.1291   Epoch: 0   Global Step: 4460   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:51:20,741-Speed 10533.87 samples/sec   Loss 25.5915   LearningRate 0.1293   Epoch: 0   Global Step: 4470   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:51:28,541-Speed 10502.25 samples/sec   Loss 25.4946   LearningRate 0.1296   Epoch: 0   Global Step: 4480   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-15 15:51:36,330-Speed 10519.69 samples/sec   Loss 25.4817   LearningRate 0.1299   Epoch: 0   Global Step: 4490   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:51:44,122-Speed 10515.55 samples/sec   Loss 25.2924   LearningRate 0.1302   Epoch: 0   Global Step: 4500   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:51:51,983-Speed 10421.18 samples/sec   Loss 25.3029   LearningRate 0.1305   Epoch: 0   Global Step: 4510   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:51:59,801-Speed 10480.02 samples/sec   Loss 25.2564   LearningRate 0.1308   Epoch: 0   Global Step: 4520   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:52:07,603-Speed 10503.05 samples/sec   Loss 25.1238   LearningRate 0.1311   Epoch: 0   Global Step: 4530   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:52:15,387-Speed 10528.41 samples/sec   Loss 25.0810   LearningRate 0.1314   Epoch: 0   Global Step: 4540   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:52:23,169-Speed 10528.22 samples/sec   Loss 24.9932   LearningRate 0.1317   Epoch: 0   Global Step: 4550   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:52:30,998-Speed 10466.56 samples/sec   Loss 24.9511   LearningRate 0.1319   Epoch: 0   Global Step: 4560   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:52:38,832-Speed 10459.76 samples/sec   Loss 24.9410   LearningRate 0.1322   Epoch: 0   Global Step: 4570   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:52:46,676-Speed 10445.58 samples/sec   Loss 24.8144   LearningRate 0.1325   Epoch: 0   Global Step: 4580   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:52:54,495-Speed 10478.62 samples/sec   Loss 24.7410   LearningRate 0.1328   Epoch: 0   Global Step: 4590   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:53:02,298-Speed 10500.67 samples/sec   Loss 24.6295   LearningRate 0.1331   Epoch: 0   Global Step: 4600   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:53:10,123-Speed 10470.71 samples/sec   Loss 24.6516   LearningRate 0.1334   Epoch: 0   Global Step: 4610   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:53:17,933-Speed 10491.36 samples/sec   Loss 24.6048   LearningRate 0.1337   Epoch: 0   Global Step: 4620   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:53:25,734-Speed 10503.43 samples/sec   Loss 24.4456   LearningRate 0.1340   Epoch: 0   Global Step: 4630   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:53:33,516-Speed 10527.26 samples/sec   Loss 24.4727   LearningRate 0.1343   Epoch: 0   Global Step: 4640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:53:41,311-Speed 10515.08 samples/sec   Loss 24.3216   LearningRate 0.1345   Epoch: 0   Global Step: 4650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:53:49,083-Speed 10541.82 samples/sec   Loss 24.3399   LearningRate 0.1348   Epoch: 0   Global Step: 4660   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:53:56,859-Speed 10535.80 samples/sec   Loss 24.2126   LearningRate 0.1351   Epoch: 0   Global Step: 4670   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:54:06,930-Speed 8135.17 samples/sec   Loss 24.1315   LearningRate 0.1354   Epoch: 0   Global Step: 4680   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:54:14,695-Speed 10552.00 samples/sec   Loss 24.0211   LearningRate 0.1357   Epoch: 0   Global Step: 4690   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-15 15:54:22,479-Speed 10531.31 samples/sec   Loss 24.0521   LearningRate 0.1360   Epoch: 0   Global Step: 4700   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:54:30,256-Speed 10535.63 samples/sec   Loss 23.9430   LearningRate 0.1363   Epoch: 0   Global Step: 4710   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:54:38,035-Speed 10532.38 samples/sec   Loss 23.8784   LearningRate 0.1366   Epoch: 0   Global Step: 4720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:54:45,813-Speed 10535.78 samples/sec   Loss 23.8262   LearningRate 0.1369   Epoch: 0   Global Step: 4730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:54:53,642-Speed 10464.64 samples/sec   Loss 23.7200   LearningRate 0.1372   Epoch: 0   Global Step: 4740   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:55:01,520-Speed 10399.96 samples/sec   Loss 23.6307   LearningRate 0.1374   Epoch: 0   Global Step: 4750   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:55:09,323-Speed 10500.50 samples/sec   Loss 23.6713   LearningRate 0.1377   Epoch: 0   Global Step: 4760   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-15 15:55:17,093-Speed 10546.26 samples/sec   Loss 23.5056   LearningRate 0.1380   Epoch: 0   Global Step: 4770   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:55:24,890-Speed 10507.02 samples/sec   Loss 23.4991   LearningRate 0.1383   Epoch: 0   Global Step: 4780   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:55:32,685-Speed 10512.44 samples/sec   Loss 23.4273   LearningRate 0.1386   Epoch: 0   Global Step: 4790   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-15 15:55:40,472-Speed 10521.73 samples/sec   Loss 23.3681   LearningRate 0.1389   Epoch: 0   Global Step: 4800   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 15:55:48,252-Speed 10532.60 samples/sec   Loss 23.3039   LearningRate 0.1392   Epoch: 0   Global Step: 4810   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 15:55:56,066-Speed 10486.69 samples/sec   Loss 23.1922   LearningRate 0.1395   Epoch: 0   Global Step: 4820   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 15:56:03,880-Speed 10485.58 samples/sec   Loss 23.0905   LearningRate 0.1398   Epoch: 0   Global Step: 4830   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 15:56:11,682-Speed 10500.00 samples/sec   Loss 23.1084   LearningRate 0.1400   Epoch: 0   Global Step: 4840   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 15:56:19,490-Speed 10493.51 samples/sec   Loss 22.9152   LearningRate 0.1403   Epoch: 0   Global Step: 4850   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 15:56:27,293-Speed 10501.44 samples/sec   Loss 22.8883   LearningRate 0.1406   Epoch: 0   Global Step: 4860   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 15:56:35,118-Speed 10474.04 samples/sec   Loss 22.8544   LearningRate 0.1409   Epoch: 0   Global Step: 4870   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 15:56:42,954-Speed 10457.50 samples/sec   Loss 22.8564   LearningRate 0.1412   Epoch: 0   Global Step: 4880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 15:56:50,775-Speed 10476.84 samples/sec   Loss 22.7073   LearningRate 0.1415   Epoch: 0   Global Step: 4890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 15:56:58,559-Speed 10525.04 samples/sec   Loss 22.6033   LearningRate 0.1418   Epoch: 0   Global Step: 4900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 15:57:06,395-Speed 10456.79 samples/sec   Loss 22.5397   LearningRate 0.1421   Epoch: 0   Global Step: 4910   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 15:57:14,191-Speed 10511.64 samples/sec   Loss 22.5665   LearningRate 0.1424   Epoch: 0   Global Step: 4920   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 15:57:21,987-Speed 10510.51 samples/sec   Loss 22.4893   LearningRate 0.1427   Epoch: 0   Global Step: 4930   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 15:57:29,775-Speed 10519.58 samples/sec   Loss 22.3900   LearningRate 0.1429   Epoch: 0   Global Step: 4940   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 15:57:37,581-Speed 10496.02 samples/sec   Loss 22.3162   LearningRate 0.1432   Epoch: 0   Global Step: 4950   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 15:57:45,385-Speed 10498.43 samples/sec   Loss 22.2510   LearningRate 0.1435   Epoch: 0   Global Step: 4960   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 15:57:53,191-Speed 10497.02 samples/sec   Loss 22.2205   LearningRate 0.1438   Epoch: 0   Global Step: 4970   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 15:58:00,978-Speed 10521.37 samples/sec   Loss 22.1409   LearningRate 0.1441   Epoch: 0   Global Step: 4980   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 15:58:08,766-Speed 10520.08 samples/sec   Loss 22.0570   LearningRate 0.1444   Epoch: 0   Global Step: 4990   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 15:58:16,544-Speed 10533.53 samples/sec   Loss 21.9856   LearningRate 0.1447   Epoch: 0   Global Step: 5000   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 15:58:24,349-Speed 10497.68 samples/sec   Loss 21.9345   LearningRate 0.1450   Epoch: 0   Global Step: 5010   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 15:58:32,182-Speed 10459.19 samples/sec   Loss 21.8381   LearningRate 0.1453   Epoch: 0   Global Step: 5020   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 15:58:39,974-Speed 10513.81 samples/sec   Loss 21.7888   LearningRate 0.1455   Epoch: 0   Global Step: 5030   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 15:58:47,759-Speed 10524.80 samples/sec   Loss 21.7161   LearningRate 0.1458   Epoch: 0   Global Step: 5040   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 15:58:55,558-Speed 10505.19 samples/sec   Loss 21.6213   LearningRate 0.1461   Epoch: 0   Global Step: 5050   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 15:59:03,355-Speed 10508.24 samples/sec   Loss 21.5554   LearningRate 0.1464   Epoch: 0   Global Step: 5060   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 15:59:11,164-Speed 10493.40 samples/sec   Loss 21.4834   LearningRate 0.1467   Epoch: 0   Global Step: 5070   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 15:59:18,962-Speed 10512.35 samples/sec   Loss 21.4653   LearningRate 0.1470   Epoch: 0   Global Step: 5080   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 15:59:26,759-Speed 10508.89 samples/sec   Loss 21.4166   LearningRate 0.1473   Epoch: 0   Global Step: 5090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 15:59:34,575-Speed 10482.03 samples/sec   Loss 21.3460   LearningRate 0.1476   Epoch: 0   Global Step: 5100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 15:59:42,403-Speed 10468.79 samples/sec   Loss 21.2477   LearningRate 0.1479   Epoch: 0   Global Step: 5110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 15:59:50,241-Speed 10452.56 samples/sec   Loss 21.2263   LearningRate 0.1481   Epoch: 0   Global Step: 5120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 15:59:58,067-Speed 10470.94 samples/sec   Loss 21.1825   LearningRate 0.1484   Epoch: 0   Global Step: 5130   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:00:05,897-Speed 10465.49 samples/sec   Loss 21.1210   LearningRate 0.1487   Epoch: 0   Global Step: 5140   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:00:13,699-Speed 10500.97 samples/sec   Loss 21.0489   LearningRate 0.1490   Epoch: 0   Global Step: 5150   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:00:21,522-Speed 10474.71 samples/sec   Loss 21.0028   LearningRate 0.1493   Epoch: 0   Global Step: 5160   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:00:29,332-Speed 10490.34 samples/sec   Loss 20.8953   LearningRate 0.1496   Epoch: 0   Global Step: 5170   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:00:37,143-Speed 10489.88 samples/sec   Loss 20.8111   LearningRate 0.1499   Epoch: 0   Global Step: 5180   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:00:59,555-Speed 3655.55 samples/sec   Loss 20.7306   LearningRate 0.1502   Epoch: 1   Global Step: 5190   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:01:07,353-Speed 10508.26 samples/sec   Loss 20.7049   LearningRate 0.1505   Epoch: 1   Global Step: 5200   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:01:15,145-Speed 10515.83 samples/sec   Loss 20.6998   LearningRate 0.1508   Epoch: 1   Global Step: 5210   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:01:22,942-Speed 10507.34 samples/sec   Loss 20.6126   LearningRate 0.1510   Epoch: 1   Global Step: 5220   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:01:30,742-Speed 10504.70 samples/sec   Loss 20.4690   LearningRate 0.1513   Epoch: 1   Global Step: 5230   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:01:38,503-Speed 10557.24 samples/sec   Loss 20.4268   LearningRate 0.1516   Epoch: 1   Global Step: 5240   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:01:46,308-Speed 10497.01 samples/sec   Loss 20.3955   LearningRate 0.1519   Epoch: 1   Global Step: 5250   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:01:54,125-Speed 10480.51 samples/sec   Loss 20.3647   LearningRate 0.1522   Epoch: 1   Global Step: 5260   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:02:01,892-Speed 10548.40 samples/sec   Loss 20.2729   LearningRate 0.1525   Epoch: 1   Global Step: 5270   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:02:09,683-Speed 10516.95 samples/sec   Loss 20.2423   LearningRate 0.1528   Epoch: 1   Global Step: 5280   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 16:02:17,484-Speed 10501.94 samples/sec   Loss 20.2150   LearningRate 0.1531   Epoch: 1   Global Step: 5290   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 16:02:25,275-Speed 10516.63 samples/sec   Loss 20.0415   LearningRate 0.1534   Epoch: 1   Global Step: 5300   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 16:02:33,065-Speed 10516.98 samples/sec   Loss 20.0058   LearningRate 0.1536   Epoch: 1   Global Step: 5310   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 16:02:40,872-Speed 10494.50 samples/sec   Loss 19.9916   LearningRate 0.1539   Epoch: 1   Global Step: 5320   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 16:02:48,642-Speed 10545.12 samples/sec   Loss 19.8870   LearningRate 0.1542   Epoch: 1   Global Step: 5330   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 16:02:56,423-Speed 10529.68 samples/sec   Loss 19.7920   LearningRate 0.1545   Epoch: 1   Global Step: 5340   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 16:03:04,209-Speed 10524.84 samples/sec   Loss 19.8201   LearningRate 0.1548   Epoch: 1   Global Step: 5350   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 16:03:12,010-Speed 10504.11 samples/sec   Loss 19.7805   LearningRate 0.1551   Epoch: 1   Global Step: 5360   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 16:03:19,860-Speed 10437.17 samples/sec   Loss 19.6860   LearningRate 0.1554   Epoch: 1   Global Step: 5370   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 16:03:27,676-Speed 10484.18 samples/sec   Loss 19.6766   LearningRate 0.1557   Epoch: 1   Global Step: 5380   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:03:35,460-Speed 10525.73 samples/sec   Loss 19.6099   LearningRate 0.1560   Epoch: 1   Global Step: 5390   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:03:43,245-Speed 10525.17 samples/sec   Loss 19.4645   LearningRate 0.1562   Epoch: 1   Global Step: 5400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:03:51,051-Speed 10495.76 samples/sec   Loss 19.5297   LearningRate 0.1565   Epoch: 1   Global Step: 5410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:03:58,826-Speed 10537.98 samples/sec   Loss 19.4194   LearningRate 0.1568   Epoch: 1   Global Step: 5420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:04:06,618-Speed 10518.42 samples/sec   Loss 19.4323   LearningRate 0.1571   Epoch: 1   Global Step: 5430   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:04:14,440-Speed 10474.93 samples/sec   Loss 19.3078   LearningRate 0.1574   Epoch: 1   Global Step: 5440   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:04:22,262-Speed 10474.79 samples/sec   Loss 19.1968   LearningRate 0.1577   Epoch: 1   Global Step: 5450   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:04:30,079-Speed 10482.28 samples/sec   Loss 19.2155   LearningRate 0.1580   Epoch: 1   Global Step: 5460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:04:37,890-Speed 10494.60 samples/sec   Loss 19.1448   LearningRate 0.1583   Epoch: 1   Global Step: 5470   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:04:45,686-Speed 10510.51 samples/sec   Loss 19.0557   LearningRate 0.1586   Epoch: 1   Global Step: 5480   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:04:53,501-Speed 10484.00 samples/sec   Loss 18.9721   LearningRate 0.1589   Epoch: 1   Global Step: 5490   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:05:01,311-Speed 10491.43 samples/sec   Loss 18.9531   LearningRate 0.1591   Epoch: 1   Global Step: 5500   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:05:09,107-Speed 10508.63 samples/sec   Loss 18.9756   LearningRate 0.1594   Epoch: 1   Global Step: 5510   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:05:16,908-Speed 10504.09 samples/sec   Loss 18.8561   LearningRate 0.1597   Epoch: 1   Global Step: 5520   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:05:24,700-Speed 10514.57 samples/sec   Loss 18.7858   LearningRate 0.1600   Epoch: 1   Global Step: 5530   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:05:32,478-Speed 10534.49 samples/sec   Loss 18.7619   LearningRate 0.1603   Epoch: 1   Global Step: 5540   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:05:40,263-Speed 10533.63 samples/sec   Loss 18.5925   LearningRate 0.1606   Epoch: 1   Global Step: 5550   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:05:48,117-Speed 10432.46 samples/sec   Loss 18.5888   LearningRate 0.1609   Epoch: 1   Global Step: 5560   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:05:55,916-Speed 10505.93 samples/sec   Loss 18.6380   LearningRate 0.1612   Epoch: 1   Global Step: 5570   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:06:03,704-Speed 10520.55 samples/sec   Loss 18.5644   LearningRate 0.1615   Epoch: 1   Global Step: 5580   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:06:11,534-Speed 10464.80 samples/sec   Loss 18.4567   LearningRate 0.1617   Epoch: 1   Global Step: 5590   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:06:19,349-Speed 10483.26 samples/sec   Loss 18.4461   LearningRate 0.1620   Epoch: 1   Global Step: 5600   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:06:27,122-Speed 10542.00 samples/sec   Loss 18.4130   LearningRate 0.1623   Epoch: 1   Global Step: 5610   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:06:34,904-Speed 10529.47 samples/sec   Loss 18.3210   LearningRate 0.1626   Epoch: 1   Global Step: 5620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:06:42,696-Speed 10516.07 samples/sec   Loss 18.2544   LearningRate 0.1629   Epoch: 1   Global Step: 5630   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:06:50,478-Speed 10529.41 samples/sec   Loss 18.2818   LearningRate 0.1632   Epoch: 1   Global Step: 5640   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:06:58,246-Speed 10548.60 samples/sec   Loss 18.1940   LearningRate 0.1635   Epoch: 1   Global Step: 5650   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:07:06,011-Speed 10552.23 samples/sec   Loss 18.2608   LearningRate 0.1638   Epoch: 1   Global Step: 5660   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:07:13,780-Speed 10545.45 samples/sec   Loss 18.0607   LearningRate 0.1641   Epoch: 1   Global Step: 5670   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:07:21,561-Speed 10531.27 samples/sec   Loss 18.0505   LearningRate 0.1644   Epoch: 1   Global Step: 5680   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:07:29,333-Speed 10541.81 samples/sec   Loss 18.0149   LearningRate 0.1646   Epoch: 1   Global Step: 5690   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:07:37,175-Speed 10447.57 samples/sec   Loss 17.9031   LearningRate 0.1649   Epoch: 1   Global Step: 5700   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:07:44,963-Speed 10521.47 samples/sec   Loss 17.8329   LearningRate 0.1652   Epoch: 1   Global Step: 5710   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:07:52,773-Speed 10490.58 samples/sec   Loss 17.8037   LearningRate 0.1655   Epoch: 1   Global Step: 5720   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:08:00,579-Speed 10496.42 samples/sec   Loss 17.7336   LearningRate 0.1658   Epoch: 1   Global Step: 5730   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:08:08,368-Speed 10520.48 samples/sec   Loss 17.6466   LearningRate 0.1661   Epoch: 1   Global Step: 5740   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:08:16,138-Speed 10544.79 samples/sec   Loss 17.6486   LearningRate 0.1664   Epoch: 1   Global Step: 5750   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:08:23,930-Speed 10515.53 samples/sec   Loss 17.6712   LearningRate 0.1667   Epoch: 1   Global Step: 5760   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:08:31,751-Speed 10475.37 samples/sec   Loss 17.6085   LearningRate 0.1670   Epoch: 1   Global Step: 5770   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:08:39,553-Speed 10501.98 samples/sec   Loss 17.5113   LearningRate 0.1672   Epoch: 1   Global Step: 5780   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:08:47,334-Speed 10530.80 samples/sec   Loss 17.5225   LearningRate 0.1675   Epoch: 1   Global Step: 5790   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:08:55,111-Speed 10535.85 samples/sec   Loss 17.4781   LearningRate 0.1678   Epoch: 1   Global Step: 5800   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:09:02,908-Speed 10508.93 samples/sec   Loss 17.3773   LearningRate 0.1681   Epoch: 1   Global Step: 5810   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:09:10,721-Speed 10486.79 samples/sec   Loss 17.3223   LearningRate 0.1684   Epoch: 1   Global Step: 5820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:09:18,486-Speed 10552.01 samples/sec   Loss 17.2909   LearningRate 0.1687   Epoch: 1   Global Step: 5830   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:09:26,252-Speed 10556.49 samples/sec   Loss 17.2713   LearningRate 0.1690   Epoch: 1   Global Step: 5840   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:09:34,022-Speed 10544.77 samples/sec   Loss 17.1329   LearningRate 0.1693   Epoch: 1   Global Step: 5850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:09:41,815-Speed 10513.33 samples/sec   Loss 17.1332   LearningRate 0.1696   Epoch: 1   Global Step: 5860   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:09:49,627-Speed 10488.45 samples/sec   Loss 17.1117   LearningRate 0.1698   Epoch: 1   Global Step: 5870   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:09:57,417-Speed 10517.97 samples/sec   Loss 17.0782   LearningRate 0.1701   Epoch: 1   Global Step: 5880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:10:05,197-Speed 10530.75 samples/sec   Loss 17.1290   LearningRate 0.1704   Epoch: 1   Global Step: 5890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:10:12,987-Speed 10516.18 samples/sec   Loss 16.9668   LearningRate 0.1707   Epoch: 1   Global Step: 5900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:10:20,767-Speed 10531.87 samples/sec   Loss 16.8363   LearningRate 0.1710   Epoch: 1   Global Step: 5910   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:10:28,561-Speed 10512.48 samples/sec   Loss 16.8930   LearningRate 0.1713   Epoch: 1   Global Step: 5920   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:10:36,372-Speed 10488.68 samples/sec   Loss 16.8121   LearningRate 0.1716   Epoch: 1   Global Step: 5930   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:10:44,146-Speed 10540.98 samples/sec   Loss 16.8657   LearningRate 0.1719   Epoch: 1   Global Step: 5940   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:10:51,911-Speed 10551.64 samples/sec   Loss 16.7803   LearningRate 0.1722   Epoch: 1   Global Step: 5950   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:10:59,759-Speed 10438.73 samples/sec   Loss 16.7216   LearningRate 0.1725   Epoch: 1   Global Step: 5960   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:11:07,616-Speed 10428.94 samples/sec   Loss 16.7231   LearningRate 0.1727   Epoch: 1   Global Step: 5970   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:11:15,461-Speed 10445.13 samples/sec   Loss 16.6267   LearningRate 0.1730   Epoch: 1   Global Step: 5980   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:11:23,292-Speed 10462.16 samples/sec   Loss 16.6828   LearningRate 0.1733   Epoch: 1   Global Step: 5990   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:11:31,133-Speed 10450.39 samples/sec   Loss 16.5863   LearningRate 0.1736   Epoch: 1   Global Step: 6000   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:11:39,041-Speed 10361.53 samples/sec   Loss 16.5387   LearningRate 0.1739   Epoch: 1   Global Step: 6010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:11:46,865-Speed 10473.35 samples/sec   Loss 16.4689   LearningRate 0.1742   Epoch: 1   Global Step: 6020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:11:54,656-Speed 10516.21 samples/sec   Loss 16.4359   LearningRate 0.1745   Epoch: 1   Global Step: 6030   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:12:02,448-Speed 10518.14 samples/sec   Loss 16.3769   LearningRate 0.1748   Epoch: 1   Global Step: 6040   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:12:10,253-Speed 10497.75 samples/sec   Loss 16.3304   LearningRate 0.1751   Epoch: 1   Global Step: 6050   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:12:18,068-Speed 10484.77 samples/sec   Loss 16.2754   LearningRate 0.1753   Epoch: 1   Global Step: 6060   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:12:25,908-Speed 10451.15 samples/sec   Loss 16.2832   LearningRate 0.1756   Epoch: 1   Global Step: 6070   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:12:33,716-Speed 10492.95 samples/sec   Loss 16.1898   LearningRate 0.1759   Epoch: 1   Global Step: 6080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:12:41,528-Speed 10488.82 samples/sec   Loss 16.1883   LearningRate 0.1762   Epoch: 1   Global Step: 6090   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:12:49,342-Speed 10485.90 samples/sec   Loss 16.1261   LearningRate 0.1765   Epoch: 1   Global Step: 6100   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:12:57,159-Speed 10480.99 samples/sec   Loss 16.0928   LearningRate 0.1768   Epoch: 1   Global Step: 6110   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:13:05,047-Speed 10387.18 samples/sec   Loss 16.1268   LearningRate 0.1771   Epoch: 1   Global Step: 6120   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:13:12,836-Speed 10518.54 samples/sec   Loss 16.0212   LearningRate 0.1774   Epoch: 1   Global Step: 6130   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-01-15 16:13:20,665-Speed 10466.27 samples/sec   Loss 15.9875   LearningRate 0.1777   Epoch: 1   Global Step: 6140   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-01-15 16:13:28,480-Speed 10484.54 samples/sec   Loss 16.0045   LearningRate 0.1780   Epoch: 1   Global Step: 6150   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-01-15 16:13:36,274-Speed 10513.59 samples/sec   Loss 15.9785   LearningRate 0.1782   Epoch: 1   Global Step: 6160   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-01-15 16:13:44,065-Speed 10517.71 samples/sec   Loss 15.8755   LearningRate 0.1785   Epoch: 1   Global Step: 6170   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-01-15 16:13:51,880-Speed 10484.05 samples/sec   Loss 15.9028   LearningRate 0.1788   Epoch: 1   Global Step: 6180   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-01-15 16:13:59,684-Speed 10499.97 samples/sec   Loss 15.8497   LearningRate 0.1791   Epoch: 1   Global Step: 6190   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-01-15 16:14:07,541-Speed 10429.87 samples/sec   Loss 15.7970   LearningRate 0.1794   Epoch: 1   Global Step: 6200   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-01-15 16:14:15,355-Speed 10486.22 samples/sec   Loss 15.6995   LearningRate 0.1797   Epoch: 1   Global Step: 6210   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-01-15 16:14:23,159-Speed 10498.49 samples/sec   Loss 15.6312   LearningRate 0.1800   Epoch: 1   Global Step: 6220   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-01-15 16:14:30,961-Speed 10502.75 samples/sec   Loss 15.6446   LearningRate 0.1803   Epoch: 1   Global Step: 6230   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 16:14:38,785-Speed 10472.59 samples/sec   Loss 15.5847   LearningRate 0.1806   Epoch: 1   Global Step: 6240   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 16:14:46,609-Speed 10472.89 samples/sec   Loss 15.6209   LearningRate 0.1808   Epoch: 1   Global Step: 6250   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 16:14:54,436-Speed 10468.87 samples/sec   Loss 15.5582   LearningRate 0.1811   Epoch: 1   Global Step: 6260   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 16:15:02,265-Speed 10464.24 samples/sec   Loss 15.5140   LearningRate 0.1814   Epoch: 1   Global Step: 6270   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 16:15:10,072-Speed 10495.81 samples/sec   Loss 15.4925   LearningRate 0.1817   Epoch: 1   Global Step: 6280   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 16:15:17,870-Speed 10507.32 samples/sec   Loss 15.4063   LearningRate 0.1820   Epoch: 1   Global Step: 6290   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 16:15:25,687-Speed 10481.31 samples/sec   Loss 15.4281   LearningRate 0.1823   Epoch: 1   Global Step: 6300   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 16:15:33,522-Speed 10456.34 samples/sec   Loss 15.3631   LearningRate 0.1826   Epoch: 1   Global Step: 6310   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 16:15:41,357-Speed 10458.69 samples/sec   Loss 15.3153   LearningRate 0.1829   Epoch: 1   Global Step: 6320   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-01-15 16:15:49,173-Speed 10481.93 samples/sec   Loss 15.3691   LearningRate 0.1832   Epoch: 1   Global Step: 6330   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:15:56,990-Speed 10480.77 samples/sec   Loss 15.3549   LearningRate 0.1834   Epoch: 1   Global Step: 6340   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:16:04,821-Speed 10462.85 samples/sec   Loss 15.2624   LearningRate 0.1837   Epoch: 1   Global Step: 6350   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:16:12,667-Speed 10443.27 samples/sec   Loss 15.2029   LearningRate 0.1840   Epoch: 1   Global Step: 6360   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:16:20,476-Speed 10491.75 samples/sec   Loss 15.1562   LearningRate 0.1843   Epoch: 1   Global Step: 6370   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:16:28,305-Speed 10467.81 samples/sec   Loss 15.1235   LearningRate 0.1846   Epoch: 1   Global Step: 6380   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:16:36,131-Speed 10468.67 samples/sec   Loss 15.1337   LearningRate 0.1849   Epoch: 1   Global Step: 6390   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:16:43,948-Speed 10481.75 samples/sec   Loss 15.0772   LearningRate 0.1852   Epoch: 1   Global Step: 6400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:16:51,791-Speed 10447.18 samples/sec   Loss 15.0016   LearningRate 0.1855   Epoch: 1   Global Step: 6410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:16:59,604-Speed 10486.47 samples/sec   Loss 15.0292   LearningRate 0.1858   Epoch: 1   Global Step: 6420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:17:07,411-Speed 10493.84 samples/sec   Loss 14.9701   LearningRate 0.1861   Epoch: 1   Global Step: 6430   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:17:15,265-Speed 10432.88 samples/sec   Loss 14.9295   LearningRate 0.1863   Epoch: 1   Global Step: 6440   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:17:23,083-Speed 10480.18 samples/sec   Loss 14.9026   LearningRate 0.1866   Epoch: 1   Global Step: 6450   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:17:30,925-Speed 10447.26 samples/sec   Loss 14.8515   LearningRate 0.1869   Epoch: 1   Global Step: 6460   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:17:38,750-Speed 10470.57 samples/sec   Loss 14.9038   LearningRate 0.1872   Epoch: 1   Global Step: 6470   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:17:46,548-Speed 10506.44 samples/sec   Loss 14.8624   LearningRate 0.1875   Epoch: 1   Global Step: 6480   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:17:54,387-Speed 10452.18 samples/sec   Loss 14.8428   LearningRate 0.1878   Epoch: 1   Global Step: 6490   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:18:02,203-Speed 10481.69 samples/sec   Loss 14.8304   LearningRate 0.1881   Epoch: 1   Global Step: 6500   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:18:10,035-Speed 10462.22 samples/sec   Loss 14.7872   LearningRate 0.1884   Epoch: 1   Global Step: 6510   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:18:17,842-Speed 10494.02 samples/sec   Loss 14.7285   LearningRate 0.1887   Epoch: 1   Global Step: 6520   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:18:25,654-Speed 10493.92 samples/sec   Loss 14.6669   LearningRate 0.1889   Epoch: 1   Global Step: 6530   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:18:33,477-Speed 10472.50 samples/sec   Loss 14.7177   LearningRate 0.1892   Epoch: 1   Global Step: 6540   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:18:41,331-Speed 10433.10 samples/sec   Loss 14.6245   LearningRate 0.1895   Epoch: 1   Global Step: 6550   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:18:49,192-Speed 10422.36 samples/sec   Loss 14.5728   LearningRate 0.1898   Epoch: 1   Global Step: 6560   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:18:57,004-Speed 10489.94 samples/sec   Loss 14.5656   LearningRate 0.1901   Epoch: 1   Global Step: 6570   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:19:04,876-Speed 10412.38 samples/sec   Loss 14.5242   LearningRate 0.1904   Epoch: 1   Global Step: 6580   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:19:12,699-Speed 10474.54 samples/sec   Loss 14.5771   LearningRate 0.1907   Epoch: 1   Global Step: 6590   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:19:20,504-Speed 10498.25 samples/sec   Loss 14.5146   LearningRate 0.1910   Epoch: 1   Global Step: 6600   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:19:28,352-Speed 10441.00 samples/sec   Loss 14.4515   LearningRate 0.1913   Epoch: 1   Global Step: 6610   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:19:36,169-Speed 10481.21 samples/sec   Loss 14.4438   LearningRate 0.1916   Epoch: 1   Global Step: 6620   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:19:43,992-Speed 10473.85 samples/sec   Loss 14.4424   LearningRate 0.1918   Epoch: 1   Global Step: 6630   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:19:51,833-Speed 10449.74 samples/sec   Loss 14.3928   LearningRate 0.1921   Epoch: 1   Global Step: 6640   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:19:59,673-Speed 10450.95 samples/sec   Loss 14.2987   LearningRate 0.1924   Epoch: 1   Global Step: 6650   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:20:07,505-Speed 10461.65 samples/sec   Loss 14.3324   LearningRate 0.1927   Epoch: 1   Global Step: 6660   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:20:15,331-Speed 10469.94 samples/sec   Loss 14.3624   LearningRate 0.1930   Epoch: 1   Global Step: 6670   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:20:23,149-Speed 10481.15 samples/sec   Loss 14.2144   LearningRate 0.1933   Epoch: 1   Global Step: 6680   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:20:30,982-Speed 10459.78 samples/sec   Loss 14.2143   LearningRate 0.1936   Epoch: 1   Global Step: 6690   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:20:38,794-Speed 10488.48 samples/sec   Loss 14.1985   LearningRate 0.1939   Epoch: 1   Global Step: 6700   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:20:46,631-Speed 10454.77 samples/sec   Loss 14.1937   LearningRate 0.1942   Epoch: 1   Global Step: 6710   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:20:54,453-Speed 10476.04 samples/sec   Loss 14.1576   LearningRate 0.1944   Epoch: 1   Global Step: 6720   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:21:02,272-Speed 10479.52 samples/sec   Loss 14.1459   LearningRate 0.1947   Epoch: 1   Global Step: 6730   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:21:10,091-Speed 10478.27 samples/sec   Loss 14.1308   LearningRate 0.1950   Epoch: 1   Global Step: 6740   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:21:17,928-Speed 10455.43 samples/sec   Loss 14.1182   LearningRate 0.1953   Epoch: 1   Global Step: 6750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:21:25,766-Speed 10455.94 samples/sec   Loss 14.1072   LearningRate 0.1956   Epoch: 1   Global Step: 6760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:21:33,590-Speed 10472.51 samples/sec   Loss 14.0425   LearningRate 0.1959   Epoch: 1   Global Step: 6770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:21:41,418-Speed 10468.51 samples/sec   Loss 14.0038   LearningRate 0.1962   Epoch: 1   Global Step: 6780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:21:49,269-Speed 10436.20 samples/sec   Loss 13.9770   LearningRate 0.1965   Epoch: 1   Global Step: 6790   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:21:57,086-Speed 10481.54 samples/sec   Loss 13.9645   LearningRate 0.1968   Epoch: 1   Global Step: 6800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:22:04,910-Speed 10472.21 samples/sec   Loss 13.9568   LearningRate 0.1970   Epoch: 1   Global Step: 6810   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:22:12,713-Speed 10500.58 samples/sec   Loss 13.8665   LearningRate 0.1973   Epoch: 1   Global Step: 6820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:22:20,527-Speed 10485.14 samples/sec   Loss 13.8960   LearningRate 0.1976   Epoch: 1   Global Step: 6830   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:22:28,363-Speed 10456.05 samples/sec   Loss 13.7532   LearningRate 0.1979   Epoch: 1   Global Step: 6840   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:22:36,174-Speed 10489.75 samples/sec   Loss 13.8504   LearningRate 0.1982   Epoch: 1   Global Step: 6850   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:22:43,952-Speed 10542.04 samples/sec   Loss 13.8627   LearningRate 0.1985   Epoch: 1   Global Step: 6860   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:22:51,754-Speed 10505.98 samples/sec   Loss 13.8304   LearningRate 0.1988   Epoch: 1   Global Step: 6870   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:22:59,544-Speed 10518.15 samples/sec   Loss 13.8071   LearningRate 0.1991   Epoch: 1   Global Step: 6880   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:23:07,335-Speed 10517.38 samples/sec   Loss 13.7442   LearningRate 0.1994   Epoch: 1   Global Step: 6890   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:23:15,129-Speed 10512.40 samples/sec   Loss 13.7089   LearningRate 0.1997   Epoch: 1   Global Step: 6900   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:23:22,935-Speed 10496.54 samples/sec   Loss 13.7029   LearningRate 0.1999   Epoch: 1   Global Step: 6910   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:23:30,761-Speed 10471.21 samples/sec   Loss 13.7903   LearningRate 0.2002   Epoch: 1   Global Step: 6920   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:23:38,571-Speed 10490.10 samples/sec   Loss 13.6416   LearningRate 0.2005   Epoch: 1   Global Step: 6930   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:23:46,352-Speed 10529.98 samples/sec   Loss 13.6929   LearningRate 0.2008   Epoch: 1   Global Step: 6940   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:23:54,192-Speed 10451.26 samples/sec   Loss 13.6043   LearningRate 0.2011   Epoch: 1   Global Step: 6950   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:24:01,977-Speed 10524.70 samples/sec   Loss 13.6320   LearningRate 0.2014   Epoch: 1   Global Step: 6960   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:24:09,807-Speed 10464.18 samples/sec   Loss 13.5977   LearningRate 0.2017   Epoch: 1   Global Step: 6970   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:24:17,606-Speed 10504.39 samples/sec   Loss 13.6072   LearningRate 0.2020   Epoch: 1   Global Step: 6980   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:24:25,426-Speed 10476.61 samples/sec   Loss 13.4894   LearningRate 0.2023   Epoch: 1   Global Step: 6990   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:24:33,276-Speed 10447.64 samples/sec   Loss 13.4887   LearningRate 0.2025   Epoch: 1   Global Step: 7000   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:24:41,060-Speed 10525.55 samples/sec   Loss 13.5295   LearningRate 0.2028   Epoch: 1   Global Step: 7010   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:24:48,860-Speed 10503.79 samples/sec   Loss 13.4742   LearningRate 0.2031   Epoch: 1   Global Step: 7020   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:24:56,670-Speed 10491.13 samples/sec   Loss 13.4851   LearningRate 0.2034   Epoch: 1   Global Step: 7030   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:25:04,458-Speed 10521.01 samples/sec   Loss 13.4577   LearningRate 0.2037   Epoch: 1   Global Step: 7040   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:25:12,256-Speed 10507.32 samples/sec   Loss 13.3776   LearningRate 0.2040   Epoch: 1   Global Step: 7050   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:25:20,041-Speed 10523.66 samples/sec   Loss 13.4737   LearningRate 0.2043   Epoch: 1   Global Step: 7060   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:25:27,848-Speed 10494.97 samples/sec   Loss 13.3618   LearningRate 0.2046   Epoch: 1   Global Step: 7070   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:25:35,684-Speed 10455.77 samples/sec   Loss 13.3411   LearningRate 0.2049   Epoch: 1   Global Step: 7080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:25:43,515-Speed 10467.14 samples/sec   Loss 13.3810   LearningRate 0.2052   Epoch: 1   Global Step: 7090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:25:51,312-Speed 10509.31 samples/sec   Loss 13.3300   LearningRate 0.2054   Epoch: 1   Global Step: 7100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:25:59,143-Speed 10462.06 samples/sec   Loss 13.3492   LearningRate 0.2057   Epoch: 1   Global Step: 7110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:26:06,949-Speed 10496.95 samples/sec   Loss 13.3506   LearningRate 0.2060   Epoch: 1   Global Step: 7120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:26:14,758-Speed 10493.34 samples/sec   Loss 13.2424   LearningRate 0.2063   Epoch: 1   Global Step: 7130   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:26:22,555-Speed 10507.59 samples/sec   Loss 13.2292   LearningRate 0.2066   Epoch: 1   Global Step: 7140   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:26:30,381-Speed 10469.27 samples/sec   Loss 13.1954   LearningRate 0.2069   Epoch: 1   Global Step: 7150   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:26:38,205-Speed 10472.43 samples/sec   Loss 13.2087   LearningRate 0.2072   Epoch: 1   Global Step: 7160   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:26:46,044-Speed 10455.70 samples/sec   Loss 13.2381   LearningRate 0.2075   Epoch: 1   Global Step: 7170   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:26:53,852-Speed 10494.46 samples/sec   Loss 13.1533   LearningRate 0.2078   Epoch: 1   Global Step: 7180   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:27:01,656-Speed 10499.66 samples/sec   Loss 13.1414   LearningRate 0.2080   Epoch: 1   Global Step: 7190   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:27:09,476-Speed 10477.29 samples/sec   Loss 13.0861   LearningRate 0.2083   Epoch: 1   Global Step: 7200   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:27:17,276-Speed 10505.18 samples/sec   Loss 13.1478   LearningRate 0.2086   Epoch: 1   Global Step: 7210   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:27:25,082-Speed 10496.81 samples/sec   Loss 13.0986   LearningRate 0.2089   Epoch: 1   Global Step: 7220   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:27:32,898-Speed 10482.61 samples/sec   Loss 13.0794   LearningRate 0.2092   Epoch: 1   Global Step: 7230   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:27:40,740-Speed 10448.04 samples/sec   Loss 13.0212   LearningRate 0.2095   Epoch: 1   Global Step: 7240   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:27:48,574-Speed 10458.67 samples/sec   Loss 13.0420   LearningRate 0.2098   Epoch: 1   Global Step: 7250   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:27:56,359-Speed 10525.93 samples/sec   Loss 13.0299   LearningRate 0.2101   Epoch: 1   Global Step: 7260   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:28:04,145-Speed 10522.93 samples/sec   Loss 12.9774   LearningRate 0.2104   Epoch: 1   Global Step: 7270   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:28:11,956-Speed 10489.15 samples/sec   Loss 12.9439   LearningRate 0.2106   Epoch: 1   Global Step: 7280   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:28:19,738-Speed 10529.22 samples/sec   Loss 12.9861   LearningRate 0.2109   Epoch: 1   Global Step: 7290   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:28:27,558-Speed 10477.24 samples/sec   Loss 12.9449   LearningRate 0.2112   Epoch: 1   Global Step: 7300   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:28:35,380-Speed 10475.87 samples/sec   Loss 12.9539   LearningRate 0.2115   Epoch: 1   Global Step: 7310   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:28:43,188-Speed 10494.22 samples/sec   Loss 12.8787   LearningRate 0.2118   Epoch: 1   Global Step: 7320   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:28:50,989-Speed 10503.22 samples/sec   Loss 12.8944   LearningRate 0.2121   Epoch: 1   Global Step: 7330   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:28:58,798-Speed 10491.76 samples/sec   Loss 12.9011   LearningRate 0.2124   Epoch: 1   Global Step: 7340   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:29:06,587-Speed 10520.40 samples/sec   Loss 12.8390   LearningRate 0.2127   Epoch: 1   Global Step: 7350   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:29:14,421-Speed 10459.13 samples/sec   Loss 12.8835   LearningRate 0.2130   Epoch: 1   Global Step: 7360   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:29:22,250-Speed 10465.56 samples/sec   Loss 12.8311   LearningRate 0.2133   Epoch: 1   Global Step: 7370   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:29:30,037-Speed 10522.06 samples/sec   Loss 12.8248   LearningRate 0.2135   Epoch: 1   Global Step: 7380   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:29:37,835-Speed 10510.14 samples/sec   Loss 12.7816   LearningRate 0.2138   Epoch: 1   Global Step: 7390   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:29:45,651-Speed 10482.42 samples/sec   Loss 12.8135   LearningRate 0.2141   Epoch: 1   Global Step: 7400   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:29:53,457-Speed 10495.75 samples/sec   Loss 12.8424   LearningRate 0.2144   Epoch: 1   Global Step: 7410   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:30:01,257-Speed 10505.48 samples/sec   Loss 12.8746   LearningRate 0.2147   Epoch: 1   Global Step: 7420   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:30:09,098-Speed 10448.38 samples/sec   Loss 12.7487   LearningRate 0.2150   Epoch: 1   Global Step: 7430   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:30:16,915-Speed 10482.03 samples/sec   Loss 12.7455   LearningRate 0.2153   Epoch: 1   Global Step: 7440   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:30:24,723-Speed 10493.19 samples/sec   Loss 12.7063   LearningRate 0.2156   Epoch: 1   Global Step: 7450   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:30:32,564-Speed 10450.00 samples/sec   Loss 12.7045   LearningRate 0.2159   Epoch: 1   Global Step: 7460   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:30:40,359-Speed 10510.97 samples/sec   Loss 12.6972   LearningRate 0.2161   Epoch: 1   Global Step: 7470   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:30:48,138-Speed 10531.81 samples/sec   Loss 12.6699   LearningRate 0.2164   Epoch: 1   Global Step: 7480   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:30:55,936-Speed 10507.83 samples/sec   Loss 12.6814   LearningRate 0.2167   Epoch: 1   Global Step: 7490   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:31:03,780-Speed 10444.58 samples/sec   Loss 12.6577   LearningRate 0.2170   Epoch: 1   Global Step: 7500   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:31:11,572-Speed 10516.25 samples/sec   Loss 12.6433   LearningRate 0.2173   Epoch: 1   Global Step: 7510   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:31:19,382-Speed 10491.95 samples/sec   Loss 12.6610   LearningRate 0.2176   Epoch: 1   Global Step: 7520   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:31:27,175-Speed 10514.64 samples/sec   Loss 12.6079   LearningRate 0.2179   Epoch: 1   Global Step: 7530   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:31:34,988-Speed 10491.91 samples/sec   Loss 12.6361   LearningRate 0.2182   Epoch: 1   Global Step: 7540   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:31:42,799-Speed 10490.75 samples/sec   Loss 12.6176   LearningRate 0.2185   Epoch: 1   Global Step: 7550   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:31:50,607-Speed 10494.71 samples/sec   Loss 12.6193   LearningRate 0.2187   Epoch: 1   Global Step: 7560   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:31:58,425-Speed 10481.12 samples/sec   Loss 12.5687   LearningRate 0.2190   Epoch: 1   Global Step: 7570   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:32:06,231-Speed 10497.53 samples/sec   Loss 12.5497   LearningRate 0.2193   Epoch: 1   Global Step: 7580   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:32:14,046-Speed 10484.23 samples/sec   Loss 12.4628   LearningRate 0.2196   Epoch: 1   Global Step: 7590   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:32:21,862-Speed 10483.34 samples/sec   Loss 12.5350   LearningRate 0.2199   Epoch: 1   Global Step: 7600   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:32:29,693-Speed 10463.87 samples/sec   Loss 12.5011   LearningRate 0.2202   Epoch: 1   Global Step: 7610   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:32:37,504-Speed 10491.79 samples/sec   Loss 12.5098   LearningRate 0.2205   Epoch: 1   Global Step: 7620   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:32:45,286-Speed 10529.92 samples/sec   Loss 12.4639   LearningRate 0.2208   Epoch: 1   Global Step: 7630   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:32:53,070-Speed 10527.50 samples/sec   Loss 12.4655   LearningRate 0.2211   Epoch: 1   Global Step: 7640   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:33:00,886-Speed 10482.36 samples/sec   Loss 12.4872   LearningRate 0.2214   Epoch: 1   Global Step: 7650   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:33:08,682-Speed 10510.68 samples/sec   Loss 12.4759   LearningRate 0.2216   Epoch: 1   Global Step: 7660   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:33:16,510-Speed 10466.68 samples/sec   Loss 12.4609   LearningRate 0.2219   Epoch: 1   Global Step: 7670   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:33:24,328-Speed 10480.85 samples/sec   Loss 12.4898   LearningRate 0.2222   Epoch: 1   Global Step: 7680   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:33:32,123-Speed 10510.77 samples/sec   Loss 12.4782   LearningRate 0.2225   Epoch: 1   Global Step: 7690   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:33:39,944-Speed 10475.59 samples/sec   Loss 12.4631   LearningRate 0.2228   Epoch: 1   Global Step: 7700   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:33:47,745-Speed 10503.35 samples/sec   Loss 12.4007   LearningRate 0.2231   Epoch: 1   Global Step: 7710   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:33:55,554-Speed 10491.98 samples/sec   Loss 12.3831   LearningRate 0.2234   Epoch: 1   Global Step: 7720   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:34:03,375-Speed 10476.63 samples/sec   Loss 12.3720   LearningRate 0.2237   Epoch: 1   Global Step: 7730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:34:11,208-Speed 10459.47 samples/sec   Loss 12.3739   LearningRate 0.2240   Epoch: 1   Global Step: 7740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:34:18,993-Speed 10524.02 samples/sec   Loss 12.3496   LearningRate 0.2242   Epoch: 1   Global Step: 7750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:34:26,800-Speed 10494.49 samples/sec   Loss 12.3399   LearningRate 0.2245   Epoch: 1   Global Step: 7760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:34:34,607-Speed 10495.25 samples/sec   Loss 12.2857   LearningRate 0.2248   Epoch: 1   Global Step: 7770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:34:42,406-Speed 10505.19 samples/sec   Loss 12.2689   LearningRate 0.2251   Epoch: 1   Global Step: 7780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:34:50,234-Speed 10467.14 samples/sec   Loss 12.2897   LearningRate 0.2254   Epoch: 1   Global Step: 7790   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:34:58,060-Speed 10469.11 samples/sec   Loss 12.2552   LearningRate 0.2257   Epoch: 1   Global Step: 7800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:35:05,867-Speed 10496.03 samples/sec   Loss 12.2977   LearningRate 0.2260   Epoch: 1   Global Step: 7810   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:35:13,650-Speed 10526.36 samples/sec   Loss 12.3160   LearningRate 0.2263   Epoch: 1   Global Step: 7820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:35:21,485-Speed 10457.64 samples/sec   Loss 12.2141   LearningRate 0.2266   Epoch: 1   Global Step: 7830   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:35:29,282-Speed 10508.36 samples/sec   Loss 12.2689   LearningRate 0.2269   Epoch: 1   Global Step: 7840   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:35:37,123-Speed 10450.48 samples/sec   Loss 12.3546   LearningRate 0.2271   Epoch: 1   Global Step: 7850   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:35:44,934-Speed 10487.76 samples/sec   Loss 12.7375   LearningRate 0.2274   Epoch: 1   Global Step: 7860   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:35:52,749-Speed 10484.02 samples/sec   Loss 13.3437   LearningRate 0.2277   Epoch: 1   Global Step: 7870   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:36:00,530-Speed 10530.98 samples/sec   Loss 13.2974   LearningRate 0.2280   Epoch: 1   Global Step: 7880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:36:08,383-Speed 10434.12 samples/sec   Loss 12.8482   LearningRate 0.2283   Epoch: 1   Global Step: 7890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:36:16,149-Speed 10549.64 samples/sec   Loss 12.5436   LearningRate 0.2286   Epoch: 1   Global Step: 7900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:36:23,932-Speed 10526.93 samples/sec   Loss 12.3282   LearningRate 0.2289   Epoch: 1   Global Step: 7910   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:36:31,738-Speed 10497.35 samples/sec   Loss 12.2899   LearningRate 0.2292   Epoch: 1   Global Step: 7920   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:36:39,523-Speed 10525.12 samples/sec   Loss 12.3398   LearningRate 0.2295   Epoch: 1   Global Step: 7930   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:36:47,333-Speed 10491.44 samples/sec   Loss 12.2825   LearningRate 0.2297   Epoch: 1   Global Step: 7940   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:36:55,150-Speed 10481.14 samples/sec   Loss 12.2351   LearningRate 0.2300   Epoch: 1   Global Step: 7950   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:37:02,957-Speed 10495.18 samples/sec   Loss 12.1520   LearningRate 0.2303   Epoch: 1   Global Step: 7960   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:37:10,729-Speed 10542.38 samples/sec   Loss 12.2647   LearningRate 0.2306   Epoch: 1   Global Step: 7970   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:37:18,529-Speed 10505.15 samples/sec   Loss 12.2722   LearningRate 0.2309   Epoch: 1   Global Step: 7980   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:37:26,323-Speed 10512.54 samples/sec   Loss 12.2129   LearningRate 0.2312   Epoch: 1   Global Step: 7990   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:37:34,098-Speed 10539.16 samples/sec   Loss 12.1665   LearningRate 0.2315   Epoch: 1   Global Step: 8000   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:37:41,894-Speed 10509.21 samples/sec   Loss 12.1425   LearningRate 0.2318   Epoch: 1   Global Step: 8010   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:37:49,692-Speed 10506.59 samples/sec   Loss 12.2000   LearningRate 0.2321   Epoch: 1   Global Step: 8020   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:37:57,518-Speed 10470.39 samples/sec   Loss 12.0602   LearningRate 0.2323   Epoch: 1   Global Step: 8030   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:38:05,368-Speed 10438.08 samples/sec   Loss 12.1357   LearningRate 0.2326   Epoch: 1   Global Step: 8040   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:38:13,177-Speed 10492.41 samples/sec   Loss 12.1344   LearningRate 0.2329   Epoch: 1   Global Step: 8050   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:38:20,956-Speed 10533.79 samples/sec   Loss 12.0873   LearningRate 0.2332   Epoch: 1   Global Step: 8060   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:38:28,773-Speed 10481.70 samples/sec   Loss 12.0849   LearningRate 0.2335   Epoch: 1   Global Step: 8070   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:38:36,601-Speed 10473.00 samples/sec   Loss 12.0774   LearningRate 0.2338   Epoch: 1   Global Step: 8080   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:38:44,401-Speed 10503.65 samples/sec   Loss 12.0163   LearningRate 0.2341   Epoch: 1   Global Step: 8090   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:38:52,205-Speed 10499.59 samples/sec   Loss 12.0346   LearningRate 0.2344   Epoch: 1   Global Step: 8100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:39:00,004-Speed 10506.85 samples/sec   Loss 11.9754   LearningRate 0.2347   Epoch: 1   Global Step: 8110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:39:07,813-Speed 10493.13 samples/sec   Loss 12.0105   LearningRate 0.2350   Epoch: 1   Global Step: 8120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:39:15,635-Speed 10473.63 samples/sec   Loss 12.0141   LearningRate 0.2352   Epoch: 1   Global Step: 8130   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:39:23,408-Speed 10541.03 samples/sec   Loss 12.1052   LearningRate 0.2355   Epoch: 1   Global Step: 8140   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:39:31,195-Speed 10522.10 samples/sec   Loss 12.0172   LearningRate 0.2358   Epoch: 1   Global Step: 8150   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:39:38,983-Speed 10520.79 samples/sec   Loss 12.0111   LearningRate 0.2361   Epoch: 1   Global Step: 8160   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:39:46,775-Speed 10515.66 samples/sec   Loss 11.9759   LearningRate 0.2364   Epoch: 1   Global Step: 8170   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:39:54,580-Speed 10496.57 samples/sec   Loss 11.9699   LearningRate 0.2367   Epoch: 1   Global Step: 8180   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:40:02,359-Speed 10534.83 samples/sec   Loss 11.9196   LearningRate 0.2370   Epoch: 1   Global Step: 8190   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:40:10,155-Speed 10514.66 samples/sec   Loss 11.9134   LearningRate 0.2373   Epoch: 1   Global Step: 8200   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:40:17,984-Speed 10464.30 samples/sec   Loss 12.0250   LearningRate 0.2376   Epoch: 1   Global Step: 8210   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:40:25,814-Speed 10465.07 samples/sec   Loss 11.9306   LearningRate 0.2378   Epoch: 1   Global Step: 8220   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:40:33,654-Speed 10453.16 samples/sec   Loss 12.0114   LearningRate 0.2381   Epoch: 1   Global Step: 8230   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:40:41,487-Speed 10461.38 samples/sec   Loss 11.9947   LearningRate 0.2384   Epoch: 1   Global Step: 8240   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:40:49,282-Speed 10511.17 samples/sec   Loss 11.9013   LearningRate 0.2387   Epoch: 1   Global Step: 8250   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:40:57,094-Speed 10489.42 samples/sec   Loss 11.9253   LearningRate 0.2390   Epoch: 1   Global Step: 8260   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:41:04,890-Speed 10509.59 samples/sec   Loss 11.8885   LearningRate 0.2393   Epoch: 1   Global Step: 8270   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:41:12,679-Speed 10520.25 samples/sec   Loss 11.8643   LearningRate 0.2396   Epoch: 1   Global Step: 8280   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:41:20,493-Speed 10484.75 samples/sec   Loss 11.9053   LearningRate 0.2399   Epoch: 1   Global Step: 8290   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:41:28,335-Speed 10448.02 samples/sec   Loss 11.9095   LearningRate 0.2402   Epoch: 1   Global Step: 8300   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:41:36,156-Speed 10477.14 samples/sec   Loss 11.8087   LearningRate 0.2405   Epoch: 1   Global Step: 8310   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:41:43,986-Speed 10464.07 samples/sec   Loss 11.8620   LearningRate 0.2407   Epoch: 1   Global Step: 8320   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:41:51,819-Speed 10459.62 samples/sec   Loss 11.7955   LearningRate 0.2410   Epoch: 1   Global Step: 8330   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:41:59,654-Speed 10457.94 samples/sec   Loss 11.9660   LearningRate 0.2413   Epoch: 1   Global Step: 8340   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:42:07,466-Speed 10487.31 samples/sec   Loss 11.8188   LearningRate 0.2416   Epoch: 1   Global Step: 8350   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:42:15,294-Speed 10468.21 samples/sec   Loss 11.8412   LearningRate 0.2419   Epoch: 1   Global Step: 8360   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:42:23,126-Speed 10460.96 samples/sec   Loss 11.8093   LearningRate 0.2422   Epoch: 1   Global Step: 8370   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:42:30,979-Speed 10432.65 samples/sec   Loss 11.8084   LearningRate 0.2425   Epoch: 1   Global Step: 8380   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:42:38,829-Speed 10438.82 samples/sec   Loss 11.8008   LearningRate 0.2428   Epoch: 1   Global Step: 8390   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:42:46,612-Speed 10528.87 samples/sec   Loss 11.8321   LearningRate 0.2431   Epoch: 1   Global Step: 8400   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:42:54,424-Speed 10495.41 samples/sec   Loss 11.6838   LearningRate 0.2433   Epoch: 1   Global Step: 8410   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:43:02,236-Speed 10488.36 samples/sec   Loss 11.7956   LearningRate 0.2436   Epoch: 1   Global Step: 8420   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:43:10,070-Speed 10458.37 samples/sec   Loss 11.7644   LearningRate 0.2439   Epoch: 1   Global Step: 8430   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:43:17,916-Speed 10446.05 samples/sec   Loss 11.7601   LearningRate 0.2442   Epoch: 1   Global Step: 8440   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:43:25,769-Speed 10432.89 samples/sec   Loss 11.7641   LearningRate 0.2445   Epoch: 1   Global Step: 8450   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:43:33,582-Speed 10487.23 samples/sec   Loss 11.8046   LearningRate 0.2448   Epoch: 1   Global Step: 8460   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:43:41,385-Speed 10499.54 samples/sec   Loss 11.7857   LearningRate 0.2451   Epoch: 1   Global Step: 8470   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:43:49,174-Speed 10520.14 samples/sec   Loss 11.6941   LearningRate 0.2454   Epoch: 1   Global Step: 8480   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:43:56,980-Speed 10496.80 samples/sec   Loss 11.7608   LearningRate 0.2457   Epoch: 1   Global Step: 8490   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:44:04,771-Speed 10515.84 samples/sec   Loss 11.6919   LearningRate 0.2459   Epoch: 1   Global Step: 8500   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:44:12,562-Speed 10517.53 samples/sec   Loss 11.6655   LearningRate 0.2462   Epoch: 1   Global Step: 8510   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:44:20,359-Speed 10507.82 samples/sec   Loss 11.6948   LearningRate 0.2465   Epoch: 1   Global Step: 8520   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:44:28,147-Speed 10521.16 samples/sec   Loss 11.6784   LearningRate 0.2468   Epoch: 1   Global Step: 8530   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:44:35,950-Speed 10500.91 samples/sec   Loss 11.6714   LearningRate 0.2471   Epoch: 1   Global Step: 8540   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:44:43,743-Speed 10513.38 samples/sec   Loss 11.7180   LearningRate 0.2474   Epoch: 1   Global Step: 8550   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:44:51,566-Speed 10473.99 samples/sec   Loss 11.6242   LearningRate 0.2477   Epoch: 1   Global Step: 8560   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:44:59,386-Speed 10477.21 samples/sec   Loss 11.6543   LearningRate 0.2480   Epoch: 1   Global Step: 8570   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:45:07,216-Speed 10464.31 samples/sec   Loss 11.7052   LearningRate 0.2483   Epoch: 1   Global Step: 8580   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:45:14,995-Speed 10532.32 samples/sec   Loss 11.7368   LearningRate 0.2486   Epoch: 1   Global Step: 8590   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:45:22,786-Speed 10516.59 samples/sec   Loss 11.6800   LearningRate 0.2488   Epoch: 1   Global Step: 8600   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:45:30,587-Speed 10503.41 samples/sec   Loss 11.6955   LearningRate 0.2491   Epoch: 1   Global Step: 8610   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:45:38,394-Speed 10495.35 samples/sec   Loss 11.5830   LearningRate 0.2494   Epoch: 1   Global Step: 8620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:45:46,211-Speed 10480.22 samples/sec   Loss 11.6039   LearningRate 0.2497   Epoch: 1   Global Step: 8630   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:45:54,030-Speed 10479.64 samples/sec   Loss 11.5857   LearningRate 0.2500   Epoch: 1   Global Step: 8640   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:46:01,838-Speed 10493.07 samples/sec   Loss 11.7083   LearningRate 0.2503   Epoch: 1   Global Step: 8650   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:46:09,638-Speed 10503.77 samples/sec   Loss 11.6375   LearningRate 0.2506   Epoch: 1   Global Step: 8660   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:46:17,465-Speed 10467.74 samples/sec   Loss 11.5984   LearningRate 0.2509   Epoch: 1   Global Step: 8670   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:46:25,273-Speed 10493.73 samples/sec   Loss 11.5790   LearningRate 0.2512   Epoch: 1   Global Step: 8680   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-15 16:46:33,079-Speed 10500.22 samples/sec   Loss 11.5658   LearningRate 0.2514   Epoch: 1   Global Step: 8690   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:46:40,862-Speed 10527.04 samples/sec   Loss 11.5969   LearningRate 0.2517   Epoch: 1   Global Step: 8700   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:46:48,657-Speed 10511.97 samples/sec   Loss 11.5140   LearningRate 0.2520   Epoch: 1   Global Step: 8710   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:46:56,454-Speed 10510.14 samples/sec   Loss 11.5778   LearningRate 0.2523   Epoch: 1   Global Step: 8720   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:47:04,241-Speed 10521.35 samples/sec   Loss 11.6863   LearningRate 0.2526   Epoch: 1   Global Step: 8730   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:47:12,034-Speed 10513.25 samples/sec   Loss 11.5752   LearningRate 0.2529   Epoch: 1   Global Step: 8740   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:47:19,875-Speed 10450.20 samples/sec   Loss 11.6774   LearningRate 0.2532   Epoch: 1   Global Step: 8750   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:47:27,715-Speed 10451.28 samples/sec   Loss 11.5664   LearningRate 0.2535   Epoch: 1   Global Step: 8760   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:47:35,516-Speed 10502.21 samples/sec   Loss 11.5689   LearningRate 0.2538   Epoch: 1   Global Step: 8770   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:47:43,308-Speed 10514.63 samples/sec   Loss 11.6229   LearningRate 0.2541   Epoch: 1   Global Step: 8780   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:47:51,089-Speed 10534.65 samples/sec   Loss 11.5259   LearningRate 0.2543   Epoch: 1   Global Step: 8790   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:47:58,909-Speed 10478.01 samples/sec   Loss 11.5222   LearningRate 0.2546   Epoch: 1   Global Step: 8800   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:48:06,745-Speed 10456.84 samples/sec   Loss 11.4711   LearningRate 0.2549   Epoch: 1   Global Step: 8810   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:48:14,537-Speed 10516.45 samples/sec   Loss 11.5199   LearningRate 0.2552   Epoch: 1   Global Step: 8820   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:48:22,348-Speed 10488.77 samples/sec   Loss 11.4774   LearningRate 0.2555   Epoch: 1   Global Step: 8830   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:48:30,171-Speed 10472.94 samples/sec   Loss 11.4265   LearningRate 0.2558   Epoch: 1   Global Step: 8840   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:48:37,960-Speed 10522.96 samples/sec   Loss 11.6024   LearningRate 0.2561   Epoch: 1   Global Step: 8850   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:48:45,789-Speed 10464.84 samples/sec   Loss 11.5222   LearningRate 0.2564   Epoch: 1   Global Step: 8860   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:48:53,595-Speed 10496.42 samples/sec   Loss 11.5405   LearningRate 0.2567   Epoch: 1   Global Step: 8870   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:49:01,429-Speed 10458.87 samples/sec   Loss 11.5194   LearningRate 0.2569   Epoch: 1   Global Step: 8880   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:49:09,229-Speed 10510.35 samples/sec   Loss 11.4813   LearningRate 0.2572   Epoch: 1   Global Step: 8890   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:49:17,043-Speed 10485.45 samples/sec   Loss 11.4391   LearningRate 0.2575   Epoch: 1   Global Step: 8900   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:49:24,880-Speed 10458.27 samples/sec   Loss 11.5178   LearningRate 0.2578   Epoch: 1   Global Step: 8910   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:49:32,645-Speed 10550.56 samples/sec   Loss 11.4312   LearningRate 0.2581   Epoch: 1   Global Step: 8920   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:49:40,478-Speed 10459.79 samples/sec   Loss 11.4211   LearningRate 0.2584   Epoch: 1   Global Step: 8930   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:49:48,276-Speed 10507.99 samples/sec   Loss 11.5513   LearningRate 0.2587   Epoch: 1   Global Step: 8940   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:49:56,089-Speed 10488.24 samples/sec   Loss 11.5177   LearningRate 0.2590   Epoch: 1   Global Step: 8950   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:50:03,944-Speed 10429.74 samples/sec   Loss 11.4024   LearningRate 0.2593   Epoch: 1   Global Step: 8960   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:50:11,775-Speed 10463.35 samples/sec   Loss 11.4793   LearningRate 0.2595   Epoch: 1   Global Step: 8970   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:50:19,608-Speed 10460.66 samples/sec   Loss 11.4461   LearningRate 0.2598   Epoch: 1   Global Step: 8980   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:50:27,437-Speed 10467.24 samples/sec   Loss 11.3990   LearningRate 0.2601   Epoch: 1   Global Step: 8990   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:50:35,252-Speed 10484.87 samples/sec   Loss 11.3759   LearningRate 0.2604   Epoch: 1   Global Step: 9000   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:50:43,046-Speed 10512.23 samples/sec   Loss 11.4295   LearningRate 0.2607   Epoch: 1   Global Step: 9010   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:50:50,841-Speed 10510.58 samples/sec   Loss 11.4546   LearningRate 0.2610   Epoch: 1   Global Step: 9020   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:50:58,624-Speed 10527.10 samples/sec   Loss 11.4330   LearningRate 0.2613   Epoch: 1   Global Step: 9030   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:51:06,414-Speed 10520.36 samples/sec   Loss 11.4184   LearningRate 0.2616   Epoch: 1   Global Step: 9040   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:51:14,215-Speed 10510.95 samples/sec   Loss 11.4508   LearningRate 0.2619   Epoch: 1   Global Step: 9050   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:51:22,037-Speed 10476.64 samples/sec   Loss 11.4190   LearningRate 0.2622   Epoch: 1   Global Step: 9060   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:51:29,856-Speed 10479.27 samples/sec   Loss 11.4153   LearningRate 0.2624   Epoch: 1   Global Step: 9070   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:51:37,692-Speed 10455.18 samples/sec   Loss 11.4071   LearningRate 0.2627   Epoch: 1   Global Step: 9080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:51:45,517-Speed 10470.18 samples/sec   Loss 11.4099   LearningRate 0.2630   Epoch: 1   Global Step: 9090   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:51:53,359-Speed 10448.13 samples/sec   Loss 11.4077   LearningRate 0.2633   Epoch: 1   Global Step: 9100   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:52:01,200-Speed 10450.87 samples/sec   Loss 11.3638   LearningRate 0.2636   Epoch: 1   Global Step: 9110   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:52:09,016-Speed 10482.24 samples/sec   Loss 11.4007   LearningRate 0.2639   Epoch: 1   Global Step: 9120   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:52:16,812-Speed 10509.18 samples/sec   Loss 11.3737   LearningRate 0.2642   Epoch: 1   Global Step: 9130   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:52:24,628-Speed 10483.66 samples/sec   Loss 11.3402   LearningRate 0.2645   Epoch: 1   Global Step: 9140   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:52:32,448-Speed 10478.42 samples/sec   Loss 11.3829   LearningRate 0.2648   Epoch: 1   Global Step: 9150   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:52:40,287-Speed 10452.05 samples/sec   Loss 11.3262   LearningRate 0.2650   Epoch: 1   Global Step: 9160   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:52:48,095-Speed 10494.02 samples/sec   Loss 11.2468   LearningRate 0.2653   Epoch: 1   Global Step: 9170   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:52:55,895-Speed 10504.61 samples/sec   Loss 11.4098   LearningRate 0.2656   Epoch: 1   Global Step: 9180   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:53:03,678-Speed 10527.40 samples/sec   Loss 11.4733   LearningRate 0.2659   Epoch: 1   Global Step: 9190   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:53:11,486-Speed 10505.81 samples/sec   Loss 11.3295   LearningRate 0.2662   Epoch: 1   Global Step: 9200   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:53:19,272-Speed 10522.73 samples/sec   Loss 11.4898   LearningRate 0.2665   Epoch: 1   Global Step: 9210   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:53:27,091-Speed 10479.66 samples/sec   Loss 11.3724   LearningRate 0.2668   Epoch: 1   Global Step: 9220   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:53:34,944-Speed 10433.62 samples/sec   Loss 11.3762   LearningRate 0.2671   Epoch: 1   Global Step: 9230   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:53:42,744-Speed 10503.55 samples/sec   Loss 11.3575   LearningRate 0.2674   Epoch: 1   Global Step: 9240   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:53:50,568-Speed 10470.45 samples/sec   Loss 11.3596   LearningRate 0.2677   Epoch: 1   Global Step: 9250   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:53:58,370-Speed 10502.19 samples/sec   Loss 11.2956   LearningRate 0.2679   Epoch: 1   Global Step: 9260   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:54:06,197-Speed 10468.94 samples/sec   Loss 11.2785   LearningRate 0.2682   Epoch: 1   Global Step: 9270   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:54:14,028-Speed 10462.10 samples/sec   Loss 11.2505   LearningRate 0.2685   Epoch: 1   Global Step: 9280   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:54:21,841-Speed 10486.48 samples/sec   Loss 11.3133   LearningRate 0.2688   Epoch: 1   Global Step: 9290   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:54:29,709-Speed 10413.45 samples/sec   Loss 11.3311   LearningRate 0.2691   Epoch: 1   Global Step: 9300   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:54:37,548-Speed 10452.31 samples/sec   Loss 11.3761   LearningRate 0.2694   Epoch: 1   Global Step: 9310   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:54:45,359-Speed 10490.85 samples/sec   Loss 11.3335   LearningRate 0.2697   Epoch: 1   Global Step: 9320   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:54:53,195-Speed 10455.20 samples/sec   Loss 11.3088   LearningRate 0.2700   Epoch: 1   Global Step: 9330   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:55:01,036-Speed 10450.96 samples/sec   Loss 11.3158   LearningRate 0.2703   Epoch: 1   Global Step: 9340   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:55:08,811-Speed 10539.52 samples/sec   Loss 11.3128   LearningRate 0.2705   Epoch: 1   Global Step: 9350   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:55:16,609-Speed 10507.16 samples/sec   Loss 11.3122   LearningRate 0.2708   Epoch: 1   Global Step: 9360   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:55:24,392-Speed 10529.03 samples/sec   Loss 11.2317   LearningRate 0.2711   Epoch: 1   Global Step: 9370   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:55:32,181-Speed 10523.91 samples/sec   Loss 11.3335   LearningRate 0.2714   Epoch: 1   Global Step: 9380   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:55:40,000-Speed 10479.63 samples/sec   Loss 11.3166   LearningRate 0.2717   Epoch: 1   Global Step: 9390   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:55:47,792-Speed 10515.15 samples/sec   Loss 11.3479   LearningRate 0.2720   Epoch: 1   Global Step: 9400   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:55:55,603-Speed 10489.69 samples/sec   Loss 11.2496   LearningRate 0.2723   Epoch: 1   Global Step: 9410   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:56:03,402-Speed 10506.41 samples/sec   Loss 11.2035   LearningRate 0.2726   Epoch: 1   Global Step: 9420   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:56:11,231-Speed 10465.09 samples/sec   Loss 11.2797   LearningRate 0.2729   Epoch: 1   Global Step: 9430   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:56:19,040-Speed 10492.38 samples/sec   Loss 11.2204   LearningRate 0.2731   Epoch: 1   Global Step: 9440   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:56:26,820-Speed 10532.38 samples/sec   Loss 11.3702   LearningRate 0.2734   Epoch: 1   Global Step: 9450   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 16:56:34,633-Speed 10487.47 samples/sec   Loss 11.2495   LearningRate 0.2737   Epoch: 1   Global Step: 9460   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:56:42,419-Speed 10523.49 samples/sec   Loss 11.2638   LearningRate 0.2740   Epoch: 1   Global Step: 9470   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 16:56:50,204-Speed 10525.02 samples/sec   Loss 11.2630   LearningRate 0.2743   Epoch: 1   Global Step: 9480   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 16:56:58,001-Speed 10507.94 samples/sec   Loss 11.1760   LearningRate 0.2746   Epoch: 1   Global Step: 9490   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 16:57:05,800-Speed 10506.18 samples/sec   Loss 11.2269   LearningRate 0.2749   Epoch: 1   Global Step: 9500   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 16:57:13,592-Speed 10515.30 samples/sec   Loss 11.2109   LearningRate 0.2752   Epoch: 1   Global Step: 9510   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 16:57:21,381-Speed 10519.60 samples/sec   Loss 11.2375   LearningRate 0.2755   Epoch: 1   Global Step: 9520   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 16:57:29,175-Speed 10511.83 samples/sec   Loss 11.2190   LearningRate 0.2758   Epoch: 1   Global Step: 9530   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 16:57:36,965-Speed 10517.53 samples/sec   Loss 11.2582   LearningRate 0.2760   Epoch: 1   Global Step: 9540   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 16:57:44,774-Speed 10492.87 samples/sec   Loss 11.2220   LearningRate 0.2763   Epoch: 1   Global Step: 9550   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 16:57:52,609-Speed 10458.37 samples/sec   Loss 11.2654   LearningRate 0.2766   Epoch: 1   Global Step: 9560   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 16:58:00,420-Speed 10489.49 samples/sec   Loss 11.2801   LearningRate 0.2769   Epoch: 1   Global Step: 9570   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 16:58:08,231-Speed 10492.38 samples/sec   Loss 11.3125   LearningRate 0.2772   Epoch: 1   Global Step: 9580   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 16:58:16,029-Speed 10507.78 samples/sec   Loss 11.2783   LearningRate 0.2775   Epoch: 1   Global Step: 9590   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 16:58:23,828-Speed 10507.11 samples/sec   Loss 11.2738   LearningRate 0.2778   Epoch: 1   Global Step: 9600   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 16:58:31,630-Speed 10500.96 samples/sec   Loss 11.2702   LearningRate 0.2781   Epoch: 1   Global Step: 9610   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 16:58:39,420-Speed 10518.23 samples/sec   Loss 11.1511   LearningRate 0.2784   Epoch: 1   Global Step: 9620   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 16:58:47,218-Speed 10508.10 samples/sec   Loss 11.2479   LearningRate 0.2786   Epoch: 1   Global Step: 9630   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 16:58:55,021-Speed 10501.09 samples/sec   Loss 11.2091   LearningRate 0.2789   Epoch: 1   Global Step: 9640   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 16:59:02,843-Speed 10474.23 samples/sec   Loss 11.3256   LearningRate 0.2792   Epoch: 1   Global Step: 9650   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 16:59:10,665-Speed 10475.16 samples/sec   Loss 11.1924   LearningRate 0.2795   Epoch: 1   Global Step: 9660   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 16:59:18,449-Speed 10525.95 samples/sec   Loss 11.1874   LearningRate 0.2798   Epoch: 1   Global Step: 9670   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 16:59:26,255-Speed 10496.30 samples/sec   Loss 11.2342   LearningRate 0.2801   Epoch: 1   Global Step: 9680   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 16:59:34,049-Speed 10511.63 samples/sec   Loss 11.1464   LearningRate 0.2804   Epoch: 1   Global Step: 9690   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 16:59:41,848-Speed 10505.75 samples/sec   Loss 11.2223   LearningRate 0.2807   Epoch: 1   Global Step: 9700   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 16:59:49,708-Speed 10424.49 samples/sec   Loss 11.2503   LearningRate 0.2810   Epoch: 1   Global Step: 9710   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 16:59:57,574-Speed 10416.44 samples/sec   Loss 11.2204   LearningRate 0.2812   Epoch: 1   Global Step: 9720   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:00:05,421-Speed 10441.35 samples/sec   Loss 11.2115   LearningRate 0.2815   Epoch: 1   Global Step: 9730   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:00:13,212-Speed 10516.35 samples/sec   Loss 11.2207   LearningRate 0.2818   Epoch: 1   Global Step: 9740   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:00:21,020-Speed 10494.73 samples/sec   Loss 11.0847   LearningRate 0.2821   Epoch: 1   Global Step: 9750   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:00:28,812-Speed 10515.55 samples/sec   Loss 11.2637   LearningRate 0.2824   Epoch: 1   Global Step: 9760   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:00:36,660-Speed 10438.57 samples/sec   Loss 11.2441   LearningRate 0.2827   Epoch: 1   Global Step: 9770   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:00:44,464-Speed 10500.36 samples/sec   Loss 11.3093   LearningRate 0.2830   Epoch: 1   Global Step: 9780   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:00:52,304-Speed 10449.85 samples/sec   Loss 11.2310   LearningRate 0.2833   Epoch: 1   Global Step: 9790   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:01:00,122-Speed 10482.01 samples/sec   Loss 11.2221   LearningRate 0.2836   Epoch: 1   Global Step: 9800   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:01:07,920-Speed 10515.44 samples/sec   Loss 11.1947   LearningRate 0.2839   Epoch: 1   Global Step: 9810   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:01:15,728-Speed 10493.79 samples/sec   Loss 11.1145   LearningRate 0.2841   Epoch: 1   Global Step: 9820   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:01:23,506-Speed 10532.96 samples/sec   Loss 11.1388   LearningRate 0.2844   Epoch: 1   Global Step: 9830   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:01:31,348-Speed 10447.97 samples/sec   Loss 11.1974   LearningRate 0.2847   Epoch: 1   Global Step: 9840   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:01:39,146-Speed 10509.02 samples/sec   Loss 11.1942   LearningRate 0.2850   Epoch: 1   Global Step: 9850   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:01:46,955-Speed 10491.26 samples/sec   Loss 11.2094   LearningRate 0.2853   Epoch: 1   Global Step: 9860   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:01:54,796-Speed 10449.80 samples/sec   Loss 11.1447   LearningRate 0.2856   Epoch: 1   Global Step: 9870   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:02:02,582-Speed 10523.30 samples/sec   Loss 11.1938   LearningRate 0.2859   Epoch: 1   Global Step: 9880   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:02:10,386-Speed 10500.76 samples/sec   Loss 11.1721   LearningRate 0.2862   Epoch: 1   Global Step: 9890   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:02:18,183-Speed 10509.53 samples/sec   Loss 11.1450   LearningRate 0.2865   Epoch: 1   Global Step: 9900   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:02:25,958-Speed 10538.59 samples/sec   Loss 11.2249   LearningRate 0.2867   Epoch: 1   Global Step: 9910   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:02:33,726-Speed 10551.54 samples/sec   Loss 11.1986   LearningRate 0.2870   Epoch: 1   Global Step: 9920   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:02:41,538-Speed 10488.61 samples/sec   Loss 11.1813   LearningRate 0.2873   Epoch: 1   Global Step: 9930   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:02:49,391-Speed 10433.47 samples/sec   Loss 11.2198   LearningRate 0.2876   Epoch: 1   Global Step: 9940   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:02:57,195-Speed 10499.71 samples/sec   Loss 11.1669   LearningRate 0.2879   Epoch: 1   Global Step: 9950   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:03:05,015-Speed 10482.29 samples/sec   Loss 11.1200   LearningRate 0.2882   Epoch: 1   Global Step: 9960   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:03:12,801-Speed 10523.03 samples/sec   Loss 11.1096   LearningRate 0.2885   Epoch: 1   Global Step: 9970   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:03:20,605-Speed 10499.88 samples/sec   Loss 11.1261   LearningRate 0.2888   Epoch: 1   Global Step: 9980   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:03:28,427-Speed 10475.52 samples/sec   Loss 11.1293   LearningRate 0.2891   Epoch: 1   Global Step: 9990   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:03:36,234-Speed 10495.42 samples/sec   Loss 11.2263   LearningRate 0.2894   Epoch: 1   Global Step: 10000   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:04:03,465-[lfw][10000]XNorm: 23.064546
Training: 2022-01-15 17:04:03,466-[lfw][10000]Accuracy-Flip: 0.99483+-0.00391
Training: 2022-01-15 17:04:03,467-[lfw][10000]Accuracy-Highest: 0.99483
Training: 2022-01-15 17:04:35,455-[cfp_fp][10000]XNorm: 21.010530
Training: 2022-01-15 17:04:35,455-[cfp_fp][10000]Accuracy-Flip: 0.96829+-0.01067
Training: 2022-01-15 17:04:35,456-[cfp_fp][10000]Accuracy-Highest: 0.96829
Training: 2022-01-15 17:05:03,455-[agedb_30][10000]XNorm: 22.595752
Training: 2022-01-15 17:05:03,457-[agedb_30][10000]Accuracy-Flip: 0.95250+-0.01083
Training: 2022-01-15 17:05:03,457-[agedb_30][10000]Accuracy-Highest: 0.95250
Training: 2022-01-15 17:05:11,292-Speed 861.80 samples/sec   Loss 11.1196   LearningRate 0.2896   Epoch: 1   Global Step: 10010   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:05:19,060-Speed 10549.70 samples/sec   Loss 11.1118   LearningRate 0.2899   Epoch: 1   Global Step: 10020   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:05:26,825-Speed 10551.57 samples/sec   Loss 11.0616   LearningRate 0.2902   Epoch: 1   Global Step: 10030   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:05:34,592-Speed 10549.10 samples/sec   Loss 11.0828   LearningRate 0.2905   Epoch: 1   Global Step: 10040   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:05:42,445-Speed 10434.52 samples/sec   Loss 11.1688   LearningRate 0.2908   Epoch: 1   Global Step: 10050   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:05:50,241-Speed 10513.65 samples/sec   Loss 11.1544   LearningRate 0.2911   Epoch: 1   Global Step: 10060   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:05:58,001-Speed 10559.25 samples/sec   Loss 11.1863   LearningRate 0.2914   Epoch: 1   Global Step: 10070   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:06:05,794-Speed 10515.02 samples/sec   Loss 11.1837   LearningRate 0.2917   Epoch: 1   Global Step: 10080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:06:13,576-Speed 10529.29 samples/sec   Loss 11.1362   LearningRate 0.2920   Epoch: 1   Global Step: 10090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:06:21,362-Speed 10524.26 samples/sec   Loss 11.1585   LearningRate 0.2922   Epoch: 1   Global Step: 10100   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 17:06:29,145-Speed 10536.33 samples/sec   Loss 11.0764   LearningRate 0.2925   Epoch: 1   Global Step: 10110   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 17:06:36,916-Speed 10543.53 samples/sec   Loss 11.1530   LearningRate 0.2928   Epoch: 1   Global Step: 10120   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 17:06:44,696-Speed 10531.00 samples/sec   Loss 11.1490   LearningRate 0.2931   Epoch: 1   Global Step: 10130   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 17:06:52,470-Speed 10539.14 samples/sec   Loss 11.3044   LearningRate 0.2934   Epoch: 1   Global Step: 10140   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 17:07:00,242-Speed 10542.22 samples/sec   Loss 11.0486   LearningRate 0.2937   Epoch: 1   Global Step: 10150   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 17:07:08,028-Speed 10524.45 samples/sec   Loss 11.0951   LearningRate 0.2940   Epoch: 1   Global Step: 10160   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 17:07:15,810-Speed 10528.67 samples/sec   Loss 11.1485   LearningRate 0.2943   Epoch: 1   Global Step: 10170   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 17:07:23,590-Speed 10531.52 samples/sec   Loss 11.0862   LearningRate 0.2946   Epoch: 1   Global Step: 10180   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:07:31,385-Speed 10514.56 samples/sec   Loss 11.1484   LearningRate 0.2948   Epoch: 1   Global Step: 10190   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:07:39,165-Speed 10531.67 samples/sec   Loss 11.0582   LearningRate 0.2951   Epoch: 1   Global Step: 10200   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:07:46,981-Speed 10482.54 samples/sec   Loss 11.0968   LearningRate 0.2954   Epoch: 1   Global Step: 10210   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:07:54,764-Speed 10528.33 samples/sec   Loss 11.1518   LearningRate 0.2957   Epoch: 1   Global Step: 10220   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:08:02,531-Speed 10549.10 samples/sec   Loss 11.1228   LearningRate 0.2960   Epoch: 1   Global Step: 10230   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:08:10,306-Speed 10538.13 samples/sec   Loss 11.1999   LearningRate 0.2963   Epoch: 1   Global Step: 10240   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:08:18,089-Speed 10527.55 samples/sec   Loss 11.0920   LearningRate 0.2966   Epoch: 1   Global Step: 10250   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:08:25,894-Speed 10497.92 samples/sec   Loss 11.1351   LearningRate 0.2969   Epoch: 1   Global Step: 10260   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:08:33,667-Speed 10541.03 samples/sec   Loss 11.2410   LearningRate 0.2972   Epoch: 1   Global Step: 10270   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:08:41,465-Speed 10508.74 samples/sec   Loss 11.1587   LearningRate 0.2975   Epoch: 1   Global Step: 10280   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 17:08:49,241-Speed 10536.26 samples/sec   Loss 11.0385   LearningRate 0.2977   Epoch: 1   Global Step: 10290   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 17:08:57,063-Speed 10474.69 samples/sec   Loss 11.1128   LearningRate 0.2980   Epoch: 1   Global Step: 10300   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 17:09:04,886-Speed 10472.71 samples/sec   Loss 11.1826   LearningRate 0.2983   Epoch: 1   Global Step: 10310   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:09:12,687-Speed 10504.10 samples/sec   Loss 11.0643   LearningRate 0.2986   Epoch: 1   Global Step: 10320   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:09:20,517-Speed 10464.10 samples/sec   Loss 11.1408   LearningRate 0.2989   Epoch: 1   Global Step: 10330   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:09:28,350-Speed 10460.84 samples/sec   Loss 11.2111   LearningRate 0.2992   Epoch: 1   Global Step: 10340   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:09:36,225-Speed 10406.48 samples/sec   Loss 11.2287   LearningRate 0.2995   Epoch: 1   Global Step: 10350   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:09:44,056-Speed 10463.13 samples/sec   Loss 11.0545   LearningRate 0.2998   Epoch: 1   Global Step: 10360   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:09:51,882-Speed 10468.49 samples/sec   Loss 11.0891   LearningRate 0.3001   Epoch: 1   Global Step: 10370   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:10:14,752-Speed 3582.17 samples/sec   Loss 11.0807   LearningRate 0.3003   Epoch: 2   Global Step: 10380   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:10:22,547-Speed 10511.78 samples/sec   Loss 11.0924   LearningRate 0.3006   Epoch: 2   Global Step: 10390   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:10:30,346-Speed 10509.31 samples/sec   Loss 10.9982   LearningRate 0.3009   Epoch: 2   Global Step: 10400   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:10:38,119-Speed 10541.34 samples/sec   Loss 11.1399   LearningRate 0.3012   Epoch: 2   Global Step: 10410   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 17:10:45,946-Speed 10467.32 samples/sec   Loss 11.0933   LearningRate 0.3015   Epoch: 2   Global Step: 10420   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 17:10:53,749-Speed 10502.63 samples/sec   Loss 11.0406   LearningRate 0.3018   Epoch: 2   Global Step: 10430   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 17:11:01,527-Speed 10534.46 samples/sec   Loss 11.1111   LearningRate 0.3021   Epoch: 2   Global Step: 10440   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 17:11:09,350-Speed 10473.81 samples/sec   Loss 11.0601   LearningRate 0.3024   Epoch: 2   Global Step: 10450   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 17:11:17,162-Speed 10488.77 samples/sec   Loss 11.0802   LearningRate 0.3027   Epoch: 2   Global Step: 10460   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 17:11:24,996-Speed 10458.70 samples/sec   Loss 11.0908   LearningRate 0.3030   Epoch: 2   Global Step: 10470   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 17:11:32,788-Speed 10515.61 samples/sec   Loss 11.1481   LearningRate 0.3032   Epoch: 2   Global Step: 10480   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:11:40,585-Speed 10508.67 samples/sec   Loss 11.1577   LearningRate 0.3035   Epoch: 2   Global Step: 10490   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:11:48,386-Speed 10503.21 samples/sec   Loss 11.1018   LearningRate 0.3038   Epoch: 2   Global Step: 10500   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:11:56,179-Speed 10514.79 samples/sec   Loss 11.1315   LearningRate 0.3041   Epoch: 2   Global Step: 10510   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:12:03,970-Speed 10516.28 samples/sec   Loss 11.0961   LearningRate 0.3044   Epoch: 2   Global Step: 10520   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:12:11,748-Speed 10534.51 samples/sec   Loss 11.1705   LearningRate 0.3047   Epoch: 2   Global Step: 10530   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:12:19,545-Speed 10508.93 samples/sec   Loss 11.1463   LearningRate 0.3050   Epoch: 2   Global Step: 10540   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:12:27,299-Speed 10566.23 samples/sec   Loss 11.0101   LearningRate 0.3053   Epoch: 2   Global Step: 10550   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:12:35,058-Speed 10560.62 samples/sec   Loss 11.1359   LearningRate 0.3056   Epoch: 2   Global Step: 10560   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:12:42,830-Speed 10542.66 samples/sec   Loss 11.0290   LearningRate 0.3058   Epoch: 2   Global Step: 10570   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-15 17:12:50,609-Speed 10532.41 samples/sec   Loss 11.0516   LearningRate 0.3061   Epoch: 2   Global Step: 10580   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-15 17:12:58,400-Speed 10515.24 samples/sec   Loss 11.0224   LearningRate 0.3064   Epoch: 2   Global Step: 10590   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:13:06,182-Speed 10530.40 samples/sec   Loss 11.0455   LearningRate 0.3067   Epoch: 2   Global Step: 10600   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:13:13,968-Speed 10522.87 samples/sec   Loss 11.0806   LearningRate 0.3070   Epoch: 2   Global Step: 10610   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:13:21,791-Speed 10473.67 samples/sec   Loss 11.1035   LearningRate 0.3073   Epoch: 2   Global Step: 10620   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:13:29,600-Speed 10494.13 samples/sec   Loss 11.0577   LearningRate 0.3076   Epoch: 2   Global Step: 10630   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:13:37,392-Speed 10516.56 samples/sec   Loss 11.0638   LearningRate 0.3079   Epoch: 2   Global Step: 10640   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:13:45,181-Speed 10521.26 samples/sec   Loss 11.2120   LearningRate 0.3082   Epoch: 2   Global Step: 10650   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:13:53,013-Speed 10460.98 samples/sec   Loss 11.1199   LearningRate 0.3084   Epoch: 2   Global Step: 10660   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:14:00,795-Speed 10527.72 samples/sec   Loss 11.0791   LearningRate 0.3087   Epoch: 2   Global Step: 10670   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:14:08,590-Speed 10512.71 samples/sec   Loss 11.0554   LearningRate 0.3090   Epoch: 2   Global Step: 10680   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:14:16,375-Speed 10524.56 samples/sec   Loss 11.0719   LearningRate 0.3093   Epoch: 2   Global Step: 10690   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:14:24,179-Speed 10500.01 samples/sec   Loss 11.0648   LearningRate 0.3096   Epoch: 2   Global Step: 10700   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:14:31,977-Speed 10507.59 samples/sec   Loss 11.0231   LearningRate 0.3099   Epoch: 2   Global Step: 10710   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:14:39,765-Speed 10520.18 samples/sec   Loss 11.0810   LearningRate 0.3102   Epoch: 2   Global Step: 10720   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:14:47,546-Speed 10531.34 samples/sec   Loss 11.0914   LearningRate 0.3105   Epoch: 2   Global Step: 10730   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:14:55,331-Speed 10524.58 samples/sec   Loss 11.1033   LearningRate 0.3108   Epoch: 2   Global Step: 10740   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:15:03,127-Speed 10509.97 samples/sec   Loss 11.0875   LearningRate 0.3111   Epoch: 2   Global Step: 10750   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:15:10,915-Speed 10520.36 samples/sec   Loss 11.1098   LearningRate 0.3113   Epoch: 2   Global Step: 10760   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:15:18,736-Speed 10485.23 samples/sec   Loss 11.0785   LearningRate 0.3116   Epoch: 2   Global Step: 10770   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:15:26,532-Speed 10509.12 samples/sec   Loss 11.0668   LearningRate 0.3119   Epoch: 2   Global Step: 10780   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:15:34,359-Speed 10471.45 samples/sec   Loss 11.0710   LearningRate 0.3122   Epoch: 2   Global Step: 10790   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:15:42,154-Speed 10511.55 samples/sec   Loss 11.1413   LearningRate 0.3125   Epoch: 2   Global Step: 10800   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:15:49,983-Speed 10465.69 samples/sec   Loss 11.0706   LearningRate 0.3128   Epoch: 2   Global Step: 10810   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:15:57,826-Speed 10444.89 samples/sec   Loss 11.1146   LearningRate 0.3131   Epoch: 2   Global Step: 10820   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:16:05,642-Speed 10484.28 samples/sec   Loss 11.0667   LearningRate 0.3134   Epoch: 2   Global Step: 10830   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:16:13,446-Speed 10498.64 samples/sec   Loss 11.0417   LearningRate 0.3137   Epoch: 2   Global Step: 10840   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:16:21,251-Speed 10505.62 samples/sec   Loss 11.0707   LearningRate 0.3139   Epoch: 2   Global Step: 10850   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:16:29,072-Speed 10475.21 samples/sec   Loss 11.1314   LearningRate 0.3142   Epoch: 2   Global Step: 10860   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:16:36,857-Speed 10525.34 samples/sec   Loss 11.0097   LearningRate 0.3145   Epoch: 2   Global Step: 10870   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:16:44,644-Speed 10526.65 samples/sec   Loss 11.1690   LearningRate 0.3148   Epoch: 2   Global Step: 10880   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:16:52,549-Speed 10365.75 samples/sec   Loss 11.1114   LearningRate 0.3151   Epoch: 2   Global Step: 10890   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:17:00,371-Speed 10474.94 samples/sec   Loss 11.0820   LearningRate 0.3154   Epoch: 2   Global Step: 10900   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:17:08,176-Speed 10497.46 samples/sec   Loss 11.0670   LearningRate 0.3157   Epoch: 2   Global Step: 10910   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:17:15,961-Speed 10524.18 samples/sec   Loss 11.0627   LearningRate 0.3160   Epoch: 2   Global Step: 10920   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:17:23,754-Speed 10517.21 samples/sec   Loss 11.0671   LearningRate 0.3163   Epoch: 2   Global Step: 10930   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:17:31,529-Speed 10537.73 samples/sec   Loss 11.0488   LearningRate 0.3166   Epoch: 2   Global Step: 10940   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 17:17:39,354-Speed 10470.00 samples/sec   Loss 11.0023   LearningRate 0.3168   Epoch: 2   Global Step: 10950   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 17:17:47,145-Speed 10516.86 samples/sec   Loss 11.0131   LearningRate 0.3171   Epoch: 2   Global Step: 10960   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 17:17:54,941-Speed 10510.68 samples/sec   Loss 11.0648   LearningRate 0.3174   Epoch: 2   Global Step: 10970   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 17:18:02,716-Speed 10537.22 samples/sec   Loss 11.1550   LearningRate 0.3177   Epoch: 2   Global Step: 10980   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 17:18:10,519-Speed 10500.20 samples/sec   Loss 11.1206   LearningRate 0.3180   Epoch: 2   Global Step: 10990   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 17:18:18,320-Speed 10502.62 samples/sec   Loss 11.0520   LearningRate 0.3183   Epoch: 2   Global Step: 11000   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 17:18:26,144-Speed 10477.87 samples/sec   Loss 11.1006   LearningRate 0.3186   Epoch: 2   Global Step: 11010   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 17:18:33,963-Speed 10477.85 samples/sec   Loss 11.0638   LearningRate 0.3189   Epoch: 2   Global Step: 11020   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 17:18:41,750-Speed 10522.63 samples/sec   Loss 11.0316   LearningRate 0.3192   Epoch: 2   Global Step: 11030   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 17:18:49,583-Speed 10460.48 samples/sec   Loss 11.0401   LearningRate 0.3194   Epoch: 2   Global Step: 11040   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:18:57,382-Speed 10506.17 samples/sec   Loss 11.1255   LearningRate 0.3197   Epoch: 2   Global Step: 11050   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:19:05,161-Speed 10531.89 samples/sec   Loss 11.1191   LearningRate 0.3200   Epoch: 2   Global Step: 11060   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:19:12,931-Speed 10544.72 samples/sec   Loss 11.0418   LearningRate 0.3203   Epoch: 2   Global Step: 11070   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:19:20,717-Speed 10522.97 samples/sec   Loss 11.0843   LearningRate 0.3206   Epoch: 2   Global Step: 11080   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:19:28,552-Speed 10457.90 samples/sec   Loss 10.9939   LearningRate 0.3209   Epoch: 2   Global Step: 11090   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:19:36,381-Speed 10466.67 samples/sec   Loss 11.1019   LearningRate 0.3212   Epoch: 2   Global Step: 11100   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:19:44,228-Speed 10440.96 samples/sec   Loss 11.0629   LearningRate 0.3215   Epoch: 2   Global Step: 11110   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:19:52,004-Speed 10537.19 samples/sec   Loss 11.0215   LearningRate 0.3218   Epoch: 2   Global Step: 11120   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:19:59,793-Speed 10517.84 samples/sec   Loss 11.0880   LearningRate 0.3220   Epoch: 2   Global Step: 11130   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:20:07,615-Speed 10474.60 samples/sec   Loss 11.0518   LearningRate 0.3223   Epoch: 2   Global Step: 11140   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:20:15,457-Speed 10448.42 samples/sec   Loss 11.0599   LearningRate 0.3226   Epoch: 2   Global Step: 11150   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:20:23,309-Speed 10433.77 samples/sec   Loss 11.0028   LearningRate 0.3229   Epoch: 2   Global Step: 11160   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:20:31,152-Speed 10447.96 samples/sec   Loss 11.0711   LearningRate 0.3232   Epoch: 2   Global Step: 11170   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:20:38,992-Speed 10451.93 samples/sec   Loss 11.0581   LearningRate 0.3235   Epoch: 2   Global Step: 11180   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:20:46,836-Speed 10446.65 samples/sec   Loss 11.0877   LearningRate 0.3238   Epoch: 2   Global Step: 11190   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:20:54,672-Speed 10456.51 samples/sec   Loss 11.0548   LearningRate 0.3241   Epoch: 2   Global Step: 11200   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:21:02,499-Speed 10473.84 samples/sec   Loss 11.1774   LearningRate 0.3244   Epoch: 2   Global Step: 11210   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:21:10,340-Speed 10449.92 samples/sec   Loss 11.0767   LearningRate 0.3247   Epoch: 2   Global Step: 11220   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:21:18,161-Speed 10478.18 samples/sec   Loss 11.0226   LearningRate 0.3249   Epoch: 2   Global Step: 11230   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:21:25,986-Speed 10471.27 samples/sec   Loss 11.0503   LearningRate 0.3252   Epoch: 2   Global Step: 11240   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:21:33,834-Speed 10440.01 samples/sec   Loss 11.0580   LearningRate 0.3255   Epoch: 2   Global Step: 11250   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:21:41,679-Speed 10444.48 samples/sec   Loss 11.1335   LearningRate 0.3258   Epoch: 2   Global Step: 11260   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:21:49,556-Speed 10401.61 samples/sec   Loss 11.0033   LearningRate 0.3261   Epoch: 2   Global Step: 11270   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:21:57,417-Speed 10422.92 samples/sec   Loss 11.0827   LearningRate 0.3264   Epoch: 2   Global Step: 11280   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:22:05,228-Speed 10489.27 samples/sec   Loss 11.0354   LearningRate 0.3267   Epoch: 2   Global Step: 11290   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:22:13,050-Speed 10475.55 samples/sec   Loss 11.1449   LearningRate 0.3270   Epoch: 2   Global Step: 11300   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:22:20,860-Speed 10490.81 samples/sec   Loss 11.0989   LearningRate 0.3273   Epoch: 2   Global Step: 11310   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:22:28,722-Speed 10421.13 samples/sec   Loss 11.0812   LearningRate 0.3275   Epoch: 2   Global Step: 11320   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:22:36,549-Speed 10468.45 samples/sec   Loss 10.9581   LearningRate 0.3278   Epoch: 2   Global Step: 11330   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:22:44,373-Speed 10472.38 samples/sec   Loss 10.9381   LearningRate 0.3281   Epoch: 2   Global Step: 11340   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:22:52,218-Speed 10443.97 samples/sec   Loss 11.0328   LearningRate 0.3284   Epoch: 2   Global Step: 11350   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:23:00,045-Speed 10469.75 samples/sec   Loss 11.0142   LearningRate 0.3287   Epoch: 2   Global Step: 11360   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:23:07,855-Speed 10491.47 samples/sec   Loss 11.0356   LearningRate 0.3290   Epoch: 2   Global Step: 11370   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:23:15,693-Speed 10453.99 samples/sec   Loss 11.0664   LearningRate 0.3293   Epoch: 2   Global Step: 11380   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:23:23,496-Speed 10501.25 samples/sec   Loss 11.0824   LearningRate 0.3296   Epoch: 2   Global Step: 11390   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:23:31,331-Speed 10457.23 samples/sec   Loss 11.0445   LearningRate 0.3299   Epoch: 2   Global Step: 11400   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:23:39,144-Speed 10487.98 samples/sec   Loss 11.0296   LearningRate 0.3302   Epoch: 2   Global Step: 11410   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:23:46,960-Speed 10483.26 samples/sec   Loss 11.0398   LearningRate 0.3304   Epoch: 2   Global Step: 11420   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:23:54,790-Speed 10464.15 samples/sec   Loss 11.1006   LearningRate 0.3307   Epoch: 2   Global Step: 11430   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:24:02,621-Speed 10463.20 samples/sec   Loss 11.1618   LearningRate 0.3310   Epoch: 2   Global Step: 11440   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:24:10,447-Speed 10469.12 samples/sec   Loss 11.0243   LearningRate 0.3313   Epoch: 2   Global Step: 11450   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:24:18,291-Speed 10447.17 samples/sec   Loss 11.0835   LearningRate 0.3316   Epoch: 2   Global Step: 11460   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:24:26,120-Speed 10464.51 samples/sec   Loss 11.1126   LearningRate 0.3319   Epoch: 2   Global Step: 11470   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:24:33,938-Speed 10479.74 samples/sec   Loss 11.0890   LearningRate 0.3322   Epoch: 2   Global Step: 11480   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:24:41,756-Speed 10481.70 samples/sec   Loss 11.0531   LearningRate 0.3325   Epoch: 2   Global Step: 11490   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:24:49,598-Speed 10448.39 samples/sec   Loss 11.0198   LearningRate 0.3328   Epoch: 2   Global Step: 11500   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:24:57,418-Speed 10477.47 samples/sec   Loss 11.0632   LearningRate 0.3330   Epoch: 2   Global Step: 11510   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:25:05,253-Speed 10458.55 samples/sec   Loss 11.0683   LearningRate 0.3333   Epoch: 2   Global Step: 11520   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:25:13,098-Speed 10444.92 samples/sec   Loss 11.0756   LearningRate 0.3336   Epoch: 2   Global Step: 11530   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:25:21,030-Speed 10330.22 samples/sec   Loss 11.0931   LearningRate 0.3339   Epoch: 2   Global Step: 11540   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:25:28,853-Speed 10474.87 samples/sec   Loss 11.0739   LearningRate 0.3342   Epoch: 2   Global Step: 11550   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:25:36,678-Speed 10470.45 samples/sec   Loss 11.0108   LearningRate 0.3345   Epoch: 2   Global Step: 11560   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:25:44,509-Speed 10462.37 samples/sec   Loss 11.1432   LearningRate 0.3348   Epoch: 2   Global Step: 11570   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:25:52,339-Speed 10463.38 samples/sec   Loss 11.0157   LearningRate 0.3351   Epoch: 2   Global Step: 11580   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:26:00,174-Speed 10458.22 samples/sec   Loss 11.0591   LearningRate 0.3354   Epoch: 2   Global Step: 11590   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:26:08,014-Speed 10449.65 samples/sec   Loss 10.9985   LearningRate 0.3356   Epoch: 2   Global Step: 11600   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:26:15,887-Speed 10407.32 samples/sec   Loss 10.9622   LearningRate 0.3359   Epoch: 2   Global Step: 11610   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:26:23,716-Speed 10464.82 samples/sec   Loss 11.0827   LearningRate 0.3362   Epoch: 2   Global Step: 11620   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:26:31,529-Speed 10487.31 samples/sec   Loss 11.0876   LearningRate 0.3365   Epoch: 2   Global Step: 11630   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:26:39,387-Speed 10425.13 samples/sec   Loss 11.0972   LearningRate 0.3368   Epoch: 2   Global Step: 11640   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:26:47,240-Speed 10433.44 samples/sec   Loss 11.1535   LearningRate 0.3371   Epoch: 2   Global Step: 11650   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:26:55,077-Speed 10453.86 samples/sec   Loss 11.3383   LearningRate 0.3374   Epoch: 2   Global Step: 11660   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:27:02,915-Speed 10453.73 samples/sec   Loss 11.1599   LearningRate 0.3377   Epoch: 2   Global Step: 11670   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:27:10,751-Speed 10455.48 samples/sec   Loss 11.0985   LearningRate 0.3380   Epoch: 2   Global Step: 11680   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:27:18,572-Speed 10475.13 samples/sec   Loss 11.0811   LearningRate 0.3383   Epoch: 2   Global Step: 11690   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:27:26,408-Speed 10455.98 samples/sec   Loss 11.0538   LearningRate 0.3385   Epoch: 2   Global Step: 11700   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:27:34,243-Speed 10457.17 samples/sec   Loss 11.0359   LearningRate 0.3388   Epoch: 2   Global Step: 11710   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:27:42,068-Speed 10471.69 samples/sec   Loss 11.0721   LearningRate 0.3391   Epoch: 2   Global Step: 11720   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:27:49,925-Speed 10428.16 samples/sec   Loss 11.1705   LearningRate 0.3394   Epoch: 2   Global Step: 11730   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:27:57,762-Speed 10456.29 samples/sec   Loss 11.0841   LearningRate 0.3397   Epoch: 2   Global Step: 11740   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:28:05,582-Speed 10477.56 samples/sec   Loss 11.0069   LearningRate 0.3400   Epoch: 2   Global Step: 11750   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:28:13,405-Speed 10473.48 samples/sec   Loss 11.0484   LearningRate 0.3403   Epoch: 2   Global Step: 11760   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:28:21,296-Speed 10386.91 samples/sec   Loss 11.0909   LearningRate 0.3406   Epoch: 2   Global Step: 11770   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:28:29,185-Speed 10386.43 samples/sec   Loss 11.0518   LearningRate 0.3409   Epoch: 2   Global Step: 11780   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:28:37,020-Speed 10457.38 samples/sec   Loss 11.1126   LearningRate 0.3411   Epoch: 2   Global Step: 11790   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:28:44,845-Speed 10469.71 samples/sec   Loss 11.0786   LearningRate 0.3414   Epoch: 2   Global Step: 11800   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:28:52,710-Speed 10418.35 samples/sec   Loss 11.0570   LearningRate 0.3417   Epoch: 2   Global Step: 11810   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:29:00,502-Speed 10515.18 samples/sec   Loss 11.0853   LearningRate 0.3420   Epoch: 2   Global Step: 11820   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:29:08,304-Speed 10501.42 samples/sec   Loss 11.3346   LearningRate 0.3423   Epoch: 2   Global Step: 11830   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:29:16,180-Speed 10401.57 samples/sec   Loss 11.1900   LearningRate 0.3426   Epoch: 2   Global Step: 11840   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:29:23,988-Speed 10493.91 samples/sec   Loss 11.0431   LearningRate 0.3429   Epoch: 2   Global Step: 11850   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:29:31,791-Speed 10500.22 samples/sec   Loss 11.0203   LearningRate 0.3432   Epoch: 2   Global Step: 11860   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:29:39,611-Speed 10477.11 samples/sec   Loss 11.0805   LearningRate 0.3435   Epoch: 2   Global Step: 11870   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:29:47,403-Speed 10515.34 samples/sec   Loss 11.0941   LearningRate 0.3437   Epoch: 2   Global Step: 11880   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:29:55,258-Speed 10429.70 samples/sec   Loss 11.0368   LearningRate 0.3440   Epoch: 2   Global Step: 11890   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:30:03,078-Speed 10478.89 samples/sec   Loss 11.0515   LearningRate 0.3443   Epoch: 2   Global Step: 11900   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:30:10,881-Speed 10499.74 samples/sec   Loss 11.0608   LearningRate 0.3446   Epoch: 2   Global Step: 11910   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:30:18,690-Speed 10491.25 samples/sec   Loss 11.0354   LearningRate 0.3449   Epoch: 2   Global Step: 11920   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 17:30:26,521-Speed 10463.77 samples/sec   Loss 11.0403   LearningRate 0.3452   Epoch: 2   Global Step: 11930   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 17:30:34,329-Speed 10493.02 samples/sec   Loss 11.0120   LearningRate 0.3455   Epoch: 2   Global Step: 11940   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 17:30:42,192-Speed 10420.52 samples/sec   Loss 11.0770   LearningRate 0.3458   Epoch: 2   Global Step: 11950   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 17:30:50,009-Speed 10480.04 samples/sec   Loss 11.1610   LearningRate 0.3461   Epoch: 2   Global Step: 11960   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 17:30:57,866-Speed 10427.51 samples/sec   Loss 11.0925   LearningRate 0.3464   Epoch: 2   Global Step: 11970   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 17:31:05,686-Speed 10478.94 samples/sec   Loss 11.0927   LearningRate 0.3466   Epoch: 2   Global Step: 11980   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 17:31:13,526-Speed 10451.18 samples/sec   Loss 11.1206   LearningRate 0.3469   Epoch: 2   Global Step: 11990   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 17:31:21,331-Speed 10497.42 samples/sec   Loss 11.0411   LearningRate 0.3472   Epoch: 2   Global Step: 12000   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 17:31:29,135-Speed 10499.13 samples/sec   Loss 11.1667   LearningRate 0.3475   Epoch: 2   Global Step: 12010   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-15 17:31:36,944-Speed 10492.84 samples/sec   Loss 11.0826   LearningRate 0.3478   Epoch: 2   Global Step: 12020   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:31:44,757-Speed 10487.23 samples/sec   Loss 11.1820   LearningRate 0.3481   Epoch: 2   Global Step: 12030   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:31:52,610-Speed 10433.16 samples/sec   Loss 11.0456   LearningRate 0.3484   Epoch: 2   Global Step: 12040   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:32:00,477-Speed 10416.73 samples/sec   Loss 11.1058   LearningRate 0.3487   Epoch: 2   Global Step: 12050   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:32:08,324-Speed 10440.77 samples/sec   Loss 11.0163   LearningRate 0.3490   Epoch: 2   Global Step: 12060   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:32:16,177-Speed 10434.22 samples/sec   Loss 11.1148   LearningRate 0.3492   Epoch: 2   Global Step: 12070   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:32:23,987-Speed 10489.90 samples/sec   Loss 11.0939   LearningRate 0.3495   Epoch: 2   Global Step: 12080   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:32:31,781-Speed 10511.91 samples/sec   Loss 11.0226   LearningRate 0.3498   Epoch: 2   Global Step: 12090   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:32:39,571-Speed 10517.57 samples/sec   Loss 11.0764   LearningRate 0.3501   Epoch: 2   Global Step: 12100   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:32:47,387-Speed 10484.48 samples/sec   Loss 11.0820   LearningRate 0.3504   Epoch: 2   Global Step: 12110   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:32:55,215-Speed 10465.48 samples/sec   Loss 11.0292   LearningRate 0.3507   Epoch: 2   Global Step: 12120   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:33:03,151-Speed 10324.46 samples/sec   Loss 11.0863   LearningRate 0.3510   Epoch: 2   Global Step: 12130   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:33:10,980-Speed 10465.74 samples/sec   Loss 11.1281   LearningRate 0.3513   Epoch: 2   Global Step: 12140   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:33:18,755-Speed 10537.90 samples/sec   Loss 11.0614   LearningRate 0.3516   Epoch: 2   Global Step: 12150   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:33:26,545-Speed 10518.28 samples/sec   Loss 11.1026   LearningRate 0.3519   Epoch: 2   Global Step: 12160   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:33:34,324-Speed 10532.23 samples/sec   Loss 11.1431   LearningRate 0.3521   Epoch: 2   Global Step: 12170   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:33:42,141-Speed 10482.97 samples/sec   Loss 11.0713   LearningRate 0.3524   Epoch: 2   Global Step: 12180   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:33:49,936-Speed 10510.92 samples/sec   Loss 11.0866   LearningRate 0.3527   Epoch: 2   Global Step: 12190   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:33:57,735-Speed 10506.05 samples/sec   Loss 11.0217   LearningRate 0.3530   Epoch: 2   Global Step: 12200   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:34:05,548-Speed 10485.74 samples/sec   Loss 11.0795   LearningRate 0.3533   Epoch: 2   Global Step: 12210   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:34:13,372-Speed 10472.37 samples/sec   Loss 11.1051   LearningRate 0.3536   Epoch: 2   Global Step: 12220   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:34:21,200-Speed 10467.81 samples/sec   Loss 11.0990   LearningRate 0.3539   Epoch: 2   Global Step: 12230   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:34:29,002-Speed 10501.44 samples/sec   Loss 11.1782   LearningRate 0.3542   Epoch: 2   Global Step: 12240   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:34:36,782-Speed 10535.99 samples/sec   Loss 11.1739   LearningRate 0.3545   Epoch: 2   Global Step: 12250   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:34:44,578-Speed 10510.34 samples/sec   Loss 11.1488   LearningRate 0.3547   Epoch: 2   Global Step: 12260   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:34:52,391-Speed 10486.78 samples/sec   Loss 11.0959   LearningRate 0.3550   Epoch: 2   Global Step: 12270   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:35:00,210-Speed 10478.61 samples/sec   Loss 11.1439   LearningRate 0.3553   Epoch: 2   Global Step: 12280   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:35:08,025-Speed 10485.05 samples/sec   Loss 11.0445   LearningRate 0.3556   Epoch: 2   Global Step: 12290   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:35:15,828-Speed 10499.87 samples/sec   Loss 11.0640   LearningRate 0.3559   Epoch: 2   Global Step: 12300   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:35:23,664-Speed 10455.66 samples/sec   Loss 11.0357   LearningRate 0.3562   Epoch: 2   Global Step: 12310   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:35:31,503-Speed 10453.42 samples/sec   Loss 11.0688   LearningRate 0.3565   Epoch: 2   Global Step: 12320   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:35:39,297-Speed 10511.09 samples/sec   Loss 11.0493   LearningRate 0.3568   Epoch: 2   Global Step: 12330   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:35:47,107-Speed 10491.94 samples/sec   Loss 11.0447   LearningRate 0.3571   Epoch: 2   Global Step: 12340   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:35:54,919-Speed 10488.25 samples/sec   Loss 11.2155   LearningRate 0.3573   Epoch: 2   Global Step: 12350   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:36:02,732-Speed 10487.12 samples/sec   Loss 11.0893   LearningRate 0.3576   Epoch: 2   Global Step: 12360   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:36:10,527-Speed 10511.17 samples/sec   Loss 11.1274   LearningRate 0.3579   Epoch: 2   Global Step: 12370   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:36:18,326-Speed 10505.59 samples/sec   Loss 11.0582   LearningRate 0.3582   Epoch: 2   Global Step: 12380   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:36:26,162-Speed 10460.29 samples/sec   Loss 11.1313   LearningRate 0.3585   Epoch: 2   Global Step: 12390   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:36:33,959-Speed 10507.11 samples/sec   Loss 11.2474   LearningRate 0.3588   Epoch: 2   Global Step: 12400   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:36:41,763-Speed 10499.64 samples/sec   Loss 11.1819   LearningRate 0.3591   Epoch: 2   Global Step: 12410   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:36:49,554-Speed 10517.39 samples/sec   Loss 11.1066   LearningRate 0.3594   Epoch: 2   Global Step: 12420   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:36:57,360-Speed 10494.95 samples/sec   Loss 11.0853   LearningRate 0.3597   Epoch: 2   Global Step: 12430   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:37:05,148-Speed 10519.82 samples/sec   Loss 11.0858   LearningRate 0.3600   Epoch: 2   Global Step: 12440   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:37:12,942-Speed 10513.01 samples/sec   Loss 11.0876   LearningRate 0.3602   Epoch: 2   Global Step: 12450   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:37:20,738-Speed 10509.61 samples/sec   Loss 11.0547   LearningRate 0.3605   Epoch: 2   Global Step: 12460   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:37:28,591-Speed 10434.42 samples/sec   Loss 11.1442   LearningRate 0.3608   Epoch: 2   Global Step: 12470   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:37:36,437-Speed 10442.24 samples/sec   Loss 11.0813   LearningRate 0.3611   Epoch: 2   Global Step: 12480   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:37:44,259-Speed 10475.11 samples/sec   Loss 10.9981   LearningRate 0.3614   Epoch: 2   Global Step: 12490   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:37:52,102-Speed 10446.70 samples/sec   Loss 11.1024   LearningRate 0.3617   Epoch: 2   Global Step: 12500   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:37:59,963-Speed 10422.54 samples/sec   Loss 11.0372   LearningRate 0.3620   Epoch: 2   Global Step: 12510   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:38:07,759-Speed 10509.78 samples/sec   Loss 11.0784   LearningRate 0.3623   Epoch: 2   Global Step: 12520   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:38:15,534-Speed 10537.75 samples/sec   Loss 11.0505   LearningRate 0.3626   Epoch: 2   Global Step: 12530   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:38:23,324-Speed 10517.80 samples/sec   Loss 11.1460   LearningRate 0.3628   Epoch: 2   Global Step: 12540   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:38:31,132-Speed 10494.21 samples/sec   Loss 11.1691   LearningRate 0.3631   Epoch: 2   Global Step: 12550   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:38:38,930-Speed 10505.62 samples/sec   Loss 11.0876   LearningRate 0.3634   Epoch: 2   Global Step: 12560   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:38:46,723-Speed 10513.86 samples/sec   Loss 11.0933   LearningRate 0.3637   Epoch: 2   Global Step: 12570   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:38:54,559-Speed 10457.12 samples/sec   Loss 11.1167   LearningRate 0.3640   Epoch: 2   Global Step: 12580   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:39:02,377-Speed 10480.15 samples/sec   Loss 11.1933   LearningRate 0.3643   Epoch: 2   Global Step: 12590   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:39:10,224-Speed 10441.69 samples/sec   Loss 11.1063   LearningRate 0.3646   Epoch: 2   Global Step: 12600   Fp16 Grad Scale: 524288   Required: 20 hours
Training: 2022-01-15 17:39:18,014-Speed 10517.74 samples/sec   Loss 11.1771   LearningRate 0.3649   Epoch: 2   Global Step: 12610   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:39:25,849-Speed 10456.81 samples/sec   Loss 11.0595   LearningRate 0.3652   Epoch: 2   Global Step: 12620   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:39:33,649-Speed 10504.02 samples/sec   Loss 11.0798   LearningRate 0.3655   Epoch: 2   Global Step: 12630   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:39:41,451-Speed 10502.45 samples/sec   Loss 11.0977   LearningRate 0.3657   Epoch: 2   Global Step: 12640   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:39:49,258-Speed 10494.22 samples/sec   Loss 11.1585   LearningRate 0.3660   Epoch: 2   Global Step: 12650   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:39:57,058-Speed 10503.73 samples/sec   Loss 11.0931   LearningRate 0.3663   Epoch: 2   Global Step: 12660   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:40:04,877-Speed 10482.24 samples/sec   Loss 11.2376   LearningRate 0.3666   Epoch: 2   Global Step: 12670   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:40:12,687-Speed 10493.46 samples/sec   Loss 11.1691   LearningRate 0.3669   Epoch: 2   Global Step: 12680   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:40:20,478-Speed 10515.56 samples/sec   Loss 11.2703   LearningRate 0.3672   Epoch: 2   Global Step: 12690   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:40:28,304-Speed 10469.92 samples/sec   Loss 11.2477   LearningRate 0.3675   Epoch: 2   Global Step: 12700   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:40:36,095-Speed 10516.55 samples/sec   Loss 11.1244   LearningRate 0.3678   Epoch: 2   Global Step: 12710   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:40:43,898-Speed 10500.34 samples/sec   Loss 11.1160   LearningRate 0.3681   Epoch: 2   Global Step: 12720   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:40:51,702-Speed 10499.02 samples/sec   Loss 11.0949   LearningRate 0.3683   Epoch: 2   Global Step: 12730   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:40:59,509-Speed 10494.93 samples/sec   Loss 11.1304   LearningRate 0.3686   Epoch: 2   Global Step: 12740   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:41:07,333-Speed 10471.49 samples/sec   Loss 11.1129   LearningRate 0.3689   Epoch: 2   Global Step: 12750   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:41:15,109-Speed 10537.20 samples/sec   Loss 11.0514   LearningRate 0.3692   Epoch: 2   Global Step: 12760   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:41:22,902-Speed 10514.41 samples/sec   Loss 11.2199   LearningRate 0.3695   Epoch: 2   Global Step: 12770   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:41:30,724-Speed 10474.57 samples/sec   Loss 11.1238   LearningRate 0.3698   Epoch: 2   Global Step: 12780   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:41:38,626-Speed 10370.15 samples/sec   Loss 11.0739   LearningRate 0.3701   Epoch: 2   Global Step: 12790   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:41:46,429-Speed 10499.77 samples/sec   Loss 11.1500   LearningRate 0.3704   Epoch: 2   Global Step: 12800   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:41:54,228-Speed 10506.31 samples/sec   Loss 11.1013   LearningRate 0.3707   Epoch: 2   Global Step: 12810   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:42:02,037-Speed 10493.65 samples/sec   Loss 11.1497   LearningRate 0.3709   Epoch: 2   Global Step: 12820   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:42:09,830-Speed 10513.46 samples/sec   Loss 11.0987   LearningRate 0.3712   Epoch: 2   Global Step: 12830   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:42:17,628-Speed 10506.90 samples/sec   Loss 11.1568   LearningRate 0.3715   Epoch: 2   Global Step: 12840   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:42:25,418-Speed 10518.50 samples/sec   Loss 11.1918   LearningRate 0.3718   Epoch: 2   Global Step: 12850   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:42:33,195-Speed 10534.92 samples/sec   Loss 11.1326   LearningRate 0.3721   Epoch: 2   Global Step: 12860   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:42:41,026-Speed 10462.41 samples/sec   Loss 11.0957   LearningRate 0.3724   Epoch: 2   Global Step: 12870   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:42:48,813-Speed 10522.73 samples/sec   Loss 11.1358   LearningRate 0.3727   Epoch: 2   Global Step: 12880   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:42:56,633-Speed 10477.17 samples/sec   Loss 11.1380   LearningRate 0.3730   Epoch: 2   Global Step: 12890   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:43:04,442-Speed 10496.88 samples/sec   Loss 11.1679   LearningRate 0.3733   Epoch: 2   Global Step: 12900   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:43:12,225-Speed 10532.67 samples/sec   Loss 11.1577   LearningRate 0.3736   Epoch: 2   Global Step: 12910   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:43:20,013-Speed 10522.56 samples/sec   Loss 11.1871   LearningRate 0.3738   Epoch: 2   Global Step: 12920   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:43:27,846-Speed 10460.07 samples/sec   Loss 11.2618   LearningRate 0.3741   Epoch: 2   Global Step: 12930   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:43:35,649-Speed 10499.71 samples/sec   Loss 11.1821   LearningRate 0.3744   Epoch: 2   Global Step: 12940   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:43:43,438-Speed 10520.24 samples/sec   Loss 11.2407   LearningRate 0.3747   Epoch: 2   Global Step: 12950   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:43:51,232-Speed 10512.83 samples/sec   Loss 11.1870   LearningRate 0.3750   Epoch: 2   Global Step: 12960   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:43:59,053-Speed 10475.42 samples/sec   Loss 11.1993   LearningRate 0.3753   Epoch: 2   Global Step: 12970   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:44:06,852-Speed 10505.59 samples/sec   Loss 11.1663   LearningRate 0.3756   Epoch: 2   Global Step: 12980   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:44:14,647-Speed 10511.24 samples/sec   Loss 11.1156   LearningRate 0.3759   Epoch: 2   Global Step: 12990   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:44:22,503-Speed 10430.64 samples/sec   Loss 11.1230   LearningRate 0.3762   Epoch: 2   Global Step: 13000   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:44:30,300-Speed 10509.38 samples/sec   Loss 11.1376   LearningRate 0.3764   Epoch: 2   Global Step: 13010   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:44:38,098-Speed 10507.86 samples/sec   Loss 11.1273   LearningRate 0.3767   Epoch: 2   Global Step: 13020   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:44:45,902-Speed 10497.75 samples/sec   Loss 11.1065   LearningRate 0.3770   Epoch: 2   Global Step: 13030   Fp16 Grad Scale: 524288   Required: 20 hours
Training: 2022-01-15 17:44:53,665-Speed 10554.17 samples/sec   Loss 11.1460   LearningRate 0.3773   Epoch: 2   Global Step: 13040   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:45:01,499-Speed 10459.34 samples/sec   Loss 11.2171   LearningRate 0.3776   Epoch: 2   Global Step: 13050   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:45:09,327-Speed 10466.55 samples/sec   Loss 11.2399   LearningRate 0.3779   Epoch: 2   Global Step: 13060   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:45:17,174-Speed 10442.37 samples/sec   Loss 11.1518   LearningRate 0.3782   Epoch: 2   Global Step: 13070   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:45:24,949-Speed 10539.38 samples/sec   Loss 11.2482   LearningRate 0.3785   Epoch: 2   Global Step: 13080   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:45:32,753-Speed 10498.97 samples/sec   Loss 11.2064   LearningRate 0.3788   Epoch: 2   Global Step: 13090   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:45:40,578-Speed 10472.24 samples/sec   Loss 11.1599   LearningRate 0.3791   Epoch: 2   Global Step: 13100   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:45:48,377-Speed 10505.45 samples/sec   Loss 11.0665   LearningRate 0.3793   Epoch: 2   Global Step: 13110   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:45:56,182-Speed 10499.47 samples/sec   Loss 11.1183   LearningRate 0.3796   Epoch: 2   Global Step: 13120   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:46:04,043-Speed 10423.69 samples/sec   Loss 11.0925   LearningRate 0.3799   Epoch: 2   Global Step: 13130   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:46:11,850-Speed 10494.90 samples/sec   Loss 11.1393   LearningRate 0.3802   Epoch: 2   Global Step: 13140   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:46:19,654-Speed 10499.82 samples/sec   Loss 11.2383   LearningRate 0.3805   Epoch: 2   Global Step: 13150   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:46:27,493-Speed 10457.60 samples/sec   Loss 11.1904   LearningRate 0.3808   Epoch: 2   Global Step: 13160   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:46:35,343-Speed 10442.13 samples/sec   Loss 11.1475   LearningRate 0.3811   Epoch: 2   Global Step: 13170   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:46:43,168-Speed 10478.05 samples/sec   Loss 11.4029   LearningRate 0.3814   Epoch: 2   Global Step: 13180   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:46:50,992-Speed 10472.61 samples/sec   Loss 11.2161   LearningRate 0.3817   Epoch: 2   Global Step: 13190   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:46:58,833-Speed 10451.87 samples/sec   Loss 11.3283   LearningRate 0.3819   Epoch: 2   Global Step: 13200   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:47:06,637-Speed 10500.34 samples/sec   Loss 11.1342   LearningRate 0.3822   Epoch: 2   Global Step: 13210   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:47:14,452-Speed 10485.19 samples/sec   Loss 11.1879   LearningRate 0.3825   Epoch: 2   Global Step: 13220   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:47:22,249-Speed 10507.48 samples/sec   Loss 11.1589   LearningRate 0.3828   Epoch: 2   Global Step: 13230   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:47:30,067-Speed 10482.24 samples/sec   Loss 11.1753   LearningRate 0.3831   Epoch: 2   Global Step: 13240   Fp16 Grad Scale: 524288   Required: 20 hours
Training: 2022-01-15 17:47:37,874-Speed 10495.49 samples/sec   Loss 11.1835   LearningRate 0.3834   Epoch: 2   Global Step: 13250   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:47:45,699-Speed 10471.58 samples/sec   Loss 11.2122   LearningRate 0.3837   Epoch: 2   Global Step: 13260   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:47:53,499-Speed 10504.10 samples/sec   Loss 11.1867   LearningRate 0.3840   Epoch: 2   Global Step: 13270   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:48:01,343-Speed 10446.26 samples/sec   Loss 11.1795   LearningRate 0.3843   Epoch: 2   Global Step: 13280   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:48:09,145-Speed 10502.71 samples/sec   Loss 11.1222   LearningRate 0.3845   Epoch: 2   Global Step: 13290   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:48:16,951-Speed 10495.66 samples/sec   Loss 11.2074   LearningRate 0.3848   Epoch: 2   Global Step: 13300   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:48:24,786-Speed 10458.50 samples/sec   Loss 11.2291   LearningRate 0.3851   Epoch: 2   Global Step: 13310   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:48:32,620-Speed 10459.44 samples/sec   Loss 11.2779   LearningRate 0.3854   Epoch: 2   Global Step: 13320   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:48:40,421-Speed 10502.22 samples/sec   Loss 11.2377   LearningRate 0.3857   Epoch: 2   Global Step: 13330   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:48:48,219-Speed 10508.20 samples/sec   Loss 11.3052   LearningRate 0.3860   Epoch: 2   Global Step: 13340   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:48:56,019-Speed 10505.04 samples/sec   Loss 11.2529   LearningRate 0.3863   Epoch: 2   Global Step: 13350   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:49:03,805-Speed 10523.17 samples/sec   Loss 11.2933   LearningRate 0.3866   Epoch: 2   Global Step: 13360   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:49:11,633-Speed 10467.25 samples/sec   Loss 11.1773   LearningRate 0.3869   Epoch: 2   Global Step: 13370   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:49:19,437-Speed 10498.81 samples/sec   Loss 11.1668   LearningRate 0.3872   Epoch: 2   Global Step: 13380   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:49:27,240-Speed 10501.21 samples/sec   Loss 11.1327   LearningRate 0.3874   Epoch: 2   Global Step: 13390   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:49:35,058-Speed 10480.38 samples/sec   Loss 11.1186   LearningRate 0.3877   Epoch: 2   Global Step: 13400   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:49:42,857-Speed 10510.04 samples/sec   Loss 11.1404   LearningRate 0.3880   Epoch: 2   Global Step: 13410   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:49:50,680-Speed 10473.31 samples/sec   Loss 11.1363   LearningRate 0.3883   Epoch: 2   Global Step: 13420   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:49:58,467-Speed 10520.48 samples/sec   Loss 11.1994   LearningRate 0.3886   Epoch: 2   Global Step: 13430   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:50:06,268-Speed 10507.15 samples/sec   Loss 11.2238   LearningRate 0.3889   Epoch: 2   Global Step: 13440   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:50:14,052-Speed 10525.95 samples/sec   Loss 11.2633   LearningRate 0.3892   Epoch: 2   Global Step: 13450   Fp16 Grad Scale: 524288   Required: 20 hours
Training: 2022-01-15 17:50:21,832-Speed 10531.26 samples/sec   Loss 11.1666   LearningRate 0.3895   Epoch: 2   Global Step: 13460   Fp16 Grad Scale: 524288   Required: 20 hours
Training: 2022-01-15 17:50:29,662-Speed 10464.51 samples/sec   Loss 11.1303   LearningRate 0.3898   Epoch: 2   Global Step: 13470   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:50:37,462-Speed 10503.83 samples/sec   Loss 11.2371   LearningRate 0.3900   Epoch: 2   Global Step: 13480   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:50:45,266-Speed 10499.84 samples/sec   Loss 11.2361   LearningRate 0.3903   Epoch: 2   Global Step: 13490   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:50:53,071-Speed 10497.84 samples/sec   Loss 11.3397   LearningRate 0.3906   Epoch: 2   Global Step: 13500   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:51:00,864-Speed 10515.32 samples/sec   Loss 11.2244   LearningRate 0.3909   Epoch: 2   Global Step: 13510   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:51:08,637-Speed 10540.52 samples/sec   Loss 11.1677   LearningRate 0.3912   Epoch: 2   Global Step: 13520   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:51:16,407-Speed 10545.55 samples/sec   Loss 11.2854   LearningRate 0.3915   Epoch: 2   Global Step: 13530   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:51:24,196-Speed 10519.57 samples/sec   Loss 11.2405   LearningRate 0.3918   Epoch: 2   Global Step: 13540   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:51:31,980-Speed 10526.00 samples/sec   Loss 11.2573   LearningRate 0.3921   Epoch: 2   Global Step: 13550   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:51:39,782-Speed 10501.30 samples/sec   Loss 11.2180   LearningRate 0.3924   Epoch: 2   Global Step: 13560   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:51:47,585-Speed 10499.64 samples/sec   Loss 11.2469   LearningRate 0.3927   Epoch: 2   Global Step: 13570   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:51:55,380-Speed 10510.42 samples/sec   Loss 11.1716   LearningRate 0.3929   Epoch: 2   Global Step: 13580   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:52:03,170-Speed 10518.55 samples/sec   Loss 11.0682   LearningRate 0.3932   Epoch: 2   Global Step: 13590   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:52:10,969-Speed 10504.93 samples/sec   Loss 11.2394   LearningRate 0.3935   Epoch: 2   Global Step: 13600   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:52:18,753-Speed 10525.54 samples/sec   Loss 11.2716   LearningRate 0.3938   Epoch: 2   Global Step: 13610   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:52:26,550-Speed 10508.34 samples/sec   Loss 11.2205   LearningRate 0.3941   Epoch: 2   Global Step: 13620   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:52:34,339-Speed 10517.61 samples/sec   Loss 11.3248   LearningRate 0.3944   Epoch: 2   Global Step: 13630   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:52:42,111-Speed 10542.19 samples/sec   Loss 11.2070   LearningRate 0.3947   Epoch: 2   Global Step: 13640   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:52:49,936-Speed 10470.49 samples/sec   Loss 11.2795   LearningRate 0.3950   Epoch: 2   Global Step: 13650   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:52:57,727-Speed 10515.98 samples/sec   Loss 11.1439   LearningRate 0.3953   Epoch: 2   Global Step: 13660   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:53:05,553-Speed 10469.56 samples/sec   Loss 11.2655   LearningRate 0.3955   Epoch: 2   Global Step: 13670   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:53:13,372-Speed 10478.07 samples/sec   Loss 12.0651   LearningRate 0.3958   Epoch: 2   Global Step: 13680   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:53:21,165-Speed 10514.03 samples/sec   Loss 13.2573   LearningRate 0.3961   Epoch: 2   Global Step: 13690   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:53:28,948-Speed 10528.49 samples/sec   Loss 12.5850   LearningRate 0.3964   Epoch: 2   Global Step: 13700   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:53:36,739-Speed 10516.53 samples/sec   Loss 11.9859   LearningRate 0.3967   Epoch: 2   Global Step: 13710   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:53:44,517-Speed 10534.98 samples/sec   Loss 11.4907   LearningRate 0.3970   Epoch: 2   Global Step: 13720   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:53:52,333-Speed 10484.74 samples/sec   Loss 11.4102   LearningRate 0.3973   Epoch: 2   Global Step: 13730   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:54:00,150-Speed 10481.33 samples/sec   Loss 11.2825   LearningRate 0.3976   Epoch: 2   Global Step: 13740   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:54:07,955-Speed 10499.13 samples/sec   Loss 11.2098   LearningRate 0.3979   Epoch: 2   Global Step: 13750   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:54:15,749-Speed 10512.03 samples/sec   Loss 11.1532   LearningRate 0.3981   Epoch: 2   Global Step: 13760   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:54:23,549-Speed 10507.08 samples/sec   Loss 11.0963   LearningRate 0.3984   Epoch: 2   Global Step: 13770   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:54:31,380-Speed 10464.98 samples/sec   Loss 11.2152   LearningRate 0.3987   Epoch: 2   Global Step: 13780   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:54:39,189-Speed 10491.53 samples/sec   Loss 11.1040   LearningRate 0.3990   Epoch: 2   Global Step: 13790   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:54:47,012-Speed 10473.06 samples/sec   Loss 11.1940   LearningRate 0.3993   Epoch: 2   Global Step: 13800   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:54:54,788-Speed 10537.76 samples/sec   Loss 11.1627   LearningRate 0.3996   Epoch: 2   Global Step: 13810   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:55:02,588-Speed 10511.17 samples/sec   Loss 11.1589   LearningRate 0.3999   Epoch: 2   Global Step: 13820   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:55:10,386-Speed 10505.49 samples/sec   Loss 11.2428   LearningRate 0.4002   Epoch: 2   Global Step: 13830   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:55:18,207-Speed 10476.11 samples/sec   Loss 11.1386   LearningRate 0.4005   Epoch: 2   Global Step: 13840   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:55:26,004-Speed 10511.40 samples/sec   Loss 11.3230   LearningRate 0.4008   Epoch: 2   Global Step: 13850   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:55:33,788-Speed 10525.71 samples/sec   Loss 11.2764   LearningRate 0.4010   Epoch: 2   Global Step: 13860   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:55:41,575-Speed 10521.86 samples/sec   Loss 11.1915   LearningRate 0.4013   Epoch: 2   Global Step: 13870   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:55:49,360-Speed 10523.11 samples/sec   Loss 11.2416   LearningRate 0.4016   Epoch: 2   Global Step: 13880   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:55:57,130-Speed 10545.66 samples/sec   Loss 11.1974   LearningRate 0.4019   Epoch: 2   Global Step: 13890   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:56:04,917-Speed 10522.04 samples/sec   Loss 11.3124   LearningRate 0.4022   Epoch: 2   Global Step: 13900   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:56:12,703-Speed 10524.13 samples/sec   Loss 11.2227   LearningRate 0.4025   Epoch: 2   Global Step: 13910   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:56:20,472-Speed 10546.24 samples/sec   Loss 11.3028   LearningRate 0.4028   Epoch: 2   Global Step: 13920   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:56:28,257-Speed 10524.19 samples/sec   Loss 11.2438   LearningRate 0.4031   Epoch: 2   Global Step: 13930   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:56:36,049-Speed 10515.32 samples/sec   Loss 11.3071   LearningRate 0.4034   Epoch: 2   Global Step: 13940   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:56:43,900-Speed 10437.58 samples/sec   Loss 11.2348   LearningRate 0.4036   Epoch: 2   Global Step: 13950   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:56:51,678-Speed 10533.51 samples/sec   Loss 11.3294   LearningRate 0.4039   Epoch: 2   Global Step: 13960   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:56:59,466-Speed 10520.02 samples/sec   Loss 11.2359   LearningRate 0.4042   Epoch: 2   Global Step: 13970   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:57:07,271-Speed 10498.22 samples/sec   Loss 11.2516   LearningRate 0.4045   Epoch: 2   Global Step: 13980   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:57:15,043-Speed 10541.72 samples/sec   Loss 11.2812   LearningRate 0.4048   Epoch: 2   Global Step: 13990   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:57:22,854-Speed 10488.43 samples/sec   Loss 11.3345   LearningRate 0.4051   Epoch: 2   Global Step: 14000   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:57:30,672-Speed 10481.51 samples/sec   Loss 11.3303   LearningRate 0.4054   Epoch: 2   Global Step: 14010   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:57:38,454-Speed 10528.43 samples/sec   Loss 11.2954   LearningRate 0.4057   Epoch: 2   Global Step: 14020   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:57:46,274-Speed 10478.25 samples/sec   Loss 11.2596   LearningRate 0.4060   Epoch: 2   Global Step: 14030   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:57:54,070-Speed 10509.50 samples/sec   Loss 11.3127   LearningRate 0.4062   Epoch: 2   Global Step: 14040   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:58:01,843-Speed 10541.46 samples/sec   Loss 11.4430   LearningRate 0.4065   Epoch: 2   Global Step: 14050   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:58:09,613-Speed 10545.68 samples/sec   Loss 11.2597   LearningRate 0.4068   Epoch: 2   Global Step: 14060   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:58:17,407-Speed 10511.90 samples/sec   Loss 11.2783   LearningRate 0.4071   Epoch: 2   Global Step: 14070   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:58:25,191-Speed 10525.26 samples/sec   Loss 11.3027   LearningRate 0.4074   Epoch: 2   Global Step: 14080   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:58:33,004-Speed 10489.06 samples/sec   Loss 11.2588   LearningRate 0.4077   Epoch: 2   Global Step: 14090   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:58:40,827-Speed 10474.69 samples/sec   Loss 11.2712   LearningRate 0.4080   Epoch: 2   Global Step: 14100   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:58:48,684-Speed 10427.15 samples/sec   Loss 11.3011   LearningRate 0.4083   Epoch: 2   Global Step: 14110   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:58:56,492-Speed 10494.32 samples/sec   Loss 11.3248   LearningRate 0.4086   Epoch: 2   Global Step: 14120   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:59:04,293-Speed 10509.08 samples/sec   Loss 11.2896   LearningRate 0.4089   Epoch: 2   Global Step: 14130   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 17:59:12,095-Speed 10501.75 samples/sec   Loss 11.2286   LearningRate 0.4091   Epoch: 2   Global Step: 14140   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:59:19,914-Speed 10477.72 samples/sec   Loss 11.3222   LearningRate 0.4094   Epoch: 2   Global Step: 14150   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:59:27,705-Speed 10517.53 samples/sec   Loss 11.3170   LearningRate 0.4097   Epoch: 2   Global Step: 14160   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:59:35,513-Speed 10493.98 samples/sec   Loss 11.2199   LearningRate 0.4100   Epoch: 2   Global Step: 14170   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:59:43,332-Speed 10479.38 samples/sec   Loss 11.3206   LearningRate 0.4103   Epoch: 2   Global Step: 14180   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:59:51,132-Speed 10507.08 samples/sec   Loss 11.3088   LearningRate 0.4106   Epoch: 2   Global Step: 14190   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 17:59:58,948-Speed 10483.24 samples/sec   Loss 11.3914   LearningRate 0.4109   Epoch: 2   Global Step: 14200   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:00:06,754-Speed 10497.52 samples/sec   Loss 11.3178   LearningRate 0.4112   Epoch: 2   Global Step: 14210   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:00:14,575-Speed 10476.54 samples/sec   Loss 11.2491   LearningRate 0.4115   Epoch: 2   Global Step: 14220   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:00:22,392-Speed 10481.58 samples/sec   Loss 11.2652   LearningRate 0.4117   Epoch: 2   Global Step: 14230   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:00:30,204-Speed 10488.65 samples/sec   Loss 11.3220   LearningRate 0.4120   Epoch: 2   Global Step: 14240   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:00:38,037-Speed 10459.60 samples/sec   Loss 11.3701   LearningRate 0.4123   Epoch: 2   Global Step: 14250   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:00:45,866-Speed 10465.45 samples/sec   Loss 11.2871   LearningRate 0.4126   Epoch: 2   Global Step: 14260   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:00:53,712-Speed 10444.75 samples/sec   Loss 11.3335   LearningRate 0.4129   Epoch: 2   Global Step: 14270   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:01:01,512-Speed 10503.64 samples/sec   Loss 11.2869   LearningRate 0.4132   Epoch: 2   Global Step: 14280   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:01:09,308-Speed 10511.03 samples/sec   Loss 11.2795   LearningRate 0.4135   Epoch: 2   Global Step: 14290   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:01:17,109-Speed 10507.38 samples/sec   Loss 11.3423   LearningRate 0.4138   Epoch: 2   Global Step: 14300   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:01:24,917-Speed 10493.08 samples/sec   Loss 11.3457   LearningRate 0.4141   Epoch: 2   Global Step: 14310   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:01:32,713-Speed 10509.03 samples/sec   Loss 11.3680   LearningRate 0.4144   Epoch: 2   Global Step: 14320   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:01:40,500-Speed 10526.76 samples/sec   Loss 11.3425   LearningRate 0.4146   Epoch: 2   Global Step: 14330   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:01:48,308-Speed 10496.53 samples/sec   Loss 11.3496   LearningRate 0.4149   Epoch: 2   Global Step: 14340   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:01:56,081-Speed 10539.93 samples/sec   Loss 11.2907   LearningRate 0.4152   Epoch: 2   Global Step: 14350   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:02:03,927-Speed 10448.04 samples/sec   Loss 11.3107   LearningRate 0.4155   Epoch: 2   Global Step: 14360   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:02:11,718-Speed 10517.44 samples/sec   Loss 11.2975   LearningRate 0.4158   Epoch: 2   Global Step: 14370   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:02:19,494-Speed 10536.13 samples/sec   Loss 11.2613   LearningRate 0.4161   Epoch: 2   Global Step: 14380   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:02:27,284-Speed 10518.20 samples/sec   Loss 11.3151   LearningRate 0.4164   Epoch: 2   Global Step: 14390   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:02:35,080-Speed 10509.28 samples/sec   Loss 11.3425   LearningRate 0.4167   Epoch: 2   Global Step: 14400   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:02:42,911-Speed 10462.76 samples/sec   Loss 11.5044   LearningRate 0.4170   Epoch: 2   Global Step: 14410   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:02:50,689-Speed 10539.67 samples/sec   Loss 11.4723   LearningRate 0.4172   Epoch: 2   Global Step: 14420   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:02:58,488-Speed 10506.12 samples/sec   Loss 11.3887   LearningRate 0.4175   Epoch: 2   Global Step: 14430   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:03:06,270-Speed 10528.68 samples/sec   Loss 11.3700   LearningRate 0.4178   Epoch: 2   Global Step: 14440   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 18:03:14,092-Speed 10474.83 samples/sec   Loss 11.3231   LearningRate 0.4181   Epoch: 2   Global Step: 14450   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 18:03:21,980-Speed 10388.79 samples/sec   Loss 11.2703   LearningRate 0.4184   Epoch: 2   Global Step: 14460   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 18:03:29,783-Speed 10500.35 samples/sec   Loss 11.4473   LearningRate 0.4187   Epoch: 2   Global Step: 14470   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 18:03:37,600-Speed 10480.49 samples/sec   Loss 11.3316   LearningRate 0.4190   Epoch: 2   Global Step: 14480   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 18:03:45,384-Speed 10526.38 samples/sec   Loss 11.3035   LearningRate 0.4193   Epoch: 2   Global Step: 14490   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 18:03:53,174-Speed 10517.69 samples/sec   Loss 11.2937   LearningRate 0.4196   Epoch: 2   Global Step: 14500   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 18:04:00,963-Speed 10518.81 samples/sec   Loss 11.3627   LearningRate 0.4198   Epoch: 2   Global Step: 14510   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 18:04:08,756-Speed 10514.39 samples/sec   Loss 11.4959   LearningRate 0.4201   Epoch: 2   Global Step: 14520   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 18:04:16,609-Speed 10433.26 samples/sec   Loss 11.4147   LearningRate 0.4204   Epoch: 2   Global Step: 14530   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 18:04:24,446-Speed 10455.22 samples/sec   Loss 11.3157   LearningRate 0.4207   Epoch: 2   Global Step: 14540   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:04:32,263-Speed 10482.90 samples/sec   Loss 11.3228   LearningRate 0.4210   Epoch: 2   Global Step: 14550   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:04:40,100-Speed 10455.24 samples/sec   Loss 11.2717   LearningRate 0.4213   Epoch: 2   Global Step: 14560   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:04:47,917-Speed 10480.87 samples/sec   Loss 11.4113   LearningRate 0.4216   Epoch: 2   Global Step: 14570   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:04:55,751-Speed 10459.43 samples/sec   Loss 11.2948   LearningRate 0.4219   Epoch: 2   Global Step: 14580   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 18:05:03,601-Speed 10438.21 samples/sec   Loss 11.3428   LearningRate 0.4222   Epoch: 2   Global Step: 14590   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 18:05:11,437-Speed 10454.80 samples/sec   Loss 11.3661   LearningRate 0.4225   Epoch: 2   Global Step: 14600   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 18:05:19,273-Speed 10457.08 samples/sec   Loss 11.3019   LearningRate 0.4227   Epoch: 2   Global Step: 14610   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 18:05:27,077-Speed 10499.12 samples/sec   Loss 11.4225   LearningRate 0.4230   Epoch: 2   Global Step: 14620   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 18:05:34,907-Speed 10464.35 samples/sec   Loss 11.3066   LearningRate 0.4233   Epoch: 2   Global Step: 14630   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 18:05:42,694-Speed 10523.53 samples/sec   Loss 11.3711   LearningRate 0.4236   Epoch: 2   Global Step: 14640   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 18:05:50,476-Speed 10530.27 samples/sec   Loss 11.3260   LearningRate 0.4239   Epoch: 2   Global Step: 14650   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 18:05:58,276-Speed 10504.12 samples/sec   Loss 11.3512   LearningRate 0.4242   Epoch: 2   Global Step: 14660   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 18:06:06,080-Speed 10499.55 samples/sec   Loss 11.4070   LearningRate 0.4245   Epoch: 2   Global Step: 14670   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-15 18:06:13,925-Speed 10445.30 samples/sec   Loss 11.3927   LearningRate 0.4248   Epoch: 2   Global Step: 14680   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:06:21,735-Speed 10490.66 samples/sec   Loss 11.5101   LearningRate 0.4251   Epoch: 2   Global Step: 14690   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:06:29,575-Speed 10453.27 samples/sec   Loss 11.4411   LearningRate 0.4253   Epoch: 2   Global Step: 14700   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:06:37,395-Speed 10477.68 samples/sec   Loss 11.3339   LearningRate 0.4256   Epoch: 2   Global Step: 14710   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:06:45,181-Speed 10524.53 samples/sec   Loss 11.3067   LearningRate 0.4259   Epoch: 2   Global Step: 14720   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:06:52,984-Speed 10499.49 samples/sec   Loss 11.3913   LearningRate 0.4262   Epoch: 2   Global Step: 14730   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:07:00,783-Speed 10506.51 samples/sec   Loss 11.3620   LearningRate 0.4265   Epoch: 2   Global Step: 14740   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:07:08,574-Speed 10516.85 samples/sec   Loss 11.3724   LearningRate 0.4268   Epoch: 2   Global Step: 14750   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:07:16,383-Speed 10491.33 samples/sec   Loss 11.3390   LearningRate 0.4271   Epoch: 2   Global Step: 14760   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-15 18:07:24,220-Speed 10453.69 samples/sec   Loss 11.3780   LearningRate 0.4274   Epoch: 2   Global Step: 14770   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:07:31,998-Speed 10533.77 samples/sec   Loss 11.2857   LearningRate 0.4277   Epoch: 2   Global Step: 14780   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:07:39,792-Speed 10513.96 samples/sec   Loss 11.6025   LearningRate 0.4280   Epoch: 2   Global Step: 14790   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:07:47,604-Speed 10487.51 samples/sec   Loss 11.4860   LearningRate 0.4282   Epoch: 2   Global Step: 14800   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:07:55,397-Speed 10513.83 samples/sec   Loss 11.3576   LearningRate 0.4285   Epoch: 2   Global Step: 14810   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:08:03,188-Speed 10515.41 samples/sec   Loss 11.3811   LearningRate 0.4288   Epoch: 2   Global Step: 14820   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:08:10,990-Speed 10502.24 samples/sec   Loss 11.3641   LearningRate 0.4291   Epoch: 2   Global Step: 14830   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:08:18,839-Speed 10438.86 samples/sec   Loss 11.3962   LearningRate 0.4294   Epoch: 2   Global Step: 14840   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:08:26,615-Speed 10536.77 samples/sec   Loss 11.3557   LearningRate 0.4297   Epoch: 2   Global Step: 14850   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:08:34,413-Speed 10507.22 samples/sec   Loss 11.4245   LearningRate 0.4300   Epoch: 2   Global Step: 14860   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:08:42,220-Speed 10494.82 samples/sec   Loss 11.3670   LearningRate 0.4303   Epoch: 2   Global Step: 14870   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:08:50,036-Speed 10483.24 samples/sec   Loss 11.3829   LearningRate 0.4306   Epoch: 2   Global Step: 14880   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:08:57,829-Speed 10513.84 samples/sec   Loss 11.4363   LearningRate 0.4308   Epoch: 2   Global Step: 14890   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:09:05,638-Speed 10492.16 samples/sec   Loss 11.4235   LearningRate 0.4311   Epoch: 2   Global Step: 14900   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:09:13,511-Speed 10407.27 samples/sec   Loss 11.3923   LearningRate 0.4314   Epoch: 2   Global Step: 14910   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:09:21,341-Speed 10472.27 samples/sec   Loss 11.4905   LearningRate 0.4317   Epoch: 2   Global Step: 14920   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:09:29,167-Speed 10468.46 samples/sec   Loss 11.4317   LearningRate 0.4320   Epoch: 2   Global Step: 14930   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:09:36,950-Speed 10528.89 samples/sec   Loss 11.3597   LearningRate 0.4323   Epoch: 2   Global Step: 14940   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:09:44,750-Speed 10504.66 samples/sec   Loss 11.5007   LearningRate 0.4326   Epoch: 2   Global Step: 14950   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:09:52,552-Speed 10501.09 samples/sec   Loss 11.4424   LearningRate 0.4329   Epoch: 2   Global Step: 14960   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:10:00,380-Speed 10472.72 samples/sec   Loss 11.4122   LearningRate 0.4332   Epoch: 2   Global Step: 14970   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:10:08,191-Speed 10490.28 samples/sec   Loss 11.3921   LearningRate 0.4334   Epoch: 2   Global Step: 14980   Fp16 Grad Scale: 524288   Required: 19 hours
Training: 2022-01-15 18:10:16,001-Speed 10498.12 samples/sec   Loss 11.4477   LearningRate 0.4337   Epoch: 2   Global Step: 14990   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:10:23,825-Speed 10472.96 samples/sec   Loss 11.3779   LearningRate 0.4340   Epoch: 2   Global Step: 15000   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:10:31,621-Speed 10512.91 samples/sec   Loss 11.4120   LearningRate 0.4343   Epoch: 2   Global Step: 15010   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:10:39,467-Speed 10442.63 samples/sec   Loss 11.5130   LearningRate 0.4346   Epoch: 2   Global Step: 15020   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:10:47,253-Speed 10521.97 samples/sec   Loss 11.3883   LearningRate 0.4349   Epoch: 2   Global Step: 15030   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:10:55,073-Speed 10481.20 samples/sec   Loss 11.5630   LearningRate 0.4352   Epoch: 2   Global Step: 15040   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:11:02,953-Speed 10397.19 samples/sec   Loss 11.4459   LearningRate 0.4355   Epoch: 2   Global Step: 15050   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:11:10,754-Speed 10503.11 samples/sec   Loss 11.4011   LearningRate 0.4358   Epoch: 2   Global Step: 15060   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:11:18,535-Speed 10530.12 samples/sec   Loss 11.4070   LearningRate 0.4361   Epoch: 2   Global Step: 15070   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:11:26,329-Speed 10512.46 samples/sec   Loss 11.4384   LearningRate 0.4363   Epoch: 2   Global Step: 15080   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:11:34,116-Speed 10521.73 samples/sec   Loss 11.4152   LearningRate 0.4366   Epoch: 2   Global Step: 15090   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:11:41,926-Speed 10490.07 samples/sec   Loss 11.3783   LearningRate 0.4369   Epoch: 2   Global Step: 15100   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:11:49,745-Speed 10479.40 samples/sec   Loss 11.4876   LearningRate 0.4372   Epoch: 2   Global Step: 15110   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:11:57,567-Speed 10475.39 samples/sec   Loss 11.4778   LearningRate 0.4375   Epoch: 2   Global Step: 15120   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:12:05,383-Speed 10486.68 samples/sec   Loss 11.3581   LearningRate 0.4378   Epoch: 2   Global Step: 15130   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:12:13,188-Speed 10497.88 samples/sec   Loss 11.3862   LearningRate 0.4381   Epoch: 2   Global Step: 15140   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:12:21,006-Speed 10481.38 samples/sec   Loss 11.4255   LearningRate 0.4384   Epoch: 2   Global Step: 15150   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:12:28,822-Speed 10482.60 samples/sec   Loss 11.5058   LearningRate 0.4387   Epoch: 2   Global Step: 15160   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:12:36,654-Speed 10461.63 samples/sec   Loss 11.5171   LearningRate 0.4389   Epoch: 2   Global Step: 15170   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:12:44,441-Speed 10521.89 samples/sec   Loss 11.4282   LearningRate 0.4392   Epoch: 2   Global Step: 15180   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:12:52,274-Speed 10460.28 samples/sec   Loss 11.4158   LearningRate 0.4395   Epoch: 2   Global Step: 15190   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:13:00,062-Speed 10521.33 samples/sec   Loss 11.4215   LearningRate 0.4398   Epoch: 2   Global Step: 15200   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:13:07,856-Speed 10512.68 samples/sec   Loss 11.5535   LearningRate 0.4401   Epoch: 2   Global Step: 15210   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:13:15,663-Speed 10495.74 samples/sec   Loss 11.5299   LearningRate 0.4404   Epoch: 2   Global Step: 15220   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:13:23,475-Speed 10492.76 samples/sec   Loss 11.5624   LearningRate 0.4407   Epoch: 2   Global Step: 15230   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:13:31,275-Speed 10503.78 samples/sec   Loss 11.5520   LearningRate 0.4410   Epoch: 2   Global Step: 15240   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:13:39,044-Speed 10545.29 samples/sec   Loss 12.3371   LearningRate 0.4413   Epoch: 2   Global Step: 15250   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-01-15 18:13:46,841-Speed 10509.86 samples/sec   Loss 12.8725   LearningRate 0.4416   Epoch: 2   Global Step: 15260   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-01-15 18:13:54,640-Speed 10506.33 samples/sec   Loss 13.3339   LearningRate 0.4418   Epoch: 2   Global Step: 15270   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-01-15 18:14:02,426-Speed 10522.90 samples/sec   Loss 12.5485   LearningRate 0.4421   Epoch: 2   Global Step: 15280   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-01-15 18:14:10,219-Speed 10513.93 samples/sec   Loss 11.9106   LearningRate 0.4424   Epoch: 2   Global Step: 15290   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-01-15 18:14:18,017-Speed 10507.85 samples/sec   Loss 11.6442   LearningRate 0.4427   Epoch: 2   Global Step: 15300   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-01-15 18:14:25,802-Speed 10526.19 samples/sec   Loss 11.5157   LearningRate 0.4430   Epoch: 2   Global Step: 15310   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-01-15 18:14:33,594-Speed 10514.73 samples/sec   Loss 11.4547   LearningRate 0.4433   Epoch: 2   Global Step: 15320   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-01-15 18:14:41,409-Speed 10483.53 samples/sec   Loss 11.4201   LearningRate 0.4436   Epoch: 2   Global Step: 15330   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-01-15 18:14:49,205-Speed 10509.76 samples/sec   Loss 11.4542   LearningRate 0.4439   Epoch: 2   Global Step: 15340   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-01-15 18:14:56,997-Speed 10516.89 samples/sec   Loss 11.4268   LearningRate 0.4442   Epoch: 2   Global Step: 15350   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-15 18:15:04,811-Speed 10486.07 samples/sec   Loss 11.4493   LearningRate 0.4444   Epoch: 2   Global Step: 15360   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-15 18:15:12,607-Speed 10509.83 samples/sec   Loss 11.4283   LearningRate 0.4447   Epoch: 2   Global Step: 15370   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-15 18:15:20,401-Speed 10512.94 samples/sec   Loss 11.5602   LearningRate 0.4450   Epoch: 2   Global Step: 15380   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-15 18:15:28,197-Speed 10509.35 samples/sec   Loss 11.5100   LearningRate 0.4453   Epoch: 2   Global Step: 15390   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-15 18:15:35,992-Speed 10512.50 samples/sec   Loss 11.4934   LearningRate 0.4456   Epoch: 2   Global Step: 15400   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-15 18:15:43,809-Speed 10481.78 samples/sec   Loss 11.4690   LearningRate 0.4459   Epoch: 2   Global Step: 15410   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-15 18:15:51,622-Speed 10485.86 samples/sec   Loss 11.4914   LearningRate 0.4462   Epoch: 2   Global Step: 15420   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-15 18:15:59,395-Speed 10541.28 samples/sec   Loss 11.4387   LearningRate 0.4465   Epoch: 2   Global Step: 15430   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-15 18:16:07,184-Speed 10519.57 samples/sec   Loss 11.5773   LearningRate 0.4468   Epoch: 2   Global Step: 15440   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-15 18:16:15,014-Speed 10464.93 samples/sec   Loss 11.5213   LearningRate 0.4470   Epoch: 2   Global Step: 15450   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-15 18:16:22,813-Speed 10504.72 samples/sec   Loss 11.5333   LearningRate 0.4473   Epoch: 2   Global Step: 15460   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-15 18:16:30,615-Speed 10501.49 samples/sec   Loss 11.5515   LearningRate 0.4476   Epoch: 2   Global Step: 15470   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-15 18:16:38,428-Speed 10487.25 samples/sec   Loss 11.5726   LearningRate 0.4479   Epoch: 2   Global Step: 15480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-15 18:16:46,238-Speed 10490.02 samples/sec   Loss 11.5856   LearningRate 0.4482   Epoch: 2   Global Step: 15490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-15 18:16:54,067-Speed 10465.16 samples/sec   Loss 11.5398   LearningRate 0.4485   Epoch: 2   Global Step: 15500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-15 18:17:01,875-Speed 10500.70 samples/sec   Loss 11.4546   LearningRate 0.4488   Epoch: 2   Global Step: 15510   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-15 18:17:09,656-Speed 10529.75 samples/sec   Loss 11.5021   LearningRate 0.4491   Epoch: 2   Global Step: 15520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-15 18:17:17,465-Speed 10492.99 samples/sec   Loss 11.6297   LearningRate 0.4494   Epoch: 2   Global Step: 15530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-15 18:17:25,265-Speed 10504.72 samples/sec   Loss 11.5098   LearningRate 0.4497   Epoch: 2   Global Step: 15540   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-15 18:17:33,098-Speed 10460.56 samples/sec   Loss 11.6274   LearningRate 0.4499   Epoch: 2   Global Step: 15550   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:17:55,748-Speed 3616.96 samples/sec   Loss 11.6290   LearningRate 0.4502   Epoch: 3   Global Step: 15560   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:18:03,519-Speed 10544.09 samples/sec   Loss 11.4777   LearningRate 0.4505   Epoch: 3   Global Step: 15570   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:18:11,281-Speed 10555.98 samples/sec   Loss 11.5090   LearningRate 0.4508   Epoch: 3   Global Step: 15580   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:18:19,031-Speed 10572.07 samples/sec   Loss 11.4857   LearningRate 0.4511   Epoch: 3   Global Step: 15590   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:18:26,798-Speed 10549.32 samples/sec   Loss 11.5063   LearningRate 0.4514   Epoch: 3   Global Step: 15600   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:18:34,595-Speed 10508.31 samples/sec   Loss 11.6928   LearningRate 0.4517   Epoch: 3   Global Step: 15610   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-15 18:18:42,368-Speed 10540.12 samples/sec   Loss 11.5525   LearningRate 0.4520   Epoch: 3   Global Step: 15620   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-15 18:18:50,142-Speed 10538.60 samples/sec   Loss 11.4708   LearningRate 0.4523   Epoch: 3   Global Step: 15630   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-15 18:18:57,941-Speed 10506.57 samples/sec   Loss 11.8085   LearningRate 0.4525   Epoch: 3   Global Step: 15640   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-15 18:19:05,730-Speed 10518.86 samples/sec   Loss 11.9956   LearningRate 0.4528   Epoch: 3   Global Step: 15650   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-15 18:19:13,602-Speed 10412.09 samples/sec   Loss 13.5376   LearningRate 0.4531   Epoch: 3   Global Step: 15660   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-01-15 18:19:21,457-Speed 10430.35 samples/sec   Loss 14.0490   LearningRate 0.4534   Epoch: 3   Global Step: 15670   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-01-15 18:19:29,231-Speed 10540.32 samples/sec   Loss 13.0212   LearningRate 0.4537   Epoch: 3   Global Step: 15680   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-01-15 18:19:37,010-Speed 10533.62 samples/sec   Loss 12.2648   LearningRate 0.4540   Epoch: 3   Global Step: 15690   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-01-15 18:19:44,780-Speed 10548.03 samples/sec   Loss 11.8517   LearningRate 0.4543   Epoch: 3   Global Step: 15700   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-01-15 18:19:52,551-Speed 10544.66 samples/sec   Loss 11.6415   LearningRate 0.4546   Epoch: 3   Global Step: 15710   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-01-15 18:20:00,335-Speed 10525.17 samples/sec   Loss 11.6004   LearningRate 0.4549   Epoch: 3   Global Step: 15720   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-01-15 18:20:08,110-Speed 10538.27 samples/sec   Loss 11.6001   LearningRate 0.4552   Epoch: 3   Global Step: 15730   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-01-15 18:20:15,869-Speed 10561.66 samples/sec   Loss 11.5529   LearningRate 0.4554   Epoch: 3   Global Step: 15740   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-01-15 18:20:23,687-Speed 10479.84 samples/sec   Loss 11.5184   LearningRate 0.4557   Epoch: 3   Global Step: 15750   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-01-15 18:20:31,491-Speed 10498.82 samples/sec   Loss 11.6285   LearningRate 0.4560   Epoch: 3   Global Step: 15760   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-01-15 18:20:39,245-Speed 10566.46 samples/sec   Loss 11.4928   LearningRate 0.4563   Epoch: 3   Global Step: 15770   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-01-15 18:20:47,013-Speed 10548.88 samples/sec   Loss 11.6395   LearningRate 0.4566   Epoch: 3   Global Step: 15780   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-01-15 18:20:54,814-Speed 10502.81 samples/sec   Loss 11.5869   LearningRate 0.4569   Epoch: 3   Global Step: 15790   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-01-15 18:21:02,600-Speed 10522.51 samples/sec   Loss 11.5833   LearningRate 0.4572   Epoch: 3   Global Step: 15800   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-01-15 18:21:10,403-Speed 10500.23 samples/sec   Loss 11.5780   LearningRate 0.4575   Epoch: 3   Global Step: 15810   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-01-15 18:21:18,227-Speed 10471.94 samples/sec   Loss 11.5646   LearningRate 0.4578   Epoch: 3   Global Step: 15820   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-01-15 18:21:26,023-Speed 10510.75 samples/sec   Loss 11.5159   LearningRate 0.4580   Epoch: 3   Global Step: 15830   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-01-15 18:21:33,863-Speed 10450.15 samples/sec   Loss 11.6390   LearningRate 0.4583   Epoch: 3   Global Step: 15840   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-01-15 18:21:41,642-Speed 10533.37 samples/sec   Loss 11.6842   LearningRate 0.4586   Epoch: 3   Global Step: 15850   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-01-15 18:21:49,437-Speed 10511.37 samples/sec   Loss 11.7046   LearningRate 0.4589   Epoch: 3   Global Step: 15860   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-01-15 18:21:57,245-Speed 10493.42 samples/sec   Loss 11.5431   LearningRate 0.4592   Epoch: 3   Global Step: 15870   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-01-15 18:22:05,014-Speed 10545.76 samples/sec   Loss 11.7126   LearningRate 0.4595   Epoch: 3   Global Step: 15880   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-01-15 18:22:12,856-Speed 10449.88 samples/sec   Loss 11.6695   LearningRate 0.4598   Epoch: 3   Global Step: 15890   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-01-15 18:22:20,650-Speed 10522.40 samples/sec   Loss 11.7897   LearningRate 0.4601   Epoch: 3   Global Step: 15900   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-01-15 18:22:28,483-Speed 10459.15 samples/sec   Loss 11.6151   LearningRate 0.4604   Epoch: 3   Global Step: 15910   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-01-15 18:22:36,294-Speed 10488.83 samples/sec   Loss 11.7332   LearningRate 0.4606   Epoch: 3   Global Step: 15920   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-01-15 18:22:44,099-Speed 10497.72 samples/sec   Loss 11.6407   LearningRate 0.4609   Epoch: 3   Global Step: 15930   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-01-15 18:22:51,904-Speed 10498.71 samples/sec   Loss 11.6824   LearningRate 0.4612   Epoch: 3   Global Step: 15940   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-01-15 18:22:59,714-Speed 10490.52 samples/sec   Loss 11.6342   LearningRate 0.4615   Epoch: 3   Global Step: 15950   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-01-15 18:23:07,541-Speed 10468.70 samples/sec   Loss 11.7166   LearningRate 0.4618   Epoch: 3   Global Step: 15960   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-15 18:23:15,382-Speed 10449.94 samples/sec   Loss 11.6185   LearningRate 0.4621   Epoch: 3   Global Step: 15970   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-15 18:23:23,191-Speed 10493.59 samples/sec   Loss 11.5882   LearningRate 0.4624   Epoch: 3   Global Step: 15980   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-15 18:23:31,027-Speed 10454.54 samples/sec   Loss 11.7330   LearningRate 0.4627   Epoch: 3   Global Step: 15990   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-15 18:23:38,862-Speed 10458.53 samples/sec   Loss 11.6200   LearningRate 0.4630   Epoch: 3   Global Step: 16000   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-15 18:23:46,721-Speed 10424.74 samples/sec   Loss 11.7956   LearningRate 0.4633   Epoch: 3   Global Step: 16010   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-15 18:23:54,543-Speed 10474.23 samples/sec   Loss 11.8032   LearningRate 0.4635   Epoch: 3   Global Step: 16020   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-15 18:24:02,422-Speed 10400.88 samples/sec   Loss 11.6868   LearningRate 0.4638   Epoch: 3   Global Step: 16030   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-15 18:24:10,264-Speed 10447.04 samples/sec   Loss 11.7434   LearningRate 0.4641   Epoch: 3   Global Step: 16040   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-15 18:24:18,109-Speed 10444.84 samples/sec   Loss 11.7473   LearningRate 0.4644   Epoch: 3   Global Step: 16050   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-15 18:24:25,968-Speed 10425.28 samples/sec   Loss 11.7611   LearningRate 0.4647   Epoch: 3   Global Step: 16060   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-15 18:24:33,823-Speed 10430.28 samples/sec   Loss 11.6788   LearningRate 0.4650   Epoch: 3   Global Step: 16070   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-15 18:24:41,680-Speed 10434.10 samples/sec   Loss 11.6443   LearningRate 0.4653   Epoch: 3   Global Step: 16080   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-15 18:24:49,515-Speed 10457.14 samples/sec   Loss 11.6990   LearningRate 0.4656   Epoch: 3   Global Step: 16090   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-15 18:24:57,362-Speed 10440.96 samples/sec   Loss 11.6650   LearningRate 0.4659   Epoch: 3   Global Step: 16100   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-15 18:25:05,208-Speed 10442.45 samples/sec   Loss 11.5835   LearningRate 0.4661   Epoch: 3   Global Step: 16110   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-15 18:25:13,058-Speed 10445.82 samples/sec   Loss 11.7084   LearningRate 0.4664   Epoch: 3   Global Step: 16120   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-15 18:25:20,885-Speed 10467.72 samples/sec   Loss 11.6602   LearningRate 0.4667   Epoch: 3   Global Step: 16130   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-15 18:25:28,713-Speed 10466.22 samples/sec   Loss 11.6141   LearningRate 0.4670   Epoch: 3   Global Step: 16140   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-15 18:25:36,557-Speed 10446.62 samples/sec   Loss 11.6595   LearningRate 0.4673   Epoch: 3   Global Step: 16150   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-15 18:25:44,380-Speed 10475.14 samples/sec   Loss 11.7081   LearningRate 0.4676   Epoch: 3   Global Step: 16160   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:25:52,198-Speed 10479.44 samples/sec   Loss 11.7111   LearningRate 0.4679   Epoch: 3   Global Step: 16170   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:26:00,024-Speed 10470.11 samples/sec   Loss 11.7502   LearningRate 0.4682   Epoch: 3   Global Step: 16180   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:26:07,899-Speed 10405.24 samples/sec   Loss 11.6765   LearningRate 0.4685   Epoch: 3   Global Step: 16190   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:26:15,724-Speed 10469.45 samples/sec   Loss 11.6295   LearningRate 0.4688   Epoch: 3   Global Step: 16200   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:26:23,560-Speed 10456.31 samples/sec   Loss 11.6830   LearningRate 0.4690   Epoch: 3   Global Step: 16210   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:26:31,378-Speed 10481.89 samples/sec   Loss 11.6296   LearningRate 0.4693   Epoch: 3   Global Step: 16220   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:26:39,211-Speed 10459.45 samples/sec   Loss 11.8033   LearningRate 0.4696   Epoch: 3   Global Step: 16230   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:26:47,023-Speed 10488.85 samples/sec   Loss 11.7066   LearningRate 0.4699   Epoch: 3   Global Step: 16240   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:26:54,852-Speed 10465.98 samples/sec   Loss 11.7407   LearningRate 0.4702   Epoch: 3   Global Step: 16250   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:27:02,692-Speed 10450.12 samples/sec   Loss 11.6142   LearningRate 0.4705   Epoch: 3   Global Step: 16260   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:27:10,531-Speed 10451.96 samples/sec   Loss 11.6570   LearningRate 0.4708   Epoch: 3   Global Step: 16270   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:27:18,405-Speed 10405.95 samples/sec   Loss 11.7358   LearningRate 0.4711   Epoch: 3   Global Step: 16280   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:27:26,263-Speed 10425.95 samples/sec   Loss 11.7402   LearningRate 0.4714   Epoch: 3   Global Step: 16290   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:27:34,108-Speed 10448.00 samples/sec   Loss 11.6772   LearningRate 0.4716   Epoch: 3   Global Step: 16300   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:27:41,983-Speed 10405.42 samples/sec   Loss 11.7559   LearningRate 0.4719   Epoch: 3   Global Step: 16310   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:27:49,834-Speed 10434.85 samples/sec   Loss 11.6920   LearningRate 0.4722   Epoch: 3   Global Step: 16320   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:27:57,699-Speed 10417.08 samples/sec   Loss 11.5775   LearningRate 0.4725   Epoch: 3   Global Step: 16330   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:28:05,533-Speed 10459.35 samples/sec   Loss 11.6025   LearningRate 0.4728   Epoch: 3   Global Step: 16340   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:28:13,373-Speed 10451.23 samples/sec   Loss 11.6670   LearningRate 0.4731   Epoch: 3   Global Step: 16350   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:28:21,208-Speed 10457.30 samples/sec   Loss 11.8496   LearningRate 0.4734   Epoch: 3   Global Step: 16360   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:28:29,077-Speed 10411.45 samples/sec   Loss 11.7675   LearningRate 0.4737   Epoch: 3   Global Step: 16370   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:28:36,900-Speed 10474.83 samples/sec   Loss 11.7945   LearningRate 0.4740   Epoch: 3   Global Step: 16380   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:28:44,754-Speed 10431.13 samples/sec   Loss 11.5659   LearningRate 0.4742   Epoch: 3   Global Step: 16390   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:28:52,583-Speed 10465.17 samples/sec   Loss 11.7314   LearningRate 0.4745   Epoch: 3   Global Step: 16400   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:29:00,435-Speed 10436.19 samples/sec   Loss 11.8380   LearningRate 0.4748   Epoch: 3   Global Step: 16410   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:29:08,256-Speed 10476.57 samples/sec   Loss 11.7518   LearningRate 0.4751   Epoch: 3   Global Step: 16420   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:29:16,087-Speed 10462.67 samples/sec   Loss 11.7996   LearningRate 0.4754   Epoch: 3   Global Step: 16430   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:29:23,908-Speed 10475.81 samples/sec   Loss 11.6704   LearningRate 0.4757   Epoch: 3   Global Step: 16440   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:29:31,751-Speed 10447.20 samples/sec   Loss 11.7347   LearningRate 0.4760   Epoch: 3   Global Step: 16450   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:29:39,586-Speed 10457.87 samples/sec   Loss 11.6904   LearningRate 0.4763   Epoch: 3   Global Step: 16460   Fp16 Grad Scale: 524288   Required: 19 hours
Training: 2022-01-15 18:29:47,392-Speed 10495.33 samples/sec   Loss 11.6542   LearningRate 0.4766   Epoch: 3   Global Step: 16470   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:29:55,180-Speed 10534.19 samples/sec   Loss 11.7190   LearningRate 0.4769   Epoch: 3   Global Step: 16480   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:30:02,978-Speed 10506.63 samples/sec   Loss 11.8613   LearningRate 0.4771   Epoch: 3   Global Step: 16490   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:30:10,804-Speed 10469.84 samples/sec   Loss 11.9464   LearningRate 0.4774   Epoch: 3   Global Step: 16500   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:30:18,668-Speed 10419.04 samples/sec   Loss 11.8869   LearningRate 0.4777   Epoch: 3   Global Step: 16510   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:30:26,470-Speed 10501.01 samples/sec   Loss 11.6816   LearningRate 0.4780   Epoch: 3   Global Step: 16520   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:30:34,306-Speed 10455.33 samples/sec   Loss 11.6841   LearningRate 0.4783   Epoch: 3   Global Step: 16530   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:30:42,138-Speed 10461.32 samples/sec   Loss 11.6751   LearningRate 0.4786   Epoch: 3   Global Step: 16540   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:30:49,938-Speed 10504.36 samples/sec   Loss 11.6516   LearningRate 0.4789   Epoch: 3   Global Step: 16550   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:30:57,770-Speed 10461.39 samples/sec   Loss 11.6532   LearningRate 0.4792   Epoch: 3   Global Step: 16560   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:31:05,598-Speed 10466.67 samples/sec   Loss 11.7461   LearningRate 0.4795   Epoch: 3   Global Step: 16570   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:31:13,395-Speed 10508.64 samples/sec   Loss 11.7403   LearningRate 0.4797   Epoch: 3   Global Step: 16580   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:31:21,203-Speed 10492.78 samples/sec   Loss 11.7798   LearningRate 0.4800   Epoch: 3   Global Step: 16590   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:31:29,001-Speed 10506.88 samples/sec   Loss 11.6913   LearningRate 0.4803   Epoch: 3   Global Step: 16600   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:31:36,825-Speed 10471.73 samples/sec   Loss 11.6216   LearningRate 0.4806   Epoch: 3   Global Step: 16610   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:31:44,661-Speed 10455.26 samples/sec   Loss 11.7677   LearningRate 0.4809   Epoch: 3   Global Step: 16620   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:31:52,443-Speed 10528.94 samples/sec   Loss 11.8053   LearningRate 0.4812   Epoch: 3   Global Step: 16630   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:32:00,240-Speed 10508.06 samples/sec   Loss 11.6809   LearningRate 0.4815   Epoch: 3   Global Step: 16640   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:32:08,021-Speed 10528.61 samples/sec   Loss 11.7203   LearningRate 0.4818   Epoch: 3   Global Step: 16650   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:32:15,810-Speed 10519.59 samples/sec   Loss 11.6317   LearningRate 0.4821   Epoch: 3   Global Step: 16660   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:32:23,634-Speed 10471.90 samples/sec   Loss 11.8121   LearningRate 0.4823   Epoch: 3   Global Step: 16670   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:32:31,466-Speed 10460.40 samples/sec   Loss 11.8724   LearningRate 0.4826   Epoch: 3   Global Step: 16680   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:32:39,281-Speed 10484.06 samples/sec   Loss 12.1755   LearningRate 0.4829   Epoch: 3   Global Step: 16690   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:32:47,056-Speed 10538.70 samples/sec   Loss 12.0990   LearningRate 0.4832   Epoch: 3   Global Step: 16700   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:32:54,839-Speed 10526.49 samples/sec   Loss 11.8462   LearningRate 0.4835   Epoch: 3   Global Step: 16710   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:33:02,663-Speed 10472.01 samples/sec   Loss 11.7298   LearningRate 0.4838   Epoch: 3   Global Step: 16720   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:33:10,478-Speed 10483.04 samples/sec   Loss 11.6368   LearningRate 0.4841   Epoch: 3   Global Step: 16730   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:33:18,291-Speed 10487.74 samples/sec   Loss 11.6585   LearningRate 0.4844   Epoch: 3   Global Step: 16740   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:33:26,105-Speed 10485.05 samples/sec   Loss 11.7961   LearningRate 0.4847   Epoch: 3   Global Step: 16750   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:33:33,932-Speed 10468.67 samples/sec   Loss 11.7249   LearningRate 0.4850   Epoch: 3   Global Step: 16760   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:33:41,750-Speed 10479.15 samples/sec   Loss 11.7818   LearningRate 0.4852   Epoch: 3   Global Step: 16770   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:33:49,609-Speed 10426.01 samples/sec   Loss 11.6715   LearningRate 0.4855   Epoch: 3   Global Step: 16780   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:33:57,418-Speed 10492.18 samples/sec   Loss 11.7783   LearningRate 0.4858   Epoch: 3   Global Step: 16790   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:34:05,230-Speed 10487.68 samples/sec   Loss 11.8765   LearningRate 0.4861   Epoch: 3   Global Step: 16800   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:34:13,027-Speed 10509.15 samples/sec   Loss 11.8373   LearningRate 0.4864   Epoch: 3   Global Step: 16810   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:34:20,836-Speed 10492.00 samples/sec   Loss 11.6909   LearningRate 0.4867   Epoch: 3   Global Step: 16820   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:34:28,676-Speed 10450.53 samples/sec   Loss 11.7435   LearningRate 0.4870   Epoch: 3   Global Step: 16830   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:34:36,479-Speed 10499.73 samples/sec   Loss 11.7868   LearningRate 0.4873   Epoch: 3   Global Step: 16840   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:34:44,261-Speed 10528.68 samples/sec   Loss 11.8070   LearningRate 0.4876   Epoch: 3   Global Step: 16850   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:34:52,081-Speed 10477.26 samples/sec   Loss 11.7747   LearningRate 0.4878   Epoch: 3   Global Step: 16860   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:34:59,892-Speed 10488.13 samples/sec   Loss 11.7475   LearningRate 0.4881   Epoch: 3   Global Step: 16870   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:35:07,737-Speed 10444.18 samples/sec   Loss 11.7357   LearningRate 0.4884   Epoch: 3   Global Step: 16880   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:35:15,543-Speed 10497.21 samples/sec   Loss 11.7896   LearningRate 0.4887   Epoch: 3   Global Step: 16890   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:35:23,344-Speed 10502.10 samples/sec   Loss 11.8310   LearningRate 0.4890   Epoch: 3   Global Step: 16900   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:35:31,189-Speed 10443.69 samples/sec   Loss 11.7359   LearningRate 0.4893   Epoch: 3   Global Step: 16910   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:35:39,014-Speed 10470.24 samples/sec   Loss 11.8230   LearningRate 0.4896   Epoch: 3   Global Step: 16920   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:35:46,832-Speed 10479.22 samples/sec   Loss 11.8144   LearningRate 0.4899   Epoch: 3   Global Step: 16930   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:35:54,639-Speed 10495.59 samples/sec   Loss 11.6792   LearningRate 0.4902   Epoch: 3   Global Step: 16940   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:36:02,424-Speed 10523.95 samples/sec   Loss 11.8068   LearningRate 0.4905   Epoch: 3   Global Step: 16950   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:36:10,216-Speed 10515.03 samples/sec   Loss 11.8519   LearningRate 0.4907   Epoch: 3   Global Step: 16960   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:36:18,021-Speed 10496.80 samples/sec   Loss 11.7398   LearningRate 0.4910   Epoch: 3   Global Step: 16970   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:36:25,842-Speed 10476.14 samples/sec   Loss 11.8474   LearningRate 0.4913   Epoch: 3   Global Step: 16980   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:36:33,628-Speed 10522.29 samples/sec   Loss 11.8246   LearningRate 0.4916   Epoch: 3   Global Step: 16990   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:36:41,424-Speed 10510.46 samples/sec   Loss 11.8366   LearningRate 0.4919   Epoch: 3   Global Step: 17000   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:36:49,234-Speed 10490.01 samples/sec   Loss 11.7986   LearningRate 0.4922   Epoch: 3   Global Step: 17010   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:36:57,034-Speed 10503.62 samples/sec   Loss 11.9118   LearningRate 0.4925   Epoch: 3   Global Step: 17020   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:37:04,857-Speed 10474.11 samples/sec   Loss 11.7765   LearningRate 0.4928   Epoch: 3   Global Step: 17030   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:37:12,676-Speed 10478.86 samples/sec   Loss 11.9275   LearningRate 0.4931   Epoch: 3   Global Step: 17040   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:37:20,506-Speed 10461.87 samples/sec   Loss 11.7553   LearningRate 0.4933   Epoch: 3   Global Step: 17050   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:37:28,309-Speed 10500.03 samples/sec   Loss 11.8393   LearningRate 0.4936   Epoch: 3   Global Step: 17060   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:37:36,102-Speed 10514.01 samples/sec   Loss 11.8677   LearningRate 0.4939   Epoch: 3   Global Step: 17070   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:37:43,903-Speed 10503.15 samples/sec   Loss 11.8169   LearningRate 0.4942   Epoch: 3   Global Step: 17080   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:37:51,705-Speed 10500.35 samples/sec   Loss 11.8226   LearningRate 0.4945   Epoch: 3   Global Step: 17090   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:37:59,486-Speed 10530.49 samples/sec   Loss 11.7230   LearningRate 0.4948   Epoch: 3   Global Step: 17100   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:38:07,315-Speed 10464.62 samples/sec   Loss 11.8191   LearningRate 0.4951   Epoch: 3   Global Step: 17110   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:38:15,140-Speed 10470.48 samples/sec   Loss 11.8009   LearningRate 0.4954   Epoch: 3   Global Step: 17120   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:38:22,923-Speed 10527.16 samples/sec   Loss 11.9099   LearningRate 0.4957   Epoch: 3   Global Step: 17130   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:38:30,707-Speed 10525.08 samples/sec   Loss 11.7706   LearningRate 0.4959   Epoch: 3   Global Step: 17140   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:38:38,515-Speed 10493.51 samples/sec   Loss 11.7934   LearningRate 0.4962   Epoch: 3   Global Step: 17150   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:38:46,317-Speed 10501.41 samples/sec   Loss 11.9074   LearningRate 0.4965   Epoch: 3   Global Step: 17160   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:38:54,124-Speed 10501.09 samples/sec   Loss 11.8125   LearningRate 0.4968   Epoch: 3   Global Step: 17170   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:39:01,923-Speed 10506.06 samples/sec   Loss 11.7734   LearningRate 0.4971   Epoch: 3   Global Step: 17180   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:39:09,722-Speed 10505.73 samples/sec   Loss 11.8589   LearningRate 0.4974   Epoch: 3   Global Step: 17190   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:39:17,511-Speed 10518.74 samples/sec   Loss 11.8432   LearningRate 0.4977   Epoch: 3   Global Step: 17200   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:39:25,318-Speed 10500.17 samples/sec   Loss 12.0862   LearningRate 0.4980   Epoch: 3   Global Step: 17210   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:39:33,126-Speed 10493.99 samples/sec   Loss 12.7682   LearningRate 0.4983   Epoch: 3   Global Step: 17220   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:39:40,919-Speed 10514.25 samples/sec   Loss 13.4151   LearningRate 0.4986   Epoch: 3   Global Step: 17230   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:39:48,698-Speed 10531.88 samples/sec   Loss 13.0059   LearningRate 0.4988   Epoch: 3   Global Step: 17240   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:39:56,488-Speed 10519.09 samples/sec   Loss 12.6866   LearningRate 0.4991   Epoch: 3   Global Step: 17250   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:40:04,320-Speed 10461.37 samples/sec   Loss 12.1151   LearningRate 0.4994   Epoch: 3   Global Step: 17260   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:40:12,141-Speed 10477.26 samples/sec   Loss 11.9756   LearningRate 0.4997   Epoch: 3   Global Step: 17270   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:40:19,954-Speed 10485.92 samples/sec   Loss 11.8628   LearningRate 0.5000   Epoch: 3   Global Step: 17280   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:40:27,761-Speed 10495.35 samples/sec   Loss 11.7456   LearningRate 0.5003   Epoch: 3   Global Step: 17290   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:40:35,560-Speed 10505.97 samples/sec   Loss 11.7876   LearningRate 0.5006   Epoch: 3   Global Step: 17300   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:40:43,410-Speed 10436.93 samples/sec   Loss 11.7839   LearningRate 0.5009   Epoch: 3   Global Step: 17310   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:40:51,230-Speed 10479.76 samples/sec   Loss 11.7913   LearningRate 0.5012   Epoch: 3   Global Step: 17320   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:40:59,039-Speed 10491.83 samples/sec   Loss 11.9491   LearningRate 0.5014   Epoch: 3   Global Step: 17330   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:41:06,828-Speed 10519.56 samples/sec   Loss 11.8159   LearningRate 0.5017   Epoch: 3   Global Step: 17340   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:41:14,634-Speed 10495.31 samples/sec   Loss 11.8634   LearningRate 0.5020   Epoch: 3   Global Step: 17350   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:41:22,455-Speed 10476.64 samples/sec   Loss 11.8930   LearningRate 0.5023   Epoch: 3   Global Step: 17360   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:41:30,297-Speed 10448.08 samples/sec   Loss 11.8727   LearningRate 0.5026   Epoch: 3   Global Step: 17370   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:41:38,143-Speed 10444.70 samples/sec   Loss 11.8218   LearningRate 0.5029   Epoch: 3   Global Step: 17380   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:41:45,981-Speed 10452.55 samples/sec   Loss 12.0003   LearningRate 0.5032   Epoch: 3   Global Step: 17390   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:41:53,801-Speed 10479.29 samples/sec   Loss 11.9266   LearningRate 0.5035   Epoch: 3   Global Step: 17400   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:42:01,596-Speed 10511.56 samples/sec   Loss 11.9036   LearningRate 0.5038   Epoch: 3   Global Step: 17410   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:42:09,390-Speed 10512.30 samples/sec   Loss 11.9735   LearningRate 0.5041   Epoch: 3   Global Step: 17420   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:42:17,275-Speed 10391.61 samples/sec   Loss 11.9486   LearningRate 0.5043   Epoch: 3   Global Step: 17430   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:42:25,084-Speed 10492.84 samples/sec   Loss 11.9364   LearningRate 0.5046   Epoch: 3   Global Step: 17440   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:42:32,906-Speed 10474.53 samples/sec   Loss 11.9600   LearningRate 0.5049   Epoch: 3   Global Step: 17450   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:42:40,731-Speed 10470.12 samples/sec   Loss 11.8956   LearningRate 0.5052   Epoch: 3   Global Step: 17460   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:42:48,517-Speed 10524.70 samples/sec   Loss 11.8688   LearningRate 0.5055   Epoch: 3   Global Step: 17470   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:42:56,342-Speed 10470.65 samples/sec   Loss 11.9130   LearningRate 0.5058   Epoch: 3   Global Step: 17480   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:43:04,151-Speed 10492.95 samples/sec   Loss 11.8852   LearningRate 0.5061   Epoch: 3   Global Step: 17490   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:43:11,941-Speed 10517.46 samples/sec   Loss 11.9444   LearningRate 0.5064   Epoch: 3   Global Step: 17500   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:43:19,728-Speed 10521.36 samples/sec   Loss 11.9658   LearningRate 0.5067   Epoch: 3   Global Step: 17510   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:43:27,540-Speed 10488.08 samples/sec   Loss 12.0571   LearningRate 0.5069   Epoch: 3   Global Step: 17520   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:43:35,325-Speed 10523.88 samples/sec   Loss 12.0620   LearningRate 0.5072   Epoch: 3   Global Step: 17530   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:43:43,173-Speed 10440.23 samples/sec   Loss 11.9738   LearningRate 0.5075   Epoch: 3   Global Step: 17540   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:43:50,972-Speed 10504.64 samples/sec   Loss 12.0672   LearningRate 0.5078   Epoch: 3   Global Step: 17550   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:43:58,783-Speed 10488.86 samples/sec   Loss 11.9841   LearningRate 0.5081   Epoch: 3   Global Step: 17560   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:44:06,579-Speed 10510.08 samples/sec   Loss 11.9214   LearningRate 0.5084   Epoch: 3   Global Step: 17570   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:44:14,385-Speed 10495.84 samples/sec   Loss 11.8963   LearningRate 0.5087   Epoch: 3   Global Step: 17580   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:44:22,163-Speed 10533.68 samples/sec   Loss 12.0027   LearningRate 0.5090   Epoch: 3   Global Step: 17590   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:44:29,984-Speed 10476.56 samples/sec   Loss 11.9229   LearningRate 0.5093   Epoch: 3   Global Step: 17600   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:44:37,777-Speed 10512.92 samples/sec   Loss 12.0466   LearningRate 0.5095   Epoch: 3   Global Step: 17610   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:44:45,593-Speed 10482.41 samples/sec   Loss 11.9440   LearningRate 0.5098   Epoch: 3   Global Step: 17620   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:44:53,424-Speed 10463.01 samples/sec   Loss 11.9114   LearningRate 0.5101   Epoch: 3   Global Step: 17630   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:45:01,244-Speed 10477.11 samples/sec   Loss 11.9477   LearningRate 0.5104   Epoch: 3   Global Step: 17640   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:45:09,125-Speed 10396.29 samples/sec   Loss 11.8779   LearningRate 0.5107   Epoch: 3   Global Step: 17650   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:45:16,925-Speed 10503.81 samples/sec   Loss 11.8945   LearningRate 0.5110   Epoch: 3   Global Step: 17660   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:45:24,734-Speed 10491.55 samples/sec   Loss 11.8929   LearningRate 0.5113   Epoch: 3   Global Step: 17670   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:45:32,535-Speed 10503.40 samples/sec   Loss 11.9910   LearningRate 0.5116   Epoch: 3   Global Step: 17680   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:45:40,399-Speed 10417.80 samples/sec   Loss 12.0216   LearningRate 0.5119   Epoch: 3   Global Step: 17690   Fp16 Grad Scale: 524288   Required: 19 hours
Training: 2022-01-15 18:45:48,208-Speed 10491.66 samples/sec   Loss 11.9870   LearningRate 0.5122   Epoch: 3   Global Step: 17700   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:45:56,035-Speed 10468.43 samples/sec   Loss 12.0230   LearningRate 0.5124   Epoch: 3   Global Step: 17710   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:46:03,828-Speed 10513.08 samples/sec   Loss 12.0301   LearningRate 0.5127   Epoch: 3   Global Step: 17720   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:46:11,620-Speed 10515.90 samples/sec   Loss 12.0141   LearningRate 0.5130   Epoch: 3   Global Step: 17730   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:46:19,423-Speed 10498.82 samples/sec   Loss 12.0160   LearningRate 0.5133   Epoch: 3   Global Step: 17740   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:46:27,281-Speed 10426.42 samples/sec   Loss 11.9745   LearningRate 0.5136   Epoch: 3   Global Step: 17750   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:46:35,078-Speed 10507.90 samples/sec   Loss 11.9114   LearningRate 0.5139   Epoch: 3   Global Step: 17760   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:46:42,907-Speed 10464.72 samples/sec   Loss 11.9825   LearningRate 0.5142   Epoch: 3   Global Step: 17770   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:46:50,701-Speed 10512.46 samples/sec   Loss 12.0582   LearningRate 0.5145   Epoch: 3   Global Step: 17780   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:46:58,489-Speed 10519.91 samples/sec   Loss 12.0438   LearningRate 0.5148   Epoch: 3   Global Step: 17790   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:47:06,286-Speed 10508.77 samples/sec   Loss 11.9563   LearningRate 0.5150   Epoch: 3   Global Step: 17800   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:47:14,072-Speed 10521.72 samples/sec   Loss 11.9620   LearningRate 0.5153   Epoch: 3   Global Step: 17810   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:47:21,875-Speed 10501.04 samples/sec   Loss 12.1001   LearningRate 0.5156   Epoch: 3   Global Step: 17820   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:47:29,659-Speed 10524.83 samples/sec   Loss 11.9760   LearningRate 0.5159   Epoch: 3   Global Step: 17830   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:47:37,476-Speed 10480.24 samples/sec   Loss 11.9400   LearningRate 0.5162   Epoch: 3   Global Step: 17840   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:47:45,301-Speed 10471.34 samples/sec   Loss 11.9408   LearningRate 0.5165   Epoch: 3   Global Step: 17850   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:47:53,105-Speed 10500.12 samples/sec   Loss 11.9552   LearningRate 0.5168   Epoch: 3   Global Step: 17860   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:48:00,904-Speed 10504.85 samples/sec   Loss 12.1411   LearningRate 0.5171   Epoch: 3   Global Step: 17870   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:48:08,722-Speed 10481.07 samples/sec   Loss 12.0029   LearningRate 0.5174   Epoch: 3   Global Step: 17880   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:48:16,525-Speed 10500.62 samples/sec   Loss 12.0764   LearningRate 0.5177   Epoch: 3   Global Step: 17890   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:48:24,334-Speed 10491.94 samples/sec   Loss 12.0524   LearningRate 0.5179   Epoch: 3   Global Step: 17900   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:48:32,168-Speed 10459.27 samples/sec   Loss 12.0011   LearningRate 0.5182   Epoch: 3   Global Step: 17910   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:48:39,968-Speed 10504.82 samples/sec   Loss 12.0107   LearningRate 0.5185   Epoch: 3   Global Step: 17920   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:48:47,763-Speed 10510.35 samples/sec   Loss 11.9286   LearningRate 0.5188   Epoch: 3   Global Step: 17930   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:48:55,611-Speed 10439.87 samples/sec   Loss 12.0408   LearningRate 0.5191   Epoch: 3   Global Step: 17940   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:49:03,411-Speed 10506.24 samples/sec   Loss 11.9567   LearningRate 0.5194   Epoch: 3   Global Step: 17950   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:49:11,234-Speed 10472.91 samples/sec   Loss 11.9497   LearningRate 0.5197   Epoch: 3   Global Step: 17960   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:49:19,062-Speed 10467.64 samples/sec   Loss 11.9688   LearningRate 0.5200   Epoch: 3   Global Step: 17970   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:49:26,902-Speed 10451.10 samples/sec   Loss 12.0085   LearningRate 0.5203   Epoch: 3   Global Step: 17980   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:49:34,703-Speed 10502.66 samples/sec   Loss 12.0857   LearningRate 0.5205   Epoch: 3   Global Step: 17990   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:49:42,497-Speed 10511.71 samples/sec   Loss 12.2004   LearningRate 0.5208   Epoch: 3   Global Step: 18000   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:49:50,284-Speed 10523.21 samples/sec   Loss 12.2238   LearningRate 0.5211   Epoch: 3   Global Step: 18010   Fp16 Grad Scale: 524288   Required: 19 hours
Training: 2022-01-15 18:49:58,098-Speed 10486.33 samples/sec   Loss 12.0649   LearningRate 0.5214   Epoch: 3   Global Step: 18020   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:50:05,900-Speed 10501.56 samples/sec   Loss 12.0896   LearningRate 0.5217   Epoch: 3   Global Step: 18030   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:50:13,702-Speed 10501.48 samples/sec   Loss 11.9843   LearningRate 0.5220   Epoch: 3   Global Step: 18040   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:50:21,497-Speed 10512.20 samples/sec   Loss 11.9350   LearningRate 0.5223   Epoch: 3   Global Step: 18050   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:50:29,362-Speed 10418.24 samples/sec   Loss 12.1410   LearningRate 0.5226   Epoch: 3   Global Step: 18060   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:50:37,204-Speed 10448.31 samples/sec   Loss 12.0705   LearningRate 0.5229   Epoch: 3   Global Step: 18070   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:50:45,037-Speed 10459.97 samples/sec   Loss 12.0849   LearningRate 0.5231   Epoch: 3   Global Step: 18080   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:50:52,852-Speed 10485.12 samples/sec   Loss 12.0653   LearningRate 0.5234   Epoch: 3   Global Step: 18090   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:51:00,684-Speed 10461.96 samples/sec   Loss 11.9878   LearningRate 0.5237   Epoch: 3   Global Step: 18100   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:51:08,476-Speed 10514.56 samples/sec   Loss 12.0470   LearningRate 0.5240   Epoch: 3   Global Step: 18110   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:51:16,307-Speed 10462.05 samples/sec   Loss 12.0276   LearningRate 0.5243   Epoch: 3   Global Step: 18120   Fp16 Grad Scale: 524288   Required: 19 hours
Training: 2022-01-15 18:51:24,098-Speed 10516.56 samples/sec   Loss 12.1341   LearningRate 0.5246   Epoch: 3   Global Step: 18130   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:51:31,896-Speed 10506.17 samples/sec   Loss 12.0896   LearningRate 0.5249   Epoch: 3   Global Step: 18140   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:51:39,730-Speed 10458.48 samples/sec   Loss 11.9671   LearningRate 0.5252   Epoch: 3   Global Step: 18150   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:51:47,550-Speed 10477.18 samples/sec   Loss 12.1154   LearningRate 0.5255   Epoch: 3   Global Step: 18160   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:51:55,338-Speed 10519.89 samples/sec   Loss 12.0651   LearningRate 0.5258   Epoch: 3   Global Step: 18170   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:52:03,142-Speed 10498.56 samples/sec   Loss 11.9477   LearningRate 0.5260   Epoch: 3   Global Step: 18180   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:52:10,942-Speed 10504.47 samples/sec   Loss 12.0763   LearningRate 0.5263   Epoch: 3   Global Step: 18190   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:52:18,770-Speed 10466.26 samples/sec   Loss 12.1193   LearningRate 0.5266   Epoch: 3   Global Step: 18200   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:52:26,571-Speed 10502.54 samples/sec   Loss 12.0638   LearningRate 0.5269   Epoch: 3   Global Step: 18210   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:52:34,389-Speed 10480.53 samples/sec   Loss 12.1104   LearningRate 0.5272   Epoch: 3   Global Step: 18220   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:52:42,187-Speed 10505.75 samples/sec   Loss 11.9843   LearningRate 0.5275   Epoch: 3   Global Step: 18230   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:52:50,004-Speed 10482.28 samples/sec   Loss 12.1237   LearningRate 0.5278   Epoch: 3   Global Step: 18240   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:52:57,816-Speed 10487.61 samples/sec   Loss 12.0495   LearningRate 0.5281   Epoch: 3   Global Step: 18250   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:53:05,609-Speed 10513.79 samples/sec   Loss 12.0852   LearningRate 0.5284   Epoch: 3   Global Step: 18260   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:53:13,392-Speed 10527.46 samples/sec   Loss 11.9999   LearningRate 0.5286   Epoch: 3   Global Step: 18270   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:53:21,194-Speed 10500.81 samples/sec   Loss 12.1545   LearningRate 0.5289   Epoch: 3   Global Step: 18280   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:53:29,005-Speed 10490.69 samples/sec   Loss 12.0940   LearningRate 0.5292   Epoch: 3   Global Step: 18290   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:53:36,823-Speed 10480.47 samples/sec   Loss 12.0274   LearningRate 0.5295   Epoch: 3   Global Step: 18300   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:53:44,653-Speed 10463.74 samples/sec   Loss 12.2052   LearningRate 0.5298   Epoch: 3   Global Step: 18310   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:53:52,477-Speed 10471.51 samples/sec   Loss 12.2092   LearningRate 0.5301   Epoch: 3   Global Step: 18320   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:54:00,264-Speed 10521.91 samples/sec   Loss 12.0635   LearningRate 0.5304   Epoch: 3   Global Step: 18330   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:54:08,083-Speed 10478.23 samples/sec   Loss 12.0754   LearningRate 0.5307   Epoch: 3   Global Step: 18340   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:54:15,920-Speed 10455.79 samples/sec   Loss 12.0006   LearningRate 0.5310   Epoch: 3   Global Step: 18350   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:54:23,725-Speed 10497.30 samples/sec   Loss 12.1156   LearningRate 0.5312   Epoch: 3   Global Step: 18360   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:54:31,582-Speed 10428.07 samples/sec   Loss 12.0796   LearningRate 0.5315   Epoch: 3   Global Step: 18370   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:54:39,435-Speed 10434.41 samples/sec   Loss 12.1134   LearningRate 0.5318   Epoch: 3   Global Step: 18380   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:54:47,259-Speed 10472.40 samples/sec   Loss 12.0862   LearningRate 0.5321   Epoch: 3   Global Step: 18390   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:54:55,065-Speed 10497.22 samples/sec   Loss 12.0841   LearningRate 0.5324   Epoch: 3   Global Step: 18400   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:55:02,860-Speed 10510.87 samples/sec   Loss 12.2147   LearningRate 0.5327   Epoch: 3   Global Step: 18410   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:55:10,688-Speed 10466.88 samples/sec   Loss 12.1749   LearningRate 0.5330   Epoch: 3   Global Step: 18420   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:55:18,488-Speed 10503.50 samples/sec   Loss 12.1147   LearningRate 0.5333   Epoch: 3   Global Step: 18430   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:55:26,317-Speed 10465.88 samples/sec   Loss 12.1428   LearningRate 0.5336   Epoch: 3   Global Step: 18440   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:55:34,144-Speed 10468.69 samples/sec   Loss 12.0519   LearningRate 0.5339   Epoch: 3   Global Step: 18450   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:55:41,948-Speed 10498.58 samples/sec   Loss 12.1095   LearningRate 0.5341   Epoch: 3   Global Step: 18460   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:55:49,736-Speed 10520.23 samples/sec   Loss 12.1325   LearningRate 0.5344   Epoch: 3   Global Step: 18470   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:55:57,533-Speed 10508.63 samples/sec   Loss 12.1669   LearningRate 0.5347   Epoch: 3   Global Step: 18480   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:56:05,340-Speed 10495.08 samples/sec   Loss 12.1426   LearningRate 0.5350   Epoch: 3   Global Step: 18490   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:56:13,141-Speed 10502.12 samples/sec   Loss 12.1573   LearningRate 0.5353   Epoch: 3   Global Step: 18500   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:56:20,920-Speed 10532.90 samples/sec   Loss 12.0891   LearningRate 0.5356   Epoch: 3   Global Step: 18510   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:56:28,759-Speed 10451.43 samples/sec   Loss 12.0874   LearningRate 0.5359   Epoch: 3   Global Step: 18520   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:56:36,582-Speed 10473.84 samples/sec   Loss 12.1764   LearningRate 0.5362   Epoch: 3   Global Step: 18530   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:56:44,383-Speed 10504.49 samples/sec   Loss 12.2285   LearningRate 0.5365   Epoch: 3   Global Step: 18540   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:56:52,198-Speed 10484.35 samples/sec   Loss 12.0590   LearningRate 0.5367   Epoch: 3   Global Step: 18550   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:57:00,004-Speed 10498.41 samples/sec   Loss 12.1520   LearningRate 0.5370   Epoch: 3   Global Step: 18560   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:57:07,810-Speed 10497.06 samples/sec   Loss 12.2600   LearningRate 0.5373   Epoch: 3   Global Step: 18570   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:57:15,605-Speed 10510.52 samples/sec   Loss 12.2144   LearningRate 0.5376   Epoch: 3   Global Step: 18580   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:57:23,405-Speed 10504.93 samples/sec   Loss 12.0863   LearningRate 0.5379   Epoch: 3   Global Step: 18590   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:57:31,195-Speed 10517.57 samples/sec   Loss 12.1355   LearningRate 0.5382   Epoch: 3   Global Step: 18600   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:57:38,968-Speed 10540.30 samples/sec   Loss 12.0902   LearningRate 0.5385   Epoch: 3   Global Step: 18610   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:57:46,755-Speed 10522.45 samples/sec   Loss 12.1146   LearningRate 0.5388   Epoch: 3   Global Step: 18620   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:57:54,590-Speed 10457.30 samples/sec   Loss 12.1097   LearningRate 0.5391   Epoch: 3   Global Step: 18630   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:58:02,409-Speed 10480.25 samples/sec   Loss 12.2284   LearningRate 0.5394   Epoch: 3   Global Step: 18640   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:58:10,258-Speed 10438.06 samples/sec   Loss 12.1606   LearningRate 0.5396   Epoch: 3   Global Step: 18650   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:58:18,074-Speed 10483.56 samples/sec   Loss 12.1744   LearningRate 0.5399   Epoch: 3   Global Step: 18660   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:58:25,855-Speed 10529.54 samples/sec   Loss 12.0958   LearningRate 0.5402   Epoch: 3   Global Step: 18670   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:58:33,644-Speed 10518.80 samples/sec   Loss 12.5120   LearningRate 0.5405   Epoch: 3   Global Step: 18680   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:58:41,457-Speed 10487.13 samples/sec   Loss 12.3626   LearningRate 0.5408   Epoch: 3   Global Step: 18690   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:58:49,301-Speed 10445.71 samples/sec   Loss 12.2440   LearningRate 0.5411   Epoch: 3   Global Step: 18700   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:58:57,157-Speed 10429.15 samples/sec   Loss 12.1942   LearningRate 0.5414   Epoch: 3   Global Step: 18710   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:59:04,984-Speed 10468.97 samples/sec   Loss 12.0612   LearningRate 0.5417   Epoch: 3   Global Step: 18720   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:59:12,818-Speed 10464.10 samples/sec   Loss 11.9764   LearningRate 0.5420   Epoch: 3   Global Step: 18730   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:59:20,642-Speed 10471.71 samples/sec   Loss 12.3581   LearningRate 0.5422   Epoch: 3   Global Step: 18740   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:59:28,481-Speed 10452.03 samples/sec   Loss 12.1658   LearningRate 0.5425   Epoch: 3   Global Step: 18750   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:59:36,307-Speed 10468.42 samples/sec   Loss 12.0532   LearningRate 0.5428   Epoch: 3   Global Step: 18760   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:59:44,142-Speed 10457.62 samples/sec   Loss 12.0548   LearningRate 0.5431   Epoch: 3   Global Step: 18770   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 18:59:51,944-Speed 10501.18 samples/sec   Loss 12.1186   LearningRate 0.5434   Epoch: 3   Global Step: 18780   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 18:59:59,754-Speed 10494.46 samples/sec   Loss 12.1670   LearningRate 0.5437   Epoch: 3   Global Step: 18790   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:00:07,563-Speed 10493.73 samples/sec   Loss 12.3220   LearningRate 0.5440   Epoch: 3   Global Step: 18800   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:00:15,346-Speed 10527.65 samples/sec   Loss 12.3877   LearningRate 0.5443   Epoch: 3   Global Step: 18810   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:00:23,160-Speed 10485.30 samples/sec   Loss 12.2129   LearningRate 0.5446   Epoch: 3   Global Step: 18820   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:00:30,991-Speed 10462.86 samples/sec   Loss 12.0825   LearningRate 0.5448   Epoch: 3   Global Step: 18830   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:00:38,787-Speed 10508.51 samples/sec   Loss 12.1543   LearningRate 0.5451   Epoch: 3   Global Step: 18840   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:00:46,589-Speed 10502.47 samples/sec   Loss 12.1403   LearningRate 0.5454   Epoch: 3   Global Step: 18850   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:00:54,400-Speed 10489.59 samples/sec   Loss 12.0790   LearningRate 0.5457   Epoch: 3   Global Step: 18860   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 19:01:02,198-Speed 10506.52 samples/sec   Loss 12.2393   LearningRate 0.5460   Epoch: 3   Global Step: 18870   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 19:01:09,972-Speed 10540.19 samples/sec   Loss 12.2499   LearningRate 0.5463   Epoch: 3   Global Step: 18880   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 19:01:17,787-Speed 10489.73 samples/sec   Loss 12.1557   LearningRate 0.5466   Epoch: 3   Global Step: 18890   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 19:01:25,591-Speed 10501.00 samples/sec   Loss 12.1416   LearningRate 0.5469   Epoch: 3   Global Step: 18900   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 19:01:33,389-Speed 10506.43 samples/sec   Loss 12.1499   LearningRate 0.5472   Epoch: 3   Global Step: 18910   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 19:01:41,168-Speed 10533.34 samples/sec   Loss 12.2740   LearningRate 0.5475   Epoch: 3   Global Step: 18920   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 19:01:48,976-Speed 10494.67 samples/sec   Loss 12.2094   LearningRate 0.5477   Epoch: 3   Global Step: 18930   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 19:01:56,806-Speed 10464.26 samples/sec   Loss 12.1851   LearningRate 0.5480   Epoch: 3   Global Step: 18940   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 19:02:04,593-Speed 10521.84 samples/sec   Loss 12.1895   LearningRate 0.5483   Epoch: 3   Global Step: 18950   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-15 19:02:12,446-Speed 10433.39 samples/sec   Loss 12.3039   LearningRate 0.5486   Epoch: 3   Global Step: 18960   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:02:20,245-Speed 10505.96 samples/sec   Loss 12.2116   LearningRate 0.5489   Epoch: 3   Global Step: 18970   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:02:28,074-Speed 10466.31 samples/sec   Loss 12.3654   LearningRate 0.5492   Epoch: 3   Global Step: 18980   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:02:35,870-Speed 10510.41 samples/sec   Loss 12.2252   LearningRate 0.5495   Epoch: 3   Global Step: 18990   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:02:43,715-Speed 10443.10 samples/sec   Loss 12.3250   LearningRate 0.5498   Epoch: 3   Global Step: 19000   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:02:51,568-Speed 10434.10 samples/sec   Loss 12.1949   LearningRate 0.5501   Epoch: 3   Global Step: 19010   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:02:59,365-Speed 10507.68 samples/sec   Loss 12.1719   LearningRate 0.5503   Epoch: 3   Global Step: 19020   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:03:07,244-Speed 10399.42 samples/sec   Loss 12.1883   LearningRate 0.5506   Epoch: 3   Global Step: 19030   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:03:15,039-Speed 10510.98 samples/sec   Loss 12.3595   LearningRate 0.5509   Epoch: 3   Global Step: 19040   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:03:22,844-Speed 10496.91 samples/sec   Loss 12.2992   LearningRate 0.5512   Epoch: 3   Global Step: 19050   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:03:30,654-Speed 10490.94 samples/sec   Loss 12.3547   LearningRate 0.5515   Epoch: 3   Global Step: 19060   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:03:38,482-Speed 10466.62 samples/sec   Loss 12.1613   LearningRate 0.5518   Epoch: 3   Global Step: 19070   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:03:46,274-Speed 10513.93 samples/sec   Loss 12.2766   LearningRate 0.5521   Epoch: 3   Global Step: 19080   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:03:54,123-Speed 10438.85 samples/sec   Loss 12.1922   LearningRate 0.5524   Epoch: 3   Global Step: 19090   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:04:01,944-Speed 10475.85 samples/sec   Loss 12.1631   LearningRate 0.5527   Epoch: 3   Global Step: 19100   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:04:09,751-Speed 10495.92 samples/sec   Loss 12.2147   LearningRate 0.5530   Epoch: 3   Global Step: 19110   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:04:17,541-Speed 10517.31 samples/sec   Loss 12.1419   LearningRate 0.5532   Epoch: 3   Global Step: 19120   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:04:25,339-Speed 10507.97 samples/sec   Loss 12.2775   LearningRate 0.5535   Epoch: 3   Global Step: 19130   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:04:33,136-Speed 10508.48 samples/sec   Loss 12.3242   LearningRate 0.5538   Epoch: 3   Global Step: 19140   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:04:40,917-Speed 10529.47 samples/sec   Loss 12.2433   LearningRate 0.5541   Epoch: 3   Global Step: 19150   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:04:48,710-Speed 10514.45 samples/sec   Loss 12.2546   LearningRate 0.5544   Epoch: 3   Global Step: 19160   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:04:56,522-Speed 10488.86 samples/sec   Loss 12.2686   LearningRate 0.5547   Epoch: 3   Global Step: 19170   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:05:04,335-Speed 10486.42 samples/sec   Loss 12.3360   LearningRate 0.5550   Epoch: 3   Global Step: 19180   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:05:12,155-Speed 10477.79 samples/sec   Loss 12.1749   LearningRate 0.5553   Epoch: 3   Global Step: 19190   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:05:19,950-Speed 10511.22 samples/sec   Loss 12.2683   LearningRate 0.5556   Epoch: 3   Global Step: 19200   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-15 19:05:27,780-Speed 10464.04 samples/sec   Loss 12.2182   LearningRate 0.5558   Epoch: 3   Global Step: 19210   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:05:35,585-Speed 10498.80 samples/sec   Loss 12.2351   LearningRate 0.5561   Epoch: 3   Global Step: 19220   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:05:43,435-Speed 10436.93 samples/sec   Loss 12.3015   LearningRate 0.5564   Epoch: 3   Global Step: 19230   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:05:51,267-Speed 10460.83 samples/sec   Loss 12.4377   LearningRate 0.5567   Epoch: 3   Global Step: 19240   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:05:59,127-Speed 10424.16 samples/sec   Loss 12.3110   LearningRate 0.5570   Epoch: 3   Global Step: 19250   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:06:06,982-Speed 10431.20 samples/sec   Loss 12.3478   LearningRate 0.5573   Epoch: 3   Global Step: 19260   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:06:14,802-Speed 10477.41 samples/sec   Loss 12.1849   LearningRate 0.5576   Epoch: 3   Global Step: 19270   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:06:22,589-Speed 10521.89 samples/sec   Loss 12.3584   LearningRate 0.5579   Epoch: 3   Global Step: 19280   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:06:30,386-Speed 10508.51 samples/sec   Loss 12.1789   LearningRate 0.5582   Epoch: 3   Global Step: 19290   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:06:38,191-Speed 10496.58 samples/sec   Loss 12.4373   LearningRate 0.5584   Epoch: 3   Global Step: 19300   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:06:46,034-Speed 10446.38 samples/sec   Loss 12.2616   LearningRate 0.5587   Epoch: 3   Global Step: 19310   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:06:53,868-Speed 10458.39 samples/sec   Loss 12.4072   LearningRate 0.5590   Epoch: 3   Global Step: 19320   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:07:01,661-Speed 10514.63 samples/sec   Loss 12.1999   LearningRate 0.5593   Epoch: 3   Global Step: 19330   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:07:09,475-Speed 10484.71 samples/sec   Loss 12.1403   LearningRate 0.5596   Epoch: 3   Global Step: 19340   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:07:17,275-Speed 10504.29 samples/sec   Loss 12.3386   LearningRate 0.5599   Epoch: 3   Global Step: 19350   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:07:25,088-Speed 10487.72 samples/sec   Loss 12.3693   LearningRate 0.5602   Epoch: 3   Global Step: 19360   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:07:32,875-Speed 10532.52 samples/sec   Loss 12.3916   LearningRate 0.5605   Epoch: 3   Global Step: 19370   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:07:40,662-Speed 10521.31 samples/sec   Loss 12.2073   LearningRate 0.5608   Epoch: 3   Global Step: 19380   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:07:48,502-Speed 10452.00 samples/sec   Loss 12.2734   LearningRate 0.5611   Epoch: 3   Global Step: 19390   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:07:56,301-Speed 10505.51 samples/sec   Loss 12.1767   LearningRate 0.5613   Epoch: 3   Global Step: 19400   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:08:04,102-Speed 10503.32 samples/sec   Loss 12.3238   LearningRate 0.5616   Epoch: 3   Global Step: 19410   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:08:11,899-Speed 10507.40 samples/sec   Loss 12.2312   LearningRate 0.5619   Epoch: 3   Global Step: 19420   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:08:19,702-Speed 10500.74 samples/sec   Loss 12.3531   LearningRate 0.5622   Epoch: 3   Global Step: 19430   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:08:27,503-Speed 10502.47 samples/sec   Loss 12.3179   LearningRate 0.5625   Epoch: 3   Global Step: 19440   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:08:35,289-Speed 10524.10 samples/sec   Loss 12.2464   LearningRate 0.5628   Epoch: 3   Global Step: 19450   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:08:43,066-Speed 10535.87 samples/sec   Loss 12.3921   LearningRate 0.5631   Epoch: 3   Global Step: 19460   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:08:50,883-Speed 10482.25 samples/sec   Loss 12.3867   LearningRate 0.5634   Epoch: 3   Global Step: 19470   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:08:58,675-Speed 10516.26 samples/sec   Loss 12.4636   LearningRate 0.5637   Epoch: 3   Global Step: 19480   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:09:06,462-Speed 10522.75 samples/sec   Loss 12.2854   LearningRate 0.5639   Epoch: 3   Global Step: 19490   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:09:14,237-Speed 10539.28 samples/sec   Loss 12.3161   LearningRate 0.5642   Epoch: 3   Global Step: 19500   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:09:22,049-Speed 10488.72 samples/sec   Loss 12.2619   LearningRate 0.5645   Epoch: 3   Global Step: 19510   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:09:29,877-Speed 10466.96 samples/sec   Loss 12.2902   LearningRate 0.5648   Epoch: 3   Global Step: 19520   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:09:37,719-Speed 10448.51 samples/sec   Loss 12.2895   LearningRate 0.5651   Epoch: 3   Global Step: 19530   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:09:45,560-Speed 10450.93 samples/sec   Loss 12.3900   LearningRate 0.5654   Epoch: 3   Global Step: 19540   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:09:53,358-Speed 10506.53 samples/sec   Loss 12.3283   LearningRate 0.5657   Epoch: 3   Global Step: 19550   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:10:01,144-Speed 10524.65 samples/sec   Loss 12.3656   LearningRate 0.5660   Epoch: 3   Global Step: 19560   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:10:08,927-Speed 10526.60 samples/sec   Loss 12.3604   LearningRate 0.5663   Epoch: 3   Global Step: 19570   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:10:16,722-Speed 10511.57 samples/sec   Loss 12.5545   LearningRate 0.5666   Epoch: 3   Global Step: 19580   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:10:24,537-Speed 10484.15 samples/sec   Loss 12.3914   LearningRate 0.5668   Epoch: 3   Global Step: 19590   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:10:32,368-Speed 10463.29 samples/sec   Loss 12.2527   LearningRate 0.5671   Epoch: 3   Global Step: 19600   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:10:40,184-Speed 10483.00 samples/sec   Loss 12.3143   LearningRate 0.5674   Epoch: 3   Global Step: 19610   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:10:47,954-Speed 10545.88 samples/sec   Loss 12.3268   LearningRate 0.5677   Epoch: 3   Global Step: 19620   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:10:55,742-Speed 10521.90 samples/sec   Loss 12.2654   LearningRate 0.5680   Epoch: 3   Global Step: 19630   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:11:03,556-Speed 10485.66 samples/sec   Loss 12.3464   LearningRate 0.5683   Epoch: 3   Global Step: 19640   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:11:11,365-Speed 10492.71 samples/sec   Loss 12.3561   LearningRate 0.5686   Epoch: 3   Global Step: 19650   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:11:19,152-Speed 10521.72 samples/sec   Loss 12.3390   LearningRate 0.5689   Epoch: 3   Global Step: 19660   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:11:26,955-Speed 10499.66 samples/sec   Loss 12.3428   LearningRate 0.5692   Epoch: 3   Global Step: 19670   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:11:34,749-Speed 10513.43 samples/sec   Loss 12.2955   LearningRate 0.5694   Epoch: 3   Global Step: 19680   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-15 19:11:42,522-Speed 10541.06 samples/sec   Loss 12.3604   LearningRate 0.5697   Epoch: 3   Global Step: 19690   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-15 19:11:50,291-Speed 10546.64 samples/sec   Loss 12.3675   LearningRate 0.5700   Epoch: 3   Global Step: 19700   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-15 19:11:58,113-Speed 10475.58 samples/sec   Loss 12.5989   LearningRate 0.5703   Epoch: 3   Global Step: 19710   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-15 19:12:05,925-Speed 10487.76 samples/sec   Loss 12.4316   LearningRate 0.5706   Epoch: 3   Global Step: 19720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-15 19:12:13,717-Speed 10514.09 samples/sec   Loss 12.5177   LearningRate 0.5709   Epoch: 3   Global Step: 19730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-15 19:12:21,548-Speed 10463.41 samples/sec   Loss 12.4544   LearningRate 0.5712   Epoch: 3   Global Step: 19740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-15 19:12:29,391-Speed 10445.67 samples/sec   Loss 12.3679   LearningRate 0.5715   Epoch: 3   Global Step: 19750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-15 19:12:37,166-Speed 10538.17 samples/sec   Loss 12.2312   LearningRate 0.5718   Epoch: 3   Global Step: 19760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-15 19:12:44,980-Speed 10484.61 samples/sec   Loss 12.3506   LearningRate 0.5720   Epoch: 3   Global Step: 19770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-15 19:12:52,803-Speed 10473.06 samples/sec   Loss 12.4179   LearningRate 0.5723   Epoch: 3   Global Step: 19780   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:13:00,591-Speed 10520.87 samples/sec   Loss 12.3161   LearningRate 0.5726   Epoch: 3   Global Step: 19790   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:13:08,409-Speed 10479.69 samples/sec   Loss 12.3853   LearningRate 0.5729   Epoch: 3   Global Step: 19800   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:13:16,225-Speed 10481.73 samples/sec   Loss 12.4117   LearningRate 0.5732   Epoch: 3   Global Step: 19810   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:13:24,061-Speed 10456.59 samples/sec   Loss 12.2813   LearningRate 0.5735   Epoch: 3   Global Step: 19820   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:13:31,929-Speed 10413.28 samples/sec   Loss 12.3711   LearningRate 0.5738   Epoch: 3   Global Step: 19830   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:13:39,729-Speed 10504.31 samples/sec   Loss 12.3222   LearningRate 0.5741   Epoch: 3   Global Step: 19840   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:13:47,520-Speed 10515.08 samples/sec   Loss 12.3376   LearningRate 0.5744   Epoch: 3   Global Step: 19850   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:13:55,311-Speed 10515.87 samples/sec   Loss 12.3915   LearningRate 0.5747   Epoch: 3   Global Step: 19860   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:14:03,110-Speed 10505.35 samples/sec   Loss 12.4276   LearningRate 0.5749   Epoch: 3   Global Step: 19870   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:14:10,925-Speed 10483.69 samples/sec   Loss 12.2893   LearningRate 0.5752   Epoch: 3   Global Step: 19880   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:14:18,761-Speed 10456.38 samples/sec   Loss 12.3668   LearningRate 0.5755   Epoch: 3   Global Step: 19890   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:14:26,586-Speed 10471.19 samples/sec   Loss 12.4372   LearningRate 0.5758   Epoch: 3   Global Step: 19900   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:14:34,391-Speed 10496.40 samples/sec   Loss 12.5691   LearningRate 0.5761   Epoch: 3   Global Step: 19910   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:14:42,209-Speed 10478.41 samples/sec   Loss 12.4340   LearningRate 0.5764   Epoch: 3   Global Step: 19920   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:14:50,055-Speed 10442.94 samples/sec   Loss 12.4227   LearningRate 0.5767   Epoch: 3   Global Step: 19930   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:14:57,883-Speed 10467.23 samples/sec   Loss 12.4791   LearningRate 0.5770   Epoch: 3   Global Step: 19940   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:15:05,737-Speed 10430.33 samples/sec   Loss 12.4970   LearningRate 0.5773   Epoch: 3   Global Step: 19950   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:15:13,530-Speed 10513.88 samples/sec   Loss 12.4475   LearningRate 0.5775   Epoch: 3   Global Step: 19960   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:15:21,319-Speed 10519.62 samples/sec   Loss 12.3304   LearningRate 0.5778   Epoch: 3   Global Step: 19970   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:15:29,137-Speed 10479.95 samples/sec   Loss 12.3077   LearningRate 0.5781   Epoch: 3   Global Step: 19980   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:15:36,939-Speed 10501.43 samples/sec   Loss 12.8934   LearningRate 0.5784   Epoch: 3   Global Step: 19990   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:15:44,714-Speed 10539.06 samples/sec   Loss 12.8263   LearningRate 0.5787   Epoch: 3   Global Step: 20000   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:16:11,643-[lfw][20000]XNorm: 22.510489
Training: 2022-01-15 19:16:11,643-[lfw][20000]Accuracy-Flip: 0.99433+-0.00396
Training: 2022-01-15 19:16:11,644-[lfw][20000]Accuracy-Highest: 0.99483
Training: 2022-01-15 19:16:43,574-[cfp_fp][20000]XNorm: 19.549580
Training: 2022-01-15 19:16:43,574-[cfp_fp][20000]Accuracy-Flip: 0.95829+-0.01002
Training: 2022-01-15 19:16:43,575-[cfp_fp][20000]Accuracy-Highest: 0.96829
Training: 2022-01-15 19:17:11,399-[agedb_30][20000]XNorm: 21.887470
Training: 2022-01-15 19:17:11,400-[agedb_30][20000]Accuracy-Flip: 0.95200+-0.00833
Training: 2022-01-15 19:17:11,401-[agedb_30][20000]Accuracy-Highest: 0.95250
Training: 2022-01-15 19:17:19,141-Speed 867.56 samples/sec   Loss 12.5918   LearningRate 0.5790   Epoch: 3   Global Step: 20010   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:17:26,895-Speed 10567.30 samples/sec   Loss 12.3186   LearningRate 0.5793   Epoch: 3   Global Step: 20020   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:17:34,635-Speed 10584.74 samples/sec   Loss 12.3475   LearningRate 0.5796   Epoch: 3   Global Step: 20030   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:17:42,370-Speed 10593.35 samples/sec   Loss 12.3007   LearningRate 0.5799   Epoch: 3   Global Step: 20040   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:17:50,113-Speed 10580.10 samples/sec   Loss 12.2902   LearningRate 0.5802   Epoch: 3   Global Step: 20050   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:17:57,947-Speed 10459.32 samples/sec   Loss 12.2658   LearningRate 0.5804   Epoch: 3   Global Step: 20060   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:18:05,717-Speed 10545.24 samples/sec   Loss 12.3454   LearningRate 0.5807   Epoch: 3   Global Step: 20070   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:18:13,492-Speed 10536.71 samples/sec   Loss 12.3956   LearningRate 0.5810   Epoch: 3   Global Step: 20080   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:18:21,264-Speed 10541.23 samples/sec   Loss 12.4873   LearningRate 0.5813   Epoch: 3   Global Step: 20090   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:18:29,061-Speed 10508.75 samples/sec   Loss 12.4286   LearningRate 0.5816   Epoch: 3   Global Step: 20100   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:18:36,837-Speed 10538.50 samples/sec   Loss 12.4782   LearningRate 0.5819   Epoch: 3   Global Step: 20110   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:18:44,614-Speed 10534.04 samples/sec   Loss 12.3507   LearningRate 0.5822   Epoch: 3   Global Step: 20120   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:18:52,396-Speed 10527.79 samples/sec   Loss 12.4951   LearningRate 0.5825   Epoch: 3   Global Step: 20130   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:19:00,185-Speed 10519.03 samples/sec   Loss 12.4214   LearningRate 0.5828   Epoch: 3   Global Step: 20140   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:19:07,960-Speed 10537.85 samples/sec   Loss 12.3716   LearningRate 0.5830   Epoch: 3   Global Step: 20150   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:19:15,735-Speed 10538.02 samples/sec   Loss 12.4427   LearningRate 0.5833   Epoch: 3   Global Step: 20160   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:19:23,531-Speed 10509.59 samples/sec   Loss 12.4309   LearningRate 0.5836   Epoch: 3   Global Step: 20170   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:19:31,388-Speed 10427.38 samples/sec   Loss 12.4116   LearningRate 0.5839   Epoch: 3   Global Step: 20180   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:19:39,300-Speed 10356.41 samples/sec   Loss 12.4013   LearningRate 0.5842   Epoch: 3   Global Step: 20190   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:19:47,055-Speed 10565.02 samples/sec   Loss 12.6275   LearningRate 0.5845   Epoch: 3   Global Step: 20200   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:19:54,839-Speed 10525.83 samples/sec   Loss 12.5041   LearningRate 0.5848   Epoch: 3   Global Step: 20210   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:20:02,632-Speed 10513.67 samples/sec   Loss 12.4671   LearningRate 0.5851   Epoch: 3   Global Step: 20220   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:20:10,435-Speed 10500.50 samples/sec   Loss 12.5309   LearningRate 0.5854   Epoch: 3   Global Step: 20230   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:20:18,221-Speed 10521.70 samples/sec   Loss 12.4883   LearningRate 0.5856   Epoch: 3   Global Step: 20240   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:20:26,003-Speed 10528.69 samples/sec   Loss 12.4980   LearningRate 0.5859   Epoch: 3   Global Step: 20250   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:20:33,782-Speed 10532.86 samples/sec   Loss 12.4559   LearningRate 0.5862   Epoch: 3   Global Step: 20260   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:20:41,583-Speed 10502.10 samples/sec   Loss 12.5175   LearningRate 0.5865   Epoch: 3   Global Step: 20270   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:20:49,361-Speed 10535.60 samples/sec   Loss 12.4290   LearningRate 0.5868   Epoch: 3   Global Step: 20280   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:20:57,152-Speed 10519.44 samples/sec   Loss 12.3616   LearningRate 0.5871   Epoch: 3   Global Step: 20290   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:21:05,053-Speed 10370.00 samples/sec   Loss 12.4780   LearningRate 0.5874   Epoch: 3   Global Step: 20300   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:21:12,870-Speed 10480.79 samples/sec   Loss 12.5735   LearningRate 0.5877   Epoch: 3   Global Step: 20310   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:21:20,680-Speed 10490.11 samples/sec   Loss 12.4525   LearningRate 0.5880   Epoch: 3   Global Step: 20320   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:21:28,466-Speed 10524.18 samples/sec   Loss 12.4721   LearningRate 0.5883   Epoch: 3   Global Step: 20330   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:21:36,281-Speed 10491.47 samples/sec   Loss 12.5907   LearningRate 0.5885   Epoch: 3   Global Step: 20340   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:21:44,070-Speed 10517.17 samples/sec   Loss 12.4866   LearningRate 0.5888   Epoch: 3   Global Step: 20350   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:21:51,853-Speed 10526.50 samples/sec   Loss 12.5522   LearningRate 0.5891   Epoch: 3   Global Step: 20360   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:21:59,642-Speed 10523.21 samples/sec   Loss 12.3940   LearningRate 0.5894   Epoch: 3   Global Step: 20370   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:22:07,422-Speed 10531.71 samples/sec   Loss 12.3445   LearningRate 0.5897   Epoch: 3   Global Step: 20380   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:22:15,206-Speed 10523.66 samples/sec   Loss 12.7067   LearningRate 0.5900   Epoch: 3   Global Step: 20390   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:22:22,987-Speed 10530.38 samples/sec   Loss 12.6810   LearningRate 0.5903   Epoch: 3   Global Step: 20400   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:22:30,784-Speed 10508.45 samples/sec   Loss 12.6548   LearningRate 0.5906   Epoch: 3   Global Step: 20410   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:22:38,572-Speed 10519.43 samples/sec   Loss 12.5802   LearningRate 0.5909   Epoch: 3   Global Step: 20420   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:22:46,401-Speed 10465.16 samples/sec   Loss 12.3844   LearningRate 0.5911   Epoch: 3   Global Step: 20430   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:22:54,196-Speed 10510.91 samples/sec   Loss 12.4990   LearningRate 0.5914   Epoch: 3   Global Step: 20440   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:23:02,003-Speed 10494.63 samples/sec   Loss 12.3410   LearningRate 0.5917   Epoch: 3   Global Step: 20450   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:23:09,793-Speed 10517.25 samples/sec   Loss 12.6675   LearningRate 0.5920   Epoch: 3   Global Step: 20460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:23:17,615-Speed 10474.76 samples/sec   Loss 12.6442   LearningRate 0.5923   Epoch: 3   Global Step: 20470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:23:25,408-Speed 10512.76 samples/sec   Loss 12.5585   LearningRate 0.5926   Epoch: 3   Global Step: 20480   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:23:33,230-Speed 10474.91 samples/sec   Loss 12.4701   LearningRate 0.5929   Epoch: 3   Global Step: 20490   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:23:41,051-Speed 10474.82 samples/sec   Loss 12.4811   LearningRate 0.5932   Epoch: 3   Global Step: 20500   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:23:48,832-Speed 10529.91 samples/sec   Loss 12.5142   LearningRate 0.5935   Epoch: 3   Global Step: 20510   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:23:56,630-Speed 10506.29 samples/sec   Loss 12.5239   LearningRate 0.5938   Epoch: 3   Global Step: 20520   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:24:04,496-Speed 10416.17 samples/sec   Loss 12.5547   LearningRate 0.5940   Epoch: 3   Global Step: 20530   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:24:12,314-Speed 10481.16 samples/sec   Loss 12.5357   LearningRate 0.5943   Epoch: 3   Global Step: 20540   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:24:20,166-Speed 10434.20 samples/sec   Loss 12.5979   LearningRate 0.5946   Epoch: 3   Global Step: 20550   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:24:27,956-Speed 10516.42 samples/sec   Loss 12.4919   LearningRate 0.5949   Epoch: 3   Global Step: 20560   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:24:35,757-Speed 10503.04 samples/sec   Loss 12.5991   LearningRate 0.5952   Epoch: 3   Global Step: 20570   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:24:43,553-Speed 10509.07 samples/sec   Loss 12.5139   LearningRate 0.5955   Epoch: 3   Global Step: 20580   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:24:51,337-Speed 10526.15 samples/sec   Loss 12.5921   LearningRate 0.5958   Epoch: 3   Global Step: 20590   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:24:59,141-Speed 10499.30 samples/sec   Loss 12.4991   LearningRate 0.5961   Epoch: 3   Global Step: 20600   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:25:06,923-Speed 10527.95 samples/sec   Loss 12.5502   LearningRate 0.5964   Epoch: 3   Global Step: 20610   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:25:14,737-Speed 10484.00 samples/sec   Loss 12.5848   LearningRate 0.5966   Epoch: 3   Global Step: 20620   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:25:22,524-Speed 10521.75 samples/sec   Loss 12.5501   LearningRate 0.5969   Epoch: 3   Global Step: 20630   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:25:30,325-Speed 10502.99 samples/sec   Loss 12.4853   LearningRate 0.5972   Epoch: 3   Global Step: 20640   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:25:38,095-Speed 10545.63 samples/sec   Loss 12.6028   LearningRate 0.5975   Epoch: 3   Global Step: 20650   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:25:45,900-Speed 10495.51 samples/sec   Loss 12.6522   LearningRate 0.5978   Epoch: 3   Global Step: 20660   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:25:53,696-Speed 10509.90 samples/sec   Loss 12.6707   LearningRate 0.5981   Epoch: 3   Global Step: 20670   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:26:01,477-Speed 10530.30 samples/sec   Loss 12.5909   LearningRate 0.5984   Epoch: 3   Global Step: 20680   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:26:09,311-Speed 10459.34 samples/sec   Loss 12.5585   LearningRate 0.5987   Epoch: 3   Global Step: 20690   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:26:17,121-Speed 10490.26 samples/sec   Loss 12.7030   LearningRate 0.5990   Epoch: 3   Global Step: 20700   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:26:24,953-Speed 10462.47 samples/sec   Loss 12.6313   LearningRate 0.5992   Epoch: 3   Global Step: 20710   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:26:32,773-Speed 10476.76 samples/sec   Loss 12.5787   LearningRate 0.5995   Epoch: 3   Global Step: 20720   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:26:40,599-Speed 10472.57 samples/sec   Loss 12.5420   LearningRate 0.5998   Epoch: 3   Global Step: 20730   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:26:48,424-Speed 10470.19 samples/sec   Loss 12.5760   LearningRate 0.5999   Epoch: 3   Global Step: 20740   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:27:11,285-Speed 3583.52 samples/sec   Loss 12.7261   LearningRate 0.5998   Epoch: 4   Global Step: 20750   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:27:19,117-Speed 10462.12 samples/sec   Loss 12.7621   LearningRate 0.5997   Epoch: 4   Global Step: 20760   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:27:26,934-Speed 10480.95 samples/sec   Loss 12.5734   LearningRate 0.5995   Epoch: 4   Global Step: 20770   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:27:34,671-Speed 10590.13 samples/sec   Loss 12.4603   LearningRate 0.5994   Epoch: 4   Global Step: 20780   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:27:42,420-Speed 10572.20 samples/sec   Loss 12.4839   LearningRate 0.5992   Epoch: 4   Global Step: 20790   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:27:50,168-Speed 10575.58 samples/sec   Loss 12.5619   LearningRate 0.5991   Epoch: 4   Global Step: 20800   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:27:57,967-Speed 10505.46 samples/sec   Loss 12.5532   LearningRate 0.5989   Epoch: 4   Global Step: 20810   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:28:05,800-Speed 10458.67 samples/sec   Loss 12.5422   LearningRate 0.5988   Epoch: 4   Global Step: 20820   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:28:13,598-Speed 10507.04 samples/sec   Loss 12.5250   LearningRate 0.5986   Epoch: 4   Global Step: 20830   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:28:21,386-Speed 10520.92 samples/sec   Loss 12.5142   LearningRate 0.5985   Epoch: 4   Global Step: 20840   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:28:29,190-Speed 10497.95 samples/sec   Loss 12.5303   LearningRate 0.5984   Epoch: 4   Global Step: 20850   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:28:37,001-Speed 10489.59 samples/sec   Loss 12.6478   LearningRate 0.5982   Epoch: 4   Global Step: 20860   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:28:44,834-Speed 10460.87 samples/sec   Loss 12.5605   LearningRate 0.5981   Epoch: 4   Global Step: 20870   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:28:52,673-Speed 10451.78 samples/sec   Loss 12.5962   LearningRate 0.5979   Epoch: 4   Global Step: 20880   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:29:00,448-Speed 10537.70 samples/sec   Loss 12.4646   LearningRate 0.5978   Epoch: 4   Global Step: 20890   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:29:08,254-Speed 10498.14 samples/sec   Loss 12.6246   LearningRate 0.5976   Epoch: 4   Global Step: 20900   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:29:16,051-Speed 10507.95 samples/sec   Loss 12.6714   LearningRate 0.5975   Epoch: 4   Global Step: 20910   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:29:23,848-Speed 10507.24 samples/sec   Loss 12.5431   LearningRate 0.5973   Epoch: 4   Global Step: 20920   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:29:31,640-Speed 10513.80 samples/sec   Loss 12.5389   LearningRate 0.5972   Epoch: 4   Global Step: 20930   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:29:39,458-Speed 10480.41 samples/sec   Loss 12.4869   LearningRate 0.5971   Epoch: 4   Global Step: 20940   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:29:47,245-Speed 10521.98 samples/sec   Loss 12.5489   LearningRate 0.5969   Epoch: 4   Global Step: 20950   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:29:55,059-Speed 10484.63 samples/sec   Loss 12.6364   LearningRate 0.5968   Epoch: 4   Global Step: 20960   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:30:02,841-Speed 10527.68 samples/sec   Loss 12.8342   LearningRate 0.5966   Epoch: 4   Global Step: 20970   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:30:10,636-Speed 10511.31 samples/sec   Loss 12.6554   LearningRate 0.5965   Epoch: 4   Global Step: 20980   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:30:18,429-Speed 10514.31 samples/sec   Loss 12.6061   LearningRate 0.5963   Epoch: 4   Global Step: 20990   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:30:26,257-Speed 10465.12 samples/sec   Loss 12.4940   LearningRate 0.5962   Epoch: 4   Global Step: 21000   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:30:34,087-Speed 10464.51 samples/sec   Loss 12.4853   LearningRate 0.5960   Epoch: 4   Global Step: 21010   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:30:41,896-Speed 10492.09 samples/sec   Loss 12.5547   LearningRate 0.5959   Epoch: 4   Global Step: 21020   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:30:49,693-Speed 10508.11 samples/sec   Loss 12.5150   LearningRate 0.5958   Epoch: 4   Global Step: 21030   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:30:57,501-Speed 10493.15 samples/sec   Loss 12.5365   LearningRate 0.5956   Epoch: 4   Global Step: 21040   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:31:05,302-Speed 10502.47 samples/sec   Loss 12.5860   LearningRate 0.5955   Epoch: 4   Global Step: 21050   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:31:13,114-Speed 10489.12 samples/sec   Loss 12.4963   LearningRate 0.5953   Epoch: 4   Global Step: 21060   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:31:20,930-Speed 10482.57 samples/sec   Loss 12.6114   LearningRate 0.5952   Epoch: 4   Global Step: 21070   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:31:28,734-Speed 10497.99 samples/sec   Loss 12.4771   LearningRate 0.5950   Epoch: 4   Global Step: 21080   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:31:36,538-Speed 10498.76 samples/sec   Loss 12.5848   LearningRate 0.5949   Epoch: 4   Global Step: 21090   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:31:44,363-Speed 10471.22 samples/sec   Loss 12.5823   LearningRate 0.5947   Epoch: 4   Global Step: 21100   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:31:52,271-Speed 10360.13 samples/sec   Loss 12.5139   LearningRate 0.5946   Epoch: 4   Global Step: 21110   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:32:00,106-Speed 10455.91 samples/sec   Loss 12.5594   LearningRate 0.5945   Epoch: 4   Global Step: 21120   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:32:07,938-Speed 10461.44 samples/sec   Loss 12.5370   LearningRate 0.5943   Epoch: 4   Global Step: 21130   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:32:15,837-Speed 10373.15 samples/sec   Loss 12.5980   LearningRate 0.5942   Epoch: 4   Global Step: 21140   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:32:23,685-Speed 10439.69 samples/sec   Loss 12.4143   LearningRate 0.5940   Epoch: 4   Global Step: 21150   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:32:31,500-Speed 10483.53 samples/sec   Loss 12.5104   LearningRate 0.5939   Epoch: 4   Global Step: 21160   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:32:39,320-Speed 10476.57 samples/sec   Loss 12.4736   LearningRate 0.5937   Epoch: 4   Global Step: 21170   Fp16 Grad Scale: 524288   Required: 18 hours
Training: 2022-01-15 19:32:47,152-Speed 10461.87 samples/sec   Loss 12.5609   LearningRate 0.5936   Epoch: 4   Global Step: 21180   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:32:54,989-Speed 10452.89 samples/sec   Loss 12.5598   LearningRate 0.5934   Epoch: 4   Global Step: 21190   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:33:02,873-Speed 10393.05 samples/sec   Loss 12.4611   LearningRate 0.5933   Epoch: 4   Global Step: 21200   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:33:10,702-Speed 10464.24 samples/sec   Loss 12.5378   LearningRate 0.5932   Epoch: 4   Global Step: 21210   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:33:18,547-Speed 10443.97 samples/sec   Loss 12.4927   LearningRate 0.5930   Epoch: 4   Global Step: 21220   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:33:26,410-Speed 10419.89 samples/sec   Loss 12.5414   LearningRate 0.5929   Epoch: 4   Global Step: 21230   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:33:34,256-Speed 10441.99 samples/sec   Loss 12.6690   LearningRate 0.5927   Epoch: 4   Global Step: 21240   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:33:42,071-Speed 10484.91 samples/sec   Loss 12.7050   LearningRate 0.5926   Epoch: 4   Global Step: 21250   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:33:49,910-Speed 10450.74 samples/sec   Loss 12.5542   LearningRate 0.5924   Epoch: 4   Global Step: 21260   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:33:57,726-Speed 10481.77 samples/sec   Loss 12.4857   LearningRate 0.5923   Epoch: 4   Global Step: 21270   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:34:05,559-Speed 10460.31 samples/sec   Loss 12.4994   LearningRate 0.5922   Epoch: 4   Global Step: 21280   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:34:13,378-Speed 10479.78 samples/sec   Loss 12.5757   LearningRate 0.5920   Epoch: 4   Global Step: 21290   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:34:21,215-Speed 10454.59 samples/sec   Loss 12.4361   LearningRate 0.5919   Epoch: 4   Global Step: 21300   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:34:29,104-Speed 10384.25 samples/sec   Loss 12.3928   LearningRate 0.5917   Epoch: 4   Global Step: 21310   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:34:36,924-Speed 10477.44 samples/sec   Loss 12.5955   LearningRate 0.5916   Epoch: 4   Global Step: 21320   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:34:44,790-Speed 10415.76 samples/sec   Loss 12.5047   LearningRate 0.5914   Epoch: 4   Global Step: 21330   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:34:52,619-Speed 10466.82 samples/sec   Loss 12.5168   LearningRate 0.5913   Epoch: 4   Global Step: 21340   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:35:00,468-Speed 10438.51 samples/sec   Loss 12.6664   LearningRate 0.5911   Epoch: 4   Global Step: 21350   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:35:08,331-Speed 10419.94 samples/sec   Loss 12.5407   LearningRate 0.5910   Epoch: 4   Global Step: 21360   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:35:16,173-Speed 10448.03 samples/sec   Loss 12.5048   LearningRate 0.5909   Epoch: 4   Global Step: 21370   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:35:24,001-Speed 10466.48 samples/sec   Loss 12.5130   LearningRate 0.5907   Epoch: 4   Global Step: 21380   Fp16 Grad Scale: 524288   Required: 18 hours
Training: 2022-01-15 19:35:31,840-Speed 10451.66 samples/sec   Loss 12.4568   LearningRate 0.5906   Epoch: 4   Global Step: 21390   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:35:39,684-Speed 10445.37 samples/sec   Loss 12.5404   LearningRate 0.5904   Epoch: 4   Global Step: 21400   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:35:47,518-Speed 10456.85 samples/sec   Loss 12.3680   LearningRate 0.5903   Epoch: 4   Global Step: 21410   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:35:55,342-Speed 10472.34 samples/sec   Loss 12.4448   LearningRate 0.5901   Epoch: 4   Global Step: 21420   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:36:03,217-Speed 10404.36 samples/sec   Loss 12.4346   LearningRate 0.5900   Epoch: 4   Global Step: 21430   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:36:11,051-Speed 10458.44 samples/sec   Loss 12.4442   LearningRate 0.5899   Epoch: 4   Global Step: 21440   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:36:18,884-Speed 10458.88 samples/sec   Loss 12.4460   LearningRate 0.5897   Epoch: 4   Global Step: 21450   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:36:26,758-Speed 10407.93 samples/sec   Loss 12.4394   LearningRate 0.5896   Epoch: 4   Global Step: 21460   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:36:34,566-Speed 10494.80 samples/sec   Loss 12.4038   LearningRate 0.5894   Epoch: 4   Global Step: 21470   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:36:42,407-Speed 10447.91 samples/sec   Loss 12.4445   LearningRate 0.5893   Epoch: 4   Global Step: 21480   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:36:50,231-Speed 10472.04 samples/sec   Loss 12.6080   LearningRate 0.5891   Epoch: 4   Global Step: 21490   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:36:58,041-Speed 10491.46 samples/sec   Loss 12.5344   LearningRate 0.5890   Epoch: 4   Global Step: 21500   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:37:05,876-Speed 10456.61 samples/sec   Loss 12.5061   LearningRate 0.5889   Epoch: 4   Global Step: 21510   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:37:13,684-Speed 10493.40 samples/sec   Loss 12.5429   LearningRate 0.5887   Epoch: 4   Global Step: 21520   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:37:21,492-Speed 10492.58 samples/sec   Loss 12.4759   LearningRate 0.5886   Epoch: 4   Global Step: 21530   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:37:29,331-Speed 10452.96 samples/sec   Loss 12.3882   LearningRate 0.5884   Epoch: 4   Global Step: 21540   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:37:37,150-Speed 10477.39 samples/sec   Loss 12.5075   LearningRate 0.5883   Epoch: 4   Global Step: 21550   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:37:44,959-Speed 10491.21 samples/sec   Loss 12.4250   LearningRate 0.5881   Epoch: 4   Global Step: 21560   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:37:52,751-Speed 10518.03 samples/sec   Loss 12.5740   LearningRate 0.5880   Epoch: 4   Global Step: 21570   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:38:00,587-Speed 10455.66 samples/sec   Loss 12.5156   LearningRate 0.5879   Epoch: 4   Global Step: 21580   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:38:08,383-Speed 10509.30 samples/sec   Loss 12.4342   LearningRate 0.5877   Epoch: 4   Global Step: 21590   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:38:16,173-Speed 10519.89 samples/sec   Loss 12.5136   LearningRate 0.5876   Epoch: 4   Global Step: 21600   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:38:23,990-Speed 10482.15 samples/sec   Loss 12.3843   LearningRate 0.5874   Epoch: 4   Global Step: 21610   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:38:31,785-Speed 10517.08 samples/sec   Loss 12.4669   LearningRate 0.5873   Epoch: 4   Global Step: 21620   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:38:39,572-Speed 10520.26 samples/sec   Loss 12.3580   LearningRate 0.5871   Epoch: 4   Global Step: 21630   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:38:47,377-Speed 10497.37 samples/sec   Loss 12.3555   LearningRate 0.5870   Epoch: 4   Global Step: 21640   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:38:55,197-Speed 10478.52 samples/sec   Loss 12.4203   LearningRate 0.5868   Epoch: 4   Global Step: 21650   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:39:03,005-Speed 10492.58 samples/sec   Loss 12.3854   LearningRate 0.5867   Epoch: 4   Global Step: 21660   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:39:10,828-Speed 10473.44 samples/sec   Loss 12.4946   LearningRate 0.5866   Epoch: 4   Global Step: 21670   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:39:18,633-Speed 10497.04 samples/sec   Loss 12.5136   LearningRate 0.5864   Epoch: 4   Global Step: 21680   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:39:26,430-Speed 10508.62 samples/sec   Loss 12.5074   LearningRate 0.5863   Epoch: 4   Global Step: 21690   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:39:34,266-Speed 10455.04 samples/sec   Loss 12.5631   LearningRate 0.5861   Epoch: 4   Global Step: 21700   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:39:42,054-Speed 10520.84 samples/sec   Loss 12.4737   LearningRate 0.5860   Epoch: 4   Global Step: 21710   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:39:49,833-Speed 10532.44 samples/sec   Loss 12.4038   LearningRate 0.5858   Epoch: 4   Global Step: 21720   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:39:57,631-Speed 10506.35 samples/sec   Loss 12.2713   LearningRate 0.5857   Epoch: 4   Global Step: 21730   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:40:05,448-Speed 10481.35 samples/sec   Loss 12.3641   LearningRate 0.5856   Epoch: 4   Global Step: 21740   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:40:13,282-Speed 10458.71 samples/sec   Loss 12.3714   LearningRate 0.5854   Epoch: 4   Global Step: 21750   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:40:21,074-Speed 10514.90 samples/sec   Loss 12.5048   LearningRate 0.5853   Epoch: 4   Global Step: 21760   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:40:28,905-Speed 10462.66 samples/sec   Loss 12.4577   LearningRate 0.5851   Epoch: 4   Global Step: 21770   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:40:36,732-Speed 10466.50 samples/sec   Loss 12.4207   LearningRate 0.5850   Epoch: 4   Global Step: 21780   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:40:44,557-Speed 10471.76 samples/sec   Loss 12.3630   LearningRate 0.5848   Epoch: 4   Global Step: 21790   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:40:52,365-Speed 10492.81 samples/sec   Loss 12.3599   LearningRate 0.5847   Epoch: 4   Global Step: 21800   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:41:00,148-Speed 10526.04 samples/sec   Loss 12.3051   LearningRate 0.5846   Epoch: 4   Global Step: 21810   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:41:07,961-Speed 10486.62 samples/sec   Loss 12.3218   LearningRate 0.5844   Epoch: 4   Global Step: 21820   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:41:15,773-Speed 10487.90 samples/sec   Loss 12.4544   LearningRate 0.5843   Epoch: 4   Global Step: 21830   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:41:23,562-Speed 10520.75 samples/sec   Loss 12.3380   LearningRate 0.5841   Epoch: 4   Global Step: 21840   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:41:31,349-Speed 10520.48 samples/sec   Loss 12.3240   LearningRate 0.5840   Epoch: 4   Global Step: 21850   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:41:39,170-Speed 10476.00 samples/sec   Loss 12.4256   LearningRate 0.5838   Epoch: 4   Global Step: 21860   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:41:46,997-Speed 10468.83 samples/sec   Loss 12.4305   LearningRate 0.5837   Epoch: 4   Global Step: 21870   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:41:54,817-Speed 10476.79 samples/sec   Loss 12.3768   LearningRate 0.5836   Epoch: 4   Global Step: 21880   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:42:02,643-Speed 10469.48 samples/sec   Loss 12.4685   LearningRate 0.5834   Epoch: 4   Global Step: 21890   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:42:10,473-Speed 10464.17 samples/sec   Loss 12.3976   LearningRate 0.5833   Epoch: 4   Global Step: 21900   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:42:18,319-Speed 10441.47 samples/sec   Loss 12.3955   LearningRate 0.5831   Epoch: 4   Global Step: 21910   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:42:26,133-Speed 10488.18 samples/sec   Loss 12.3767   LearningRate 0.5830   Epoch: 4   Global Step: 21920   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:42:33,933-Speed 10503.94 samples/sec   Loss 12.4123   LearningRate 0.5829   Epoch: 4   Global Step: 21930   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:42:41,791-Speed 10427.75 samples/sec   Loss 12.4311   LearningRate 0.5827   Epoch: 4   Global Step: 21940   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-15 19:42:49,605-Speed 10485.58 samples/sec   Loss 12.4874   LearningRate 0.5826   Epoch: 4   Global Step: 21950   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-15 19:42:57,497-Speed 10381.71 samples/sec   Loss 12.4889   LearningRate 0.5824   Epoch: 4   Global Step: 21960   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-15 19:43:05,331-Speed 10458.45 samples/sec   Loss 12.3010   LearningRate 0.5823   Epoch: 4   Global Step: 21970   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-15 19:43:13,154-Speed 10472.90 samples/sec   Loss 12.3431   LearningRate 0.5821   Epoch: 4   Global Step: 21980   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-15 19:43:20,973-Speed 10477.74 samples/sec   Loss 12.3414   LearningRate 0.5820   Epoch: 4   Global Step: 21990   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-15 19:43:28,796-Speed 10473.81 samples/sec   Loss 12.3197   LearningRate 0.5819   Epoch: 4   Global Step: 22000   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-15 19:43:36,614-Speed 10479.02 samples/sec   Loss 12.3368   LearningRate 0.5817   Epoch: 4   Global Step: 22010   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-15 19:43:44,441-Speed 10466.91 samples/sec   Loss 12.3984   LearningRate 0.5816   Epoch: 4   Global Step: 22020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-15 19:43:52,264-Speed 10474.35 samples/sec   Loss 12.3128   LearningRate 0.5814   Epoch: 4   Global Step: 22030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-15 19:44:00,121-Speed 10428.36 samples/sec   Loss 12.3180   LearningRate 0.5813   Epoch: 4   Global Step: 22040   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:44:08,010-Speed 10385.30 samples/sec   Loss 12.3302   LearningRate 0.5811   Epoch: 4   Global Step: 22050   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:44:15,862-Speed 10440.71 samples/sec   Loss 12.3638   LearningRate 0.5810   Epoch: 4   Global Step: 22060   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:44:23,708-Speed 10442.39 samples/sec   Loss 12.5253   LearningRate 0.5809   Epoch: 4   Global Step: 22070   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:44:31,576-Speed 10413.28 samples/sec   Loss 12.4457   LearningRate 0.5807   Epoch: 4   Global Step: 22080   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:44:39,427-Speed 10434.87 samples/sec   Loss 12.4124   LearningRate 0.5806   Epoch: 4   Global Step: 22090   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:44:47,247-Speed 10477.63 samples/sec   Loss 12.4199   LearningRate 0.5804   Epoch: 4   Global Step: 22100   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:44:55,094-Speed 10440.98 samples/sec   Loss 12.2905   LearningRate 0.5803   Epoch: 4   Global Step: 22110   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:45:02,877-Speed 10526.86 samples/sec   Loss 12.2896   LearningRate 0.5801   Epoch: 4   Global Step: 22120   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:45:10,691-Speed 10484.80 samples/sec   Loss 12.3308   LearningRate 0.5800   Epoch: 4   Global Step: 22130   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:45:18,504-Speed 10486.39 samples/sec   Loss 12.3901   LearningRate 0.5799   Epoch: 4   Global Step: 22140   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:45:26,315-Speed 10489.99 samples/sec   Loss 12.3845   LearningRate 0.5797   Epoch: 4   Global Step: 22150   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:45:34,146-Speed 10465.87 samples/sec   Loss 12.3855   LearningRate 0.5796   Epoch: 4   Global Step: 22160   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:45:41,969-Speed 10471.49 samples/sec   Loss 12.3340   LearningRate 0.5794   Epoch: 4   Global Step: 22170   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:45:49,782-Speed 10486.59 samples/sec   Loss 12.2505   LearningRate 0.5793   Epoch: 4   Global Step: 22180   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:45:57,613-Speed 10463.13 samples/sec   Loss 12.3022   LearningRate 0.5791   Epoch: 4   Global Step: 22190   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:46:05,443-Speed 10466.39 samples/sec   Loss 12.2761   LearningRate 0.5790   Epoch: 4   Global Step: 22200   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:46:13,277-Speed 10457.35 samples/sec   Loss 12.3297   LearningRate 0.5789   Epoch: 4   Global Step: 22210   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:46:21,123-Speed 10442.76 samples/sec   Loss 12.3456   LearningRate 0.5787   Epoch: 4   Global Step: 22220   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:46:28,940-Speed 10482.13 samples/sec   Loss 12.5471   LearningRate 0.5786   Epoch: 4   Global Step: 22230   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:46:36,743-Speed 10499.59 samples/sec   Loss 12.3665   LearningRate 0.5784   Epoch: 4   Global Step: 22240   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:46:44,537-Speed 10512.14 samples/sec   Loss 12.2699   LearningRate 0.5783   Epoch: 4   Global Step: 22250   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:46:52,347-Speed 10490.30 samples/sec   Loss 12.3042   LearningRate 0.5782   Epoch: 4   Global Step: 22260   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:47:00,139-Speed 10515.14 samples/sec   Loss 12.2532   LearningRate 0.5780   Epoch: 4   Global Step: 22270   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:47:07,960-Speed 10475.42 samples/sec   Loss 12.3204   LearningRate 0.5779   Epoch: 4   Global Step: 22280   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:47:15,758-Speed 10508.20 samples/sec   Loss 12.3186   LearningRate 0.5777   Epoch: 4   Global Step: 22290   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:47:23,558-Speed 10503.87 samples/sec   Loss 12.3128   LearningRate 0.5776   Epoch: 4   Global Step: 22300   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:47:31,377-Speed 10479.14 samples/sec   Loss 12.3209   LearningRate 0.5774   Epoch: 4   Global Step: 22310   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:47:39,204-Speed 10466.89 samples/sec   Loss 12.3550   LearningRate 0.5773   Epoch: 4   Global Step: 22320   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:47:47,013-Speed 10492.00 samples/sec   Loss 12.3169   LearningRate 0.5772   Epoch: 4   Global Step: 22330   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:47:54,824-Speed 10489.42 samples/sec   Loss 12.1907   LearningRate 0.5770   Epoch: 4   Global Step: 22340   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:48:02,597-Speed 10540.29 samples/sec   Loss 12.2545   LearningRate 0.5769   Epoch: 4   Global Step: 22350   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:48:10,382-Speed 10523.72 samples/sec   Loss 12.4793   LearningRate 0.5767   Epoch: 4   Global Step: 22360   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:48:18,169-Speed 10521.06 samples/sec   Loss 12.3609   LearningRate 0.5766   Epoch: 4   Global Step: 22370   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:48:25,953-Speed 10526.53 samples/sec   Loss 12.3559   LearningRate 0.5765   Epoch: 4   Global Step: 22380   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:48:33,747-Speed 10511.68 samples/sec   Loss 12.2540   LearningRate 0.5763   Epoch: 4   Global Step: 22390   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:48:41,556-Speed 10491.88 samples/sec   Loss 12.3982   LearningRate 0.5762   Epoch: 4   Global Step: 22400   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:48:49,406-Speed 10437.92 samples/sec   Loss 12.2597   LearningRate 0.5760   Epoch: 4   Global Step: 22410   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:48:57,220-Speed 10484.20 samples/sec   Loss 12.2961   LearningRate 0.5759   Epoch: 4   Global Step: 22420   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:49:05,038-Speed 10479.51 samples/sec   Loss 12.2321   LearningRate 0.5757   Epoch: 4   Global Step: 22430   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:49:12,857-Speed 10483.45 samples/sec   Loss 12.3037   LearningRate 0.5756   Epoch: 4   Global Step: 22440   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:49:20,694-Speed 10453.93 samples/sec   Loss 12.2577   LearningRate 0.5755   Epoch: 4   Global Step: 22450   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:49:28,534-Speed 10449.21 samples/sec   Loss 12.2746   LearningRate 0.5753   Epoch: 4   Global Step: 22460   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:49:36,391-Speed 10429.18 samples/sec   Loss 12.2021   LearningRate 0.5752   Epoch: 4   Global Step: 22470   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:49:44,224-Speed 10459.99 samples/sec   Loss 12.1819   LearningRate 0.5750   Epoch: 4   Global Step: 22480   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:49:52,044-Speed 10476.94 samples/sec   Loss 12.3963   LearningRate 0.5749   Epoch: 4   Global Step: 22490   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:49:59,831-Speed 10520.97 samples/sec   Loss 12.3110   LearningRate 0.5748   Epoch: 4   Global Step: 22500   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:50:07,646-Speed 10492.01 samples/sec   Loss 12.2835   LearningRate 0.5746   Epoch: 4   Global Step: 22510   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:50:15,444-Speed 10506.39 samples/sec   Loss 12.4966   LearningRate 0.5745   Epoch: 4   Global Step: 22520   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:50:23,272-Speed 10465.47 samples/sec   Loss 12.3647   LearningRate 0.5743   Epoch: 4   Global Step: 22530   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:50:31,093-Speed 10475.26 samples/sec   Loss 12.2286   LearningRate 0.5742   Epoch: 4   Global Step: 22540   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:50:38,878-Speed 10525.22 samples/sec   Loss 12.2124   LearningRate 0.5740   Epoch: 4   Global Step: 22550   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:50:46,668-Speed 10517.48 samples/sec   Loss 12.2356   LearningRate 0.5739   Epoch: 4   Global Step: 22560   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:50:54,525-Speed 10427.74 samples/sec   Loss 12.1878   LearningRate 0.5738   Epoch: 4   Global Step: 22570   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:51:02,351-Speed 10467.95 samples/sec   Loss 12.2162   LearningRate 0.5736   Epoch: 4   Global Step: 22580   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:51:10,171-Speed 10477.12 samples/sec   Loss 12.2395   LearningRate 0.5735   Epoch: 4   Global Step: 22590   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:51:17,961-Speed 10517.74 samples/sec   Loss 12.3788   LearningRate 0.5733   Epoch: 4   Global Step: 22600   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:51:25,778-Speed 10481.35 samples/sec   Loss 12.2901   LearningRate 0.5732   Epoch: 4   Global Step: 22610   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:51:33,592-Speed 10484.60 samples/sec   Loss 12.2931   LearningRate 0.5731   Epoch: 4   Global Step: 22620   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:51:41,397-Speed 10498.85 samples/sec   Loss 12.1961   LearningRate 0.5729   Epoch: 4   Global Step: 22630   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:51:49,242-Speed 10443.53 samples/sec   Loss 12.2696   LearningRate 0.5728   Epoch: 4   Global Step: 22640   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:51:57,053-Speed 10488.52 samples/sec   Loss 12.2481   LearningRate 0.5726   Epoch: 4   Global Step: 22650   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:52:04,891-Speed 10453.28 samples/sec   Loss 12.2677   LearningRate 0.5725   Epoch: 4   Global Step: 22660   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:52:12,713-Speed 10474.88 samples/sec   Loss 12.1531   LearningRate 0.5723   Epoch: 4   Global Step: 22670   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:52:20,560-Speed 10441.49 samples/sec   Loss 12.2378   LearningRate 0.5722   Epoch: 4   Global Step: 22680   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:52:28,391-Speed 10461.96 samples/sec   Loss 12.1333   LearningRate 0.5721   Epoch: 4   Global Step: 22690   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:52:36,200-Speed 10492.30 samples/sec   Loss 12.2776   LearningRate 0.5719   Epoch: 4   Global Step: 22700   Fp16 Grad Scale: 524288   Required: 18 hours
Training: 2022-01-15 19:52:44,004-Speed 10502.91 samples/sec   Loss 12.3006   LearningRate 0.5718   Epoch: 4   Global Step: 22710   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:52:51,834-Speed 10462.94 samples/sec   Loss 12.1963   LearningRate 0.5716   Epoch: 4   Global Step: 22720   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:52:59,639-Speed 10497.44 samples/sec   Loss 12.2713   LearningRate 0.5715   Epoch: 4   Global Step: 22730   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:53:07,460-Speed 10476.35 samples/sec   Loss 12.2607   LearningRate 0.5714   Epoch: 4   Global Step: 22740   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:53:15,271-Speed 10489.37 samples/sec   Loss 12.2769   LearningRate 0.5712   Epoch: 4   Global Step: 22750   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:53:23,067-Speed 10509.20 samples/sec   Loss 12.1735   LearningRate 0.5711   Epoch: 4   Global Step: 22760   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:53:30,840-Speed 10540.56 samples/sec   Loss 12.2783   LearningRate 0.5709   Epoch: 4   Global Step: 22770   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:53:38,624-Speed 10524.51 samples/sec   Loss 12.1650   LearningRate 0.5708   Epoch: 4   Global Step: 22780   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:53:46,452-Speed 10466.84 samples/sec   Loss 12.2101   LearningRate 0.5707   Epoch: 4   Global Step: 22790   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:53:54,233-Speed 10529.43 samples/sec   Loss 12.1369   LearningRate 0.5705   Epoch: 4   Global Step: 22800   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:54:02,038-Speed 10497.04 samples/sec   Loss 12.2259   LearningRate 0.5704   Epoch: 4   Global Step: 22810   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:54:09,851-Speed 10486.51 samples/sec   Loss 12.2217   LearningRate 0.5702   Epoch: 4   Global Step: 22820   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:54:17,715-Speed 10419.16 samples/sec   Loss 12.2383   LearningRate 0.5701   Epoch: 4   Global Step: 22830   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:54:25,520-Speed 10499.27 samples/sec   Loss 12.2055   LearningRate 0.5699   Epoch: 4   Global Step: 22840   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:54:33,352-Speed 10461.17 samples/sec   Loss 12.1832   LearningRate 0.5698   Epoch: 4   Global Step: 22850   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:54:41,182-Speed 10463.22 samples/sec   Loss 12.1449   LearningRate 0.5697   Epoch: 4   Global Step: 22860   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:54:48,998-Speed 10482.10 samples/sec   Loss 12.2342   LearningRate 0.5695   Epoch: 4   Global Step: 22870   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:54:56,807-Speed 10491.87 samples/sec   Loss 12.1620   LearningRate 0.5694   Epoch: 4   Global Step: 22880   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:55:04,652-Speed 10443.82 samples/sec   Loss 12.1423   LearningRate 0.5692   Epoch: 4   Global Step: 22890   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:55:12,482-Speed 10463.59 samples/sec   Loss 12.2624   LearningRate 0.5691   Epoch: 4   Global Step: 22900   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:55:20,352-Speed 10410.06 samples/sec   Loss 12.1565   LearningRate 0.5690   Epoch: 4   Global Step: 22910   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:55:28,159-Speed 10495.22 samples/sec   Loss 12.3110   LearningRate 0.5688   Epoch: 4   Global Step: 22920   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:55:35,969-Speed 10490.63 samples/sec   Loss 12.1819   LearningRate 0.5687   Epoch: 4   Global Step: 22930   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:55:43,773-Speed 10499.08 samples/sec   Loss 12.1617   LearningRate 0.5685   Epoch: 4   Global Step: 22940   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 19:55:51,565-Speed 10514.45 samples/sec   Loss 12.1900   LearningRate 0.5684   Epoch: 4   Global Step: 22950   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:55:59,366-Speed 10503.11 samples/sec   Loss 12.2540   LearningRate 0.5683   Epoch: 4   Global Step: 22960   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:56:07,161-Speed 10510.26 samples/sec   Loss 12.2415   LearningRate 0.5681   Epoch: 4   Global Step: 22970   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:56:14,990-Speed 10465.73 samples/sec   Loss 12.1272   LearningRate 0.5680   Epoch: 4   Global Step: 22980   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:56:22,841-Speed 10435.20 samples/sec   Loss 12.2388   LearningRate 0.5678   Epoch: 4   Global Step: 22990   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:56:30,637-Speed 10510.24 samples/sec   Loss 12.2430   LearningRate 0.5677   Epoch: 4   Global Step: 23000   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:56:38,454-Speed 10480.64 samples/sec   Loss 12.1653   LearningRate 0.5676   Epoch: 4   Global Step: 23010   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:56:46,254-Speed 10504.04 samples/sec   Loss 12.2062   LearningRate 0.5674   Epoch: 4   Global Step: 23020   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:56:54,089-Speed 10457.62 samples/sec   Loss 12.2331   LearningRate 0.5673   Epoch: 4   Global Step: 23030   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:57:01,887-Speed 10506.11 samples/sec   Loss 12.2499   LearningRate 0.5671   Epoch: 4   Global Step: 23040   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:57:09,701-Speed 10486.39 samples/sec   Loss 12.0925   LearningRate 0.5670   Epoch: 4   Global Step: 23050   Fp16 Grad Scale: 524288   Required: 18 hours
Training: 2022-01-15 19:57:17,486-Speed 10523.94 samples/sec   Loss 12.1365   LearningRate 0.5668   Epoch: 4   Global Step: 23060   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:57:25,289-Speed 10499.52 samples/sec   Loss 12.1932   LearningRate 0.5667   Epoch: 4   Global Step: 23070   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:57:33,098-Speed 10491.86 samples/sec   Loss 12.1920   LearningRate 0.5666   Epoch: 4   Global Step: 23080   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:57:40,935-Speed 10454.16 samples/sec   Loss 12.0571   LearningRate 0.5664   Epoch: 4   Global Step: 23090   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:57:48,753-Speed 10480.86 samples/sec   Loss 12.2760   LearningRate 0.5663   Epoch: 4   Global Step: 23100   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:57:56,566-Speed 10485.83 samples/sec   Loss 12.1595   LearningRate 0.5661   Epoch: 4   Global Step: 23110   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:58:04,397-Speed 10461.75 samples/sec   Loss 12.2088   LearningRate 0.5660   Epoch: 4   Global Step: 23120   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:58:12,203-Speed 10496.38 samples/sec   Loss 12.1050   LearningRate 0.5659   Epoch: 4   Global Step: 23130   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:58:19,998-Speed 10511.18 samples/sec   Loss 12.1623   LearningRate 0.5657   Epoch: 4   Global Step: 23140   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:58:27,818-Speed 10477.57 samples/sec   Loss 12.2473   LearningRate 0.5656   Epoch: 4   Global Step: 23150   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:58:35,608-Speed 10517.45 samples/sec   Loss 12.2291   LearningRate 0.5654   Epoch: 4   Global Step: 23160   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:58:43,424-Speed 10482.79 samples/sec   Loss 12.2777   LearningRate 0.5653   Epoch: 4   Global Step: 23170   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:58:51,229-Speed 10497.01 samples/sec   Loss 12.2427   LearningRate 0.5652   Epoch: 4   Global Step: 23180   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:58:59,034-Speed 10497.19 samples/sec   Loss 12.1519   LearningRate 0.5650   Epoch: 4   Global Step: 23190   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:59:06,874-Speed 10451.03 samples/sec   Loss 12.0810   LearningRate 0.5649   Epoch: 4   Global Step: 23200   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:59:14,707-Speed 10460.57 samples/sec   Loss 12.1074   LearningRate 0.5647   Epoch: 4   Global Step: 23210   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:59:22,544-Speed 10453.92 samples/sec   Loss 12.3328   LearningRate 0.5646   Epoch: 4   Global Step: 23220   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:59:30,416-Speed 10408.74 samples/sec   Loss 12.2660   LearningRate 0.5645   Epoch: 4   Global Step: 23230   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:59:38,247-Speed 10463.43 samples/sec   Loss 12.2261   LearningRate 0.5643   Epoch: 4   Global Step: 23240   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:59:46,053-Speed 10496.75 samples/sec   Loss 12.1564   LearningRate 0.5642   Epoch: 4   Global Step: 23250   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 19:59:53,833-Speed 10531.58 samples/sec   Loss 12.1392   LearningRate 0.5640   Epoch: 4   Global Step: 23260   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:00:01,641-Speed 10492.76 samples/sec   Loss 12.0712   LearningRate 0.5639   Epoch: 4   Global Step: 23270   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:00:09,438-Speed 10508.08 samples/sec   Loss 12.0689   LearningRate 0.5638   Epoch: 4   Global Step: 23280   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:00:17,224-Speed 10523.71 samples/sec   Loss 12.0580   LearningRate 0.5636   Epoch: 4   Global Step: 23290   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:00:25,041-Speed 10481.87 samples/sec   Loss 12.2051   LearningRate 0.5635   Epoch: 4   Global Step: 23300   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:00:32,826-Speed 10523.17 samples/sec   Loss 12.1852   LearningRate 0.5633   Epoch: 4   Global Step: 23310   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:00:40,628-Speed 10501.59 samples/sec   Loss 12.1659   LearningRate 0.5632   Epoch: 4   Global Step: 23320   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:00:48,447-Speed 10478.95 samples/sec   Loss 12.1150   LearningRate 0.5631   Epoch: 4   Global Step: 23330   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:00:56,267-Speed 10476.63 samples/sec   Loss 12.1256   LearningRate 0.5629   Epoch: 4   Global Step: 23340   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:01:04,070-Speed 10500.48 samples/sec   Loss 12.1249   LearningRate 0.5628   Epoch: 4   Global Step: 23350   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:01:11,922-Speed 10435.19 samples/sec   Loss 12.1349   LearningRate 0.5626   Epoch: 4   Global Step: 23360   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:01:19,703-Speed 10530.29 samples/sec   Loss 12.1453   LearningRate 0.5625   Epoch: 4   Global Step: 23370   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:01:27,510-Speed 10494.09 samples/sec   Loss 12.0794   LearningRate 0.5624   Epoch: 4   Global Step: 23380   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:01:35,293-Speed 10527.17 samples/sec   Loss 12.1404   LearningRate 0.5622   Epoch: 4   Global Step: 23390   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:01:43,089-Speed 10509.71 samples/sec   Loss 11.9810   LearningRate 0.5621   Epoch: 4   Global Step: 23400   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:01:50,880-Speed 10516.56 samples/sec   Loss 12.3219   LearningRate 0.5619   Epoch: 4   Global Step: 23410   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:01:58,683-Speed 10500.22 samples/sec   Loss 12.2596   LearningRate 0.5618   Epoch: 4   Global Step: 23420   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:02:06,501-Speed 10479.56 samples/sec   Loss 12.1748   LearningRate 0.5617   Epoch: 4   Global Step: 23430   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:02:14,326-Speed 10470.28 samples/sec   Loss 12.0390   LearningRate 0.5615   Epoch: 4   Global Step: 23440   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:02:22,135-Speed 10491.38 samples/sec   Loss 11.9951   LearningRate 0.5614   Epoch: 4   Global Step: 23450   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:02:29,943-Speed 10493.86 samples/sec   Loss 12.1058   LearningRate 0.5612   Epoch: 4   Global Step: 23460   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:02:37,732-Speed 10518.69 samples/sec   Loss 12.0527   LearningRate 0.5611   Epoch: 4   Global Step: 23470   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:02:45,519-Speed 10521.59 samples/sec   Loss 12.2075   LearningRate 0.5610   Epoch: 4   Global Step: 23480   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:02:53,340-Speed 10474.84 samples/sec   Loss 12.0428   LearningRate 0.5608   Epoch: 4   Global Step: 23490   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:03:01,155-Speed 10484.22 samples/sec   Loss 12.0905   LearningRate 0.5607   Epoch: 4   Global Step: 23500   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:03:09,001-Speed 10443.03 samples/sec   Loss 12.1525   LearningRate 0.5605   Epoch: 4   Global Step: 23510   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:03:16,803-Speed 10502.06 samples/sec   Loss 12.2287   LearningRate 0.5604   Epoch: 4   Global Step: 23520   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:03:24,600-Speed 10507.05 samples/sec   Loss 12.1284   LearningRate 0.5603   Epoch: 4   Global Step: 23530   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:03:32,389-Speed 10519.56 samples/sec   Loss 12.0687   LearningRate 0.5601   Epoch: 4   Global Step: 23540   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:03:40,177-Speed 10520.07 samples/sec   Loss 12.0013   LearningRate 0.5600   Epoch: 4   Global Step: 23550   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:03:47,945-Speed 10546.47 samples/sec   Loss 12.0055   LearningRate 0.5598   Epoch: 4   Global Step: 23560   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:03:55,741-Speed 10509.97 samples/sec   Loss 12.0589   LearningRate 0.5597   Epoch: 4   Global Step: 23570   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:04:03,527-Speed 10523.93 samples/sec   Loss 12.0349   LearningRate 0.5596   Epoch: 4   Global Step: 23580   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:04:11,343-Speed 10481.96 samples/sec   Loss 12.0809   LearningRate 0.5594   Epoch: 4   Global Step: 23590   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:04:19,139-Speed 10508.72 samples/sec   Loss 12.1090   LearningRate 0.5593   Epoch: 4   Global Step: 23600   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:04:26,927-Speed 10520.28 samples/sec   Loss 12.0549   LearningRate 0.5591   Epoch: 4   Global Step: 23610   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:04:34,693-Speed 10551.19 samples/sec   Loss 12.1859   LearningRate 0.5590   Epoch: 4   Global Step: 23620   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 20:04:42,486-Speed 10512.78 samples/sec   Loss 12.0152   LearningRate 0.5589   Epoch: 4   Global Step: 23630   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 20:04:50,273-Speed 10521.16 samples/sec   Loss 12.1799   LearningRate 0.5587   Epoch: 4   Global Step: 23640   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 20:04:58,059-Speed 10523.95 samples/sec   Loss 12.2077   LearningRate 0.5586   Epoch: 4   Global Step: 23650   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 20:05:05,872-Speed 10486.53 samples/sec   Loss 12.0973   LearningRate 0.5584   Epoch: 4   Global Step: 23660   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 20:05:13,694-Speed 10475.20 samples/sec   Loss 12.1550   LearningRate 0.5583   Epoch: 4   Global Step: 23670   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 20:05:21,497-Speed 10499.13 samples/sec   Loss 12.1458   LearningRate 0.5582   Epoch: 4   Global Step: 23680   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 20:05:29,303-Speed 10495.54 samples/sec   Loss 11.9718   LearningRate 0.5580   Epoch: 4   Global Step: 23690   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 20:05:37,081-Speed 10533.55 samples/sec   Loss 12.0675   LearningRate 0.5579   Epoch: 4   Global Step: 23700   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 20:05:44,892-Speed 10489.09 samples/sec   Loss 12.0477   LearningRate 0.5577   Epoch: 4   Global Step: 23710   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 20:05:52,682-Speed 10518.66 samples/sec   Loss 12.1185   LearningRate 0.5576   Epoch: 4   Global Step: 23720   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:06:00,500-Speed 10478.61 samples/sec   Loss 11.9341   LearningRate 0.5575   Epoch: 4   Global Step: 23730   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:06:08,285-Speed 10524.74 samples/sec   Loss 12.0875   LearningRate 0.5573   Epoch: 4   Global Step: 23740   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:06:16,063-Speed 10534.22 samples/sec   Loss 12.0575   LearningRate 0.5572   Epoch: 4   Global Step: 23750   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:06:23,874-Speed 10489.02 samples/sec   Loss 12.1561   LearningRate 0.5570   Epoch: 4   Global Step: 23760   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:06:31,670-Speed 10509.88 samples/sec   Loss 12.1268   LearningRate 0.5569   Epoch: 4   Global Step: 23770   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:06:39,484-Speed 10483.73 samples/sec   Loss 12.0285   LearningRate 0.5568   Epoch: 4   Global Step: 23780   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:06:47,271-Speed 10522.01 samples/sec   Loss 12.0651   LearningRate 0.5566   Epoch: 4   Global Step: 23790   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:06:55,067-Speed 10510.04 samples/sec   Loss 11.9973   LearningRate 0.5565   Epoch: 4   Global Step: 23800   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:07:02,862-Speed 10510.02 samples/sec   Loss 12.0738   LearningRate 0.5564   Epoch: 4   Global Step: 23810   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:07:10,631-Speed 10545.15 samples/sec   Loss 11.9558   LearningRate 0.5562   Epoch: 4   Global Step: 23820   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 20:07:18,439-Speed 10494.57 samples/sec   Loss 12.2688   LearningRate 0.5561   Epoch: 4   Global Step: 23830   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 20:07:26,236-Speed 10508.33 samples/sec   Loss 12.2008   LearningRate 0.5559   Epoch: 4   Global Step: 23840   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 20:07:34,029-Speed 10515.55 samples/sec   Loss 12.1129   LearningRate 0.5558   Epoch: 4   Global Step: 23850   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 20:07:41,822-Speed 10513.71 samples/sec   Loss 12.0341   LearningRate 0.5557   Epoch: 4   Global Step: 23860   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 20:07:49,601-Speed 10532.46 samples/sec   Loss 11.9096   LearningRate 0.5555   Epoch: 4   Global Step: 23870   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 20:07:57,396-Speed 10510.43 samples/sec   Loss 12.0333   LearningRate 0.5554   Epoch: 4   Global Step: 23880   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 20:08:05,184-Speed 10519.59 samples/sec   Loss 11.9639   LearningRate 0.5552   Epoch: 4   Global Step: 23890   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 20:08:12,999-Speed 10484.14 samples/sec   Loss 12.0948   LearningRate 0.5551   Epoch: 4   Global Step: 23900   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 20:08:20,802-Speed 10500.49 samples/sec   Loss 12.0883   LearningRate 0.5550   Epoch: 4   Global Step: 23910   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-15 20:08:28,576-Speed 10541.17 samples/sec   Loss 12.1476   LearningRate 0.5548   Epoch: 4   Global Step: 23920   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:08:36,371-Speed 10509.94 samples/sec   Loss 11.9819   LearningRate 0.5547   Epoch: 4   Global Step: 23930   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:08:44,180-Speed 10492.27 samples/sec   Loss 11.8622   LearningRate 0.5545   Epoch: 4   Global Step: 23940   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:08:51,979-Speed 10504.68 samples/sec   Loss 11.9604   LearningRate 0.5544   Epoch: 4   Global Step: 23950   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:08:59,800-Speed 10476.68 samples/sec   Loss 12.0207   LearningRate 0.5543   Epoch: 4   Global Step: 23960   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:09:07,619-Speed 10477.99 samples/sec   Loss 12.1096   LearningRate 0.5541   Epoch: 4   Global Step: 23970   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:09:15,447-Speed 10467.04 samples/sec   Loss 12.0228   LearningRate 0.5540   Epoch: 4   Global Step: 23980   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:09:23,267-Speed 10476.43 samples/sec   Loss 11.9553   LearningRate 0.5538   Epoch: 4   Global Step: 23990   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:09:31,114-Speed 10440.34 samples/sec   Loss 12.0181   LearningRate 0.5537   Epoch: 4   Global Step: 24000   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:09:38,923-Speed 10499.10 samples/sec   Loss 12.0705   LearningRate 0.5536   Epoch: 4   Global Step: 24010   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:09:46,746-Speed 10473.76 samples/sec   Loss 12.0013   LearningRate 0.5534   Epoch: 4   Global Step: 24020   Fp16 Grad Scale: 524288   Required: 18 hours
Training: 2022-01-15 20:09:54,541-Speed 10510.62 samples/sec   Loss 12.1112   LearningRate 0.5533   Epoch: 4   Global Step: 24030   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:10:02,389-Speed 10438.50 samples/sec   Loss 11.9836   LearningRate 0.5532   Epoch: 4   Global Step: 24040   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-15 20:10:10,195-Speed 10500.87 samples/sec   Loss 12.0527   LearningRate 0.5530   Epoch: 4   Global Step: 24050   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:10:18,015-Speed 10476.96 samples/sec   Loss 12.0248   LearningRate 0.5529   Epoch: 4   Global Step: 24060   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:10:25,811-Speed 10508.49 samples/sec   Loss 11.9486   LearningRate 0.5527   Epoch: 4   Global Step: 24070   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:10:33,632-Speed 10476.15 samples/sec   Loss 11.9332   LearningRate 0.5526   Epoch: 4   Global Step: 24080   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:10:41,424-Speed 10515.92 samples/sec   Loss 12.0275   LearningRate 0.5525   Epoch: 4   Global Step: 24090   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:10:49,192-Speed 10546.36 samples/sec   Loss 11.9258   LearningRate 0.5523   Epoch: 4   Global Step: 24100   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:10:56,966-Speed 10539.30 samples/sec   Loss 12.0652   LearningRate 0.5522   Epoch: 4   Global Step: 24110   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:11:04,809-Speed 10447.26 samples/sec   Loss 12.0042   LearningRate 0.5520   Epoch: 4   Global Step: 24120   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:11:12,612-Speed 10500.27 samples/sec   Loss 11.9818   LearningRate 0.5519   Epoch: 4   Global Step: 24130   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:11:20,399-Speed 10520.80 samples/sec   Loss 12.1607   LearningRate 0.5518   Epoch: 4   Global Step: 24140   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:11:28,228-Speed 10465.17 samples/sec   Loss 12.0176   LearningRate 0.5516   Epoch: 4   Global Step: 24150   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:11:36,038-Speed 10490.85 samples/sec   Loss 12.0502   LearningRate 0.5515   Epoch: 4   Global Step: 24160   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:11:43,832-Speed 10511.54 samples/sec   Loss 11.9732   LearningRate 0.5513   Epoch: 4   Global Step: 24170   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:11:51,644-Speed 10488.06 samples/sec   Loss 11.9463   LearningRate 0.5512   Epoch: 4   Global Step: 24180   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:11:59,462-Speed 10479.12 samples/sec   Loss 11.9412   LearningRate 0.5511   Epoch: 4   Global Step: 24190   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:12:07,273-Speed 10490.15 samples/sec   Loss 11.8583   LearningRate 0.5509   Epoch: 4   Global Step: 24200   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:12:15,050-Speed 10537.42 samples/sec   Loss 11.9314   LearningRate 0.5508   Epoch: 4   Global Step: 24210   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:12:22,838-Speed 10519.51 samples/sec   Loss 12.1290   LearningRate 0.5507   Epoch: 4   Global Step: 24220   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:12:30,634-Speed 10508.77 samples/sec   Loss 11.9852   LearningRate 0.5505   Epoch: 4   Global Step: 24230   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:12:38,499-Speed 10417.97 samples/sec   Loss 11.8330   LearningRate 0.5504   Epoch: 4   Global Step: 24240   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:12:46,290-Speed 10516.06 samples/sec   Loss 11.9359   LearningRate 0.5502   Epoch: 4   Global Step: 24250   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:12:54,116-Speed 10468.90 samples/sec   Loss 12.0236   LearningRate 0.5501   Epoch: 4   Global Step: 24260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:13:01,928-Speed 10487.80 samples/sec   Loss 11.9400   LearningRate 0.5500   Epoch: 4   Global Step: 24270   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:13:09,732-Speed 10502.14 samples/sec   Loss 11.9526   LearningRate 0.5498   Epoch: 4   Global Step: 24280   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:13:17,523-Speed 10515.61 samples/sec   Loss 12.0330   LearningRate 0.5497   Epoch: 4   Global Step: 24290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:13:25,322-Speed 10505.68 samples/sec   Loss 11.9958   LearningRate 0.5495   Epoch: 4   Global Step: 24300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:13:33,142-Speed 10477.21 samples/sec   Loss 12.0054   LearningRate 0.5494   Epoch: 4   Global Step: 24310   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:13:40,938-Speed 10508.41 samples/sec   Loss 11.9696   LearningRate 0.5493   Epoch: 4   Global Step: 24320   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:13:48,727-Speed 10519.17 samples/sec   Loss 11.8384   LearningRate 0.5491   Epoch: 4   Global Step: 24330   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:13:56,554-Speed 10468.36 samples/sec   Loss 11.9554   LearningRate 0.5490   Epoch: 4   Global Step: 24340   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:14:04,363-Speed 10492.08 samples/sec   Loss 11.8754   LearningRate 0.5489   Epoch: 4   Global Step: 24350   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:14:12,173-Speed 10491.16 samples/sec   Loss 11.9904   LearningRate 0.5487   Epoch: 4   Global Step: 24360   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:14:20,011-Speed 10459.21 samples/sec   Loss 11.9267   LearningRate 0.5486   Epoch: 4   Global Step: 24370   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:14:27,809-Speed 10506.88 samples/sec   Loss 11.8796   LearningRate 0.5484   Epoch: 4   Global Step: 24380   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:14:35,623-Speed 10486.17 samples/sec   Loss 11.9717   LearningRate 0.5483   Epoch: 4   Global Step: 24390   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:14:43,442-Speed 10478.46 samples/sec   Loss 11.8965   LearningRate 0.5482   Epoch: 4   Global Step: 24400   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:14:51,270-Speed 10466.97 samples/sec   Loss 12.0143   LearningRate 0.5480   Epoch: 4   Global Step: 24410   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:14:59,096-Speed 10469.12 samples/sec   Loss 11.9583   LearningRate 0.5479   Epoch: 4   Global Step: 24420   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:15:06,904-Speed 10494.12 samples/sec   Loss 11.9385   LearningRate 0.5477   Epoch: 4   Global Step: 24430   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:15:14,747-Speed 10446.03 samples/sec   Loss 11.9281   LearningRate 0.5476   Epoch: 4   Global Step: 24440   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:15:22,589-Speed 10447.29 samples/sec   Loss 11.9526   LearningRate 0.5475   Epoch: 4   Global Step: 24450   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:15:30,424-Speed 10457.96 samples/sec   Loss 11.8945   LearningRate 0.5473   Epoch: 4   Global Step: 24460   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:15:38,224-Speed 10504.32 samples/sec   Loss 11.9302   LearningRate 0.5472   Epoch: 4   Global Step: 24470   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:15:46,006-Speed 10527.44 samples/sec   Loss 12.0597   LearningRate 0.5471   Epoch: 4   Global Step: 24480   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:15:53,819-Speed 10487.00 samples/sec   Loss 12.0158   LearningRate 0.5469   Epoch: 4   Global Step: 24490   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:16:01,636-Speed 10480.52 samples/sec   Loss 11.9352   LearningRate 0.5468   Epoch: 4   Global Step: 24500   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:16:09,431-Speed 10510.62 samples/sec   Loss 11.8221   LearningRate 0.5466   Epoch: 4   Global Step: 24510   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:16:17,215-Speed 10526.56 samples/sec   Loss 11.8803   LearningRate 0.5465   Epoch: 4   Global Step: 24520   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:16:25,096-Speed 10395.21 samples/sec   Loss 11.8816   LearningRate 0.5464   Epoch: 4   Global Step: 24530   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:16:32,916-Speed 10477.33 samples/sec   Loss 12.0065   LearningRate 0.5462   Epoch: 4   Global Step: 24540   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:16:40,712-Speed 10509.92 samples/sec   Loss 11.8783   LearningRate 0.5461   Epoch: 4   Global Step: 24550   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:16:48,507-Speed 10510.16 samples/sec   Loss 11.8979   LearningRate 0.5460   Epoch: 4   Global Step: 24560   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:16:56,297-Speed 10517.58 samples/sec   Loss 11.9100   LearningRate 0.5458   Epoch: 4   Global Step: 24570   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:17:04,097-Speed 10504.23 samples/sec   Loss 11.9170   LearningRate 0.5457   Epoch: 4   Global Step: 24580   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:17:11,887-Speed 10517.68 samples/sec   Loss 11.8504   LearningRate 0.5455   Epoch: 4   Global Step: 24590   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:17:19,678-Speed 10516.58 samples/sec   Loss 11.8847   LearningRate 0.5454   Epoch: 4   Global Step: 24600   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:17:27,464-Speed 10522.27 samples/sec   Loss 11.8216   LearningRate 0.5453   Epoch: 4   Global Step: 24610   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:17:35,298-Speed 10458.87 samples/sec   Loss 11.9307   LearningRate 0.5451   Epoch: 4   Global Step: 24620   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:17:43,200-Speed 10368.59 samples/sec   Loss 11.9092   LearningRate 0.5450   Epoch: 4   Global Step: 24630   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:17:51,042-Speed 10446.41 samples/sec   Loss 11.8749   LearningRate 0.5448   Epoch: 4   Global Step: 24640   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:17:58,867-Speed 10471.40 samples/sec   Loss 11.9084   LearningRate 0.5447   Epoch: 4   Global Step: 24650   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:18:06,724-Speed 10428.56 samples/sec   Loss 11.8232   LearningRate 0.5446   Epoch: 4   Global Step: 24660   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:18:14,538-Speed 10484.42 samples/sec   Loss 11.8780   LearningRate 0.5444   Epoch: 4   Global Step: 24670   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:18:22,321-Speed 10528.09 samples/sec   Loss 11.9137   LearningRate 0.5443   Epoch: 4   Global Step: 24680   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:18:30,116-Speed 10510.89 samples/sec   Loss 11.9579   LearningRate 0.5442   Epoch: 4   Global Step: 24690   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:18:37,913-Speed 10508.18 samples/sec   Loss 11.9114   LearningRate 0.5440   Epoch: 4   Global Step: 24700   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:18:45,716-Speed 10500.30 samples/sec   Loss 11.8836   LearningRate 0.5439   Epoch: 4   Global Step: 24710   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:18:53,559-Speed 10445.51 samples/sec   Loss 11.8471   LearningRate 0.5437   Epoch: 4   Global Step: 24720   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:19:01,353-Speed 10511.61 samples/sec   Loss 11.8390   LearningRate 0.5436   Epoch: 4   Global Step: 24730   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:19:09,174-Speed 10476.36 samples/sec   Loss 11.8828   LearningRate 0.5435   Epoch: 4   Global Step: 24740   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:19:16,984-Speed 10490.52 samples/sec   Loss 11.9011   LearningRate 0.5433   Epoch: 4   Global Step: 24750   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:19:24,831-Speed 10441.15 samples/sec   Loss 12.0718   LearningRate 0.5432   Epoch: 4   Global Step: 24760   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:19:32,657-Speed 10468.53 samples/sec   Loss 11.9510   LearningRate 0.5431   Epoch: 4   Global Step: 24770   Fp16 Grad Scale: 524288   Required: 17 hours
Training: 2022-01-15 20:19:40,460-Speed 10500.44 samples/sec   Loss 11.9287   LearningRate 0.5429   Epoch: 4   Global Step: 24780   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:19:48,283-Speed 10473.53 samples/sec   Loss 11.8265   LearningRate 0.5428   Epoch: 4   Global Step: 24790   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:19:56,101-Speed 10479.83 samples/sec   Loss 11.7918   LearningRate 0.5426   Epoch: 4   Global Step: 24800   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:20:03,907-Speed 10495.25 samples/sec   Loss 11.8152   LearningRate 0.5425   Epoch: 4   Global Step: 24810   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:20:11,699-Speed 10515.30 samples/sec   Loss 11.8416   LearningRate 0.5424   Epoch: 4   Global Step: 24820   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:20:19,531-Speed 10460.45 samples/sec   Loss 11.8321   LearningRate 0.5422   Epoch: 4   Global Step: 24830   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:20:27,379-Speed 10439.69 samples/sec   Loss 11.8925   LearningRate 0.5421   Epoch: 4   Global Step: 24840   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:20:35,179-Speed 10503.07 samples/sec   Loss 11.8159   LearningRate 0.5420   Epoch: 4   Global Step: 24850   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:20:42,967-Speed 10521.51 samples/sec   Loss 11.8361   LearningRate 0.5418   Epoch: 4   Global Step: 24860   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:20:50,755-Speed 10519.68 samples/sec   Loss 11.9194   LearningRate 0.5417   Epoch: 4   Global Step: 24870   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:20:58,543-Speed 10520.25 samples/sec   Loss 11.8974   LearningRate 0.5415   Epoch: 4   Global Step: 24880   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:21:06,341-Speed 10506.21 samples/sec   Loss 11.7744   LearningRate 0.5414   Epoch: 4   Global Step: 24890   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:21:14,132-Speed 10516.61 samples/sec   Loss 11.7859   LearningRate 0.5413   Epoch: 4   Global Step: 24900   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:21:21,916-Speed 10526.08 samples/sec   Loss 11.7498   LearningRate 0.5411   Epoch: 4   Global Step: 24910   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:21:29,722-Speed 10495.69 samples/sec   Loss 11.7297   LearningRate 0.5410   Epoch: 4   Global Step: 24920   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:21:37,538-Speed 10483.16 samples/sec   Loss 11.7893   LearningRate 0.5409   Epoch: 4   Global Step: 24930   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:21:45,305-Speed 10549.14 samples/sec   Loss 11.8556   LearningRate 0.5407   Epoch: 4   Global Step: 24940   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:21:53,115-Speed 10489.92 samples/sec   Loss 11.8777   LearningRate 0.5406   Epoch: 4   Global Step: 24950   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:22:00,915-Speed 10504.95 samples/sec   Loss 11.8627   LearningRate 0.5404   Epoch: 4   Global Step: 24960   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:22:08,710-Speed 10509.59 samples/sec   Loss 11.9693   LearningRate 0.5403   Epoch: 4   Global Step: 24970   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:22:16,514-Speed 10498.28 samples/sec   Loss 11.9957   LearningRate 0.5402   Epoch: 4   Global Step: 24980   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:22:24,311-Speed 10508.47 samples/sec   Loss 11.7989   LearningRate 0.5400   Epoch: 4   Global Step: 24990   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:22:32,130-Speed 10478.83 samples/sec   Loss 11.7395   LearningRate 0.5399   Epoch: 4   Global Step: 25000   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:22:39,930-Speed 10504.02 samples/sec   Loss 11.8584   LearningRate 0.5398   Epoch: 4   Global Step: 25010   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:22:47,729-Speed 10505.09 samples/sec   Loss 11.8184   LearningRate 0.5396   Epoch: 4   Global Step: 25020   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:22:55,513-Speed 10525.82 samples/sec   Loss 11.7818   LearningRate 0.5395   Epoch: 4   Global Step: 25030   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:23:03,295-Speed 10528.63 samples/sec   Loss 11.7593   LearningRate 0.5393   Epoch: 4   Global Step: 25040   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:23:11,092-Speed 10507.99 samples/sec   Loss 11.8607   LearningRate 0.5392   Epoch: 4   Global Step: 25050   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:23:18,885-Speed 10513.32 samples/sec   Loss 11.8720   LearningRate 0.5391   Epoch: 4   Global Step: 25060   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:23:26,688-Speed 10500.45 samples/sec   Loss 11.9175   LearningRate 0.5389   Epoch: 4   Global Step: 25070   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:23:34,523-Speed 10457.22 samples/sec   Loss 11.8222   LearningRate 0.5388   Epoch: 4   Global Step: 25080   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:23:42,302-Speed 10531.54 samples/sec   Loss 11.7814   LearningRate 0.5387   Epoch: 4   Global Step: 25090   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:23:50,105-Speed 10500.58 samples/sec   Loss 11.7589   LearningRate 0.5385   Epoch: 4   Global Step: 25100   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:23:57,888-Speed 10527.01 samples/sec   Loss 11.7383   LearningRate 0.5384   Epoch: 4   Global Step: 25110   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:24:05,683-Speed 10510.26 samples/sec   Loss 12.0425   LearningRate 0.5383   Epoch: 4   Global Step: 25120   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:24:13,486-Speed 10500.48 samples/sec   Loss 11.8035   LearningRate 0.5381   Epoch: 4   Global Step: 25130   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:24:21,298-Speed 10486.76 samples/sec   Loss 11.7646   LearningRate 0.5380   Epoch: 4   Global Step: 25140   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:24:29,112-Speed 10485.29 samples/sec   Loss 11.7945   LearningRate 0.5378   Epoch: 4   Global Step: 25150   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:24:36,934-Speed 10475.77 samples/sec   Loss 11.8463   LearningRate 0.5377   Epoch: 4   Global Step: 25160   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:24:44,720-Speed 10521.41 samples/sec   Loss 11.8312   LearningRate 0.5376   Epoch: 4   Global Step: 25170   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:24:52,514-Speed 10512.20 samples/sec   Loss 11.8259   LearningRate 0.5374   Epoch: 4   Global Step: 25180   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:25:00,334-Speed 10482.14 samples/sec   Loss 11.6703   LearningRate 0.5373   Epoch: 4   Global Step: 25190   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:25:08,150-Speed 10482.65 samples/sec   Loss 11.7805   LearningRate 0.5372   Epoch: 4   Global Step: 25200   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:25:15,948-Speed 10507.77 samples/sec   Loss 11.9054   LearningRate 0.5370   Epoch: 4   Global Step: 25210   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:25:23,754-Speed 10494.72 samples/sec   Loss 11.7970   LearningRate 0.5369   Epoch: 4   Global Step: 25220   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:25:31,577-Speed 10473.05 samples/sec   Loss 11.7989   LearningRate 0.5367   Epoch: 4   Global Step: 25230   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:25:39,390-Speed 10486.21 samples/sec   Loss 11.7340   LearningRate 0.5366   Epoch: 4   Global Step: 25240   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:25:47,198-Speed 10493.83 samples/sec   Loss 11.6849   LearningRate 0.5365   Epoch: 4   Global Step: 25250   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:25:55,044-Speed 10441.32 samples/sec   Loss 11.8403   LearningRate 0.5363   Epoch: 4   Global Step: 25260   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:26:02,898-Speed 10432.72 samples/sec   Loss 11.7446   LearningRate 0.5362   Epoch: 4   Global Step: 25270   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:26:10,739-Speed 10459.59 samples/sec   Loss 11.7511   LearningRate 0.5361   Epoch: 4   Global Step: 25280   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:26:18,544-Speed 10496.50 samples/sec   Loss 11.7025   LearningRate 0.5359   Epoch: 4   Global Step: 25290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:26:26,378-Speed 10458.77 samples/sec   Loss 11.8328   LearningRate 0.5358   Epoch: 4   Global Step: 25300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:26:34,159-Speed 10529.18 samples/sec   Loss 11.7298   LearningRate 0.5356   Epoch: 4   Global Step: 25310   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:26:41,986-Speed 10468.07 samples/sec   Loss 11.8606   LearningRate 0.5355   Epoch: 4   Global Step: 25320   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:26:49,760-Speed 10539.15 samples/sec   Loss 11.8327   LearningRate 0.5354   Epoch: 4   Global Step: 25330   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:26:57,567-Speed 10495.32 samples/sec   Loss 11.8470   LearningRate 0.5352   Epoch: 4   Global Step: 25340   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:27:05,364-Speed 10506.63 samples/sec   Loss 11.7100   LearningRate 0.5351   Epoch: 4   Global Step: 25350   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:27:13,141-Speed 10534.32 samples/sec   Loss 11.7741   LearningRate 0.5350   Epoch: 4   Global Step: 25360   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:27:20,913-Speed 10542.79 samples/sec   Loss 11.7931   LearningRate 0.5348   Epoch: 4   Global Step: 25370   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:27:28,712-Speed 10505.36 samples/sec   Loss 11.7164   LearningRate 0.5347   Epoch: 4   Global Step: 25380   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:27:36,509-Speed 10507.97 samples/sec   Loss 11.7737   LearningRate 0.5346   Epoch: 4   Global Step: 25390   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:27:44,302-Speed 10513.23 samples/sec   Loss 11.8989   LearningRate 0.5344   Epoch: 4   Global Step: 25400   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:27:52,113-Speed 10489.32 samples/sec   Loss 11.7086   LearningRate 0.5343   Epoch: 4   Global Step: 25410   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:27:59,920-Speed 10494.68 samples/sec   Loss 11.7255   LearningRate 0.5341   Epoch: 4   Global Step: 25420   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:28:07,750-Speed 10464.18 samples/sec   Loss 11.8999   LearningRate 0.5340   Epoch: 4   Global Step: 25430   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:28:15,595-Speed 10443.29 samples/sec   Loss 11.7476   LearningRate 0.5339   Epoch: 4   Global Step: 25440   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:28:23,403-Speed 10493.94 samples/sec   Loss 11.7663   LearningRate 0.5337   Epoch: 4   Global Step: 25450   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:28:31,200-Speed 10507.43 samples/sec   Loss 11.6970   LearningRate 0.5336   Epoch: 4   Global Step: 25460   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:28:38,983-Speed 10527.29 samples/sec   Loss 11.7808   LearningRate 0.5335   Epoch: 4   Global Step: 25470   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:28:46,774-Speed 10516.21 samples/sec   Loss 11.7040   LearningRate 0.5333   Epoch: 4   Global Step: 25480   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:28:54,614-Speed 10450.26 samples/sec   Loss 11.6923   LearningRate 0.5332   Epoch: 4   Global Step: 25490   Fp16 Grad Scale: 524288   Required: 17 hours
Training: 2022-01-15 20:29:02,403-Speed 10518.54 samples/sec   Loss 11.7351   LearningRate 0.5331   Epoch: 4   Global Step: 25500   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:29:10,200-Speed 10508.43 samples/sec   Loss 11.8150   LearningRate 0.5329   Epoch: 4   Global Step: 25510   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:29:17,979-Speed 10536.28 samples/sec   Loss 11.6368   LearningRate 0.5328   Epoch: 4   Global Step: 25520   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:29:25,778-Speed 10506.04 samples/sec   Loss 11.7843   LearningRate 0.5326   Epoch: 4   Global Step: 25530   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:29:33,560-Speed 10527.45 samples/sec   Loss 11.6734   LearningRate 0.5325   Epoch: 4   Global Step: 25540   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:29:41,350-Speed 10518.15 samples/sec   Loss 11.7519   LearningRate 0.5324   Epoch: 4   Global Step: 25550   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:29:49,143-Speed 10512.99 samples/sec   Loss 11.6811   LearningRate 0.5322   Epoch: 4   Global Step: 25560   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:29:56,956-Speed 10486.55 samples/sec   Loss 11.6625   LearningRate 0.5321   Epoch: 4   Global Step: 25570   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:30:04,784-Speed 10466.66 samples/sec   Loss 11.7189   LearningRate 0.5320   Epoch: 4   Global Step: 25580   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:30:12,599-Speed 10482.94 samples/sec   Loss 11.7735   LearningRate 0.5318   Epoch: 4   Global Step: 25590   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:30:20,409-Speed 10490.72 samples/sec   Loss 11.8235   LearningRate 0.5317   Epoch: 4   Global Step: 25600   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:30:28,232-Speed 10474.05 samples/sec   Loss 11.7017   LearningRate 0.5316   Epoch: 4   Global Step: 25610   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:30:36,036-Speed 10498.44 samples/sec   Loss 11.7692   LearningRate 0.5314   Epoch: 4   Global Step: 25620   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:30:43,907-Speed 10408.80 samples/sec   Loss 11.6852   LearningRate 0.5313   Epoch: 4   Global Step: 25630   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:30:51,708-Speed 10503.71 samples/sec   Loss 11.6604   LearningRate 0.5311   Epoch: 4   Global Step: 25640   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:30:59,522-Speed 10485.15 samples/sec   Loss 11.6269   LearningRate 0.5310   Epoch: 4   Global Step: 25650   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:31:07,336-Speed 10485.01 samples/sec   Loss 11.7365   LearningRate 0.5309   Epoch: 4   Global Step: 25660   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:31:15,173-Speed 10454.31 samples/sec   Loss 11.7648   LearningRate 0.5307   Epoch: 4   Global Step: 25670   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:31:22,969-Speed 10510.00 samples/sec   Loss 11.6609   LearningRate 0.5306   Epoch: 4   Global Step: 25680   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:31:30,790-Speed 10475.98 samples/sec   Loss 11.6954   LearningRate 0.5305   Epoch: 4   Global Step: 25690   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:31:38,643-Speed 10433.74 samples/sec   Loss 11.8274   LearningRate 0.5303   Epoch: 4   Global Step: 25700   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:31:46,441-Speed 10506.18 samples/sec   Loss 11.7476   LearningRate 0.5302   Epoch: 4   Global Step: 25710   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:31:54,229-Speed 10519.44 samples/sec   Loss 11.6978   LearningRate 0.5301   Epoch: 4   Global Step: 25720   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:32:02,032-Speed 10499.88 samples/sec   Loss 11.6548   LearningRate 0.5299   Epoch: 4   Global Step: 25730   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:32:09,845-Speed 10486.61 samples/sec   Loss 11.9575   LearningRate 0.5298   Epoch: 4   Global Step: 25740   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:32:17,689-Speed 10445.89 samples/sec   Loss 11.7317   LearningRate 0.5297   Epoch: 4   Global Step: 25750   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:32:25,485-Speed 10508.65 samples/sec   Loss 11.8161   LearningRate 0.5295   Epoch: 4   Global Step: 25760   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:32:33,294-Speed 10494.08 samples/sec   Loss 11.7716   LearningRate 0.5294   Epoch: 4   Global Step: 25770   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:32:41,143-Speed 10438.97 samples/sec   Loss 11.6409   LearningRate 0.5292   Epoch: 4   Global Step: 25780   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:32:48,938-Speed 10511.20 samples/sec   Loss 11.5906   LearningRate 0.5291   Epoch: 4   Global Step: 25790   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:32:56,734-Speed 10508.89 samples/sec   Loss 11.6242   LearningRate 0.5290   Epoch: 4   Global Step: 25800   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:33:04,557-Speed 10473.71 samples/sec   Loss 11.7545   LearningRate 0.5288   Epoch: 4   Global Step: 25810   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:33:12,350-Speed 10513.36 samples/sec   Loss 11.6642   LearningRate 0.5287   Epoch: 4   Global Step: 25820   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:33:20,159-Speed 10491.37 samples/sec   Loss 11.5773   LearningRate 0.5286   Epoch: 4   Global Step: 25830   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:33:27,989-Speed 10463.55 samples/sec   Loss 11.6258   LearningRate 0.5284   Epoch: 4   Global Step: 25840   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:33:35,846-Speed 10428.31 samples/sec   Loss 11.6605   LearningRate 0.5283   Epoch: 4   Global Step: 25850   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:33:43,658-Speed 10486.97 samples/sec   Loss 11.6642   LearningRate 0.5282   Epoch: 4   Global Step: 25860   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:33:51,503-Speed 10444.42 samples/sec   Loss 11.7924   LearningRate 0.5280   Epoch: 4   Global Step: 25870   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:33:59,322-Speed 10478.30 samples/sec   Loss 11.6813   LearningRate 0.5279   Epoch: 4   Global Step: 25880   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:34:07,121-Speed 10505.54 samples/sec   Loss 11.6982   LearningRate 0.5278   Epoch: 4   Global Step: 25890   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:34:14,955-Speed 10457.96 samples/sec   Loss 11.8291   LearningRate 0.5276   Epoch: 4   Global Step: 25900   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:34:22,763-Speed 10493.50 samples/sec   Loss 11.6738   LearningRate 0.5275   Epoch: 4   Global Step: 25910   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:34:30,570-Speed 10494.96 samples/sec   Loss 11.6323   LearningRate 0.5273   Epoch: 4   Global Step: 25920   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:34:53,084-Speed 3638.98 samples/sec   Loss 11.5993   LearningRate 0.5272   Epoch: 5   Global Step: 25930   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:35:00,876-Speed 10515.01 samples/sec   Loss 11.5999   LearningRate 0.5271   Epoch: 5   Global Step: 25940   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:35:08,636-Speed 10560.70 samples/sec   Loss 11.5908   LearningRate 0.5269   Epoch: 5   Global Step: 25950   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:35:16,401-Speed 10551.81 samples/sec   Loss 11.6921   LearningRate 0.5268   Epoch: 5   Global Step: 25960   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:35:24,198-Speed 10508.05 samples/sec   Loss 11.7018   LearningRate 0.5267   Epoch: 5   Global Step: 25970   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:35:31,967-Speed 10547.35 samples/sec   Loss 11.6235   LearningRate 0.5265   Epoch: 5   Global Step: 25980   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:35:39,760-Speed 10513.63 samples/sec   Loss 11.5926   LearningRate 0.5264   Epoch: 5   Global Step: 25990   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:35:47,576-Speed 10481.94 samples/sec   Loss 11.6421   LearningRate 0.5263   Epoch: 5   Global Step: 26000   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:35:55,400-Speed 10472.85 samples/sec   Loss 11.6315   LearningRate 0.5261   Epoch: 5   Global Step: 26010   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:36:03,205-Speed 10497.02 samples/sec   Loss 11.6276   LearningRate 0.5260   Epoch: 5   Global Step: 26020   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:36:11,010-Speed 10497.53 samples/sec   Loss 11.6953   LearningRate 0.5259   Epoch: 5   Global Step: 26030   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:36:18,819-Speed 10491.36 samples/sec   Loss 11.5460   LearningRate 0.5257   Epoch: 5   Global Step: 26040   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:36:26,613-Speed 10511.86 samples/sec   Loss 11.5718   LearningRate 0.5256   Epoch: 5   Global Step: 26050   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:36:34,435-Speed 10474.92 samples/sec   Loss 11.7888   LearningRate 0.5254   Epoch: 5   Global Step: 26060   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:36:42,232-Speed 10510.49 samples/sec   Loss 11.6705   LearningRate 0.5253   Epoch: 5   Global Step: 26070   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:36:50,029-Speed 10508.19 samples/sec   Loss 11.7511   LearningRate 0.5252   Epoch: 5   Global Step: 26080   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:36:57,814-Speed 10524.17 samples/sec   Loss 11.5599   LearningRate 0.5250   Epoch: 5   Global Step: 26090   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:37:05,596-Speed 10527.60 samples/sec   Loss 11.6869   LearningRate 0.5249   Epoch: 5   Global Step: 26100   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:37:13,370-Speed 10539.79 samples/sec   Loss 11.5647   LearningRate 0.5248   Epoch: 5   Global Step: 26110   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:37:21,159-Speed 10518.18 samples/sec   Loss 11.6154   LearningRate 0.5246   Epoch: 5   Global Step: 26120   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:37:28,951-Speed 10514.24 samples/sec   Loss 11.6185   LearningRate 0.5245   Epoch: 5   Global Step: 26130   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:37:36,725-Speed 10539.69 samples/sec   Loss 11.6090   LearningRate 0.5244   Epoch: 5   Global Step: 26140   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:37:44,519-Speed 10512.04 samples/sec   Loss 11.5796   LearningRate 0.5242   Epoch: 5   Global Step: 26150   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:37:52,335-Speed 10481.57 samples/sec   Loss 11.6650   LearningRate 0.5241   Epoch: 5   Global Step: 26160   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:38:00,131-Speed 10510.16 samples/sec   Loss 11.6437   LearningRate 0.5240   Epoch: 5   Global Step: 26170   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:38:07,912-Speed 10529.55 samples/sec   Loss 11.6191   LearningRate 0.5238   Epoch: 5   Global Step: 26180   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:38:15,725-Speed 10486.08 samples/sec   Loss 11.5717   LearningRate 0.5237   Epoch: 5   Global Step: 26190   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:38:23,506-Speed 10530.29 samples/sec   Loss 11.6123   LearningRate 0.5236   Epoch: 5   Global Step: 26200   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:38:31,282-Speed 10535.46 samples/sec   Loss 11.5797   LearningRate 0.5234   Epoch: 5   Global Step: 26210   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:38:39,071-Speed 10521.20 samples/sec   Loss 11.5965   LearningRate 0.5233   Epoch: 5   Global Step: 26220   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:38:46,863-Speed 10514.27 samples/sec   Loss 11.6313   LearningRate 0.5231   Epoch: 5   Global Step: 26230   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:38:54,658-Speed 10511.39 samples/sec   Loss 11.5765   LearningRate 0.5230   Epoch: 5   Global Step: 26240   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:39:02,461-Speed 10499.95 samples/sec   Loss 11.5752   LearningRate 0.5229   Epoch: 5   Global Step: 26250   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:39:10,248-Speed 10527.70 samples/sec   Loss 11.6392   LearningRate 0.5227   Epoch: 5   Global Step: 26260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:39:18,067-Speed 10477.80 samples/sec   Loss 11.6402   LearningRate 0.5226   Epoch: 5   Global Step: 26270   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:39:25,874-Speed 10495.05 samples/sec   Loss 11.6120   LearningRate 0.5225   Epoch: 5   Global Step: 26280   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:39:33,700-Speed 10468.97 samples/sec   Loss 11.5333   LearningRate 0.5223   Epoch: 5   Global Step: 26290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:39:41,527-Speed 10466.88 samples/sec   Loss 11.4495   LearningRate 0.5222   Epoch: 5   Global Step: 26300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:39:49,337-Speed 10491.74 samples/sec   Loss 11.6247   LearningRate 0.5221   Epoch: 5   Global Step: 26310   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:39:57,177-Speed 10450.90 samples/sec   Loss 11.6675   LearningRate 0.5219   Epoch: 5   Global Step: 26320   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:40:05,029-Speed 10434.27 samples/sec   Loss 11.5482   LearningRate 0.5218   Epoch: 5   Global Step: 26330   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:40:12,905-Speed 10402.39 samples/sec   Loss 11.5168   LearningRate 0.5217   Epoch: 5   Global Step: 26340   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:40:20,719-Speed 10484.87 samples/sec   Loss 11.5586   LearningRate 0.5215   Epoch: 5   Global Step: 26350   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:40:28,554-Speed 10457.08 samples/sec   Loss 11.6230   LearningRate 0.5214   Epoch: 5   Global Step: 26360   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:40:36,400-Speed 10443.07 samples/sec   Loss 11.6813   LearningRate 0.5213   Epoch: 5   Global Step: 26370   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:40:44,238-Speed 10453.31 samples/sec   Loss 11.7015   LearningRate 0.5211   Epoch: 5   Global Step: 26380   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:40:52,087-Speed 10437.28 samples/sec   Loss 11.6386   LearningRate 0.5210   Epoch: 5   Global Step: 26390   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:40:59,906-Speed 10479.65 samples/sec   Loss 11.6462   LearningRate 0.5209   Epoch: 5   Global Step: 26400   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:41:07,759-Speed 10436.27 samples/sec   Loss 11.5753   LearningRate 0.5207   Epoch: 5   Global Step: 26410   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:41:15,655-Speed 10376.64 samples/sec   Loss 11.6649   LearningRate 0.5206   Epoch: 5   Global Step: 26420   Fp16 Grad Scale: 524288   Required: 17 hours
Training: 2022-01-15 20:41:23,501-Speed 10442.43 samples/sec   Loss 11.5699   LearningRate 0.5204   Epoch: 5   Global Step: 26430   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:41:31,325-Speed 10471.40 samples/sec   Loss 11.5682   LearningRate 0.5203   Epoch: 5   Global Step: 26440   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:41:39,137-Speed 10487.79 samples/sec   Loss 11.5556   LearningRate 0.5202   Epoch: 5   Global Step: 26450   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:41:46,962-Speed 10470.35 samples/sec   Loss 11.5295   LearningRate 0.5200   Epoch: 5   Global Step: 26460   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:41:54,817-Speed 10430.10 samples/sec   Loss 11.4330   LearningRate 0.5199   Epoch: 5   Global Step: 26470   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:42:02,618-Speed 10503.26 samples/sec   Loss 11.6522   LearningRate 0.5198   Epoch: 5   Global Step: 26480   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-15 20:42:10,474-Speed 10428.99 samples/sec   Loss 11.6638   LearningRate 0.5196   Epoch: 5   Global Step: 26490   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-15 20:42:18,325-Speed 10435.03 samples/sec   Loss 11.4805   LearningRate 0.5195   Epoch: 5   Global Step: 26500   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-15 20:42:26,140-Speed 10484.36 samples/sec   Loss 11.5475   LearningRate 0.5194   Epoch: 5   Global Step: 26510   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-15 20:42:33,975-Speed 10457.52 samples/sec   Loss 11.6115   LearningRate 0.5192   Epoch: 5   Global Step: 26520   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-15 20:42:41,833-Speed 10426.33 samples/sec   Loss 11.5830   LearningRate 0.5191   Epoch: 5   Global Step: 26530   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-15 20:42:49,669-Speed 10456.02 samples/sec   Loss 11.4472   LearningRate 0.5190   Epoch: 5   Global Step: 26540   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-15 20:42:57,490-Speed 10474.97 samples/sec   Loss 11.7099   LearningRate 0.5188   Epoch: 5   Global Step: 26550   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-15 20:43:05,372-Speed 10395.50 samples/sec   Loss 11.5818   LearningRate 0.5187   Epoch: 5   Global Step: 26560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-15 20:43:13,215-Speed 10445.70 samples/sec   Loss 11.6345   LearningRate 0.5186   Epoch: 5   Global Step: 26570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-15 20:43:21,050-Speed 10456.95 samples/sec   Loss 11.4925   LearningRate 0.5184   Epoch: 5   Global Step: 26580   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:43:28,892-Speed 10447.43 samples/sec   Loss 11.4977   LearningRate 0.5183   Epoch: 5   Global Step: 26590   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:43:36,702-Speed 10490.49 samples/sec   Loss 11.5077   LearningRate 0.5182   Epoch: 5   Global Step: 26600   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:43:44,513-Speed 10489.18 samples/sec   Loss 11.5432   LearningRate 0.5180   Epoch: 5   Global Step: 26610   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:43:52,340-Speed 10468.05 samples/sec   Loss 11.5280   LearningRate 0.5179   Epoch: 5   Global Step: 26620   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:44:00,198-Speed 10426.35 samples/sec   Loss 11.5048   LearningRate 0.5178   Epoch: 5   Global Step: 26630   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:44:08,014-Speed 10483.29 samples/sec   Loss 11.4621   LearningRate 0.5176   Epoch: 5   Global Step: 26640   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:44:15,815-Speed 10501.76 samples/sec   Loss 11.5512   LearningRate 0.5175   Epoch: 5   Global Step: 26650   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:44:23,631-Speed 10482.62 samples/sec   Loss 11.5617   LearningRate 0.5174   Epoch: 5   Global Step: 26660   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:44:31,474-Speed 10446.15 samples/sec   Loss 11.5064   LearningRate 0.5172   Epoch: 5   Global Step: 26670   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:44:39,274-Speed 10504.15 samples/sec   Loss 11.5240   LearningRate 0.5171   Epoch: 5   Global Step: 26680   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:44:47,093-Speed 10478.89 samples/sec   Loss 11.6499   LearningRate 0.5170   Epoch: 5   Global Step: 26690   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:44:54,883-Speed 10516.69 samples/sec   Loss 11.6228   LearningRate 0.5168   Epoch: 5   Global Step: 26700   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:45:02,681-Speed 10507.70 samples/sec   Loss 11.5234   LearningRate 0.5167   Epoch: 5   Global Step: 26710   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:45:10,485-Speed 10498.46 samples/sec   Loss 11.4469   LearningRate 0.5165   Epoch: 5   Global Step: 26720   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:45:18,286-Speed 10507.39 samples/sec   Loss 11.5871   LearningRate 0.5164   Epoch: 5   Global Step: 26730   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:45:26,096-Speed 10491.55 samples/sec   Loss 11.4404   LearningRate 0.5163   Epoch: 5   Global Step: 26740   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:45:33,916-Speed 10476.71 samples/sec   Loss 11.5462   LearningRate 0.5161   Epoch: 5   Global Step: 26750   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:45:41,745-Speed 10465.34 samples/sec   Loss 11.4628   LearningRate 0.5160   Epoch: 5   Global Step: 26760   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:45:49,572-Speed 10467.85 samples/sec   Loss 11.4131   LearningRate 0.5159   Epoch: 5   Global Step: 26770   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:45:57,381-Speed 10491.44 samples/sec   Loss 11.5333   LearningRate 0.5157   Epoch: 5   Global Step: 26780   Fp16 Grad Scale: 524288   Required: 17 hours
Training: 2022-01-15 20:46:05,184-Speed 10500.87 samples/sec   Loss 11.4788   LearningRate 0.5156   Epoch: 5   Global Step: 26790   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:46:13,039-Speed 10430.65 samples/sec   Loss 11.5344   LearningRate 0.5155   Epoch: 5   Global Step: 26800   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:46:20,825-Speed 10522.76 samples/sec   Loss 11.5557   LearningRate 0.5153   Epoch: 5   Global Step: 26810   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:46:28,611-Speed 10523.39 samples/sec   Loss 11.5709   LearningRate 0.5152   Epoch: 5   Global Step: 26820   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:46:36,412-Speed 10502.01 samples/sec   Loss 11.5047   LearningRate 0.5151   Epoch: 5   Global Step: 26830   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:46:44,252-Speed 10449.52 samples/sec   Loss 11.6312   LearningRate 0.5149   Epoch: 5   Global Step: 26840   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:46:52,046-Speed 10513.04 samples/sec   Loss 11.5472   LearningRate 0.5148   Epoch: 5   Global Step: 26850   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:46:59,858-Speed 10488.02 samples/sec   Loss 11.4820   LearningRate 0.5147   Epoch: 5   Global Step: 26860   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:47:07,661-Speed 10500.51 samples/sec   Loss 11.5031   LearningRate 0.5145   Epoch: 5   Global Step: 26870   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:47:15,474-Speed 10485.05 samples/sec   Loss 11.5336   LearningRate 0.5144   Epoch: 5   Global Step: 26880   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:47:23,256-Speed 10528.94 samples/sec   Loss 11.3686   LearningRate 0.5143   Epoch: 5   Global Step: 26890   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:47:31,049-Speed 10514.27 samples/sec   Loss 11.4629   LearningRate 0.5141   Epoch: 5   Global Step: 26900   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:47:38,834-Speed 10522.92 samples/sec   Loss 11.5549   LearningRate 0.5140   Epoch: 5   Global Step: 26910   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:47:46,652-Speed 10479.60 samples/sec   Loss 11.4240   LearningRate 0.5139   Epoch: 5   Global Step: 26920   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:47:54,463-Speed 10489.96 samples/sec   Loss 11.4739   LearningRate 0.5137   Epoch: 5   Global Step: 26930   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:48:02,264-Speed 10502.20 samples/sec   Loss 11.4634   LearningRate 0.5136   Epoch: 5   Global Step: 26940   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:48:10,093-Speed 10465.26 samples/sec   Loss 11.3550   LearningRate 0.5135   Epoch: 5   Global Step: 26950   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:48:17,904-Speed 10489.18 samples/sec   Loss 11.4440   LearningRate 0.5133   Epoch: 5   Global Step: 26960   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:48:25,695-Speed 10516.61 samples/sec   Loss 11.4975   LearningRate 0.5132   Epoch: 5   Global Step: 26970   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:48:33,474-Speed 10532.98 samples/sec   Loss 11.4726   LearningRate 0.5131   Epoch: 5   Global Step: 26980   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:48:41,261-Speed 10520.75 samples/sec   Loss 11.4067   LearningRate 0.5129   Epoch: 5   Global Step: 26990   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:48:49,061-Speed 10504.19 samples/sec   Loss 11.3730   LearningRate 0.5128   Epoch: 5   Global Step: 27000   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:48:56,933-Speed 10407.83 samples/sec   Loss 11.4016   LearningRate 0.5127   Epoch: 5   Global Step: 27010   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:49:04,732-Speed 10505.58 samples/sec   Loss 11.5206   LearningRate 0.5125   Epoch: 5   Global Step: 27020   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:49:12,529-Speed 10508.23 samples/sec   Loss 11.6074   LearningRate 0.5124   Epoch: 5   Global Step: 27030   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:49:20,338-Speed 10491.54 samples/sec   Loss 11.5190   LearningRate 0.5123   Epoch: 5   Global Step: 27040   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:49:28,156-Speed 10480.53 samples/sec   Loss 11.4642   LearningRate 0.5121   Epoch: 5   Global Step: 27050   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:49:35,949-Speed 10514.19 samples/sec   Loss 11.4856   LearningRate 0.5120   Epoch: 5   Global Step: 27060   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:49:43,770-Speed 10474.84 samples/sec   Loss 11.3899   LearningRate 0.5119   Epoch: 5   Global Step: 27070   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:49:51,604-Speed 10458.17 samples/sec   Loss 11.4782   LearningRate 0.5117   Epoch: 5   Global Step: 27080   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:49:59,397-Speed 10512.83 samples/sec   Loss 11.4562   LearningRate 0.5116   Epoch: 5   Global Step: 27090   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:50:07,185-Speed 10520.14 samples/sec   Loss 11.4942   LearningRate 0.5115   Epoch: 5   Global Step: 27100   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:50:15,000-Speed 10484.43 samples/sec   Loss 11.4984   LearningRate 0.5113   Epoch: 5   Global Step: 27110   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:50:22,813-Speed 10486.72 samples/sec   Loss 11.3993   LearningRate 0.5112   Epoch: 5   Global Step: 27120   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:50:30,601-Speed 10520.39 samples/sec   Loss 11.4810   LearningRate 0.5111   Epoch: 5   Global Step: 27130   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:50:38,402-Speed 10503.12 samples/sec   Loss 11.4280   LearningRate 0.5109   Epoch: 5   Global Step: 27140   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:50:46,205-Speed 10500.51 samples/sec   Loss 11.4385   LearningRate 0.5108   Epoch: 5   Global Step: 27150   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:50:53,997-Speed 10515.04 samples/sec   Loss 11.5393   LearningRate 0.5107   Epoch: 5   Global Step: 27160   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:51:01,776-Speed 10531.43 samples/sec   Loss 11.5099   LearningRate 0.5105   Epoch: 5   Global Step: 27170   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:51:09,558-Speed 10527.99 samples/sec   Loss 11.3963   LearningRate 0.5104   Epoch: 5   Global Step: 27180   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:51:17,375-Speed 10481.69 samples/sec   Loss 11.3918   LearningRate 0.5103   Epoch: 5   Global Step: 27190   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:51:25,195-Speed 10477.83 samples/sec   Loss 11.4580   LearningRate 0.5101   Epoch: 5   Global Step: 27200   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:51:33,004-Speed 10490.79 samples/sec   Loss 11.4447   LearningRate 0.5100   Epoch: 5   Global Step: 27210   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:51:40,790-Speed 10522.53 samples/sec   Loss 11.4227   LearningRate 0.5099   Epoch: 5   Global Step: 27220   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:51:48,622-Speed 10464.36 samples/sec   Loss 11.6004   LearningRate 0.5097   Epoch: 5   Global Step: 27230   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:51:56,419-Speed 10507.82 samples/sec   Loss 11.5106   LearningRate 0.5096   Epoch: 5   Global Step: 27240   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:52:04,202-Speed 10526.36 samples/sec   Loss 11.4108   LearningRate 0.5095   Epoch: 5   Global Step: 27250   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:52:12,001-Speed 10505.30 samples/sec   Loss 11.4345   LearningRate 0.5093   Epoch: 5   Global Step: 27260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:52:19,791-Speed 10517.42 samples/sec   Loss 11.4928   LearningRate 0.5092   Epoch: 5   Global Step: 27270   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:52:27,593-Speed 10501.73 samples/sec   Loss 11.3901   LearningRate 0.5091   Epoch: 5   Global Step: 27280   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:52:35,372-Speed 10531.46 samples/sec   Loss 11.3230   LearningRate 0.5089   Epoch: 5   Global Step: 27290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:52:43,171-Speed 10506.48 samples/sec   Loss 11.5677   LearningRate 0.5088   Epoch: 5   Global Step: 27300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:52:50,957-Speed 10524.92 samples/sec   Loss 11.4708   LearningRate 0.5087   Epoch: 5   Global Step: 27310   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:52:58,759-Speed 10500.04 samples/sec   Loss 11.4211   LearningRate 0.5085   Epoch: 5   Global Step: 27320   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:53:06,557-Speed 10506.85 samples/sec   Loss 11.6343   LearningRate 0.5084   Epoch: 5   Global Step: 27330   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:53:14,346-Speed 10519.42 samples/sec   Loss 11.4743   LearningRate 0.5083   Epoch: 5   Global Step: 27340   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:53:22,158-Speed 10494.17 samples/sec   Loss 11.4275   LearningRate 0.5081   Epoch: 5   Global Step: 27350   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:53:29,987-Speed 10464.73 samples/sec   Loss 11.3643   LearningRate 0.5080   Epoch: 5   Global Step: 27360   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:53:37,791-Speed 10498.57 samples/sec   Loss 11.3775   LearningRate 0.5079   Epoch: 5   Global Step: 27370   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:53:45,578-Speed 10521.73 samples/sec   Loss 11.3653   LearningRate 0.5077   Epoch: 5   Global Step: 27380   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:53:53,371-Speed 10513.97 samples/sec   Loss 11.3186   LearningRate 0.5076   Epoch: 5   Global Step: 27390   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:54:01,168-Speed 10507.72 samples/sec   Loss 11.3827   LearningRate 0.5075   Epoch: 5   Global Step: 27400   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:54:08,955-Speed 10520.86 samples/sec   Loss 11.4389   LearningRate 0.5073   Epoch: 5   Global Step: 27410   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:54:16,781-Speed 10468.63 samples/sec   Loss 11.5264   LearningRate 0.5072   Epoch: 5   Global Step: 27420   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:54:24,584-Speed 10499.57 samples/sec   Loss 11.4346   LearningRate 0.5071   Epoch: 5   Global Step: 27430   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:54:32,376-Speed 10515.39 samples/sec   Loss 11.3606   LearningRate 0.5069   Epoch: 5   Global Step: 27440   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:54:40,173-Speed 10507.90 samples/sec   Loss 11.2959   LearningRate 0.5068   Epoch: 5   Global Step: 27450   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:54:47,959-Speed 10522.27 samples/sec   Loss 11.3856   LearningRate 0.5067   Epoch: 5   Global Step: 27460   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:54:55,754-Speed 10511.00 samples/sec   Loss 11.3733   LearningRate 0.5065   Epoch: 5   Global Step: 27470   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:55:03,549-Speed 10510.72 samples/sec   Loss 11.3693   LearningRate 0.5064   Epoch: 5   Global Step: 27480   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:55:11,383-Speed 10459.03 samples/sec   Loss 11.3690   LearningRate 0.5063   Epoch: 5   Global Step: 27490   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:55:19,237-Speed 10430.97 samples/sec   Loss 11.2917   LearningRate 0.5061   Epoch: 5   Global Step: 27500   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:55:27,051-Speed 10485.47 samples/sec   Loss 11.4692   LearningRate 0.5060   Epoch: 5   Global Step: 27510   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:55:34,852-Speed 10502.91 samples/sec   Loss 11.3435   LearningRate 0.5059   Epoch: 5   Global Step: 27520   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:55:42,669-Speed 10480.80 samples/sec   Loss 11.3998   LearningRate 0.5057   Epoch: 5   Global Step: 27530   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:55:50,481-Speed 10487.69 samples/sec   Loss 11.4277   LearningRate 0.5056   Epoch: 5   Global Step: 27540   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:55:58,283-Speed 10501.27 samples/sec   Loss 11.4501   LearningRate 0.5055   Epoch: 5   Global Step: 27550   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:56:06,073-Speed 10517.41 samples/sec   Loss 11.3364   LearningRate 0.5053   Epoch: 5   Global Step: 27560   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:56:13,923-Speed 10436.74 samples/sec   Loss 11.4428   LearningRate 0.5052   Epoch: 5   Global Step: 27570   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:56:21,712-Speed 10519.14 samples/sec   Loss 11.4038   LearningRate 0.5051   Epoch: 5   Global Step: 27580   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:56:29,497-Speed 10523.80 samples/sec   Loss 11.3434   LearningRate 0.5049   Epoch: 5   Global Step: 27590   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:56:37,299-Speed 10501.15 samples/sec   Loss 11.3011   LearningRate 0.5048   Epoch: 5   Global Step: 27600   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:56:45,082-Speed 10528.08 samples/sec   Loss 11.4356   LearningRate 0.5047   Epoch: 5   Global Step: 27610   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:56:52,894-Speed 10490.81 samples/sec   Loss 11.3593   LearningRate 0.5045   Epoch: 5   Global Step: 27620   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:57:00,708-Speed 10484.11 samples/sec   Loss 11.3282   LearningRate 0.5044   Epoch: 5   Global Step: 27630   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:57:08,491-Speed 10527.19 samples/sec   Loss 11.3899   LearningRate 0.5043   Epoch: 5   Global Step: 27640   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:57:16,288-Speed 10508.83 samples/sec   Loss 11.3257   LearningRate 0.5041   Epoch: 5   Global Step: 27650   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:57:24,078-Speed 10517.02 samples/sec   Loss 11.3423   LearningRate 0.5040   Epoch: 5   Global Step: 27660   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:57:31,899-Speed 10475.70 samples/sec   Loss 11.2896   LearningRate 0.5039   Epoch: 5   Global Step: 27670   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:57:39,697-Speed 10506.59 samples/sec   Loss 11.4102   LearningRate 0.5037   Epoch: 5   Global Step: 27680   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:57:47,506-Speed 10491.65 samples/sec   Loss 11.4192   LearningRate 0.5036   Epoch: 5   Global Step: 27690   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:57:55,296-Speed 10517.10 samples/sec   Loss 11.3391   LearningRate 0.5035   Epoch: 5   Global Step: 27700   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:58:03,112-Speed 10483.60 samples/sec   Loss 11.3657   LearningRate 0.5033   Epoch: 5   Global Step: 27710   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:58:10,888-Speed 10535.95 samples/sec   Loss 11.3564   LearningRate 0.5032   Epoch: 5   Global Step: 27720   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:58:18,684-Speed 10508.82 samples/sec   Loss 11.2551   LearningRate 0.5031   Epoch: 5   Global Step: 27730   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:58:26,488-Speed 10499.64 samples/sec   Loss 11.3508   LearningRate 0.5029   Epoch: 5   Global Step: 27740   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:58:34,281-Speed 10513.12 samples/sec   Loss 11.2627   LearningRate 0.5028   Epoch: 5   Global Step: 27750   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 20:58:42,082-Speed 10502.42 samples/sec   Loss 11.6106   LearningRate 0.5027   Epoch: 5   Global Step: 27760   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:58:49,862-Speed 10530.42 samples/sec   Loss 11.6839   LearningRate 0.5026   Epoch: 5   Global Step: 27770   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:58:57,662-Speed 10504.85 samples/sec   Loss 11.4967   LearningRate 0.5024   Epoch: 5   Global Step: 27780   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:59:05,455-Speed 10513.48 samples/sec   Loss 11.4512   LearningRate 0.5023   Epoch: 5   Global Step: 27790   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:59:13,244-Speed 10519.29 samples/sec   Loss 11.3100   LearningRate 0.5022   Epoch: 5   Global Step: 27800   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:59:21,111-Speed 10414.27 samples/sec   Loss 11.3139   LearningRate 0.5020   Epoch: 5   Global Step: 27810   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:59:28,933-Speed 10474.27 samples/sec   Loss 11.2781   LearningRate 0.5019   Epoch: 5   Global Step: 27820   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:59:36,756-Speed 10473.32 samples/sec   Loss 11.2811   LearningRate 0.5018   Epoch: 5   Global Step: 27830   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:59:44,547-Speed 10516.60 samples/sec   Loss 11.2082   LearningRate 0.5016   Epoch: 5   Global Step: 27840   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 20:59:52,344-Speed 10507.67 samples/sec   Loss 11.3166   LearningRate 0.5015   Epoch: 5   Global Step: 27850   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:00:00,154-Speed 10489.93 samples/sec   Loss 11.2725   LearningRate 0.5014   Epoch: 5   Global Step: 27860   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:00:07,958-Speed 10499.85 samples/sec   Loss 11.2806   LearningRate 0.5012   Epoch: 5   Global Step: 27870   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:00:15,793-Speed 10456.77 samples/sec   Loss 11.3655   LearningRate 0.5011   Epoch: 5   Global Step: 27880   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:00:23,634-Speed 10448.14 samples/sec   Loss 11.2712   LearningRate 0.5010   Epoch: 5   Global Step: 27890   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:00:31,418-Speed 10525.42 samples/sec   Loss 11.3997   LearningRate 0.5008   Epoch: 5   Global Step: 27900   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:00:39,209-Speed 10516.87 samples/sec   Loss 11.3384   LearningRate 0.5007   Epoch: 5   Global Step: 27910   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:00:46,983-Speed 10539.63 samples/sec   Loss 11.4662   LearningRate 0.5006   Epoch: 5   Global Step: 27920   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:00:54,779-Speed 10507.96 samples/sec   Loss 11.2809   LearningRate 0.5004   Epoch: 5   Global Step: 27930   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:01:02,588-Speed 10492.62 samples/sec   Loss 11.3288   LearningRate 0.5003   Epoch: 5   Global Step: 27940   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:01:10,423-Speed 10457.04 samples/sec   Loss 11.2570   LearningRate 0.5002   Epoch: 5   Global Step: 27950   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:01:18,275-Speed 10434.91 samples/sec   Loss 11.2453   LearningRate 0.5000   Epoch: 5   Global Step: 27960   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:01:26,149-Speed 10404.48 samples/sec   Loss 11.3233   LearningRate 0.4999   Epoch: 5   Global Step: 27970   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:01:33,931-Speed 10528.58 samples/sec   Loss 11.3083   LearningRate 0.4998   Epoch: 5   Global Step: 27980   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:01:41,729-Speed 10506.34 samples/sec   Loss 11.2847   LearningRate 0.4996   Epoch: 5   Global Step: 27990   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:01:49,525-Speed 10508.29 samples/sec   Loss 11.3412   LearningRate 0.4995   Epoch: 5   Global Step: 28000   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:01:57,316-Speed 10517.57 samples/sec   Loss 11.2777   LearningRate 0.4994   Epoch: 5   Global Step: 28010   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:02:05,129-Speed 10486.16 samples/sec   Loss 11.3098   LearningRate 0.4992   Epoch: 5   Global Step: 28020   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:02:12,929-Speed 10504.20 samples/sec   Loss 11.1763   LearningRate 0.4991   Epoch: 5   Global Step: 28030   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:02:20,743-Speed 10485.45 samples/sec   Loss 11.3284   LearningRate 0.4990   Epoch: 5   Global Step: 28040   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:02:28,559-Speed 10483.26 samples/sec   Loss 11.1767   LearningRate 0.4988   Epoch: 5   Global Step: 28050   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:02:36,396-Speed 10454.22 samples/sec   Loss 11.3779   LearningRate 0.4987   Epoch: 5   Global Step: 28060   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:02:44,195-Speed 10505.13 samples/sec   Loss 11.2167   LearningRate 0.4986   Epoch: 5   Global Step: 28070   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:02:52,014-Speed 10477.83 samples/sec   Loss 11.2841   LearningRate 0.4985   Epoch: 5   Global Step: 28080   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:02:59,803-Speed 10520.01 samples/sec   Loss 11.2522   LearningRate 0.4983   Epoch: 5   Global Step: 28090   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:03:07,599-Speed 10509.15 samples/sec   Loss 11.1962   LearningRate 0.4982   Epoch: 5   Global Step: 28100   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:03:15,408-Speed 10491.54 samples/sec   Loss 11.2483   LearningRate 0.4981   Epoch: 5   Global Step: 28110   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:03:23,262-Speed 10431.96 samples/sec   Loss 11.2969   LearningRate 0.4979   Epoch: 5   Global Step: 28120   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:03:31,102-Speed 10450.97 samples/sec   Loss 11.2108   LearningRate 0.4978   Epoch: 5   Global Step: 28130   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:03:38,921-Speed 10477.97 samples/sec   Loss 11.2784   LearningRate 0.4977   Epoch: 5   Global Step: 28140   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:03:46,762-Speed 10448.48 samples/sec   Loss 11.4269   LearningRate 0.4975   Epoch: 5   Global Step: 28150   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:03:54,592-Speed 10463.76 samples/sec   Loss 11.2380   LearningRate 0.4974   Epoch: 5   Global Step: 28160   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:04:02,411-Speed 10478.28 samples/sec   Loss 11.4248   LearningRate 0.4973   Epoch: 5   Global Step: 28170   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:04:10,223-Speed 10488.63 samples/sec   Loss 11.3771   LearningRate 0.4971   Epoch: 5   Global Step: 28180   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:04:18,010-Speed 10521.49 samples/sec   Loss 11.3162   LearningRate 0.4970   Epoch: 5   Global Step: 28190   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:04:25,815-Speed 10497.72 samples/sec   Loss 11.2761   LearningRate 0.4969   Epoch: 5   Global Step: 28200   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:04:33,614-Speed 10504.43 samples/sec   Loss 11.2385   LearningRate 0.4967   Epoch: 5   Global Step: 28210   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:04:41,433-Speed 10478.75 samples/sec   Loss 11.2344   LearningRate 0.4966   Epoch: 5   Global Step: 28220   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:04:49,245-Speed 10488.01 samples/sec   Loss 11.2854   LearningRate 0.4965   Epoch: 5   Global Step: 28230   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:04:57,028-Speed 10526.50 samples/sec   Loss 11.2647   LearningRate 0.4963   Epoch: 5   Global Step: 28240   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:05:04,805-Speed 10535.39 samples/sec   Loss 11.2319   LearningRate 0.4962   Epoch: 5   Global Step: 28250   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:05:12,607-Speed 10501.20 samples/sec   Loss 11.3377   LearningRate 0.4961   Epoch: 5   Global Step: 28260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:05:20,395-Speed 10520.15 samples/sec   Loss 11.2648   LearningRate 0.4960   Epoch: 5   Global Step: 28270   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:05:28,176-Speed 10528.97 samples/sec   Loss 11.3686   LearningRate 0.4958   Epoch: 5   Global Step: 28280   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:05:35,992-Speed 10483.15 samples/sec   Loss 11.3483   LearningRate 0.4957   Epoch: 5   Global Step: 28290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:05:43,797-Speed 10497.14 samples/sec   Loss 11.3267   LearningRate 0.4956   Epoch: 5   Global Step: 28300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:05:51,581-Speed 10525.32 samples/sec   Loss 11.3181   LearningRate 0.4954   Epoch: 5   Global Step: 28310   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:05:59,378-Speed 10507.54 samples/sec   Loss 11.2079   LearningRate 0.4953   Epoch: 5   Global Step: 28320   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:06:07,198-Speed 10477.64 samples/sec   Loss 11.2332   LearningRate 0.4952   Epoch: 5   Global Step: 28330   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:06:15,022-Speed 10471.93 samples/sec   Loss 11.2790   LearningRate 0.4950   Epoch: 5   Global Step: 28340   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:06:22,838-Speed 10484.09 samples/sec   Loss 11.1644   LearningRate 0.4949   Epoch: 5   Global Step: 28350   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:06:30,660-Speed 10474.25 samples/sec   Loss 11.2849   LearningRate 0.4948   Epoch: 5   Global Step: 28360   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:06:38,452-Speed 10514.55 samples/sec   Loss 11.2281   LearningRate 0.4946   Epoch: 5   Global Step: 28370   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:06:46,288-Speed 10456.50 samples/sec   Loss 11.2733   LearningRate 0.4945   Epoch: 5   Global Step: 28380   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:06:54,125-Speed 10454.06 samples/sec   Loss 11.2209   LearningRate 0.4944   Epoch: 5   Global Step: 28390   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:07:01,931-Speed 10495.06 samples/sec   Loss 11.2438   LearningRate 0.4942   Epoch: 5   Global Step: 28400   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:07:09,763-Speed 10461.81 samples/sec   Loss 11.2472   LearningRate 0.4941   Epoch: 5   Global Step: 28410   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:07:17,561-Speed 10506.32 samples/sec   Loss 11.2334   LearningRate 0.4940   Epoch: 5   Global Step: 28420   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:07:25,338-Speed 10535.24 samples/sec   Loss 11.2209   LearningRate 0.4938   Epoch: 5   Global Step: 28430   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:07:33,158-Speed 10477.63 samples/sec   Loss 11.2446   LearningRate 0.4937   Epoch: 5   Global Step: 28440   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:07:40,960-Speed 10500.72 samples/sec   Loss 11.2987   LearningRate 0.4936   Epoch: 5   Global Step: 28450   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:07:48,743-Speed 10527.51 samples/sec   Loss 11.2381   LearningRate 0.4935   Epoch: 5   Global Step: 28460   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-15 21:07:56,561-Speed 10484.47 samples/sec   Loss 11.2724   LearningRate 0.4933   Epoch: 5   Global Step: 28470   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:08:04,373-Speed 10486.83 samples/sec   Loss 11.3449   LearningRate 0.4932   Epoch: 5   Global Step: 28480   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:08:12,204-Speed 10463.68 samples/sec   Loss 11.2447   LearningRate 0.4931   Epoch: 5   Global Step: 28490   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-15 21:08:20,070-Speed 10415.63 samples/sec   Loss 11.1560   LearningRate 0.4929   Epoch: 5   Global Step: 28500   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:08:27,918-Speed 10444.88 samples/sec   Loss 11.2265   LearningRate 0.4928   Epoch: 5   Global Step: 28510   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:08:35,745-Speed 10467.71 samples/sec   Loss 11.2882   LearningRate 0.4927   Epoch: 5   Global Step: 28520   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:08:43,531-Speed 10524.11 samples/sec   Loss 11.2390   LearningRate 0.4925   Epoch: 5   Global Step: 28530   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:08:51,305-Speed 10537.85 samples/sec   Loss 11.2508   LearningRate 0.4924   Epoch: 5   Global Step: 28540   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:08:59,092-Speed 10522.16 samples/sec   Loss 11.2039   LearningRate 0.4923   Epoch: 5   Global Step: 28550   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:09:06,952-Speed 10423.96 samples/sec   Loss 11.1841   LearningRate 0.4921   Epoch: 5   Global Step: 28560   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:09:14,752-Speed 10504.60 samples/sec   Loss 11.2096   LearningRate 0.4920   Epoch: 5   Global Step: 28570   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:09:22,565-Speed 10486.14 samples/sec   Loss 11.2000   LearningRate 0.4919   Epoch: 5   Global Step: 28580   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:09:30,349-Speed 10525.66 samples/sec   Loss 11.2570   LearningRate 0.4918   Epoch: 5   Global Step: 28590   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:09:38,139-Speed 10519.02 samples/sec   Loss 11.2374   LearningRate 0.4916   Epoch: 5   Global Step: 28600   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:09:45,922-Speed 10526.60 samples/sec   Loss 11.2292   LearningRate 0.4915   Epoch: 5   Global Step: 28610   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:09:53,791-Speed 10411.24 samples/sec   Loss 11.1801   LearningRate 0.4914   Epoch: 5   Global Step: 28620   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:10:01,606-Speed 10484.43 samples/sec   Loss 11.1471   LearningRate 0.4912   Epoch: 5   Global Step: 28630   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:10:09,417-Speed 10489.88 samples/sec   Loss 11.2570   LearningRate 0.4911   Epoch: 5   Global Step: 28640   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:10:17,304-Speed 10387.79 samples/sec   Loss 11.2006   LearningRate 0.4910   Epoch: 5   Global Step: 28650   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:10:25,097-Speed 10515.44 samples/sec   Loss 11.1271   LearningRate 0.4908   Epoch: 5   Global Step: 28660   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:10:32,870-Speed 10539.46 samples/sec   Loss 11.2136   LearningRate 0.4907   Epoch: 5   Global Step: 28670   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:10:40,670-Speed 10505.29 samples/sec   Loss 11.3073   LearningRate 0.4906   Epoch: 5   Global Step: 28680   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:10:48,447-Speed 10534.82 samples/sec   Loss 11.2727   LearningRate 0.4904   Epoch: 5   Global Step: 28690   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:10:56,245-Speed 10505.32 samples/sec   Loss 11.2822   LearningRate 0.4903   Epoch: 5   Global Step: 28700   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:11:04,061-Speed 10483.76 samples/sec   Loss 11.1830   LearningRate 0.4902   Epoch: 5   Global Step: 28710   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:11:11,856-Speed 10511.43 samples/sec   Loss 11.2059   LearningRate 0.4901   Epoch: 5   Global Step: 28720   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:11:19,659-Speed 10498.64 samples/sec   Loss 11.1729   LearningRate 0.4899   Epoch: 5   Global Step: 28730   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:11:27,459-Speed 10503.82 samples/sec   Loss 11.0767   LearningRate 0.4898   Epoch: 5   Global Step: 28740   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:11:35,248-Speed 10519.16 samples/sec   Loss 11.1869   LearningRate 0.4897   Epoch: 5   Global Step: 28750   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:11:43,053-Speed 10497.31 samples/sec   Loss 11.2702   LearningRate 0.4895   Epoch: 5   Global Step: 28760   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:11:50,863-Speed 10491.48 samples/sec   Loss 11.1985   LearningRate 0.4894   Epoch: 5   Global Step: 28770   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:11:58,675-Speed 10487.88 samples/sec   Loss 11.1591   LearningRate 0.4893   Epoch: 5   Global Step: 28780   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:12:06,472-Speed 10507.35 samples/sec   Loss 11.1165   LearningRate 0.4891   Epoch: 5   Global Step: 28790   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:12:14,294-Speed 10475.31 samples/sec   Loss 11.2077   LearningRate 0.4890   Epoch: 5   Global Step: 28800   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:12:22,121-Speed 10467.33 samples/sec   Loss 11.1520   LearningRate 0.4889   Epoch: 5   Global Step: 28810   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:12:29,962-Speed 10449.32 samples/sec   Loss 11.1945   LearningRate 0.4887   Epoch: 5   Global Step: 28820   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:12:37,796-Speed 10458.42 samples/sec   Loss 11.1694   LearningRate 0.4886   Epoch: 5   Global Step: 28830   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:12:45,594-Speed 10506.70 samples/sec   Loss 11.2184   LearningRate 0.4885   Epoch: 5   Global Step: 28840   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:12:53,428-Speed 10458.41 samples/sec   Loss 11.2278   LearningRate 0.4884   Epoch: 5   Global Step: 28850   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:13:01,264-Speed 10456.26 samples/sec   Loss 11.2343   LearningRate 0.4882   Epoch: 5   Global Step: 28860   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:13:09,116-Speed 10434.48 samples/sec   Loss 11.0576   LearningRate 0.4881   Epoch: 5   Global Step: 28870   Fp16 Grad Scale: 524288   Required: 16 hours
Training: 2022-01-15 21:13:16,910-Speed 10512.70 samples/sec   Loss 11.1304   LearningRate 0.4880   Epoch: 5   Global Step: 28880   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:13:24,703-Speed 10513.18 samples/sec   Loss 11.2631   LearningRate 0.4878   Epoch: 5   Global Step: 28890   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:13:32,513-Speed 10489.99 samples/sec   Loss 11.2502   LearningRate 0.4877   Epoch: 5   Global Step: 28900   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:13:40,372-Speed 10425.16 samples/sec   Loss 11.1852   LearningRate 0.4876   Epoch: 5   Global Step: 28910   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:13:48,216-Speed 10446.00 samples/sec   Loss 11.2902   LearningRate 0.4874   Epoch: 5   Global Step: 28920   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:13:56,017-Speed 10502.84 samples/sec   Loss 11.1389   LearningRate 0.4873   Epoch: 5   Global Step: 28930   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:14:03,815-Speed 10505.78 samples/sec   Loss 11.1910   LearningRate 0.4872   Epoch: 5   Global Step: 28940   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:14:11,607-Speed 10515.37 samples/sec   Loss 11.1367   LearningRate 0.4870   Epoch: 5   Global Step: 28950   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:14:19,462-Speed 10429.93 samples/sec   Loss 11.1250   LearningRate 0.4869   Epoch: 5   Global Step: 28960   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:14:27,304-Speed 10448.19 samples/sec   Loss 11.0777   LearningRate 0.4868   Epoch: 5   Global Step: 28970   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:14:35,099-Speed 10511.51 samples/sec   Loss 11.1337   LearningRate 0.4867   Epoch: 5   Global Step: 28980   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:14:42,916-Speed 10481.74 samples/sec   Loss 11.0601   LearningRate 0.4865   Epoch: 5   Global Step: 28990   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:14:50,727-Speed 10489.28 samples/sec   Loss 11.1656   LearningRate 0.4864   Epoch: 5   Global Step: 29000   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:14:58,529-Speed 10500.87 samples/sec   Loss 11.1464   LearningRate 0.4863   Epoch: 5   Global Step: 29010   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:15:06,343-Speed 10485.61 samples/sec   Loss 11.3368   LearningRate 0.4861   Epoch: 5   Global Step: 29020   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:15:14,126-Speed 10528.73 samples/sec   Loss 11.1771   LearningRate 0.4860   Epoch: 5   Global Step: 29030   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:15:21,913-Speed 10521.12 samples/sec   Loss 11.0722   LearningRate 0.4859   Epoch: 5   Global Step: 29040   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:15:29,749-Speed 10456.73 samples/sec   Loss 11.1570   LearningRate 0.4857   Epoch: 5   Global Step: 29050   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:15:37,534-Speed 10523.37 samples/sec   Loss 11.1168   LearningRate 0.4856   Epoch: 5   Global Step: 29060   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:15:45,330-Speed 10510.19 samples/sec   Loss 11.1611   LearningRate 0.4855   Epoch: 5   Global Step: 29070   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:15:53,134-Speed 10498.59 samples/sec   Loss 11.2108   LearningRate 0.4854   Epoch: 5   Global Step: 29080   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:16:00,933-Speed 10505.96 samples/sec   Loss 11.1793   LearningRate 0.4852   Epoch: 5   Global Step: 29090   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:16:08,742-Speed 10493.13 samples/sec   Loss 11.1694   LearningRate 0.4851   Epoch: 5   Global Step: 29100   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:16:16,540-Speed 10506.67 samples/sec   Loss 11.1109   LearningRate 0.4850   Epoch: 5   Global Step: 29110   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:16:24,328-Speed 10520.05 samples/sec   Loss 11.1386   LearningRate 0.4848   Epoch: 5   Global Step: 29120   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:16:32,124-Speed 10508.28 samples/sec   Loss 11.0996   LearningRate 0.4847   Epoch: 5   Global Step: 29130   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:16:39,933-Speed 10493.09 samples/sec   Loss 11.1345   LearningRate 0.4846   Epoch: 5   Global Step: 29140   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:16:47,757-Speed 10471.59 samples/sec   Loss 11.0976   LearningRate 0.4844   Epoch: 5   Global Step: 29150   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:16:55,566-Speed 10492.19 samples/sec   Loss 11.0929   LearningRate 0.4843   Epoch: 5   Global Step: 29160   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:17:03,377-Speed 10489.48 samples/sec   Loss 11.1127   LearningRate 0.4842   Epoch: 5   Global Step: 29170   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:17:11,177-Speed 10504.53 samples/sec   Loss 11.1484   LearningRate 0.4841   Epoch: 5   Global Step: 29180   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:17:18,960-Speed 10526.98 samples/sec   Loss 11.0210   LearningRate 0.4839   Epoch: 5   Global Step: 29190   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:17:26,769-Speed 10491.23 samples/sec   Loss 11.0734   LearningRate 0.4838   Epoch: 5   Global Step: 29200   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:17:34,565-Speed 10510.37 samples/sec   Loss 11.0817   LearningRate 0.4837   Epoch: 5   Global Step: 29210   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:17:42,371-Speed 10495.54 samples/sec   Loss 11.1901   LearningRate 0.4835   Epoch: 5   Global Step: 29220   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:17:50,173-Speed 10501.73 samples/sec   Loss 11.1022   LearningRate 0.4834   Epoch: 5   Global Step: 29230   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:17:57,941-Speed 10546.95 samples/sec   Loss 11.1052   LearningRate 0.4833   Epoch: 5   Global Step: 29240   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:18:05,750-Speed 10492.01 samples/sec   Loss 11.1204   LearningRate 0.4831   Epoch: 5   Global Step: 29250   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:18:13,562-Speed 10489.00 samples/sec   Loss 11.2460   LearningRate 0.4830   Epoch: 5   Global Step: 29260   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:18:21,352-Speed 10516.44 samples/sec   Loss 11.1313   LearningRate 0.4829   Epoch: 5   Global Step: 29270   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:18:29,153-Speed 10503.96 samples/sec   Loss 11.1387   LearningRate 0.4828   Epoch: 5   Global Step: 29280   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:18:36,939-Speed 10521.81 samples/sec   Loss 11.1097   LearningRate 0.4826   Epoch: 5   Global Step: 29290   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:18:44,785-Speed 10443.81 samples/sec   Loss 11.0621   LearningRate 0.4825   Epoch: 5   Global Step: 29300   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:18:52,580-Speed 10510.57 samples/sec   Loss 11.1273   LearningRate 0.4824   Epoch: 5   Global Step: 29310   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:19:00,404-Speed 10472.87 samples/sec   Loss 11.2578   LearningRate 0.4822   Epoch: 5   Global Step: 29320   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:19:08,217-Speed 10485.53 samples/sec   Loss 11.1437   LearningRate 0.4821   Epoch: 5   Global Step: 29330   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:19:16,025-Speed 10493.97 samples/sec   Loss 11.0301   LearningRate 0.4820   Epoch: 5   Global Step: 29340   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:19:23,842-Speed 10480.34 samples/sec   Loss 11.0538   LearningRate 0.4818   Epoch: 5   Global Step: 29350   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:19:31,627-Speed 10524.86 samples/sec   Loss 11.0418   LearningRate 0.4817   Epoch: 5   Global Step: 29360   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:19:39,413-Speed 10522.71 samples/sec   Loss 11.1943   LearningRate 0.4816   Epoch: 5   Global Step: 29370   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:19:47,223-Speed 10489.91 samples/sec   Loss 11.2535   LearningRate 0.4815   Epoch: 5   Global Step: 29380   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:19:55,021-Speed 10508.17 samples/sec   Loss 11.1034   LearningRate 0.4813   Epoch: 5   Global Step: 29390   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:20:02,823-Speed 10501.25 samples/sec   Loss 11.0475   LearningRate 0.4812   Epoch: 5   Global Step: 29400   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:20:10,638-Speed 10482.44 samples/sec   Loss 11.0682   LearningRate 0.4811   Epoch: 5   Global Step: 29410   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:20:18,438-Speed 10503.69 samples/sec   Loss 10.9854   LearningRate 0.4809   Epoch: 5   Global Step: 29420   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:20:26,238-Speed 10504.96 samples/sec   Loss 10.9783   LearningRate 0.4808   Epoch: 5   Global Step: 29430   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:20:34,052-Speed 10485.85 samples/sec   Loss 11.0527   LearningRate 0.4807   Epoch: 5   Global Step: 29440   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:20:41,871-Speed 10477.28 samples/sec   Loss 11.0657   LearningRate 0.4806   Epoch: 5   Global Step: 29450   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:20:49,690-Speed 10477.99 samples/sec   Loss 11.1048   LearningRate 0.4804   Epoch: 5   Global Step: 29460   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:20:57,486-Speed 10511.35 samples/sec   Loss 11.0260   LearningRate 0.4803   Epoch: 5   Global Step: 29470   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:21:05,264-Speed 10533.40 samples/sec   Loss 11.0543   LearningRate 0.4802   Epoch: 5   Global Step: 29480   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:21:13,044-Speed 10531.29 samples/sec   Loss 11.1709   LearningRate 0.4800   Epoch: 5   Global Step: 29490   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:21:20,846-Speed 10501.91 samples/sec   Loss 11.0662   LearningRate 0.4799   Epoch: 5   Global Step: 29500   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:21:28,665-Speed 10478.72 samples/sec   Loss 11.0176   LearningRate 0.4798   Epoch: 5   Global Step: 29510   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:21:36,457-Speed 10515.67 samples/sec   Loss 11.0795   LearningRate 0.4796   Epoch: 5   Global Step: 29520   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:21:44,254-Speed 10506.99 samples/sec   Loss 11.0525   LearningRate 0.4795   Epoch: 5   Global Step: 29530   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:21:52,056-Speed 10502.52 samples/sec   Loss 11.0408   LearningRate 0.4794   Epoch: 5   Global Step: 29540   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:21:59,873-Speed 10481.45 samples/sec   Loss 11.1322   LearningRate 0.4793   Epoch: 5   Global Step: 29550   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:22:07,675-Speed 10501.13 samples/sec   Loss 11.1569   LearningRate 0.4791   Epoch: 5   Global Step: 29560   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:22:15,474-Speed 10504.22 samples/sec   Loss 11.0588   LearningRate 0.4790   Epoch: 5   Global Step: 29570   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:22:23,263-Speed 10519.36 samples/sec   Loss 11.0661   LearningRate 0.4789   Epoch: 5   Global Step: 29580   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:22:31,122-Speed 10425.27 samples/sec   Loss 11.0368   LearningRate 0.4787   Epoch: 5   Global Step: 29590   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:22:38,938-Speed 10482.74 samples/sec   Loss 11.1334   LearningRate 0.4786   Epoch: 5   Global Step: 29600   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:22:46,756-Speed 10479.78 samples/sec   Loss 10.9943   LearningRate 0.4785   Epoch: 5   Global Step: 29610   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:22:54,565-Speed 10492.55 samples/sec   Loss 11.1094   LearningRate 0.4784   Epoch: 5   Global Step: 29620   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:23:02,349-Speed 10525.94 samples/sec   Loss 11.0068   LearningRate 0.4782   Epoch: 5   Global Step: 29630   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:23:10,129-Speed 10530.73 samples/sec   Loss 11.1003   LearningRate 0.4781   Epoch: 5   Global Step: 29640   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:23:17,907-Speed 10533.39 samples/sec   Loss 11.1184   LearningRate 0.4780   Epoch: 5   Global Step: 29650   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:23:25,708-Speed 10502.80 samples/sec   Loss 11.0390   LearningRate 0.4778   Epoch: 5   Global Step: 29660   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:23:33,496-Speed 10519.20 samples/sec   Loss 10.9793   LearningRate 0.4777   Epoch: 5   Global Step: 29670   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:23:41,295-Speed 10506.27 samples/sec   Loss 11.0715   LearningRate 0.4776   Epoch: 5   Global Step: 29680   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:23:49,096-Speed 10502.93 samples/sec   Loss 11.0504   LearningRate 0.4774   Epoch: 5   Global Step: 29690   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:23:56,874-Speed 10532.69 samples/sec   Loss 11.0655   LearningRate 0.4773   Epoch: 5   Global Step: 29700   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:24:04,674-Speed 10504.03 samples/sec   Loss 10.9768   LearningRate 0.4772   Epoch: 5   Global Step: 29710   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:24:12,466-Speed 10515.39 samples/sec   Loss 10.9786   LearningRate 0.4771   Epoch: 5   Global Step: 29720   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:24:20,236-Speed 10544.44 samples/sec   Loss 10.9651   LearningRate 0.4769   Epoch: 5   Global Step: 29730   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:24:28,027-Speed 10515.42 samples/sec   Loss 11.1011   LearningRate 0.4768   Epoch: 5   Global Step: 29740   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:24:35,801-Speed 10539.89 samples/sec   Loss 11.0369   LearningRate 0.4767   Epoch: 5   Global Step: 29750   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:24:43,624-Speed 10473.11 samples/sec   Loss 11.1068   LearningRate 0.4765   Epoch: 5   Global Step: 29760   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:24:51,462-Speed 10453.05 samples/sec   Loss 11.0695   LearningRate 0.4764   Epoch: 5   Global Step: 29770   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:24:59,277-Speed 10484.59 samples/sec   Loss 11.0664   LearningRate 0.4763   Epoch: 5   Global Step: 29780   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:25:07,049-Speed 10541.83 samples/sec   Loss 11.0493   LearningRate 0.4762   Epoch: 5   Global Step: 29790   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:25:14,831-Speed 10528.91 samples/sec   Loss 10.9747   LearningRate 0.4760   Epoch: 5   Global Step: 29800   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:25:22,623-Speed 10514.05 samples/sec   Loss 10.9830   LearningRate 0.4759   Epoch: 5   Global Step: 29810   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:25:30,412-Speed 10519.65 samples/sec   Loss 11.0189   LearningRate 0.4758   Epoch: 5   Global Step: 29820   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:25:38,199-Speed 10520.86 samples/sec   Loss 10.9483   LearningRate 0.4756   Epoch: 5   Global Step: 29830   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:25:46,025-Speed 10469.65 samples/sec   Loss 11.0064   LearningRate 0.4755   Epoch: 5   Global Step: 29840   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:25:53,815-Speed 10516.96 samples/sec   Loss 11.0020   LearningRate 0.4754   Epoch: 5   Global Step: 29850   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:26:01,611-Speed 10509.24 samples/sec   Loss 10.9629   LearningRate 0.4753   Epoch: 5   Global Step: 29860   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:26:09,403-Speed 10515.71 samples/sec   Loss 11.0939   LearningRate 0.4751   Epoch: 5   Global Step: 29870   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:26:17,196-Speed 10513.70 samples/sec   Loss 11.0515   LearningRate 0.4750   Epoch: 5   Global Step: 29880   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:26:24,979-Speed 10525.82 samples/sec   Loss 11.2207   LearningRate 0.4749   Epoch: 5   Global Step: 29890   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:26:32,752-Speed 10541.02 samples/sec   Loss 11.2967   LearningRate 0.4747   Epoch: 5   Global Step: 29900   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:26:40,542-Speed 10517.68 samples/sec   Loss 11.0515   LearningRate 0.4746   Epoch: 5   Global Step: 29910   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:26:48,329-Speed 10521.23 samples/sec   Loss 10.9319   LearningRate 0.4745   Epoch: 5   Global Step: 29920   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:26:56,097-Speed 10547.84 samples/sec   Loss 10.9318   LearningRate 0.4744   Epoch: 5   Global Step: 29930   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 21:27:03,869-Speed 10541.26 samples/sec   Loss 11.0362   LearningRate 0.4742   Epoch: 5   Global Step: 29940   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 21:27:11,650-Speed 10529.36 samples/sec   Loss 10.9426   LearningRate 0.4741   Epoch: 5   Global Step: 29950   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 21:27:19,435-Speed 10525.11 samples/sec   Loss 11.0239   LearningRate 0.4740   Epoch: 5   Global Step: 29960   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 21:27:27,236-Speed 10502.63 samples/sec   Loss 10.9342   LearningRate 0.4738   Epoch: 5   Global Step: 29970   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 21:27:35,030-Speed 10513.14 samples/sec   Loss 10.9995   LearningRate 0.4737   Epoch: 5   Global Step: 29980   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 21:27:42,808-Speed 10534.00 samples/sec   Loss 10.9694   LearningRate 0.4736   Epoch: 5   Global Step: 29990   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 21:27:50,572-Speed 10551.99 samples/sec   Loss 10.9665   LearningRate 0.4735   Epoch: 5   Global Step: 30000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 21:28:17,886-[lfw][30000]XNorm: 22.428017
Training: 2022-01-15 21:28:17,887-[lfw][30000]Accuracy-Flip: 0.99667+-0.00197
Training: 2022-01-15 21:28:17,888-[lfw][30000]Accuracy-Highest: 0.99667
Training: 2022-01-15 21:28:50,212-[cfp_fp][30000]XNorm: 19.699531
Training: 2022-01-15 21:28:50,212-[cfp_fp][30000]Accuracy-Flip: 0.97286+-0.00767
Training: 2022-01-15 21:28:50,213-[cfp_fp][30000]Accuracy-Highest: 0.97286
Training: 2022-01-15 21:29:18,414-[agedb_30][30000]XNorm: 21.906889
Training: 2022-01-15 21:29:18,414-[agedb_30][30000]Accuracy-Flip: 0.96250+-0.00735
Training: 2022-01-15 21:29:18,415-[agedb_30][30000]Accuracy-Highest: 0.96250
Training: 2022-01-15 21:29:26,157-Speed 857.05 samples/sec   Loss 10.9846   LearningRate 0.4733   Epoch: 5   Global Step: 30010   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 21:29:33,896-Speed 10587.58 samples/sec   Loss 11.0951   LearningRate 0.4732   Epoch: 5   Global Step: 30020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 21:29:41,641-Speed 10581.82 samples/sec   Loss 10.9771   LearningRate 0.4731   Epoch: 5   Global Step: 30030   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:29:49,394-Speed 10568.47 samples/sec   Loss 11.0717   LearningRate 0.4729   Epoch: 5   Global Step: 30040   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:29:57,185-Speed 10515.60 samples/sec   Loss 10.9423   LearningRate 0.4728   Epoch: 5   Global Step: 30050   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:30:04,949-Speed 10553.31 samples/sec   Loss 10.9395   LearningRate 0.4727   Epoch: 5   Global Step: 30060   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:30:12,762-Speed 10486.10 samples/sec   Loss 10.9522   LearningRate 0.4726   Epoch: 5   Global Step: 30070   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:30:20,555-Speed 10513.41 samples/sec   Loss 11.0042   LearningRate 0.4724   Epoch: 5   Global Step: 30080   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:30:28,378-Speed 10473.06 samples/sec   Loss 10.9720   LearningRate 0.4723   Epoch: 5   Global Step: 30090   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:30:36,188-Speed 10494.22 samples/sec   Loss 10.9151   LearningRate 0.4722   Epoch: 5   Global Step: 30100   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:30:43,962-Speed 10538.63 samples/sec   Loss 10.9579   LearningRate 0.4720   Epoch: 5   Global Step: 30110   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:30:51,735-Speed 10539.93 samples/sec   Loss 10.8753   LearningRate 0.4719   Epoch: 5   Global Step: 30120   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:30:59,533-Speed 10506.28 samples/sec   Loss 11.0052   LearningRate 0.4718   Epoch: 5   Global Step: 30130   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:31:07,307-Speed 10540.34 samples/sec   Loss 11.0083   LearningRate 0.4717   Epoch: 5   Global Step: 30140   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:31:15,076-Speed 10545.66 samples/sec   Loss 11.0175   LearningRate 0.4715   Epoch: 5   Global Step: 30150   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:31:22,852-Speed 10536.52 samples/sec   Loss 10.9315   LearningRate 0.4714   Epoch: 5   Global Step: 30160   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:31:30,659-Speed 10493.57 samples/sec   Loss 10.9745   LearningRate 0.4713   Epoch: 5   Global Step: 30170   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:31:38,426-Speed 10549.01 samples/sec   Loss 10.9293   LearningRate 0.4711   Epoch: 5   Global Step: 30180   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:31:46,206-Speed 10531.21 samples/sec   Loss 10.9750   LearningRate 0.4710   Epoch: 5   Global Step: 30190   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:31:53,972-Speed 10549.80 samples/sec   Loss 10.9908   LearningRate 0.4709   Epoch: 5   Global Step: 30200   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:32:01,752-Speed 10529.97 samples/sec   Loss 11.0327   LearningRate 0.4708   Epoch: 5   Global Step: 30210   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:32:09,526-Speed 10539.57 samples/sec   Loss 11.0191   LearningRate 0.4706   Epoch: 5   Global Step: 30220   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:32:17,331-Speed 10497.72 samples/sec   Loss 11.0312   LearningRate 0.4705   Epoch: 5   Global Step: 30230   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:32:25,107-Speed 10536.10 samples/sec   Loss 11.0436   LearningRate 0.4704   Epoch: 5   Global Step: 30240   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:32:32,909-Speed 10501.21 samples/sec   Loss 11.0398   LearningRate 0.4702   Epoch: 5   Global Step: 30250   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:32:40,684-Speed 10538.24 samples/sec   Loss 10.9271   LearningRate 0.4701   Epoch: 5   Global Step: 30260   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:32:48,499-Speed 10484.05 samples/sec   Loss 10.9309   LearningRate 0.4700   Epoch: 5   Global Step: 30270   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:32:56,297-Speed 10507.02 samples/sec   Loss 11.0602   LearningRate 0.4699   Epoch: 5   Global Step: 30280   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:33:04,125-Speed 10466.23 samples/sec   Loss 10.8343   LearningRate 0.4697   Epoch: 5   Global Step: 30290   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:33:11,902-Speed 10535.48 samples/sec   Loss 10.8587   LearningRate 0.4696   Epoch: 5   Global Step: 30300   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:33:19,695-Speed 10514.06 samples/sec   Loss 10.9129   LearningRate 0.4695   Epoch: 5   Global Step: 30310   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:33:27,484-Speed 10518.43 samples/sec   Loss 10.9708   LearningRate 0.4694   Epoch: 5   Global Step: 30320   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:33:35,249-Speed 10551.99 samples/sec   Loss 10.9819   LearningRate 0.4692   Epoch: 5   Global Step: 30330   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:33:43,036-Speed 10521.01 samples/sec   Loss 10.9189   LearningRate 0.4691   Epoch: 5   Global Step: 30340   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:33:50,834-Speed 10508.08 samples/sec   Loss 10.9296   LearningRate 0.4690   Epoch: 5   Global Step: 30350   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:33:58,637-Speed 10499.67 samples/sec   Loss 10.9137   LearningRate 0.4688   Epoch: 5   Global Step: 30360   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:34:06,420-Speed 10526.22 samples/sec   Loss 10.8951   LearningRate 0.4687   Epoch: 5   Global Step: 30370   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:34:14,222-Speed 10501.07 samples/sec   Loss 10.9193   LearningRate 0.4686   Epoch: 5   Global Step: 30380   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:34:22,018-Speed 10510.20 samples/sec   Loss 10.9404   LearningRate 0.4685   Epoch: 5   Global Step: 30390   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:34:29,837-Speed 10478.06 samples/sec   Loss 10.9178   LearningRate 0.4683   Epoch: 5   Global Step: 30400   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:34:37,610-Speed 10540.80 samples/sec   Loss 11.0210   LearningRate 0.4682   Epoch: 5   Global Step: 30410   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:34:45,409-Speed 10504.46 samples/sec   Loss 10.9006   LearningRate 0.4681   Epoch: 5   Global Step: 30420   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:34:53,200-Speed 10516.16 samples/sec   Loss 10.8690   LearningRate 0.4679   Epoch: 5   Global Step: 30430   Fp16 Grad Scale: 524288   Required: 16 hours
Training: 2022-01-15 21:35:00,997-Speed 10509.13 samples/sec   Loss 10.9334   LearningRate 0.4678   Epoch: 5   Global Step: 30440   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:35:08,799-Speed 10500.38 samples/sec   Loss 10.9028   LearningRate 0.4677   Epoch: 5   Global Step: 30450   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:35:16,608-Speed 10492.06 samples/sec   Loss 10.9651   LearningRate 0.4676   Epoch: 5   Global Step: 30460   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:35:24,408-Speed 10504.84 samples/sec   Loss 10.9172   LearningRate 0.4674   Epoch: 5   Global Step: 30470   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:35:32,206-Speed 10507.45 samples/sec   Loss 10.8901   LearningRate 0.4673   Epoch: 5   Global Step: 30480   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:35:39,979-Speed 10540.96 samples/sec   Loss 11.0168   LearningRate 0.4672   Epoch: 5   Global Step: 30490   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:35:47,761-Speed 10529.14 samples/sec   Loss 10.9179   LearningRate 0.4671   Epoch: 5   Global Step: 30500   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:35:55,593-Speed 10460.53 samples/sec   Loss 10.8889   LearningRate 0.4669   Epoch: 5   Global Step: 30510   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:36:03,423-Speed 10463.64 samples/sec   Loss 10.9255   LearningRate 0.4668   Epoch: 5   Global Step: 30520   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:36:11,230-Speed 10498.55 samples/sec   Loss 10.8610   LearningRate 0.4667   Epoch: 5   Global Step: 30530   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:36:19,032-Speed 10501.92 samples/sec   Loss 10.8818   LearningRate 0.4665   Epoch: 5   Global Step: 30540   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:36:26,881-Speed 10438.83 samples/sec   Loss 10.9209   LearningRate 0.4664   Epoch: 5   Global Step: 30550   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:36:34,650-Speed 10545.12 samples/sec   Loss 10.9706   LearningRate 0.4663   Epoch: 5   Global Step: 30560   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:36:42,465-Speed 10483.56 samples/sec   Loss 10.8809   LearningRate 0.4662   Epoch: 5   Global Step: 30570   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:36:50,282-Speed 10480.88 samples/sec   Loss 11.0207   LearningRate 0.4660   Epoch: 5   Global Step: 30580   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:36:58,110-Speed 10466.91 samples/sec   Loss 11.0076   LearningRate 0.4659   Epoch: 5   Global Step: 30590   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:37:05,915-Speed 10497.00 samples/sec   Loss 10.9248   LearningRate 0.4658   Epoch: 5   Global Step: 30600   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:37:13,700-Speed 10528.75 samples/sec   Loss 10.9839   LearningRate 0.4656   Epoch: 5   Global Step: 30610   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:37:21,509-Speed 10492.12 samples/sec   Loss 10.9332   LearningRate 0.4655   Epoch: 5   Global Step: 30620   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:37:29,304-Speed 10512.01 samples/sec   Loss 10.8613   LearningRate 0.4654   Epoch: 5   Global Step: 30630   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:37:37,093-Speed 10518.49 samples/sec   Loss 10.8972   LearningRate 0.4653   Epoch: 5   Global Step: 30640   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:37:44,870-Speed 10534.04 samples/sec   Loss 10.9298   LearningRate 0.4651   Epoch: 5   Global Step: 30650   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:37:52,659-Speed 10518.57 samples/sec   Loss 10.8507   LearningRate 0.4650   Epoch: 5   Global Step: 30660   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:38:00,438-Speed 10533.06 samples/sec   Loss 10.9003   LearningRate 0.4649   Epoch: 5   Global Step: 30670   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:38:08,205-Speed 10547.37 samples/sec   Loss 10.8423   LearningRate 0.4648   Epoch: 5   Global Step: 30680   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 21:38:16,024-Speed 10478.89 samples/sec   Loss 10.9671   LearningRate 0.4646   Epoch: 5   Global Step: 30690   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 21:38:23,864-Speed 10450.66 samples/sec   Loss 10.8745   LearningRate 0.4645   Epoch: 5   Global Step: 30700   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 21:38:31,650-Speed 10523.53 samples/sec   Loss 10.9129   LearningRate 0.4644   Epoch: 5   Global Step: 30710   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 21:38:39,442-Speed 10514.70 samples/sec   Loss 10.9318   LearningRate 0.4642   Epoch: 5   Global Step: 30720   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 21:38:47,221-Speed 10532.02 samples/sec   Loss 10.9396   LearningRate 0.4641   Epoch: 5   Global Step: 30730   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 21:38:55,006-Speed 10523.60 samples/sec   Loss 10.9732   LearningRate 0.4640   Epoch: 5   Global Step: 30740   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 21:39:02,774-Speed 10547.90 samples/sec   Loss 10.9020   LearningRate 0.4639   Epoch: 5   Global Step: 30750   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 21:39:10,554-Speed 10530.89 samples/sec   Loss 10.7753   LearningRate 0.4637   Epoch: 5   Global Step: 30760   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 21:39:18,339-Speed 10524.26 samples/sec   Loss 10.8174   LearningRate 0.4636   Epoch: 5   Global Step: 30770   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 21:39:26,110-Speed 10543.72 samples/sec   Loss 10.8951   LearningRate 0.4635   Epoch: 5   Global Step: 30780   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:39:33,935-Speed 10471.37 samples/sec   Loss 10.8624   LearningRate 0.4634   Epoch: 5   Global Step: 30790   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:39:41,757-Speed 10473.65 samples/sec   Loss 10.9435   LearningRate 0.4632   Epoch: 5   Global Step: 30800   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:39:49,591-Speed 10458.15 samples/sec   Loss 10.8911   LearningRate 0.4631   Epoch: 5   Global Step: 30810   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:39:57,377-Speed 10522.60 samples/sec   Loss 10.9234   LearningRate 0.4630   Epoch: 5   Global Step: 30820   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:40:05,155-Speed 10534.28 samples/sec   Loss 10.7841   LearningRate 0.4629   Epoch: 5   Global Step: 30830   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:40:12,956-Speed 10505.91 samples/sec   Loss 10.8306   LearningRate 0.4627   Epoch: 5   Global Step: 30840   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:40:20,752-Speed 10510.38 samples/sec   Loss 10.8382   LearningRate 0.4626   Epoch: 5   Global Step: 30850   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:40:28,562-Speed 10490.74 samples/sec   Loss 10.8856   LearningRate 0.4625   Epoch: 5   Global Step: 30860   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:40:36,361-Speed 10504.05 samples/sec   Loss 10.8876   LearningRate 0.4623   Epoch: 5   Global Step: 30870   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:40:44,181-Speed 10477.30 samples/sec   Loss 10.9080   LearningRate 0.4622   Epoch: 5   Global Step: 30880   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:40:52,003-Speed 10475.64 samples/sec   Loss 10.9218   LearningRate 0.4621   Epoch: 5   Global Step: 30890   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:40:59,810-Speed 10493.87 samples/sec   Loss 10.9234   LearningRate 0.4620   Epoch: 5   Global Step: 30900   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:41:07,591-Speed 10529.33 samples/sec   Loss 10.8804   LearningRate 0.4618   Epoch: 5   Global Step: 30910   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:41:15,390-Speed 10505.54 samples/sec   Loss 10.8417   LearningRate 0.4617   Epoch: 5   Global Step: 30920   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:41:23,176-Speed 10523.27 samples/sec   Loss 10.8578   LearningRate 0.4616   Epoch: 5   Global Step: 30930   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:41:30,959-Speed 10526.40 samples/sec   Loss 10.7944   LearningRate 0.4615   Epoch: 5   Global Step: 30940   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:41:38,752-Speed 10518.09 samples/sec   Loss 11.1091   LearningRate 0.4613   Epoch: 5   Global Step: 30950   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:41:46,574-Speed 10474.79 samples/sec   Loss 10.9174   LearningRate 0.4612   Epoch: 5   Global Step: 30960   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:41:54,367-Speed 10513.03 samples/sec   Loss 10.8765   LearningRate 0.4611   Epoch: 5   Global Step: 30970   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:42:02,141-Speed 10538.48 samples/sec   Loss 10.7882   LearningRate 0.4609   Epoch: 5   Global Step: 30980   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:42:09,963-Speed 10475.00 samples/sec   Loss 10.8068   LearningRate 0.4608   Epoch: 5   Global Step: 30990   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:42:17,777-Speed 10485.30 samples/sec   Loss 10.8720   LearningRate 0.4607   Epoch: 5   Global Step: 31000   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:42:25,575-Speed 10506.05 samples/sec   Loss 10.8801   LearningRate 0.4606   Epoch: 5   Global Step: 31010   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:42:33,357-Speed 10529.05 samples/sec   Loss 10.8804   LearningRate 0.4604   Epoch: 5   Global Step: 31020   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:42:41,149-Speed 10516.13 samples/sec   Loss 10.7619   LearningRate 0.4603   Epoch: 5   Global Step: 31030   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:42:48,929-Speed 10531.46 samples/sec   Loss 10.8619   LearningRate 0.4602   Epoch: 5   Global Step: 31040   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:42:56,740-Speed 10488.44 samples/sec   Loss 10.8580   LearningRate 0.4601   Epoch: 5   Global Step: 31050   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:43:04,531-Speed 10516.67 samples/sec   Loss 10.8430   LearningRate 0.4599   Epoch: 5   Global Step: 31060   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:43:12,325-Speed 10513.10 samples/sec   Loss 10.8462   LearningRate 0.4598   Epoch: 5   Global Step: 31070   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:43:20,097-Speed 10541.84 samples/sec   Loss 10.8702   LearningRate 0.4597   Epoch: 5   Global Step: 31080   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:43:27,925-Speed 10466.19 samples/sec   Loss 10.9111   LearningRate 0.4596   Epoch: 5   Global Step: 31090   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:43:35,746-Speed 10477.70 samples/sec   Loss 10.9134   LearningRate 0.4594   Epoch: 5   Global Step: 31100   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:43:43,568-Speed 10474.02 samples/sec   Loss 10.7533   LearningRate 0.4593   Epoch: 5   Global Step: 31110   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:44:06,823-Speed 3522.84 samples/sec   Loss 10.7530   LearningRate 0.4592   Epoch: 6   Global Step: 31120   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:44:14,613-Speed 10518.15 samples/sec   Loss 10.7925   LearningRate 0.4590   Epoch: 6   Global Step: 31130   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:44:22,371-Speed 10562.04 samples/sec   Loss 10.8982   LearningRate 0.4589   Epoch: 6   Global Step: 31140   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:44:30,133-Speed 10555.43 samples/sec   Loss 10.8596   LearningRate 0.4588   Epoch: 6   Global Step: 31150   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:44:37,916-Speed 10527.23 samples/sec   Loss 10.8835   LearningRate 0.4587   Epoch: 6   Global Step: 31160   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:44:45,702-Speed 10522.07 samples/sec   Loss 10.7393   LearningRate 0.4585   Epoch: 6   Global Step: 31170   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:44:53,505-Speed 10500.02 samples/sec   Loss 10.7237   LearningRate 0.4584   Epoch: 6   Global Step: 31180   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:45:01,286-Speed 10529.67 samples/sec   Loss 10.7982   LearningRate 0.4583   Epoch: 6   Global Step: 31190   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:45:09,084-Speed 10506.57 samples/sec   Loss 10.7044   LearningRate 0.4582   Epoch: 6   Global Step: 31200   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:45:16,892-Speed 10493.13 samples/sec   Loss 10.8499   LearningRate 0.4580   Epoch: 6   Global Step: 31210   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:45:24,697-Speed 10497.11 samples/sec   Loss 10.8678   LearningRate 0.4579   Epoch: 6   Global Step: 31220   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:45:32,531-Speed 10458.15 samples/sec   Loss 10.7444   LearningRate 0.4578   Epoch: 6   Global Step: 31230   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:45:40,314-Speed 10527.45 samples/sec   Loss 10.7924   LearningRate 0.4577   Epoch: 6   Global Step: 31240   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:45:48,109-Speed 10510.39 samples/sec   Loss 10.7829   LearningRate 0.4575   Epoch: 6   Global Step: 31250   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:45:56,006-Speed 10375.58 samples/sec   Loss 10.7445   LearningRate 0.4574   Epoch: 6   Global Step: 31260   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:46:03,812-Speed 10495.33 samples/sec   Loss 10.9208   LearningRate 0.4573   Epoch: 6   Global Step: 31270   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:46:11,655-Speed 10446.84 samples/sec   Loss 10.7622   LearningRate 0.4571   Epoch: 6   Global Step: 31280   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:46:19,469-Speed 10485.46 samples/sec   Loss 10.7911   LearningRate 0.4570   Epoch: 6   Global Step: 31290   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:46:27,306-Speed 10453.98 samples/sec   Loss 10.8759   LearningRate 0.4569   Epoch: 6   Global Step: 31300   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:46:35,129-Speed 10473.16 samples/sec   Loss 10.8818   LearningRate 0.4568   Epoch: 6   Global Step: 31310   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:46:42,946-Speed 10480.69 samples/sec   Loss 10.8510   LearningRate 0.4566   Epoch: 6   Global Step: 31320   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:46:50,732-Speed 10523.43 samples/sec   Loss 10.7983   LearningRate 0.4565   Epoch: 6   Global Step: 31330   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:46:58,519-Speed 10520.96 samples/sec   Loss 10.7771   LearningRate 0.4564   Epoch: 6   Global Step: 31340   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:47:06,307-Speed 10520.45 samples/sec   Loss 10.7348   LearningRate 0.4563   Epoch: 6   Global Step: 31350   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:47:14,091-Speed 10526.58 samples/sec   Loss 10.7398   LearningRate 0.4561   Epoch: 6   Global Step: 31360   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:47:21,873-Speed 10529.42 samples/sec   Loss 10.8142   LearningRate 0.4560   Epoch: 6   Global Step: 31370   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:47:29,688-Speed 10483.71 samples/sec   Loss 10.7490   LearningRate 0.4559   Epoch: 6   Global Step: 31380   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:47:37,478-Speed 10519.27 samples/sec   Loss 10.8543   LearningRate 0.4558   Epoch: 6   Global Step: 31390   Fp16 Grad Scale: 524288   Required: 16 hours
Training: 2022-01-15 21:47:45,262-Speed 10524.04 samples/sec   Loss 10.8129   LearningRate 0.4556   Epoch: 6   Global Step: 31400   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:47:53,078-Speed 10482.32 samples/sec   Loss 10.7868   LearningRate 0.4555   Epoch: 6   Global Step: 31410   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:48:00,858-Speed 10531.60 samples/sec   Loss 10.7623   LearningRate 0.4554   Epoch: 6   Global Step: 31420   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:48:08,687-Speed 10465.76 samples/sec   Loss 10.7286   LearningRate 0.4553   Epoch: 6   Global Step: 31430   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:48:16,481-Speed 10511.64 samples/sec   Loss 10.8896   LearningRate 0.4551   Epoch: 6   Global Step: 31440   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:48:24,287-Speed 10496.59 samples/sec   Loss 10.8060   LearningRate 0.4550   Epoch: 6   Global Step: 31450   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:48:32,097-Speed 10489.78 samples/sec   Loss 10.7673   LearningRate 0.4549   Epoch: 6   Global Step: 31460   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:48:39,899-Speed 10502.89 samples/sec   Loss 10.7485   LearningRate 0.4548   Epoch: 6   Global Step: 31470   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:48:47,689-Speed 10516.69 samples/sec   Loss 10.7806   LearningRate 0.4546   Epoch: 6   Global Step: 31480   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:48:55,537-Speed 10440.34 samples/sec   Loss 10.7949   LearningRate 0.4545   Epoch: 6   Global Step: 31490   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:49:03,335-Speed 10505.81 samples/sec   Loss 10.8297   LearningRate 0.4544   Epoch: 6   Global Step: 31500   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:49:11,153-Speed 10479.99 samples/sec   Loss 10.8311   LearningRate 0.4542   Epoch: 6   Global Step: 31510   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:49:19,015-Speed 10421.23 samples/sec   Loss 10.7614   LearningRate 0.4541   Epoch: 6   Global Step: 31520   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:49:26,837-Speed 10474.67 samples/sec   Loss 10.7803   LearningRate 0.4540   Epoch: 6   Global Step: 31530   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:49:34,656-Speed 10479.22 samples/sec   Loss 10.7920   LearningRate 0.4539   Epoch: 6   Global Step: 31540   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:49:42,489-Speed 10459.34 samples/sec   Loss 10.8628   LearningRate 0.4537   Epoch: 6   Global Step: 31550   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:49:50,344-Speed 10430.43 samples/sec   Loss 10.8298   LearningRate 0.4536   Epoch: 6   Global Step: 31560   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:49:58,190-Speed 10441.93 samples/sec   Loss 10.6995   LearningRate 0.4535   Epoch: 6   Global Step: 31570   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:50:06,053-Speed 10420.89 samples/sec   Loss 10.7911   LearningRate 0.4534   Epoch: 6   Global Step: 31580   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:50:13,884-Speed 10462.18 samples/sec   Loss 10.7751   LearningRate 0.4532   Epoch: 6   Global Step: 31590   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:50:21,725-Speed 10449.80 samples/sec   Loss 10.7903   LearningRate 0.4531   Epoch: 6   Global Step: 31600   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:50:29,590-Speed 10416.09 samples/sec   Loss 10.7500   LearningRate 0.4530   Epoch: 6   Global Step: 31610   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:50:37,467-Speed 10402.32 samples/sec   Loss 10.7820   LearningRate 0.4529   Epoch: 6   Global Step: 31620   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:50:45,295-Speed 10466.53 samples/sec   Loss 10.6809   LearningRate 0.4527   Epoch: 6   Global Step: 31630   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:50:53,110-Speed 10482.63 samples/sec   Loss 10.8306   LearningRate 0.4526   Epoch: 6   Global Step: 31640   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:51:00,927-Speed 10481.22 samples/sec   Loss 10.7179   LearningRate 0.4525   Epoch: 6   Global Step: 31650   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:51:08,756-Speed 10465.19 samples/sec   Loss 10.7535   LearningRate 0.4524   Epoch: 6   Global Step: 31660   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:51:16,587-Speed 10462.51 samples/sec   Loss 10.7084   LearningRate 0.4522   Epoch: 6   Global Step: 31670   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:51:24,418-Speed 10462.42 samples/sec   Loss 10.8269   LearningRate 0.4521   Epoch: 6   Global Step: 31680   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:51:32,244-Speed 10469.13 samples/sec   Loss 10.7442   LearningRate 0.4520   Epoch: 6   Global Step: 31690   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:51:40,069-Speed 10470.32 samples/sec   Loss 10.6782   LearningRate 0.4519   Epoch: 6   Global Step: 31700   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:51:47,921-Speed 10433.94 samples/sec   Loss 10.7736   LearningRate 0.4517   Epoch: 6   Global Step: 31710   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:51:55,738-Speed 10481.74 samples/sec   Loss 10.7615   LearningRate 0.4516   Epoch: 6   Global Step: 31720   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:52:03,558-Speed 10476.78 samples/sec   Loss 10.7317   LearningRate 0.4515   Epoch: 6   Global Step: 31730   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:52:11,394-Speed 10455.88 samples/sec   Loss 10.7569   LearningRate 0.4514   Epoch: 6   Global Step: 31740   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:52:19,212-Speed 10479.66 samples/sec   Loss 10.6814   LearningRate 0.4512   Epoch: 6   Global Step: 31750   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:52:27,031-Speed 10478.08 samples/sec   Loss 10.7136   LearningRate 0.4511   Epoch: 6   Global Step: 31760   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:52:34,859-Speed 10469.59 samples/sec   Loss 10.7331   LearningRate 0.4510   Epoch: 6   Global Step: 31770   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:52:42,694-Speed 10456.71 samples/sec   Loss 10.6429   LearningRate 0.4509   Epoch: 6   Global Step: 31780   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:52:50,538-Speed 10445.29 samples/sec   Loss 10.7070   LearningRate 0.4507   Epoch: 6   Global Step: 31790   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:52:58,389-Speed 10436.77 samples/sec   Loss 10.8842   LearningRate 0.4506   Epoch: 6   Global Step: 31800   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:53:06,229-Speed 10449.33 samples/sec   Loss 10.8185   LearningRate 0.4505   Epoch: 6   Global Step: 31810   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:53:14,074-Speed 10443.59 samples/sec   Loss 10.7243   LearningRate 0.4504   Epoch: 6   Global Step: 31820   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:53:21,943-Speed 10413.17 samples/sec   Loss 10.6901   LearningRate 0.4502   Epoch: 6   Global Step: 31830   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:53:29,750-Speed 10494.68 samples/sec   Loss 10.6922   LearningRate 0.4501   Epoch: 6   Global Step: 31840   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:53:37,566-Speed 10480.89 samples/sec   Loss 10.7757   LearningRate 0.4500   Epoch: 6   Global Step: 31850   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:53:45,389-Speed 10473.81 samples/sec   Loss 10.7258   LearningRate 0.4499   Epoch: 6   Global Step: 31860   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:53:53,272-Speed 10394.82 samples/sec   Loss 10.7366   LearningRate 0.4497   Epoch: 6   Global Step: 31870   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:54:01,074-Speed 10500.80 samples/sec   Loss 10.7269   LearningRate 0.4496   Epoch: 6   Global Step: 31880   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:54:08,858-Speed 10525.39 samples/sec   Loss 10.6844   LearningRate 0.4495   Epoch: 6   Global Step: 31890   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:54:16,651-Speed 10513.04 samples/sec   Loss 10.7001   LearningRate 0.4494   Epoch: 6   Global Step: 31900   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:54:24,432-Speed 10529.12 samples/sec   Loss 10.7429   LearningRate 0.4492   Epoch: 6   Global Step: 31910   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:54:32,224-Speed 10522.76 samples/sec   Loss 10.7876   LearningRate 0.4491   Epoch: 6   Global Step: 31920   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:54:40,006-Speed 10528.40 samples/sec   Loss 10.7588   LearningRate 0.4490   Epoch: 6   Global Step: 31930   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:54:47,795-Speed 10519.54 samples/sec   Loss 10.6852   LearningRate 0.4489   Epoch: 6   Global Step: 31940   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:54:55,580-Speed 10523.65 samples/sec   Loss 10.7627   LearningRate 0.4487   Epoch: 6   Global Step: 31950   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:55:03,357-Speed 10534.76 samples/sec   Loss 10.6946   LearningRate 0.4486   Epoch: 6   Global Step: 31960   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:55:11,149-Speed 10520.33 samples/sec   Loss 10.6570   LearningRate 0.4485   Epoch: 6   Global Step: 31970   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:55:18,931-Speed 10528.84 samples/sec   Loss 10.6622   LearningRate 0.4484   Epoch: 6   Global Step: 31980   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:55:26,715-Speed 10524.50 samples/sec   Loss 10.7156   LearningRate 0.4482   Epoch: 6   Global Step: 31990   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:55:34,513-Speed 10510.30 samples/sec   Loss 10.7195   LearningRate 0.4481   Epoch: 6   Global Step: 32000   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:55:42,291-Speed 10533.97 samples/sec   Loss 10.6806   LearningRate 0.4480   Epoch: 6   Global Step: 32010   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:55:50,064-Speed 10540.10 samples/sec   Loss 10.6963   LearningRate 0.4479   Epoch: 6   Global Step: 32020   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:55:57,881-Speed 10481.87 samples/sec   Loss 10.6396   LearningRate 0.4477   Epoch: 6   Global Step: 32030   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:56:05,692-Speed 10489.12 samples/sec   Loss 10.8020   LearningRate 0.4476   Epoch: 6   Global Step: 32040   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:56:13,510-Speed 10479.13 samples/sec   Loss 10.6672   LearningRate 0.4475   Epoch: 6   Global Step: 32050   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:56:21,314-Speed 10499.37 samples/sec   Loss 10.7686   LearningRate 0.4474   Epoch: 6   Global Step: 32060   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:56:29,111-Speed 10508.11 samples/sec   Loss 10.7190   LearningRate 0.4472   Epoch: 6   Global Step: 32070   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:56:36,928-Speed 10481.18 samples/sec   Loss 10.7127   LearningRate 0.4471   Epoch: 6   Global Step: 32080   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:56:44,725-Speed 10507.87 samples/sec   Loss 10.5985   LearningRate 0.4470   Epoch: 6   Global Step: 32090   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:56:52,506-Speed 10530.23 samples/sec   Loss 10.6821   LearningRate 0.4469   Epoch: 6   Global Step: 32100   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:57:00,320-Speed 10485.31 samples/sec   Loss 10.6975   LearningRate 0.4467   Epoch: 6   Global Step: 32110   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:57:08,109-Speed 10518.46 samples/sec   Loss 10.7166   LearningRate 0.4466   Epoch: 6   Global Step: 32120   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:57:15,939-Speed 10464.05 samples/sec   Loss 10.6826   LearningRate 0.4465   Epoch: 6   Global Step: 32130   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:57:23,738-Speed 10504.32 samples/sec   Loss 10.6898   LearningRate 0.4464   Epoch: 6   Global Step: 32140   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:57:31,530-Speed 10514.84 samples/sec   Loss 10.6852   LearningRate 0.4462   Epoch: 6   Global Step: 32150   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:57:39,323-Speed 10514.21 samples/sec   Loss 10.7105   LearningRate 0.4461   Epoch: 6   Global Step: 32160   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:57:47,114-Speed 10515.52 samples/sec   Loss 10.6503   LearningRate 0.4460   Epoch: 6   Global Step: 32170   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:57:54,886-Speed 10541.72 samples/sec   Loss 10.6319   LearningRate 0.4459   Epoch: 6   Global Step: 32180   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:58:02,682-Speed 10510.02 samples/sec   Loss 10.6348   LearningRate 0.4457   Epoch: 6   Global Step: 32190   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:58:10,472-Speed 10518.02 samples/sec   Loss 10.6172   LearningRate 0.4456   Epoch: 6   Global Step: 32200   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:58:18,262-Speed 10517.06 samples/sec   Loss 10.6446   LearningRate 0.4455   Epoch: 6   Global Step: 32210   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:58:26,073-Speed 10489.14 samples/sec   Loss 10.7371   LearningRate 0.4454   Epoch: 6   Global Step: 32220   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:58:33,874-Speed 10502.24 samples/sec   Loss 10.6640   LearningRate 0.4452   Epoch: 6   Global Step: 32230   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:58:41,695-Speed 10476.78 samples/sec   Loss 10.6456   LearningRate 0.4451   Epoch: 6   Global Step: 32240   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:58:49,476-Speed 10528.90 samples/sec   Loss 10.7356   LearningRate 0.4450   Epoch: 6   Global Step: 32250   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:58:57,261-Speed 10523.67 samples/sec   Loss 10.7428   LearningRate 0.4449   Epoch: 6   Global Step: 32260   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:59:05,074-Speed 10486.66 samples/sec   Loss 10.6016   LearningRate 0.4447   Epoch: 6   Global Step: 32270   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:59:12,859-Speed 10524.01 samples/sec   Loss 10.6540   LearningRate 0.4446   Epoch: 6   Global Step: 32280   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:59:20,641-Speed 10529.22 samples/sec   Loss 10.7345   LearningRate 0.4445   Epoch: 6   Global Step: 32290   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:59:28,428-Speed 10520.82 samples/sec   Loss 10.7116   LearningRate 0.4444   Epoch: 6   Global Step: 32300   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:59:36,210-Speed 10528.88 samples/sec   Loss 10.7053   LearningRate 0.4442   Epoch: 6   Global Step: 32310   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:59:43,996-Speed 10522.74 samples/sec   Loss 10.7840   LearningRate 0.4441   Epoch: 6   Global Step: 32320   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 21:59:51,806-Speed 10490.02 samples/sec   Loss 10.6756   LearningRate 0.4440   Epoch: 6   Global Step: 32330   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 21:59:59,595-Speed 10519.13 samples/sec   Loss 10.5766   LearningRate 0.4439   Epoch: 6   Global Step: 32340   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:00:07,382-Speed 10521.82 samples/sec   Loss 10.6444   LearningRate 0.4437   Epoch: 6   Global Step: 32350   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:00:15,170-Speed 10520.32 samples/sec   Loss 10.7189   LearningRate 0.4436   Epoch: 6   Global Step: 32360   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:00:22,977-Speed 10494.52 samples/sec   Loss 10.7001   LearningRate 0.4435   Epoch: 6   Global Step: 32370   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:00:30,775-Speed 10507.39 samples/sec   Loss 10.6181   LearningRate 0.4434   Epoch: 6   Global Step: 32380   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:00:38,586-Speed 10489.14 samples/sec   Loss 10.6366   LearningRate 0.4432   Epoch: 6   Global Step: 32390   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:00:46,361-Speed 10537.57 samples/sec   Loss 10.6775   LearningRate 0.4431   Epoch: 6   Global Step: 32400   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:00:54,184-Speed 10472.49 samples/sec   Loss 10.6734   LearningRate 0.4430   Epoch: 6   Global Step: 32410   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 22:01:01,967-Speed 10527.11 samples/sec   Loss 10.6276   LearningRate 0.4429   Epoch: 6   Global Step: 32420   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 22:01:09,788-Speed 10476.03 samples/sec   Loss 10.6395   LearningRate 0.4427   Epoch: 6   Global Step: 32430   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 22:01:17,566-Speed 10533.51 samples/sec   Loss 10.6435   LearningRate 0.4426   Epoch: 6   Global Step: 32440   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 22:01:25,352-Speed 10523.13 samples/sec   Loss 10.6323   LearningRate 0.4425   Epoch: 6   Global Step: 32450   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 22:01:33,124-Speed 10542.37 samples/sec   Loss 10.6749   LearningRate 0.4424   Epoch: 6   Global Step: 32460   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 22:01:40,911-Speed 10521.20 samples/sec   Loss 10.6330   LearningRate 0.4422   Epoch: 6   Global Step: 32470   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 22:01:48,749-Speed 10455.95 samples/sec   Loss 10.7165   LearningRate 0.4421   Epoch: 6   Global Step: 32480   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 22:01:56,541-Speed 10514.08 samples/sec   Loss 10.6292   LearningRate 0.4420   Epoch: 6   Global Step: 32490   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 22:02:04,345-Speed 10499.11 samples/sec   Loss 10.6677   LearningRate 0.4419   Epoch: 6   Global Step: 32500   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-15 22:02:12,138-Speed 10513.22 samples/sec   Loss 10.6454   LearningRate 0.4417   Epoch: 6   Global Step: 32510   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:02:19,941-Speed 10499.91 samples/sec   Loss 10.6312   LearningRate 0.4416   Epoch: 6   Global Step: 32520   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:02:27,733-Speed 10513.97 samples/sec   Loss 10.6374   LearningRate 0.4415   Epoch: 6   Global Step: 32530   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:02:35,537-Speed 10499.08 samples/sec   Loss 10.6232   LearningRate 0.4414   Epoch: 6   Global Step: 32540   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:02:43,333-Speed 10509.49 samples/sec   Loss 10.6333   LearningRate 0.4413   Epoch: 6   Global Step: 32550   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:02:51,115-Speed 10527.86 samples/sec   Loss 10.5883   LearningRate 0.4411   Epoch: 6   Global Step: 32560   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:02:58,921-Speed 10496.94 samples/sec   Loss 10.7083   LearningRate 0.4410   Epoch: 6   Global Step: 32570   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:03:06,706-Speed 10523.82 samples/sec   Loss 10.6108   LearningRate 0.4409   Epoch: 6   Global Step: 32580   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:03:14,505-Speed 10505.00 samples/sec   Loss 10.7116   LearningRate 0.4408   Epoch: 6   Global Step: 32590   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:03:22,314-Speed 10491.40 samples/sec   Loss 10.6583   LearningRate 0.4406   Epoch: 6   Global Step: 32600   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:03:30,142-Speed 10468.06 samples/sec   Loss 10.6008   LearningRate 0.4405   Epoch: 6   Global Step: 32610   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:03:37,952-Speed 10490.16 samples/sec   Loss 10.6295   LearningRate 0.4404   Epoch: 6   Global Step: 32620   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:03:45,744-Speed 10514.14 samples/sec   Loss 10.5695   LearningRate 0.4403   Epoch: 6   Global Step: 32630   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:03:53,555-Speed 10490.17 samples/sec   Loss 10.6201   LearningRate 0.4401   Epoch: 6   Global Step: 32640   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:04:01,414-Speed 10425.77 samples/sec   Loss 10.6726   LearningRate 0.4400   Epoch: 6   Global Step: 32650   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:04:09,215-Speed 10503.54 samples/sec   Loss 10.6730   LearningRate 0.4399   Epoch: 6   Global Step: 32660   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:04:17,014-Speed 10505.46 samples/sec   Loss 10.6851   LearningRate 0.4398   Epoch: 6   Global Step: 32670   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:04:24,827-Speed 10486.76 samples/sec   Loss 10.6245   LearningRate 0.4396   Epoch: 6   Global Step: 32680   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:04:32,639-Speed 10488.50 samples/sec   Loss 10.5113   LearningRate 0.4395   Epoch: 6   Global Step: 32690   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:04:40,431-Speed 10513.57 samples/sec   Loss 10.5455   LearningRate 0.4394   Epoch: 6   Global Step: 32700   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:04:48,211-Speed 10531.62 samples/sec   Loss 10.5476   LearningRate 0.4393   Epoch: 6   Global Step: 32710   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:04:56,014-Speed 10499.34 samples/sec   Loss 10.5654   LearningRate 0.4391   Epoch: 6   Global Step: 32720   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:05:03,817-Speed 10500.86 samples/sec   Loss 10.5487   LearningRate 0.4390   Epoch: 6   Global Step: 32730   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:05:11,642-Speed 10470.33 samples/sec   Loss 10.6973   LearningRate 0.4389   Epoch: 6   Global Step: 32740   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:05:19,439-Speed 10507.97 samples/sec   Loss 10.6353   LearningRate 0.4388   Epoch: 6   Global Step: 32750   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:05:27,227-Speed 10520.30 samples/sec   Loss 10.5988   LearningRate 0.4387   Epoch: 6   Global Step: 32760   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:05:35,030-Speed 10499.21 samples/sec   Loss 10.5538   LearningRate 0.4385   Epoch: 6   Global Step: 32770   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:05:42,804-Speed 10539.69 samples/sec   Loss 10.5515   LearningRate 0.4384   Epoch: 6   Global Step: 32780   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:05:50,629-Speed 10469.83 samples/sec   Loss 10.6289   LearningRate 0.4383   Epoch: 6   Global Step: 32790   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:05:58,414-Speed 10524.12 samples/sec   Loss 10.5462   LearningRate 0.4382   Epoch: 6   Global Step: 32800   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:06:06,194-Speed 10531.38 samples/sec   Loss 10.6167   LearningRate 0.4380   Epoch: 6   Global Step: 32810   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:06:14,016-Speed 10474.65 samples/sec   Loss 10.4915   LearningRate 0.4379   Epoch: 6   Global Step: 32820   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:06:21,826-Speed 10489.67 samples/sec   Loss 10.6343   LearningRate 0.4378   Epoch: 6   Global Step: 32830   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:06:29,610-Speed 10526.63 samples/sec   Loss 10.6528   LearningRate 0.4377   Epoch: 6   Global Step: 32840   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:06:37,431-Speed 10475.26 samples/sec   Loss 10.5414   LearningRate 0.4375   Epoch: 6   Global Step: 32850   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:06:45,233-Speed 10500.75 samples/sec   Loss 10.6421   LearningRate 0.4374   Epoch: 6   Global Step: 32860   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:06:53,026-Speed 10513.43 samples/sec   Loss 10.5232   LearningRate 0.4373   Epoch: 6   Global Step: 32870   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:07:00,830-Speed 10498.55 samples/sec   Loss 10.5911   LearningRate 0.4372   Epoch: 6   Global Step: 32880   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:07:08,624-Speed 10512.73 samples/sec   Loss 10.5261   LearningRate 0.4370   Epoch: 6   Global Step: 32890   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:07:16,405-Speed 10530.16 samples/sec   Loss 10.6160   LearningRate 0.4369   Epoch: 6   Global Step: 32900   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:07:24,228-Speed 10472.26 samples/sec   Loss 10.6107   LearningRate 0.4368   Epoch: 6   Global Step: 32910   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:07:32,049-Speed 10476.14 samples/sec   Loss 10.5099   LearningRate 0.4367   Epoch: 6   Global Step: 32920   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:07:39,824-Speed 10538.29 samples/sec   Loss 10.5937   LearningRate 0.4366   Epoch: 6   Global Step: 32930   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:07:47,621-Speed 10507.38 samples/sec   Loss 10.5000   LearningRate 0.4364   Epoch: 6   Global Step: 32940   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:07:55,450-Speed 10465.40 samples/sec   Loss 10.5598   LearningRate 0.4363   Epoch: 6   Global Step: 32950   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:08:03,224-Speed 10539.17 samples/sec   Loss 10.6460   LearningRate 0.4362   Epoch: 6   Global Step: 32960   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:08:11,023-Speed 10505.52 samples/sec   Loss 10.6313   LearningRate 0.4361   Epoch: 6   Global Step: 32970   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:08:18,857-Speed 10457.80 samples/sec   Loss 10.5221   LearningRate 0.4359   Epoch: 6   Global Step: 32980   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:08:26,664-Speed 10494.69 samples/sec   Loss 10.6902   LearningRate 0.4358   Epoch: 6   Global Step: 32990   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:08:34,476-Speed 10489.01 samples/sec   Loss 10.5769   LearningRate 0.4357   Epoch: 6   Global Step: 33000   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:08:42,281-Speed 10497.50 samples/sec   Loss 10.6221   LearningRate 0.4356   Epoch: 6   Global Step: 33010   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:08:50,068-Speed 10521.08 samples/sec   Loss 10.5591   LearningRate 0.4354   Epoch: 6   Global Step: 33020   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:08:57,857-Speed 10518.91 samples/sec   Loss 10.4766   LearningRate 0.4353   Epoch: 6   Global Step: 33030   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:09:05,639-Speed 10528.42 samples/sec   Loss 10.4916   LearningRate 0.4352   Epoch: 6   Global Step: 33040   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:09:13,414-Speed 10537.58 samples/sec   Loss 10.4823   LearningRate 0.4351   Epoch: 6   Global Step: 33050   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:09:21,215-Speed 10502.77 samples/sec   Loss 10.5944   LearningRate 0.4349   Epoch: 6   Global Step: 33060   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:09:29,015-Speed 10504.44 samples/sec   Loss 10.5560   LearningRate 0.4348   Epoch: 6   Global Step: 33070   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:09:36,812-Speed 10508.32 samples/sec   Loss 10.4854   LearningRate 0.4347   Epoch: 6   Global Step: 33080   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:09:44,597-Speed 10524.34 samples/sec   Loss 10.5458   LearningRate 0.4346   Epoch: 6   Global Step: 33090   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:09:52,396-Speed 10504.59 samples/sec   Loss 10.4890   LearningRate 0.4345   Epoch: 6   Global Step: 33100   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:10:00,218-Speed 10475.06 samples/sec   Loss 10.5854   LearningRate 0.4343   Epoch: 6   Global Step: 33110   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:10:08,008-Speed 10517.08 samples/sec   Loss 10.4440   LearningRate 0.4342   Epoch: 6   Global Step: 33120   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:10:15,791-Speed 10527.39 samples/sec   Loss 10.4674   LearningRate 0.4341   Epoch: 6   Global Step: 33130   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-15 22:10:23,586-Speed 10511.52 samples/sec   Loss 10.4409   LearningRate 0.4340   Epoch: 6   Global Step: 33140   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:10:31,369-Speed 10526.84 samples/sec   Loss 10.5370   LearningRate 0.4338   Epoch: 6   Global Step: 33150   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:10:39,179-Speed 10489.39 samples/sec   Loss 10.6172   LearningRate 0.4337   Epoch: 6   Global Step: 33160   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:10:47,002-Speed 10474.16 samples/sec   Loss 10.6226   LearningRate 0.4336   Epoch: 6   Global Step: 33170   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:10:54,793-Speed 10521.08 samples/sec   Loss 10.5520   LearningRate 0.4335   Epoch: 6   Global Step: 33180   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:11:02,599-Speed 10495.63 samples/sec   Loss 10.5688   LearningRate 0.4333   Epoch: 6   Global Step: 33190   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:11:10,398-Speed 10504.87 samples/sec   Loss 10.4723   LearningRate 0.4332   Epoch: 6   Global Step: 33200   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-15 22:11:18,203-Speed 10497.05 samples/sec   Loss 10.4654   LearningRate 0.4331   Epoch: 6   Global Step: 33210   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:11:29,891-Speed 10539.31 samples/sec   Loss 10.5022   LearningRate 0.4330   Epoch: 6   Global Step: 33220   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:11:37,657-Speed 10553.87 samples/sec   Loss 10.4975   LearningRate 0.4329   Epoch: 6   Global Step: 33230   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:11:45,450-Speed 10512.37 samples/sec   Loss 10.4904   LearningRate 0.4327   Epoch: 6   Global Step: 33240   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:11:53,276-Speed 10469.33 samples/sec   Loss 10.4851   LearningRate 0.4326   Epoch: 6   Global Step: 33250   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:12:01,082-Speed 10496.40 samples/sec   Loss 10.5133   LearningRate 0.4325   Epoch: 6   Global Step: 33260   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:12:08,869-Speed 10521.16 samples/sec   Loss 10.5299   LearningRate 0.4324   Epoch: 6   Global Step: 33270   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:12:16,689-Speed 10480.33 samples/sec   Loss 10.5138   LearningRate 0.4322   Epoch: 6   Global Step: 33280   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:12:24,481-Speed 10519.97 samples/sec   Loss 10.5127   LearningRate 0.4321   Epoch: 6   Global Step: 33290   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:12:32,283-Speed 10502.58 samples/sec   Loss 10.5040   LearningRate 0.4320   Epoch: 6   Global Step: 33300   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:12:40,085-Speed 10500.18 samples/sec   Loss 10.5539   LearningRate 0.4319   Epoch: 6   Global Step: 33310   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:12:47,875-Speed 10518.02 samples/sec   Loss 10.4786   LearningRate 0.4318   Epoch: 6   Global Step: 33320   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:12:55,662-Speed 10522.66 samples/sec   Loss 10.5692   LearningRate 0.4316   Epoch: 6   Global Step: 33330   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:13:03,456-Speed 10511.88 samples/sec   Loss 10.5745   LearningRate 0.4315   Epoch: 6   Global Step: 33340   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:13:11,277-Speed 10482.13 samples/sec   Loss 10.4089   LearningRate 0.4314   Epoch: 6   Global Step: 33350   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:13:19,123-Speed 10442.61 samples/sec   Loss 10.5198   LearningRate 0.4313   Epoch: 6   Global Step: 33360   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:13:27,057-Speed 10327.23 samples/sec   Loss 10.4282   LearningRate 0.4311   Epoch: 6   Global Step: 33370   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:13:34,927-Speed 10412.38 samples/sec   Loss 10.5441   LearningRate 0.4310   Epoch: 6   Global Step: 33380   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:13:42,721-Speed 10512.26 samples/sec   Loss 10.5607   LearningRate 0.4309   Epoch: 6   Global Step: 33390   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:13:50,546-Speed 10469.81 samples/sec   Loss 10.5604   LearningRate 0.4308   Epoch: 6   Global Step: 33400   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:13:58,329-Speed 10527.52 samples/sec   Loss 10.4541   LearningRate 0.4306   Epoch: 6   Global Step: 33410   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:14:06,116-Speed 10521.66 samples/sec   Loss 10.5186   LearningRate 0.4305   Epoch: 6   Global Step: 33420   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:14:13,912-Speed 10509.51 samples/sec   Loss 10.4762   LearningRate 0.4304   Epoch: 6   Global Step: 33430   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:14:21,711-Speed 10505.60 samples/sec   Loss 10.4854   LearningRate 0.4303   Epoch: 6   Global Step: 33440   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:14:29,538-Speed 10468.40 samples/sec   Loss 10.5265   LearningRate 0.4302   Epoch: 6   Global Step: 33450   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:14:37,388-Speed 10438.55 samples/sec   Loss 10.6596   LearningRate 0.4300   Epoch: 6   Global Step: 33460   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:14:45,223-Speed 10457.42 samples/sec   Loss 10.5177   LearningRate 0.4299   Epoch: 6   Global Step: 33470   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:14:53,007-Speed 10524.23 samples/sec   Loss 10.4944   LearningRate 0.4298   Epoch: 6   Global Step: 33480   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:15:00,784-Speed 10535.26 samples/sec   Loss 10.4971   LearningRate 0.4297   Epoch: 6   Global Step: 33490   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:15:08,547-Speed 10554.35 samples/sec   Loss 10.4550   LearningRate 0.4295   Epoch: 6   Global Step: 33500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:15:16,318-Speed 10542.53 samples/sec   Loss 10.5359   LearningRate 0.4294   Epoch: 6   Global Step: 33510   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:15:24,109-Speed 10516.34 samples/sec   Loss 10.5272   LearningRate 0.4293   Epoch: 6   Global Step: 33520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:15:31,885-Speed 10538.09 samples/sec   Loss 10.4958   LearningRate 0.4292   Epoch: 6   Global Step: 33530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:15:39,696-Speed 10489.35 samples/sec   Loss 10.5394   LearningRate 0.4291   Epoch: 6   Global Step: 33540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:15:47,487-Speed 10516.65 samples/sec   Loss 10.4735   LearningRate 0.4289   Epoch: 6   Global Step: 33550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:15:55,276-Speed 10517.61 samples/sec   Loss 10.4876   LearningRate 0.4288   Epoch: 6   Global Step: 33560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:16:03,078-Speed 10502.39 samples/sec   Loss 10.4114   LearningRate 0.4287   Epoch: 6   Global Step: 33570   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:16:10,876-Speed 10506.00 samples/sec   Loss 10.5213   LearningRate 0.4286   Epoch: 6   Global Step: 33580   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:16:18,665-Speed 10524.29 samples/sec   Loss 10.5212   LearningRate 0.4284   Epoch: 6   Global Step: 33590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:16:26,467-Speed 10500.59 samples/sec   Loss 10.4664   LearningRate 0.4283   Epoch: 6   Global Step: 33600   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:16:34,243-Speed 10536.89 samples/sec   Loss 10.4052   LearningRate 0.4282   Epoch: 6   Global Step: 33610   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:16:42,078-Speed 10457.02 samples/sec   Loss 10.4805   LearningRate 0.4281   Epoch: 6   Global Step: 33620   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:16:49,899-Speed 10475.95 samples/sec   Loss 10.4657   LearningRate 0.4280   Epoch: 6   Global Step: 33630   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:16:57,694-Speed 10512.54 samples/sec   Loss 10.5832   LearningRate 0.4278   Epoch: 6   Global Step: 33640   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:17:05,487-Speed 10513.23 samples/sec   Loss 10.4893   LearningRate 0.4277   Epoch: 6   Global Step: 33650   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:17:13,266-Speed 10532.63 samples/sec   Loss 10.4614   LearningRate 0.4276   Epoch: 6   Global Step: 33660   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:17:21,053-Speed 10521.39 samples/sec   Loss 10.5049   LearningRate 0.4275   Epoch: 6   Global Step: 33670   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:17:28,887-Speed 10459.04 samples/sec   Loss 10.4114   LearningRate 0.4273   Epoch: 6   Global Step: 33680   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:17:36,679-Speed 10515.20 samples/sec   Loss 10.4969   LearningRate 0.4272   Epoch: 6   Global Step: 33690   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:17:44,455-Speed 10535.73 samples/sec   Loss 10.4505   LearningRate 0.4271   Epoch: 6   Global Step: 33700   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:17:52,271-Speed 10483.52 samples/sec   Loss 10.4923   LearningRate 0.4270   Epoch: 6   Global Step: 33710   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:18:00,072-Speed 10502.76 samples/sec   Loss 10.4377   LearningRate 0.4269   Epoch: 6   Global Step: 33720   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:18:07,859-Speed 10520.85 samples/sec   Loss 10.4228   LearningRate 0.4267   Epoch: 6   Global Step: 33730   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:18:15,667-Speed 10493.29 samples/sec   Loss 10.5014   LearningRate 0.4266   Epoch: 6   Global Step: 33740   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:18:23,468-Speed 10502.67 samples/sec   Loss 10.4267   LearningRate 0.4265   Epoch: 6   Global Step: 33750   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:18:31,262-Speed 10511.73 samples/sec   Loss 10.4078   LearningRate 0.4264   Epoch: 6   Global Step: 33760   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:18:39,126-Speed 10418.42 samples/sec   Loss 10.4205   LearningRate 0.4262   Epoch: 6   Global Step: 33770   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:18:46,936-Speed 10491.00 samples/sec   Loss 10.4661   LearningRate 0.4261   Epoch: 6   Global Step: 33780   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:18:54,740-Speed 10497.52 samples/sec   Loss 10.3963   LearningRate 0.4260   Epoch: 6   Global Step: 33790   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:19:02,549-Speed 10493.71 samples/sec   Loss 10.5014   LearningRate 0.4259   Epoch: 6   Global Step: 33800   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:19:10,367-Speed 10479.58 samples/sec   Loss 10.4093   LearningRate 0.4258   Epoch: 6   Global Step: 33810   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:19:18,161-Speed 10510.92 samples/sec   Loss 10.3447   LearningRate 0.4256   Epoch: 6   Global Step: 33820   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:19:25,966-Speed 10498.17 samples/sec   Loss 10.3748   LearningRate 0.4255   Epoch: 6   Global Step: 33830   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:19:33,753-Speed 10522.24 samples/sec   Loss 10.3804   LearningRate 0.4254   Epoch: 6   Global Step: 33840   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:19:41,548-Speed 10509.35 samples/sec   Loss 10.4880   LearningRate 0.4253   Epoch: 6   Global Step: 33850   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:19:49,406-Speed 10426.32 samples/sec   Loss 10.4622   LearningRate 0.4251   Epoch: 6   Global Step: 33860   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:19:57,186-Speed 10531.45 samples/sec   Loss 10.4266   LearningRate 0.4250   Epoch: 6   Global Step: 33870   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:20:05,050-Speed 10418.90 samples/sec   Loss 10.4222   LearningRate 0.4249   Epoch: 6   Global Step: 33880   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:20:12,874-Speed 10471.69 samples/sec   Loss 10.4049   LearningRate 0.4248   Epoch: 6   Global Step: 33890   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:20:20,664-Speed 10516.74 samples/sec   Loss 10.5095   LearningRate 0.4247   Epoch: 6   Global Step: 33900   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:20:28,445-Speed 10530.69 samples/sec   Loss 10.4320   LearningRate 0.4245   Epoch: 6   Global Step: 33910   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:20:36,241-Speed 10508.74 samples/sec   Loss 10.4435   LearningRate 0.4244   Epoch: 6   Global Step: 33920   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:20:44,012-Speed 10542.61 samples/sec   Loss 10.5016   LearningRate 0.4243   Epoch: 6   Global Step: 33930   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:20:51,834-Speed 10473.39 samples/sec   Loss 10.4545   LearningRate 0.4242   Epoch: 6   Global Step: 33940   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:20:59,650-Speed 10483.33 samples/sec   Loss 10.4675   LearningRate 0.4241   Epoch: 6   Global Step: 33950   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:21:07,487-Speed 10455.71 samples/sec   Loss 10.4204   LearningRate 0.4239   Epoch: 6   Global Step: 33960   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:21:15,287-Speed 10503.10 samples/sec   Loss 10.4097   LearningRate 0.4238   Epoch: 6   Global Step: 33970   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:21:23,078-Speed 10516.68 samples/sec   Loss 10.4130   LearningRate 0.4237   Epoch: 6   Global Step: 33980   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:21:30,919-Speed 10449.41 samples/sec   Loss 10.5152   LearningRate 0.4236   Epoch: 6   Global Step: 33990   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:21:38,699-Speed 10531.09 samples/sec   Loss 10.4041   LearningRate 0.4234   Epoch: 6   Global Step: 34000   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:21:46,489-Speed 10516.67 samples/sec   Loss 10.4725   LearningRate 0.4233   Epoch: 6   Global Step: 34010   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:21:54,309-Speed 10476.95 samples/sec   Loss 10.3935   LearningRate 0.4232   Epoch: 6   Global Step: 34020   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:22:02,134-Speed 10469.98 samples/sec   Loss 10.3942   LearningRate 0.4231   Epoch: 6   Global Step: 34030   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:22:09,964-Speed 10464.29 samples/sec   Loss 10.3930   LearningRate 0.4230   Epoch: 6   Global Step: 34040   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:22:17,767-Speed 10500.09 samples/sec   Loss 10.3796   LearningRate 0.4228   Epoch: 6   Global Step: 34050   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:22:25,634-Speed 10413.84 samples/sec   Loss 10.3586   LearningRate 0.4227   Epoch: 6   Global Step: 34060   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:22:33,433-Speed 10506.29 samples/sec   Loss 10.4057   LearningRate 0.4226   Epoch: 6   Global Step: 34070   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:22:41,227-Speed 10512.00 samples/sec   Loss 10.4061   LearningRate 0.4225   Epoch: 6   Global Step: 34080   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:22:49,047-Speed 10476.14 samples/sec   Loss 10.3241   LearningRate 0.4224   Epoch: 6   Global Step: 34090   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:22:56,858-Speed 10489.18 samples/sec   Loss 10.3491   LearningRate 0.4222   Epoch: 6   Global Step: 34100   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:23:04,666-Speed 10493.33 samples/sec   Loss 10.5251   LearningRate 0.4221   Epoch: 6   Global Step: 34110   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:23:12,481-Speed 10485.10 samples/sec   Loss 10.4789   LearningRate 0.4220   Epoch: 6   Global Step: 34120   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:23:20,257-Speed 10537.07 samples/sec   Loss 10.3522   LearningRate 0.4219   Epoch: 6   Global Step: 34130   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:23:28,097-Speed 10450.74 samples/sec   Loss 10.3076   LearningRate 0.4217   Epoch: 6   Global Step: 34140   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:23:35,895-Speed 10506.30 samples/sec   Loss 10.4003   LearningRate 0.4216   Epoch: 6   Global Step: 34150   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:23:43,698-Speed 10501.47 samples/sec   Loss 10.4089   LearningRate 0.4215   Epoch: 6   Global Step: 34160   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:23:51,512-Speed 10485.35 samples/sec   Loss 10.4182   LearningRate 0.4214   Epoch: 6   Global Step: 34170   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:23:59,314-Speed 10501.21 samples/sec   Loss 10.3687   LearningRate 0.4213   Epoch: 6   Global Step: 34180   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:24:07,090-Speed 10537.61 samples/sec   Loss 10.3378   LearningRate 0.4211   Epoch: 6   Global Step: 34190   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:24:14,944-Speed 10432.56 samples/sec   Loss 10.3503   LearningRate 0.4210   Epoch: 6   Global Step: 34200   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:24:22,739-Speed 10510.11 samples/sec   Loss 10.4220   LearningRate 0.4209   Epoch: 6   Global Step: 34210   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:24:30,538-Speed 10505.13 samples/sec   Loss 10.4507   LearningRate 0.4208   Epoch: 6   Global Step: 34220   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:24:38,348-Speed 10491.38 samples/sec   Loss 10.4566   LearningRate 0.4207   Epoch: 6   Global Step: 34230   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:24:46,145-Speed 10508.85 samples/sec   Loss 10.4310   LearningRate 0.4205   Epoch: 6   Global Step: 34240   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:24:53,932-Speed 10521.00 samples/sec   Loss 10.3583   LearningRate 0.4204   Epoch: 6   Global Step: 34250   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:25:01,726-Speed 10510.99 samples/sec   Loss 10.3753   LearningRate 0.4203   Epoch: 6   Global Step: 34260   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:25:09,513-Speed 10522.31 samples/sec   Loss 10.4206   LearningRate 0.4202   Epoch: 6   Global Step: 34270   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:25:17,316-Speed 10500.10 samples/sec   Loss 10.3905   LearningRate 0.4200   Epoch: 6   Global Step: 34280   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:25:25,152-Speed 10454.99 samples/sec   Loss 10.3566   LearningRate 0.4199   Epoch: 6   Global Step: 34290   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:25:34,948-Speed 8363.11 samples/sec   Loss 10.3355   LearningRate 0.4198   Epoch: 6   Global Step: 34300   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:25:42,734-Speed 10523.72 samples/sec   Loss 10.3927   LearningRate 0.4197   Epoch: 6   Global Step: 34310   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:25:50,568-Speed 10459.11 samples/sec   Loss 10.3844   LearningRate 0.4196   Epoch: 6   Global Step: 34320   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:25:58,400-Speed 10461.21 samples/sec   Loss 10.3651   LearningRate 0.4194   Epoch: 6   Global Step: 34330   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:26:06,202-Speed 10501.97 samples/sec   Loss 10.2941   LearningRate 0.4193   Epoch: 6   Global Step: 34340   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:26:14,004-Speed 10501.22 samples/sec   Loss 10.3646   LearningRate 0.4192   Epoch: 6   Global Step: 34350   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:26:21,792-Speed 10520.58 samples/sec   Loss 10.2838   LearningRate 0.4191   Epoch: 6   Global Step: 34360   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:26:29,618-Speed 10469.27 samples/sec   Loss 10.3613   LearningRate 0.4190   Epoch: 6   Global Step: 34370   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:26:37,413-Speed 10510.41 samples/sec   Loss 10.3555   LearningRate 0.4188   Epoch: 6   Global Step: 34380   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:26:45,238-Speed 10470.42 samples/sec   Loss 10.4727   LearningRate 0.4187   Epoch: 6   Global Step: 34390   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:26:53,053-Speed 10484.62 samples/sec   Loss 10.3791   LearningRate 0.4186   Epoch: 6   Global Step: 34400   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:27:00,833-Speed 10530.87 samples/sec   Loss 10.2950   LearningRate 0.4185   Epoch: 6   Global Step: 34410   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:27:08,614-Speed 10528.87 samples/sec   Loss 10.3685   LearningRate 0.4184   Epoch: 6   Global Step: 34420   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:27:16,394-Speed 10531.41 samples/sec   Loss 10.3030   LearningRate 0.4182   Epoch: 6   Global Step: 34430   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:27:24,186-Speed 10514.08 samples/sec   Loss 10.3727   LearningRate 0.4181   Epoch: 6   Global Step: 34440   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:27:31,979-Speed 10512.94 samples/sec   Loss 10.3339   LearningRate 0.4180   Epoch: 6   Global Step: 34450   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:27:39,768-Speed 10519.26 samples/sec   Loss 10.2887   LearningRate 0.4179   Epoch: 6   Global Step: 34460   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:27:47,568-Speed 10504.35 samples/sec   Loss 10.3046   LearningRate 0.4178   Epoch: 6   Global Step: 34470   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:27:55,338-Speed 10544.30 samples/sec   Loss 10.2943   LearningRate 0.4176   Epoch: 6   Global Step: 34480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:28:03,124-Speed 10523.45 samples/sec   Loss 10.3520   LearningRate 0.4175   Epoch: 6   Global Step: 34490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:28:10,908-Speed 10525.75 samples/sec   Loss 10.3795   LearningRate 0.4174   Epoch: 6   Global Step: 34500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:28:18,704-Speed 10508.74 samples/sec   Loss 10.2875   LearningRate 0.4173   Epoch: 6   Global Step: 34510   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:28:26,497-Speed 10513.85 samples/sec   Loss 10.4088   LearningRate 0.4171   Epoch: 6   Global Step: 34520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:28:34,361-Speed 10418.14 samples/sec   Loss 10.3360   LearningRate 0.4170   Epoch: 6   Global Step: 34530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:28:42,165-Speed 10498.42 samples/sec   Loss 10.3390   LearningRate 0.4169   Epoch: 6   Global Step: 34540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:28:49,957-Speed 10515.89 samples/sec   Loss 10.2853   LearningRate 0.4168   Epoch: 6   Global Step: 34550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:28:57,764-Speed 10493.86 samples/sec   Loss 10.2905   LearningRate 0.4167   Epoch: 6   Global Step: 34560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:29:05,607-Speed 10447.02 samples/sec   Loss 10.2717   LearningRate 0.4165   Epoch: 6   Global Step: 34570   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:29:13,433-Speed 10469.89 samples/sec   Loss 10.3090   LearningRate 0.4164   Epoch: 6   Global Step: 34580   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:29:21,259-Speed 10472.45 samples/sec   Loss 10.3731   LearningRate 0.4163   Epoch: 6   Global Step: 34590   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:29:29,081-Speed 10475.46 samples/sec   Loss 10.3734   LearningRate 0.4162   Epoch: 6   Global Step: 34600   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:29:36,899-Speed 10479.87 samples/sec   Loss 10.3853   LearningRate 0.4161   Epoch: 6   Global Step: 34610   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:29:44,687-Speed 10520.51 samples/sec   Loss 10.3637   LearningRate 0.4159   Epoch: 6   Global Step: 34620   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:29:52,496-Speed 10492.02 samples/sec   Loss 10.3101   LearningRate 0.4158   Epoch: 6   Global Step: 34630   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:30:00,323-Speed 10466.90 samples/sec   Loss 10.2283   LearningRate 0.4157   Epoch: 6   Global Step: 34640   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:30:08,149-Speed 10469.43 samples/sec   Loss 10.3284   LearningRate 0.4156   Epoch: 6   Global Step: 34650   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:30:15,953-Speed 10499.08 samples/sec   Loss 10.3424   LearningRate 0.4155   Epoch: 6   Global Step: 34660   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:30:23,760-Speed 10494.17 samples/sec   Loss 10.3643   LearningRate 0.4153   Epoch: 6   Global Step: 34670   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:30:31,547-Speed 10521.41 samples/sec   Loss 10.2631   LearningRate 0.4152   Epoch: 6   Global Step: 34680   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:30:39,380-Speed 10459.28 samples/sec   Loss 10.2201   LearningRate 0.4151   Epoch: 6   Global Step: 34690   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:30:47,204-Speed 10471.87 samples/sec   Loss 10.2544   LearningRate 0.4150   Epoch: 6   Global Step: 34700   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:30:55,003-Speed 10506.33 samples/sec   Loss 10.2586   LearningRate 0.4149   Epoch: 6   Global Step: 34710   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:31:02,804-Speed 10502.21 samples/sec   Loss 10.2859   LearningRate 0.4147   Epoch: 6   Global Step: 34720   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:31:10,593-Speed 10518.61 samples/sec   Loss 10.3134   LearningRate 0.4146   Epoch: 6   Global Step: 34730   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:31:18,387-Speed 10513.33 samples/sec   Loss 10.3116   LearningRate 0.4145   Epoch: 6   Global Step: 34740   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:31:26,188-Speed 10502.05 samples/sec   Loss 10.3641   LearningRate 0.4144   Epoch: 6   Global Step: 34750   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:31:34,008-Speed 10477.00 samples/sec   Loss 10.2577   LearningRate 0.4143   Epoch: 6   Global Step: 34760   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:31:41,818-Speed 10492.08 samples/sec   Loss 10.3184   LearningRate 0.4141   Epoch: 6   Global Step: 34770   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:31:49,611-Speed 10514.53 samples/sec   Loss 10.3019   LearningRate 0.4140   Epoch: 6   Global Step: 34780   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:31:57,431-Speed 10476.78 samples/sec   Loss 10.3422   LearningRate 0.4139   Epoch: 6   Global Step: 34790   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:32:05,244-Speed 10486.02 samples/sec   Loss 10.2001   LearningRate 0.4138   Epoch: 6   Global Step: 34800   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:32:13,049-Speed 10498.49 samples/sec   Loss 10.3051   LearningRate 0.4137   Epoch: 6   Global Step: 34810   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:32:20,860-Speed 10488.78 samples/sec   Loss 10.3073   LearningRate 0.4135   Epoch: 6   Global Step: 34820   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:32:28,678-Speed 10479.64 samples/sec   Loss 10.2470   LearningRate 0.4134   Epoch: 6   Global Step: 34830   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:32:36,505-Speed 10468.48 samples/sec   Loss 10.2667   LearningRate 0.4133   Epoch: 6   Global Step: 34840   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:32:44,304-Speed 10505.65 samples/sec   Loss 10.2982   LearningRate 0.4132   Epoch: 6   Global Step: 34850   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:32:52,106-Speed 10502.36 samples/sec   Loss 10.2609   LearningRate 0.4131   Epoch: 6   Global Step: 34860   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:32:59,879-Speed 10539.17 samples/sec   Loss 10.3130   LearningRate 0.4129   Epoch: 6   Global Step: 34870   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:33:07,718-Speed 10452.57 samples/sec   Loss 10.3175   LearningRate 0.4128   Epoch: 6   Global Step: 34880   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:33:15,523-Speed 10499.02 samples/sec   Loss 10.2988   LearningRate 0.4127   Epoch: 6   Global Step: 34890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:33:23,315-Speed 10514.81 samples/sec   Loss 10.2594   LearningRate 0.4126   Epoch: 6   Global Step: 34900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:33:31,117-Speed 10501.83 samples/sec   Loss 10.2875   LearningRate 0.4125   Epoch: 6   Global Step: 34910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:33:38,933-Speed 10482.96 samples/sec   Loss 10.1869   LearningRate 0.4123   Epoch: 6   Global Step: 34920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:33:46,714-Speed 10529.23 samples/sec   Loss 10.2527   LearningRate 0.4122   Epoch: 6   Global Step: 34930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:33:54,497-Speed 10527.74 samples/sec   Loss 10.2810   LearningRate 0.4121   Epoch: 6   Global Step: 34940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:34:02,279-Speed 10526.99 samples/sec   Loss 10.1922   LearningRate 0.4120   Epoch: 6   Global Step: 34950   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:34:10,069-Speed 10518.19 samples/sec   Loss 10.2219   LearningRate 0.4119   Epoch: 6   Global Step: 34960   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:34:17,916-Speed 10441.16 samples/sec   Loss 10.3222   LearningRate 0.4117   Epoch: 6   Global Step: 34970   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:34:25,728-Speed 10488.34 samples/sec   Loss 10.2223   LearningRate 0.4116   Epoch: 6   Global Step: 34980   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:34:33,517-Speed 10518.28 samples/sec   Loss 10.2802   LearningRate 0.4115   Epoch: 6   Global Step: 34990   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:34:41,314-Speed 10509.16 samples/sec   Loss 10.1400   LearningRate 0.4114   Epoch: 6   Global Step: 35000   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:34:49,116-Speed 10501.66 samples/sec   Loss 10.2349   LearningRate 0.4113   Epoch: 6   Global Step: 35010   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:34:56,897-Speed 10530.65 samples/sec   Loss 10.1922   LearningRate 0.4111   Epoch: 6   Global Step: 35020   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:35:04,751-Speed 10430.60 samples/sec   Loss 10.1918   LearningRate 0.4110   Epoch: 6   Global Step: 35030   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:35:12,558-Speed 10494.53 samples/sec   Loss 10.2691   LearningRate 0.4109   Epoch: 6   Global Step: 35040   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:35:20,379-Speed 10476.82 samples/sec   Loss 10.3418   LearningRate 0.4108   Epoch: 6   Global Step: 35050   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:35:28,198-Speed 10478.84 samples/sec   Loss 10.2964   LearningRate 0.4107   Epoch: 6   Global Step: 35060   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:35:36,001-Speed 10499.14 samples/sec   Loss 10.2977   LearningRate 0.4105   Epoch: 6   Global Step: 35070   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:35:43,789-Speed 10520.70 samples/sec   Loss 10.2428   LearningRate 0.4104   Epoch: 6   Global Step: 35080   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:35:51,621-Speed 10460.64 samples/sec   Loss 10.1852   LearningRate 0.4103   Epoch: 6   Global Step: 35090   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:35:59,397-Speed 10537.02 samples/sec   Loss 10.2114   LearningRate 0.4102   Epoch: 6   Global Step: 35100   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:36:07,181-Speed 10525.46 samples/sec   Loss 10.2644   LearningRate 0.4101   Epoch: 6   Global Step: 35110   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:36:14,959-Speed 10533.45 samples/sec   Loss 10.3309   LearningRate 0.4099   Epoch: 6   Global Step: 35120   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:36:22,763-Speed 10498.58 samples/sec   Loss 10.2290   LearningRate 0.4098   Epoch: 6   Global Step: 35130   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:36:30,556-Speed 10513.84 samples/sec   Loss 10.2214   LearningRate 0.4097   Epoch: 6   Global Step: 35140   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:36:38,351-Speed 10511.12 samples/sec   Loss 10.2501   LearningRate 0.4096   Epoch: 6   Global Step: 35150   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:36:46,137-Speed 10522.15 samples/sec   Loss 10.1938   LearningRate 0.4095   Epoch: 6   Global Step: 35160   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:36:53,941-Speed 10498.26 samples/sec   Loss 10.2176   LearningRate 0.4093   Epoch: 6   Global Step: 35170   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:37:01,732-Speed 10517.43 samples/sec   Loss 10.1716   LearningRate 0.4092   Epoch: 6   Global Step: 35180   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:37:09,534-Speed 10501.48 samples/sec   Loss 10.2138   LearningRate 0.4091   Epoch: 6   Global Step: 35190   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:37:17,341-Speed 10494.17 samples/sec   Loss 10.2632   LearningRate 0.4090   Epoch: 6   Global Step: 35200   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:37:25,127-Speed 10523.88 samples/sec   Loss 10.2307   LearningRate 0.4089   Epoch: 6   Global Step: 35210   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:37:32,915-Speed 10518.88 samples/sec   Loss 10.2400   LearningRate 0.4087   Epoch: 6   Global Step: 35220   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:37:40,723-Speed 10493.75 samples/sec   Loss 10.1674   LearningRate 0.4086   Epoch: 6   Global Step: 35230   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:37:48,523-Speed 10505.10 samples/sec   Loss 10.2045   LearningRate 0.4085   Epoch: 6   Global Step: 35240   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:37:56,313-Speed 10515.86 samples/sec   Loss 10.2509   LearningRate 0.4084   Epoch: 6   Global Step: 35250   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:38:04,113-Speed 10504.18 samples/sec   Loss 10.2225   LearningRate 0.4083   Epoch: 6   Global Step: 35260   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:38:11,918-Speed 10497.89 samples/sec   Loss 10.1475   LearningRate 0.4082   Epoch: 6   Global Step: 35270   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:38:19,753-Speed 10458.20 samples/sec   Loss 10.1916   LearningRate 0.4080   Epoch: 6   Global Step: 35280   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:38:27,541-Speed 10518.90 samples/sec   Loss 10.1822   LearningRate 0.4079   Epoch: 6   Global Step: 35290   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:38:35,326-Speed 10524.83 samples/sec   Loss 10.4451   LearningRate 0.4078   Epoch: 6   Global Step: 35300   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:38:43,128-Speed 10501.40 samples/sec   Loss 10.2437   LearningRate 0.4077   Epoch: 6   Global Step: 35310   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:38:50,907-Speed 10533.53 samples/sec   Loss 10.1713   LearningRate 0.4076   Epoch: 6   Global Step: 35320   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:38:58,696-Speed 10518.33 samples/sec   Loss 10.1431   LearningRate 0.4074   Epoch: 6   Global Step: 35330   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:39:06,504-Speed 10493.38 samples/sec   Loss 10.2169   LearningRate 0.4073   Epoch: 6   Global Step: 35340   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:39:14,291-Speed 10521.30 samples/sec   Loss 10.1834   LearningRate 0.4072   Epoch: 6   Global Step: 35350   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:39:22,097-Speed 10497.05 samples/sec   Loss 10.1921   LearningRate 0.4071   Epoch: 6   Global Step: 35360   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:39:29,937-Speed 10449.53 samples/sec   Loss 10.0828   LearningRate 0.4070   Epoch: 6   Global Step: 35370   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:39:37,734-Speed 10508.91 samples/sec   Loss 10.1845   LearningRate 0.4068   Epoch: 6   Global Step: 35380   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:39:45,500-Speed 10550.42 samples/sec   Loss 10.1933   LearningRate 0.4067   Epoch: 6   Global Step: 35390   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:39:53,298-Speed 10507.30 samples/sec   Loss 10.2096   LearningRate 0.4066   Epoch: 6   Global Step: 35400   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:40:01,127-Speed 10471.50 samples/sec   Loss 10.1330   LearningRate 0.4065   Epoch: 6   Global Step: 35410   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:40:08,903-Speed 10535.66 samples/sec   Loss 10.2892   LearningRate 0.4064   Epoch: 6   Global Step: 35420   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:40:16,733-Speed 10463.77 samples/sec   Loss 10.2176   LearningRate 0.4062   Epoch: 6   Global Step: 35430   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:40:24,531-Speed 10507.25 samples/sec   Loss 10.1717   LearningRate 0.4061   Epoch: 6   Global Step: 35440   Fp16 Grad Scale: 524288   Required: 15 hours
Training: 2022-01-15 22:40:32,307-Speed 10535.86 samples/sec   Loss 10.2507   LearningRate 0.4060   Epoch: 6   Global Step: 35450   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:40:40,096-Speed 10519.72 samples/sec   Loss 10.1886   LearningRate 0.4059   Epoch: 6   Global Step: 35460   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:40:47,888-Speed 10515.97 samples/sec   Loss 10.1828   LearningRate 0.4058   Epoch: 6   Global Step: 35470   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:40:55,669-Speed 10528.37 samples/sec   Loss 10.1087   LearningRate 0.4056   Epoch: 6   Global Step: 35480   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:41:03,483-Speed 10485.69 samples/sec   Loss 10.1512   LearningRate 0.4055   Epoch: 6   Global Step: 35490   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:41:11,289-Speed 10496.74 samples/sec   Loss 10.2476   LearningRate 0.4054   Epoch: 6   Global Step: 35500   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:41:19,064-Speed 10537.73 samples/sec   Loss 10.1678   LearningRate 0.4053   Epoch: 6   Global Step: 35510   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:41:26,880-Speed 10481.59 samples/sec   Loss 10.1656   LearningRate 0.4052   Epoch: 6   Global Step: 35520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:41:34,667-Speed 10521.97 samples/sec   Loss 10.1803   LearningRate 0.4051   Epoch: 6   Global Step: 35530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:41:42,450-Speed 10527.01 samples/sec   Loss 10.1455   LearningRate 0.4049   Epoch: 6   Global Step: 35540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:41:50,234-Speed 10526.26 samples/sec   Loss 10.1985   LearningRate 0.4048   Epoch: 6   Global Step: 35550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:41:58,014-Speed 10531.03 samples/sec   Loss 10.1557   LearningRate 0.4047   Epoch: 6   Global Step: 35560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:42:05,831-Speed 10480.70 samples/sec   Loss 10.2243   LearningRate 0.4046   Epoch: 6   Global Step: 35570   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:42:13,633-Speed 10501.04 samples/sec   Loss 10.1208   LearningRate 0.4045   Epoch: 6   Global Step: 35580   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:42:21,430-Speed 10507.54 samples/sec   Loss 10.1229   LearningRate 0.4043   Epoch: 6   Global Step: 35590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:42:29,231-Speed 10503.87 samples/sec   Loss 10.1603   LearningRate 0.4042   Epoch: 6   Global Step: 35600   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-15 22:42:37,034-Speed 10499.02 samples/sec   Loss 10.1534   LearningRate 0.4041   Epoch: 6   Global Step: 35610   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:42:44,830-Speed 10509.63 samples/sec   Loss 10.1287   LearningRate 0.4040   Epoch: 6   Global Step: 35620   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:42:52,612-Speed 10528.68 samples/sec   Loss 10.1101   LearningRate 0.4039   Epoch: 6   Global Step: 35630   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:43:00,417-Speed 10502.23 samples/sec   Loss 10.1432   LearningRate 0.4037   Epoch: 6   Global Step: 35640   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:43:08,216-Speed 10504.60 samples/sec   Loss 10.1317   LearningRate 0.4036   Epoch: 6   Global Step: 35650   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:43:16,010-Speed 10512.86 samples/sec   Loss 10.1613   LearningRate 0.4035   Epoch: 6   Global Step: 35660   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:43:23,808-Speed 10505.22 samples/sec   Loss 10.1808   LearningRate 0.4034   Epoch: 6   Global Step: 35670   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:43:31,612-Speed 10499.66 samples/sec   Loss 10.1017   LearningRate 0.4033   Epoch: 6   Global Step: 35680   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:43:39,414-Speed 10500.64 samples/sec   Loss 10.1121   LearningRate 0.4032   Epoch: 6   Global Step: 35690   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:43:47,186-Speed 10542.48 samples/sec   Loss 10.1938   LearningRate 0.4030   Epoch: 6   Global Step: 35700   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:43:54,990-Speed 10498.34 samples/sec   Loss 10.1420   LearningRate 0.4029   Epoch: 6   Global Step: 35710   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:44:02,772-Speed 10528.99 samples/sec   Loss 10.0436   LearningRate 0.4028   Epoch: 6   Global Step: 35720   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:44:10,581-Speed 10490.98 samples/sec   Loss 10.1540   LearningRate 0.4027   Epoch: 6   Global Step: 35730   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:44:18,393-Speed 10488.17 samples/sec   Loss 10.1031   LearningRate 0.4026   Epoch: 6   Global Step: 35740   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:44:26,188-Speed 10511.54 samples/sec   Loss 10.1728   LearningRate 0.4024   Epoch: 6   Global Step: 35750   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:44:34,025-Speed 10453.87 samples/sec   Loss 10.1074   LearningRate 0.4023   Epoch: 6   Global Step: 35760   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:44:41,822-Speed 10508.53 samples/sec   Loss 10.1139   LearningRate 0.4022   Epoch: 6   Global Step: 35770   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:44:49,620-Speed 10506.40 samples/sec   Loss 10.1197   LearningRate 0.4021   Epoch: 6   Global Step: 35780   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:44:57,413-Speed 10513.48 samples/sec   Loss 10.1098   LearningRate 0.4020   Epoch: 6   Global Step: 35790   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:45:05,237-Speed 10471.89 samples/sec   Loss 10.1254   LearningRate 0.4019   Epoch: 6   Global Step: 35800   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:45:13,067-Speed 10464.87 samples/sec   Loss 10.1410   LearningRate 0.4017   Epoch: 6   Global Step: 35810   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:45:20,861-Speed 10511.80 samples/sec   Loss 10.1325   LearningRate 0.4016   Epoch: 6   Global Step: 35820   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:45:28,658-Speed 10508.07 samples/sec   Loss 10.1778   LearningRate 0.4015   Epoch: 6   Global Step: 35830   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:45:36,454-Speed 10509.02 samples/sec   Loss 10.1465   LearningRate 0.4014   Epoch: 6   Global Step: 35840   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:45:44,240-Speed 10522.33 samples/sec   Loss 10.0990   LearningRate 0.4013   Epoch: 6   Global Step: 35850   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:45:52,048-Speed 10494.03 samples/sec   Loss 10.0820   LearningRate 0.4011   Epoch: 6   Global Step: 35860   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:45:59,837-Speed 10517.29 samples/sec   Loss 10.1477   LearningRate 0.4010   Epoch: 6   Global Step: 35870   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:46:07,620-Speed 10527.29 samples/sec   Loss 10.0733   LearningRate 0.4009   Epoch: 6   Global Step: 35880   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:46:15,423-Speed 10500.36 samples/sec   Loss 10.0859   LearningRate 0.4008   Epoch: 6   Global Step: 35890   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:46:23,240-Speed 10480.92 samples/sec   Loss 10.0933   LearningRate 0.4007   Epoch: 6   Global Step: 35900   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:46:31,023-Speed 10527.11 samples/sec   Loss 10.2055   LearningRate 0.4005   Epoch: 6   Global Step: 35910   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:46:38,840-Speed 10484.33 samples/sec   Loss 10.1305   LearningRate 0.4004   Epoch: 6   Global Step: 35920   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:46:46,636-Speed 10508.89 samples/sec   Loss 10.1122   LearningRate 0.4003   Epoch: 6   Global Step: 35930   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:46:54,450-Speed 10484.57 samples/sec   Loss 10.0682   LearningRate 0.4002   Epoch: 6   Global Step: 35940   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:47:02,271-Speed 10476.62 samples/sec   Loss 10.2006   LearningRate 0.4001   Epoch: 6   Global Step: 35950   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:47:10,130-Speed 10425.13 samples/sec   Loss 10.1366   LearningRate 0.4000   Epoch: 6   Global Step: 35960   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:47:17,906-Speed 10537.13 samples/sec   Loss 10.0858   LearningRate 0.3998   Epoch: 6   Global Step: 35970   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:47:25,698-Speed 10514.77 samples/sec   Loss 10.1639   LearningRate 0.3997   Epoch: 6   Global Step: 35980   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:47:33,516-Speed 10480.48 samples/sec   Loss 10.1114   LearningRate 0.3996   Epoch: 6   Global Step: 35990   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:47:41,327-Speed 10489.44 samples/sec   Loss 10.1912   LearningRate 0.3995   Epoch: 6   Global Step: 36000   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:47:49,130-Speed 10499.90 samples/sec   Loss 10.0723   LearningRate 0.3994   Epoch: 6   Global Step: 36010   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:47:56,930-Speed 10503.78 samples/sec   Loss 10.0668   LearningRate 0.3993   Epoch: 6   Global Step: 36020   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:48:04,750-Speed 10476.83 samples/sec   Loss 10.0098   LearningRate 0.3991   Epoch: 6   Global Step: 36030   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:48:12,523-Speed 10542.27 samples/sec   Loss 10.1189   LearningRate 0.3990   Epoch: 6   Global Step: 36040   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:48:20,318-Speed 10511.97 samples/sec   Loss 10.0376   LearningRate 0.3989   Epoch: 6   Global Step: 36050   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:48:28,142-Speed 10470.59 samples/sec   Loss 9.9816   LearningRate 0.3988   Epoch: 6   Global Step: 36060   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:48:35,940-Speed 10506.06 samples/sec   Loss 10.1005   LearningRate 0.3987   Epoch: 6   Global Step: 36070   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:48:43,789-Speed 10439.53 samples/sec   Loss 10.0590   LearningRate 0.3985   Epoch: 6   Global Step: 36080   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:48:51,599-Speed 10490.11 samples/sec   Loss 10.2509   LearningRate 0.3984   Epoch: 6   Global Step: 36090   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:48:59,386-Speed 10520.86 samples/sec   Loss 10.2780   LearningRate 0.3983   Epoch: 6   Global Step: 36100   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:49:07,170-Speed 10526.03 samples/sec   Loss 10.1362   LearningRate 0.3982   Epoch: 6   Global Step: 36110   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:49:14,979-Speed 10491.98 samples/sec   Loss 10.0623   LearningRate 0.3981   Epoch: 6   Global Step: 36120   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:49:22,771-Speed 10515.33 samples/sec   Loss 9.9433   LearningRate 0.3980   Epoch: 6   Global Step: 36130   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:49:30,565-Speed 10516.71 samples/sec   Loss 10.0444   LearningRate 0.3978   Epoch: 6   Global Step: 36140   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:49:38,378-Speed 10486.63 samples/sec   Loss 10.0302   LearningRate 0.3977   Epoch: 6   Global Step: 36150   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:49:46,156-Speed 10538.10 samples/sec   Loss 10.0566   LearningRate 0.3976   Epoch: 6   Global Step: 36160   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:49:53,963-Speed 10493.71 samples/sec   Loss 10.1550   LearningRate 0.3975   Epoch: 6   Global Step: 36170   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:50:01,768-Speed 10497.36 samples/sec   Loss 10.0674   LearningRate 0.3974   Epoch: 6   Global Step: 36180   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:50:09,567-Speed 10505.53 samples/sec   Loss 10.0322   LearningRate 0.3972   Epoch: 6   Global Step: 36190   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:50:17,347-Speed 10531.41 samples/sec   Loss 10.0022   LearningRate 0.3971   Epoch: 6   Global Step: 36200   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:50:25,133-Speed 10522.69 samples/sec   Loss 10.1109   LearningRate 0.3970   Epoch: 6   Global Step: 36210   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:50:32,936-Speed 10500.25 samples/sec   Loss 10.0303   LearningRate 0.3969   Epoch: 6   Global Step: 36220   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:50:40,726-Speed 10517.67 samples/sec   Loss 10.0840   LearningRate 0.3968   Epoch: 6   Global Step: 36230   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:50:48,526-Speed 10504.13 samples/sec   Loss 10.0706   LearningRate 0.3967   Epoch: 6   Global Step: 36240   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:50:56,341-Speed 10483.76 samples/sec   Loss 10.1009   LearningRate 0.3965   Epoch: 6   Global Step: 36250   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:51:04,119-Speed 10532.97 samples/sec   Loss 10.0677   LearningRate 0.3964   Epoch: 6   Global Step: 36260   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:51:11,917-Speed 10507.64 samples/sec   Loss 10.0637   LearningRate 0.3963   Epoch: 6   Global Step: 36270   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:51:19,753-Speed 10455.46 samples/sec   Loss 10.1333   LearningRate 0.3962   Epoch: 6   Global Step: 36280   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:51:27,542-Speed 10518.93 samples/sec   Loss 10.0071   LearningRate 0.3961   Epoch: 6   Global Step: 36290   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:51:50,498-Speed 3568.64 samples/sec   Loss 10.0450   LearningRate 0.3960   Epoch: 7   Global Step: 36300   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:51:58,287-Speed 10520.32 samples/sec   Loss 10.0143   LearningRate 0.3958   Epoch: 7   Global Step: 36310   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:52:06,061-Speed 10539.12 samples/sec   Loss 10.0526   LearningRate 0.3957   Epoch: 7   Global Step: 36320   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:52:13,878-Speed 10480.56 samples/sec   Loss 10.0492   LearningRate 0.3956   Epoch: 7   Global Step: 36330   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:52:21,671-Speed 10513.82 samples/sec   Loss 10.0366   LearningRate 0.3955   Epoch: 7   Global Step: 36340   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:52:29,446-Speed 10538.33 samples/sec   Loss 10.0851   LearningRate 0.3954   Epoch: 7   Global Step: 36350   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:52:37,239-Speed 10512.75 samples/sec   Loss 10.0690   LearningRate 0.3952   Epoch: 7   Global Step: 36360   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:52:45,029-Speed 10516.73 samples/sec   Loss 10.0518   LearningRate 0.3951   Epoch: 7   Global Step: 36370   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:52:52,853-Speed 10472.18 samples/sec   Loss 9.9860   LearningRate 0.3950   Epoch: 7   Global Step: 36380   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:53:00,639-Speed 10523.38 samples/sec   Loss 10.0421   LearningRate 0.3949   Epoch: 7   Global Step: 36390   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:53:08,461-Speed 10474.71 samples/sec   Loss 9.9442   LearningRate 0.3948   Epoch: 7   Global Step: 36400   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:53:16,261-Speed 10503.67 samples/sec   Loss 9.9787   LearningRate 0.3947   Epoch: 7   Global Step: 36410   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:53:24,121-Speed 10423.33 samples/sec   Loss 10.0186   LearningRate 0.3945   Epoch: 7   Global Step: 36420   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:53:31,935-Speed 10485.37 samples/sec   Loss 10.0578   LearningRate 0.3944   Epoch: 7   Global Step: 36430   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:53:39,720-Speed 10524.02 samples/sec   Loss 9.9911   LearningRate 0.3943   Epoch: 7   Global Step: 36440   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:53:47,501-Speed 10529.95 samples/sec   Loss 10.0579   LearningRate 0.3942   Epoch: 7   Global Step: 36450   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:53:55,299-Speed 10505.94 samples/sec   Loss 10.0217   LearningRate 0.3941   Epoch: 7   Global Step: 36460   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:54:03,064-Speed 10551.97 samples/sec   Loss 9.9466   LearningRate 0.3940   Epoch: 7   Global Step: 36470   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:54:10,865-Speed 10503.16 samples/sec   Loss 9.9845   LearningRate 0.3938   Epoch: 7   Global Step: 36480   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:54:18,634-Speed 10545.44 samples/sec   Loss 10.0502   LearningRate 0.3937   Epoch: 7   Global Step: 36490   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:54:26,401-Speed 10549.24 samples/sec   Loss 9.8856   LearningRate 0.3936   Epoch: 7   Global Step: 36500   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:54:34,185-Speed 10526.05 samples/sec   Loss 9.9970   LearningRate 0.3935   Epoch: 7   Global Step: 36510   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:54:41,975-Speed 10516.92 samples/sec   Loss 9.9819   LearningRate 0.3934   Epoch: 7   Global Step: 36520   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:54:49,776-Speed 10501.42 samples/sec   Loss 10.0001   LearningRate 0.3933   Epoch: 7   Global Step: 36530   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:54:57,587-Speed 10489.65 samples/sec   Loss 9.9920   LearningRate 0.3931   Epoch: 7   Global Step: 36540   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:55:05,395-Speed 10494.30 samples/sec   Loss 9.9976   LearningRate 0.3930   Epoch: 7   Global Step: 36550   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:55:13,243-Speed 10438.37 samples/sec   Loss 10.0893   LearningRate 0.3929   Epoch: 7   Global Step: 36560   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:55:21,079-Speed 10456.56 samples/sec   Loss 9.9903   LearningRate 0.3928   Epoch: 7   Global Step: 36570   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:55:28,893-Speed 10485.34 samples/sec   Loss 10.0172   LearningRate 0.3927   Epoch: 7   Global Step: 36580   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:55:36,696-Speed 10500.34 samples/sec   Loss 9.9323   LearningRate 0.3926   Epoch: 7   Global Step: 36590   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:55:44,502-Speed 10495.35 samples/sec   Loss 10.0849   LearningRate 0.3924   Epoch: 7   Global Step: 36600   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:55:52,323-Speed 10475.76 samples/sec   Loss 10.0242   LearningRate 0.3923   Epoch: 7   Global Step: 36610   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:56:00,186-Speed 10419.36 samples/sec   Loss 9.8826   LearningRate 0.3922   Epoch: 7   Global Step: 36620   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:56:08,000-Speed 10486.98 samples/sec   Loss 9.9413   LearningRate 0.3921   Epoch: 7   Global Step: 36630   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:56:15,828-Speed 10466.28 samples/sec   Loss 10.1826   LearningRate 0.3920   Epoch: 7   Global Step: 36640   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:56:23,647-Speed 10478.70 samples/sec   Loss 9.9924   LearningRate 0.3918   Epoch: 7   Global Step: 36650   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:56:31,508-Speed 10422.56 samples/sec   Loss 10.0997   LearningRate 0.3917   Epoch: 7   Global Step: 36660   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:56:39,330-Speed 10480.95 samples/sec   Loss 10.0441   LearningRate 0.3916   Epoch: 7   Global Step: 36670   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:56:47,196-Speed 10416.07 samples/sec   Loss 10.0149   LearningRate 0.3915   Epoch: 7   Global Step: 36680   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:56:55,041-Speed 10443.41 samples/sec   Loss 9.9797   LearningRate 0.3914   Epoch: 7   Global Step: 36690   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:57:02,933-Speed 10381.10 samples/sec   Loss 9.9747   LearningRate 0.3913   Epoch: 7   Global Step: 36700   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:57:10,760-Speed 10468.75 samples/sec   Loss 9.9845   LearningRate 0.3911   Epoch: 7   Global Step: 36710   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:57:18,594-Speed 10457.44 samples/sec   Loss 9.9504   LearningRate 0.3910   Epoch: 7   Global Step: 36720   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:57:26,445-Speed 10436.36 samples/sec   Loss 10.0785   LearningRate 0.3909   Epoch: 7   Global Step: 36730   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:57:34,284-Speed 10452.25 samples/sec   Loss 9.9686   LearningRate 0.3908   Epoch: 7   Global Step: 36740   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:57:42,154-Speed 10411.07 samples/sec   Loss 9.9510   LearningRate 0.3907   Epoch: 7   Global Step: 36750   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:57:50,021-Speed 10413.31 samples/sec   Loss 10.0710   LearningRate 0.3906   Epoch: 7   Global Step: 36760   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:57:57,931-Speed 10358.62 samples/sec   Loss 10.0093   LearningRate 0.3904   Epoch: 7   Global Step: 36770   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:58:05,781-Speed 10437.17 samples/sec   Loss 9.9705   LearningRate 0.3903   Epoch: 7   Global Step: 36780   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:58:13,612-Speed 10463.82 samples/sec   Loss 9.9667   LearningRate 0.3902   Epoch: 7   Global Step: 36790   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:58:21,459-Speed 10440.21 samples/sec   Loss 9.9366   LearningRate 0.3901   Epoch: 7   Global Step: 36800   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 22:58:29,303-Speed 10445.53 samples/sec   Loss 9.9009   LearningRate 0.3900   Epoch: 7   Global Step: 36810   Fp16 Grad Scale: 524288   Required: 15 hours
Training: 2022-01-15 22:58:37,140-Speed 10454.66 samples/sec   Loss 9.9888   LearningRate 0.3899   Epoch: 7   Global Step: 36820   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:58:44,997-Speed 10428.38 samples/sec   Loss 9.9793   LearningRate 0.3897   Epoch: 7   Global Step: 36830   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:58:52,824-Speed 10467.61 samples/sec   Loss 10.0011   LearningRate 0.3896   Epoch: 7   Global Step: 36840   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:59:00,664-Speed 10449.73 samples/sec   Loss 10.0110   LearningRate 0.3895   Epoch: 7   Global Step: 36850   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:59:08,546-Speed 10394.70 samples/sec   Loss 9.9574   LearningRate 0.3894   Epoch: 7   Global Step: 36860   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:59:16,389-Speed 10447.13 samples/sec   Loss 9.9326   LearningRate 0.3893   Epoch: 7   Global Step: 36870   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:59:24,226-Speed 10454.13 samples/sec   Loss 9.9322   LearningRate 0.3892   Epoch: 7   Global Step: 36880   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:59:32,072-Speed 10443.56 samples/sec   Loss 9.9953   LearningRate 0.3890   Epoch: 7   Global Step: 36890   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:59:39,909-Speed 10454.44 samples/sec   Loss 9.9678   LearningRate 0.3889   Epoch: 7   Global Step: 36900   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:59:47,727-Speed 10480.48 samples/sec   Loss 9.9332   LearningRate 0.3888   Epoch: 7   Global Step: 36910   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 22:59:55,545-Speed 10478.84 samples/sec   Loss 9.9655   LearningRate 0.3887   Epoch: 7   Global Step: 36920   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:00:03,396-Speed 10436.54 samples/sec   Loss 9.9476   LearningRate 0.3886   Epoch: 7   Global Step: 36930   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:00:11,239-Speed 10446.20 samples/sec   Loss 9.9339   LearningRate 0.3885   Epoch: 7   Global Step: 36940   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:00:19,074-Speed 10456.90 samples/sec   Loss 9.9024   LearningRate 0.3884   Epoch: 7   Global Step: 36950   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:00:26,924-Speed 10437.88 samples/sec   Loss 9.9812   LearningRate 0.3882   Epoch: 7   Global Step: 36960   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:00:34,774-Speed 10437.26 samples/sec   Loss 10.0086   LearningRate 0.3881   Epoch: 7   Global Step: 36970   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:00:42,619-Speed 10443.63 samples/sec   Loss 9.9440   LearningRate 0.3880   Epoch: 7   Global Step: 36980   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:00:50,480-Speed 10421.56 samples/sec   Loss 9.8873   LearningRate 0.3879   Epoch: 7   Global Step: 36990   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:00:58,322-Speed 10449.35 samples/sec   Loss 9.9303   LearningRate 0.3878   Epoch: 7   Global Step: 37000   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:01:06,167-Speed 10443.19 samples/sec   Loss 9.9216   LearningRate 0.3877   Epoch: 7   Global Step: 37010   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:01:13,977-Speed 10490.07 samples/sec   Loss 9.9267   LearningRate 0.3875   Epoch: 7   Global Step: 37020   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:01:21,776-Speed 10505.72 samples/sec   Loss 9.9687   LearningRate 0.3874   Epoch: 7   Global Step: 37030   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:01:29,596-Speed 10477.02 samples/sec   Loss 10.0431   LearningRate 0.3873   Epoch: 7   Global Step: 37040   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:01:37,404-Speed 10493.16 samples/sec   Loss 9.9566   LearningRate 0.3872   Epoch: 7   Global Step: 37050   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:01:45,213-Speed 10492.25 samples/sec   Loss 9.8926   LearningRate 0.3871   Epoch: 7   Global Step: 37060   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:01:53,009-Speed 10509.21 samples/sec   Loss 9.9643   LearningRate 0.3870   Epoch: 7   Global Step: 37070   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:02:00,823-Speed 10485.19 samples/sec   Loss 9.9351   LearningRate 0.3868   Epoch: 7   Global Step: 37080   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:02:08,641-Speed 10482.16 samples/sec   Loss 9.8978   LearningRate 0.3867   Epoch: 7   Global Step: 37090   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:02:16,457-Speed 10482.54 samples/sec   Loss 9.9295   LearningRate 0.3866   Epoch: 7   Global Step: 37100   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:02:24,270-Speed 10486.54 samples/sec   Loss 9.8710   LearningRate 0.3865   Epoch: 7   Global Step: 37110   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:02:32,124-Speed 10430.91 samples/sec   Loss 9.9073   LearningRate 0.3864   Epoch: 7   Global Step: 37120   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:02:39,930-Speed 10496.58 samples/sec   Loss 9.9475   LearningRate 0.3863   Epoch: 7   Global Step: 37130   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:02:47,727-Speed 10507.91 samples/sec   Loss 9.9075   LearningRate 0.3861   Epoch: 7   Global Step: 37140   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:02:55,556-Speed 10465.98 samples/sec   Loss 9.8430   LearningRate 0.3860   Epoch: 7   Global Step: 37150   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:03:03,367-Speed 10488.33 samples/sec   Loss 9.9925   LearningRate 0.3859   Epoch: 7   Global Step: 37160   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:03:11,193-Speed 10469.02 samples/sec   Loss 9.9474   LearningRate 0.3858   Epoch: 7   Global Step: 37170   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:03:19,015-Speed 10475.19 samples/sec   Loss 9.9262   LearningRate 0.3857   Epoch: 7   Global Step: 37180   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:03:26,830-Speed 10483.14 samples/sec   Loss 9.9680   LearningRate 0.3856   Epoch: 7   Global Step: 37190   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:03:34,624-Speed 10512.63 samples/sec   Loss 9.8878   LearningRate 0.3854   Epoch: 7   Global Step: 37200   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:03:42,413-Speed 10518.24 samples/sec   Loss 9.8864   LearningRate 0.3853   Epoch: 7   Global Step: 37210   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:03:50,216-Speed 10500.65 samples/sec   Loss 9.9425   LearningRate 0.3852   Epoch: 7   Global Step: 37220   Fp16 Grad Scale: 524288   Required: 15 hours
Training: 2022-01-15 23:03:57,990-Speed 10538.79 samples/sec   Loss 9.9396   LearningRate 0.3851   Epoch: 7   Global Step: 37230   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:04:05,808-Speed 10480.20 samples/sec   Loss 9.9278   LearningRate 0.3850   Epoch: 7   Global Step: 37240   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:04:13,622-Speed 10484.85 samples/sec   Loss 9.9379   LearningRate 0.3849   Epoch: 7   Global Step: 37250   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:04:21,423-Speed 10502.79 samples/sec   Loss 9.8651   LearningRate 0.3848   Epoch: 7   Global Step: 37260   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:04:29,217-Speed 10512.77 samples/sec   Loss 9.8628   LearningRate 0.3846   Epoch: 7   Global Step: 37270   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:04:37,044-Speed 10467.43 samples/sec   Loss 9.7972   LearningRate 0.3845   Epoch: 7   Global Step: 37280   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:04:44,827-Speed 10526.47 samples/sec   Loss 9.7992   LearningRate 0.3844   Epoch: 7   Global Step: 37290   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 23:04:52,621-Speed 10512.49 samples/sec   Loss 9.9416   LearningRate 0.3843   Epoch: 7   Global Step: 37300   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 23:05:00,400-Speed 10533.60 samples/sec   Loss 9.8575   LearningRate 0.3842   Epoch: 7   Global Step: 37310   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 23:05:08,221-Speed 10476.21 samples/sec   Loss 9.8523   LearningRate 0.3841   Epoch: 7   Global Step: 37320   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 23:05:16,009-Speed 10519.56 samples/sec   Loss 9.9159   LearningRate 0.3839   Epoch: 7   Global Step: 37330   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 23:05:23,804-Speed 10511.20 samples/sec   Loss 9.9134   LearningRate 0.3838   Epoch: 7   Global Step: 37340   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 23:05:31,601-Speed 10508.32 samples/sec   Loss 9.8287   LearningRate 0.3837   Epoch: 7   Global Step: 37350   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 23:05:39,406-Speed 10497.15 samples/sec   Loss 9.9629   LearningRate 0.3836   Epoch: 7   Global Step: 37360   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 23:05:47,215-Speed 10492.79 samples/sec   Loss 9.8865   LearningRate 0.3835   Epoch: 7   Global Step: 37370   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 23:05:55,023-Speed 10493.31 samples/sec   Loss 9.8047   LearningRate 0.3834   Epoch: 7   Global Step: 37380   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 23:06:02,821-Speed 10506.53 samples/sec   Loss 9.8426   LearningRate 0.3832   Epoch: 7   Global Step: 37390   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:06:10,646-Speed 10470.86 samples/sec   Loss 9.8667   LearningRate 0.3831   Epoch: 7   Global Step: 37400   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:06:18,460-Speed 10486.03 samples/sec   Loss 9.8645   LearningRate 0.3830   Epoch: 7   Global Step: 37410   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:06:26,247-Speed 10519.96 samples/sec   Loss 9.9642   LearningRate 0.3829   Epoch: 7   Global Step: 37420   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:06:34,037-Speed 10518.45 samples/sec   Loss 9.8919   LearningRate 0.3828   Epoch: 7   Global Step: 37430   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:06:41,844-Speed 10494.46 samples/sec   Loss 9.9075   LearningRate 0.3827   Epoch: 7   Global Step: 37440   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:06:49,641-Speed 10508.29 samples/sec   Loss 9.8416   LearningRate 0.3826   Epoch: 7   Global Step: 37450   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:06:57,432-Speed 10515.54 samples/sec   Loss 9.8876   LearningRate 0.3824   Epoch: 7   Global Step: 37460   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:07:05,260-Speed 10472.04 samples/sec   Loss 9.8873   LearningRate 0.3823   Epoch: 7   Global Step: 37470   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:07:13,042-Speed 10528.78 samples/sec   Loss 9.8339   LearningRate 0.3822   Epoch: 7   Global Step: 37480   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:07:20,826-Speed 10525.20 samples/sec   Loss 9.8649   LearningRate 0.3821   Epoch: 7   Global Step: 37490   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:07:28,629-Speed 10499.17 samples/sec   Loss 9.9124   LearningRate 0.3820   Epoch: 7   Global Step: 37500   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 23:07:36,426-Speed 10508.32 samples/sec   Loss 9.9359   LearningRate 0.3819   Epoch: 7   Global Step: 37510   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 23:07:44,240-Speed 10485.99 samples/sec   Loss 9.8840   LearningRate 0.3817   Epoch: 7   Global Step: 37520   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 23:07:52,077-Speed 10453.50 samples/sec   Loss 9.8947   LearningRate 0.3816   Epoch: 7   Global Step: 37530   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 23:07:59,907-Speed 10464.96 samples/sec   Loss 9.9020   LearningRate 0.3815   Epoch: 7   Global Step: 37540   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 23:08:07,707-Speed 10503.99 samples/sec   Loss 9.8044   LearningRate 0.3814   Epoch: 7   Global Step: 37550   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 23:08:15,521-Speed 10486.11 samples/sec   Loss 9.8448   LearningRate 0.3813   Epoch: 7   Global Step: 37560   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 23:08:23,344-Speed 10472.19 samples/sec   Loss 9.8240   LearningRate 0.3812   Epoch: 7   Global Step: 37570   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 23:08:31,170-Speed 10469.60 samples/sec   Loss 9.7901   LearningRate 0.3811   Epoch: 7   Global Step: 37580   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 23:08:39,001-Speed 10461.79 samples/sec   Loss 9.8384   LearningRate 0.3809   Epoch: 7   Global Step: 37590   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-15 23:08:46,834-Speed 10460.21 samples/sec   Loss 9.8173   LearningRate 0.3808   Epoch: 7   Global Step: 37600   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:08:54,632-Speed 10505.96 samples/sec   Loss 9.8001   LearningRate 0.3807   Epoch: 7   Global Step: 37610   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:09:02,468-Speed 10456.67 samples/sec   Loss 9.8705   LearningRate 0.3806   Epoch: 7   Global Step: 37620   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:09:10,277-Speed 10491.55 samples/sec   Loss 9.8713   LearningRate 0.3805   Epoch: 7   Global Step: 37630   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:09:18,058-Speed 10529.84 samples/sec   Loss 9.8782   LearningRate 0.3804   Epoch: 7   Global Step: 37640   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:09:25,863-Speed 10497.71 samples/sec   Loss 9.8786   LearningRate 0.3802   Epoch: 7   Global Step: 37650   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:09:33,632-Speed 10545.20 samples/sec   Loss 9.8305   LearningRate 0.3801   Epoch: 7   Global Step: 37660   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:09:41,419-Speed 10521.46 samples/sec   Loss 9.9002   LearningRate 0.3800   Epoch: 7   Global Step: 37670   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:09:49,227-Speed 10492.60 samples/sec   Loss 9.8674   LearningRate 0.3799   Epoch: 7   Global Step: 37680   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-15 23:09:57,011-Speed 10525.80 samples/sec   Loss 9.8102   LearningRate 0.3798   Epoch: 7   Global Step: 37690   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:10:04,797-Speed 10522.81 samples/sec   Loss 9.8884   LearningRate 0.3797   Epoch: 7   Global Step: 37700   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:10:12,597-Speed 10504.15 samples/sec   Loss 9.8124   LearningRate 0.3796   Epoch: 7   Global Step: 37710   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:10:20,422-Speed 10471.53 samples/sec   Loss 9.8204   LearningRate 0.3794   Epoch: 7   Global Step: 37720   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:10:28,226-Speed 10497.65 samples/sec   Loss 9.8198   LearningRate 0.3793   Epoch: 7   Global Step: 37730   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:10:36,055-Speed 10465.01 samples/sec   Loss 9.8086   LearningRate 0.3792   Epoch: 7   Global Step: 37740   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:10:43,899-Speed 10445.75 samples/sec   Loss 9.9178   LearningRate 0.3791   Epoch: 7   Global Step: 37750   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:10:51,684-Speed 10523.33 samples/sec   Loss 9.8763   LearningRate 0.3790   Epoch: 7   Global Step: 37760   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:10:59,471-Speed 10522.56 samples/sec   Loss 9.8202   LearningRate 0.3789   Epoch: 7   Global Step: 37770   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:11:07,256-Speed 10524.53 samples/sec   Loss 9.7694   LearningRate 0.3787   Epoch: 7   Global Step: 37780   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:11:15,059-Speed 10500.02 samples/sec   Loss 9.7551   LearningRate 0.3786   Epoch: 7   Global Step: 37790   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:11:22,851-Speed 10516.16 samples/sec   Loss 9.8306   LearningRate 0.3785   Epoch: 7   Global Step: 37800   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:11:30,665-Speed 10485.00 samples/sec   Loss 9.8317   LearningRate 0.3784   Epoch: 7   Global Step: 37810   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:11:38,468-Speed 10500.02 samples/sec   Loss 9.9081   LearningRate 0.3783   Epoch: 7   Global Step: 37820   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:11:46,259-Speed 10515.80 samples/sec   Loss 9.8418   LearningRate 0.3782   Epoch: 7   Global Step: 37830   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:11:54,043-Speed 10525.72 samples/sec   Loss 9.8240   LearningRate 0.3781   Epoch: 7   Global Step: 37840   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:12:01,826-Speed 10526.36 samples/sec   Loss 9.7961   LearningRate 0.3779   Epoch: 7   Global Step: 37850   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:12:09,624-Speed 10508.34 samples/sec   Loss 9.8021   LearningRate 0.3778   Epoch: 7   Global Step: 37860   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:12:17,449-Speed 10470.04 samples/sec   Loss 9.7585   LearningRate 0.3777   Epoch: 7   Global Step: 37870   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:12:25,265-Speed 10483.38 samples/sec   Loss 9.8425   LearningRate 0.3776   Epoch: 7   Global Step: 37880   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:12:33,067-Speed 10502.45 samples/sec   Loss 9.7681   LearningRate 0.3775   Epoch: 7   Global Step: 37890   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:12:40,932-Speed 10416.45 samples/sec   Loss 9.7741   LearningRate 0.3774   Epoch: 7   Global Step: 37900   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:12:48,721-Speed 10518.74 samples/sec   Loss 9.7527   LearningRate 0.3773   Epoch: 7   Global Step: 37910   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:12:56,497-Speed 10536.80 samples/sec   Loss 9.7733   LearningRate 0.3771   Epoch: 7   Global Step: 37920   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:13:04,320-Speed 10472.77 samples/sec   Loss 9.8460   LearningRate 0.3770   Epoch: 7   Global Step: 37930   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:13:12,142-Speed 10474.99 samples/sec   Loss 9.8017   LearningRate 0.3769   Epoch: 7   Global Step: 37940   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:13:19,930-Speed 10520.03 samples/sec   Loss 9.8121   LearningRate 0.3768   Epoch: 7   Global Step: 37950   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:13:27,745-Speed 10483.88 samples/sec   Loss 9.7744   LearningRate 0.3767   Epoch: 7   Global Step: 37960   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:13:35,533-Speed 10520.28 samples/sec   Loss 9.7955   LearningRate 0.3766   Epoch: 7   Global Step: 37970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-15 23:13:43,325-Speed 10514.31 samples/sec   Loss 9.7520   LearningRate 0.3765   Epoch: 7   Global Step: 37980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-15 23:13:51,153-Speed 10466.09 samples/sec   Loss 9.7770   LearningRate 0.3763   Epoch: 7   Global Step: 37990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-15 23:13:58,930-Speed 10534.88 samples/sec   Loss 9.7686   LearningRate 0.3762   Epoch: 7   Global Step: 38000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-15 23:14:06,729-Speed 10505.64 samples/sec   Loss 9.7108   LearningRate 0.3761   Epoch: 7   Global Step: 38010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-15 23:14:14,524-Speed 10510.31 samples/sec   Loss 9.7651   LearningRate 0.3760   Epoch: 7   Global Step: 38020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-15 23:14:22,370-Speed 10442.78 samples/sec   Loss 9.8598   LearningRate 0.3759   Epoch: 7   Global Step: 38030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-15 23:14:30,167-Speed 10507.42 samples/sec   Loss 9.8156   LearningRate 0.3758   Epoch: 7   Global Step: 38040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-15 23:14:37,956-Speed 10518.40 samples/sec   Loss 9.7905   LearningRate 0.3757   Epoch: 7   Global Step: 38050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-15 23:14:45,766-Speed 10491.41 samples/sec   Loss 9.7582   LearningRate 0.3755   Epoch: 7   Global Step: 38060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-15 23:14:53,555-Speed 10518.86 samples/sec   Loss 9.7677   LearningRate 0.3754   Epoch: 7   Global Step: 38070   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:15:01,457-Speed 10368.48 samples/sec   Loss 9.7089   LearningRate 0.3753   Epoch: 7   Global Step: 38080   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:15:09,324-Speed 10414.19 samples/sec   Loss 9.7577   LearningRate 0.3752   Epoch: 7   Global Step: 38090   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:15:17,134-Speed 10490.61 samples/sec   Loss 9.7835   LearningRate 0.3751   Epoch: 7   Global Step: 38100   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:15:24,955-Speed 10475.43 samples/sec   Loss 9.7860   LearningRate 0.3750   Epoch: 7   Global Step: 38110   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:15:32,760-Speed 10496.98 samples/sec   Loss 9.8778   LearningRate 0.3749   Epoch: 7   Global Step: 38120   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:15:40,557-Speed 10508.26 samples/sec   Loss 9.7577   LearningRate 0.3747   Epoch: 7   Global Step: 38130   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:15:48,371-Speed 10485.80 samples/sec   Loss 9.7785   LearningRate 0.3746   Epoch: 7   Global Step: 38140   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:15:56,171-Speed 10503.84 samples/sec   Loss 9.7697   LearningRate 0.3745   Epoch: 7   Global Step: 38150   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:16:03,972-Speed 10502.10 samples/sec   Loss 9.7760   LearningRate 0.3744   Epoch: 7   Global Step: 38160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-15 23:16:11,795-Speed 10472.82 samples/sec   Loss 9.8586   LearningRate 0.3743   Epoch: 7   Global Step: 38170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-15 23:16:19,625-Speed 10464.77 samples/sec   Loss 9.7258   LearningRate 0.3742   Epoch: 7   Global Step: 38180   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-15 23:16:27,438-Speed 10485.59 samples/sec   Loss 9.6733   LearningRate 0.3741   Epoch: 7   Global Step: 38190   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-15 23:16:35,230-Speed 10515.98 samples/sec   Loss 9.6947   LearningRate 0.3739   Epoch: 7   Global Step: 38200   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-15 23:16:43,067-Speed 10454.94 samples/sec   Loss 9.7407   LearningRate 0.3738   Epoch: 7   Global Step: 38210   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-15 23:16:50,878-Speed 10489.75 samples/sec   Loss 9.7334   LearningRate 0.3737   Epoch: 7   Global Step: 38220   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-15 23:16:58,688-Speed 10491.26 samples/sec   Loss 9.6989   LearningRate 0.3736   Epoch: 7   Global Step: 38230   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-15 23:17:06,518-Speed 10462.79 samples/sec   Loss 9.7370   LearningRate 0.3735   Epoch: 7   Global Step: 38240   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-15 23:17:14,314-Speed 10509.85 samples/sec   Loss 9.6990   LearningRate 0.3734   Epoch: 7   Global Step: 38250   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-15 23:17:22,126-Speed 10487.96 samples/sec   Loss 9.7854   LearningRate 0.3733   Epoch: 7   Global Step: 38260   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-15 23:17:30,001-Speed 10404.95 samples/sec   Loss 9.6897   LearningRate 0.3731   Epoch: 7   Global Step: 38270   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-15 23:17:37,885-Speed 10391.35 samples/sec   Loss 9.7765   LearningRate 0.3730   Epoch: 7   Global Step: 38280   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-15 23:17:45,693-Speed 10493.14 samples/sec   Loss 9.7403   LearningRate 0.3729   Epoch: 7   Global Step: 38290   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-15 23:17:53,517-Speed 10471.73 samples/sec   Loss 9.7033   LearningRate 0.3728   Epoch: 7   Global Step: 38300   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-15 23:18:01,325-Speed 10492.99 samples/sec   Loss 9.7257   LearningRate 0.3727   Epoch: 7   Global Step: 38310   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-15 23:18:09,106-Speed 10529.60 samples/sec   Loss 9.6460   LearningRate 0.3726   Epoch: 7   Global Step: 38320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-15 23:18:16,901-Speed 10511.45 samples/sec   Loss 9.7566   LearningRate 0.3725   Epoch: 7   Global Step: 38330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-15 23:18:24,743-Speed 10447.58 samples/sec   Loss 9.8122   LearningRate 0.3723   Epoch: 7   Global Step: 38340   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-15 23:18:32,553-Speed 10490.68 samples/sec   Loss 9.7605   LearningRate 0.3722   Epoch: 7   Global Step: 38350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-15 23:18:40,331-Speed 10533.46 samples/sec   Loss 9.7273   LearningRate 0.3721   Epoch: 7   Global Step: 38360   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:18:48,156-Speed 10470.36 samples/sec   Loss 9.7573   LearningRate 0.3720   Epoch: 7   Global Step: 38370   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:18:55,989-Speed 10460.13 samples/sec   Loss 9.6641   LearningRate 0.3719   Epoch: 7   Global Step: 38380   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:19:03,829-Speed 10451.24 samples/sec   Loss 9.7404   LearningRate 0.3718   Epoch: 7   Global Step: 38390   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:19:11,611-Speed 10527.99 samples/sec   Loss 9.7243   LearningRate 0.3717   Epoch: 7   Global Step: 38400   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:19:19,406-Speed 10510.60 samples/sec   Loss 9.6134   LearningRate 0.3715   Epoch: 7   Global Step: 38410   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:19:27,247-Speed 10449.99 samples/sec   Loss 9.7346   LearningRate 0.3714   Epoch: 7   Global Step: 38420   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:19:35,087-Speed 10450.29 samples/sec   Loss 9.6899   LearningRate 0.3713   Epoch: 7   Global Step: 38430   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:19:42,876-Speed 10517.31 samples/sec   Loss 9.6771   LearningRate 0.3712   Epoch: 7   Global Step: 38440   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:19:50,674-Speed 10507.62 samples/sec   Loss 9.7469   LearningRate 0.3711   Epoch: 7   Global Step: 38450   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:19:58,480-Speed 10496.01 samples/sec   Loss 9.8443   LearningRate 0.3710   Epoch: 7   Global Step: 38460   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:20:06,272-Speed 10514.05 samples/sec   Loss 9.7056   LearningRate 0.3709   Epoch: 7   Global Step: 38470   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:20:14,078-Speed 10496.64 samples/sec   Loss 9.6756   LearningRate 0.3707   Epoch: 7   Global Step: 38480   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:20:21,894-Speed 10482.34 samples/sec   Loss 9.7279   LearningRate 0.3706   Epoch: 7   Global Step: 38490   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:20:29,760-Speed 10420.25 samples/sec   Loss 9.7143   LearningRate 0.3705   Epoch: 7   Global Step: 38500   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:20:37,578-Speed 10480.82 samples/sec   Loss 9.7421   LearningRate 0.3704   Epoch: 7   Global Step: 38510   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:20:45,384-Speed 10495.84 samples/sec   Loss 9.6715   LearningRate 0.3703   Epoch: 7   Global Step: 38520   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:20:53,186-Speed 10500.55 samples/sec   Loss 9.6813   LearningRate 0.3702   Epoch: 7   Global Step: 38530   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:21:00,984-Speed 10507.05 samples/sec   Loss 9.7665   LearningRate 0.3701   Epoch: 7   Global Step: 38540   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:21:08,773-Speed 10518.31 samples/sec   Loss 9.6886   LearningRate 0.3700   Epoch: 7   Global Step: 38550   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:21:16,578-Speed 10498.06 samples/sec   Loss 9.6870   LearningRate 0.3698   Epoch: 7   Global Step: 38560   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:21:24,401-Speed 10472.10 samples/sec   Loss 9.6252   LearningRate 0.3697   Epoch: 7   Global Step: 38570   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:21:32,206-Speed 10498.02 samples/sec   Loss 9.7309   LearningRate 0.3696   Epoch: 7   Global Step: 38580   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:21:40,005-Speed 10505.36 samples/sec   Loss 9.7243   LearningRate 0.3695   Epoch: 7   Global Step: 38590   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:21:47,810-Speed 10496.15 samples/sec   Loss 9.6857   LearningRate 0.3694   Epoch: 7   Global Step: 38600   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:21:55,610-Speed 10503.87 samples/sec   Loss 9.7554   LearningRate 0.3693   Epoch: 7   Global Step: 38610   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:22:03,414-Speed 10500.47 samples/sec   Loss 9.6648   LearningRate 0.3692   Epoch: 7   Global Step: 38620   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:22:11,217-Speed 10498.86 samples/sec   Loss 9.7663   LearningRate 0.3690   Epoch: 7   Global Step: 38630   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:22:19,004-Speed 10521.59 samples/sec   Loss 9.6363   LearningRate 0.3689   Epoch: 7   Global Step: 38640   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:22:26,799-Speed 10510.15 samples/sec   Loss 9.7505   LearningRate 0.3688   Epoch: 7   Global Step: 38650   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:22:34,618-Speed 10481.85 samples/sec   Loss 9.7064   LearningRate 0.3687   Epoch: 7   Global Step: 38660   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:22:42,454-Speed 10455.81 samples/sec   Loss 9.6924   LearningRate 0.3686   Epoch: 7   Global Step: 38670   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:22:50,255-Speed 10501.55 samples/sec   Loss 9.7104   LearningRate 0.3685   Epoch: 7   Global Step: 38680   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:22:58,068-Speed 10487.33 samples/sec   Loss 9.6730   LearningRate 0.3684   Epoch: 7   Global Step: 38690   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:23:05,878-Speed 10490.28 samples/sec   Loss 9.6557   LearningRate 0.3682   Epoch: 7   Global Step: 38700   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:23:13,684-Speed 10495.70 samples/sec   Loss 9.6612   LearningRate 0.3681   Epoch: 7   Global Step: 38710   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:23:21,477-Speed 10513.65 samples/sec   Loss 9.6560   LearningRate 0.3680   Epoch: 7   Global Step: 38720   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:23:29,266-Speed 10519.63 samples/sec   Loss 9.7321   LearningRate 0.3679   Epoch: 7   Global Step: 38730   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:23:37,053-Speed 10520.75 samples/sec   Loss 9.7296   LearningRate 0.3678   Epoch: 7   Global Step: 38740   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:23:44,847-Speed 10512.76 samples/sec   Loss 9.7706   LearningRate 0.3677   Epoch: 7   Global Step: 38750   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:23:52,667-Speed 10475.93 samples/sec   Loss 9.6674   LearningRate 0.3676   Epoch: 7   Global Step: 38760   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:24:00,466-Speed 10506.41 samples/sec   Loss 9.6779   LearningRate 0.3675   Epoch: 7   Global Step: 38770   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:24:08,260-Speed 10511.80 samples/sec   Loss 9.6190   LearningRate 0.3673   Epoch: 7   Global Step: 38780   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:24:16,048-Speed 10519.96 samples/sec   Loss 9.6659   LearningRate 0.3672   Epoch: 7   Global Step: 38790   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:24:23,827-Speed 10532.50 samples/sec   Loss 9.6379   LearningRate 0.3671   Epoch: 7   Global Step: 38800   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:24:31,631-Speed 10499.12 samples/sec   Loss 9.7186   LearningRate 0.3670   Epoch: 7   Global Step: 38810   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:24:39,423-Speed 10515.11 samples/sec   Loss 9.6425   LearningRate 0.3669   Epoch: 7   Global Step: 38820   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:24:47,234-Speed 10488.93 samples/sec   Loss 9.6253   LearningRate 0.3668   Epoch: 7   Global Step: 38830   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:24:55,019-Speed 10522.55 samples/sec   Loss 9.6789   LearningRate 0.3667   Epoch: 7   Global Step: 38840   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:25:02,848-Speed 10465.96 samples/sec   Loss 9.6450   LearningRate 0.3666   Epoch: 7   Global Step: 38850   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:25:10,675-Speed 10468.87 samples/sec   Loss 9.6634   LearningRate 0.3664   Epoch: 7   Global Step: 38860   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:25:18,497-Speed 10473.51 samples/sec   Loss 9.6520   LearningRate 0.3663   Epoch: 7   Global Step: 38870   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:25:26,312-Speed 10483.71 samples/sec   Loss 9.6807   LearningRate 0.3662   Epoch: 7   Global Step: 38880   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:25:34,113-Speed 10503.53 samples/sec   Loss 9.6501   LearningRate 0.3661   Epoch: 7   Global Step: 38890   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:25:41,924-Speed 10488.31 samples/sec   Loss 9.6083   LearningRate 0.3660   Epoch: 7   Global Step: 38900   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:25:49,750-Speed 10469.62 samples/sec   Loss 9.6694   LearningRate 0.3659   Epoch: 7   Global Step: 38910   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:25:57,583-Speed 10460.24 samples/sec   Loss 9.6463   LearningRate 0.3658   Epoch: 7   Global Step: 38920   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:26:05,406-Speed 10472.41 samples/sec   Loss 9.7194   LearningRate 0.3656   Epoch: 7   Global Step: 38930   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:26:13,241-Speed 10457.34 samples/sec   Loss 9.6408   LearningRate 0.3655   Epoch: 7   Global Step: 38940   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:26:21,059-Speed 10479.53 samples/sec   Loss 9.6719   LearningRate 0.3654   Epoch: 7   Global Step: 38950   Fp16 Grad Scale: 524288   Required: 14 hours
Training: 2022-01-15 23:26:28,842-Speed 10527.35 samples/sec   Loss 9.6686   LearningRate 0.3653   Epoch: 7   Global Step: 38960   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:26:36,625-Speed 10526.50 samples/sec   Loss 9.6503   LearningRate 0.3652   Epoch: 7   Global Step: 38970   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:26:44,433-Speed 10493.00 samples/sec   Loss 9.6186   LearningRate 0.3651   Epoch: 7   Global Step: 38980   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:26:52,250-Speed 10481.04 samples/sec   Loss 9.6504   LearningRate 0.3650   Epoch: 7   Global Step: 38990   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:27:00,048-Speed 10507.26 samples/sec   Loss 9.6293   LearningRate 0.3649   Epoch: 7   Global Step: 39000   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:27:07,839-Speed 10516.90 samples/sec   Loss 9.6685   LearningRate 0.3647   Epoch: 7   Global Step: 39010   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:27:15,627-Speed 10519.79 samples/sec   Loss 9.6778   LearningRate 0.3646   Epoch: 7   Global Step: 39020   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:27:23,438-Speed 10488.66 samples/sec   Loss 9.6395   LearningRate 0.3645   Epoch: 7   Global Step: 39030   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:27:31,249-Speed 10489.16 samples/sec   Loss 9.6574   LearningRate 0.3644   Epoch: 7   Global Step: 39040   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:27:39,042-Speed 10514.18 samples/sec   Loss 9.6950   LearningRate 0.3643   Epoch: 7   Global Step: 39050   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:27:46,864-Speed 10473.75 samples/sec   Loss 9.6484   LearningRate 0.3642   Epoch: 7   Global Step: 39060   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:27:54,651-Speed 10520.94 samples/sec   Loss 9.6208   LearningRate 0.3641   Epoch: 7   Global Step: 39070   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:28:02,450-Speed 10505.74 samples/sec   Loss 9.6107   LearningRate 0.3640   Epoch: 7   Global Step: 39080   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:28:10,254-Speed 10498.84 samples/sec   Loss 9.6182   LearningRate 0.3638   Epoch: 7   Global Step: 39090   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:28:18,063-Speed 10493.01 samples/sec   Loss 9.6345   LearningRate 0.3637   Epoch: 7   Global Step: 39100   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:28:25,871-Speed 10492.67 samples/sec   Loss 9.5895   LearningRate 0.3636   Epoch: 7   Global Step: 39110   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:28:33,681-Speed 10489.64 samples/sec   Loss 9.6571   LearningRate 0.3635   Epoch: 7   Global Step: 39120   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:28:41,460-Speed 10533.00 samples/sec   Loss 9.6567   LearningRate 0.3634   Epoch: 7   Global Step: 39130   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:28:49,264-Speed 10498.46 samples/sec   Loss 9.6140   LearningRate 0.3633   Epoch: 7   Global Step: 39140   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:28:57,061-Speed 10508.14 samples/sec   Loss 9.6451   LearningRate 0.3632   Epoch: 7   Global Step: 39150   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:29:04,870-Speed 10491.80 samples/sec   Loss 9.5767   LearningRate 0.3631   Epoch: 7   Global Step: 39160   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:29:12,692-Speed 10477.45 samples/sec   Loss 9.6520   LearningRate 0.3629   Epoch: 7   Global Step: 39170   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:29:20,481-Speed 10518.92 samples/sec   Loss 9.6383   LearningRate 0.3628   Epoch: 7   Global Step: 39180   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:29:28,266-Speed 10524.71 samples/sec   Loss 9.6072   LearningRate 0.3627   Epoch: 7   Global Step: 39190   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:29:36,100-Speed 10457.22 samples/sec   Loss 9.6869   LearningRate 0.3626   Epoch: 7   Global Step: 39200   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:29:43,908-Speed 10492.78 samples/sec   Loss 9.6555   LearningRate 0.3625   Epoch: 7   Global Step: 39210   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:29:51,714-Speed 10495.88 samples/sec   Loss 9.6444   LearningRate 0.3624   Epoch: 7   Global Step: 39220   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:29:59,516-Speed 10501.98 samples/sec   Loss 9.5714   LearningRate 0.3623   Epoch: 7   Global Step: 39230   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:30:07,338-Speed 10474.18 samples/sec   Loss 9.5702   LearningRate 0.3622   Epoch: 7   Global Step: 39240   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:30:15,124-Speed 10523.22 samples/sec   Loss 9.5067   LearningRate 0.3620   Epoch: 7   Global Step: 39250   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:30:22,906-Speed 10527.87 samples/sec   Loss 9.6641   LearningRate 0.3619   Epoch: 7   Global Step: 39260   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:30:30,689-Speed 10527.67 samples/sec   Loss 9.7016   LearningRate 0.3618   Epoch: 7   Global Step: 39270   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:30:38,478-Speed 10518.36 samples/sec   Loss 9.5905   LearningRate 0.3617   Epoch: 7   Global Step: 39280   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:30:46,278-Speed 10503.25 samples/sec   Loss 9.6221   LearningRate 0.3616   Epoch: 7   Global Step: 39290   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:30:54,081-Speed 10501.14 samples/sec   Loss 9.6274   LearningRate 0.3615   Epoch: 7   Global Step: 39300   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:31:01,898-Speed 10481.63 samples/sec   Loss 9.5320   LearningRate 0.3614   Epoch: 7   Global Step: 39310   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:31:09,737-Speed 10451.05 samples/sec   Loss 9.6100   LearningRate 0.3613   Epoch: 7   Global Step: 39320   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:31:17,541-Speed 10498.56 samples/sec   Loss 9.5873   LearningRate 0.3611   Epoch: 7   Global Step: 39330   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:31:25,344-Speed 10500.63 samples/sec   Loss 9.5586   LearningRate 0.3610   Epoch: 7   Global Step: 39340   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:31:33,141-Speed 10507.55 samples/sec   Loss 9.5415   LearningRate 0.3609   Epoch: 7   Global Step: 39350   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:31:40,929-Speed 10520.90 samples/sec   Loss 9.5303   LearningRate 0.3608   Epoch: 7   Global Step: 39360   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:31:48,736-Speed 10493.73 samples/sec   Loss 9.6018   LearningRate 0.3607   Epoch: 7   Global Step: 39370   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:31:56,545-Speed 10492.34 samples/sec   Loss 9.6654   LearningRate 0.3606   Epoch: 7   Global Step: 39380   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:32:04,334-Speed 10518.82 samples/sec   Loss 9.6428   LearningRate 0.3605   Epoch: 7   Global Step: 39390   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:32:12,115-Speed 10530.03 samples/sec   Loss 9.6620   LearningRate 0.3604   Epoch: 7   Global Step: 39400   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:32:19,904-Speed 10518.52 samples/sec   Loss 9.6622   LearningRate 0.3602   Epoch: 7   Global Step: 39410   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:32:27,696-Speed 10514.81 samples/sec   Loss 9.5463   LearningRate 0.3601   Epoch: 7   Global Step: 39420   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:32:35,521-Speed 10470.53 samples/sec   Loss 9.5403   LearningRate 0.3600   Epoch: 7   Global Step: 39430   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:32:43,301-Speed 10531.05 samples/sec   Loss 9.4992   LearningRate 0.3599   Epoch: 7   Global Step: 39440   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:32:51,129-Speed 10467.10 samples/sec   Loss 9.5790   LearningRate 0.3598   Epoch: 7   Global Step: 39450   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:32:58,952-Speed 10472.90 samples/sec   Loss 9.5538   LearningRate 0.3597   Epoch: 7   Global Step: 39460   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:33:06,762-Speed 10490.42 samples/sec   Loss 9.5832   LearningRate 0.3596   Epoch: 7   Global Step: 39470   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:33:14,552-Speed 10517.44 samples/sec   Loss 9.4907   LearningRate 0.3595   Epoch: 7   Global Step: 39480   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:33:22,350-Speed 10507.20 samples/sec   Loss 9.5335   LearningRate 0.3593   Epoch: 7   Global Step: 39490   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:33:30,140-Speed 10516.58 samples/sec   Loss 9.5994   LearningRate 0.3592   Epoch: 7   Global Step: 39500   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:33:38,000-Speed 10424.00 samples/sec   Loss 9.5468   LearningRate 0.3591   Epoch: 7   Global Step: 39510   Fp16 Grad Scale: 524288   Required: 14 hours
Training: 2022-01-15 23:33:45,825-Speed 10470.59 samples/sec   Loss 9.6292   LearningRate 0.3590   Epoch: 7   Global Step: 39520   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:33:53,625-Speed 10505.22 samples/sec   Loss 9.5422   LearningRate 0.3589   Epoch: 7   Global Step: 39530   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:34:01,411-Speed 10523.53 samples/sec   Loss 9.5555   LearningRate 0.3588   Epoch: 7   Global Step: 39540   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:34:09,231-Speed 10476.29 samples/sec   Loss 9.5569   LearningRate 0.3587   Epoch: 7   Global Step: 39550   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:34:17,081-Speed 10436.95 samples/sec   Loss 9.5600   LearningRate 0.3586   Epoch: 7   Global Step: 39560   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:34:24,859-Speed 10533.76 samples/sec   Loss 9.5858   LearningRate 0.3585   Epoch: 7   Global Step: 39570   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:34:32,648-Speed 10528.04 samples/sec   Loss 9.6149   LearningRate 0.3583   Epoch: 7   Global Step: 39580   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:34:40,454-Speed 10496.46 samples/sec   Loss 9.5185   LearningRate 0.3582   Epoch: 7   Global Step: 39590   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:34:48,267-Speed 10485.07 samples/sec   Loss 9.5579   LearningRate 0.3581   Epoch: 7   Global Step: 39600   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:34:56,065-Speed 10506.60 samples/sec   Loss 9.5968   LearningRate 0.3580   Epoch: 7   Global Step: 39610   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:35:03,853-Speed 10521.06 samples/sec   Loss 9.5895   LearningRate 0.3579   Epoch: 7   Global Step: 39620   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:35:11,654-Speed 10503.36 samples/sec   Loss 9.5477   LearningRate 0.3578   Epoch: 7   Global Step: 39630   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:35:19,445-Speed 10515.38 samples/sec   Loss 9.5645   LearningRate 0.3577   Epoch: 7   Global Step: 39640   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:35:27,239-Speed 10513.08 samples/sec   Loss 9.5118   LearningRate 0.3576   Epoch: 7   Global Step: 39650   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:35:35,014-Speed 10536.93 samples/sec   Loss 9.4659   LearningRate 0.3574   Epoch: 7   Global Step: 39660   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:35:42,838-Speed 10472.05 samples/sec   Loss 9.5577   LearningRate 0.3573   Epoch: 7   Global Step: 39670   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:35:50,622-Speed 10525.00 samples/sec   Loss 9.5891   LearningRate 0.3572   Epoch: 7   Global Step: 39680   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:35:58,444-Speed 10475.52 samples/sec   Loss 9.5182   LearningRate 0.3571   Epoch: 7   Global Step: 39690   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:36:06,245-Speed 10501.81 samples/sec   Loss 9.4771   LearningRate 0.3570   Epoch: 7   Global Step: 39700   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:36:14,044-Speed 10505.59 samples/sec   Loss 9.5182   LearningRate 0.3569   Epoch: 7   Global Step: 39710   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:36:21,835-Speed 10516.16 samples/sec   Loss 9.5268   LearningRate 0.3568   Epoch: 7   Global Step: 39720   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:36:29,630-Speed 10510.89 samples/sec   Loss 9.4943   LearningRate 0.3567   Epoch: 7   Global Step: 39730   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:36:37,423-Speed 10513.52 samples/sec   Loss 9.5758   LearningRate 0.3566   Epoch: 7   Global Step: 39740   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:36:45,228-Speed 10496.42 samples/sec   Loss 9.5045   LearningRate 0.3564   Epoch: 7   Global Step: 39750   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:36:53,031-Speed 10500.69 samples/sec   Loss 9.5953   LearningRate 0.3563   Epoch: 7   Global Step: 39760   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:37:00,842-Speed 10489.19 samples/sec   Loss 9.6606   LearningRate 0.3562   Epoch: 7   Global Step: 39770   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:37:08,633-Speed 10516.11 samples/sec   Loss 9.5618   LearningRate 0.3561   Epoch: 7   Global Step: 39780   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:37:16,422-Speed 10518.94 samples/sec   Loss 9.5242   LearningRate 0.3560   Epoch: 7   Global Step: 39790   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:37:24,239-Speed 10481.15 samples/sec   Loss 9.5222   LearningRate 0.3559   Epoch: 7   Global Step: 39800   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:37:32,042-Speed 10500.34 samples/sec   Loss 9.5112   LearningRate 0.3558   Epoch: 7   Global Step: 39810   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:37:39,854-Speed 10487.53 samples/sec   Loss 9.4632   LearningRate 0.3557   Epoch: 7   Global Step: 39820   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:37:47,673-Speed 10478.19 samples/sec   Loss 9.5221   LearningRate 0.3556   Epoch: 7   Global Step: 39830   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:37:55,491-Speed 10479.88 samples/sec   Loss 9.5361   LearningRate 0.3554   Epoch: 7   Global Step: 39840   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:38:03,282-Speed 10516.98 samples/sec   Loss 9.4927   LearningRate 0.3553   Epoch: 7   Global Step: 39850   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:38:11,062-Speed 10531.02 samples/sec   Loss 9.4745   LearningRate 0.3552   Epoch: 7   Global Step: 39860   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:38:18,864-Speed 10502.14 samples/sec   Loss 9.5269   LearningRate 0.3551   Epoch: 7   Global Step: 39870   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:38:26,631-Speed 10547.59 samples/sec   Loss 9.4470   LearningRate 0.3550   Epoch: 7   Global Step: 39880   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:38:34,424-Speed 10512.67 samples/sec   Loss 9.4486   LearningRate 0.3549   Epoch: 7   Global Step: 39890   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:38:42,239-Speed 10484.04 samples/sec   Loss 9.5675   LearningRate 0.3548   Epoch: 7   Global Step: 39900   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:38:50,027-Speed 10520.44 samples/sec   Loss 9.6014   LearningRate 0.3547   Epoch: 7   Global Step: 39910   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:38:57,799-Speed 10541.18 samples/sec   Loss 9.4758   LearningRate 0.3546   Epoch: 7   Global Step: 39920   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:39:05,567-Speed 10547.10 samples/sec   Loss 9.5176   LearningRate 0.3544   Epoch: 7   Global Step: 39930   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:39:13,361-Speed 10511.86 samples/sec   Loss 9.5031   LearningRate 0.3543   Epoch: 7   Global Step: 39940   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:39:21,153-Speed 10516.36 samples/sec   Loss 9.5189   LearningRate 0.3542   Epoch: 7   Global Step: 39950   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:39:29,031-Speed 10399.28 samples/sec   Loss 9.5243   LearningRate 0.3541   Epoch: 7   Global Step: 39960   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:39:36,814-Speed 10526.82 samples/sec   Loss 9.4545   LearningRate 0.3540   Epoch: 7   Global Step: 39970   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:39:44,602-Speed 10520.06 samples/sec   Loss 9.5439   LearningRate 0.3539   Epoch: 7   Global Step: 39980   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:39:52,418-Speed 10482.81 samples/sec   Loss 9.5078   LearningRate 0.3538   Epoch: 7   Global Step: 39990   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:40:00,202-Speed 10525.62 samples/sec   Loss 9.4429   LearningRate 0.3537   Epoch: 7   Global Step: 40000   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:40:28,043-[lfw][40000]XNorm: 23.041705
Training: 2022-01-15 23:40:28,043-[lfw][40000]Accuracy-Flip: 0.99667+-0.00236
Training: 2022-01-15 23:40:28,044-[lfw][40000]Accuracy-Highest: 0.99667
Training: 2022-01-15 23:41:00,975-[cfp_fp][40000]XNorm: 19.753488
Training: 2022-01-15 23:41:00,976-[cfp_fp][40000]Accuracy-Flip: 0.97571+-0.00818
Training: 2022-01-15 23:41:00,976-[cfp_fp][40000]Accuracy-Highest: 0.97571
Training: 2022-01-15 23:41:29,290-[agedb_30][40000]XNorm: 22.239426
Training: 2022-01-15 23:41:29,291-[agedb_30][40000]Accuracy-Flip: 0.96467+-0.01092
Training: 2022-01-15 23:41:29,291-[agedb_30][40000]Accuracy-Highest: 0.96467
Training: 2022-01-15 23:41:37,051-Speed 845.86 samples/sec   Loss 9.5444   LearningRate 0.3536   Epoch: 7   Global Step: 40010   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:41:44,797-Speed 10577.06 samples/sec   Loss 9.4968   LearningRate 0.3534   Epoch: 7   Global Step: 40020   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:41:52,560-Speed 10554.31 samples/sec   Loss 9.5220   LearningRate 0.3533   Epoch: 7   Global Step: 40030   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:42:00,306-Speed 10577.08 samples/sec   Loss 9.5181   LearningRate 0.3532   Epoch: 7   Global Step: 40040   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:42:08,044-Speed 10589.06 samples/sec   Loss 9.4939   LearningRate 0.3531   Epoch: 7   Global Step: 40050   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:42:15,822-Speed 10533.74 samples/sec   Loss 9.4221   LearningRate 0.3530   Epoch: 7   Global Step: 40060   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:42:23,579-Speed 10561.51 samples/sec   Loss 9.5659   LearningRate 0.3529   Epoch: 7   Global Step: 40070   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:42:31,347-Speed 10548.31 samples/sec   Loss 9.5836   LearningRate 0.3528   Epoch: 7   Global Step: 40080   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:42:39,122-Speed 10536.69 samples/sec   Loss 9.4509   LearningRate 0.3527   Epoch: 7   Global Step: 40090   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:42:46,896-Speed 10540.15 samples/sec   Loss 9.4689   LearningRate 0.3526   Epoch: 7   Global Step: 40100   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:42:54,683-Speed 10520.88 samples/sec   Loss 9.4502   LearningRate 0.3524   Epoch: 7   Global Step: 40110   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:43:02,495-Speed 10487.43 samples/sec   Loss 9.4475   LearningRate 0.3523   Epoch: 7   Global Step: 40120   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:43:10,257-Speed 10555.53 samples/sec   Loss 9.4761   LearningRate 0.3522   Epoch: 7   Global Step: 40130   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:43:18,025-Speed 10549.20 samples/sec   Loss 9.4941   LearningRate 0.3521   Epoch: 7   Global Step: 40140   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:43:25,821-Speed 10508.98 samples/sec   Loss 9.4824   LearningRate 0.3520   Epoch: 7   Global Step: 40150   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:43:33,583-Speed 10555.37 samples/sec   Loss 9.5030   LearningRate 0.3519   Epoch: 7   Global Step: 40160   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:43:41,377-Speed 10512.48 samples/sec   Loss 9.4451   LearningRate 0.3518   Epoch: 7   Global Step: 40170   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:43:49,164-Speed 10530.15 samples/sec   Loss 9.4338   LearningRate 0.3517   Epoch: 7   Global Step: 40180   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:43:56,922-Speed 10560.40 samples/sec   Loss 9.4351   LearningRate 0.3516   Epoch: 7   Global Step: 40190   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:44:04,702-Speed 10530.54 samples/sec   Loss 9.4430   LearningRate 0.3514   Epoch: 7   Global Step: 40200   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:44:12,459-Speed 10561.07 samples/sec   Loss 9.4231   LearningRate 0.3513   Epoch: 7   Global Step: 40210   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:44:20,227-Speed 10548.57 samples/sec   Loss 9.4594   LearningRate 0.3512   Epoch: 7   Global Step: 40220   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:44:27,982-Speed 10565.56 samples/sec   Loss 9.5063   LearningRate 0.3511   Epoch: 7   Global Step: 40230   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:44:35,739-Speed 10561.47 samples/sec   Loss 9.4351   LearningRate 0.3510   Epoch: 7   Global Step: 40240   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:44:43,543-Speed 10499.45 samples/sec   Loss 9.4359   LearningRate 0.3509   Epoch: 7   Global Step: 40250   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:44:51,315-Speed 10541.68 samples/sec   Loss 9.3857   LearningRate 0.3508   Epoch: 7   Global Step: 40260   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:44:59,098-Speed 10528.03 samples/sec   Loss 9.4327   LearningRate 0.3507   Epoch: 7   Global Step: 40270   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:45:06,867-Speed 10544.34 samples/sec   Loss 9.4328   LearningRate 0.3506   Epoch: 7   Global Step: 40280   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:45:14,633-Speed 10550.87 samples/sec   Loss 9.4417   LearningRate 0.3504   Epoch: 7   Global Step: 40290   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:45:22,434-Speed 10503.49 samples/sec   Loss 9.4564   LearningRate 0.3503   Epoch: 7   Global Step: 40300   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:45:30,244-Speed 10490.71 samples/sec   Loss 9.6271   LearningRate 0.3502   Epoch: 7   Global Step: 40310   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:45:38,020-Speed 10535.94 samples/sec   Loss 9.6370   LearningRate 0.3501   Epoch: 7   Global Step: 40320   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:45:45,795-Speed 10538.16 samples/sec   Loss 9.4685   LearningRate 0.3500   Epoch: 7   Global Step: 40330   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:45:53,561-Speed 10549.65 samples/sec   Loss 9.4110   LearningRate 0.3499   Epoch: 7   Global Step: 40340   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:46:01,386-Speed 10471.40 samples/sec   Loss 9.4316   LearningRate 0.3498   Epoch: 7   Global Step: 40350   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:46:09,155-Speed 10545.47 samples/sec   Loss 9.4785   LearningRate 0.3497   Epoch: 7   Global Step: 40360   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:46:16,946-Speed 10516.67 samples/sec   Loss 9.4468   LearningRate 0.3496   Epoch: 7   Global Step: 40370   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:46:24,725-Speed 10530.73 samples/sec   Loss 9.4421   LearningRate 0.3495   Epoch: 7   Global Step: 40380   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:46:32,546-Speed 10476.46 samples/sec   Loss 9.4311   LearningRate 0.3493   Epoch: 7   Global Step: 40390   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:46:40,324-Speed 10534.01 samples/sec   Loss 9.4562   LearningRate 0.3492   Epoch: 7   Global Step: 40400   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:46:48,104-Speed 10530.51 samples/sec   Loss 9.4157   LearningRate 0.3491   Epoch: 7   Global Step: 40410   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:46:55,890-Speed 10523.77 samples/sec   Loss 9.4226   LearningRate 0.3490   Epoch: 7   Global Step: 40420   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:47:03,691-Speed 10503.09 samples/sec   Loss 9.4104   LearningRate 0.3489   Epoch: 7   Global Step: 40430   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:47:11,476-Speed 10522.82 samples/sec   Loss 9.3782   LearningRate 0.3488   Epoch: 7   Global Step: 40440   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:47:19,283-Speed 10494.43 samples/sec   Loss 9.4942   LearningRate 0.3487   Epoch: 7   Global Step: 40450   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:47:27,095-Speed 10488.00 samples/sec   Loss 9.4177   LearningRate 0.3486   Epoch: 7   Global Step: 40460   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:47:34,879-Speed 10528.23 samples/sec   Loss 9.3755   LearningRate 0.3485   Epoch: 7   Global Step: 40470   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:47:42,667-Speed 10519.73 samples/sec   Loss 9.4180   LearningRate 0.3483   Epoch: 7   Global Step: 40480   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:47:50,440-Speed 10541.66 samples/sec   Loss 9.4487   LearningRate 0.3482   Epoch: 7   Global Step: 40490   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:47:58,215-Speed 10541.61 samples/sec   Loss 9.4442   LearningRate 0.3481   Epoch: 7   Global Step: 40500   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:48:05,994-Speed 10532.72 samples/sec   Loss 9.4587   LearningRate 0.3480   Epoch: 7   Global Step: 40510   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:48:13,775-Speed 10529.44 samples/sec   Loss 9.4099   LearningRate 0.3479   Epoch: 7   Global Step: 40520   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:48:21,584-Speed 10493.43 samples/sec   Loss 9.3958   LearningRate 0.3478   Epoch: 7   Global Step: 40530   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:48:29,383-Speed 10509.13 samples/sec   Loss 9.4336   LearningRate 0.3477   Epoch: 7   Global Step: 40540   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:48:37,209-Speed 10469.23 samples/sec   Loss 9.4286   LearningRate 0.3476   Epoch: 7   Global Step: 40550   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:48:44,998-Speed 10518.84 samples/sec   Loss 9.4116   LearningRate 0.3475   Epoch: 7   Global Step: 40560   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:48:52,782-Speed 10524.23 samples/sec   Loss 9.4466   LearningRate 0.3474   Epoch: 7   Global Step: 40570   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:49:00,610-Speed 10466.60 samples/sec   Loss 9.3672   LearningRate 0.3472   Epoch: 7   Global Step: 40580   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:49:08,403-Speed 10513.83 samples/sec   Loss 9.3763   LearningRate 0.3471   Epoch: 7   Global Step: 40590   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:49:16,180-Speed 10534.43 samples/sec   Loss 9.4458   LearningRate 0.3470   Epoch: 7   Global Step: 40600   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:49:23,983-Speed 10500.57 samples/sec   Loss 9.4556   LearningRate 0.3469   Epoch: 7   Global Step: 40610   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:49:31,782-Speed 10504.94 samples/sec   Loss 9.3895   LearningRate 0.3468   Epoch: 7   Global Step: 40620   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:49:39,587-Speed 10496.57 samples/sec   Loss 9.3164   LearningRate 0.3467   Epoch: 7   Global Step: 40630   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:49:47,365-Speed 10534.62 samples/sec   Loss 9.5108   LearningRate 0.3466   Epoch: 7   Global Step: 40640   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:49:55,137-Speed 10541.87 samples/sec   Loss 9.3823   LearningRate 0.3465   Epoch: 7   Global Step: 40650   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:50:02,940-Speed 10500.30 samples/sec   Loss 9.3482   LearningRate 0.3464   Epoch: 7   Global Step: 40660   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:50:10,711-Speed 10543.60 samples/sec   Loss 9.4366   LearningRate 0.3463   Epoch: 7   Global Step: 40670   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:50:18,500-Speed 10517.76 samples/sec   Loss 9.3558   LearningRate 0.3461   Epoch: 7   Global Step: 40680   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:50:26,294-Speed 10512.57 samples/sec   Loss 9.4990   LearningRate 0.3460   Epoch: 7   Global Step: 40690   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:50:34,083-Speed 10519.07 samples/sec   Loss 9.3515   LearningRate 0.3459   Epoch: 7   Global Step: 40700   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:50:41,867-Speed 10526.49 samples/sec   Loss 9.3968   LearningRate 0.3458   Epoch: 7   Global Step: 40710   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:50:49,653-Speed 10522.03 samples/sec   Loss 9.3713   LearningRate 0.3457   Epoch: 7   Global Step: 40720   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:50:57,463-Speed 10490.54 samples/sec   Loss 9.3617   LearningRate 0.3456   Epoch: 7   Global Step: 40730   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:51:05,236-Speed 10539.91 samples/sec   Loss 9.4164   LearningRate 0.3455   Epoch: 7   Global Step: 40740   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:51:13,001-Speed 10552.56 samples/sec   Loss 9.3856   LearningRate 0.3454   Epoch: 7   Global Step: 40750   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:51:20,757-Speed 10563.32 samples/sec   Loss 9.3755   LearningRate 0.3453   Epoch: 7   Global Step: 40760   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:51:28,540-Speed 10526.83 samples/sec   Loss 9.4331   LearningRate 0.3452   Epoch: 7   Global Step: 40770   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:51:36,329-Speed 10518.09 samples/sec   Loss 9.3431   LearningRate 0.3451   Epoch: 7   Global Step: 40780   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:51:44,116-Speed 10522.13 samples/sec   Loss 9.3325   LearningRate 0.3449   Epoch: 7   Global Step: 40790   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:51:51,894-Speed 10533.11 samples/sec   Loss 9.3458   LearningRate 0.3448   Epoch: 7   Global Step: 40800   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:51:59,689-Speed 10510.58 samples/sec   Loss 9.3725   LearningRate 0.3447   Epoch: 7   Global Step: 40810   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:52:07,499-Speed 10491.31 samples/sec   Loss 9.3465   LearningRate 0.3446   Epoch: 7   Global Step: 40820   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:52:15,309-Speed 10491.17 samples/sec   Loss 9.3400   LearningRate 0.3445   Epoch: 7   Global Step: 40830   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:52:23,120-Speed 10488.60 samples/sec   Loss 9.3761   LearningRate 0.3444   Epoch: 7   Global Step: 40840   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:52:30,914-Speed 10512.05 samples/sec   Loss 9.3555   LearningRate 0.3443   Epoch: 7   Global Step: 40850   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:52:38,695-Speed 10530.90 samples/sec   Loss 9.3108   LearningRate 0.3442   Epoch: 7   Global Step: 40860   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:52:46,570-Speed 10403.19 samples/sec   Loss 9.4265   LearningRate 0.3441   Epoch: 7   Global Step: 40870   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:52:54,355-Speed 10525.01 samples/sec   Loss 9.3575   LearningRate 0.3440   Epoch: 7   Global Step: 40880   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:53:02,136-Speed 10529.10 samples/sec   Loss 9.3793   LearningRate 0.3438   Epoch: 7   Global Step: 40890   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:53:09,927-Speed 10516.53 samples/sec   Loss 9.3554   LearningRate 0.3437   Epoch: 7   Global Step: 40900   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:53:17,733-Speed 10495.56 samples/sec   Loss 9.3806   LearningRate 0.3436   Epoch: 7   Global Step: 40910   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:53:25,516-Speed 10527.01 samples/sec   Loss 9.4079   LearningRate 0.3435   Epoch: 7   Global Step: 40920   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:53:33,314-Speed 10505.72 samples/sec   Loss 9.3379   LearningRate 0.3434   Epoch: 7   Global Step: 40930   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:53:41,088-Speed 10539.60 samples/sec   Loss 9.3450   LearningRate 0.3433   Epoch: 7   Global Step: 40940   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:53:48,888-Speed 10504.00 samples/sec   Loss 9.3124   LearningRate 0.3432   Epoch: 7   Global Step: 40950   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:53:56,688-Speed 10503.92 samples/sec   Loss 9.4147   LearningRate 0.3431   Epoch: 7   Global Step: 40960   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:54:04,476-Speed 10520.09 samples/sec   Loss 9.4043   LearningRate 0.3430   Epoch: 7   Global Step: 40970   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:54:12,258-Speed 10528.84 samples/sec   Loss 9.3520   LearningRate 0.3429   Epoch: 7   Global Step: 40980   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:54:20,043-Speed 10524.20 samples/sec   Loss 9.3953   LearningRate 0.3428   Epoch: 7   Global Step: 40990   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:54:27,860-Speed 10480.75 samples/sec   Loss 9.4412   LearningRate 0.3426   Epoch: 7   Global Step: 41000   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:54:35,666-Speed 10495.38 samples/sec   Loss 9.3547   LearningRate 0.3425   Epoch: 7   Global Step: 41010   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:54:43,462-Speed 10510.56 samples/sec   Loss 9.3208   LearningRate 0.3424   Epoch: 7   Global Step: 41020   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:54:51,248-Speed 10522.35 samples/sec   Loss 9.3017   LearningRate 0.3423   Epoch: 7   Global Step: 41030   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:54:59,050-Speed 10501.07 samples/sec   Loss 9.3741   LearningRate 0.3422   Epoch: 7   Global Step: 41040   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:55:06,858-Speed 10493.32 samples/sec   Loss 9.2801   LearningRate 0.3421   Epoch: 7   Global Step: 41050   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:55:14,676-Speed 10482.59 samples/sec   Loss 9.3247   LearningRate 0.3420   Epoch: 7   Global Step: 41060   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:55:22,471-Speed 10510.54 samples/sec   Loss 9.2787   LearningRate 0.3419   Epoch: 7   Global Step: 41070   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:55:30,294-Speed 10473.25 samples/sec   Loss 9.2954   LearningRate 0.3418   Epoch: 7   Global Step: 41080   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:55:38,088-Speed 10511.77 samples/sec   Loss 9.2915   LearningRate 0.3417   Epoch: 7   Global Step: 41090   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:55:45,877-Speed 10519.18 samples/sec   Loss 9.3985   LearningRate 0.3415   Epoch: 7   Global Step: 41100   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:55:53,664-Speed 10521.86 samples/sec   Loss 9.3051   LearningRate 0.3414   Epoch: 7   Global Step: 41110   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:56:01,458-Speed 10511.38 samples/sec   Loss 9.4037   LearningRate 0.3413   Epoch: 7   Global Step: 41120   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:56:09,284-Speed 10469.42 samples/sec   Loss 9.4060   LearningRate 0.3412   Epoch: 7   Global Step: 41130   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:56:17,122-Speed 10454.57 samples/sec   Loss 9.3174   LearningRate 0.3411   Epoch: 7   Global Step: 41140   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:56:24,947-Speed 10470.97 samples/sec   Loss 9.2853   LearningRate 0.3410   Epoch: 7   Global Step: 41150   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:56:32,751-Speed 10497.72 samples/sec   Loss 9.4273   LearningRate 0.3409   Epoch: 7   Global Step: 41160   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:56:40,589-Speed 10454.40 samples/sec   Loss 9.3601   LearningRate 0.3408   Epoch: 7   Global Step: 41170   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:56:48,404-Speed 10483.24 samples/sec   Loss 9.3250   LearningRate 0.3407   Epoch: 7   Global Step: 41180   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:56:56,207-Speed 10499.92 samples/sec   Loss 9.2813   LearningRate 0.3406   Epoch: 7   Global Step: 41190   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:57:03,991-Speed 10525.58 samples/sec   Loss 9.2820   LearningRate 0.3405   Epoch: 7   Global Step: 41200   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:57:11,782-Speed 10517.92 samples/sec   Loss 9.3394   LearningRate 0.3403   Epoch: 7   Global Step: 41210   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:57:19,598-Speed 10482.80 samples/sec   Loss 9.3285   LearningRate 0.3402   Epoch: 7   Global Step: 41220   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:57:27,398-Speed 10503.30 samples/sec   Loss 9.2781   LearningRate 0.3401   Epoch: 7   Global Step: 41230   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:57:35,199-Speed 10503.59 samples/sec   Loss 9.3434   LearningRate 0.3400   Epoch: 7   Global Step: 41240   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:57:43,029-Speed 10466.08 samples/sec   Loss 9.3209   LearningRate 0.3399   Epoch: 7   Global Step: 41250   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-15 23:57:50,833-Speed 10498.73 samples/sec   Loss 9.3380   LearningRate 0.3398   Epoch: 7   Global Step: 41260   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:57:58,632-Speed 10505.81 samples/sec   Loss 9.3316   LearningRate 0.3397   Epoch: 7   Global Step: 41270   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:58:06,466-Speed 10458.10 samples/sec   Loss 9.2687   LearningRate 0.3396   Epoch: 7   Global Step: 41280   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:58:14,262-Speed 10508.63 samples/sec   Loss 9.3121   LearningRate 0.3395   Epoch: 7   Global Step: 41290   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:58:22,063-Speed 10503.51 samples/sec   Loss 9.3280   LearningRate 0.3394   Epoch: 7   Global Step: 41300   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:58:29,884-Speed 10475.70 samples/sec   Loss 9.2963   LearningRate 0.3393   Epoch: 7   Global Step: 41310   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:58:37,658-Speed 10539.44 samples/sec   Loss 9.3155   LearningRate 0.3392   Epoch: 7   Global Step: 41320   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:58:45,443-Speed 10524.12 samples/sec   Loss 9.2436   LearningRate 0.3390   Epoch: 7   Global Step: 41330   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:58:53,239-Speed 10509.57 samples/sec   Loss 9.3147   LearningRate 0.3389   Epoch: 7   Global Step: 41340   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:59:01,013-Speed 10539.83 samples/sec   Loss 9.2864   LearningRate 0.3388   Epoch: 7   Global Step: 41350   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:59:08,787-Speed 10539.79 samples/sec   Loss 9.3470   LearningRate 0.3387   Epoch: 7   Global Step: 41360   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:59:16,570-Speed 10525.59 samples/sec   Loss 9.5356   LearningRate 0.3386   Epoch: 7   Global Step: 41370   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:59:24,396-Speed 10468.20 samples/sec   Loss 9.3917   LearningRate 0.3385   Epoch: 7   Global Step: 41380   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:59:32,193-Speed 10508.86 samples/sec   Loss 9.3074   LearningRate 0.3384   Epoch: 7   Global Step: 41390   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:59:39,985-Speed 10517.60 samples/sec   Loss 9.2656   LearningRate 0.3383   Epoch: 7   Global Step: 41400   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:59:47,805-Speed 10475.52 samples/sec   Loss 9.2315   LearningRate 0.3382   Epoch: 7   Global Step: 41410   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-15 23:59:55,702-Speed 10374.80 samples/sec   Loss 9.2843   LearningRate 0.3381   Epoch: 7   Global Step: 41420   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:00:03,529-Speed 10468.67 samples/sec   Loss 9.2505   LearningRate 0.3380   Epoch: 7   Global Step: 41430   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:00:11,334-Speed 10499.28 samples/sec   Loss 9.2828   LearningRate 0.3378   Epoch: 7   Global Step: 41440   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:00:19,114-Speed 10531.77 samples/sec   Loss 9.2549   LearningRate 0.3377   Epoch: 7   Global Step: 41450   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:00:26,942-Speed 10466.58 samples/sec   Loss 9.3226   LearningRate 0.3376   Epoch: 7   Global Step: 41460   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:00:34,755-Speed 10487.00 samples/sec   Loss 9.3094   LearningRate 0.3375   Epoch: 7   Global Step: 41470   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:00:42,602-Speed 10440.79 samples/sec   Loss 9.2814   LearningRate 0.3374   Epoch: 7   Global Step: 41480   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:01:04,874-Speed 3678.26 samples/sec   Loss 9.3820   LearningRate 0.3373   Epoch: 8   Global Step: 41490   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:01:12,656-Speed 10529.37 samples/sec   Loss 9.2966   LearningRate 0.3372   Epoch: 8   Global Step: 41500   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:01:20,415-Speed 10559.32 samples/sec   Loss 9.4096   LearningRate 0.3371   Epoch: 8   Global Step: 41510   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:01:28,204-Speed 10519.11 samples/sec   Loss 9.2986   LearningRate 0.3370   Epoch: 8   Global Step: 41520   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:01:35,972-Speed 10545.92 samples/sec   Loss 9.2792   LearningRate 0.3369   Epoch: 8   Global Step: 41530   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:01:43,754-Speed 10529.66 samples/sec   Loss 9.2263   LearningRate 0.3368   Epoch: 8   Global Step: 41540   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:01:51,536-Speed 10527.40 samples/sec   Loss 9.2597   LearningRate 0.3367   Epoch: 8   Global Step: 41550   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:01:59,321-Speed 10525.05 samples/sec   Loss 9.2767   LearningRate 0.3365   Epoch: 8   Global Step: 41560   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:02:07,104-Speed 10525.97 samples/sec   Loss 9.2512   LearningRate 0.3364   Epoch: 8   Global Step: 41570   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:02:14,889-Speed 10525.43 samples/sec   Loss 9.2355   LearningRate 0.3363   Epoch: 8   Global Step: 41580   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:02:22,671-Speed 10527.03 samples/sec   Loss 9.2054   LearningRate 0.3362   Epoch: 8   Global Step: 41590   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:02:30,479-Speed 10493.67 samples/sec   Loss 9.2477   LearningRate 0.3361   Epoch: 8   Global Step: 41600   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:02:38,308-Speed 10465.58 samples/sec   Loss 9.3171   LearningRate 0.3360   Epoch: 8   Global Step: 41610   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:02:46,103-Speed 10511.60 samples/sec   Loss 9.2921   LearningRate 0.3359   Epoch: 8   Global Step: 41620   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:02:53,905-Speed 10500.62 samples/sec   Loss 9.2149   LearningRate 0.3358   Epoch: 8   Global Step: 41630   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:03:01,694-Speed 10517.91 samples/sec   Loss 9.1655   LearningRate 0.3357   Epoch: 8   Global Step: 41640   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:03:09,501-Speed 10495.37 samples/sec   Loss 9.2812   LearningRate 0.3356   Epoch: 8   Global Step: 41650   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:03:17,302-Speed 10503.17 samples/sec   Loss 9.2022   LearningRate 0.3355   Epoch: 8   Global Step: 41660   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:03:25,102-Speed 10503.01 samples/sec   Loss 9.2255   LearningRate 0.3354   Epoch: 8   Global Step: 41670   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:03:32,902-Speed 10504.45 samples/sec   Loss 9.2069   LearningRate 0.3352   Epoch: 8   Global Step: 41680   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:03:40,683-Speed 10531.27 samples/sec   Loss 9.3627   LearningRate 0.3351   Epoch: 8   Global Step: 41690   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:03:48,486-Speed 10500.81 samples/sec   Loss 9.2853   LearningRate 0.3350   Epoch: 8   Global Step: 41700   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:03:56,297-Speed 10489.06 samples/sec   Loss 9.2672   LearningRate 0.3349   Epoch: 8   Global Step: 41710   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:04:04,083-Speed 10523.04 samples/sec   Loss 9.2049   LearningRate 0.3348   Epoch: 8   Global Step: 41720   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:04:11,875-Speed 10514.91 samples/sec   Loss 9.2662   LearningRate 0.3347   Epoch: 8   Global Step: 41730   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:04:19,671-Speed 10509.42 samples/sec   Loss 9.2434   LearningRate 0.3346   Epoch: 8   Global Step: 41740   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:04:27,458-Speed 10521.11 samples/sec   Loss 9.2325   LearningRate 0.3345   Epoch: 8   Global Step: 41750   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:04:35,243-Speed 10523.84 samples/sec   Loss 9.1685   LearningRate 0.3344   Epoch: 8   Global Step: 41760   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:04:43,060-Speed 10482.62 samples/sec   Loss 9.1676   LearningRate 0.3343   Epoch: 8   Global Step: 41770   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:04:50,878-Speed 10480.30 samples/sec   Loss 9.3607   LearningRate 0.3342   Epoch: 8   Global Step: 41780   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:04:58,755-Speed 10401.48 samples/sec   Loss 9.1986   LearningRate 0.3341   Epoch: 8   Global Step: 41790   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:05:06,581-Speed 10468.46 samples/sec   Loss 9.3580   LearningRate 0.3340   Epoch: 8   Global Step: 41800   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:05:14,379-Speed 10508.82 samples/sec   Loss 9.2322   LearningRate 0.3338   Epoch: 8   Global Step: 41810   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:05:22,174-Speed 10509.80 samples/sec   Loss 9.2875   LearningRate 0.3337   Epoch: 8   Global Step: 41820   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:05:29,980-Speed 10496.02 samples/sec   Loss 9.2470   LearningRate 0.3336   Epoch: 8   Global Step: 41830   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:05:37,776-Speed 10509.63 samples/sec   Loss 9.1983   LearningRate 0.3335   Epoch: 8   Global Step: 41840   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:05:45,580-Speed 10499.09 samples/sec   Loss 9.1920   LearningRate 0.3334   Epoch: 8   Global Step: 41850   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:05:53,386-Speed 10495.93 samples/sec   Loss 9.2629   LearningRate 0.3333   Epoch: 8   Global Step: 41860   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:06:01,201-Speed 10483.70 samples/sec   Loss 9.2745   LearningRate 0.3332   Epoch: 8   Global Step: 41870   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:06:09,021-Speed 10478.07 samples/sec   Loss 9.2265   LearningRate 0.3331   Epoch: 8   Global Step: 41880   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:06:16,860-Speed 10451.40 samples/sec   Loss 9.2200   LearningRate 0.3330   Epoch: 8   Global Step: 41890   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:06:24,701-Speed 10449.62 samples/sec   Loss 9.2364   LearningRate 0.3329   Epoch: 8   Global Step: 41900   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:06:32,566-Speed 10416.85 samples/sec   Loss 9.2318   LearningRate 0.3328   Epoch: 8   Global Step: 41910   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:06:40,411-Speed 10442.87 samples/sec   Loss 9.2550   LearningRate 0.3327   Epoch: 8   Global Step: 41920   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:06:48,237-Speed 10469.77 samples/sec   Loss 9.3247   LearningRate 0.3325   Epoch: 8   Global Step: 41930   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:06:56,079-Speed 10447.32 samples/sec   Loss 9.2919   LearningRate 0.3324   Epoch: 8   Global Step: 41940   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:07:03,913-Speed 10458.80 samples/sec   Loss 9.1980   LearningRate 0.3323   Epoch: 8   Global Step: 41950   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:07:11,802-Speed 10386.18 samples/sec   Loss 9.1723   LearningRate 0.3322   Epoch: 8   Global Step: 41960   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:07:19,629-Speed 10467.68 samples/sec   Loss 9.2285   LearningRate 0.3321   Epoch: 8   Global Step: 41970   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:07:27,466-Speed 10454.19 samples/sec   Loss 9.2436   LearningRate 0.3320   Epoch: 8   Global Step: 41980   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:07:35,298-Speed 10460.53 samples/sec   Loss 9.1849   LearningRate 0.3319   Epoch: 8   Global Step: 41990   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:07:43,131-Speed 10460.55 samples/sec   Loss 9.2764   LearningRate 0.3318   Epoch: 8   Global Step: 42000   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:07:50,985-Speed 10432.12 samples/sec   Loss 9.2337   LearningRate 0.3317   Epoch: 8   Global Step: 42010   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:07:58,817-Speed 10462.20 samples/sec   Loss 9.1950   LearningRate 0.3316   Epoch: 8   Global Step: 42020   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:08:06,632-Speed 10484.59 samples/sec   Loss 9.1909   LearningRate 0.3315   Epoch: 8   Global Step: 42030   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:08:14,471-Speed 10451.39 samples/sec   Loss 9.1746   LearningRate 0.3314   Epoch: 8   Global Step: 42040   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:08:22,314-Speed 10447.46 samples/sec   Loss 9.1433   LearningRate 0.3313   Epoch: 8   Global Step: 42050   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:08:30,159-Speed 10444.48 samples/sec   Loss 9.1685   LearningRate 0.3311   Epoch: 8   Global Step: 42060   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:08:38,002-Speed 10446.77 samples/sec   Loss 9.1641   LearningRate 0.3310   Epoch: 8   Global Step: 42070   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:08:45,829-Speed 10467.51 samples/sec   Loss 9.2064   LearningRate 0.3309   Epoch: 8   Global Step: 42080   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:08:53,657-Speed 10467.13 samples/sec   Loss 9.1750   LearningRate 0.3308   Epoch: 8   Global Step: 42090   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:09:01,521-Speed 10419.62 samples/sec   Loss 9.2255   LearningRate 0.3307   Epoch: 8   Global Step: 42100   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:09:09,343-Speed 10473.88 samples/sec   Loss 9.1982   LearningRate 0.3306   Epoch: 8   Global Step: 42110   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:09:17,172-Speed 10464.69 samples/sec   Loss 9.2153   LearningRate 0.3305   Epoch: 8   Global Step: 42120   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:09:25,005-Speed 10460.81 samples/sec   Loss 9.1972   LearningRate 0.3304   Epoch: 8   Global Step: 42130   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:09:32,834-Speed 10464.12 samples/sec   Loss 9.1413   LearningRate 0.3303   Epoch: 8   Global Step: 42140   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:09:40,667-Speed 10460.16 samples/sec   Loss 9.1589   LearningRate 0.3302   Epoch: 8   Global Step: 42150   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:09:48,474-Speed 10495.07 samples/sec   Loss 9.2001   LearningRate 0.3301   Epoch: 8   Global Step: 42160   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:09:56,298-Speed 10470.56 samples/sec   Loss 9.1816   LearningRate 0.3300   Epoch: 8   Global Step: 42170   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:10:04,132-Speed 10458.99 samples/sec   Loss 9.1371   LearningRate 0.3299   Epoch: 8   Global Step: 42180   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:10:12,035-Speed 10367.33 samples/sec   Loss 9.1827   LearningRate 0.3298   Epoch: 8   Global Step: 42190   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:10:19,879-Speed 10445.74 samples/sec   Loss 9.2110   LearningRate 0.3296   Epoch: 8   Global Step: 42200   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:10:27,758-Speed 10398.96 samples/sec   Loss 9.1487   LearningRate 0.3295   Epoch: 8   Global Step: 42210   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:10:35,603-Speed 10443.17 samples/sec   Loss 9.1853   LearningRate 0.3294   Epoch: 8   Global Step: 42220   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:10:43,471-Speed 10414.15 samples/sec   Loss 9.2123   LearningRate 0.3293   Epoch: 8   Global Step: 42230   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:10:51,312-Speed 10447.82 samples/sec   Loss 9.2023   LearningRate 0.3292   Epoch: 8   Global Step: 42240   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:10:59,130-Speed 10483.21 samples/sec   Loss 9.1845   LearningRate 0.3291   Epoch: 8   Global Step: 42250   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:11:06,939-Speed 10492.07 samples/sec   Loss 9.2026   LearningRate 0.3290   Epoch: 8   Global Step: 42260   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:11:14,778-Speed 10452.00 samples/sec   Loss 9.1608   LearningRate 0.3289   Epoch: 8   Global Step: 42270   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:11:22,574-Speed 10509.32 samples/sec   Loss 9.1567   LearningRate 0.3288   Epoch: 8   Global Step: 42280   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-16 00:11:30,343-Speed 10546.19 samples/sec   Loss 9.2070   LearningRate 0.3287   Epoch: 8   Global Step: 42290   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:11:38,140-Speed 10508.61 samples/sec   Loss 9.1614   LearningRate 0.3286   Epoch: 8   Global Step: 42300   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:11:45,939-Speed 10505.90 samples/sec   Loss 9.1704   LearningRate 0.3285   Epoch: 8   Global Step: 42310   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:11:53,767-Speed 10465.87 samples/sec   Loss 9.1194   LearningRate 0.3284   Epoch: 8   Global Step: 42320   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:12:01,587-Speed 10477.51 samples/sec   Loss 9.1684   LearningRate 0.3283   Epoch: 8   Global Step: 42330   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-16 00:12:09,389-Speed 10502.92 samples/sec   Loss 9.0836   LearningRate 0.3281   Epoch: 8   Global Step: 42340   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:12:17,201-Speed 10487.03 samples/sec   Loss 9.1396   LearningRate 0.3280   Epoch: 8   Global Step: 42350   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:12:24,997-Speed 10510.54 samples/sec   Loss 9.1578   LearningRate 0.3279   Epoch: 8   Global Step: 42360   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:12:32,857-Speed 10423.33 samples/sec   Loss 9.2376   LearningRate 0.3278   Epoch: 8   Global Step: 42370   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:12:40,665-Speed 10493.74 samples/sec   Loss 9.1588   LearningRate 0.3277   Epoch: 8   Global Step: 42380   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:12:48,473-Speed 10492.84 samples/sec   Loss 9.1667   LearningRate 0.3276   Epoch: 8   Global Step: 42390   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:12:56,283-Speed 10489.89 samples/sec   Loss 9.1862   LearningRate 0.3275   Epoch: 8   Global Step: 42400   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:13:04,108-Speed 10470.06 samples/sec   Loss 9.0938   LearningRate 0.3274   Epoch: 8   Global Step: 42410   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:13:11,899-Speed 10516.97 samples/sec   Loss 9.1903   LearningRate 0.3273   Epoch: 8   Global Step: 42420   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:13:19,710-Speed 10489.29 samples/sec   Loss 9.1408   LearningRate 0.3272   Epoch: 8   Global Step: 42430   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:13:27,539-Speed 10465.08 samples/sec   Loss 9.0785   LearningRate 0.3271   Epoch: 8   Global Step: 42440   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:13:35,374-Speed 10457.01 samples/sec   Loss 9.0881   LearningRate 0.3270   Epoch: 8   Global Step: 42450   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:13:43,186-Speed 10488.31 samples/sec   Loss 9.0603   LearningRate 0.3269   Epoch: 8   Global Step: 42460   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:13:50,993-Speed 10494.25 samples/sec   Loss 9.1069   LearningRate 0.3268   Epoch: 8   Global Step: 42470   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:13:58,804-Speed 10488.92 samples/sec   Loss 9.2112   LearningRate 0.3267   Epoch: 8   Global Step: 42480   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:14:06,613-Speed 10494.80 samples/sec   Loss 9.1129   LearningRate 0.3265   Epoch: 8   Global Step: 42490   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:14:14,428-Speed 10484.35 samples/sec   Loss 9.1703   LearningRate 0.3264   Epoch: 8   Global Step: 42500   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:14:22,225-Speed 10507.05 samples/sec   Loss 9.1462   LearningRate 0.3263   Epoch: 8   Global Step: 42510   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:14:30,083-Speed 10427.27 samples/sec   Loss 9.2074   LearningRate 0.3262   Epoch: 8   Global Step: 42520   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:14:37,899-Speed 10482.39 samples/sec   Loss 9.1725   LearningRate 0.3261   Epoch: 8   Global Step: 42530   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:14:45,696-Speed 10508.59 samples/sec   Loss 9.1351   LearningRate 0.3260   Epoch: 8   Global Step: 42540   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:14:53,515-Speed 10478.75 samples/sec   Loss 9.1300   LearningRate 0.3259   Epoch: 8   Global Step: 42550   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:15:01,332-Speed 10480.89 samples/sec   Loss 9.1124   LearningRate 0.3258   Epoch: 8   Global Step: 42560   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:15:09,162-Speed 10463.90 samples/sec   Loss 9.1041   LearningRate 0.3257   Epoch: 8   Global Step: 42570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:15:16,963-Speed 10503.60 samples/sec   Loss 9.1501   LearningRate 0.3256   Epoch: 8   Global Step: 42580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:15:24,782-Speed 10478.32 samples/sec   Loss 9.1397   LearningRate 0.3255   Epoch: 8   Global Step: 42590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:15:32,573-Speed 10516.55 samples/sec   Loss 9.0032   LearningRate 0.3254   Epoch: 8   Global Step: 42600   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:15:40,358-Speed 10523.62 samples/sec   Loss 9.1504   LearningRate 0.3253   Epoch: 8   Global Step: 42610   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:15:48,166-Speed 10494.23 samples/sec   Loss 9.0733   LearningRate 0.3252   Epoch: 8   Global Step: 42620   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:15:55,968-Speed 10501.26 samples/sec   Loss 9.1208   LearningRate 0.3251   Epoch: 8   Global Step: 42630   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:16:03,799-Speed 10461.68 samples/sec   Loss 9.1880   LearningRate 0.3249   Epoch: 8   Global Step: 42640   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:16:11,655-Speed 10429.34 samples/sec   Loss 9.1665   LearningRate 0.3248   Epoch: 8   Global Step: 42650   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:16:19,480-Speed 10470.49 samples/sec   Loss 9.1542   LearningRate 0.3247   Epoch: 8   Global Step: 42660   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:16:27,274-Speed 10511.44 samples/sec   Loss 9.1040   LearningRate 0.3246   Epoch: 8   Global Step: 42670   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:16:35,119-Speed 10443.26 samples/sec   Loss 9.1426   LearningRate 0.3245   Epoch: 8   Global Step: 42680   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:16:42,968-Speed 10438.39 samples/sec   Loss 9.1364   LearningRate 0.3244   Epoch: 8   Global Step: 42690   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:16:50,762-Speed 10513.35 samples/sec   Loss 9.0640   LearningRate 0.3243   Epoch: 8   Global Step: 42700   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:16:58,579-Speed 10481.51 samples/sec   Loss 9.0739   LearningRate 0.3242   Epoch: 8   Global Step: 42710   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:17:06,389-Speed 10489.60 samples/sec   Loss 9.0950   LearningRate 0.3241   Epoch: 8   Global Step: 42720   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:17:14,202-Speed 10486.42 samples/sec   Loss 9.0884   LearningRate 0.3240   Epoch: 8   Global Step: 42730   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:17:22,017-Speed 10484.37 samples/sec   Loss 9.1525   LearningRate 0.3239   Epoch: 8   Global Step: 42740   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:17:29,868-Speed 10435.87 samples/sec   Loss 9.1920   LearningRate 0.3238   Epoch: 8   Global Step: 42750   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:17:37,683-Speed 10484.03 samples/sec   Loss 9.1426   LearningRate 0.3237   Epoch: 8   Global Step: 42760   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:17:45,486-Speed 10499.31 samples/sec   Loss 9.1264   LearningRate 0.3236   Epoch: 8   Global Step: 42770   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:17:53,275-Speed 10519.71 samples/sec   Loss 9.1250   LearningRate 0.3235   Epoch: 8   Global Step: 42780   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:18:01,065-Speed 10516.99 samples/sec   Loss 9.1213   LearningRate 0.3234   Epoch: 8   Global Step: 42790   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:18:08,859-Speed 10512.45 samples/sec   Loss 9.0517   LearningRate 0.3232   Epoch: 8   Global Step: 42800   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:18:16,649-Speed 10518.05 samples/sec   Loss 9.0808   LearningRate 0.3231   Epoch: 8   Global Step: 42810   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:18:24,462-Speed 10486.83 samples/sec   Loss 9.0567   LearningRate 0.3230   Epoch: 8   Global Step: 42820   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:18:32,249-Speed 10521.16 samples/sec   Loss 9.0532   LearningRate 0.3229   Epoch: 8   Global Step: 42830   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:18:40,043-Speed 10511.79 samples/sec   Loss 9.0840   LearningRate 0.3228   Epoch: 8   Global Step: 42840   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:18:47,821-Speed 10533.37 samples/sec   Loss 8.9961   LearningRate 0.3227   Epoch: 8   Global Step: 42850   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:18:55,622-Speed 10503.82 samples/sec   Loss 9.0124   LearningRate 0.3226   Epoch: 8   Global Step: 42860   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:19:03,446-Speed 10471.52 samples/sec   Loss 9.1017   LearningRate 0.3225   Epoch: 8   Global Step: 42870   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:19:11,246-Speed 10504.47 samples/sec   Loss 9.1506   LearningRate 0.3224   Epoch: 8   Global Step: 42880   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:19:19,041-Speed 10511.06 samples/sec   Loss 9.0850   LearningRate 0.3223   Epoch: 8   Global Step: 42890   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:19:26,838-Speed 10508.34 samples/sec   Loss 9.1351   LearningRate 0.3222   Epoch: 8   Global Step: 42900   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:19:34,655-Speed 10481.57 samples/sec   Loss 9.0995   LearningRate 0.3221   Epoch: 8   Global Step: 42910   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:19:42,463-Speed 10493.28 samples/sec   Loss 9.1359   LearningRate 0.3220   Epoch: 8   Global Step: 42920   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:19:50,289-Speed 10468.72 samples/sec   Loss 9.1530   LearningRate 0.3219   Epoch: 8   Global Step: 42930   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:19:58,082-Speed 10513.34 samples/sec   Loss 9.0082   LearningRate 0.3218   Epoch: 8   Global Step: 42940   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:20:05,902-Speed 10476.83 samples/sec   Loss 9.1310   LearningRate 0.3217   Epoch: 8   Global Step: 42950   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:20:13,706-Speed 10498.68 samples/sec   Loss 9.0920   LearningRate 0.3215   Epoch: 8   Global Step: 42960   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:20:21,535-Speed 10465.16 samples/sec   Loss 9.0871   LearningRate 0.3214   Epoch: 8   Global Step: 42970   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:20:29,323-Speed 10519.75 samples/sec   Loss 9.0819   LearningRate 0.3213   Epoch: 8   Global Step: 42980   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:20:37,136-Speed 10486.71 samples/sec   Loss 9.0563   LearningRate 0.3212   Epoch: 8   Global Step: 42990   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:20:44,935-Speed 10504.64 samples/sec   Loss 9.0769   LearningRate 0.3211   Epoch: 8   Global Step: 43000   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:20:52,753-Speed 10481.70 samples/sec   Loss 9.0512   LearningRate 0.3210   Epoch: 8   Global Step: 43010   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:21:00,582-Speed 10465.73 samples/sec   Loss 8.9861   LearningRate 0.3209   Epoch: 8   Global Step: 43020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:21:08,385-Speed 10499.87 samples/sec   Loss 9.0861   LearningRate 0.3208   Epoch: 8   Global Step: 43030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:21:16,183-Speed 10506.75 samples/sec   Loss 9.0396   LearningRate 0.3207   Epoch: 8   Global Step: 43040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:21:23,977-Speed 10512.50 samples/sec   Loss 8.9851   LearningRate 0.3206   Epoch: 8   Global Step: 43050   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:21:31,790-Speed 10487.50 samples/sec   Loss 9.0632   LearningRate 0.3205   Epoch: 8   Global Step: 43060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:21:39,590-Speed 10503.63 samples/sec   Loss 9.0188   LearningRate 0.3204   Epoch: 8   Global Step: 43070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:21:47,421-Speed 10462.38 samples/sec   Loss 9.1049   LearningRate 0.3203   Epoch: 8   Global Step: 43080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:21:55,280-Speed 10424.71 samples/sec   Loss 9.0591   LearningRate 0.3202   Epoch: 8   Global Step: 43090   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:22:03,076-Speed 10508.91 samples/sec   Loss 9.1228   LearningRate 0.3201   Epoch: 8   Global Step: 43100   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:22:10,908-Speed 10461.25 samples/sec   Loss 9.0550   LearningRate 0.3200   Epoch: 8   Global Step: 43110   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:22:18,720-Speed 10487.99 samples/sec   Loss 9.0252   LearningRate 0.3199   Epoch: 8   Global Step: 43120   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:22:26,526-Speed 10495.95 samples/sec   Loss 9.0529   LearningRate 0.3197   Epoch: 8   Global Step: 43130   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:22:34,332-Speed 10496.11 samples/sec   Loss 9.0564   LearningRate 0.3196   Epoch: 8   Global Step: 43140   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:22:42,159-Speed 10467.46 samples/sec   Loss 9.0802   LearningRate 0.3195   Epoch: 8   Global Step: 43150   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:22:49,965-Speed 10495.39 samples/sec   Loss 9.0454   LearningRate 0.3194   Epoch: 8   Global Step: 43160   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:22:57,824-Speed 10425.97 samples/sec   Loss 9.0592   LearningRate 0.3193   Epoch: 8   Global Step: 43170   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:23:05,624-Speed 10504.16 samples/sec   Loss 9.1117   LearningRate 0.3192   Epoch: 8   Global Step: 43180   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:23:13,429-Speed 10497.50 samples/sec   Loss 8.9973   LearningRate 0.3191   Epoch: 8   Global Step: 43190   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:23:21,212-Speed 10527.76 samples/sec   Loss 9.0048   LearningRate 0.3190   Epoch: 8   Global Step: 43200   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:23:28,992-Speed 10531.24 samples/sec   Loss 9.0908   LearningRate 0.3189   Epoch: 8   Global Step: 43210   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:23:36,789-Speed 10507.79 samples/sec   Loss 9.0160   LearningRate 0.3188   Epoch: 8   Global Step: 43220   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:23:44,619-Speed 10463.06 samples/sec   Loss 9.0205   LearningRate 0.3187   Epoch: 8   Global Step: 43230   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:23:52,408-Speed 10519.17 samples/sec   Loss 8.9887   LearningRate 0.3186   Epoch: 8   Global Step: 43240   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:24:00,212-Speed 10499.99 samples/sec   Loss 8.9905   LearningRate 0.3185   Epoch: 8   Global Step: 43250   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:24:08,011-Speed 10505.15 samples/sec   Loss 9.0025   LearningRate 0.3184   Epoch: 8   Global Step: 43260   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:24:15,814-Speed 10499.59 samples/sec   Loss 9.0233   LearningRate 0.3183   Epoch: 8   Global Step: 43270   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:24:23,613-Speed 10505.83 samples/sec   Loss 9.0052   LearningRate 0.3182   Epoch: 8   Global Step: 43280   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:24:31,528-Speed 10351.45 samples/sec   Loss 8.9746   LearningRate 0.3181   Epoch: 8   Global Step: 43290   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:24:39,324-Speed 10510.56 samples/sec   Loss 9.0239   LearningRate 0.3180   Epoch: 8   Global Step: 43300   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:24:47,104-Speed 10530.31 samples/sec   Loss 9.0224   LearningRate 0.3179   Epoch: 8   Global Step: 43310   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:24:54,884-Speed 10530.92 samples/sec   Loss 8.9650   LearningRate 0.3177   Epoch: 8   Global Step: 43320   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:25:02,705-Speed 10475.09 samples/sec   Loss 9.0383   LearningRate 0.3176   Epoch: 8   Global Step: 43330   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:25:10,495-Speed 10517.75 samples/sec   Loss 9.0405   LearningRate 0.3175   Epoch: 8   Global Step: 43340   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:25:18,309-Speed 10485.36 samples/sec   Loss 9.0684   LearningRate 0.3174   Epoch: 8   Global Step: 43350   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:25:26,167-Speed 10425.22 samples/sec   Loss 9.0526   LearningRate 0.3173   Epoch: 8   Global Step: 43360   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:25:33,995-Speed 10472.86 samples/sec   Loss 9.0885   LearningRate 0.3172   Epoch: 8   Global Step: 43370   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:25:41,784-Speed 10523.29 samples/sec   Loss 9.0778   LearningRate 0.3171   Epoch: 8   Global Step: 43380   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:25:49,590-Speed 10496.31 samples/sec   Loss 9.0146   LearningRate 0.3170   Epoch: 8   Global Step: 43390   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:25:57,401-Speed 10490.93 samples/sec   Loss 9.0147   LearningRate 0.3169   Epoch: 8   Global Step: 43400   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:26:05,208-Speed 10495.61 samples/sec   Loss 9.0747   LearningRate 0.3168   Epoch: 8   Global Step: 43410   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:26:13,019-Speed 10489.19 samples/sec   Loss 8.9830   LearningRate 0.3167   Epoch: 8   Global Step: 43420   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:26:20,830-Speed 10489.62 samples/sec   Loss 8.9040   LearningRate 0.3166   Epoch: 8   Global Step: 43430   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:26:28,651-Speed 10476.81 samples/sec   Loss 8.9492   LearningRate 0.3165   Epoch: 8   Global Step: 43440   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:26:36,469-Speed 10480.06 samples/sec   Loss 8.9776   LearningRate 0.3164   Epoch: 8   Global Step: 43450   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:26:44,299-Speed 10463.94 samples/sec   Loss 9.0190   LearningRate 0.3163   Epoch: 8   Global Step: 43460   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:26:52,077-Speed 10533.42 samples/sec   Loss 9.0054   LearningRate 0.3162   Epoch: 8   Global Step: 43470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:26:59,865-Speed 10520.22 samples/sec   Loss 9.0613   LearningRate 0.3161   Epoch: 8   Global Step: 43480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:27:07,681-Speed 10482.39 samples/sec   Loss 8.9538   LearningRate 0.3160   Epoch: 8   Global Step: 43490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:27:15,506-Speed 10470.22 samples/sec   Loss 8.9145   LearningRate 0.3159   Epoch: 8   Global Step: 43500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:27:23,346-Speed 10450.30 samples/sec   Loss 8.9816   LearningRate 0.3157   Epoch: 8   Global Step: 43510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:27:31,159-Speed 10487.37 samples/sec   Loss 9.0055   LearningRate 0.3156   Epoch: 8   Global Step: 43520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:27:38,980-Speed 10475.13 samples/sec   Loss 8.9379   LearningRate 0.3155   Epoch: 8   Global Step: 43530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:27:46,762-Speed 10528.79 samples/sec   Loss 8.9819   LearningRate 0.3154   Epoch: 8   Global Step: 43540   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:27:54,583-Speed 10475.82 samples/sec   Loss 8.9619   LearningRate 0.3153   Epoch: 8   Global Step: 43550   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:28:02,382-Speed 10504.20 samples/sec   Loss 8.9701   LearningRate 0.3152   Epoch: 8   Global Step: 43560   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:28:10,218-Speed 10460.19 samples/sec   Loss 8.9669   LearningRate 0.3151   Epoch: 8   Global Step: 43570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:28:18,072-Speed 10432.07 samples/sec   Loss 9.0432   LearningRate 0.3150   Epoch: 8   Global Step: 43580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:28:25,909-Speed 10455.04 samples/sec   Loss 8.9812   LearningRate 0.3149   Epoch: 8   Global Step: 43590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:28:33,731-Speed 10473.84 samples/sec   Loss 8.9734   LearningRate 0.3148   Epoch: 8   Global Step: 43600   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:28:41,567-Speed 10457.05 samples/sec   Loss 8.9941   LearningRate 0.3147   Epoch: 8   Global Step: 43610   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:28:49,372-Speed 10497.12 samples/sec   Loss 9.0099   LearningRate 0.3146   Epoch: 8   Global Step: 43620   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:28:57,196-Speed 10471.66 samples/sec   Loss 8.9999   LearningRate 0.3145   Epoch: 8   Global Step: 43630   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:29:05,041-Speed 10446.30 samples/sec   Loss 8.9753   LearningRate 0.3144   Epoch: 8   Global Step: 43640   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:29:12,832-Speed 10515.58 samples/sec   Loss 8.9699   LearningRate 0.3143   Epoch: 8   Global Step: 43650   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:29:20,644-Speed 10488.70 samples/sec   Loss 8.9422   LearningRate 0.3142   Epoch: 8   Global Step: 43660   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:29:28,429-Speed 10524.31 samples/sec   Loss 8.9560   LearningRate 0.3141   Epoch: 8   Global Step: 43670   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:29:36,245-Speed 10482.20 samples/sec   Loss 8.9898   LearningRate 0.3140   Epoch: 8   Global Step: 43680   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:29:44,041-Speed 10510.30 samples/sec   Loss 9.0299   LearningRate 0.3139   Epoch: 8   Global Step: 43690   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:29:51,855-Speed 10484.44 samples/sec   Loss 8.9614   LearningRate 0.3138   Epoch: 8   Global Step: 43700   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:29:59,688-Speed 10459.00 samples/sec   Loss 9.0419   LearningRate 0.3137   Epoch: 8   Global Step: 43710   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:30:07,514-Speed 10469.73 samples/sec   Loss 8.9727   LearningRate 0.3135   Epoch: 8   Global Step: 43720   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:30:15,324-Speed 10491.11 samples/sec   Loss 8.9804   LearningRate 0.3134   Epoch: 8   Global Step: 43730   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:30:23,119-Speed 10510.60 samples/sec   Loss 8.9344   LearningRate 0.3133   Epoch: 8   Global Step: 43740   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:30:30,904-Speed 10524.68 samples/sec   Loss 8.9397   LearningRate 0.3132   Epoch: 8   Global Step: 43750   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:30:38,732-Speed 10466.41 samples/sec   Loss 8.9993   LearningRate 0.3131   Epoch: 8   Global Step: 43760   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:30:46,560-Speed 10466.49 samples/sec   Loss 8.9938   LearningRate 0.3130   Epoch: 8   Global Step: 43770   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:30:54,359-Speed 10505.35 samples/sec   Loss 8.9706   LearningRate 0.3129   Epoch: 8   Global Step: 43780   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:31:02,158-Speed 10503.98 samples/sec   Loss 8.9601   LearningRate 0.3128   Epoch: 8   Global Step: 43790   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:31:09,987-Speed 10465.39 samples/sec   Loss 9.0125   LearningRate 0.3127   Epoch: 8   Global Step: 43800   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:31:17,773-Speed 10524.49 samples/sec   Loss 8.9561   LearningRate 0.3126   Epoch: 8   Global Step: 43810   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:31:25,581-Speed 10491.89 samples/sec   Loss 9.0057   LearningRate 0.3125   Epoch: 8   Global Step: 43820   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:31:33,389-Speed 10493.37 samples/sec   Loss 8.9722   LearningRate 0.3124   Epoch: 8   Global Step: 43830   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:31:41,211-Speed 10474.02 samples/sec   Loss 8.9549   LearningRate 0.3123   Epoch: 8   Global Step: 43840   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:31:49,007-Speed 10510.08 samples/sec   Loss 8.9216   LearningRate 0.3122   Epoch: 8   Global Step: 43850   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:31:56,823-Speed 10482.09 samples/sec   Loss 8.9411   LearningRate 0.3121   Epoch: 8   Global Step: 43860   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:32:04,661-Speed 10453.15 samples/sec   Loss 8.9631   LearningRate 0.3120   Epoch: 8   Global Step: 43870   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:32:12,449-Speed 10519.96 samples/sec   Loss 8.9809   LearningRate 0.3119   Epoch: 8   Global Step: 43880   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:32:20,279-Speed 10464.78 samples/sec   Loss 8.8975   LearningRate 0.3118   Epoch: 8   Global Step: 43890   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:32:28,073-Speed 10512.13 samples/sec   Loss 9.0341   LearningRate 0.3117   Epoch: 8   Global Step: 43900   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:32:35,892-Speed 10477.81 samples/sec   Loss 8.9256   LearningRate 0.3116   Epoch: 8   Global Step: 43910   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:32:43,710-Speed 10480.59 samples/sec   Loss 8.9279   LearningRate 0.3115   Epoch: 8   Global Step: 43920   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:32:51,498-Speed 10520.49 samples/sec   Loss 8.9590   LearningRate 0.3114   Epoch: 8   Global Step: 43930   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:32:59,324-Speed 10469.38 samples/sec   Loss 8.8729   LearningRate 0.3113   Epoch: 8   Global Step: 43940   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:33:07,156-Speed 10461.62 samples/sec   Loss 8.9124   LearningRate 0.3111   Epoch: 8   Global Step: 43950   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:33:14,988-Speed 10461.20 samples/sec   Loss 8.8993   LearningRate 0.3110   Epoch: 8   Global Step: 43960   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:33:22,799-Speed 10489.59 samples/sec   Loss 8.9241   LearningRate 0.3109   Epoch: 8   Global Step: 43970   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:33:30,608-Speed 10490.74 samples/sec   Loss 8.9192   LearningRate 0.3108   Epoch: 8   Global Step: 43980   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:33:38,411-Speed 10500.56 samples/sec   Loss 8.8960   LearningRate 0.3107   Epoch: 8   Global Step: 43990   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:33:46,235-Speed 10472.44 samples/sec   Loss 8.8917   LearningRate 0.3106   Epoch: 8   Global Step: 44000   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:33:54,048-Speed 10486.50 samples/sec   Loss 8.9398   LearningRate 0.3105   Epoch: 8   Global Step: 44010   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:34:01,890-Speed 10448.12 samples/sec   Loss 8.9395   LearningRate 0.3104   Epoch: 8   Global Step: 44020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:34:09,713-Speed 10472.68 samples/sec   Loss 8.9802   LearningRate 0.3103   Epoch: 8   Global Step: 44030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:34:17,551-Speed 10453.35 samples/sec   Loss 8.9367   LearningRate 0.3102   Epoch: 8   Global Step: 44040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:34:25,360-Speed 10492.86 samples/sec   Loss 8.9359   LearningRate 0.3101   Epoch: 8   Global Step: 44050   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:34:33,167-Speed 10494.49 samples/sec   Loss 8.8827   LearningRate 0.3100   Epoch: 8   Global Step: 44060   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:34:40,962-Speed 10509.66 samples/sec   Loss 8.9269   LearningRate 0.3099   Epoch: 8   Global Step: 44070   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:34:48,765-Speed 10500.58 samples/sec   Loss 8.8933   LearningRate 0.3098   Epoch: 8   Global Step: 44080   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:34:56,564-Speed 10506.79 samples/sec   Loss 9.0110   LearningRate 0.3097   Epoch: 8   Global Step: 44090   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:35:04,377-Speed 10485.81 samples/sec   Loss 8.9812   LearningRate 0.3096   Epoch: 8   Global Step: 44100   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:35:12,166-Speed 10525.97 samples/sec   Loss 8.8977   LearningRate 0.3095   Epoch: 8   Global Step: 44110   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:35:19,965-Speed 10505.38 samples/sec   Loss 8.8692   LearningRate 0.3094   Epoch: 8   Global Step: 44120   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:35:27,833-Speed 10413.34 samples/sec   Loss 8.9036   LearningRate 0.3093   Epoch: 8   Global Step: 44130   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:35:35,631-Speed 10506.36 samples/sec   Loss 8.8641   LearningRate 0.3092   Epoch: 8   Global Step: 44140   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:35:43,433-Speed 10501.47 samples/sec   Loss 8.9184   LearningRate 0.3091   Epoch: 8   Global Step: 44150   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:35:51,274-Speed 10449.52 samples/sec   Loss 8.9109   LearningRate 0.3090   Epoch: 8   Global Step: 44160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:35:59,078-Speed 10497.99 samples/sec   Loss 8.8658   LearningRate 0.3089   Epoch: 8   Global Step: 44170   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:36:06,889-Speed 10489.02 samples/sec   Loss 8.8643   LearningRate 0.3088   Epoch: 8   Global Step: 44180   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:36:14,670-Speed 10530.17 samples/sec   Loss 8.8954   LearningRate 0.3087   Epoch: 8   Global Step: 44190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:36:22,468-Speed 10506.76 samples/sec   Loss 8.9117   LearningRate 0.3085   Epoch: 8   Global Step: 44200   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:36:30,280-Speed 10488.71 samples/sec   Loss 8.9546   LearningRate 0.3084   Epoch: 8   Global Step: 44210   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:36:38,102-Speed 10473.21 samples/sec   Loss 8.9284   LearningRate 0.3083   Epoch: 8   Global Step: 44220   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:36:45,924-Speed 10474.24 samples/sec   Loss 8.8798   LearningRate 0.3082   Epoch: 8   Global Step: 44230   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:36:53,747-Speed 10473.54 samples/sec   Loss 8.8467   LearningRate 0.3081   Epoch: 8   Global Step: 44240   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:37:01,589-Speed 10447.90 samples/sec   Loss 8.9518   LearningRate 0.3080   Epoch: 8   Global Step: 44250   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:37:09,388-Speed 10504.26 samples/sec   Loss 8.9151   LearningRate 0.3079   Epoch: 8   Global Step: 44260   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:37:17,207-Speed 10478.85 samples/sec   Loss 8.9363   LearningRate 0.3078   Epoch: 8   Global Step: 44270   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:37:25,043-Speed 10455.61 samples/sec   Loss 8.9199   LearningRate 0.3077   Epoch: 8   Global Step: 44280   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:37:32,852-Speed 10492.34 samples/sec   Loss 8.9505   LearningRate 0.3076   Epoch: 8   Global Step: 44290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:37:40,652-Speed 10503.24 samples/sec   Loss 8.8771   LearningRate 0.3075   Epoch: 8   Global Step: 44300   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:37:48,452-Speed 10508.97 samples/sec   Loss 8.9301   LearningRate 0.3074   Epoch: 8   Global Step: 44310   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:37:56,279-Speed 10468.13 samples/sec   Loss 8.9259   LearningRate 0.3073   Epoch: 8   Global Step: 44320   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:38:04,095-Speed 10483.80 samples/sec   Loss 8.8316   LearningRate 0.3072   Epoch: 8   Global Step: 44330   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:38:11,889-Speed 10510.71 samples/sec   Loss 8.9026   LearningRate 0.3071   Epoch: 8   Global Step: 44340   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:38:19,680-Speed 10516.54 samples/sec   Loss 8.8511   LearningRate 0.3070   Epoch: 8   Global Step: 44350   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:38:27,482-Speed 10501.72 samples/sec   Loss 8.9200   LearningRate 0.3069   Epoch: 8   Global Step: 44360   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:38:35,290-Speed 10493.59 samples/sec   Loss 8.8702   LearningRate 0.3068   Epoch: 8   Global Step: 44370   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:38:43,087-Speed 10507.55 samples/sec   Loss 8.8487   LearningRate 0.3067   Epoch: 8   Global Step: 44380   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:38:50,860-Speed 10540.15 samples/sec   Loss 8.8744   LearningRate 0.3066   Epoch: 8   Global Step: 44390   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:38:58,651-Speed 10515.66 samples/sec   Loss 8.8177   LearningRate 0.3065   Epoch: 8   Global Step: 44400   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:39:06,446-Speed 10512.70 samples/sec   Loss 8.7859   LearningRate 0.3064   Epoch: 8   Global Step: 44410   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:39:14,251-Speed 10496.37 samples/sec   Loss 8.8856   LearningRate 0.3063   Epoch: 8   Global Step: 44420   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:39:22,084-Speed 10459.16 samples/sec   Loss 8.8864   LearningRate 0.3062   Epoch: 8   Global Step: 44430   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:39:29,909-Speed 10471.64 samples/sec   Loss 8.8446   LearningRate 0.3061   Epoch: 8   Global Step: 44440   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:39:37,779-Speed 10411.02 samples/sec   Loss 8.8035   LearningRate 0.3060   Epoch: 8   Global Step: 44450   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:39:45,593-Speed 10486.57 samples/sec   Loss 8.8910   LearningRate 0.3059   Epoch: 8   Global Step: 44460   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:39:53,383-Speed 10517.31 samples/sec   Loss 8.8419   LearningRate 0.3058   Epoch: 8   Global Step: 44470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:40:01,202-Speed 10479.08 samples/sec   Loss 8.8399   LearningRate 0.3057   Epoch: 8   Global Step: 44480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:40:08,996-Speed 10512.63 samples/sec   Loss 8.8835   LearningRate 0.3055   Epoch: 8   Global Step: 44490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:40:16,796-Speed 10503.38 samples/sec   Loss 8.8618   LearningRate 0.3054   Epoch: 8   Global Step: 44500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:40:24,616-Speed 10477.47 samples/sec   Loss 8.8933   LearningRate 0.3053   Epoch: 8   Global Step: 44510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:40:32,403-Speed 10521.32 samples/sec   Loss 8.8586   LearningRate 0.3052   Epoch: 8   Global Step: 44520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:40:40,213-Speed 10490.36 samples/sec   Loss 8.8507   LearningRate 0.3051   Epoch: 8   Global Step: 44530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:40:47,997-Speed 10525.67 samples/sec   Loss 8.8371   LearningRate 0.3050   Epoch: 8   Global Step: 44540   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:40:55,787-Speed 10518.26 samples/sec   Loss 8.8452   LearningRate 0.3049   Epoch: 8   Global Step: 44550   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:41:03,587-Speed 10507.52 samples/sec   Loss 8.8939   LearningRate 0.3048   Epoch: 8   Global Step: 44560   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:41:11,372-Speed 10525.03 samples/sec   Loss 8.9037   LearningRate 0.3047   Epoch: 8   Global Step: 44570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:41:19,203-Speed 10461.72 samples/sec   Loss 8.7875   LearningRate 0.3046   Epoch: 8   Global Step: 44580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:41:27,022-Speed 10478.76 samples/sec   Loss 8.8655   LearningRate 0.3045   Epoch: 8   Global Step: 44590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:41:34,856-Speed 10458.26 samples/sec   Loss 8.7833   LearningRate 0.3044   Epoch: 8   Global Step: 44600   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:41:42,671-Speed 10483.85 samples/sec   Loss 8.8236   LearningRate 0.3043   Epoch: 8   Global Step: 44610   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:41:50,463-Speed 10514.89 samples/sec   Loss 8.8029   LearningRate 0.3042   Epoch: 8   Global Step: 44620   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:41:58,290-Speed 10467.76 samples/sec   Loss 8.8704   LearningRate 0.3041   Epoch: 8   Global Step: 44630   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:42:06,107-Speed 10486.65 samples/sec   Loss 8.7706   LearningRate 0.3040   Epoch: 8   Global Step: 44640   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:42:13,892-Speed 10524.85 samples/sec   Loss 8.8291   LearningRate 0.3039   Epoch: 8   Global Step: 44650   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:42:21,700-Speed 10492.31 samples/sec   Loss 8.8205   LearningRate 0.3038   Epoch: 8   Global Step: 44660   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:42:29,502-Speed 10500.99 samples/sec   Loss 8.8692   LearningRate 0.3037   Epoch: 8   Global Step: 44670   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:42:37,308-Speed 10496.35 samples/sec   Loss 8.7957   LearningRate 0.3036   Epoch: 8   Global Step: 44680   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:42:45,087-Speed 10532.14 samples/sec   Loss 8.8003   LearningRate 0.3035   Epoch: 8   Global Step: 44690   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:42:52,862-Speed 10538.09 samples/sec   Loss 8.8780   LearningRate 0.3034   Epoch: 8   Global Step: 44700   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:43:00,673-Speed 10488.13 samples/sec   Loss 8.8349   LearningRate 0.3033   Epoch: 8   Global Step: 44710   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:43:08,514-Speed 10448.97 samples/sec   Loss 8.8399   LearningRate 0.3032   Epoch: 8   Global Step: 44720   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:43:16,392-Speed 10400.17 samples/sec   Loss 8.8153   LearningRate 0.3031   Epoch: 8   Global Step: 44730   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:43:24,236-Speed 10445.37 samples/sec   Loss 8.8543   LearningRate 0.3030   Epoch: 8   Global Step: 44740   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:43:32,118-Speed 10395.19 samples/sec   Loss 8.8003   LearningRate 0.3029   Epoch: 8   Global Step: 44750   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:43:39,956-Speed 10453.94 samples/sec   Loss 8.8368   LearningRate 0.3028   Epoch: 8   Global Step: 44760   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:43:47,759-Speed 10500.34 samples/sec   Loss 8.7464   LearningRate 0.3027   Epoch: 8   Global Step: 44770   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:43:55,559-Speed 10505.02 samples/sec   Loss 8.8184   LearningRate 0.3026   Epoch: 8   Global Step: 44780   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:44:03,357-Speed 10507.57 samples/sec   Loss 8.7980   LearningRate 0.3025   Epoch: 8   Global Step: 44790   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:44:11,177-Speed 10477.69 samples/sec   Loss 8.7508   LearningRate 0.3024   Epoch: 8   Global Step: 44800   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:44:19,046-Speed 10412.27 samples/sec   Loss 8.7906   LearningRate 0.3023   Epoch: 8   Global Step: 44810   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:44:26,868-Speed 10473.91 samples/sec   Loss 8.7794   LearningRate 0.3021   Epoch: 8   Global Step: 44820   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:44:34,677-Speed 10491.69 samples/sec   Loss 8.8412   LearningRate 0.3020   Epoch: 8   Global Step: 44830   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:44:42,486-Speed 10492.99 samples/sec   Loss 8.8017   LearningRate 0.3019   Epoch: 8   Global Step: 44840   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:44:50,303-Speed 10480.48 samples/sec   Loss 8.8007   LearningRate 0.3018   Epoch: 8   Global Step: 44850   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:44:58,090-Speed 10522.12 samples/sec   Loss 8.8219   LearningRate 0.3017   Epoch: 8   Global Step: 44860   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:45:05,899-Speed 10491.74 samples/sec   Loss 8.7458   LearningRate 0.3016   Epoch: 8   Global Step: 44870   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:45:13,726-Speed 10467.26 samples/sec   Loss 8.8321   LearningRate 0.3015   Epoch: 8   Global Step: 44880   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:45:21,602-Speed 10402.60 samples/sec   Loss 8.7848   LearningRate 0.3014   Epoch: 8   Global Step: 44890   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:45:29,409-Speed 10494.90 samples/sec   Loss 8.7697   LearningRate 0.3013   Epoch: 8   Global Step: 44900   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:45:37,276-Speed 10413.36 samples/sec   Loss 8.7732   LearningRate 0.3012   Epoch: 8   Global Step: 44910   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:45:45,078-Speed 10502.63 samples/sec   Loss 8.8279   LearningRate 0.3011   Epoch: 8   Global Step: 44920   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:45:52,891-Speed 10486.39 samples/sec   Loss 8.8392   LearningRate 0.3010   Epoch: 8   Global Step: 44930   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:46:00,691-Speed 10503.31 samples/sec   Loss 8.8193   LearningRate 0.3009   Epoch: 8   Global Step: 44940   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:46:08,493-Speed 10502.21 samples/sec   Loss 8.7208   LearningRate 0.3008   Epoch: 8   Global Step: 44950   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:46:16,306-Speed 10486.76 samples/sec   Loss 8.7848   LearningRate 0.3007   Epoch: 8   Global Step: 44960   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:46:24,100-Speed 10512.17 samples/sec   Loss 8.7891   LearningRate 0.3006   Epoch: 8   Global Step: 44970   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:46:31,894-Speed 10511.87 samples/sec   Loss 8.7636   LearningRate 0.3005   Epoch: 8   Global Step: 44980   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:46:39,713-Speed 10478.57 samples/sec   Loss 8.7353   LearningRate 0.3004   Epoch: 8   Global Step: 44990   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:46:47,504-Speed 10516.37 samples/sec   Loss 8.7817   LearningRate 0.3003   Epoch: 8   Global Step: 45000   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:46:55,302-Speed 10507.08 samples/sec   Loss 8.7856   LearningRate 0.3002   Epoch: 8   Global Step: 45010   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:47:03,121-Speed 10477.73 samples/sec   Loss 8.7694   LearningRate 0.3001   Epoch: 8   Global Step: 45020   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:47:10,909-Speed 10519.91 samples/sec   Loss 8.7875   LearningRate 0.3000   Epoch: 8   Global Step: 45030   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:47:18,739-Speed 10463.40 samples/sec   Loss 8.7574   LearningRate 0.2999   Epoch: 8   Global Step: 45040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:47:26,536-Speed 10508.19 samples/sec   Loss 8.7551   LearningRate 0.2998   Epoch: 8   Global Step: 45050   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:47:34,347-Speed 10489.69 samples/sec   Loss 8.7835   LearningRate 0.2997   Epoch: 8   Global Step: 45060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:47:42,144-Speed 10507.42 samples/sec   Loss 8.7988   LearningRate 0.2996   Epoch: 8   Global Step: 45070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:47:49,934-Speed 10518.25 samples/sec   Loss 8.7894   LearningRate 0.2995   Epoch: 8   Global Step: 45080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:47:57,744-Speed 10489.66 samples/sec   Loss 8.7544   LearningRate 0.2994   Epoch: 8   Global Step: 45090   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:48:05,554-Speed 10490.65 samples/sec   Loss 8.7296   LearningRate 0.2993   Epoch: 8   Global Step: 45100   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:48:13,414-Speed 10425.10 samples/sec   Loss 8.8050   LearningRate 0.2992   Epoch: 8   Global Step: 45110   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:48:21,216-Speed 10501.42 samples/sec   Loss 8.7038   LearningRate 0.2991   Epoch: 8   Global Step: 45120   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:48:29,003-Speed 10521.17 samples/sec   Loss 8.7657   LearningRate 0.2990   Epoch: 8   Global Step: 45130   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:48:36,781-Speed 10533.21 samples/sec   Loss 8.7686   LearningRate 0.2989   Epoch: 8   Global Step: 45140   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:48:44,575-Speed 10512.41 samples/sec   Loss 8.7379   LearningRate 0.2988   Epoch: 8   Global Step: 45150   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:48:52,394-Speed 10479.14 samples/sec   Loss 8.7254   LearningRate 0.2987   Epoch: 8   Global Step: 45160   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:49:00,224-Speed 10463.88 samples/sec   Loss 8.7494   LearningRate 0.2986   Epoch: 8   Global Step: 45170   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:49:08,008-Speed 10524.54 samples/sec   Loss 8.7720   LearningRate 0.2985   Epoch: 8   Global Step: 45180   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:49:15,789-Speed 10530.07 samples/sec   Loss 8.7208   LearningRate 0.2984   Epoch: 8   Global Step: 45190   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:49:23,581-Speed 10515.57 samples/sec   Loss 8.8287   LearningRate 0.2983   Epoch: 8   Global Step: 45200   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:49:31,356-Speed 10537.43 samples/sec   Loss 8.7595   LearningRate 0.2982   Epoch: 8   Global Step: 45210   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:49:39,192-Speed 10456.25 samples/sec   Loss 8.6998   LearningRate 0.2981   Epoch: 8   Global Step: 45220   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:49:47,046-Speed 10431.77 samples/sec   Loss 8.7742   LearningRate 0.2980   Epoch: 8   Global Step: 45230   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:49:54,851-Speed 10496.65 samples/sec   Loss 8.7759   LearningRate 0.2979   Epoch: 8   Global Step: 45240   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:50:02,675-Speed 10472.36 samples/sec   Loss 8.7445   LearningRate 0.2978   Epoch: 8   Global Step: 45250   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:50:10,511-Speed 10455.93 samples/sec   Loss 8.7323   LearningRate 0.2976   Epoch: 8   Global Step: 45260   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:50:18,332-Speed 10476.53 samples/sec   Loss 8.7535   LearningRate 0.2975   Epoch: 8   Global Step: 45270   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:50:26,151-Speed 10477.76 samples/sec   Loss 8.7425   LearningRate 0.2974   Epoch: 8   Global Step: 45280   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:50:33,970-Speed 10479.49 samples/sec   Loss 8.8148   LearningRate 0.2973   Epoch: 8   Global Step: 45290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:50:41,792-Speed 10474.61 samples/sec   Loss 8.7943   LearningRate 0.2972   Epoch: 8   Global Step: 45300   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:50:49,646-Speed 10430.82 samples/sec   Loss 8.7036   LearningRate 0.2971   Epoch: 8   Global Step: 45310   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:50:57,482-Speed 10455.46 samples/sec   Loss 8.7596   LearningRate 0.2970   Epoch: 8   Global Step: 45320   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:51:05,279-Speed 10507.45 samples/sec   Loss 8.6624   LearningRate 0.2969   Epoch: 8   Global Step: 45330   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:51:13,149-Speed 10411.37 samples/sec   Loss 8.7266   LearningRate 0.2968   Epoch: 8   Global Step: 45340   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:51:20,956-Speed 10493.75 samples/sec   Loss 8.7456   LearningRate 0.2967   Epoch: 8   Global Step: 45350   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:51:28,757-Speed 10504.92 samples/sec   Loss 8.7163   LearningRate 0.2966   Epoch: 8   Global Step: 45360   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:51:36,571-Speed 10486.07 samples/sec   Loss 8.7350   LearningRate 0.2965   Epoch: 8   Global Step: 45370   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:51:44,373-Speed 10501.99 samples/sec   Loss 8.7263   LearningRate 0.2964   Epoch: 8   Global Step: 45380   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:51:52,206-Speed 10459.71 samples/sec   Loss 8.7016   LearningRate 0.2963   Epoch: 8   Global Step: 45390   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:52:00,046-Speed 10449.79 samples/sec   Loss 8.6825   LearningRate 0.2962   Epoch: 8   Global Step: 45400   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:52:07,867-Speed 10475.55 samples/sec   Loss 8.7525   LearningRate 0.2961   Epoch: 8   Global Step: 45410   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:52:15,672-Speed 10497.03 samples/sec   Loss 8.7228   LearningRate 0.2960   Epoch: 8   Global Step: 45420   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:52:23,503-Speed 10462.38 samples/sec   Loss 8.7187   LearningRate 0.2959   Epoch: 8   Global Step: 45430   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:52:31,332-Speed 10464.75 samples/sec   Loss 8.6770   LearningRate 0.2958   Epoch: 8   Global Step: 45440   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:52:39,154-Speed 10474.78 samples/sec   Loss 8.6844   LearningRate 0.2957   Epoch: 8   Global Step: 45450   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:52:46,968-Speed 10486.02 samples/sec   Loss 8.6902   LearningRate 0.2956   Epoch: 8   Global Step: 45460   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:52:54,800-Speed 10466.29 samples/sec   Loss 8.7701   LearningRate 0.2955   Epoch: 8   Global Step: 45470   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:53:02,639-Speed 10451.36 samples/sec   Loss 8.6823   LearningRate 0.2954   Epoch: 8   Global Step: 45480   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:53:10,464-Speed 10470.72 samples/sec   Loss 8.7021   LearningRate 0.2953   Epoch: 8   Global Step: 45490   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:53:18,296-Speed 10466.23 samples/sec   Loss 8.7764   LearningRate 0.2952   Epoch: 8   Global Step: 45500   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:53:26,147-Speed 10436.45 samples/sec   Loss 8.6554   LearningRate 0.2951   Epoch: 8   Global Step: 45510   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:53:33,944-Speed 10506.73 samples/sec   Loss 8.6693   LearningRate 0.2950   Epoch: 8   Global Step: 45520   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:53:41,785-Speed 10450.04 samples/sec   Loss 8.7134   LearningRate 0.2949   Epoch: 8   Global Step: 45530   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:53:49,606-Speed 10476.37 samples/sec   Loss 8.7140   LearningRate 0.2948   Epoch: 8   Global Step: 45540   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:53:57,462-Speed 10428.39 samples/sec   Loss 8.7152   LearningRate 0.2947   Epoch: 8   Global Step: 45550   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:54:05,288-Speed 10469.16 samples/sec   Loss 8.6681   LearningRate 0.2946   Epoch: 8   Global Step: 45560   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:54:13,117-Speed 10464.81 samples/sec   Loss 8.7045   LearningRate 0.2945   Epoch: 8   Global Step: 45570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:54:20,946-Speed 10465.06 samples/sec   Loss 8.6593   LearningRate 0.2944   Epoch: 8   Global Step: 45580   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:54:28,776-Speed 10463.87 samples/sec   Loss 8.6928   LearningRate 0.2943   Epoch: 8   Global Step: 45590   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:54:36,609-Speed 10459.62 samples/sec   Loss 8.6504   LearningRate 0.2942   Epoch: 8   Global Step: 45600   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:54:44,427-Speed 10479.02 samples/sec   Loss 8.6556   LearningRate 0.2941   Epoch: 8   Global Step: 45610   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:54:52,222-Speed 10511.46 samples/sec   Loss 8.6850   LearningRate 0.2940   Epoch: 8   Global Step: 45620   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:55:00,017-Speed 10510.71 samples/sec   Loss 8.6845   LearningRate 0.2939   Epoch: 8   Global Step: 45630   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:55:07,808-Speed 10515.82 samples/sec   Loss 8.7452   LearningRate 0.2938   Epoch: 8   Global Step: 45640   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:55:15,605-Speed 10508.27 samples/sec   Loss 8.7458   LearningRate 0.2937   Epoch: 8   Global Step: 45650   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:55:23,412-Speed 10494.26 samples/sec   Loss 8.6772   LearningRate 0.2936   Epoch: 8   Global Step: 45660   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:55:31,201-Speed 10518.76 samples/sec   Loss 8.6270   LearningRate 0.2935   Epoch: 8   Global Step: 45670   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:55:39,008-Speed 10495.52 samples/sec   Loss 8.6864   LearningRate 0.2934   Epoch: 8   Global Step: 45680   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:55:46,801-Speed 10512.52 samples/sec   Loss 8.6851   LearningRate 0.2933   Epoch: 8   Global Step: 45690   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:55:54,586-Speed 10525.42 samples/sec   Loss 8.6425   LearningRate 0.2932   Epoch: 8   Global Step: 45700   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:56:02,370-Speed 10525.78 samples/sec   Loss 8.6584   LearningRate 0.2931   Epoch: 8   Global Step: 45710   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:56:10,149-Speed 10532.04 samples/sec   Loss 8.6764   LearningRate 0.2930   Epoch: 8   Global Step: 45720   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:56:17,931-Speed 10526.96 samples/sec   Loss 8.6411   LearningRate 0.2929   Epoch: 8   Global Step: 45730   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:56:25,711-Speed 10535.26 samples/sec   Loss 8.6653   LearningRate 0.2928   Epoch: 8   Global Step: 45740   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:56:33,509-Speed 10506.48 samples/sec   Loss 8.5999   LearningRate 0.2927   Epoch: 8   Global Step: 45750   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:56:41,311-Speed 10501.48 samples/sec   Loss 8.6702   LearningRate 0.2926   Epoch: 8   Global Step: 45760   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:56:49,081-Speed 10544.92 samples/sec   Loss 8.7091   LearningRate 0.2925   Epoch: 8   Global Step: 45770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:56:56,864-Speed 10527.06 samples/sec   Loss 8.6299   LearningRate 0.2924   Epoch: 8   Global Step: 45780   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:57:04,686-Speed 10475.35 samples/sec   Loss 8.6385   LearningRate 0.2923   Epoch: 8   Global Step: 45790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:57:12,477-Speed 10516.38 samples/sec   Loss 8.7646   LearningRate 0.2922   Epoch: 8   Global Step: 45800   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:57:20,273-Speed 10508.30 samples/sec   Loss 8.7138   LearningRate 0.2921   Epoch: 8   Global Step: 45810   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:57:28,068-Speed 10509.69 samples/sec   Loss 8.6300   LearningRate 0.2920   Epoch: 8   Global Step: 45820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:57:35,850-Speed 10529.91 samples/sec   Loss 8.6699   LearningRate 0.2919   Epoch: 8   Global Step: 45830   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:57:43,693-Speed 10446.25 samples/sec   Loss 8.6498   LearningRate 0.2918   Epoch: 8   Global Step: 45840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:57:51,522-Speed 10463.66 samples/sec   Loss 8.6081   LearningRate 0.2917   Epoch: 8   Global Step: 45850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-16 00:57:59,347-Speed 10475.27 samples/sec   Loss 8.6459   LearningRate 0.2916   Epoch: 8   Global Step: 45860   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:58:07,154-Speed 10500.17 samples/sec   Loss 8.5707   LearningRate 0.2915   Epoch: 8   Global Step: 45870   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:58:14,977-Speed 10473.24 samples/sec   Loss 8.6948   LearningRate 0.2914   Epoch: 8   Global Step: 45880   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:58:22,765-Speed 10520.46 samples/sec   Loss 8.6399   LearningRate 0.2913   Epoch: 8   Global Step: 45890   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:58:30,569-Speed 10498.32 samples/sec   Loss 8.6213   LearningRate 0.2912   Epoch: 8   Global Step: 45900   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:58:38,374-Speed 10496.42 samples/sec   Loss 8.6553   LearningRate 0.2911   Epoch: 8   Global Step: 45910   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:58:46,167-Speed 10514.52 samples/sec   Loss 8.6117   LearningRate 0.2910   Epoch: 8   Global Step: 45920   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:58:53,975-Speed 10493.38 samples/sec   Loss 8.6055   LearningRate 0.2909   Epoch: 8   Global Step: 45930   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:59:01,785-Speed 10489.53 samples/sec   Loss 8.6768   LearningRate 0.2908   Epoch: 8   Global Step: 45940   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:59:09,607-Speed 10474.71 samples/sec   Loss 8.6365   LearningRate 0.2907   Epoch: 8   Global Step: 45950   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:59:17,408-Speed 10502.08 samples/sec   Loss 8.7095   LearningRate 0.2906   Epoch: 8   Global Step: 45960   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:59:25,239-Speed 10462.75 samples/sec   Loss 8.6948   LearningRate 0.2905   Epoch: 8   Global Step: 45970   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:59:33,072-Speed 10459.98 samples/sec   Loss 8.6075   LearningRate 0.2904   Epoch: 8   Global Step: 45980   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 00:59:40,865-Speed 10512.98 samples/sec   Loss 8.6016   LearningRate 0.2903   Epoch: 8   Global Step: 45990   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:59:48,681-Speed 10487.40 samples/sec   Loss 8.6197   LearningRate 0.2902   Epoch: 8   Global Step: 46000   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 00:59:56,507-Speed 10468.53 samples/sec   Loss 8.5776   LearningRate 0.2901   Epoch: 8   Global Step: 46010   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:00:04,330-Speed 10473.54 samples/sec   Loss 8.6915   LearningRate 0.2900   Epoch: 8   Global Step: 46020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:00:12,153-Speed 10473.52 samples/sec   Loss 8.6442   LearningRate 0.2899   Epoch: 8   Global Step: 46030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:00:19,951-Speed 10506.21 samples/sec   Loss 8.5897   LearningRate 0.2898   Epoch: 8   Global Step: 46040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:00:27,778-Speed 10467.63 samples/sec   Loss 8.6499   LearningRate 0.2897   Epoch: 8   Global Step: 46050   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:00:35,605-Speed 10468.84 samples/sec   Loss 8.5994   LearningRate 0.2896   Epoch: 8   Global Step: 46060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:00:43,389-Speed 10525.00 samples/sec   Loss 8.6046   LearningRate 0.2895   Epoch: 8   Global Step: 46070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:00:51,186-Speed 10508.68 samples/sec   Loss 8.5894   LearningRate 0.2894   Epoch: 8   Global Step: 46080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:00:58,981-Speed 10510.42 samples/sec   Loss 8.5998   LearningRate 0.2893   Epoch: 8   Global Step: 46090   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 01:01:06,769-Speed 10520.87 samples/sec   Loss 8.6242   LearningRate 0.2892   Epoch: 8   Global Step: 46100   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 01:01:14,542-Speed 10542.29 samples/sec   Loss 8.6282   LearningRate 0.2891   Epoch: 8   Global Step: 46110   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 01:01:22,331-Speed 10519.46 samples/sec   Loss 8.5514   LearningRate 0.2890   Epoch: 8   Global Step: 46120   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:01:30,139-Speed 10492.82 samples/sec   Loss 8.7057   LearningRate 0.2888   Epoch: 8   Global Step: 46130   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:01:37,930-Speed 10516.76 samples/sec   Loss 8.5885   LearningRate 0.2887   Epoch: 8   Global Step: 46140   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:01:45,715-Speed 10523.70 samples/sec   Loss 8.5804   LearningRate 0.2886   Epoch: 8   Global Step: 46150   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:01:53,507-Speed 10514.23 samples/sec   Loss 8.6329   LearningRate 0.2885   Epoch: 8   Global Step: 46160   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:02:01,302-Speed 10511.71 samples/sec   Loss 8.6021   LearningRate 0.2884   Epoch: 8   Global Step: 46170   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:02:09,117-Speed 10483.14 samples/sec   Loss 8.5864   LearningRate 0.2883   Epoch: 8   Global Step: 46180   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:02:16,922-Speed 10497.57 samples/sec   Loss 8.6708   LearningRate 0.2882   Epoch: 8   Global Step: 46190   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:02:24,711-Speed 10518.90 samples/sec   Loss 8.5372   LearningRate 0.2881   Epoch: 8   Global Step: 46200   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:02:32,516-Speed 10498.89 samples/sec   Loss 8.7088   LearningRate 0.2880   Epoch: 8   Global Step: 46210   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:02:40,345-Speed 10464.74 samples/sec   Loss 8.6457   LearningRate 0.2879   Epoch: 8   Global Step: 46220   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 01:02:48,130-Speed 10523.12 samples/sec   Loss 8.5796   LearningRate 0.2878   Epoch: 8   Global Step: 46230   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 01:02:55,918-Speed 10521.05 samples/sec   Loss 8.6382   LearningRate 0.2877   Epoch: 8   Global Step: 46240   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:03:03,721-Speed 10500.35 samples/sec   Loss 8.6339   LearningRate 0.2876   Epoch: 8   Global Step: 46250   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:03:11,523-Speed 10499.99 samples/sec   Loss 8.6365   LearningRate 0.2875   Epoch: 8   Global Step: 46260   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:03:19,349-Speed 10469.30 samples/sec   Loss 8.5173   LearningRate 0.2874   Epoch: 8   Global Step: 46270   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:03:27,164-Speed 10484.85 samples/sec   Loss 8.5060   LearningRate 0.2873   Epoch: 8   Global Step: 46280   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:03:34,953-Speed 10518.53 samples/sec   Loss 8.5849   LearningRate 0.2872   Epoch: 8   Global Step: 46290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:03:42,747-Speed 10511.87 samples/sec   Loss 8.6023   LearningRate 0.2871   Epoch: 8   Global Step: 46300   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:03:50,581-Speed 10458.48 samples/sec   Loss 8.5808   LearningRate 0.2870   Epoch: 8   Global Step: 46310   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:03:58,387-Speed 10495.26 samples/sec   Loss 8.5733   LearningRate 0.2869   Epoch: 8   Global Step: 46320   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:04:06,159-Speed 10542.54 samples/sec   Loss 8.5762   LearningRate 0.2868   Epoch: 8   Global Step: 46330   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:04:13,939-Speed 10530.87 samples/sec   Loss 8.5975   LearningRate 0.2867   Epoch: 8   Global Step: 46340   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 01:04:21,731-Speed 10514.54 samples/sec   Loss 8.6368   LearningRate 0.2866   Epoch: 8   Global Step: 46350   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:04:29,521-Speed 10517.64 samples/sec   Loss 8.6373   LearningRate 0.2865   Epoch: 8   Global Step: 46360   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:04:37,306-Speed 10524.72 samples/sec   Loss 8.5775   LearningRate 0.2864   Epoch: 8   Global Step: 46370   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:04:45,092-Speed 10523.04 samples/sec   Loss 8.5365   LearningRate 0.2863   Epoch: 8   Global Step: 46380   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:04:52,901-Speed 10492.18 samples/sec   Loss 8.5758   LearningRate 0.2862   Epoch: 8   Global Step: 46390   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:05:00,682-Speed 10528.97 samples/sec   Loss 8.4723   LearningRate 0.2861   Epoch: 8   Global Step: 46400   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:05:08,517-Speed 10456.88 samples/sec   Loss 8.5577   LearningRate 0.2860   Epoch: 8   Global Step: 46410   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:05:16,336-Speed 10478.60 samples/sec   Loss 8.6000   LearningRate 0.2859   Epoch: 8   Global Step: 46420   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:05:24,136-Speed 10504.35 samples/sec   Loss 8.5180   LearningRate 0.2858   Epoch: 8   Global Step: 46430   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:05:31,940-Speed 10499.53 samples/sec   Loss 8.5356   LearningRate 0.2857   Epoch: 8   Global Step: 46440   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:05:39,733-Speed 10512.91 samples/sec   Loss 8.5733   LearningRate 0.2856   Epoch: 8   Global Step: 46450   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:05:47,529-Speed 10508.54 samples/sec   Loss 8.5776   LearningRate 0.2855   Epoch: 8   Global Step: 46460   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:05:55,333-Speed 10499.43 samples/sec   Loss 8.5352   LearningRate 0.2854   Epoch: 8   Global Step: 46470   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:06:03,152-Speed 10477.76 samples/sec   Loss 8.5740   LearningRate 0.2853   Epoch: 8   Global Step: 46480   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:06:10,952-Speed 10504.93 samples/sec   Loss 8.5883   LearningRate 0.2852   Epoch: 8   Global Step: 46490   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:06:18,756-Speed 10497.71 samples/sec   Loss 8.6044   LearningRate 0.2851   Epoch: 8   Global Step: 46500   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:06:26,530-Speed 10539.84 samples/sec   Loss 8.5138   LearningRate 0.2850   Epoch: 8   Global Step: 46510   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:06:34,310-Speed 10530.37 samples/sec   Loss 8.5689   LearningRate 0.2849   Epoch: 8   Global Step: 46520   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:06:42,101-Speed 10516.76 samples/sec   Loss 8.6413   LearningRate 0.2848   Epoch: 8   Global Step: 46530   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:06:49,909-Speed 10493.59 samples/sec   Loss 8.5319   LearningRate 0.2847   Epoch: 8   Global Step: 46540   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:06:57,702-Speed 10512.26 samples/sec   Loss 8.5290   LearningRate 0.2846   Epoch: 8   Global Step: 46550   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 01:07:05,492-Speed 10517.09 samples/sec   Loss 8.4994   LearningRate 0.2845   Epoch: 8   Global Step: 46560   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 01:07:13,305-Speed 10486.72 samples/sec   Loss 8.5712   LearningRate 0.2844   Epoch: 8   Global Step: 46570   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 01:07:21,089-Speed 10525.51 samples/sec   Loss 8.5636   LearningRate 0.2844   Epoch: 8   Global Step: 46580   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 01:07:28,887-Speed 10506.68 samples/sec   Loss 8.5147   LearningRate 0.2843   Epoch: 8   Global Step: 46590   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 01:07:36,675-Speed 10519.79 samples/sec   Loss 8.5406   LearningRate 0.2842   Epoch: 8   Global Step: 46600   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 01:07:44,467-Speed 10515.55 samples/sec   Loss 8.5328   LearningRate 0.2841   Epoch: 8   Global Step: 46610   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 01:07:52,275-Speed 10493.15 samples/sec   Loss 8.5496   LearningRate 0.2840   Epoch: 8   Global Step: 46620   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:08:00,083-Speed 10492.55 samples/sec   Loss 8.5272   LearningRate 0.2839   Epoch: 8   Global Step: 46630   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:08:07,896-Speed 10487.45 samples/sec   Loss 8.6432   LearningRate 0.2838   Epoch: 8   Global Step: 46640   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:08:15,701-Speed 10496.88 samples/sec   Loss 8.5777   LearningRate 0.2837   Epoch: 8   Global Step: 46650   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:08:23,499-Speed 10507.48 samples/sec   Loss 8.5631   LearningRate 0.2836   Epoch: 8   Global Step: 46660   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:08:46,366-Speed 3582.50 samples/sec   Loss 8.5290   LearningRate 0.2835   Epoch: 9   Global Step: 46670   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:08:54,120-Speed 10566.48 samples/sec   Loss 8.5241   LearningRate 0.2834   Epoch: 9   Global Step: 46680   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:09:01,916-Speed 10509.65 samples/sec   Loss 8.5198   LearningRate 0.2833   Epoch: 9   Global Step: 46690   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:09:09,708-Speed 10514.37 samples/sec   Loss 8.5253   LearningRate 0.2832   Epoch: 9   Global Step: 46700   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:09:17,499-Speed 10515.52 samples/sec   Loss 8.5235   LearningRate 0.2831   Epoch: 9   Global Step: 46710   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:09:25,298-Speed 10505.70 samples/sec   Loss 8.5280   LearningRate 0.2830   Epoch: 9   Global Step: 46720   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 01:09:33,103-Speed 10497.86 samples/sec   Loss 8.5253   LearningRate 0.2829   Epoch: 9   Global Step: 46730   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 01:09:40,865-Speed 10554.07 samples/sec   Loss 8.5771   LearningRate 0.2828   Epoch: 9   Global Step: 46740   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 01:09:48,664-Speed 10505.67 samples/sec   Loss 8.5426   LearningRate 0.2827   Epoch: 9   Global Step: 46750   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 01:09:56,475-Speed 10490.09 samples/sec   Loss 8.5394   LearningRate 0.2826   Epoch: 9   Global Step: 46760   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 01:10:04,266-Speed 10515.40 samples/sec   Loss 8.4594   LearningRate 0.2825   Epoch: 9   Global Step: 46770   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-16 01:10:12,058-Speed 10515.01 samples/sec   Loss 8.4466   LearningRate 0.2824   Epoch: 9   Global Step: 46780   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:10:19,858-Speed 10506.40 samples/sec   Loss 8.5026   LearningRate 0.2823   Epoch: 9   Global Step: 46790   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:10:27,656-Speed 10507.38 samples/sec   Loss 8.5061   LearningRate 0.2822   Epoch: 9   Global Step: 46800   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:10:35,457-Speed 10502.60 samples/sec   Loss 8.4761   LearningRate 0.2821   Epoch: 9   Global Step: 46810   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:10:43,256-Speed 10505.30 samples/sec   Loss 8.4912   LearningRate 0.2820   Epoch: 9   Global Step: 46820   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-16 01:10:51,043-Speed 10521.28 samples/sec   Loss 8.4246   LearningRate 0.2819   Epoch: 9   Global Step: 46830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:10:58,838-Speed 10511.92 samples/sec   Loss 8.5136   LearningRate 0.2818   Epoch: 9   Global Step: 46840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:11:06,691-Speed 10431.83 samples/sec   Loss 8.4984   LearningRate 0.2817   Epoch: 9   Global Step: 46850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:11:14,507-Speed 10483.06 samples/sec   Loss 8.5289   LearningRate 0.2816   Epoch: 9   Global Step: 46860   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:11:22,296-Speed 10519.14 samples/sec   Loss 8.4792   LearningRate 0.2815   Epoch: 9   Global Step: 46870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:11:30,105-Speed 10491.13 samples/sec   Loss 8.4835   LearningRate 0.2814   Epoch: 9   Global Step: 46880   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:11:37,897-Speed 10515.46 samples/sec   Loss 8.4679   LearningRate 0.2813   Epoch: 9   Global Step: 46890   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:11:45,727-Speed 10462.73 samples/sec   Loss 8.4893   LearningRate 0.2812   Epoch: 9   Global Step: 46900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:11:53,506-Speed 10531.66 samples/sec   Loss 8.4688   LearningRate 0.2811   Epoch: 9   Global Step: 46910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:12:01,288-Speed 10529.42 samples/sec   Loss 8.5072   LearningRate 0.2810   Epoch: 9   Global Step: 46920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:12:09,091-Speed 10498.94 samples/sec   Loss 8.5141   LearningRate 0.2809   Epoch: 9   Global Step: 46930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:12:16,872-Speed 10529.59 samples/sec   Loss 8.5236   LearningRate 0.2808   Epoch: 9   Global Step: 46940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:12:24,660-Speed 10520.15 samples/sec   Loss 8.5304   LearningRate 0.2807   Epoch: 9   Global Step: 46950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:12:32,446-Speed 10522.86 samples/sec   Loss 8.5329   LearningRate 0.2806   Epoch: 9   Global Step: 46960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:12:40,245-Speed 10505.07 samples/sec   Loss 8.4182   LearningRate 0.2805   Epoch: 9   Global Step: 46970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:12:48,028-Speed 10527.02 samples/sec   Loss 8.5120   LearningRate 0.2804   Epoch: 9   Global Step: 46980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:12:55,811-Speed 10528.42 samples/sec   Loss 8.4625   LearningRate 0.2803   Epoch: 9   Global Step: 46990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:13:03,652-Speed 10448.26 samples/sec   Loss 8.5790   LearningRate 0.2802   Epoch: 9   Global Step: 47000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:13:11,445-Speed 10513.21 samples/sec   Loss 8.5485   LearningRate 0.2801   Epoch: 9   Global Step: 47010   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:13:19,235-Speed 10518.19 samples/sec   Loss 8.4775   LearningRate 0.2800   Epoch: 9   Global Step: 47020   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:13:27,047-Speed 10489.77 samples/sec   Loss 8.4822   LearningRate 0.2799   Epoch: 9   Global Step: 47030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:13:34,850-Speed 10500.10 samples/sec   Loss 8.4981   LearningRate 0.2798   Epoch: 9   Global Step: 47040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:13:42,645-Speed 10509.51 samples/sec   Loss 8.4881   LearningRate 0.2797   Epoch: 9   Global Step: 47050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:13:50,456-Speed 10489.57 samples/sec   Loss 8.4540   LearningRate 0.2796   Epoch: 9   Global Step: 47060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:13:58,280-Speed 10473.21 samples/sec   Loss 8.5184   LearningRate 0.2795   Epoch: 9   Global Step: 47070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:14:06,130-Speed 10435.82 samples/sec   Loss 8.4891   LearningRate 0.2794   Epoch: 9   Global Step: 47080   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:14:13,961-Speed 10462.75 samples/sec   Loss 8.4792   LearningRate 0.2793   Epoch: 9   Global Step: 47090   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:14:21,776-Speed 10483.21 samples/sec   Loss 8.4227   LearningRate 0.2792   Epoch: 9   Global Step: 47100   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:14:29,609-Speed 10460.59 samples/sec   Loss 8.4732   LearningRate 0.2791   Epoch: 9   Global Step: 47110   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:14:37,432-Speed 10472.08 samples/sec   Loss 8.4398   LearningRate 0.2790   Epoch: 9   Global Step: 47120   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:14:45,268-Speed 10455.94 samples/sec   Loss 8.4664   LearningRate 0.2789   Epoch: 9   Global Step: 47130   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:14:53,098-Speed 10464.02 samples/sec   Loss 8.5057   LearningRate 0.2788   Epoch: 9   Global Step: 47140   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:15:00,942-Speed 10445.25 samples/sec   Loss 8.4885   LearningRate 0.2787   Epoch: 9   Global Step: 47150   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:15:08,810-Speed 10412.23 samples/sec   Loss 8.5047   LearningRate 0.2786   Epoch: 9   Global Step: 47160   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:15:16,666-Speed 10429.78 samples/sec   Loss 8.4672   LearningRate 0.2785   Epoch: 9   Global Step: 47170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:15:24,507-Speed 10448.63 samples/sec   Loss 8.4390   LearningRate 0.2784   Epoch: 9   Global Step: 47180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:15:32,335-Speed 10467.40 samples/sec   Loss 8.5087   LearningRate 0.2783   Epoch: 9   Global Step: 47190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:15:40,167-Speed 10460.75 samples/sec   Loss 8.4060   LearningRate 0.2782   Epoch: 9   Global Step: 47200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:15:47,997-Speed 10463.08 samples/sec   Loss 8.4555   LearningRate 0.2781   Epoch: 9   Global Step: 47210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:15:55,825-Speed 10466.21 samples/sec   Loss 8.5167   LearningRate 0.2780   Epoch: 9   Global Step: 47220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:16:03,674-Speed 10439.23 samples/sec   Loss 8.4597   LearningRate 0.2779   Epoch: 9   Global Step: 47230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:16:11,531-Speed 10427.00 samples/sec   Loss 8.4517   LearningRate 0.2778   Epoch: 9   Global Step: 47240   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:16:19,362-Speed 10462.65 samples/sec   Loss 8.4058   LearningRate 0.2777   Epoch: 9   Global Step: 47250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:16:27,180-Speed 10479.54 samples/sec   Loss 8.4293   LearningRate 0.2776   Epoch: 9   Global Step: 47260   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:16:35,001-Speed 10475.08 samples/sec   Loss 8.4287   LearningRate 0.2775   Epoch: 9   Global Step: 47270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:16:42,864-Speed 10420.49 samples/sec   Loss 8.4417   LearningRate 0.2774   Epoch: 9   Global Step: 47280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:16:50,694-Speed 10463.68 samples/sec   Loss 8.3890   LearningRate 0.2773   Epoch: 9   Global Step: 47290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:16:58,517-Speed 10472.73 samples/sec   Loss 8.4513   LearningRate 0.2772   Epoch: 9   Global Step: 47300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:17:06,351-Speed 10458.88 samples/sec   Loss 8.4281   LearningRate 0.2771   Epoch: 9   Global Step: 47310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:17:14,175-Speed 10471.60 samples/sec   Loss 8.4566   LearningRate 0.2770   Epoch: 9   Global Step: 47320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:17:22,110-Speed 10324.27 samples/sec   Loss 8.3915   LearningRate 0.2769   Epoch: 9   Global Step: 47330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:17:29,962-Speed 10439.22 samples/sec   Loss 8.4972   LearningRate 0.2768   Epoch: 9   Global Step: 47340   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:17:37,817-Speed 10430.07 samples/sec   Loss 8.4295   LearningRate 0.2767   Epoch: 9   Global Step: 47350   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:17:45,628-Speed 10489.17 samples/sec   Loss 8.3650   LearningRate 0.2766   Epoch: 9   Global Step: 47360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:17:53,465-Speed 10454.23 samples/sec   Loss 8.4141   LearningRate 0.2765   Epoch: 9   Global Step: 47370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:18:01,283-Speed 10481.81 samples/sec   Loss 8.4396   LearningRate 0.2764   Epoch: 9   Global Step: 47380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:18:09,132-Speed 10439.96 samples/sec   Loss 8.3756   LearningRate 0.2763   Epoch: 9   Global Step: 47390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:18:17,001-Speed 10412.08 samples/sec   Loss 8.4396   LearningRate 0.2762   Epoch: 9   Global Step: 47400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:18:24,833-Speed 10461.29 samples/sec   Loss 8.4160   LearningRate 0.2761   Epoch: 9   Global Step: 47410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:18:32,664-Speed 10462.56 samples/sec   Loss 8.3846   LearningRate 0.2760   Epoch: 9   Global Step: 47420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:18:40,519-Speed 10430.29 samples/sec   Loss 8.3977   LearningRate 0.2759   Epoch: 9   Global Step: 47430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:18:48,329-Speed 10491.63 samples/sec   Loss 8.4027   LearningRate 0.2758   Epoch: 9   Global Step: 47440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:18:56,179-Speed 10436.40 samples/sec   Loss 8.4477   LearningRate 0.2758   Epoch: 9   Global Step: 47450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:19:04,005-Speed 10469.99 samples/sec   Loss 8.4227   LearningRate 0.2757   Epoch: 9   Global Step: 47460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:19:11,802-Speed 10507.47 samples/sec   Loss 8.4479   LearningRate 0.2756   Epoch: 9   Global Step: 47470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:19:19,593-Speed 10516.56 samples/sec   Loss 8.4323   LearningRate 0.2755   Epoch: 9   Global Step: 47480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:19:27,427-Speed 10457.61 samples/sec   Loss 8.4266   LearningRate 0.2754   Epoch: 9   Global Step: 47490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:19:35,237-Speed 10492.01 samples/sec   Loss 8.3772   LearningRate 0.2753   Epoch: 9   Global Step: 47500   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:19:43,045-Speed 10493.08 samples/sec   Loss 8.4060   LearningRate 0.2752   Epoch: 9   Global Step: 47510   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:19:50,843-Speed 10506.48 samples/sec   Loss 8.4164   LearningRate 0.2751   Epoch: 9   Global Step: 47520   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:19:58,628-Speed 10524.47 samples/sec   Loss 8.3882   LearningRate 0.2750   Epoch: 9   Global Step: 47530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:20:06,432-Speed 10498.75 samples/sec   Loss 8.5234   LearningRate 0.2749   Epoch: 9   Global Step: 47540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:20:14,262-Speed 10462.84 samples/sec   Loss 8.4663   LearningRate 0.2748   Epoch: 9   Global Step: 47550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:20:22,058-Speed 10509.86 samples/sec   Loss 8.3944   LearningRate 0.2747   Epoch: 9   Global Step: 47560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:20:29,864-Speed 10495.85 samples/sec   Loss 8.4140   LearningRate 0.2746   Epoch: 9   Global Step: 47570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:20:37,696-Speed 10461.17 samples/sec   Loss 8.3692   LearningRate 0.2745   Epoch: 9   Global Step: 47580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:20:45,526-Speed 10462.79 samples/sec   Loss 8.3766   LearningRate 0.2744   Epoch: 9   Global Step: 47590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:20:53,333-Speed 10494.99 samples/sec   Loss 8.4659   LearningRate 0.2743   Epoch: 9   Global Step: 47600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:21:01,186-Speed 10432.74 samples/sec   Loss 8.4580   LearningRate 0.2742   Epoch: 9   Global Step: 47610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:21:08,988-Speed 10501.94 samples/sec   Loss 8.3850   LearningRate 0.2741   Epoch: 9   Global Step: 47620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:21:16,786-Speed 10507.61 samples/sec   Loss 8.3362   LearningRate 0.2740   Epoch: 9   Global Step: 47630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:21:24,607-Speed 10474.69 samples/sec   Loss 8.3475   LearningRate 0.2739   Epoch: 9   Global Step: 47640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:21:32,401-Speed 10512.53 samples/sec   Loss 8.3851   LearningRate 0.2738   Epoch: 9   Global Step: 47650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:21:40,190-Speed 10521.31 samples/sec   Loss 8.3556   LearningRate 0.2737   Epoch: 9   Global Step: 47660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:21:47,994-Speed 10502.43 samples/sec   Loss 8.3316   LearningRate 0.2736   Epoch: 9   Global Step: 47670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:21:55,782-Speed 10520.41 samples/sec   Loss 8.3484   LearningRate 0.2735   Epoch: 9   Global Step: 47680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:22:03,609-Speed 10468.26 samples/sec   Loss 8.3494   LearningRate 0.2734   Epoch: 9   Global Step: 47690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:22:11,397-Speed 10519.42 samples/sec   Loss 8.3521   LearningRate 0.2733   Epoch: 9   Global Step: 47700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:22:19,239-Speed 10447.44 samples/sec   Loss 8.3505   LearningRate 0.2732   Epoch: 9   Global Step: 47710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:22:27,075-Speed 10456.81 samples/sec   Loss 8.3856   LearningRate 0.2731   Epoch: 9   Global Step: 47720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:22:34,867-Speed 10514.85 samples/sec   Loss 8.3413   LearningRate 0.2730   Epoch: 9   Global Step: 47730   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:22:42,645-Speed 10533.22 samples/sec   Loss 8.3938   LearningRate 0.2729   Epoch: 9   Global Step: 47740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:22:50,428-Speed 10527.32 samples/sec   Loss 8.3856   LearningRate 0.2728   Epoch: 9   Global Step: 47750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:22:58,223-Speed 10509.89 samples/sec   Loss 8.3652   LearningRate 0.2727   Epoch: 9   Global Step: 47760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:23:06,028-Speed 10498.97 samples/sec   Loss 8.3688   LearningRate 0.2726   Epoch: 9   Global Step: 47770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:23:13,802-Speed 10539.38 samples/sec   Loss 8.3552   LearningRate 0.2725   Epoch: 9   Global Step: 47780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:23:21,582-Speed 10531.49 samples/sec   Loss 8.3621   LearningRate 0.2724   Epoch: 9   Global Step: 47790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:23:29,389-Speed 10494.39 samples/sec   Loss 8.4133   LearningRate 0.2723   Epoch: 9   Global Step: 47800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:23:37,182-Speed 10514.39 samples/sec   Loss 8.3416   LearningRate 0.2722   Epoch: 9   Global Step: 47810   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:23:44,955-Speed 10540.10 samples/sec   Loss 8.3639   LearningRate 0.2721   Epoch: 9   Global Step: 47820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:23:52,747-Speed 10515.21 samples/sec   Loss 8.3476   LearningRate 0.2720   Epoch: 9   Global Step: 47830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:24:00,561-Speed 10485.95 samples/sec   Loss 8.3378   LearningRate 0.2719   Epoch: 9   Global Step: 47840   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:24:08,349-Speed 10518.56 samples/sec   Loss 8.4363   LearningRate 0.2718   Epoch: 9   Global Step: 47850   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:24:16,171-Speed 10474.23 samples/sec   Loss 8.4001   LearningRate 0.2717   Epoch: 9   Global Step: 47860   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:24:23,967-Speed 10510.29 samples/sec   Loss 8.3648   LearningRate 0.2716   Epoch: 9   Global Step: 47870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:24:31,764-Speed 10508.14 samples/sec   Loss 8.3420   LearningRate 0.2715   Epoch: 9   Global Step: 47880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:24:39,571-Speed 10494.35 samples/sec   Loss 8.3486   LearningRate 0.2715   Epoch: 9   Global Step: 47890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:24:47,356-Speed 10524.05 samples/sec   Loss 8.3421   LearningRate 0.2714   Epoch: 9   Global Step: 47900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:24:55,171-Speed 10484.22 samples/sec   Loss 8.3464   LearningRate 0.2713   Epoch: 9   Global Step: 47910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:25:02,966-Speed 10511.16 samples/sec   Loss 8.3826   LearningRate 0.2712   Epoch: 9   Global Step: 47920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:25:10,766-Speed 10504.35 samples/sec   Loss 8.3964   LearningRate 0.2711   Epoch: 9   Global Step: 47930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:25:18,588-Speed 10474.04 samples/sec   Loss 8.3335   LearningRate 0.2710   Epoch: 9   Global Step: 47940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:25:26,415-Speed 10466.89 samples/sec   Loss 8.3409   LearningRate 0.2709   Epoch: 9   Global Step: 47950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:25:34,213-Speed 10507.40 samples/sec   Loss 8.4193   LearningRate 0.2708   Epoch: 9   Global Step: 47960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:25:42,020-Speed 10493.87 samples/sec   Loss 8.3378   LearningRate 0.2707   Epoch: 9   Global Step: 47970   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:25:49,824-Speed 10499.54 samples/sec   Loss 8.2920   LearningRate 0.2706   Epoch: 9   Global Step: 47980   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:25:57,668-Speed 10444.91 samples/sec   Loss 8.3425   LearningRate 0.2705   Epoch: 9   Global Step: 47990   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:26:05,477-Speed 10491.53 samples/sec   Loss 8.3647   LearningRate 0.2704   Epoch: 9   Global Step: 48000   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:26:13,268-Speed 10515.81 samples/sec   Loss 8.3385   LearningRate 0.2703   Epoch: 9   Global Step: 48010   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:26:21,055-Speed 10521.87 samples/sec   Loss 8.3427   LearningRate 0.2702   Epoch: 9   Global Step: 48020   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:26:28,855-Speed 10504.52 samples/sec   Loss 8.3236   LearningRate 0.2701   Epoch: 9   Global Step: 48030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:26:36,694-Speed 10450.72 samples/sec   Loss 8.2692   LearningRate 0.2700   Epoch: 9   Global Step: 48040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:26:44,523-Speed 10465.93 samples/sec   Loss 8.3595   LearningRate 0.2699   Epoch: 9   Global Step: 48050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:26:52,387-Speed 10418.48 samples/sec   Loss 8.3367   LearningRate 0.2698   Epoch: 9   Global Step: 48060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:27:00,190-Speed 10500.94 samples/sec   Loss 8.3472   LearningRate 0.2697   Epoch: 9   Global Step: 48070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:27:08,003-Speed 10486.45 samples/sec   Loss 8.3022   LearningRate 0.2696   Epoch: 9   Global Step: 48080   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:27:15,805-Speed 10501.03 samples/sec   Loss 8.3720   LearningRate 0.2695   Epoch: 9   Global Step: 48090   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:27:23,580-Speed 10537.25 samples/sec   Loss 8.3088   LearningRate 0.2694   Epoch: 9   Global Step: 48100   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:27:31,382-Speed 10501.47 samples/sec   Loss 8.3594   LearningRate 0.2693   Epoch: 9   Global Step: 48110   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:27:39,168-Speed 10523.24 samples/sec   Loss 8.3009   LearningRate 0.2692   Epoch: 9   Global Step: 48120   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:27:46,971-Speed 10499.83 samples/sec   Loss 8.3381   LearningRate 0.2691   Epoch: 9   Global Step: 48130   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:27:54,773-Speed 10501.07 samples/sec   Loss 8.3079   LearningRate 0.2690   Epoch: 9   Global Step: 48140   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:28:02,567-Speed 10511.77 samples/sec   Loss 8.3203   LearningRate 0.2689   Epoch: 9   Global Step: 48150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:28:10,370-Speed 10499.73 samples/sec   Loss 8.3892   LearningRate 0.2688   Epoch: 9   Global Step: 48160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:28:18,187-Speed 10481.37 samples/sec   Loss 8.3200   LearningRate 0.2687   Epoch: 9   Global Step: 48170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:28:25,986-Speed 10505.57 samples/sec   Loss 8.2859   LearningRate 0.2686   Epoch: 9   Global Step: 48180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:28:33,802-Speed 10482.12 samples/sec   Loss 8.3415   LearningRate 0.2685   Epoch: 9   Global Step: 48190   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:28:41,608-Speed 10495.65 samples/sec   Loss 8.2717   LearningRate 0.2684   Epoch: 9   Global Step: 48200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:28:49,399-Speed 10516.20 samples/sec   Loss 8.3459   LearningRate 0.2683   Epoch: 9   Global Step: 48210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:28:57,246-Speed 10440.92 samples/sec   Loss 8.2991   LearningRate 0.2683   Epoch: 9   Global Step: 48220   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:29:05,064-Speed 10480.06 samples/sec   Loss 8.2946   LearningRate 0.2682   Epoch: 9   Global Step: 48230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:29:12,898-Speed 10458.97 samples/sec   Loss 8.2262   LearningRate 0.2681   Epoch: 9   Global Step: 48240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:29:20,737-Speed 10450.69 samples/sec   Loss 8.3229   LearningRate 0.2680   Epoch: 9   Global Step: 48250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:29:28,606-Speed 10415.75 samples/sec   Loss 8.3445   LearningRate 0.2679   Epoch: 9   Global Step: 48260   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:29:36,427-Speed 10476.65 samples/sec   Loss 8.2887   LearningRate 0.2678   Epoch: 9   Global Step: 48270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:29:44,246-Speed 10479.39 samples/sec   Loss 8.2385   LearningRate 0.2677   Epoch: 9   Global Step: 48280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:29:52,080-Speed 10457.33 samples/sec   Loss 8.2988   LearningRate 0.2676   Epoch: 9   Global Step: 48290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:29:59,882-Speed 10501.57 samples/sec   Loss 8.2758   LearningRate 0.2675   Epoch: 9   Global Step: 48300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:30:07,709-Speed 10468.05 samples/sec   Loss 8.2798   LearningRate 0.2674   Epoch: 9   Global Step: 48310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:30:15,506-Speed 10507.73 samples/sec   Loss 8.3251   LearningRate 0.2673   Epoch: 9   Global Step: 48320   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:30:23,316-Speed 10491.72 samples/sec   Loss 8.2959   LearningRate 0.2672   Epoch: 9   Global Step: 48330   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:30:31,103-Speed 10521.93 samples/sec   Loss 8.2925   LearningRate 0.2671   Epoch: 9   Global Step: 48340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:30:38,886-Speed 10526.31 samples/sec   Loss 8.3073   LearningRate 0.2670   Epoch: 9   Global Step: 48350   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:30:46,672-Speed 10523.54 samples/sec   Loss 8.2378   LearningRate 0.2669   Epoch: 9   Global Step: 48360   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:30:54,463-Speed 10516.59 samples/sec   Loss 8.2677   LearningRate 0.2668   Epoch: 9   Global Step: 48370   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:31:02,301-Speed 10453.13 samples/sec   Loss 8.2248   LearningRate 0.2667   Epoch: 9   Global Step: 48380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:31:10,089-Speed 10519.09 samples/sec   Loss 8.2852   LearningRate 0.2666   Epoch: 9   Global Step: 48390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:31:17,878-Speed 10520.74 samples/sec   Loss 8.3503   LearningRate 0.2665   Epoch: 9   Global Step: 48400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:31:25,667-Speed 10517.74 samples/sec   Loss 8.2970   LearningRate 0.2664   Epoch: 9   Global Step: 48410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:31:33,482-Speed 10484.16 samples/sec   Loss 8.2115   LearningRate 0.2663   Epoch: 9   Global Step: 48420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:31:41,295-Speed 10487.15 samples/sec   Loss 8.2826   LearningRate 0.2662   Epoch: 9   Global Step: 48430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:31:49,120-Speed 10470.13 samples/sec   Loss 8.3098   LearningRate 0.2661   Epoch: 9   Global Step: 48440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:31:56,900-Speed 10530.69 samples/sec   Loss 8.2461   LearningRate 0.2660   Epoch: 9   Global Step: 48450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:32:04,679-Speed 10532.79 samples/sec   Loss 8.2798   LearningRate 0.2659   Epoch: 9   Global Step: 48460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:32:12,489-Speed 10490.41 samples/sec   Loss 8.2959   LearningRate 0.2658   Epoch: 9   Global Step: 48470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:32:20,281-Speed 10515.26 samples/sec   Loss 8.2761   LearningRate 0.2657   Epoch: 9   Global Step: 48480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:32:28,115-Speed 10457.80 samples/sec   Loss 8.2590   LearningRate 0.2656   Epoch: 9   Global Step: 48490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:32:35,967-Speed 10433.59 samples/sec   Loss 8.2444   LearningRate 0.2655   Epoch: 9   Global Step: 48500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:32:43,791-Speed 10472.65 samples/sec   Loss 8.2891   LearningRate 0.2655   Epoch: 9   Global Step: 48510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:32:51,585-Speed 10511.75 samples/sec   Loss 8.2636   LearningRate 0.2654   Epoch: 9   Global Step: 48520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:32:59,418-Speed 10459.63 samples/sec   Loss 8.2755   LearningRate 0.2653   Epoch: 9   Global Step: 48530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:33:07,253-Speed 10456.75 samples/sec   Loss 8.2362   LearningRate 0.2652   Epoch: 9   Global Step: 48540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:33:15,043-Speed 10518.79 samples/sec   Loss 8.3559   LearningRate 0.2651   Epoch: 9   Global Step: 48550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:33:22,844-Speed 10502.32 samples/sec   Loss 8.2396   LearningRate 0.2650   Epoch: 9   Global Step: 48560   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:33:30,649-Speed 10496.47 samples/sec   Loss 8.2564   LearningRate 0.2649   Epoch: 9   Global Step: 48570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:33:38,476-Speed 10468.57 samples/sec   Loss 8.1766   LearningRate 0.2648   Epoch: 9   Global Step: 48580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:33:46,298-Speed 10475.10 samples/sec   Loss 8.2009   LearningRate 0.2647   Epoch: 9   Global Step: 48590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:33:54,135-Speed 10456.37 samples/sec   Loss 8.2304   LearningRate 0.2646   Epoch: 9   Global Step: 48600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:34:01,940-Speed 10497.52 samples/sec   Loss 8.2881   LearningRate 0.2645   Epoch: 9   Global Step: 48610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:34:09,780-Speed 10450.76 samples/sec   Loss 8.2581   LearningRate 0.2644   Epoch: 9   Global Step: 48620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:34:17,584-Speed 10498.33 samples/sec   Loss 8.2664   LearningRate 0.2643   Epoch: 9   Global Step: 48630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:34:25,403-Speed 10478.59 samples/sec   Loss 8.2363   LearningRate 0.2642   Epoch: 9   Global Step: 48640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:34:33,213-Speed 10491.47 samples/sec   Loss 8.2534   LearningRate 0.2641   Epoch: 9   Global Step: 48650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:34:41,015-Speed 10500.92 samples/sec   Loss 8.2028   LearningRate 0.2640   Epoch: 9   Global Step: 48660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:34:48,822-Speed 10495.50 samples/sec   Loss 8.2190   LearningRate 0.2639   Epoch: 9   Global Step: 48670   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:34:56,617-Speed 10509.99 samples/sec   Loss 8.2401   LearningRate 0.2638   Epoch: 9   Global Step: 48680   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:35:04,404-Speed 10521.43 samples/sec   Loss 8.2239   LearningRate 0.2637   Epoch: 9   Global Step: 48690   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:35:12,201-Speed 10508.72 samples/sec   Loss 8.2903   LearningRate 0.2636   Epoch: 9   Global Step: 48700   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:35:19,977-Speed 10535.56 samples/sec   Loss 8.2085   LearningRate 0.2635   Epoch: 9   Global Step: 48710   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:35:27,777-Speed 10504.28 samples/sec   Loss 8.2306   LearningRate 0.2634   Epoch: 9   Global Step: 48720   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:35:35,569-Speed 10515.35 samples/sec   Loss 8.2068   LearningRate 0.2633   Epoch: 9   Global Step: 48730   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:35:43,364-Speed 10511.34 samples/sec   Loss 8.2827   LearningRate 0.2632   Epoch: 9   Global Step: 48740   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:35:51,202-Speed 10452.46 samples/sec   Loss 8.2146   LearningRate 0.2631   Epoch: 9   Global Step: 48750   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:35:59,007-Speed 10496.74 samples/sec   Loss 8.2635   LearningRate 0.2631   Epoch: 9   Global Step: 48760   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:36:06,791-Speed 10525.33 samples/sec   Loss 8.2158   LearningRate 0.2630   Epoch: 9   Global Step: 48770   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:36:14,594-Speed 10499.93 samples/sec   Loss 8.2113   LearningRate 0.2629   Epoch: 9   Global Step: 48780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:36:22,403-Speed 10491.88 samples/sec   Loss 8.1967   LearningRate 0.2628   Epoch: 9   Global Step: 48790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:36:30,197-Speed 10512.36 samples/sec   Loss 8.2446   LearningRate 0.2627   Epoch: 9   Global Step: 48800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:36:38,008-Speed 10488.63 samples/sec   Loss 8.1887   LearningRate 0.2626   Epoch: 9   Global Step: 48810   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:36:45,824-Speed 10482.85 samples/sec   Loss 8.2383   LearningRate 0.2625   Epoch: 9   Global Step: 48820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:36:53,632-Speed 10494.33 samples/sec   Loss 8.1653   LearningRate 0.2624   Epoch: 9   Global Step: 48830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:37:01,523-Speed 10383.23 samples/sec   Loss 8.1907   LearningRate 0.2623   Epoch: 9   Global Step: 48840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:37:09,307-Speed 10526.43 samples/sec   Loss 8.2269   LearningRate 0.2622   Epoch: 9   Global Step: 48850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:37:17,106-Speed 10505.16 samples/sec   Loss 8.2167   LearningRate 0.2621   Epoch: 9   Global Step: 48860   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:37:24,898-Speed 10514.22 samples/sec   Loss 8.2619   LearningRate 0.2620   Epoch: 9   Global Step: 48870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:37:32,714-Speed 10481.89 samples/sec   Loss 8.2017   LearningRate 0.2619   Epoch: 9   Global Step: 48880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:37:40,522-Speed 10494.03 samples/sec   Loss 8.2572   LearningRate 0.2618   Epoch: 9   Global Step: 48890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:37:48,326-Speed 10498.22 samples/sec   Loss 8.2298   LearningRate 0.2617   Epoch: 9   Global Step: 48900   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:37:56,132-Speed 10496.38 samples/sec   Loss 8.2493   LearningRate 0.2616   Epoch: 9   Global Step: 48910   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:38:03,927-Speed 10511.07 samples/sec   Loss 8.2236   LearningRate 0.2615   Epoch: 9   Global Step: 48920   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:38:11,725-Speed 10506.32 samples/sec   Loss 8.1825   LearningRate 0.2614   Epoch: 9   Global Step: 48930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:38:19,542-Speed 10482.03 samples/sec   Loss 8.2216   LearningRate 0.2613   Epoch: 9   Global Step: 48940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:38:27,348-Speed 10494.76 samples/sec   Loss 8.2390   LearningRate 0.2612   Epoch: 9   Global Step: 48950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:38:35,124-Speed 10535.36 samples/sec   Loss 8.2215   LearningRate 0.2611   Epoch: 9   Global Step: 48960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:38:42,927-Speed 10501.85 samples/sec   Loss 8.1630   LearningRate 0.2610   Epoch: 9   Global Step: 48970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:38:50,722-Speed 10510.92 samples/sec   Loss 8.2337   LearningRate 0.2609   Epoch: 9   Global Step: 48980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:38:58,526-Speed 10497.51 samples/sec   Loss 8.1654   LearningRate 0.2609   Epoch: 9   Global Step: 48990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:39:06,338-Speed 10487.61 samples/sec   Loss 8.1927   LearningRate 0.2608   Epoch: 9   Global Step: 49000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:39:14,150-Speed 10491.42 samples/sec   Loss 8.1288   LearningRate 0.2607   Epoch: 9   Global Step: 49010   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:39:21,941-Speed 10516.07 samples/sec   Loss 8.1491   LearningRate 0.2606   Epoch: 9   Global Step: 49020   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:39:29,725-Speed 10525.81 samples/sec   Loss 8.1813   LearningRate 0.2605   Epoch: 9   Global Step: 49030   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:39:37,506-Speed 10529.19 samples/sec   Loss 8.1537   LearningRate 0.2604   Epoch: 9   Global Step: 49040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:39:45,255-Speed 10572.95 samples/sec   Loss 8.1719   LearningRate 0.2603   Epoch: 9   Global Step: 49050   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-16 01:39:53,028-Speed 10540.42 samples/sec   Loss 8.2324   LearningRate 0.2602   Epoch: 9   Global Step: 49060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-16 01:40:00,824-Speed 10509.81 samples/sec   Loss 8.2225   LearningRate 0.2601   Epoch: 9   Global Step: 49070   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-16 01:40:08,611-Speed 10520.60 samples/sec   Loss 8.1839   LearningRate 0.2600   Epoch: 9   Global Step: 49080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-16 01:40:16,400-Speed 10519.94 samples/sec   Loss 8.1816   LearningRate 0.2599   Epoch: 9   Global Step: 49090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-16 01:40:24,197-Speed 10508.70 samples/sec   Loss 8.1854   LearningRate 0.2598   Epoch: 9   Global Step: 49100   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-16 01:40:31,966-Speed 10544.65 samples/sec   Loss 8.1935   LearningRate 0.2597   Epoch: 9   Global Step: 49110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-16 01:40:39,778-Speed 10488.20 samples/sec   Loss 8.2310   LearningRate 0.2596   Epoch: 9   Global Step: 49120   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-16 01:40:47,596-Speed 10479.63 samples/sec   Loss 8.1784   LearningRate 0.2595   Epoch: 9   Global Step: 49130   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-16 01:40:55,413-Speed 10480.49 samples/sec   Loss 8.1964   LearningRate 0.2594   Epoch: 9   Global Step: 49140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-16 01:41:03,206-Speed 10513.96 samples/sec   Loss 8.2547   LearningRate 0.2593   Epoch: 9   Global Step: 49150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:41:11,003-Speed 10508.38 samples/sec   Loss 8.1184   LearningRate 0.2592   Epoch: 9   Global Step: 49160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:41:18,807-Speed 10498.84 samples/sec   Loss 8.2089   LearningRate 0.2591   Epoch: 9   Global Step: 49170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:41:26,598-Speed 10516.49 samples/sec   Loss 8.1916   LearningRate 0.2590   Epoch: 9   Global Step: 49180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:41:34,376-Speed 10534.95 samples/sec   Loss 8.1940   LearningRate 0.2589   Epoch: 9   Global Step: 49190   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:41:42,165-Speed 10519.08 samples/sec   Loss 8.1753   LearningRate 0.2589   Epoch: 9   Global Step: 49200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:41:49,947-Speed 10527.97 samples/sec   Loss 8.1210   LearningRate 0.2588   Epoch: 9   Global Step: 49210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:41:57,757-Speed 10491.29 samples/sec   Loss 8.1516   LearningRate 0.2587   Epoch: 9   Global Step: 49220   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:42:05,584-Speed 10467.97 samples/sec   Loss 8.1230   LearningRate 0.2586   Epoch: 9   Global Step: 49230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:42:13,391-Speed 10494.89 samples/sec   Loss 8.2223   LearningRate 0.2585   Epoch: 9   Global Step: 49240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:42:21,219-Speed 10469.37 samples/sec   Loss 8.2610   LearningRate 0.2584   Epoch: 9   Global Step: 49250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:42:29,007-Speed 10521.09 samples/sec   Loss 8.1560   LearningRate 0.2583   Epoch: 9   Global Step: 49260   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:42:36,799-Speed 10514.23 samples/sec   Loss 8.1766   LearningRate 0.2582   Epoch: 9   Global Step: 49270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:42:44,588-Speed 10518.78 samples/sec   Loss 8.1588   LearningRate 0.2581   Epoch: 9   Global Step: 49280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:42:52,411-Speed 10472.87 samples/sec   Loss 8.1640   LearningRate 0.2580   Epoch: 9   Global Step: 49290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:43:00,217-Speed 10496.65 samples/sec   Loss 8.1519   LearningRate 0.2579   Epoch: 9   Global Step: 49300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:43:08,026-Speed 10493.08 samples/sec   Loss 8.1363   LearningRate 0.2578   Epoch: 9   Global Step: 49310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:43:15,793-Speed 10547.69 samples/sec   Loss 8.1126   LearningRate 0.2577   Epoch: 9   Global Step: 49320   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:43:23,562-Speed 10546.58 samples/sec   Loss 8.1113   LearningRate 0.2576   Epoch: 9   Global Step: 49330   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:43:31,342-Speed 10530.78 samples/sec   Loss 8.1658   LearningRate 0.2575   Epoch: 9   Global Step: 49340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:43:39,144-Speed 10502.51 samples/sec   Loss 8.1806   LearningRate 0.2574   Epoch: 9   Global Step: 49350   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:43:46,937-Speed 10512.64 samples/sec   Loss 8.1880   LearningRate 0.2573   Epoch: 9   Global Step: 49360   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:43:54,726-Speed 10518.30 samples/sec   Loss 8.1914   LearningRate 0.2572   Epoch: 9   Global Step: 49370   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:44:02,521-Speed 10510.51 samples/sec   Loss 8.1988   LearningRate 0.2571   Epoch: 9   Global Step: 49380   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:44:10,290-Speed 10546.59 samples/sec   Loss 8.2046   LearningRate 0.2571   Epoch: 9   Global Step: 49390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:44:18,075-Speed 10523.50 samples/sec   Loss 8.1463   LearningRate 0.2570   Epoch: 9   Global Step: 49400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:44:25,913-Speed 10453.74 samples/sec   Loss 8.1896   LearningRate 0.2569   Epoch: 9   Global Step: 49410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:44:33,703-Speed 10517.73 samples/sec   Loss 8.1453   LearningRate 0.2568   Epoch: 9   Global Step: 49420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:44:41,499-Speed 10508.63 samples/sec   Loss 8.1125   LearningRate 0.2567   Epoch: 9   Global Step: 49430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:44:49,313-Speed 10487.41 samples/sec   Loss 8.0966   LearningRate 0.2566   Epoch: 9   Global Step: 49440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:44:57,109-Speed 10513.02 samples/sec   Loss 8.1277   LearningRate 0.2565   Epoch: 9   Global Step: 49450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:45:04,893-Speed 10524.88 samples/sec   Loss 8.1214   LearningRate 0.2564   Epoch: 9   Global Step: 49460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:45:12,681-Speed 10521.15 samples/sec   Loss 8.1090   LearningRate 0.2563   Epoch: 9   Global Step: 49470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:45:20,506-Speed 10470.58 samples/sec   Loss 8.1085   LearningRate 0.2562   Epoch: 9   Global Step: 49480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:45:28,292-Speed 10522.88 samples/sec   Loss 8.1330   LearningRate 0.2561   Epoch: 9   Global Step: 49490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:45:36,078-Speed 10522.95 samples/sec   Loss 8.1055   LearningRate 0.2560   Epoch: 9   Global Step: 49500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:45:43,875-Speed 10509.92 samples/sec   Loss 8.1083   LearningRate 0.2559   Epoch: 9   Global Step: 49510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:45:51,671-Speed 10509.42 samples/sec   Loss 8.1503   LearningRate 0.2558   Epoch: 9   Global Step: 49520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:45:59,478-Speed 10494.17 samples/sec   Loss 8.1443   LearningRate 0.2557   Epoch: 9   Global Step: 49530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:46:07,278-Speed 10504.69 samples/sec   Loss 8.1085   LearningRate 0.2556   Epoch: 9   Global Step: 49540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:46:15,060-Speed 10528.08 samples/sec   Loss 8.2522   LearningRate 0.2555   Epoch: 9   Global Step: 49550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:46:22,849-Speed 10519.62 samples/sec   Loss 8.1286   LearningRate 0.2554   Epoch: 9   Global Step: 49560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:46:30,650-Speed 10502.21 samples/sec   Loss 8.1214   LearningRate 0.2554   Epoch: 9   Global Step: 49570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:46:38,451-Speed 10503.19 samples/sec   Loss 8.0887   LearningRate 0.2553   Epoch: 9   Global Step: 49580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:46:46,244-Speed 10513.35 samples/sec   Loss 8.0842   LearningRate 0.2552   Epoch: 9   Global Step: 49590   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:46:54,048-Speed 10498.05 samples/sec   Loss 8.1040   LearningRate 0.2551   Epoch: 9   Global Step: 49600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:47:01,857-Speed 10492.14 samples/sec   Loss 8.1288   LearningRate 0.2550   Epoch: 9   Global Step: 49610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:47:09,637-Speed 10530.22 samples/sec   Loss 8.0751   LearningRate 0.2549   Epoch: 9   Global Step: 49620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:47:17,412-Speed 10537.97 samples/sec   Loss 8.0792   LearningRate 0.2548   Epoch: 9   Global Step: 49630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:47:25,198-Speed 10523.31 samples/sec   Loss 8.0998   LearningRate 0.2547   Epoch: 9   Global Step: 49640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:47:32,976-Speed 10533.27 samples/sec   Loss 8.1011   LearningRate 0.2546   Epoch: 9   Global Step: 49650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:47:40,754-Speed 10534.05 samples/sec   Loss 8.0824   LearningRate 0.2545   Epoch: 9   Global Step: 49660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:47:48,561-Speed 10494.52 samples/sec   Loss 8.0736   LearningRate 0.2544   Epoch: 9   Global Step: 49670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:47:56,340-Speed 10532.02 samples/sec   Loss 8.1109   LearningRate 0.2543   Epoch: 9   Global Step: 49680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:48:04,165-Speed 10470.13 samples/sec   Loss 8.1207   LearningRate 0.2542   Epoch: 9   Global Step: 49690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:48:11,967-Speed 10501.41 samples/sec   Loss 8.0570   LearningRate 0.2541   Epoch: 9   Global Step: 49700   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:48:19,784-Speed 10482.41 samples/sec   Loss 8.1159   LearningRate 0.2540   Epoch: 9   Global Step: 49710   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:48:27,592-Speed 10493.55 samples/sec   Loss 8.0986   LearningRate 0.2539   Epoch: 9   Global Step: 49720   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:48:35,379-Speed 10521.56 samples/sec   Loss 8.0707   LearningRate 0.2538   Epoch: 9   Global Step: 49730   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:48:43,177-Speed 10509.15 samples/sec   Loss 8.1303   LearningRate 0.2537   Epoch: 9   Global Step: 49740   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:48:50,980-Speed 10501.09 samples/sec   Loss 8.0953   LearningRate 0.2537   Epoch: 9   Global Step: 49750   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:48:58,791-Speed 10490.35 samples/sec   Loss 8.0710   LearningRate 0.2536   Epoch: 9   Global Step: 49760   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:49:06,584-Speed 10512.72 samples/sec   Loss 8.1365   LearningRate 0.2535   Epoch: 9   Global Step: 49770   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:49:14,388-Speed 10498.23 samples/sec   Loss 8.1220   LearningRate 0.2534   Epoch: 9   Global Step: 49780   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:49:22,215-Speed 10469.33 samples/sec   Loss 8.0911   LearningRate 0.2533   Epoch: 9   Global Step: 49790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:49:30,015-Speed 10504.87 samples/sec   Loss 8.1074   LearningRate 0.2532   Epoch: 9   Global Step: 49800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:49:37,809-Speed 10511.49 samples/sec   Loss 8.0094   LearningRate 0.2531   Epoch: 9   Global Step: 49810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:49:45,618-Speed 10491.55 samples/sec   Loss 8.0450   LearningRate 0.2530   Epoch: 9   Global Step: 49820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:49:53,422-Speed 10498.55 samples/sec   Loss 8.0683   LearningRate 0.2529   Epoch: 9   Global Step: 49830   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:50:01,210-Speed 10520.48 samples/sec   Loss 8.0624   LearningRate 0.2528   Epoch: 9   Global Step: 49840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:50:08,989-Speed 10532.62 samples/sec   Loss 8.1103   LearningRate 0.2527   Epoch: 9   Global Step: 49850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:50:16,781-Speed 10514.31 samples/sec   Loss 8.0612   LearningRate 0.2526   Epoch: 9   Global Step: 49860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:50:24,575-Speed 10512.72 samples/sec   Loss 8.0897   LearningRate 0.2525   Epoch: 9   Global Step: 49870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:50:32,379-Speed 10498.29 samples/sec   Loss 8.1327   LearningRate 0.2524   Epoch: 9   Global Step: 49880   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:50:40,158-Speed 10532.34 samples/sec   Loss 8.0759   LearningRate 0.2523   Epoch: 9   Global Step: 49890   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:50:47,944-Speed 10523.71 samples/sec   Loss 8.0619   LearningRate 0.2522   Epoch: 9   Global Step: 49900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:50:55,751-Speed 10494.28 samples/sec   Loss 8.0850   LearningRate 0.2522   Epoch: 9   Global Step: 49910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:51:03,530-Speed 10531.62 samples/sec   Loss 8.0915   LearningRate 0.2521   Epoch: 9   Global Step: 49920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:51:11,334-Speed 10499.23 samples/sec   Loss 8.0329   LearningRate 0.2520   Epoch: 9   Global Step: 49930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:51:19,111-Speed 10534.84 samples/sec   Loss 8.0745   LearningRate 0.2519   Epoch: 9   Global Step: 49940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:51:26,914-Speed 10499.82 samples/sec   Loss 8.0757   LearningRate 0.2518   Epoch: 9   Global Step: 49950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:51:34,701-Speed 10521.31 samples/sec   Loss 8.1231   LearningRate 0.2517   Epoch: 9   Global Step: 49960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:51:42,506-Speed 10497.04 samples/sec   Loss 8.0123   LearningRate 0.2516   Epoch: 9   Global Step: 49970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:51:50,298-Speed 10515.42 samples/sec   Loss 8.0001   LearningRate 0.2515   Epoch: 9   Global Step: 49980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:51:58,104-Speed 10496.43 samples/sec   Loss 8.0594   LearningRate 0.2514   Epoch: 9   Global Step: 49990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:52:05,923-Speed 10478.31 samples/sec   Loss 8.0848   LearningRate 0.2513   Epoch: 9   Global Step: 50000   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:52:33,309-[lfw][50000]XNorm: 21.813953
Training: 2022-01-16 01:52:33,309-[lfw][50000]Accuracy-Flip: 0.99783+-0.00211
Training: 2022-01-16 01:52:33,310-[lfw][50000]Accuracy-Highest: 0.99783
Training: 2022-01-16 01:53:05,107-[cfp_fp][50000]XNorm: 18.823709
Training: 2022-01-16 01:53:05,108-[cfp_fp][50000]Accuracy-Flip: 0.97886+-0.00625
Training: 2022-01-16 01:53:05,109-[cfp_fp][50000]Accuracy-Highest: 0.97886
Training: 2022-01-16 01:53:33,424-[agedb_30][50000]XNorm: 21.084104
Training: 2022-01-16 01:53:33,424-[agedb_30][50000]Accuracy-Flip: 0.96667+-0.00972
Training: 2022-01-16 01:53:33,425-[agedb_30][50000]Accuracy-Highest: 0.96667
Training: 2022-01-16 01:53:41,229-Speed 859.57 samples/sec   Loss 8.0596   LearningRate 0.2512   Epoch: 9   Global Step: 50010   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:53:49,048-Speed 10479.39 samples/sec   Loss 8.0247   LearningRate 0.2511   Epoch: 9   Global Step: 50020   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:53:56,899-Speed 10437.16 samples/sec   Loss 8.1111   LearningRate 0.2510   Epoch: 9   Global Step: 50030   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:54:04,721-Speed 10474.80 samples/sec   Loss 8.1080   LearningRate 0.2509   Epoch: 9   Global Step: 50040   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:54:12,524-Speed 10500.72 samples/sec   Loss 8.0238   LearningRate 0.2508   Epoch: 9   Global Step: 50050   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:54:20,325-Speed 10503.18 samples/sec   Loss 8.0466   LearningRate 0.2507   Epoch: 9   Global Step: 50060   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:54:28,129-Speed 10499.34 samples/sec   Loss 8.0405   LearningRate 0.2507   Epoch: 9   Global Step: 50070   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:54:35,907-Speed 10532.90 samples/sec   Loss 8.0383   LearningRate 0.2506   Epoch: 9   Global Step: 50080   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:54:43,720-Speed 10490.02 samples/sec   Loss 8.0941   LearningRate 0.2505   Epoch: 9   Global Step: 50090   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:54:51,549-Speed 10469.73 samples/sec   Loss 8.0650   LearningRate 0.2504   Epoch: 9   Global Step: 50100   Fp16 Grad Scale: 524288   Required: 12 hours
Training: 2022-01-16 01:54:59,349-Speed 10504.38 samples/sec   Loss 8.0925   LearningRate 0.2503   Epoch: 9   Global Step: 50110   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:55:07,145-Speed 10509.90 samples/sec   Loss 8.0028   LearningRate 0.2502   Epoch: 9   Global Step: 50120   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:55:14,920-Speed 10538.29 samples/sec   Loss 7.9982   LearningRate 0.2501   Epoch: 9   Global Step: 50130   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:55:22,698-Speed 10538.21 samples/sec   Loss 8.0513   LearningRate 0.2500   Epoch: 9   Global Step: 50140   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:55:30,496-Speed 10506.60 samples/sec   Loss 8.0340   LearningRate 0.2499   Epoch: 9   Global Step: 50150   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:55:38,296-Speed 10505.20 samples/sec   Loss 8.0053   LearningRate 0.2498   Epoch: 9   Global Step: 50160   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:55:46,094-Speed 10507.29 samples/sec   Loss 8.0316   LearningRate 0.2497   Epoch: 9   Global Step: 50170   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:55:53,863-Speed 10545.48 samples/sec   Loss 8.0473   LearningRate 0.2496   Epoch: 9   Global Step: 50180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:56:01,627-Speed 10553.41 samples/sec   Loss 8.0440   LearningRate 0.2495   Epoch: 9   Global Step: 50190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:56:09,427-Speed 10504.68 samples/sec   Loss 7.9829   LearningRate 0.2494   Epoch: 9   Global Step: 50200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:56:17,186-Speed 10560.02 samples/sec   Loss 8.0439   LearningRate 0.2493   Epoch: 9   Global Step: 50210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:56:24,972-Speed 10523.16 samples/sec   Loss 8.0145   LearningRate 0.2493   Epoch: 9   Global Step: 50220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:56:32,763-Speed 10516.70 samples/sec   Loss 8.0824   LearningRate 0.2492   Epoch: 9   Global Step: 50230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:56:40,555-Speed 10515.61 samples/sec   Loss 8.0261   LearningRate 0.2491   Epoch: 9   Global Step: 50240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:56:48,360-Speed 10497.46 samples/sec   Loss 8.0338   LearningRate 0.2490   Epoch: 9   Global Step: 50250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:56:56,169-Speed 10491.87 samples/sec   Loss 8.0380   LearningRate 0.2489   Epoch: 9   Global Step: 50260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:57:04,026-Speed 10427.69 samples/sec   Loss 8.0348   LearningRate 0.2488   Epoch: 9   Global Step: 50270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:57:11,817-Speed 10516.64 samples/sec   Loss 7.9853   LearningRate 0.2487   Epoch: 9   Global Step: 50280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:57:19,597-Speed 10530.93 samples/sec   Loss 8.1055   LearningRate 0.2486   Epoch: 9   Global Step: 50290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:57:27,380-Speed 10527.41 samples/sec   Loss 7.9629   LearningRate 0.2485   Epoch: 9   Global Step: 50300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:57:35,171-Speed 10516.56 samples/sec   Loss 7.9450   LearningRate 0.2484   Epoch: 9   Global Step: 50310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:57:42,969-Speed 10506.26 samples/sec   Loss 7.9832   LearningRate 0.2483   Epoch: 9   Global Step: 50320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:57:50,765-Speed 10510.39 samples/sec   Loss 8.0105   LearningRate 0.2482   Epoch: 9   Global Step: 50330   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:57:58,557-Speed 10515.50 samples/sec   Loss 8.0071   LearningRate 0.2481   Epoch: 9   Global Step: 50340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:58:06,344-Speed 10520.50 samples/sec   Loss 8.0643   LearningRate 0.2480   Epoch: 9   Global Step: 50350   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:58:14,130-Speed 10522.82 samples/sec   Loss 8.0348   LearningRate 0.2479   Epoch: 9   Global Step: 50360   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:58:21,948-Speed 10480.77 samples/sec   Loss 8.0335   LearningRate 0.2479   Epoch: 9   Global Step: 50370   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:58:29,761-Speed 10486.47 samples/sec   Loss 7.9741   LearningRate 0.2478   Epoch: 9   Global Step: 50380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:58:37,573-Speed 10487.80 samples/sec   Loss 8.0304   LearningRate 0.2477   Epoch: 9   Global Step: 50390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:58:45,388-Speed 10484.04 samples/sec   Loss 8.0116   LearningRate 0.2476   Epoch: 9   Global Step: 50400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:58:53,177-Speed 10518.38 samples/sec   Loss 8.0570   LearningRate 0.2475   Epoch: 9   Global Step: 50410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:59:00,995-Speed 10480.74 samples/sec   Loss 8.0725   LearningRate 0.2474   Epoch: 9   Global Step: 50420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:59:08,799-Speed 10499.20 samples/sec   Loss 8.0743   LearningRate 0.2473   Epoch: 9   Global Step: 50430   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:59:16,599-Speed 10506.65 samples/sec   Loss 7.9889   LearningRate 0.2472   Epoch: 9   Global Step: 50440   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 01:59:24,388-Speed 10518.76 samples/sec   Loss 7.9759   LearningRate 0.2471   Epoch: 9   Global Step: 50450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 01:59:32,163-Speed 10537.80 samples/sec   Loss 7.9637   LearningRate 0.2470   Epoch: 9   Global Step: 50460   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:59:39,938-Speed 10537.71 samples/sec   Loss 7.9456   LearningRate 0.2469   Epoch: 9   Global Step: 50470   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:59:47,718-Speed 10530.91 samples/sec   Loss 7.8899   LearningRate 0.2468   Epoch: 9   Global Step: 50480   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 01:59:55,497-Speed 10532.29 samples/sec   Loss 8.0310   LearningRate 0.2467   Epoch: 9   Global Step: 50490   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 02:00:03,291-Speed 10515.63 samples/sec   Loss 8.0259   LearningRate 0.2466   Epoch: 9   Global Step: 50500   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 02:00:11,067-Speed 10535.61 samples/sec   Loss 7.8999   LearningRate 0.2466   Epoch: 9   Global Step: 50510   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 02:00:18,914-Speed 10441.66 samples/sec   Loss 7.9440   LearningRate 0.2465   Epoch: 9   Global Step: 50520   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 02:00:26,737-Speed 10473.75 samples/sec   Loss 8.0107   LearningRate 0.2464   Epoch: 9   Global Step: 50530   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 02:00:34,515-Speed 10533.86 samples/sec   Loss 7.9407   LearningRate 0.2463   Epoch: 9   Global Step: 50540   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 02:00:42,308-Speed 10513.24 samples/sec   Loss 8.0079   LearningRate 0.2462   Epoch: 9   Global Step: 50550   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 02:00:50,120-Speed 10488.76 samples/sec   Loss 7.9826   LearningRate 0.2461   Epoch: 9   Global Step: 50560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:00:57,921-Speed 10503.90 samples/sec   Loss 7.9863   LearningRate 0.2460   Epoch: 9   Global Step: 50570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:01:05,707-Speed 10522.73 samples/sec   Loss 7.9747   LearningRate 0.2459   Epoch: 9   Global Step: 50580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:01:13,481-Speed 10540.64 samples/sec   Loss 7.9800   LearningRate 0.2458   Epoch: 9   Global Step: 50590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:01:21,289-Speed 10493.27 samples/sec   Loss 7.9017   LearningRate 0.2457   Epoch: 9   Global Step: 50600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:01:29,116-Speed 10468.05 samples/sec   Loss 7.9420   LearningRate 0.2456   Epoch: 9   Global Step: 50610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:01:36,917-Speed 10506.82 samples/sec   Loss 7.9170   LearningRate 0.2455   Epoch: 9   Global Step: 50620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:01:44,705-Speed 10519.59 samples/sec   Loss 7.9609   LearningRate 0.2454   Epoch: 9   Global Step: 50630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:01:52,494-Speed 10518.91 samples/sec   Loss 7.9962   LearningRate 0.2454   Epoch: 9   Global Step: 50640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:02:00,273-Speed 10533.04 samples/sec   Loss 7.9557   LearningRate 0.2453   Epoch: 9   Global Step: 50650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:02:08,065-Speed 10513.89 samples/sec   Loss 8.0050   LearningRate 0.2452   Epoch: 9   Global Step: 50660   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 02:02:15,860-Speed 10510.87 samples/sec   Loss 7.9217   LearningRate 0.2451   Epoch: 9   Global Step: 50670   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 02:02:23,630-Speed 10544.27 samples/sec   Loss 7.9940   LearningRate 0.2450   Epoch: 9   Global Step: 50680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:02:31,418-Speed 10519.78 samples/sec   Loss 7.9791   LearningRate 0.2449   Epoch: 9   Global Step: 50690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:02:39,195-Speed 10536.17 samples/sec   Loss 7.9397   LearningRate 0.2448   Epoch: 9   Global Step: 50700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:02:47,020-Speed 10471.00 samples/sec   Loss 8.0250   LearningRate 0.2447   Epoch: 9   Global Step: 50710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:02:54,822-Speed 10499.69 samples/sec   Loss 7.9596   LearningRate 0.2446   Epoch: 9   Global Step: 50720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:03:02,611-Speed 10519.24 samples/sec   Loss 8.0089   LearningRate 0.2445   Epoch: 9   Global Step: 50730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:03:10,395-Speed 10524.88 samples/sec   Loss 8.0395   LearningRate 0.2444   Epoch: 9   Global Step: 50740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:03:18,187-Speed 10515.72 samples/sec   Loss 7.9750   LearningRate 0.2443   Epoch: 9   Global Step: 50750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:03:25,978-Speed 10515.35 samples/sec   Loss 7.9193   LearningRate 0.2442   Epoch: 9   Global Step: 50760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:03:33,757-Speed 10533.51 samples/sec   Loss 7.9522   LearningRate 0.2442   Epoch: 9   Global Step: 50770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:03:41,558-Speed 10502.89 samples/sec   Loss 7.9308   LearningRate 0.2441   Epoch: 9   Global Step: 50780   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 02:03:49,373-Speed 10485.38 samples/sec   Loss 7.9279   LearningRate 0.2440   Epoch: 9   Global Step: 50790   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 02:03:57,168-Speed 10512.32 samples/sec   Loss 7.9076   LearningRate 0.2439   Epoch: 9   Global Step: 50800   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 02:04:04,956-Speed 10520.62 samples/sec   Loss 7.9741   LearningRate 0.2438   Epoch: 9   Global Step: 50810   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 02:04:12,743-Speed 10521.87 samples/sec   Loss 7.9444   LearningRate 0.2437   Epoch: 9   Global Step: 50820   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 02:04:20,529-Speed 10522.65 samples/sec   Loss 7.9697   LearningRate 0.2436   Epoch: 9   Global Step: 50830   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 02:04:28,316-Speed 10522.27 samples/sec   Loss 7.9084   LearningRate 0.2435   Epoch: 9   Global Step: 50840   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 02:04:36,100-Speed 10525.19 samples/sec   Loss 7.9196   LearningRate 0.2434   Epoch: 9   Global Step: 50850   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 02:04:43,908-Speed 10493.23 samples/sec   Loss 7.8803   LearningRate 0.2433   Epoch: 9   Global Step: 50860   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 02:04:51,721-Speed 10486.69 samples/sec   Loss 7.9307   LearningRate 0.2432   Epoch: 9   Global Step: 50870   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 02:04:59,519-Speed 10507.16 samples/sec   Loss 7.9328   LearningRate 0.2431   Epoch: 9   Global Step: 50880   Fp16 Grad Scale: 524288   Required: 12 hours
Training: 2022-01-16 02:05:07,326-Speed 10495.05 samples/sec   Loss 7.9301   LearningRate 0.2430   Epoch: 9   Global Step: 50890   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 02:05:15,128-Speed 10501.67 samples/sec   Loss 7.8966   LearningRate 0.2430   Epoch: 9   Global Step: 50900   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 02:05:22,953-Speed 10469.58 samples/sec   Loss 7.9431   LearningRate 0.2429   Epoch: 9   Global Step: 50910   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 02:05:30,756-Speed 10500.82 samples/sec   Loss 7.9656   LearningRate 0.2428   Epoch: 9   Global Step: 50920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:05:38,536-Speed 10530.41 samples/sec   Loss 7.8851   LearningRate 0.2427   Epoch: 9   Global Step: 50930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:05:46,312-Speed 10536.06 samples/sec   Loss 7.9038   LearningRate 0.2426   Epoch: 9   Global Step: 50940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:05:54,138-Speed 10468.93 samples/sec   Loss 7.8982   LearningRate 0.2425   Epoch: 9   Global Step: 50950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:06:01,949-Speed 10489.78 samples/sec   Loss 7.9183   LearningRate 0.2424   Epoch: 9   Global Step: 50960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:06:09,737-Speed 10519.67 samples/sec   Loss 7.9063   LearningRate 0.2423   Epoch: 9   Global Step: 50970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:06:17,522-Speed 10524.97 samples/sec   Loss 7.8806   LearningRate 0.2422   Epoch: 9   Global Step: 50980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:06:25,333-Speed 10488.51 samples/sec   Loss 7.9196   LearningRate 0.2421   Epoch: 9   Global Step: 50990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:06:33,126-Speed 10514.84 samples/sec   Loss 7.8673   LearningRate 0.2420   Epoch: 9   Global Step: 51000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:06:40,932-Speed 10494.30 samples/sec   Loss 7.9135   LearningRate 0.2419   Epoch: 9   Global Step: 51010   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:06:48,746-Speed 10485.94 samples/sec   Loss 7.9371   LearningRate 0.2418   Epoch: 9   Global Step: 51020   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 02:06:56,526-Speed 10531.26 samples/sec   Loss 7.8480   LearningRate 0.2418   Epoch: 9   Global Step: 51030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:07:04,344-Speed 10480.20 samples/sec   Loss 7.9114   LearningRate 0.2417   Epoch: 9   Global Step: 51040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:07:12,149-Speed 10496.15 samples/sec   Loss 7.9390   LearningRate 0.2416   Epoch: 9   Global Step: 51050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:07:19,955-Speed 10495.89 samples/sec   Loss 7.8924   LearningRate 0.2415   Epoch: 9   Global Step: 51060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:07:27,747-Speed 10515.71 samples/sec   Loss 7.8143   LearningRate 0.2414   Epoch: 9   Global Step: 51070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:07:35,544-Speed 10509.68 samples/sec   Loss 7.9384   LearningRate 0.2413   Epoch: 9   Global Step: 51080   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:07:43,321-Speed 10534.80 samples/sec   Loss 7.9680   LearningRate 0.2412   Epoch: 9   Global Step: 51090   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:07:51,133-Speed 10486.48 samples/sec   Loss 7.9099   LearningRate 0.2411   Epoch: 9   Global Step: 51100   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:07:58,933-Speed 10509.60 samples/sec   Loss 7.9176   LearningRate 0.2410   Epoch: 9   Global Step: 51110   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:08:06,720-Speed 10520.66 samples/sec   Loss 7.8641   LearningRate 0.2409   Epoch: 9   Global Step: 51120   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:08:14,537-Speed 10481.24 samples/sec   Loss 7.8820   LearningRate 0.2408   Epoch: 9   Global Step: 51130   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 02:08:22,354-Speed 10481.98 samples/sec   Loss 7.9119   LearningRate 0.2407   Epoch: 9   Global Step: 51140   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 02:08:30,144-Speed 10517.34 samples/sec   Loss 7.9264   LearningRate 0.2407   Epoch: 9   Global Step: 51150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 02:08:37,945-Speed 10503.93 samples/sec   Loss 7.9581   LearningRate 0.2406   Epoch: 9   Global Step: 51160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 02:08:45,739-Speed 10511.81 samples/sec   Loss 7.8592   LearningRate 0.2405   Epoch: 9   Global Step: 51170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 02:08:53,527-Speed 10520.33 samples/sec   Loss 7.8681   LearningRate 0.2404   Epoch: 9   Global Step: 51180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 02:09:01,334-Speed 10493.77 samples/sec   Loss 7.8989   LearningRate 0.2403   Epoch: 9   Global Step: 51190   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 02:09:09,120-Speed 10523.71 samples/sec   Loss 7.8537   LearningRate 0.2402   Epoch: 9   Global Step: 51200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 02:09:16,919-Speed 10505.20 samples/sec   Loss 7.8616   LearningRate 0.2401   Epoch: 9   Global Step: 51210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 02:09:24,746-Speed 10468.27 samples/sec   Loss 7.9195   LearningRate 0.2400   Epoch: 9   Global Step: 51220   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 02:09:32,563-Speed 10480.81 samples/sec   Loss 7.9006   LearningRate 0.2399   Epoch: 9   Global Step: 51230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 02:09:40,373-Speed 10490.66 samples/sec   Loss 7.8974   LearningRate 0.2398   Epoch: 9   Global Step: 51240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-16 02:09:48,157-Speed 10526.17 samples/sec   Loss 7.8805   LearningRate 0.2397   Epoch: 9   Global Step: 51250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:09:55,966-Speed 10490.33 samples/sec   Loss 7.8998   LearningRate 0.2396   Epoch: 9   Global Step: 51260   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:10:03,768-Speed 10504.59 samples/sec   Loss 7.9237   LearningRate 0.2396   Epoch: 9   Global Step: 51270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:10:11,607-Speed 10452.61 samples/sec   Loss 7.8626   LearningRate 0.2395   Epoch: 9   Global Step: 51280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:10:19,394-Speed 10521.88 samples/sec   Loss 7.8903   LearningRate 0.2394   Epoch: 9   Global Step: 51290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:10:27,201-Speed 10495.03 samples/sec   Loss 7.9129   LearningRate 0.2393   Epoch: 9   Global Step: 51300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:10:34,994-Speed 10513.72 samples/sec   Loss 7.8396   LearningRate 0.2392   Epoch: 9   Global Step: 51310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:10:42,795-Speed 10502.84 samples/sec   Loss 7.8283   LearningRate 0.2391   Epoch: 9   Global Step: 51320   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:10:50,614-Speed 10478.49 samples/sec   Loss 7.8889   LearningRate 0.2390   Epoch: 9   Global Step: 51330   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:10:58,399-Speed 10524.62 samples/sec   Loss 7.8955   LearningRate 0.2389   Epoch: 9   Global Step: 51340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:11:06,211-Speed 10487.98 samples/sec   Loss 7.8284   LearningRate 0.2388   Epoch: 9   Global Step: 51350   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 02:11:14,038-Speed 10468.18 samples/sec   Loss 7.8400   LearningRate 0.2387   Epoch: 9   Global Step: 51360   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-16 02:11:21,845-Speed 10495.80 samples/sec   Loss 7.8222   LearningRate 0.2386   Epoch: 9   Global Step: 51370   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:11:29,651-Speed 10495.20 samples/sec   Loss 7.8453   LearningRate 0.2386   Epoch: 9   Global Step: 51380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:11:37,475-Speed 10471.83 samples/sec   Loss 7.8188   LearningRate 0.2385   Epoch: 9   Global Step: 51390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:11:45,312-Speed 10453.66 samples/sec   Loss 7.8900   LearningRate 0.2384   Epoch: 9   Global Step: 51400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:11:53,136-Speed 10472.91 samples/sec   Loss 7.9015   LearningRate 0.2383   Epoch: 9   Global Step: 51410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:12:00,992-Speed 10429.95 samples/sec   Loss 7.8663   LearningRate 0.2382   Epoch: 9   Global Step: 51420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-16 02:12:08,795-Speed 10498.40 samples/sec   Loss 7.8553   LearningRate 0.2381   Epoch: 9   Global Step: 51430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:12:16,601-Speed 10496.45 samples/sec   Loss 7.8306   LearningRate 0.2380   Epoch: 9   Global Step: 51440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:12:24,418-Speed 10481.06 samples/sec   Loss 7.8381   LearningRate 0.2379   Epoch: 9   Global Step: 51450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:12:32,202-Speed 10526.92 samples/sec   Loss 7.8383   LearningRate 0.2378   Epoch: 9   Global Step: 51460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:12:40,000-Speed 10505.40 samples/sec   Loss 7.8308   LearningRate 0.2377   Epoch: 9   Global Step: 51470   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:12:47,794-Speed 10512.21 samples/sec   Loss 7.8497   LearningRate 0.2376   Epoch: 9   Global Step: 51480   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:12:55,591-Speed 10508.31 samples/sec   Loss 7.8399   LearningRate 0.2376   Epoch: 9   Global Step: 51490   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:13:03,382-Speed 10516.29 samples/sec   Loss 7.8789   LearningRate 0.2375   Epoch: 9   Global Step: 51500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:13:11,164-Speed 10528.58 samples/sec   Loss 7.8557   LearningRate 0.2374   Epoch: 9   Global Step: 51510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:13:18,962-Speed 10506.10 samples/sec   Loss 7.8457   LearningRate 0.2373   Epoch: 9   Global Step: 51520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:13:26,818-Speed 10430.63 samples/sec   Loss 7.8903   LearningRate 0.2372   Epoch: 9   Global Step: 51530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:13:34,598-Speed 10530.74 samples/sec   Loss 7.8810   LearningRate 0.2371   Epoch: 9   Global Step: 51540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:13:42,415-Speed 10480.88 samples/sec   Loss 7.8094   LearningRate 0.2370   Epoch: 9   Global Step: 51550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:13:50,238-Speed 10473.75 samples/sec   Loss 7.7899   LearningRate 0.2369   Epoch: 9   Global Step: 51560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:13:58,076-Speed 10453.36 samples/sec   Loss 7.8215   LearningRate 0.2368   Epoch: 9   Global Step: 51570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:14:05,882-Speed 10495.28 samples/sec   Loss 7.8508   LearningRate 0.2367   Epoch: 9   Global Step: 51580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:14:13,673-Speed 10515.29 samples/sec   Loss 7.7884   LearningRate 0.2366   Epoch: 9   Global Step: 51590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:14:21,463-Speed 10518.18 samples/sec   Loss 7.8028   LearningRate 0.2366   Epoch: 9   Global Step: 51600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:14:29,254-Speed 10517.10 samples/sec   Loss 7.7978   LearningRate 0.2365   Epoch: 9   Global Step: 51610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:14:37,049-Speed 10510.23 samples/sec   Loss 7.8088   LearningRate 0.2364   Epoch: 9   Global Step: 51620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:14:44,837-Speed 10520.03 samples/sec   Loss 7.8363   LearningRate 0.2363   Epoch: 9   Global Step: 51630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:14:52,631-Speed 10512.61 samples/sec   Loss 7.8230   LearningRate 0.2362   Epoch: 9   Global Step: 51640   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:15:00,404-Speed 10541.07 samples/sec   Loss 7.8148   LearningRate 0.2361   Epoch: 9   Global Step: 51650   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:15:08,263-Speed 10423.61 samples/sec   Loss 7.7875   LearningRate 0.2360   Epoch: 9   Global Step: 51660   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:15:16,050-Speed 10523.16 samples/sec   Loss 7.8423   LearningRate 0.2359   Epoch: 9   Global Step: 51670   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:15:23,839-Speed 10518.63 samples/sec   Loss 7.8885   LearningRate 0.2358   Epoch: 9   Global Step: 51680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:15:31,652-Speed 10486.52 samples/sec   Loss 7.8357   LearningRate 0.2357   Epoch: 9   Global Step: 51690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:15:39,443-Speed 10516.39 samples/sec   Loss 7.8028   LearningRate 0.2356   Epoch: 9   Global Step: 51700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:15:47,263-Speed 10476.69 samples/sec   Loss 7.7616   LearningRate 0.2356   Epoch: 9   Global Step: 51710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:15:55,084-Speed 10476.81 samples/sec   Loss 7.8202   LearningRate 0.2355   Epoch: 9   Global Step: 51720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:16:02,890-Speed 10496.10 samples/sec   Loss 7.8022   LearningRate 0.2354   Epoch: 9   Global Step: 51730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:16:10,722-Speed 10460.49 samples/sec   Loss 7.7891   LearningRate 0.2353   Epoch: 9   Global Step: 51740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:16:18,563-Speed 10449.68 samples/sec   Loss 7.8494   LearningRate 0.2352   Epoch: 9   Global Step: 51750   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:16:26,350-Speed 10521.25 samples/sec   Loss 7.8016   LearningRate 0.2351   Epoch: 9   Global Step: 51760   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:16:34,142-Speed 10515.04 samples/sec   Loss 7.8450   LearningRate 0.2350   Epoch: 9   Global Step: 51770   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:16:41,956-Speed 10485.97 samples/sec   Loss 7.8474   LearningRate 0.2349   Epoch: 9   Global Step: 51780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:16:49,747-Speed 10515.44 samples/sec   Loss 7.8427   LearningRate 0.2348   Epoch: 9   Global Step: 51790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:16:57,534-Speed 10522.23 samples/sec   Loss 7.8241   LearningRate 0.2347   Epoch: 9   Global Step: 51800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:17:05,342-Speed 10492.87 samples/sec   Loss 7.7859   LearningRate 0.2346   Epoch: 9   Global Step: 51810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:17:13,139-Speed 10508.57 samples/sec   Loss 7.7930   LearningRate 0.2346   Epoch: 9   Global Step: 51820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:17:20,932-Speed 10514.10 samples/sec   Loss 7.8134   LearningRate 0.2345   Epoch: 9   Global Step: 51830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:17:28,746-Speed 10488.73 samples/sec   Loss 7.8163   LearningRate 0.2344   Epoch: 9   Global Step: 51840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:17:51,182-Speed 3651.59 samples/sec   Loss 7.8212   LearningRate 0.2343   Epoch: 10   Global Step: 51850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:17:58,953-Speed 10543.27 samples/sec   Loss 7.7921   LearningRate 0.2342   Epoch: 10   Global Step: 51860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:18:06,721-Speed 10547.17 samples/sec   Loss 7.7550   LearningRate 0.2341   Epoch: 10   Global Step: 51870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:18:14,510-Speed 10519.51 samples/sec   Loss 7.7873   LearningRate 0.2340   Epoch: 10   Global Step: 51880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:18:22,280-Speed 10545.17 samples/sec   Loss 7.7843   LearningRate 0.2339   Epoch: 10   Global Step: 51890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:18:30,048-Speed 10545.97 samples/sec   Loss 7.7155   LearningRate 0.2338   Epoch: 10   Global Step: 51900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:18:37,829-Speed 10530.10 samples/sec   Loss 7.7742   LearningRate 0.2337   Epoch: 10   Global Step: 51910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:18:45,615-Speed 10525.65 samples/sec   Loss 7.8075   LearningRate 0.2337   Epoch: 10   Global Step: 51920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:18:53,414-Speed 10506.56 samples/sec   Loss 7.7251   LearningRate 0.2336   Epoch: 10   Global Step: 51930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:19:01,229-Speed 10483.99 samples/sec   Loss 7.7761   LearningRate 0.2335   Epoch: 10   Global Step: 51940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:19:09,065-Speed 10454.32 samples/sec   Loss 7.8183   LearningRate 0.2334   Epoch: 10   Global Step: 51950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:19:16,897-Speed 10460.94 samples/sec   Loss 7.8202   LearningRate 0.2333   Epoch: 10   Global Step: 51960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:19:24,724-Speed 10468.32 samples/sec   Loss 7.7561   LearningRate 0.2332   Epoch: 10   Global Step: 51970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:19:32,526-Speed 10501.05 samples/sec   Loss 7.7458   LearningRate 0.2331   Epoch: 10   Global Step: 51980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:19:40,324-Speed 10507.05 samples/sec   Loss 7.7339   LearningRate 0.2330   Epoch: 10   Global Step: 51990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:19:48,114-Speed 10517.62 samples/sec   Loss 7.7187   LearningRate 0.2329   Epoch: 10   Global Step: 52000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:19:55,891-Speed 10534.95 samples/sec   Loss 7.7635   LearningRate 0.2328   Epoch: 10   Global Step: 52010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:20:03,674-Speed 10527.27 samples/sec   Loss 7.7815   LearningRate 0.2328   Epoch: 10   Global Step: 52020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:20:11,478-Speed 10498.31 samples/sec   Loss 7.8287   LearningRate 0.2327   Epoch: 10   Global Step: 52030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:20:19,267-Speed 10519.14 samples/sec   Loss 7.7961   LearningRate 0.2326   Epoch: 10   Global Step: 52040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:20:27,076-Speed 10490.80 samples/sec   Loss 7.7443   LearningRate 0.2325   Epoch: 10   Global Step: 52050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:20:34,875-Speed 10505.64 samples/sec   Loss 7.7605   LearningRate 0.2324   Epoch: 10   Global Step: 52060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:20:42,661-Speed 10522.97 samples/sec   Loss 7.7161   LearningRate 0.2323   Epoch: 10   Global Step: 52070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:20:50,450-Speed 10519.80 samples/sec   Loss 7.7663   LearningRate 0.2322   Epoch: 10   Global Step: 52080   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:20:58,230-Speed 10531.72 samples/sec   Loss 7.7563   LearningRate 0.2321   Epoch: 10   Global Step: 52090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:21:06,011-Speed 10529.24 samples/sec   Loss 7.7851   LearningRate 0.2320   Epoch: 10   Global Step: 52100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:21:13,802-Speed 10515.08 samples/sec   Loss 7.7453   LearningRate 0.2319   Epoch: 10   Global Step: 52110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:21:21,588-Speed 10524.79 samples/sec   Loss 7.7763   LearningRate 0.2319   Epoch: 10   Global Step: 52120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:21:29,370-Speed 10527.11 samples/sec   Loss 7.7767   LearningRate 0.2318   Epoch: 10   Global Step: 52130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:21:37,180-Speed 10490.09 samples/sec   Loss 7.7743   LearningRate 0.2317   Epoch: 10   Global Step: 52140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:21:44,968-Speed 10520.90 samples/sec   Loss 7.7966   LearningRate 0.2316   Epoch: 10   Global Step: 52150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:21:52,775-Speed 10494.11 samples/sec   Loss 7.7588   LearningRate 0.2315   Epoch: 10   Global Step: 52160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:22:00,567-Speed 10515.42 samples/sec   Loss 7.7508   LearningRate 0.2314   Epoch: 10   Global Step: 52170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:22:08,353-Speed 10520.99 samples/sec   Loss 7.7270   LearningRate 0.2313   Epoch: 10   Global Step: 52180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:22:16,157-Speed 10501.99 samples/sec   Loss 7.7088   LearningRate 0.2312   Epoch: 10   Global Step: 52190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:22:23,950-Speed 10514.68 samples/sec   Loss 7.7512   LearningRate 0.2311   Epoch: 10   Global Step: 52200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:22:31,727-Speed 10534.14 samples/sec   Loss 7.7317   LearningRate 0.2310   Epoch: 10   Global Step: 52210   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:22:39,521-Speed 10512.05 samples/sec   Loss 7.7313   LearningRate 0.2310   Epoch: 10   Global Step: 52220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:22:47,310-Speed 10517.88 samples/sec   Loss 7.7168   LearningRate 0.2309   Epoch: 10   Global Step: 52230   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:22:55,122-Speed 10489.89 samples/sec   Loss 7.7150   LearningRate 0.2308   Epoch: 10   Global Step: 52240   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:23:02,937-Speed 10482.51 samples/sec   Loss 7.7012   LearningRate 0.2307   Epoch: 10   Global Step: 52250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:23:10,730-Speed 10513.12 samples/sec   Loss 7.7216   LearningRate 0.2306   Epoch: 10   Global Step: 52260   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:23:18,543-Speed 10487.16 samples/sec   Loss 7.7701   LearningRate 0.2305   Epoch: 10   Global Step: 52270   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:23:26,367-Speed 10472.27 samples/sec   Loss 7.7389   LearningRate 0.2304   Epoch: 10   Global Step: 52280   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:23:34,178-Speed 10488.54 samples/sec   Loss 7.8209   LearningRate 0.2303   Epoch: 10   Global Step: 52290   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:23:42,013-Speed 10457.35 samples/sec   Loss 7.7921   LearningRate 0.2302   Epoch: 10   Global Step: 52300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:23:49,889-Speed 10402.86 samples/sec   Loss 7.7514   LearningRate 0.2301   Epoch: 10   Global Step: 52310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:23:57,745-Speed 10429.84 samples/sec   Loss 7.8016   LearningRate 0.2301   Epoch: 10   Global Step: 52320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:24:05,574-Speed 10464.37 samples/sec   Loss 7.8241   LearningRate 0.2300   Epoch: 10   Global Step: 52330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:24:13,408-Speed 10458.72 samples/sec   Loss 7.7548   LearningRate 0.2299   Epoch: 10   Global Step: 52340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:24:21,228-Speed 10478.58 samples/sec   Loss 7.7256   LearningRate 0.2298   Epoch: 10   Global Step: 52350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:24:29,053-Speed 10469.71 samples/sec   Loss 7.7831   LearningRate 0.2297   Epoch: 10   Global Step: 52360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:24:36,916-Speed 10419.12 samples/sec   Loss 7.6993   LearningRate 0.2296   Epoch: 10   Global Step: 52370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:24:44,747-Speed 10463.33 samples/sec   Loss 7.7693   LearningRate 0.2295   Epoch: 10   Global Step: 52380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:24:52,569-Speed 10474.40 samples/sec   Loss 7.7270   LearningRate 0.2294   Epoch: 10   Global Step: 52390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:25:00,442-Speed 10406.07 samples/sec   Loss 7.6904   LearningRate 0.2293   Epoch: 10   Global Step: 52400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:25:08,295-Speed 10432.12 samples/sec   Loss 7.7233   LearningRate 0.2292   Epoch: 10   Global Step: 52410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:25:16,120-Speed 10471.17 samples/sec   Loss 7.6756   LearningRate 0.2292   Epoch: 10   Global Step: 52420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:25:24,003-Speed 10394.24 samples/sec   Loss 7.7377   LearningRate 0.2291   Epoch: 10   Global Step: 52430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:25:31,856-Speed 10433.18 samples/sec   Loss 7.6844   LearningRate 0.2290   Epoch: 10   Global Step: 52440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:25:39,688-Speed 10460.21 samples/sec   Loss 7.7057   LearningRate 0.2289   Epoch: 10   Global Step: 52450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:25:47,520-Speed 10461.25 samples/sec   Loss 7.7214   LearningRate 0.2288   Epoch: 10   Global Step: 52460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:25:55,296-Speed 10537.33 samples/sec   Loss 7.7124   LearningRate 0.2287   Epoch: 10   Global Step: 52470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:26:03,085-Speed 10518.36 samples/sec   Loss 7.6890   LearningRate 0.2286   Epoch: 10   Global Step: 52480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:26:10,883-Speed 10508.33 samples/sec   Loss 7.6525   LearningRate 0.2285   Epoch: 10   Global Step: 52490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:26:18,652-Speed 10545.52 samples/sec   Loss 7.6416   LearningRate 0.2284   Epoch: 10   Global Step: 52500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:26:26,442-Speed 10518.18 samples/sec   Loss 7.7464   LearningRate 0.2284   Epoch: 10   Global Step: 52510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:26:34,222-Speed 10530.13 samples/sec   Loss 7.6280   LearningRate 0.2283   Epoch: 10   Global Step: 52520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:26:42,017-Speed 10510.84 samples/sec   Loss 7.6885   LearningRate 0.2282   Epoch: 10   Global Step: 52530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:26:49,793-Speed 10536.36 samples/sec   Loss 7.6915   LearningRate 0.2281   Epoch: 10   Global Step: 52540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:26:57,569-Speed 10535.96 samples/sec   Loss 7.6483   LearningRate 0.2280   Epoch: 10   Global Step: 52550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:27:05,396-Speed 10468.40 samples/sec   Loss 7.6933   LearningRate 0.2279   Epoch: 10   Global Step: 52560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:27:13,161-Speed 10550.78 samples/sec   Loss 7.6842   LearningRate 0.2278   Epoch: 10   Global Step: 52570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:27:20,976-Speed 10484.50 samples/sec   Loss 7.7334   LearningRate 0.2277   Epoch: 10   Global Step: 52580   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:27:28,782-Speed 10496.43 samples/sec   Loss 7.6803   LearningRate 0.2276   Epoch: 10   Global Step: 52590   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:27:36,572-Speed 10517.28 samples/sec   Loss 7.7074   LearningRate 0.2276   Epoch: 10   Global Step: 52600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:27:44,348-Speed 10535.68 samples/sec   Loss 7.6838   LearningRate 0.2275   Epoch: 10   Global Step: 52610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:27:52,111-Speed 10554.62 samples/sec   Loss 7.6473   LearningRate 0.2274   Epoch: 10   Global Step: 52620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:27:59,914-Speed 10499.58 samples/sec   Loss 7.6653   LearningRate 0.2273   Epoch: 10   Global Step: 52630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:28:07,685-Speed 10542.75 samples/sec   Loss 7.6483   LearningRate 0.2272   Epoch: 10   Global Step: 52640   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:28:15,471-Speed 10524.10 samples/sec   Loss 7.6634   LearningRate 0.2271   Epoch: 10   Global Step: 52650   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:28:23,261-Speed 10517.25 samples/sec   Loss 7.6506   LearningRate 0.2270   Epoch: 10   Global Step: 52660   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:28:31,055-Speed 10512.98 samples/sec   Loss 7.6904   LearningRate 0.2269   Epoch: 10   Global Step: 52670   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:28:38,884-Speed 10465.25 samples/sec   Loss 7.7149   LearningRate 0.2268   Epoch: 10   Global Step: 52680   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:28:46,671-Speed 10521.81 samples/sec   Loss 7.6837   LearningRate 0.2268   Epoch: 10   Global Step: 52690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:28:54,449-Speed 10534.05 samples/sec   Loss 7.7252   LearningRate 0.2267   Epoch: 10   Global Step: 52700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:29:02,280-Speed 10466.17 samples/sec   Loss 7.6890   LearningRate 0.2266   Epoch: 10   Global Step: 52710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:29:10,071-Speed 10515.41 samples/sec   Loss 7.6976   LearningRate 0.2265   Epoch: 10   Global Step: 52720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:29:17,843-Speed 10541.25 samples/sec   Loss 7.6718   LearningRate 0.2264   Epoch: 10   Global Step: 52730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:29:25,634-Speed 10516.10 samples/sec   Loss 7.6454   LearningRate 0.2263   Epoch: 10   Global Step: 52740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:29:33,433-Speed 10505.40 samples/sec   Loss 7.6885   LearningRate 0.2262   Epoch: 10   Global Step: 52750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:29:41,235-Speed 10501.29 samples/sec   Loss 7.6886   LearningRate 0.2261   Epoch: 10   Global Step: 52760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:29:49,028-Speed 10512.56 samples/sec   Loss 7.6610   LearningRate 0.2260   Epoch: 10   Global Step: 52770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:29:56,848-Speed 10478.67 samples/sec   Loss 7.6732   LearningRate 0.2260   Epoch: 10   Global Step: 52780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:30:04,644-Speed 10508.76 samples/sec   Loss 7.6936   LearningRate 0.2259   Epoch: 10   Global Step: 52790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:30:12,438-Speed 10511.67 samples/sec   Loss 7.6445   LearningRate 0.2258   Epoch: 10   Global Step: 52800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:30:20,231-Speed 10513.05 samples/sec   Loss 7.6833   LearningRate 0.2257   Epoch: 10   Global Step: 52810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:30:28,015-Speed 10526.91 samples/sec   Loss 7.6198   LearningRate 0.2256   Epoch: 10   Global Step: 52820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:30:35,850-Speed 10457.09 samples/sec   Loss 7.6146   LearningRate 0.2255   Epoch: 10   Global Step: 52830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:30:43,668-Speed 10480.00 samples/sec   Loss 7.6259   LearningRate 0.2254   Epoch: 10   Global Step: 52840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:30:51,487-Speed 10482.39 samples/sec   Loss 7.6197   LearningRate 0.2253   Epoch: 10   Global Step: 52850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:30:59,289-Speed 10501.62 samples/sec   Loss 7.6193   LearningRate 0.2252   Epoch: 10   Global Step: 52860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:31:07,097-Speed 10493.48 samples/sec   Loss 7.6655   LearningRate 0.2252   Epoch: 10   Global Step: 52870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:31:14,881-Speed 10524.41 samples/sec   Loss 7.6372   LearningRate 0.2251   Epoch: 10   Global Step: 52880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:31:22,693-Speed 10488.81 samples/sec   Loss 7.6226   LearningRate 0.2250   Epoch: 10   Global Step: 52890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:31:30,500-Speed 10495.02 samples/sec   Loss 7.6877   LearningRate 0.2249   Epoch: 10   Global Step: 52900   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:31:38,300-Speed 10502.99 samples/sec   Loss 7.6647   LearningRate 0.2248   Epoch: 10   Global Step: 52910   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:31:46,125-Speed 10469.91 samples/sec   Loss 7.6582   LearningRate 0.2247   Epoch: 10   Global Step: 52920   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:31:53,933-Speed 10493.85 samples/sec   Loss 7.6399   LearningRate 0.2246   Epoch: 10   Global Step: 52930   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:32:01,724-Speed 10516.27 samples/sec   Loss 7.6337   LearningRate 0.2245   Epoch: 10   Global Step: 52940   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:32:09,503-Speed 10531.99 samples/sec   Loss 7.6603   LearningRate 0.2244   Epoch: 10   Global Step: 52950   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:32:17,284-Speed 10529.67 samples/sec   Loss 7.6187   LearningRate 0.2244   Epoch: 10   Global Step: 52960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-16 02:32:25,085-Speed 10501.89 samples/sec   Loss 7.6457   LearningRate 0.2243   Epoch: 10   Global Step: 52970   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-16 02:32:32,883-Speed 10507.03 samples/sec   Loss 7.6280   LearningRate 0.2242   Epoch: 10   Global Step: 52980   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-16 02:32:40,679-Speed 10509.47 samples/sec   Loss 7.6375   LearningRate 0.2241   Epoch: 10   Global Step: 52990   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-16 02:32:48,452-Speed 10539.90 samples/sec   Loss 7.6623   LearningRate 0.2240   Epoch: 10   Global Step: 53000   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-16 02:32:56,261-Speed 10492.51 samples/sec   Loss 7.6597   LearningRate 0.2239   Epoch: 10   Global Step: 53010   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-16 02:33:04,064-Speed 10499.46 samples/sec   Loss 7.6374   LearningRate 0.2238   Epoch: 10   Global Step: 53020   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-16 02:33:11,884-Speed 10477.18 samples/sec   Loss 7.6998   LearningRate 0.2237   Epoch: 10   Global Step: 53030   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-16 02:33:19,683-Speed 10505.74 samples/sec   Loss 7.6562   LearningRate 0.2236   Epoch: 10   Global Step: 53040   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-16 02:33:27,471-Speed 10520.34 samples/sec   Loss 7.6335   LearningRate 0.2236   Epoch: 10   Global Step: 53050   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-16 02:33:35,275-Speed 10498.02 samples/sec   Loss 7.5531   LearningRate 0.2235   Epoch: 10   Global Step: 53060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:33:43,061-Speed 10523.33 samples/sec   Loss 7.6041   LearningRate 0.2234   Epoch: 10   Global Step: 53070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:33:50,858-Speed 10508.16 samples/sec   Loss 7.6459   LearningRate 0.2233   Epoch: 10   Global Step: 53080   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:33:58,671-Speed 10486.92 samples/sec   Loss 7.5931   LearningRate 0.2232   Epoch: 10   Global Step: 53090   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:34:06,456-Speed 10523.09 samples/sec   Loss 7.5848   LearningRate 0.2231   Epoch: 10   Global Step: 53100   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:34:14,252-Speed 10509.12 samples/sec   Loss 7.5847   LearningRate 0.2230   Epoch: 10   Global Step: 53110   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:34:22,025-Speed 10540.36 samples/sec   Loss 7.6262   LearningRate 0.2229   Epoch: 10   Global Step: 53120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:34:29,806-Speed 10530.98 samples/sec   Loss 7.6638   LearningRate 0.2229   Epoch: 10   Global Step: 53130   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:34:37,682-Speed 10407.98 samples/sec   Loss 7.6412   LearningRate 0.2228   Epoch: 10   Global Step: 53140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:34:45,465-Speed 10527.01 samples/sec   Loss 7.6679   LearningRate 0.2227   Epoch: 10   Global Step: 53150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:34:53,243-Speed 10534.15 samples/sec   Loss 7.5989   LearningRate 0.2226   Epoch: 10   Global Step: 53160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:35:01,050-Speed 10495.12 samples/sec   Loss 7.5688   LearningRate 0.2225   Epoch: 10   Global Step: 53170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:35:08,844-Speed 10512.98 samples/sec   Loss 7.6194   LearningRate 0.2224   Epoch: 10   Global Step: 53180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:35:16,644-Speed 10504.07 samples/sec   Loss 7.6214   LearningRate 0.2223   Epoch: 10   Global Step: 53190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:35:24,454-Speed 10490.23 samples/sec   Loss 7.6347   LearningRate 0.2222   Epoch: 10   Global Step: 53200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:35:32,241-Speed 10520.97 samples/sec   Loss 7.6011   LearningRate 0.2222   Epoch: 10   Global Step: 53210   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:35:40,015-Speed 10546.76 samples/sec   Loss 7.6250   LearningRate 0.2221   Epoch: 10   Global Step: 53220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:35:47,801-Speed 10521.67 samples/sec   Loss 7.5203   LearningRate 0.2220   Epoch: 10   Global Step: 53230   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:35:55,601-Speed 10504.98 samples/sec   Loss 7.6149   LearningRate 0.2219   Epoch: 10   Global Step: 53240   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:36:03,383-Speed 10527.94 samples/sec   Loss 7.6260   LearningRate 0.2218   Epoch: 10   Global Step: 53250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:36:11,209-Speed 10468.81 samples/sec   Loss 7.6455   LearningRate 0.2217   Epoch: 10   Global Step: 53260   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:36:19,034-Speed 10470.61 samples/sec   Loss 7.6136   LearningRate 0.2216   Epoch: 10   Global Step: 53270   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:36:26,864-Speed 10464.39 samples/sec   Loss 7.6150   LearningRate 0.2215   Epoch: 10   Global Step: 53280   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:36:34,664-Speed 10503.73 samples/sec   Loss 7.6133   LearningRate 0.2214   Epoch: 10   Global Step: 53290   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:36:42,448-Speed 10528.15 samples/sec   Loss 7.6297   LearningRate 0.2214   Epoch: 10   Global Step: 53300   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:36:50,248-Speed 10503.88 samples/sec   Loss 7.6103   LearningRate 0.2213   Epoch: 10   Global Step: 53310   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:36:58,030-Speed 10527.98 samples/sec   Loss 7.5473   LearningRate 0.2212   Epoch: 10   Global Step: 53320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:37:05,820-Speed 10520.81 samples/sec   Loss 7.5908   LearningRate 0.2211   Epoch: 10   Global Step: 53330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:37:13,617-Speed 10508.08 samples/sec   Loss 7.5440   LearningRate 0.2210   Epoch: 10   Global Step: 53340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:37:21,456-Speed 10452.32 samples/sec   Loss 7.5896   LearningRate 0.2209   Epoch: 10   Global Step: 53350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:37:29,283-Speed 10468.14 samples/sec   Loss 7.5552   LearningRate 0.2208   Epoch: 10   Global Step: 53360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:37:37,085-Speed 10501.18 samples/sec   Loss 7.5364   LearningRate 0.2207   Epoch: 10   Global Step: 53370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:37:44,870-Speed 10523.55 samples/sec   Loss 7.5405   LearningRate 0.2207   Epoch: 10   Global Step: 53380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:37:52,702-Speed 10461.39 samples/sec   Loss 7.5385   LearningRate 0.2206   Epoch: 10   Global Step: 53390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:38:00,567-Speed 10417.47 samples/sec   Loss 7.5916   LearningRate 0.2205   Epoch: 10   Global Step: 53400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:38:08,378-Speed 10488.58 samples/sec   Loss 7.5763   LearningRate 0.2204   Epoch: 10   Global Step: 53410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:38:16,176-Speed 10507.87 samples/sec   Loss 7.6317   LearningRate 0.2203   Epoch: 10   Global Step: 53420   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:38:24,015-Speed 10454.19 samples/sec   Loss 7.5692   LearningRate 0.2202   Epoch: 10   Global Step: 53430   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:38:31,865-Speed 10435.95 samples/sec   Loss 7.6296   LearningRate 0.2201   Epoch: 10   Global Step: 53440   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:38:39,684-Speed 10478.55 samples/sec   Loss 7.6125   LearningRate 0.2200   Epoch: 10   Global Step: 53450   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:38:47,528-Speed 10444.70 samples/sec   Loss 7.5531   LearningRate 0.2200   Epoch: 10   Global Step: 53460   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:38:55,360-Speed 10461.74 samples/sec   Loss 7.5776   LearningRate 0.2199   Epoch: 10   Global Step: 53470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:39:03,184-Speed 10471.95 samples/sec   Loss 7.5315   LearningRate 0.2198   Epoch: 10   Global Step: 53480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:39:11,019-Speed 10456.17 samples/sec   Loss 7.5863   LearningRate 0.2197   Epoch: 10   Global Step: 53490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:39:18,827-Speed 10493.46 samples/sec   Loss 7.6036   LearningRate 0.2196   Epoch: 10   Global Step: 53500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:39:26,643-Speed 10482.94 samples/sec   Loss 7.5305   LearningRate 0.2195   Epoch: 10   Global Step: 53510   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:39:34,448-Speed 10496.59 samples/sec   Loss 7.5847   LearningRate 0.2194   Epoch: 10   Global Step: 53520   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:39:42,263-Speed 10483.44 samples/sec   Loss 7.5551   LearningRate 0.2193   Epoch: 10   Global Step: 53530   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:39:50,086-Speed 10473.53 samples/sec   Loss 7.5510   LearningRate 0.2193   Epoch: 10   Global Step: 53540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:39:57,874-Speed 10520.33 samples/sec   Loss 7.4994   LearningRate 0.2192   Epoch: 10   Global Step: 53550   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:40:05,676-Speed 10500.83 samples/sec   Loss 7.5125   LearningRate 0.2191   Epoch: 10   Global Step: 53560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:40:13,505-Speed 10465.09 samples/sec   Loss 7.5704   LearningRate 0.2190   Epoch: 10   Global Step: 53570   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:40:21,308-Speed 10505.57 samples/sec   Loss 7.5832   LearningRate 0.2189   Epoch: 10   Global Step: 53580   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:40:29,094-Speed 10522.34 samples/sec   Loss 7.5485   LearningRate 0.2188   Epoch: 10   Global Step: 53590   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:40:36,891-Speed 10508.11 samples/sec   Loss 7.5472   LearningRate 0.2187   Epoch: 10   Global Step: 53600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:40:44,677-Speed 10522.27 samples/sec   Loss 7.5152   LearningRate 0.2186   Epoch: 10   Global Step: 53610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:40:52,470-Speed 10512.64 samples/sec   Loss 7.5294   LearningRate 0.2186   Epoch: 10   Global Step: 53620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:41:00,274-Speed 10500.07 samples/sec   Loss 7.5619   LearningRate 0.2185   Epoch: 10   Global Step: 53630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:41:08,068-Speed 10511.01 samples/sec   Loss 7.5580   LearningRate 0.2184   Epoch: 10   Global Step: 53640   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:41:15,875-Speed 10495.04 samples/sec   Loss 7.5210   LearningRate 0.2183   Epoch: 10   Global Step: 53650   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:41:23,695-Speed 10477.68 samples/sec   Loss 7.5270   LearningRate 0.2182   Epoch: 10   Global Step: 53660   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:41:31,463-Speed 10546.81 samples/sec   Loss 7.5303   LearningRate 0.2181   Epoch: 10   Global Step: 53670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:41:39,233-Speed 10545.82 samples/sec   Loss 7.4911   LearningRate 0.2180   Epoch: 10   Global Step: 53680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:41:47,036-Speed 10499.74 samples/sec   Loss 7.5610   LearningRate 0.2179   Epoch: 10   Global Step: 53690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:41:54,830-Speed 10510.39 samples/sec   Loss 7.6134   LearningRate 0.2179   Epoch: 10   Global Step: 53700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:42:02,618-Speed 10521.79 samples/sec   Loss 7.5608   LearningRate 0.2178   Epoch: 10   Global Step: 53710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:42:10,394-Speed 10535.84 samples/sec   Loss 7.4678   LearningRate 0.2177   Epoch: 10   Global Step: 53720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:42:18,178-Speed 10525.58 samples/sec   Loss 7.5400   LearningRate 0.2176   Epoch: 10   Global Step: 53730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:42:25,989-Speed 10488.38 samples/sec   Loss 7.4827   LearningRate 0.2175   Epoch: 10   Global Step: 53740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:42:33,777-Speed 10521.00 samples/sec   Loss 7.4976   LearningRate 0.2174   Epoch: 10   Global Step: 53750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:42:41,594-Speed 10480.88 samples/sec   Loss 7.5276   LearningRate 0.2173   Epoch: 10   Global Step: 53760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:42:49,381-Speed 10522.63 samples/sec   Loss 7.4778   LearningRate 0.2172   Epoch: 10   Global Step: 53770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:42:57,194-Speed 10484.86 samples/sec   Loss 7.5068   LearningRate 0.2172   Epoch: 10   Global Step: 53780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:43:04,995-Speed 10502.76 samples/sec   Loss 7.5148   LearningRate 0.2171   Epoch: 10   Global Step: 53790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:43:12,800-Speed 10497.24 samples/sec   Loss 7.4893   LearningRate 0.2170   Epoch: 10   Global Step: 53800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:43:20,592-Speed 10515.20 samples/sec   Loss 7.5301   LearningRate 0.2169   Epoch: 10   Global Step: 53810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:43:28,416-Speed 10472.25 samples/sec   Loss 7.5612   LearningRate 0.2168   Epoch: 10   Global Step: 53820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:43:36,238-Speed 10473.77 samples/sec   Loss 7.5065   LearningRate 0.2167   Epoch: 10   Global Step: 53830   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:43:44,038-Speed 10504.98 samples/sec   Loss 7.5130   LearningRate 0.2166   Epoch: 10   Global Step: 53840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:43:51,867-Speed 10463.95 samples/sec   Loss 7.4868   LearningRate 0.2166   Epoch: 10   Global Step: 53850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:43:59,667-Speed 10504.21 samples/sec   Loss 7.5387   LearningRate 0.2165   Epoch: 10   Global Step: 53860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:44:07,463-Speed 10509.80 samples/sec   Loss 7.5248   LearningRate 0.2164   Epoch: 10   Global Step: 53870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:44:15,280-Speed 10481.62 samples/sec   Loss 7.4925   LearningRate 0.2163   Epoch: 10   Global Step: 53880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:44:23,108-Speed 10465.62 samples/sec   Loss 7.5064   LearningRate 0.2162   Epoch: 10   Global Step: 53890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:44:30,911-Speed 10499.48 samples/sec   Loss 7.5224   LearningRate 0.2161   Epoch: 10   Global Step: 53900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:44:38,712-Speed 10507.64 samples/sec   Loss 7.5545   LearningRate 0.2160   Epoch: 10   Global Step: 53910   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:44:46,493-Speed 10529.43 samples/sec   Loss 7.5483   LearningRate 0.2159   Epoch: 10   Global Step: 53920   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:44:54,269-Speed 10536.20 samples/sec   Loss 7.4497   LearningRate 0.2159   Epoch: 10   Global Step: 53930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:45:02,104-Speed 10461.27 samples/sec   Loss 7.4683   LearningRate 0.2158   Epoch: 10   Global Step: 53940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:45:09,905-Speed 10502.66 samples/sec   Loss 7.4635   LearningRate 0.2157   Epoch: 10   Global Step: 53950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:45:17,723-Speed 10480.11 samples/sec   Loss 7.4085   LearningRate 0.2156   Epoch: 10   Global Step: 53960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:45:25,523-Speed 10502.72 samples/sec   Loss 7.4822   LearningRate 0.2155   Epoch: 10   Global Step: 53970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:45:33,326-Speed 10500.78 samples/sec   Loss 7.4912   LearningRate 0.2154   Epoch: 10   Global Step: 53980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:45:41,138-Speed 10488.99 samples/sec   Loss 7.5047   LearningRate 0.2153   Epoch: 10   Global Step: 53990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:45:48,972-Speed 10458.02 samples/sec   Loss 7.4916   LearningRate 0.2153   Epoch: 10   Global Step: 54000   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:45:56,802-Speed 10463.17 samples/sec   Loss 7.4730   LearningRate 0.2152   Epoch: 10   Global Step: 54010   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:46:04,629-Speed 10467.11 samples/sec   Loss 7.5005   LearningRate 0.2151   Epoch: 10   Global Step: 54020   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:46:12,426-Speed 10507.70 samples/sec   Loss 7.4839   LearningRate 0.2150   Epoch: 10   Global Step: 54030   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:46:20,216-Speed 10517.43 samples/sec   Loss 7.5087   LearningRate 0.2149   Epoch: 10   Global Step: 54040   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:46:28,025-Speed 10492.14 samples/sec   Loss 7.5175   LearningRate 0.2148   Epoch: 10   Global Step: 54050   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:46:35,829-Speed 10498.79 samples/sec   Loss 7.4950   LearningRate 0.2147   Epoch: 10   Global Step: 54060   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:46:43,629-Speed 10504.71 samples/sec   Loss 7.5429   LearningRate 0.2146   Epoch: 10   Global Step: 54070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:46:51,447-Speed 10480.08 samples/sec   Loss 7.5003   LearningRate 0.2146   Epoch: 10   Global Step: 54080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:46:59,243-Speed 10509.73 samples/sec   Loss 7.5048   LearningRate 0.2145   Epoch: 10   Global Step: 54090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:47:07,023-Speed 10530.36 samples/sec   Loss 7.5282   LearningRate 0.2144   Epoch: 10   Global Step: 54100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:47:14,802-Speed 10532.58 samples/sec   Loss 7.5020   LearningRate 0.2143   Epoch: 10   Global Step: 54110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:47:22,612-Speed 10491.12 samples/sec   Loss 7.5001   LearningRate 0.2142   Epoch: 10   Global Step: 54120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:47:30,428-Speed 10482.29 samples/sec   Loss 7.4842   LearningRate 0.2141   Epoch: 10   Global Step: 54130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:47:38,220-Speed 10514.62 samples/sec   Loss 7.4345   LearningRate 0.2140   Epoch: 10   Global Step: 54140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:47:46,015-Speed 10510.48 samples/sec   Loss 7.4550   LearningRate 0.2140   Epoch: 10   Global Step: 54150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:47:53,833-Speed 10480.78 samples/sec   Loss 7.4501   LearningRate 0.2139   Epoch: 10   Global Step: 54160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:48:01,654-Speed 10476.06 samples/sec   Loss 7.4611   LearningRate 0.2138   Epoch: 10   Global Step: 54170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:48:09,458-Speed 10498.28 samples/sec   Loss 7.4836   LearningRate 0.2137   Epoch: 10   Global Step: 54180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:48:17,266-Speed 10493.74 samples/sec   Loss 7.4330   LearningRate 0.2136   Epoch: 10   Global Step: 54190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:48:25,089-Speed 10472.18 samples/sec   Loss 7.4826   LearningRate 0.2135   Epoch: 10   Global Step: 54200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:48:32,905-Speed 10482.72 samples/sec   Loss 7.4574   LearningRate 0.2134   Epoch: 10   Global Step: 54210   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:48:40,720-Speed 10483.98 samples/sec   Loss 7.4319   LearningRate 0.2133   Epoch: 10   Global Step: 54220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:48:48,507-Speed 10521.68 samples/sec   Loss 7.5166   LearningRate 0.2133   Epoch: 10   Global Step: 54230   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:48:56,300-Speed 10512.98 samples/sec   Loss 7.4503   LearningRate 0.2132   Epoch: 10   Global Step: 54240   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:49:04,142-Speed 10448.40 samples/sec   Loss 7.4530   LearningRate 0.2131   Epoch: 10   Global Step: 54250   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:49:11,942-Speed 10504.35 samples/sec   Loss 7.4492   LearningRate 0.2130   Epoch: 10   Global Step: 54260   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:49:19,729-Speed 10521.57 samples/sec   Loss 7.4830   LearningRate 0.2129   Epoch: 10   Global Step: 54270   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:49:27,540-Speed 10488.79 samples/sec   Loss 7.4528   LearningRate 0.2128   Epoch: 10   Global Step: 54280   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:49:35,317-Speed 10535.55 samples/sec   Loss 7.4665   LearningRate 0.2127   Epoch: 10   Global Step: 54290   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:49:43,107-Speed 10517.81 samples/sec   Loss 7.4621   LearningRate 0.2127   Epoch: 10   Global Step: 54300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:49:50,922-Speed 10483.17 samples/sec   Loss 7.4147   LearningRate 0.2126   Epoch: 10   Global Step: 54310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:49:58,706-Speed 10525.45 samples/sec   Loss 7.5128   LearningRate 0.2125   Epoch: 10   Global Step: 54320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:50:06,553-Speed 10441.35 samples/sec   Loss 7.4653   LearningRate 0.2124   Epoch: 10   Global Step: 54330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:50:14,360-Speed 10495.40 samples/sec   Loss 7.4265   LearningRate 0.2123   Epoch: 10   Global Step: 54340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:50:22,176-Speed 10481.08 samples/sec   Loss 7.4577   LearningRate 0.2122   Epoch: 10   Global Step: 54350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:50:29,965-Speed 10518.52 samples/sec   Loss 7.4409   LearningRate 0.2121   Epoch: 10   Global Step: 54360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:50:37,755-Speed 10518.19 samples/sec   Loss 7.3806   LearningRate 0.2121   Epoch: 10   Global Step: 54370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:50:45,555-Speed 10503.91 samples/sec   Loss 7.3950   LearningRate 0.2120   Epoch: 10   Global Step: 54380   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:50:53,357-Speed 10500.62 samples/sec   Loss 7.4613   LearningRate 0.2119   Epoch: 10   Global Step: 54390   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:51:01,158-Speed 10503.36 samples/sec   Loss 7.5230   LearningRate 0.2118   Epoch: 10   Global Step: 54400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:51:08,979-Speed 10475.49 samples/sec   Loss 7.4643   LearningRate 0.2117   Epoch: 10   Global Step: 54410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:51:16,789-Speed 10490.29 samples/sec   Loss 7.4230   LearningRate 0.2116   Epoch: 10   Global Step: 54420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:51:24,638-Speed 10437.80 samples/sec   Loss 7.4963   LearningRate 0.2115   Epoch: 10   Global Step: 54430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:51:32,470-Speed 10461.00 samples/sec   Loss 7.4931   LearningRate 0.2115   Epoch: 10   Global Step: 54440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:51:40,271-Speed 10502.97 samples/sec   Loss 7.4609   LearningRate 0.2114   Epoch: 10   Global Step: 54450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:51:48,057-Speed 10523.20 samples/sec   Loss 7.4387   LearningRate 0.2113   Epoch: 10   Global Step: 54460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:51:55,881-Speed 10470.63 samples/sec   Loss 7.4288   LearningRate 0.2112   Epoch: 10   Global Step: 54470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:52:03,688-Speed 10496.25 samples/sec   Loss 7.4132   LearningRate 0.2111   Epoch: 10   Global Step: 54480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:52:11,460-Speed 10542.51 samples/sec   Loss 7.3685   LearningRate 0.2110   Epoch: 10   Global Step: 54490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:52:19,289-Speed 10463.24 samples/sec   Loss 7.4003   LearningRate 0.2109   Epoch: 10   Global Step: 54500   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:52:27,085-Speed 10510.36 samples/sec   Loss 7.4193   LearningRate 0.2109   Epoch: 10   Global Step: 54510   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:52:34,881-Speed 10510.30 samples/sec   Loss 7.4138   LearningRate 0.2108   Epoch: 10   Global Step: 54520   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:52:42,673-Speed 10514.32 samples/sec   Loss 7.4208   LearningRate 0.2107   Epoch: 10   Global Step: 54530   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:52:50,465-Speed 10514.53 samples/sec   Loss 7.4074   LearningRate 0.2106   Epoch: 10   Global Step: 54540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:52:58,269-Speed 10498.53 samples/sec   Loss 7.3900   LearningRate 0.2105   Epoch: 10   Global Step: 54550   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:53:06,077-Speed 10493.41 samples/sec   Loss 7.3708   LearningRate 0.2104   Epoch: 10   Global Step: 54560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:53:13,867-Speed 10517.31 samples/sec   Loss 7.3740   LearningRate 0.2103   Epoch: 10   Global Step: 54570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:53:21,656-Speed 10518.92 samples/sec   Loss 7.3929   LearningRate 0.2103   Epoch: 10   Global Step: 54580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:53:29,500-Speed 10445.78 samples/sec   Loss 7.4474   LearningRate 0.2102   Epoch: 10   Global Step: 54590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:53:37,299-Speed 10505.14 samples/sec   Loss 7.4093   LearningRate 0.2101   Epoch: 10   Global Step: 54600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:53:45,128-Speed 10464.49 samples/sec   Loss 7.3607   LearningRate 0.2100   Epoch: 10   Global Step: 54610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:53:52,905-Speed 10535.91 samples/sec   Loss 7.4207   LearningRate 0.2099   Epoch: 10   Global Step: 54620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:54:00,747-Speed 10447.46 samples/sec   Loss 7.4135   LearningRate 0.2098   Epoch: 10   Global Step: 54630   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:54:08,533-Speed 10523.18 samples/sec   Loss 7.4049   LearningRate 0.2097   Epoch: 10   Global Step: 54640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:54:16,333-Speed 10502.87 samples/sec   Loss 7.4169   LearningRate 0.2097   Epoch: 10   Global Step: 54650   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:54:24,140-Speed 10494.41 samples/sec   Loss 7.4746   LearningRate 0.2096   Epoch: 10   Global Step: 54660   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:54:31,986-Speed 10445.92 samples/sec   Loss 7.4344   LearningRate 0.2095   Epoch: 10   Global Step: 54670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:54:39,801-Speed 10482.99 samples/sec   Loss 7.4045   LearningRate 0.2094   Epoch: 10   Global Step: 54680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:54:47,659-Speed 10427.67 samples/sec   Loss 7.3982   LearningRate 0.2093   Epoch: 10   Global Step: 54690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:54:55,480-Speed 10475.96 samples/sec   Loss 7.3874   LearningRate 0.2092   Epoch: 10   Global Step: 54700   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:55:03,282-Speed 10500.50 samples/sec   Loss 7.3847   LearningRate 0.2091   Epoch: 10   Global Step: 54710   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:55:11,075-Speed 10513.38 samples/sec   Loss 7.3856   LearningRate 0.2091   Epoch: 10   Global Step: 54720   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:55:18,890-Speed 10484.04 samples/sec   Loss 7.4272   LearningRate 0.2090   Epoch: 10   Global Step: 54730   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:55:26,698-Speed 10494.09 samples/sec   Loss 7.4406   LearningRate 0.2089   Epoch: 10   Global Step: 54740   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:55:34,482-Speed 10526.56 samples/sec   Loss 7.4112   LearningRate 0.2088   Epoch: 10   Global Step: 54750   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:55:42,251-Speed 10544.90 samples/sec   Loss 7.3935   LearningRate 0.2087   Epoch: 10   Global Step: 54760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:55:50,108-Speed 10427.53 samples/sec   Loss 7.3863   LearningRate 0.2086   Epoch: 10   Global Step: 54770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:55:57,949-Speed 10449.86 samples/sec   Loss 7.3373   LearningRate 0.2085   Epoch: 10   Global Step: 54780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:56:05,746-Speed 10508.47 samples/sec   Loss 7.4057   LearningRate 0.2085   Epoch: 10   Global Step: 54790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:56:13,523-Speed 10534.87 samples/sec   Loss 7.4084   LearningRate 0.2084   Epoch: 10   Global Step: 54800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:56:21,321-Speed 10505.96 samples/sec   Loss 7.4282   LearningRate 0.2083   Epoch: 10   Global Step: 54810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:56:29,163-Speed 10447.31 samples/sec   Loss 7.4162   LearningRate 0.2082   Epoch: 10   Global Step: 54820   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:56:36,953-Speed 10517.77 samples/sec   Loss 7.3831   LearningRate 0.2081   Epoch: 10   Global Step: 54830   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:56:44,735-Speed 10527.79 samples/sec   Loss 7.3649   LearningRate 0.2080   Epoch: 10   Global Step: 54840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:56:52,518-Speed 10526.96 samples/sec   Loss 7.3676   LearningRate 0.2079   Epoch: 10   Global Step: 54850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 02:57:00,319-Speed 10503.32 samples/sec   Loss 7.4195   LearningRate 0.2079   Epoch: 10   Global Step: 54860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:57:08,121-Speed 10505.06 samples/sec   Loss 7.4277   LearningRate 0.2078   Epoch: 10   Global Step: 54870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:57:15,903-Speed 10528.53 samples/sec   Loss 7.3991   LearningRate 0.2077   Epoch: 10   Global Step: 54880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:57:23,687-Speed 10530.15 samples/sec   Loss 7.3747   LearningRate 0.2076   Epoch: 10   Global Step: 54890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:57:31,476-Speed 10519.47 samples/sec   Loss 7.3562   LearningRate 0.2075   Epoch: 10   Global Step: 54900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:57:39,340-Speed 10417.46 samples/sec   Loss 7.3828   LearningRate 0.2074   Epoch: 10   Global Step: 54910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:57:47,126-Speed 10524.33 samples/sec   Loss 7.2936   LearningRate 0.2074   Epoch: 10   Global Step: 54920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:57:54,903-Speed 10534.56 samples/sec   Loss 7.3353   LearningRate 0.2073   Epoch: 10   Global Step: 54930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:58:02,722-Speed 10478.73 samples/sec   Loss 7.2961   LearningRate 0.2072   Epoch: 10   Global Step: 54940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:58:10,572-Speed 10437.90 samples/sec   Loss 7.4022   LearningRate 0.2071   Epoch: 10   Global Step: 54950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:58:18,363-Speed 10516.07 samples/sec   Loss 7.3818   LearningRate 0.2070   Epoch: 10   Global Step: 54960   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:58:26,173-Speed 10489.49 samples/sec   Loss 7.3772   LearningRate 0.2069   Epoch: 10   Global Step: 54970   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 02:58:33,975-Speed 10501.95 samples/sec   Loss 7.3823   LearningRate 0.2068   Epoch: 10   Global Step: 54980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:58:41,783-Speed 10492.83 samples/sec   Loss 7.3895   LearningRate 0.2068   Epoch: 10   Global Step: 54990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:58:49,600-Speed 10481.25 samples/sec   Loss 7.3182   LearningRate 0.2067   Epoch: 10   Global Step: 55000   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:58:59,249-Speed 8491.13 samples/sec   Loss 7.3697   LearningRate 0.2066   Epoch: 10   Global Step: 55010   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:59:07,038-Speed 10518.98 samples/sec   Loss 7.2973   LearningRate 0.2065   Epoch: 10   Global Step: 55020   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:59:14,838-Speed 10503.71 samples/sec   Loss 7.3937   LearningRate 0.2064   Epoch: 10   Global Step: 55030   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:59:22,641-Speed 10500.32 samples/sec   Loss 7.3325   LearningRate 0.2063   Epoch: 10   Global Step: 55040   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:59:30,466-Speed 10469.66 samples/sec   Loss 7.3461   LearningRate 0.2062   Epoch: 10   Global Step: 55050   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:59:38,293-Speed 10468.05 samples/sec   Loss 7.4069   LearningRate 0.2062   Epoch: 10   Global Step: 55060   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:59:46,115-Speed 10474.74 samples/sec   Loss 7.3137   LearningRate 0.2061   Epoch: 10   Global Step: 55070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 02:59:53,936-Speed 10475.02 samples/sec   Loss 7.3541   LearningRate 0.2060   Epoch: 10   Global Step: 55080   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 03:00:01,773-Speed 10454.69 samples/sec   Loss 7.3532   LearningRate 0.2059   Epoch: 10   Global Step: 55090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:00:09,611-Speed 10452.11 samples/sec   Loss 7.3552   LearningRate 0.2058   Epoch: 10   Global Step: 55100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:00:17,476-Speed 10417.69 samples/sec   Loss 7.3426   LearningRate 0.2057   Epoch: 10   Global Step: 55110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:00:25,301-Speed 10470.73 samples/sec   Loss 7.2674   LearningRate 0.2057   Epoch: 10   Global Step: 55120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:00:33,134-Speed 10458.80 samples/sec   Loss 7.3196   LearningRate 0.2056   Epoch: 10   Global Step: 55130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:00:40,928-Speed 10512.93 samples/sec   Loss 7.3458   LearningRate 0.2055   Epoch: 10   Global Step: 55140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:00:48,719-Speed 10516.59 samples/sec   Loss 7.3380   LearningRate 0.2054   Epoch: 10   Global Step: 55150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-16 03:00:56,554-Speed 10455.59 samples/sec   Loss 7.3365   LearningRate 0.2053   Epoch: 10   Global Step: 55160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-16 03:01:04,375-Speed 10476.65 samples/sec   Loss 7.3169   LearningRate 0.2052   Epoch: 10   Global Step: 55170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-16 03:01:12,163-Speed 10520.26 samples/sec   Loss 7.2829   LearningRate 0.2051   Epoch: 10   Global Step: 55180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-16 03:01:19,928-Speed 10551.15 samples/sec   Loss 7.3703   LearningRate 0.2051   Epoch: 10   Global Step: 55190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-16 03:01:27,714-Speed 10522.40 samples/sec   Loss 7.3266   LearningRate 0.2050   Epoch: 10   Global Step: 55200   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-16 03:01:35,500-Speed 10522.88 samples/sec   Loss 7.3430   LearningRate 0.2049   Epoch: 10   Global Step: 55210   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-16 03:01:43,274-Speed 10540.00 samples/sec   Loss 7.3129   LearningRate 0.2048   Epoch: 10   Global Step: 55220   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-16 03:01:51,051-Speed 10535.81 samples/sec   Loss 7.4008   LearningRate 0.2047   Epoch: 10   Global Step: 55230   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-16 03:01:58,877-Speed 10467.34 samples/sec   Loss 7.3585   LearningRate 0.2046   Epoch: 10   Global Step: 55240   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-16 03:02:06,697-Speed 10477.99 samples/sec   Loss 7.3373   LearningRate 0.2046   Epoch: 10   Global Step: 55250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 03:02:14,490-Speed 10514.57 samples/sec   Loss 7.2922   LearningRate 0.2045   Epoch: 10   Global Step: 55260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 03:02:22,289-Speed 10505.57 samples/sec   Loss 7.2726   LearningRate 0.2044   Epoch: 10   Global Step: 55270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 03:02:30,061-Speed 10540.97 samples/sec   Loss 7.3225   LearningRate 0.2043   Epoch: 10   Global Step: 55280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 03:02:37,871-Speed 10490.33 samples/sec   Loss 7.2342   LearningRate 0.2042   Epoch: 10   Global Step: 55290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 03:02:45,691-Speed 10477.59 samples/sec   Loss 7.3368   LearningRate 0.2041   Epoch: 10   Global Step: 55300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 03:02:53,493-Speed 10503.07 samples/sec   Loss 7.3113   LearningRate 0.2040   Epoch: 10   Global Step: 55310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 03:03:01,286-Speed 10513.86 samples/sec   Loss 7.2735   LearningRate 0.2040   Epoch: 10   Global Step: 55320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 03:03:09,104-Speed 10479.63 samples/sec   Loss 7.2738   LearningRate 0.2039   Epoch: 10   Global Step: 55330   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 03:03:16,956-Speed 10434.42 samples/sec   Loss 7.2810   LearningRate 0.2038   Epoch: 10   Global Step: 55340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 03:03:24,753-Speed 10507.92 samples/sec   Loss 7.3085   LearningRate 0.2037   Epoch: 10   Global Step: 55350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:03:32,556-Speed 10504.32 samples/sec   Loss 7.2749   LearningRate 0.2036   Epoch: 10   Global Step: 55360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:03:40,340-Speed 10525.30 samples/sec   Loss 7.2396   LearningRate 0.2035   Epoch: 10   Global Step: 55370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:03:48,116-Speed 10536.24 samples/sec   Loss 7.3050   LearningRate 0.2035   Epoch: 10   Global Step: 55380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:03:55,973-Speed 10427.61 samples/sec   Loss 7.3508   LearningRate 0.2034   Epoch: 10   Global Step: 55390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:04:03,784-Speed 10489.10 samples/sec   Loss 7.3013   LearningRate 0.2033   Epoch: 10   Global Step: 55400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:04:11,590-Speed 10495.38 samples/sec   Loss 7.3087   LearningRate 0.2032   Epoch: 10   Global Step: 55410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:04:19,397-Speed 10495.19 samples/sec   Loss 7.3113   LearningRate 0.2031   Epoch: 10   Global Step: 55420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:04:27,233-Speed 10456.03 samples/sec   Loss 7.3195   LearningRate 0.2030   Epoch: 10   Global Step: 55430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:04:35,035-Speed 10505.91 samples/sec   Loss 7.3246   LearningRate 0.2030   Epoch: 10   Global Step: 55440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:04:42,859-Speed 10470.92 samples/sec   Loss 7.3298   LearningRate 0.2029   Epoch: 10   Global Step: 55450   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 03:04:50,680-Speed 10477.11 samples/sec   Loss 7.3009   LearningRate 0.2028   Epoch: 10   Global Step: 55460   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 03:04:58,544-Speed 10418.40 samples/sec   Loss 7.2364   LearningRate 0.2027   Epoch: 10   Global Step: 55470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:05:06,366-Speed 10479.14 samples/sec   Loss 7.2859   LearningRate 0.2026   Epoch: 10   Global Step: 55480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:05:14,152-Speed 10523.23 samples/sec   Loss 7.2791   LearningRate 0.2025   Epoch: 10   Global Step: 55490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 03:05:21,967-Speed 10483.94 samples/sec   Loss 7.2579   LearningRate 0.2024   Epoch: 10   Global Step: 55500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 03:05:29,789-Speed 10474.99 samples/sec   Loss 7.2874   LearningRate 0.2024   Epoch: 10   Global Step: 55510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 03:05:37,585-Speed 10509.60 samples/sec   Loss 7.3012   LearningRate 0.2023   Epoch: 10   Global Step: 55520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 03:05:45,362-Speed 10534.71 samples/sec   Loss 7.2824   LearningRate 0.2022   Epoch: 10   Global Step: 55530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 03:05:53,180-Speed 10479.10 samples/sec   Loss 7.2630   LearningRate 0.2021   Epoch: 10   Global Step: 55540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 03:06:00,966-Speed 10524.12 samples/sec   Loss 7.2707   LearningRate 0.2020   Epoch: 10   Global Step: 55550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 03:06:08,763-Speed 10508.35 samples/sec   Loss 7.2579   LearningRate 0.2019   Epoch: 10   Global Step: 55560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 03:06:16,594-Speed 10462.07 samples/sec   Loss 7.2257   LearningRate 0.2019   Epoch: 10   Global Step: 55570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 03:06:24,379-Speed 10523.66 samples/sec   Loss 7.2237   LearningRate 0.2018   Epoch: 10   Global Step: 55580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-16 03:06:32,174-Speed 10511.41 samples/sec   Loss 7.2773   LearningRate 0.2017   Epoch: 10   Global Step: 55590   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:06:39,990-Speed 10482.37 samples/sec   Loss 7.2801   LearningRate 0.2016   Epoch: 10   Global Step: 55600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:06:47,775-Speed 10524.18 samples/sec   Loss 7.3248   LearningRate 0.2015   Epoch: 10   Global Step: 55610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:06:55,562-Speed 10520.82 samples/sec   Loss 7.3118   LearningRate 0.2014   Epoch: 10   Global Step: 55620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:07:03,383-Speed 10475.52 samples/sec   Loss 7.2601   LearningRate 0.2014   Epoch: 10   Global Step: 55630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:07:11,223-Speed 10451.26 samples/sec   Loss 7.2781   LearningRate 0.2013   Epoch: 10   Global Step: 55640   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:07:19,027-Speed 10497.67 samples/sec   Loss 7.2722   LearningRate 0.2012   Epoch: 10   Global Step: 55650   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:07:26,856-Speed 10466.28 samples/sec   Loss 7.2778   LearningRate 0.2011   Epoch: 10   Global Step: 55660   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:07:34,664-Speed 10492.83 samples/sec   Loss 7.2151   LearningRate 0.2010   Epoch: 10   Global Step: 55670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:07:42,466-Speed 10501.24 samples/sec   Loss 7.2700   LearningRate 0.2009   Epoch: 10   Global Step: 55680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:07:50,263-Speed 10508.30 samples/sec   Loss 7.2859   LearningRate 0.2009   Epoch: 10   Global Step: 55690   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 03:07:58,070-Speed 10494.51 samples/sec   Loss 7.1776   LearningRate 0.2008   Epoch: 10   Global Step: 55700   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:08:05,862-Speed 10514.68 samples/sec   Loss 7.2395   LearningRate 0.2007   Epoch: 10   Global Step: 55710   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:08:13,641-Speed 10532.89 samples/sec   Loss 7.2575   LearningRate 0.2006   Epoch: 10   Global Step: 55720   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:08:21,437-Speed 10508.74 samples/sec   Loss 7.2187   LearningRate 0.2005   Epoch: 10   Global Step: 55730   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:08:29,278-Speed 10448.21 samples/sec   Loss 7.2231   LearningRate 0.2004   Epoch: 10   Global Step: 55740   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:08:37,097-Speed 10482.89 samples/sec   Loss 7.2733   LearningRate 0.2004   Epoch: 10   Global Step: 55750   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:08:44,907-Speed 10492.32 samples/sec   Loss 7.2789   LearningRate 0.2003   Epoch: 10   Global Step: 55760   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:08:52,720-Speed 10484.87 samples/sec   Loss 7.2354   LearningRate 0.2002   Epoch: 10   Global Step: 55770   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:09:00,514-Speed 10512.37 samples/sec   Loss 7.2834   LearningRate 0.2001   Epoch: 10   Global Step: 55780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:09:08,316-Speed 10501.91 samples/sec   Loss 7.2686   LearningRate 0.2000   Epoch: 10   Global Step: 55790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:09:16,097-Speed 10529.75 samples/sec   Loss 7.1975   LearningRate 0.1999   Epoch: 10   Global Step: 55800   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 03:09:23,879-Speed 10528.62 samples/sec   Loss 7.2345   LearningRate 0.1999   Epoch: 10   Global Step: 55810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:09:31,704-Speed 10469.38 samples/sec   Loss 7.2399   LearningRate 0.1998   Epoch: 10   Global Step: 55820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:09:39,478-Speed 10540.39 samples/sec   Loss 7.2633   LearningRate 0.1997   Epoch: 10   Global Step: 55830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:09:47,273-Speed 10509.91 samples/sec   Loss 7.2597   LearningRate 0.1996   Epoch: 10   Global Step: 55840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:09:55,088-Speed 10484.28 samples/sec   Loss 7.2196   LearningRate 0.1995   Epoch: 10   Global Step: 55850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:10:02,901-Speed 10486.30 samples/sec   Loss 7.2370   LearningRate 0.1994   Epoch: 10   Global Step: 55860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:10:10,707-Speed 10497.35 samples/sec   Loss 7.2525   LearningRate 0.1994   Epoch: 10   Global Step: 55870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:10:18,482-Speed 10536.35 samples/sec   Loss 7.2155   LearningRate 0.1993   Epoch: 10   Global Step: 55880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:10:26,276-Speed 10513.49 samples/sec   Loss 7.2719   LearningRate 0.1992   Epoch: 10   Global Step: 55890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:10:34,048-Speed 10541.66 samples/sec   Loss 7.2161   LearningRate 0.1991   Epoch: 10   Global Step: 55900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:10:41,867-Speed 10478.23 samples/sec   Loss 7.2325   LearningRate 0.1990   Epoch: 10   Global Step: 55910   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-01-16 03:10:49,686-Speed 10478.76 samples/sec   Loss 7.2121   LearningRate 0.1989   Epoch: 10   Global Step: 55920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-16 03:10:57,483-Speed 10509.36 samples/sec   Loss 7.1976   LearningRate 0.1989   Epoch: 10   Global Step: 55930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:11:05,282-Speed 10505.60 samples/sec   Loss 7.2435   LearningRate 0.1988   Epoch: 10   Global Step: 55940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:11:13,070-Speed 10520.45 samples/sec   Loss 7.2102   LearningRate 0.1987   Epoch: 10   Global Step: 55950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:11:20,872-Speed 10501.74 samples/sec   Loss 7.2460   LearningRate 0.1986   Epoch: 10   Global Step: 55960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:11:28,694-Speed 10473.38 samples/sec   Loss 7.2125   LearningRate 0.1985   Epoch: 10   Global Step: 55970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:11:36,481-Speed 10522.57 samples/sec   Loss 7.2088   LearningRate 0.1984   Epoch: 10   Global Step: 55980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:11:44,255-Speed 10538.71 samples/sec   Loss 7.1549   LearningRate 0.1984   Epoch: 10   Global Step: 55990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:11:52,049-Speed 10511.37 samples/sec   Loss 7.1991   LearningRate 0.1983   Epoch: 10   Global Step: 56000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:11:59,842-Speed 10514.19 samples/sec   Loss 7.1972   LearningRate 0.1982   Epoch: 10   Global Step: 56010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:12:07,634-Speed 10515.71 samples/sec   Loss 7.1746   LearningRate 0.1981   Epoch: 10   Global Step: 56020   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:12:15,411-Speed 10535.28 samples/sec   Loss 7.2329   LearningRate 0.1980   Epoch: 10   Global Step: 56030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:12:23,198-Speed 10521.69 samples/sec   Loss 7.1645   LearningRate 0.1979   Epoch: 10   Global Step: 56040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:12:31,000-Speed 10502.33 samples/sec   Loss 7.1805   LearningRate 0.1979   Epoch: 10   Global Step: 56050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:12:38,787-Speed 10520.52 samples/sec   Loss 7.1747   LearningRate 0.1978   Epoch: 10   Global Step: 56060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:12:46,571-Speed 10525.35 samples/sec   Loss 7.1843   LearningRate 0.1977   Epoch: 10   Global Step: 56070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:12:54,378-Speed 10496.54 samples/sec   Loss 7.2058   LearningRate 0.1976   Epoch: 10   Global Step: 56080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:13:02,162-Speed 10524.67 samples/sec   Loss 7.2215   LearningRate 0.1975   Epoch: 10   Global Step: 56090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:13:09,954-Speed 10514.47 samples/sec   Loss 7.1903   LearningRate 0.1974   Epoch: 10   Global Step: 56100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:13:17,749-Speed 10511.51 samples/sec   Loss 7.1766   LearningRate 0.1974   Epoch: 10   Global Step: 56110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:13:25,544-Speed 10511.23 samples/sec   Loss 7.2457   LearningRate 0.1973   Epoch: 10   Global Step: 56120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:13:33,328-Speed 10525.16 samples/sec   Loss 7.1817   LearningRate 0.1972   Epoch: 10   Global Step: 56130   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:13:41,125-Speed 10507.61 samples/sec   Loss 7.1776   LearningRate 0.1971   Epoch: 10   Global Step: 56140   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:13:48,928-Speed 10500.62 samples/sec   Loss 7.1962   LearningRate 0.1970   Epoch: 10   Global Step: 56150   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:13:56,760-Speed 10461.28 samples/sec   Loss 7.1609   LearningRate 0.1969   Epoch: 10   Global Step: 56160   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:14:04,556-Speed 10508.45 samples/sec   Loss 7.2026   LearningRate 0.1969   Epoch: 10   Global Step: 56170   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:14:12,325-Speed 10547.36 samples/sec   Loss 7.2000   LearningRate 0.1968   Epoch: 10   Global Step: 56180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:14:20,136-Speed 10489.21 samples/sec   Loss 7.2141   LearningRate 0.1967   Epoch: 10   Global Step: 56190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:14:27,917-Speed 10528.64 samples/sec   Loss 7.1860   LearningRate 0.1966   Epoch: 10   Global Step: 56200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:14:35,698-Speed 10530.17 samples/sec   Loss 7.1817   LearningRate 0.1965   Epoch: 10   Global Step: 56210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:14:43,494-Speed 10509.61 samples/sec   Loss 7.1529   LearningRate 0.1964   Epoch: 10   Global Step: 56220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:14:51,286-Speed 10514.20 samples/sec   Loss 7.1493   LearningRate 0.1964   Epoch: 10   Global Step: 56230   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:14:59,084-Speed 10506.33 samples/sec   Loss 7.1750   LearningRate 0.1963   Epoch: 10   Global Step: 56240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:15:06,932-Speed 10439.60 samples/sec   Loss 7.2322   LearningRate 0.1962   Epoch: 10   Global Step: 56250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:15:14,744-Speed 10488.54 samples/sec   Loss 7.1761   LearningRate 0.1961   Epoch: 10   Global Step: 56260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:15:22,537-Speed 10513.10 samples/sec   Loss 7.1506   LearningRate 0.1960   Epoch: 10   Global Step: 56270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:15:30,328-Speed 10515.61 samples/sec   Loss 7.2068   LearningRate 0.1959   Epoch: 10   Global Step: 56280   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:15:38,123-Speed 10510.20 samples/sec   Loss 7.1597   LearningRate 0.1959   Epoch: 10   Global Step: 56290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:15:45,888-Speed 10551.55 samples/sec   Loss 7.2051   LearningRate 0.1958   Epoch: 10   Global Step: 56300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:15:53,687-Speed 10505.86 samples/sec   Loss 7.2168   LearningRate 0.1957   Epoch: 10   Global Step: 56310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:16:01,541-Speed 10432.56 samples/sec   Loss 7.2193   LearningRate 0.1956   Epoch: 10   Global Step: 56320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:16:09,340-Speed 10504.83 samples/sec   Loss 7.1335   LearningRate 0.1955   Epoch: 10   Global Step: 56330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:16:17,177-Speed 10453.88 samples/sec   Loss 7.1938   LearningRate 0.1955   Epoch: 10   Global Step: 56340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:16:24,961-Speed 10526.08 samples/sec   Loss 7.1784   LearningRate 0.1954   Epoch: 10   Global Step: 56350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:16:32,763-Speed 10500.48 samples/sec   Loss 7.1610   LearningRate 0.1953   Epoch: 10   Global Step: 56360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:16:40,628-Speed 10418.44 samples/sec   Loss 7.1242   LearningRate 0.1952   Epoch: 10   Global Step: 56370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:16:48,405-Speed 10534.14 samples/sec   Loss 7.1035   LearningRate 0.1951   Epoch: 10   Global Step: 56380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:16:56,187-Speed 10528.42 samples/sec   Loss 7.1538   LearningRate 0.1950   Epoch: 10   Global Step: 56390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:17:04,000-Speed 10486.21 samples/sec   Loss 7.1485   LearningRate 0.1950   Epoch: 10   Global Step: 56400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:17:11,803-Speed 10500.10 samples/sec   Loss 7.1579   LearningRate 0.1949   Epoch: 10   Global Step: 56410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:17:19,608-Speed 10496.94 samples/sec   Loss 7.1649   LearningRate 0.1948   Epoch: 10   Global Step: 56420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:17:27,395-Speed 10521.59 samples/sec   Loss 7.1025   LearningRate 0.1947   Epoch: 10   Global Step: 56430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:17:35,188-Speed 10513.98 samples/sec   Loss 7.1643   LearningRate 0.1946   Epoch: 10   Global Step: 56440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:17:42,994-Speed 10495.62 samples/sec   Loss 7.1478   LearningRate 0.1945   Epoch: 10   Global Step: 56450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:17:50,780-Speed 10524.11 samples/sec   Loss 7.1453   LearningRate 0.1945   Epoch: 10   Global Step: 56460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:17:58,554-Speed 10545.37 samples/sec   Loss 7.1646   LearningRate 0.1944   Epoch: 10   Global Step: 56470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:18:06,331-Speed 10538.19 samples/sec   Loss 7.1431   LearningRate 0.1943   Epoch: 10   Global Step: 56480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:18:14,120-Speed 10519.94 samples/sec   Loss 7.0613   LearningRate 0.1942   Epoch: 10   Global Step: 56490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:18:21,903-Speed 10525.89 samples/sec   Loss 7.1523   LearningRate 0.1941   Epoch: 10   Global Step: 56500   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:18:29,677-Speed 10538.97 samples/sec   Loss 7.1154   LearningRate 0.1940   Epoch: 10   Global Step: 56510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:18:37,491-Speed 10485.64 samples/sec   Loss 7.1103   LearningRate 0.1940   Epoch: 10   Global Step: 56520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:18:45,272-Speed 10531.43 samples/sec   Loss 7.1264   LearningRate 0.1939   Epoch: 10   Global Step: 56530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:18:53,071-Speed 10504.76 samples/sec   Loss 7.1461   LearningRate 0.1938   Epoch: 10   Global Step: 56540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:19:00,860-Speed 10518.95 samples/sec   Loss 7.1756   LearningRate 0.1937   Epoch: 10   Global Step: 56550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:19:08,687-Speed 10466.98 samples/sec   Loss 7.1157   LearningRate 0.1936   Epoch: 10   Global Step: 56560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:19:16,502-Speed 10484.95 samples/sec   Loss 7.1217   LearningRate 0.1936   Epoch: 10   Global Step: 56570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:19:24,277-Speed 10536.70 samples/sec   Loss 7.1060   LearningRate 0.1935   Epoch: 10   Global Step: 56580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:19:32,059-Speed 10529.04 samples/sec   Loss 7.1486   LearningRate 0.1934   Epoch: 10   Global Step: 56590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:19:39,836-Speed 10534.62 samples/sec   Loss 7.1351   LearningRate 0.1933   Epoch: 10   Global Step: 56600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:19:47,636-Speed 10504.99 samples/sec   Loss 7.1442   LearningRate 0.1932   Epoch: 10   Global Step: 56610   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:19:55,422-Speed 10521.78 samples/sec   Loss 7.1234   LearningRate 0.1931   Epoch: 10   Global Step: 56620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:20:03,225-Speed 10499.75 samples/sec   Loss 7.1253   LearningRate 0.1931   Epoch: 10   Global Step: 56630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:20:11,005-Speed 10531.98 samples/sec   Loss 7.0897   LearningRate 0.1930   Epoch: 10   Global Step: 56640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:20:18,792-Speed 10521.97 samples/sec   Loss 7.1201   LearningRate 0.1929   Epoch: 10   Global Step: 56650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:20:26,561-Speed 10545.77 samples/sec   Loss 7.0824   LearningRate 0.1928   Epoch: 10   Global Step: 56660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:20:34,362-Speed 10501.28 samples/sec   Loss 7.1057   LearningRate 0.1927   Epoch: 10   Global Step: 56670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:20:42,196-Speed 10459.57 samples/sec   Loss 7.1237   LearningRate 0.1927   Epoch: 10   Global Step: 56680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:20:50,001-Speed 10497.93 samples/sec   Loss 7.1327   LearningRate 0.1926   Epoch: 10   Global Step: 56690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:20:57,798-Speed 10506.97 samples/sec   Loss 7.1015   LearningRate 0.1925   Epoch: 10   Global Step: 56700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:21:05,620-Speed 10473.31 samples/sec   Loss 7.0983   LearningRate 0.1924   Epoch: 10   Global Step: 56710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:21:13,415-Speed 10512.13 samples/sec   Loss 7.1236   LearningRate 0.1923   Epoch: 10   Global Step: 56720   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:21:21,229-Speed 10484.86 samples/sec   Loss 7.0799   LearningRate 0.1922   Epoch: 10   Global Step: 56730   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:21:29,025-Speed 10508.53 samples/sec   Loss 7.0848   LearningRate 0.1922   Epoch: 10   Global Step: 56740   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:21:36,827-Speed 10501.52 samples/sec   Loss 7.1370   LearningRate 0.1921   Epoch: 10   Global Step: 56750   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:21:44,634-Speed 10494.71 samples/sec   Loss 7.1139   LearningRate 0.1920   Epoch: 10   Global Step: 56760   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:21:52,432-Speed 10507.27 samples/sec   Loss 7.1016   LearningRate 0.1919   Epoch: 10   Global Step: 56770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:22:00,237-Speed 10496.83 samples/sec   Loss 7.0441   LearningRate 0.1918   Epoch: 10   Global Step: 56780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:22:08,040-Speed 10504.00 samples/sec   Loss 7.0576   LearningRate 0.1918   Epoch: 10   Global Step: 56790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:22:15,837-Speed 10508.48 samples/sec   Loss 7.0854   LearningRate 0.1917   Epoch: 10   Global Step: 56800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:22:23,638-Speed 10503.08 samples/sec   Loss 7.1303   LearningRate 0.1916   Epoch: 10   Global Step: 56810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:22:31,448-Speed 10490.50 samples/sec   Loss 7.0975   LearningRate 0.1915   Epoch: 10   Global Step: 56820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:22:39,271-Speed 10472.86 samples/sec   Loss 7.0294   LearningRate 0.1914   Epoch: 10   Global Step: 56830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:22:47,095-Speed 10471.76 samples/sec   Loss 7.1357   LearningRate 0.1913   Epoch: 10   Global Step: 56840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:22:54,873-Speed 10533.63 samples/sec   Loss 7.0721   LearningRate 0.1913   Epoch: 10   Global Step: 56850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:23:02,689-Speed 10482.54 samples/sec   Loss 7.0766   LearningRate 0.1912   Epoch: 10   Global Step: 56860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:23:10,488-Speed 10504.72 samples/sec   Loss 7.0867   LearningRate 0.1911   Epoch: 10   Global Step: 56870   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:23:18,288-Speed 10504.90 samples/sec   Loss 7.0730   LearningRate 0.1910   Epoch: 10   Global Step: 56880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:23:26,106-Speed 10479.57 samples/sec   Loss 7.0743   LearningRate 0.1909   Epoch: 10   Global Step: 56890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:23:33,912-Speed 10495.46 samples/sec   Loss 7.1506   LearningRate 0.1909   Epoch: 10   Global Step: 56900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:23:41,724-Speed 10488.18 samples/sec   Loss 7.1126   LearningRate 0.1908   Epoch: 10   Global Step: 56910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:23:49,502-Speed 10533.40 samples/sec   Loss 7.1517   LearningRate 0.1907   Epoch: 10   Global Step: 56920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:23:57,297-Speed 10511.91 samples/sec   Loss 7.1024   LearningRate 0.1906   Epoch: 10   Global Step: 56930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:24:05,098-Speed 10503.59 samples/sec   Loss 7.1139   LearningRate 0.1905   Epoch: 10   Global Step: 56940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:24:12,882-Speed 10524.61 samples/sec   Loss 7.1162   LearningRate 0.1904   Epoch: 10   Global Step: 56950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:24:20,707-Speed 10470.59 samples/sec   Loss 7.1361   LearningRate 0.1904   Epoch: 10   Global Step: 56960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:24:28,526-Speed 10478.22 samples/sec   Loss 7.0317   LearningRate 0.1903   Epoch: 10   Global Step: 56970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:24:36,340-Speed 10485.07 samples/sec   Loss 7.0921   LearningRate 0.1902   Epoch: 10   Global Step: 56980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:24:44,175-Speed 10457.36 samples/sec   Loss 7.0680   LearningRate 0.1901   Epoch: 10   Global Step: 56990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:24:51,953-Speed 10534.60 samples/sec   Loss 7.0443   LearningRate 0.1900   Epoch: 10   Global Step: 57000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:24:59,756-Speed 10500.00 samples/sec   Loss 7.0600   LearningRate 0.1900   Epoch: 10   Global Step: 57010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:25:07,552-Speed 10509.95 samples/sec   Loss 7.0826   LearningRate 0.1899   Epoch: 10   Global Step: 57020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:25:15,356-Speed 10497.73 samples/sec   Loss 7.0776   LearningRate 0.1898   Epoch: 10   Global Step: 57030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:25:37,496-Speed 3700.32 samples/sec   Loss 7.0811   LearningRate 0.1897   Epoch: 11   Global Step: 57040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:25:45,257-Speed 10560.68 samples/sec   Loss 7.0777   LearningRate 0.1896   Epoch: 11   Global Step: 57050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:25:53,032-Speed 10538.59 samples/sec   Loss 7.0473   LearningRate 0.1896   Epoch: 11   Global Step: 57060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:26:00,807-Speed 10537.77 samples/sec   Loss 7.0316   LearningRate 0.1895   Epoch: 11   Global Step: 57070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:26:08,583-Speed 10539.17 samples/sec   Loss 6.9979   LearningRate 0.1894   Epoch: 11   Global Step: 57080   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:26:16,389-Speed 10495.44 samples/sec   Loss 7.0492   LearningRate 0.1893   Epoch: 11   Global Step: 57090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:26:24,179-Speed 10517.32 samples/sec   Loss 7.0739   LearningRate 0.1892   Epoch: 11   Global Step: 57100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:26:31,971-Speed 10515.01 samples/sec   Loss 7.0777   LearningRate 0.1891   Epoch: 11   Global Step: 57110   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:26:39,787-Speed 10483.36 samples/sec   Loss 7.0574   LearningRate 0.1891   Epoch: 11   Global Step: 57120   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:26:47,589-Speed 10500.51 samples/sec   Loss 7.0512   LearningRate 0.1890   Epoch: 11   Global Step: 57130   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:26:55,427-Speed 10453.03 samples/sec   Loss 7.0226   LearningRate 0.1889   Epoch: 11   Global Step: 57140   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:27:03,211-Speed 10533.33 samples/sec   Loss 7.0130   LearningRate 0.1888   Epoch: 11   Global Step: 57150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:27:11,018-Speed 10494.83 samples/sec   Loss 7.0221   LearningRate 0.1887   Epoch: 11   Global Step: 57160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:27:18,881-Speed 10419.05 samples/sec   Loss 7.0460   LearningRate 0.1887   Epoch: 11   Global Step: 57170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:27:26,670-Speed 10518.61 samples/sec   Loss 7.0511   LearningRate 0.1886   Epoch: 11   Global Step: 57180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:27:34,437-Speed 10548.73 samples/sec   Loss 7.0217   LearningRate 0.1885   Epoch: 11   Global Step: 57190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:27:42,225-Speed 10520.07 samples/sec   Loss 7.0404   LearningRate 0.1884   Epoch: 11   Global Step: 57200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:27:50,007-Speed 10529.15 samples/sec   Loss 7.0291   LearningRate 0.1883   Epoch: 11   Global Step: 57210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:27:57,826-Speed 10478.17 samples/sec   Loss 6.9708   LearningRate 0.1883   Epoch: 11   Global Step: 57220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:28:05,635-Speed 10490.79 samples/sec   Loss 7.0251   LearningRate 0.1882   Epoch: 11   Global Step: 57230   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:28:13,407-Speed 10543.17 samples/sec   Loss 7.0046   LearningRate 0.1881   Epoch: 11   Global Step: 57240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:28:21,270-Speed 10419.92 samples/sec   Loss 7.0267   LearningRate 0.1880   Epoch: 11   Global Step: 57250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:28:29,059-Speed 10518.14 samples/sec   Loss 7.0519   LearningRate 0.1879   Epoch: 11   Global Step: 57260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:28:36,827-Speed 10546.26 samples/sec   Loss 7.0070   LearningRate 0.1878   Epoch: 11   Global Step: 57270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:28:44,606-Speed 10532.66 samples/sec   Loss 7.0787   LearningRate 0.1878   Epoch: 11   Global Step: 57280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:28:52,414-Speed 10494.01 samples/sec   Loss 7.0759   LearningRate 0.1877   Epoch: 11   Global Step: 57290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:29:00,194-Speed 10529.80 samples/sec   Loss 6.9930   LearningRate 0.1876   Epoch: 11   Global Step: 57300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:29:07,979-Speed 10523.61 samples/sec   Loss 7.0270   LearningRate 0.1875   Epoch: 11   Global Step: 57310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:29:15,770-Speed 10516.33 samples/sec   Loss 7.0434   LearningRate 0.1874   Epoch: 11   Global Step: 57320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:29:23,564-Speed 10512.52 samples/sec   Loss 7.0302   LearningRate 0.1874   Epoch: 11   Global Step: 57330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:29:31,362-Speed 10506.25 samples/sec   Loss 7.0227   LearningRate 0.1873   Epoch: 11   Global Step: 57340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:29:39,190-Speed 10468.18 samples/sec   Loss 6.9638   LearningRate 0.1872   Epoch: 11   Global Step: 57350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:29:46,978-Speed 10519.48 samples/sec   Loss 7.0425   LearningRate 0.1871   Epoch: 11   Global Step: 57360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:29:54,784-Speed 10495.86 samples/sec   Loss 7.0419   LearningRate 0.1870   Epoch: 11   Global Step: 57370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:30:02,587-Speed 10500.80 samples/sec   Loss 7.0705   LearningRate 0.1870   Epoch: 11   Global Step: 57380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:30:10,377-Speed 10517.97 samples/sec   Loss 6.9759   LearningRate 0.1869   Epoch: 11   Global Step: 57390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:30:18,178-Speed 10502.35 samples/sec   Loss 7.0173   LearningRate 0.1868   Epoch: 11   Global Step: 57400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:30:25,996-Speed 10479.69 samples/sec   Loss 6.9784   LearningRate 0.1867   Epoch: 11   Global Step: 57410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:30:33,867-Speed 10409.87 samples/sec   Loss 7.0010   LearningRate 0.1866   Epoch: 11   Global Step: 57420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:30:41,696-Speed 10464.42 samples/sec   Loss 6.9496   LearningRate 0.1866   Epoch: 11   Global Step: 57430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:30:49,523-Speed 10468.04 samples/sec   Loss 7.0307   LearningRate 0.1865   Epoch: 11   Global Step: 57440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:30:57,393-Speed 10411.10 samples/sec   Loss 7.0561   LearningRate 0.1864   Epoch: 11   Global Step: 57450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:31:05,248-Speed 10430.70 samples/sec   Loss 7.0342   LearningRate 0.1863   Epoch: 11   Global Step: 57460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:31:13,081-Speed 10458.28 samples/sec   Loss 7.0132   LearningRate 0.1862   Epoch: 11   Global Step: 57470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:31:20,914-Speed 10459.49 samples/sec   Loss 6.9904   LearningRate 0.1862   Epoch: 11   Global Step: 57480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:31:28,743-Speed 10465.67 samples/sec   Loss 7.0180   LearningRate 0.1861   Epoch: 11   Global Step: 57490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:31:36,589-Speed 10442.28 samples/sec   Loss 6.9837   LearningRate 0.1860   Epoch: 11   Global Step: 57500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:31:44,427-Speed 10453.48 samples/sec   Loss 7.0093   LearningRate 0.1859   Epoch: 11   Global Step: 57510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:31:52,273-Speed 10442.28 samples/sec   Loss 7.0101   LearningRate 0.1858   Epoch: 11   Global Step: 57520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:32:00,104-Speed 10463.60 samples/sec   Loss 7.0772   LearningRate 0.1857   Epoch: 11   Global Step: 57530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:32:07,921-Speed 10480.05 samples/sec   Loss 7.0033   LearningRate 0.1857   Epoch: 11   Global Step: 57540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:32:15,809-Speed 10386.85 samples/sec   Loss 7.0158   LearningRate 0.1856   Epoch: 11   Global Step: 57550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:32:23,632-Speed 10473.98 samples/sec   Loss 6.9949   LearningRate 0.1855   Epoch: 11   Global Step: 57560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:32:31,467-Speed 10456.94 samples/sec   Loss 7.0147   LearningRate 0.1854   Epoch: 11   Global Step: 57570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:32:39,279-Speed 10486.99 samples/sec   Loss 6.9665   LearningRate 0.1853   Epoch: 11   Global Step: 57580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:32:47,122-Speed 10446.16 samples/sec   Loss 7.0086   LearningRate 0.1853   Epoch: 11   Global Step: 57590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:32:54,965-Speed 10447.28 samples/sec   Loss 7.0149   LearningRate 0.1852   Epoch: 11   Global Step: 57600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:33:02,790-Speed 10470.71 samples/sec   Loss 6.9893   LearningRate 0.1851   Epoch: 11   Global Step: 57610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:33:10,610-Speed 10476.12 samples/sec   Loss 6.9996   LearningRate 0.1850   Epoch: 11   Global Step: 57620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:33:18,457-Speed 10440.95 samples/sec   Loss 6.9534   LearningRate 0.1849   Epoch: 11   Global Step: 57630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:33:26,280-Speed 10474.04 samples/sec   Loss 7.0139   LearningRate 0.1849   Epoch: 11   Global Step: 57640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:33:34,104-Speed 10472.14 samples/sec   Loss 6.9953   LearningRate 0.1848   Epoch: 11   Global Step: 57650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:33:41,973-Speed 10412.62 samples/sec   Loss 6.9284   LearningRate 0.1847   Epoch: 11   Global Step: 57660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:33:49,797-Speed 10472.02 samples/sec   Loss 6.9726   LearningRate 0.1846   Epoch: 11   Global Step: 57670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:33:57,616-Speed 10477.92 samples/sec   Loss 6.9704   LearningRate 0.1845   Epoch: 11   Global Step: 57680   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:34:05,393-Speed 10534.73 samples/sec   Loss 7.0081   LearningRate 0.1845   Epoch: 11   Global Step: 57690   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:34:13,181-Speed 10520.13 samples/sec   Loss 6.9356   LearningRate 0.1844   Epoch: 11   Global Step: 57700   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:34:20,993-Speed 10488.95 samples/sec   Loss 6.9830   LearningRate 0.1843   Epoch: 11   Global Step: 57710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:34:28,813-Speed 10476.91 samples/sec   Loss 6.9605   LearningRate 0.1842   Epoch: 11   Global Step: 57720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:34:36,640-Speed 10467.82 samples/sec   Loss 6.9846   LearningRate 0.1841   Epoch: 11   Global Step: 57730   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:34:44,462-Speed 10479.65 samples/sec   Loss 7.0166   LearningRate 0.1841   Epoch: 11   Global Step: 57740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:34:52,271-Speed 10492.20 samples/sec   Loss 6.9421   LearningRate 0.1840   Epoch: 11   Global Step: 57750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:35:00,080-Speed 10491.92 samples/sec   Loss 6.9058   LearningRate 0.1839   Epoch: 11   Global Step: 57760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:35:07,916-Speed 10456.21 samples/sec   Loss 6.9238   LearningRate 0.1838   Epoch: 11   Global Step: 57770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:35:15,766-Speed 10436.35 samples/sec   Loss 6.9383   LearningRate 0.1837   Epoch: 11   Global Step: 57780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:35:23,588-Speed 10474.54 samples/sec   Loss 6.9727   LearningRate 0.1837   Epoch: 11   Global Step: 57790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:35:31,400-Speed 10488.53 samples/sec   Loss 6.9523   LearningRate 0.1836   Epoch: 11   Global Step: 57800   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:35:39,251-Speed 10435.29 samples/sec   Loss 6.9550   LearningRate 0.1835   Epoch: 11   Global Step: 57810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:35:47,086-Speed 10457.71 samples/sec   Loss 6.9634   LearningRate 0.1834   Epoch: 11   Global Step: 57820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:35:54,918-Speed 10460.86 samples/sec   Loss 6.9730   LearningRate 0.1833   Epoch: 11   Global Step: 57830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:36:02,769-Speed 10435.21 samples/sec   Loss 6.9545   LearningRate 0.1833   Epoch: 11   Global Step: 57840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:36:10,612-Speed 10447.25 samples/sec   Loss 6.9407   LearningRate 0.1832   Epoch: 11   Global Step: 57850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:36:18,534-Speed 10342.13 samples/sec   Loss 6.9718   LearningRate 0.1831   Epoch: 11   Global Step: 57860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:36:26,340-Speed 10495.08 samples/sec   Loss 6.9581   LearningRate 0.1830   Epoch: 11   Global Step: 57870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:36:34,137-Speed 10507.58 samples/sec   Loss 6.9938   LearningRate 0.1829   Epoch: 11   Global Step: 57880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:36:41,944-Speed 10495.25 samples/sec   Loss 6.9619   LearningRate 0.1829   Epoch: 11   Global Step: 57890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:36:49,756-Speed 10488.24 samples/sec   Loss 6.9406   LearningRate 0.1828   Epoch: 11   Global Step: 57900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:36:57,560-Speed 10499.01 samples/sec   Loss 6.9721   LearningRate 0.1827   Epoch: 11   Global Step: 57910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:37:05,361-Speed 10501.53 samples/sec   Loss 6.9145   LearningRate 0.1826   Epoch: 11   Global Step: 57920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:37:13,201-Speed 10451.47 samples/sec   Loss 6.9625   LearningRate 0.1825   Epoch: 11   Global Step: 57930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:37:21,007-Speed 10495.48 samples/sec   Loss 6.9354   LearningRate 0.1825   Epoch: 11   Global Step: 57940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:37:28,816-Speed 10492.10 samples/sec   Loss 6.9647   LearningRate 0.1824   Epoch: 11   Global Step: 57950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:37:36,594-Speed 10533.46 samples/sec   Loss 6.9082   LearningRate 0.1823   Epoch: 11   Global Step: 57960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:37:44,383-Speed 10518.61 samples/sec   Loss 6.9175   LearningRate 0.1822   Epoch: 11   Global Step: 57970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:37:52,196-Speed 10487.52 samples/sec   Loss 6.9211   LearningRate 0.1821   Epoch: 11   Global Step: 57980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:38:00,002-Speed 10494.45 samples/sec   Loss 6.9619   LearningRate 0.1821   Epoch: 11   Global Step: 57990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:38:07,777-Speed 10537.19 samples/sec   Loss 6.9750   LearningRate 0.1820   Epoch: 11   Global Step: 58000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:38:15,564-Speed 10521.67 samples/sec   Loss 6.9001   LearningRate 0.1819   Epoch: 11   Global Step: 58010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:38:23,355-Speed 10517.69 samples/sec   Loss 6.9312   LearningRate 0.1818   Epoch: 11   Global Step: 58020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:38:31,158-Speed 10498.81 samples/sec   Loss 6.9348   LearningRate 0.1817   Epoch: 11   Global Step: 58030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:38:38,971-Speed 10493.36 samples/sec   Loss 6.9242   LearningRate 0.1817   Epoch: 11   Global Step: 58040   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:38:46,761-Speed 10518.26 samples/sec   Loss 6.8676   LearningRate 0.1816   Epoch: 11   Global Step: 58050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:38:54,544-Speed 10527.42 samples/sec   Loss 6.9128   LearningRate 0.1815   Epoch: 11   Global Step: 58060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:39:02,342-Speed 10507.21 samples/sec   Loss 6.9361   LearningRate 0.1814   Epoch: 11   Global Step: 58070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:39:10,148-Speed 10494.71 samples/sec   Loss 6.9268   LearningRate 0.1813   Epoch: 11   Global Step: 58080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:39:17,942-Speed 10512.03 samples/sec   Loss 6.9340   LearningRate 0.1813   Epoch: 11   Global Step: 58090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:39:25,723-Speed 10530.44 samples/sec   Loss 6.9278   LearningRate 0.1812   Epoch: 11   Global Step: 58100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:39:33,560-Speed 10454.05 samples/sec   Loss 6.9264   LearningRate 0.1811   Epoch: 11   Global Step: 58110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:39:41,366-Speed 10495.95 samples/sec   Loss 6.8890   LearningRate 0.1810   Epoch: 11   Global Step: 58120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:39:49,167-Speed 10502.36 samples/sec   Loss 6.8726   LearningRate 0.1809   Epoch: 11   Global Step: 58130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:39:56,958-Speed 10517.45 samples/sec   Loss 6.9361   LearningRate 0.1809   Epoch: 11   Global Step: 58140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:40:04,750-Speed 10517.64 samples/sec   Loss 6.9660   LearningRate 0.1808   Epoch: 11   Global Step: 58150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:40:12,540-Speed 10516.25 samples/sec   Loss 6.9163   LearningRate 0.1807   Epoch: 11   Global Step: 58160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:40:20,319-Speed 10533.33 samples/sec   Loss 6.9664   LearningRate 0.1806   Epoch: 11   Global Step: 58170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:40:28,104-Speed 10524.36 samples/sec   Loss 6.9349   LearningRate 0.1806   Epoch: 11   Global Step: 58180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:40:35,886-Speed 10527.62 samples/sec   Loss 6.9017   LearningRate 0.1805   Epoch: 11   Global Step: 58190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:40:43,662-Speed 10537.22 samples/sec   Loss 6.8819   LearningRate 0.1804   Epoch: 11   Global Step: 58200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:40:51,489-Speed 10468.14 samples/sec   Loss 6.8984   LearningRate 0.1803   Epoch: 11   Global Step: 58210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:40:59,247-Speed 10559.76 samples/sec   Loss 6.8596   LearningRate 0.1802   Epoch: 11   Global Step: 58220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:41:07,027-Speed 10531.90 samples/sec   Loss 6.9574   LearningRate 0.1802   Epoch: 11   Global Step: 58230   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:41:14,825-Speed 10506.68 samples/sec   Loss 6.8583   LearningRate 0.1801   Epoch: 11   Global Step: 58240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:41:22,629-Speed 10498.35 samples/sec   Loss 6.9181   LearningRate 0.1800   Epoch: 11   Global Step: 58250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:41:30,432-Speed 10499.66 samples/sec   Loss 6.9470   LearningRate 0.1799   Epoch: 11   Global Step: 58260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:41:38,253-Speed 10476.39 samples/sec   Loss 6.9109   LearningRate 0.1798   Epoch: 11   Global Step: 58270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:41:46,041-Speed 10520.64 samples/sec   Loss 6.8975   LearningRate 0.1798   Epoch: 11   Global Step: 58280   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:41:53,858-Speed 10480.59 samples/sec   Loss 6.9182   LearningRate 0.1797   Epoch: 11   Global Step: 58290   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:42:01,667-Speed 10494.97 samples/sec   Loss 6.8503   LearningRate 0.1796   Epoch: 11   Global Step: 58300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:42:09,469-Speed 10501.63 samples/sec   Loss 6.8729   LearningRate 0.1795   Epoch: 11   Global Step: 58310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:42:17,316-Speed 10441.31 samples/sec   Loss 6.8780   LearningRate 0.1794   Epoch: 11   Global Step: 58320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:42:25,138-Speed 10474.29 samples/sec   Loss 6.8383   LearningRate 0.1794   Epoch: 11   Global Step: 58330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:42:32,926-Speed 10520.11 samples/sec   Loss 6.8593   LearningRate 0.1793   Epoch: 11   Global Step: 58340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:42:40,729-Speed 10500.89 samples/sec   Loss 6.9050   LearningRate 0.1792   Epoch: 11   Global Step: 58350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:42:48,512-Speed 10525.98 samples/sec   Loss 6.8870   LearningRate 0.1791   Epoch: 11   Global Step: 58360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:42:56,311-Speed 10504.93 samples/sec   Loss 6.9002   LearningRate 0.1790   Epoch: 11   Global Step: 58370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:43:04,114-Speed 10500.13 samples/sec   Loss 6.8751   LearningRate 0.1790   Epoch: 11   Global Step: 58380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:43:11,937-Speed 10474.34 samples/sec   Loss 6.8985   LearningRate 0.1789   Epoch: 11   Global Step: 58390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:43:19,735-Speed 10506.25 samples/sec   Loss 6.9353   LearningRate 0.1788   Epoch: 11   Global Step: 58400   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:43:27,538-Speed 10499.60 samples/sec   Loss 6.8688   LearningRate 0.1787   Epoch: 11   Global Step: 58410   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:43:35,331-Speed 10514.26 samples/sec   Loss 6.8857   LearningRate 0.1787   Epoch: 11   Global Step: 58420   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:43:43,133-Speed 10501.46 samples/sec   Loss 6.8665   LearningRate 0.1786   Epoch: 11   Global Step: 58430   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:43:50,931-Speed 10506.74 samples/sec   Loss 6.7982   LearningRate 0.1785   Epoch: 11   Global Step: 58440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:43:58,728-Speed 10507.15 samples/sec   Loss 6.8546   LearningRate 0.1784   Epoch: 11   Global Step: 58450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:44:06,526-Speed 10507.59 samples/sec   Loss 6.8460   LearningRate 0.1783   Epoch: 11   Global Step: 58460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:44:14,320-Speed 10512.62 samples/sec   Loss 6.8890   LearningRate 0.1783   Epoch: 11   Global Step: 58470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:44:22,111-Speed 10514.65 samples/sec   Loss 6.8431   LearningRate 0.1782   Epoch: 11   Global Step: 58480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:44:29,887-Speed 10537.52 samples/sec   Loss 6.8833   LearningRate 0.1781   Epoch: 11   Global Step: 58490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:44:37,705-Speed 10478.94 samples/sec   Loss 6.8288   LearningRate 0.1780   Epoch: 11   Global Step: 58500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:44:45,506-Speed 10503.68 samples/sec   Loss 6.8740   LearningRate 0.1779   Epoch: 11   Global Step: 58510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:44:53,312-Speed 10495.74 samples/sec   Loss 6.8780   LearningRate 0.1779   Epoch: 11   Global Step: 58520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:45:01,138-Speed 10468.77 samples/sec   Loss 6.8752   LearningRate 0.1778   Epoch: 11   Global Step: 58530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:45:08,924-Speed 10522.12 samples/sec   Loss 6.8833   LearningRate 0.1777   Epoch: 11   Global Step: 58540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:45:16,713-Speed 10519.65 samples/sec   Loss 6.8415   LearningRate 0.1776   Epoch: 11   Global Step: 58550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:45:24,496-Speed 10533.18 samples/sec   Loss 6.8313   LearningRate 0.1775   Epoch: 11   Global Step: 58560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:45:32,275-Speed 10531.96 samples/sec   Loss 6.8418   LearningRate 0.1775   Epoch: 11   Global Step: 58570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:45:40,086-Speed 10489.94 samples/sec   Loss 6.8349   LearningRate 0.1774   Epoch: 11   Global Step: 58580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:45:47,867-Speed 10529.51 samples/sec   Loss 6.8051   LearningRate 0.1773   Epoch: 11   Global Step: 58590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:45:55,646-Speed 10532.34 samples/sec   Loss 6.8600   LearningRate 0.1772   Epoch: 11   Global Step: 58600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:46:03,450-Speed 10498.91 samples/sec   Loss 6.8867   LearningRate 0.1772   Epoch: 11   Global Step: 58610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:46:11,232-Speed 10528.00 samples/sec   Loss 6.8867   LearningRate 0.1771   Epoch: 11   Global Step: 58620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:46:19,008-Speed 10536.83 samples/sec   Loss 6.8248   LearningRate 0.1770   Epoch: 11   Global Step: 58630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:46:26,789-Speed 10528.65 samples/sec   Loss 6.8261   LearningRate 0.1769   Epoch: 11   Global Step: 58640   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:46:34,602-Speed 10486.77 samples/sec   Loss 6.8455   LearningRate 0.1768   Epoch: 11   Global Step: 58650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:46:42,402-Speed 10504.58 samples/sec   Loss 6.8239   LearningRate 0.1768   Epoch: 11   Global Step: 58660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:46:50,205-Speed 10498.67 samples/sec   Loss 6.7835   LearningRate 0.1767   Epoch: 11   Global Step: 58670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:46:58,021-Speed 10483.06 samples/sec   Loss 6.8333   LearningRate 0.1766   Epoch: 11   Global Step: 58680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:47:05,843-Speed 10475.05 samples/sec   Loss 6.8671   LearningRate 0.1765   Epoch: 11   Global Step: 58690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:47:13,628-Speed 10523.91 samples/sec   Loss 6.8317   LearningRate 0.1764   Epoch: 11   Global Step: 58700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:47:21,435-Speed 10495.14 samples/sec   Loss 6.7897   LearningRate 0.1764   Epoch: 11   Global Step: 58710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:47:29,227-Speed 10513.77 samples/sec   Loss 6.8033   LearningRate 0.1763   Epoch: 11   Global Step: 58720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:47:37,014-Speed 10520.94 samples/sec   Loss 6.8162   LearningRate 0.1762   Epoch: 11   Global Step: 58730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:47:44,814-Speed 10505.49 samples/sec   Loss 6.8543   LearningRate 0.1761   Epoch: 11   Global Step: 58740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:47:52,604-Speed 10517.47 samples/sec   Loss 6.7986   LearningRate 0.1761   Epoch: 11   Global Step: 58750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:48:00,403-Speed 10505.12 samples/sec   Loss 6.8032   LearningRate 0.1760   Epoch: 11   Global Step: 58760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:48:08,190-Speed 10521.55 samples/sec   Loss 6.7919   LearningRate 0.1759   Epoch: 11   Global Step: 58770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:48:15,991-Speed 10502.79 samples/sec   Loss 6.8143   LearningRate 0.1758   Epoch: 11   Global Step: 58780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:48:23,791-Speed 10503.52 samples/sec   Loss 6.8254   LearningRate 0.1757   Epoch: 11   Global Step: 58790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:48:31,570-Speed 10535.60 samples/sec   Loss 6.8148   LearningRate 0.1757   Epoch: 11   Global Step: 58800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:48:39,380-Speed 10490.17 samples/sec   Loss 6.8027   LearningRate 0.1756   Epoch: 11   Global Step: 58810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:48:47,176-Speed 10510.44 samples/sec   Loss 6.8544   LearningRate 0.1755   Epoch: 11   Global Step: 58820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:48:54,972-Speed 10510.09 samples/sec   Loss 6.8276   LearningRate 0.1754   Epoch: 11   Global Step: 58830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:49:02,807-Speed 10455.95 samples/sec   Loss 6.8480   LearningRate 0.1754   Epoch: 11   Global Step: 58840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:49:10,621-Speed 10485.32 samples/sec   Loss 6.8315   LearningRate 0.1753   Epoch: 11   Global Step: 58850   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:49:18,481-Speed 10424.23 samples/sec   Loss 6.7977   LearningRate 0.1752   Epoch: 11   Global Step: 58860   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:49:26,286-Speed 10497.42 samples/sec   Loss 6.8200   LearningRate 0.1751   Epoch: 11   Global Step: 58870   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:49:34,096-Speed 10490.90 samples/sec   Loss 6.7819   LearningRate 0.1750   Epoch: 11   Global Step: 58880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:49:41,933-Speed 10453.38 samples/sec   Loss 6.7892   LearningRate 0.1750   Epoch: 11   Global Step: 58890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:49:49,748-Speed 10484.15 samples/sec   Loss 6.8424   LearningRate 0.1749   Epoch: 11   Global Step: 58900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:49:57,564-Speed 10482.66 samples/sec   Loss 6.8013   LearningRate 0.1748   Epoch: 11   Global Step: 58910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:50:05,370-Speed 10495.03 samples/sec   Loss 6.7951   LearningRate 0.1747   Epoch: 11   Global Step: 58920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:50:13,174-Speed 10499.22 samples/sec   Loss 6.7700   LearningRate 0.1746   Epoch: 11   Global Step: 58930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:50:20,993-Speed 10478.06 samples/sec   Loss 6.7776   LearningRate 0.1746   Epoch: 11   Global Step: 58940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:50:28,792-Speed 10505.55 samples/sec   Loss 6.7812   LearningRate 0.1745   Epoch: 11   Global Step: 58950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:50:36,599-Speed 10494.38 samples/sec   Loss 6.7870   LearningRate 0.1744   Epoch: 11   Global Step: 58960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:50:44,455-Speed 10429.10 samples/sec   Loss 6.7901   LearningRate 0.1743   Epoch: 11   Global Step: 58970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:50:52,252-Speed 10508.95 samples/sec   Loss 6.7723   LearningRate 0.1743   Epoch: 11   Global Step: 58980   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:51:00,052-Speed 10502.78 samples/sec   Loss 6.7679   LearningRate 0.1742   Epoch: 11   Global Step: 58990   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 03:51:07,848-Speed 10509.70 samples/sec   Loss 6.7902   LearningRate 0.1741   Epoch: 11   Global Step: 59000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:51:15,659-Speed 10490.74 samples/sec   Loss 6.8112   LearningRate 0.1740   Epoch: 11   Global Step: 59010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:51:23,451-Speed 10514.07 samples/sec   Loss 6.8199   LearningRate 0.1739   Epoch: 11   Global Step: 59020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:51:31,230-Speed 10532.22 samples/sec   Loss 6.7999   LearningRate 0.1739   Epoch: 11   Global Step: 59030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:51:39,030-Speed 10503.80 samples/sec   Loss 6.7917   LearningRate 0.1738   Epoch: 11   Global Step: 59040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:51:46,820-Speed 10516.99 samples/sec   Loss 6.7411   LearningRate 0.1737   Epoch: 11   Global Step: 59050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:51:54,674-Speed 10432.37 samples/sec   Loss 6.8560   LearningRate 0.1736   Epoch: 11   Global Step: 59060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:52:02,488-Speed 10484.18 samples/sec   Loss 6.7899   LearningRate 0.1736   Epoch: 11   Global Step: 59070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:52:10,309-Speed 10477.44 samples/sec   Loss 6.7528   LearningRate 0.1735   Epoch: 11   Global Step: 59080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:52:18,111-Speed 10500.59 samples/sec   Loss 6.7848   LearningRate 0.1734   Epoch: 11   Global Step: 59090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:52:25,932-Speed 10481.57 samples/sec   Loss 6.7571   LearningRate 0.1733   Epoch: 11   Global Step: 59100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:52:33,729-Speed 10508.39 samples/sec   Loss 6.7545   LearningRate 0.1732   Epoch: 11   Global Step: 59110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:52:41,521-Speed 10515.40 samples/sec   Loss 6.7742   LearningRate 0.1732   Epoch: 11   Global Step: 59120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:52:49,304-Speed 10526.56 samples/sec   Loss 6.7610   LearningRate 0.1731   Epoch: 11   Global Step: 59130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:52:57,092-Speed 10519.97 samples/sec   Loss 6.7532   LearningRate 0.1730   Epoch: 11   Global Step: 59140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:53:04,902-Speed 10491.20 samples/sec   Loss 6.7326   LearningRate 0.1729   Epoch: 11   Global Step: 59150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:53:12,690-Speed 10520.59 samples/sec   Loss 6.8027   LearningRate 0.1729   Epoch: 11   Global Step: 59160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:53:20,509-Speed 10478.06 samples/sec   Loss 6.7489   LearningRate 0.1728   Epoch: 11   Global Step: 59170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:53:28,353-Speed 10445.97 samples/sec   Loss 6.7381   LearningRate 0.1727   Epoch: 11   Global Step: 59180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:53:36,138-Speed 10524.19 samples/sec   Loss 6.6779   LearningRate 0.1726   Epoch: 11   Global Step: 59190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:53:43,923-Speed 10525.00 samples/sec   Loss 6.7521   LearningRate 0.1725   Epoch: 11   Global Step: 59200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:53:51,717-Speed 10511.08 samples/sec   Loss 6.7549   LearningRate 0.1725   Epoch: 11   Global Step: 59210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:53:59,546-Speed 10464.45 samples/sec   Loss 6.7252   LearningRate 0.1724   Epoch: 11   Global Step: 59220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:54:07,383-Speed 10454.47 samples/sec   Loss 6.7434   LearningRate 0.1723   Epoch: 11   Global Step: 59230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:54:15,172-Speed 10523.37 samples/sec   Loss 6.7729   LearningRate 0.1722   Epoch: 11   Global Step: 59240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:54:22,968-Speed 10509.25 samples/sec   Loss 6.8126   LearningRate 0.1722   Epoch: 11   Global Step: 59250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:54:30,764-Speed 10510.05 samples/sec   Loss 6.7671   LearningRate 0.1721   Epoch: 11   Global Step: 59260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:54:38,558-Speed 10512.10 samples/sec   Loss 6.7730   LearningRate 0.1720   Epoch: 11   Global Step: 59270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:54:46,372-Speed 10485.97 samples/sec   Loss 6.7545   LearningRate 0.1719   Epoch: 11   Global Step: 59280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:54:54,140-Speed 10547.16 samples/sec   Loss 6.7268   LearningRate 0.1719   Epoch: 11   Global Step: 59290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:55:01,942-Speed 10501.71 samples/sec   Loss 6.7619   LearningRate 0.1718   Epoch: 11   Global Step: 59300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:55:09,724-Speed 10527.52 samples/sec   Loss 6.7775   LearningRate 0.1717   Epoch: 11   Global Step: 59310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:55:17,517-Speed 10514.34 samples/sec   Loss 6.7669   LearningRate 0.1716   Epoch: 11   Global Step: 59320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:55:25,297-Speed 10531.47 samples/sec   Loss 6.7383   LearningRate 0.1715   Epoch: 11   Global Step: 59330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:55:33,076-Speed 10531.58 samples/sec   Loss 6.7363   LearningRate 0.1715   Epoch: 11   Global Step: 59340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:55:40,874-Speed 10506.94 samples/sec   Loss 6.7385   LearningRate 0.1714   Epoch: 11   Global Step: 59350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:55:48,652-Speed 10533.54 samples/sec   Loss 6.7157   LearningRate 0.1713   Epoch: 11   Global Step: 59360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:55:56,455-Speed 10506.07 samples/sec   Loss 6.7730   LearningRate 0.1712   Epoch: 11   Global Step: 59370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:56:04,244-Speed 10518.16 samples/sec   Loss 6.7452   LearningRate 0.1712   Epoch: 11   Global Step: 59380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:56:12,029-Speed 10523.86 samples/sec   Loss 6.7402   LearningRate 0.1711   Epoch: 11   Global Step: 59390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:56:19,810-Speed 10530.85 samples/sec   Loss 6.7128   LearningRate 0.1710   Epoch: 11   Global Step: 59400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:56:27,608-Speed 10505.96 samples/sec   Loss 6.6972   LearningRate 0.1709   Epoch: 11   Global Step: 59410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:56:35,391-Speed 10527.50 samples/sec   Loss 6.7476   LearningRate 0.1708   Epoch: 11   Global Step: 59420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:56:43,205-Speed 10485.47 samples/sec   Loss 6.7481   LearningRate 0.1708   Epoch: 11   Global Step: 59430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:56:51,026-Speed 10474.64 samples/sec   Loss 6.7276   LearningRate 0.1707   Epoch: 11   Global Step: 59440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:56:58,829-Speed 10504.00 samples/sec   Loss 6.6948   LearningRate 0.1706   Epoch: 11   Global Step: 59450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:57:06,647-Speed 10478.49 samples/sec   Loss 6.6802   LearningRate 0.1705   Epoch: 11   Global Step: 59460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:57:14,423-Speed 10536.61 samples/sec   Loss 6.6895   LearningRate 0.1705   Epoch: 11   Global Step: 59470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:57:22,207-Speed 10526.04 samples/sec   Loss 6.7169   LearningRate 0.1704   Epoch: 11   Global Step: 59480   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:57:30,027-Speed 10477.70 samples/sec   Loss 6.7688   LearningRate 0.1703   Epoch: 11   Global Step: 59490   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:57:37,816-Speed 10517.56 samples/sec   Loss 6.7677   LearningRate 0.1702   Epoch: 11   Global Step: 59500   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:57:45,604-Speed 10520.17 samples/sec   Loss 6.7377   LearningRate 0.1702   Epoch: 11   Global Step: 59510   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:57:53,399-Speed 10510.15 samples/sec   Loss 6.6921   LearningRate 0.1701   Epoch: 11   Global Step: 59520   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:58:01,192-Speed 10514.43 samples/sec   Loss 6.7579   LearningRate 0.1700   Epoch: 11   Global Step: 59530   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:58:09,030-Speed 10452.23 samples/sec   Loss 6.7088   LearningRate 0.1699   Epoch: 11   Global Step: 59540   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:58:16,829-Speed 10504.61 samples/sec   Loss 6.7377   LearningRate 0.1698   Epoch: 11   Global Step: 59550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:58:24,609-Speed 10531.02 samples/sec   Loss 6.6959   LearningRate 0.1698   Epoch: 11   Global Step: 59560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:58:32,395-Speed 10523.47 samples/sec   Loss 6.7502   LearningRate 0.1697   Epoch: 11   Global Step: 59570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 03:58:40,160-Speed 10551.21 samples/sec   Loss 6.7028   LearningRate 0.1696   Epoch: 11   Global Step: 59580   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:58:47,970-Speed 10491.03 samples/sec   Loss 6.7429   LearningRate 0.1695   Epoch: 11   Global Step: 59590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:58:55,763-Speed 10513.92 samples/sec   Loss 6.6774   LearningRate 0.1695   Epoch: 11   Global Step: 59600   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:59:03,558-Speed 10510.56 samples/sec   Loss 6.7037   LearningRate 0.1694   Epoch: 11   Global Step: 59610   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:59:11,353-Speed 10510.56 samples/sec   Loss 6.7361   LearningRate 0.1693   Epoch: 11   Global Step: 59620   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:59:19,151-Speed 10506.01 samples/sec   Loss 6.6915   LearningRate 0.1692   Epoch: 11   Global Step: 59630   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:59:26,944-Speed 10514.08 samples/sec   Loss 6.6598   LearningRate 0.1692   Epoch: 11   Global Step: 59640   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:59:34,741-Speed 10508.09 samples/sec   Loss 6.6865   LearningRate 0.1691   Epoch: 11   Global Step: 59650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:59:42,543-Speed 10501.09 samples/sec   Loss 6.6846   LearningRate 0.1690   Epoch: 11   Global Step: 59660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:59:50,352-Speed 10491.41 samples/sec   Loss 6.6931   LearningRate 0.1689   Epoch: 11   Global Step: 59670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 03:59:58,187-Speed 10457.56 samples/sec   Loss 6.7101   LearningRate 0.1688   Epoch: 11   Global Step: 59680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:00:06,044-Speed 10428.01 samples/sec   Loss 6.7191   LearningRate 0.1688   Epoch: 11   Global Step: 59690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:00:13,882-Speed 10452.53 samples/sec   Loss 6.6368   LearningRate 0.1687   Epoch: 11   Global Step: 59700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:00:21,695-Speed 10486.95 samples/sec   Loss 6.7041   LearningRate 0.1686   Epoch: 11   Global Step: 59710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:00:29,496-Speed 10504.79 samples/sec   Loss 6.6916   LearningRate 0.1685   Epoch: 11   Global Step: 59720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:00:37,302-Speed 10494.67 samples/sec   Loss 6.6843   LearningRate 0.1685   Epoch: 11   Global Step: 59730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:00:45,115-Speed 10487.43 samples/sec   Loss 6.7329   LearningRate 0.1684   Epoch: 11   Global Step: 59740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:00:52,914-Speed 10504.85 samples/sec   Loss 6.6796   LearningRate 0.1683   Epoch: 11   Global Step: 59750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:01:00,741-Speed 10467.54 samples/sec   Loss 6.6774   LearningRate 0.1682   Epoch: 11   Global Step: 59760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:01:08,563-Speed 10474.82 samples/sec   Loss 6.6714   LearningRate 0.1682   Epoch: 11   Global Step: 59770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:01:16,355-Speed 10513.64 samples/sec   Loss 6.6801   LearningRate 0.1681   Epoch: 11   Global Step: 59780   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 04:01:24,170-Speed 10483.83 samples/sec   Loss 6.6959   LearningRate 0.1680   Epoch: 11   Global Step: 59790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:01:32,011-Speed 10448.90 samples/sec   Loss 6.6811   LearningRate 0.1679   Epoch: 11   Global Step: 59800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:01:39,811-Speed 10504.00 samples/sec   Loss 6.7129   LearningRate 0.1678   Epoch: 11   Global Step: 59810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:01:47,640-Speed 10466.17 samples/sec   Loss 6.7470   LearningRate 0.1678   Epoch: 11   Global Step: 59820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:01:55,443-Speed 10499.55 samples/sec   Loss 6.7100   LearningRate 0.1677   Epoch: 11   Global Step: 59830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:02:03,221-Speed 10534.26 samples/sec   Loss 6.6551   LearningRate 0.1676   Epoch: 11   Global Step: 59840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:02:11,033-Speed 10487.84 samples/sec   Loss 6.6609   LearningRate 0.1675   Epoch: 11   Global Step: 59850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:02:18,960-Speed 10336.00 samples/sec   Loss 6.6771   LearningRate 0.1675   Epoch: 11   Global Step: 59860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:02:26,774-Speed 10485.76 samples/sec   Loss 6.6698   LearningRate 0.1674   Epoch: 11   Global Step: 59870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:02:34,569-Speed 10510.30 samples/sec   Loss 6.7050   LearningRate 0.1673   Epoch: 11   Global Step: 59880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:02:42,368-Speed 10506.63 samples/sec   Loss 6.6511   LearningRate 0.1672   Epoch: 11   Global Step: 59890   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 04:02:50,160-Speed 10513.71 samples/sec   Loss 6.6642   LearningRate 0.1672   Epoch: 11   Global Step: 59900   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 04:02:57,961-Speed 10502.57 samples/sec   Loss 6.6957   LearningRate 0.1671   Epoch: 11   Global Step: 59910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:03:05,758-Speed 10508.18 samples/sec   Loss 6.6835   LearningRate 0.1670   Epoch: 11   Global Step: 59920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:03:13,577-Speed 10478.39 samples/sec   Loss 6.7115   LearningRate 0.1669   Epoch: 11   Global Step: 59930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:03:21,399-Speed 10475.02 samples/sec   Loss 6.7007   LearningRate 0.1669   Epoch: 11   Global Step: 59940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:03:29,179-Speed 10531.51 samples/sec   Loss 6.6850   LearningRate 0.1668   Epoch: 11   Global Step: 59950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:03:37,011-Speed 10459.50 samples/sec   Loss 6.6529   LearningRate 0.1667   Epoch: 11   Global Step: 59960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:03:44,812-Speed 10503.47 samples/sec   Loss 6.6214   LearningRate 0.1666   Epoch: 11   Global Step: 59970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:03:52,603-Speed 10516.76 samples/sec   Loss 6.6482   LearningRate 0.1665   Epoch: 11   Global Step: 59980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:04:00,395-Speed 10513.50 samples/sec   Loss 6.6261   LearningRate 0.1665   Epoch: 11   Global Step: 59990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:04:08,189-Speed 10512.33 samples/sec   Loss 6.6901   LearningRate 0.1664   Epoch: 11   Global Step: 60000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:04:36,334-[lfw][60000]XNorm: 24.240593
Training: 2022-01-16 04:04:36,335-[lfw][60000]Accuracy-Flip: 0.99733+-0.00291
Training: 2022-01-16 04:04:36,336-[lfw][60000]Accuracy-Highest: 0.99783
Training: 2022-01-16 04:05:08,911-[cfp_fp][60000]XNorm: 21.586307
Training: 2022-01-16 04:05:08,911-[cfp_fp][60000]Accuracy-Flip: 0.98500+-0.00448
Training: 2022-01-16 04:05:08,912-[cfp_fp][60000]Accuracy-Highest: 0.98500
Training: 2022-01-16 04:05:36,980-[agedb_30][60000]XNorm: 23.682680
Training: 2022-01-16 04:05:36,980-[agedb_30][60000]Accuracy-Flip: 0.97067+-0.00803
Training: 2022-01-16 04:05:36,981-[agedb_30][60000]Accuracy-Highest: 0.97067
Training: 2022-01-16 04:05:44,744-Speed 848.48 samples/sec   Loss 6.6711   LearningRate 0.1663   Epoch: 11   Global Step: 60010   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 04:05:52,479-Speed 10593.11 samples/sec   Loss 6.6805   LearningRate 0.1662   Epoch: 11   Global Step: 60020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:06:00,218-Speed 10586.34 samples/sec   Loss 6.6357   LearningRate 0.1662   Epoch: 11   Global Step: 60030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:06:07,979-Speed 10557.01 samples/sec   Loss 6.6487   LearningRate 0.1661   Epoch: 11   Global Step: 60040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:06:15,733-Speed 10565.79 samples/sec   Loss 6.6538   LearningRate 0.1660   Epoch: 11   Global Step: 60050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:06:23,478-Speed 10579.17 samples/sec   Loss 6.6329   LearningRate 0.1659   Epoch: 11   Global Step: 60060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:06:31,273-Speed 10511.12 samples/sec   Loss 6.6739   LearningRate 0.1659   Epoch: 11   Global Step: 60070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:06:39,054-Speed 10528.68 samples/sec   Loss 6.6608   LearningRate 0.1658   Epoch: 11   Global Step: 60080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:06:46,839-Speed 10524.26 samples/sec   Loss 6.6468   LearningRate 0.1657   Epoch: 11   Global Step: 60090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:06:54,611-Speed 10541.54 samples/sec   Loss 6.6389   LearningRate 0.1656   Epoch: 11   Global Step: 60100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:07:02,386-Speed 10538.69 samples/sec   Loss 6.6255   LearningRate 0.1656   Epoch: 11   Global Step: 60110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:07:10,158-Speed 10540.69 samples/sec   Loss 6.6361   LearningRate 0.1655   Epoch: 11   Global Step: 60120   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 04:07:17,961-Speed 10499.63 samples/sec   Loss 6.6518   LearningRate 0.1654   Epoch: 11   Global Step: 60130   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-16 04:07:25,745-Speed 10526.80 samples/sec   Loss 6.6438   LearningRate 0.1653   Epoch: 11   Global Step: 60140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:07:33,514-Speed 10545.32 samples/sec   Loss 6.6527   LearningRate 0.1653   Epoch: 11   Global Step: 60150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 04:07:41,273-Speed 10559.33 samples/sec   Loss 6.6263   LearningRate 0.1652   Epoch: 11   Global Step: 60160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 04:07:49,042-Speed 10546.56 samples/sec   Loss 6.6178   LearningRate 0.1651   Epoch: 11   Global Step: 60170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 04:07:56,809-Speed 10549.39 samples/sec   Loss 6.6040   LearningRate 0.1650   Epoch: 11   Global Step: 60180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 04:08:04,602-Speed 10512.60 samples/sec   Loss 6.5735   LearningRate 0.1650   Epoch: 11   Global Step: 60190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 04:08:12,463-Speed 10422.71 samples/sec   Loss 6.6322   LearningRate 0.1649   Epoch: 11   Global Step: 60200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 04:08:20,312-Speed 10438.98 samples/sec   Loss 6.6231   LearningRate 0.1648   Epoch: 11   Global Step: 60210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 04:08:28,076-Speed 10552.24 samples/sec   Loss 6.6310   LearningRate 0.1647   Epoch: 11   Global Step: 60220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 04:08:35,854-Speed 10534.59 samples/sec   Loss 6.5671   LearningRate 0.1646   Epoch: 11   Global Step: 60230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 04:08:43,664-Speed 10490.52 samples/sec   Loss 6.6084   LearningRate 0.1646   Epoch: 11   Global Step: 60240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 04:08:51,449-Speed 10523.61 samples/sec   Loss 6.6350   LearningRate 0.1645   Epoch: 11   Global Step: 60250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:08:59,217-Speed 10548.06 samples/sec   Loss 6.6207   LearningRate 0.1644   Epoch: 11   Global Step: 60260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:09:07,000-Speed 10525.56 samples/sec   Loss 6.5924   LearningRate 0.1643   Epoch: 11   Global Step: 60270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:09:14,794-Speed 10511.80 samples/sec   Loss 6.6173   LearningRate 0.1643   Epoch: 11   Global Step: 60280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:09:22,560-Speed 10550.62 samples/sec   Loss 6.5799   LearningRate 0.1642   Epoch: 11   Global Step: 60290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:09:30,354-Speed 10513.05 samples/sec   Loss 6.5869   LearningRate 0.1641   Epoch: 11   Global Step: 60300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:09:38,115-Speed 10556.08 samples/sec   Loss 6.5874   LearningRate 0.1640   Epoch: 11   Global Step: 60310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:09:45,959-Speed 10445.22 samples/sec   Loss 6.5984   LearningRate 0.1640   Epoch: 11   Global Step: 60320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:09:53,778-Speed 10478.09 samples/sec   Loss 6.6354   LearningRate 0.1639   Epoch: 11   Global Step: 60330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:10:01,610-Speed 10461.86 samples/sec   Loss 6.6585   LearningRate 0.1638   Epoch: 11   Global Step: 60340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:10:09,370-Speed 10557.07 samples/sec   Loss 6.6174   LearningRate 0.1637   Epoch: 11   Global Step: 60350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:10:17,189-Speed 10478.11 samples/sec   Loss 6.6053   LearningRate 0.1637   Epoch: 11   Global Step: 60360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:10:24,957-Speed 10547.29 samples/sec   Loss 6.5930   LearningRate 0.1636   Epoch: 11   Global Step: 60370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:10:32,725-Speed 10547.90 samples/sec   Loss 6.6218   LearningRate 0.1635   Epoch: 11   Global Step: 60380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 04:10:40,551-Speed 10468.85 samples/sec   Loss 6.5569   LearningRate 0.1634   Epoch: 11   Global Step: 60390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 04:10:48,331-Speed 10530.91 samples/sec   Loss 6.6191   LearningRate 0.1634   Epoch: 11   Global Step: 60400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 04:10:56,104-Speed 10543.63 samples/sec   Loss 6.6114   LearningRate 0.1633   Epoch: 11   Global Step: 60410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 04:11:03,869-Speed 10551.35 samples/sec   Loss 6.5561   LearningRate 0.1632   Epoch: 11   Global Step: 60420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 04:11:11,636-Speed 10549.26 samples/sec   Loss 6.5814   LearningRate 0.1631   Epoch: 11   Global Step: 60430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 04:11:19,413-Speed 10535.31 samples/sec   Loss 6.6179   LearningRate 0.1631   Epoch: 11   Global Step: 60440   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 04:11:27,185-Speed 10541.66 samples/sec   Loss 6.5931   LearningRate 0.1630   Epoch: 11   Global Step: 60450   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 04:11:34,964-Speed 10532.96 samples/sec   Loss 6.6386   LearningRate 0.1629   Epoch: 11   Global Step: 60460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 04:11:42,759-Speed 10509.52 samples/sec   Loss 6.5714   LearningRate 0.1628   Epoch: 11   Global Step: 60470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-16 04:11:50,509-Speed 10571.61 samples/sec   Loss 6.6376   LearningRate 0.1628   Epoch: 11   Global Step: 60480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:11:58,282-Speed 10541.08 samples/sec   Loss 6.5865   LearningRate 0.1627   Epoch: 11   Global Step: 60490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:12:06,059-Speed 10534.31 samples/sec   Loss 6.5439   LearningRate 0.1626   Epoch: 11   Global Step: 60500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:12:13,833-Speed 10539.00 samples/sec   Loss 6.5821   LearningRate 0.1625   Epoch: 11   Global Step: 60510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:12:21,596-Speed 10554.14 samples/sec   Loss 6.5758   LearningRate 0.1625   Epoch: 11   Global Step: 60520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-16 04:12:29,370-Speed 10540.13 samples/sec   Loss 6.5448   LearningRate 0.1624   Epoch: 11   Global Step: 60530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:12:37,162-Speed 10515.44 samples/sec   Loss 6.5459   LearningRate 0.1623   Epoch: 11   Global Step: 60540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:12:44,936-Speed 10538.12 samples/sec   Loss 6.5852   LearningRate 0.1622   Epoch: 11   Global Step: 60550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:12:52,686-Speed 10571.18 samples/sec   Loss 6.5935   LearningRate 0.1622   Epoch: 11   Global Step: 60560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:13:00,438-Speed 10568.98 samples/sec   Loss 6.5365   LearningRate 0.1621   Epoch: 11   Global Step: 60570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:13:08,210-Speed 10541.63 samples/sec   Loss 6.5749   LearningRate 0.1620   Epoch: 11   Global Step: 60580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:13:16,015-Speed 10496.84 samples/sec   Loss 6.5721   LearningRate 0.1619   Epoch: 11   Global Step: 60590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:13:23,809-Speed 10512.95 samples/sec   Loss 6.6169   LearningRate 0.1619   Epoch: 11   Global Step: 60600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:13:31,598-Speed 10518.64 samples/sec   Loss 6.5676   LearningRate 0.1618   Epoch: 11   Global Step: 60610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:13:39,444-Speed 10441.97 samples/sec   Loss 6.5627   LearningRate 0.1617   Epoch: 11   Global Step: 60620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:13:47,220-Speed 10537.76 samples/sec   Loss 6.5812   LearningRate 0.1616   Epoch: 11   Global Step: 60630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:13:54,991-Speed 10543.28 samples/sec   Loss 6.5855   LearningRate 0.1616   Epoch: 11   Global Step: 60640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:14:02,783-Speed 10515.06 samples/sec   Loss 6.5291   LearningRate 0.1615   Epoch: 11   Global Step: 60650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:14:10,565-Speed 10529.16 samples/sec   Loss 6.6160   LearningRate 0.1614   Epoch: 11   Global Step: 60660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:14:18,412-Speed 10441.79 samples/sec   Loss 6.6028   LearningRate 0.1613   Epoch: 11   Global Step: 60670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:14:26,198-Speed 10522.12 samples/sec   Loss 6.5626   LearningRate 0.1613   Epoch: 11   Global Step: 60680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:14:33,987-Speed 10519.39 samples/sec   Loss 6.5886   LearningRate 0.1612   Epoch: 11   Global Step: 60690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:14:41,786-Speed 10504.10 samples/sec   Loss 6.5970   LearningRate 0.1611   Epoch: 11   Global Step: 60700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:14:49,609-Speed 10474.14 samples/sec   Loss 6.5315   LearningRate 0.1610   Epoch: 11   Global Step: 60710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:14:57,410-Speed 10503.49 samples/sec   Loss 6.5593   LearningRate 0.1610   Epoch: 11   Global Step: 60720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:15:05,221-Speed 10488.12 samples/sec   Loss 6.5196   LearningRate 0.1609   Epoch: 11   Global Step: 60730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:15:13,016-Speed 10510.99 samples/sec   Loss 6.5845   LearningRate 0.1608   Epoch: 11   Global Step: 60740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:15:20,793-Speed 10535.46 samples/sec   Loss 6.5463   LearningRate 0.1607   Epoch: 11   Global Step: 60750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:15:28,604-Speed 10489.63 samples/sec   Loss 6.5314   LearningRate 0.1607   Epoch: 11   Global Step: 60760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:15:36,398-Speed 10511.74 samples/sec   Loss 6.5207   LearningRate 0.1606   Epoch: 11   Global Step: 60770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:15:44,226-Speed 10466.65 samples/sec   Loss 6.5586   LearningRate 0.1605   Epoch: 11   Global Step: 60780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:15:52,027-Speed 10503.62 samples/sec   Loss 6.5568   LearningRate 0.1604   Epoch: 11   Global Step: 60790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:15:59,867-Speed 10453.25 samples/sec   Loss 6.5072   LearningRate 0.1604   Epoch: 11   Global Step: 60800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:16:07,685-Speed 10482.52 samples/sec   Loss 6.5452   LearningRate 0.1603   Epoch: 11   Global Step: 60810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:16:15,516-Speed 10467.56 samples/sec   Loss 6.5397   LearningRate 0.1602   Epoch: 11   Global Step: 60820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:16:23,315-Speed 10504.33 samples/sec   Loss 6.5781   LearningRate 0.1601   Epoch: 11   Global Step: 60830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:16:31,152-Speed 10454.36 samples/sec   Loss 6.5384   LearningRate 0.1601   Epoch: 11   Global Step: 60840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:16:38,944-Speed 10514.35 samples/sec   Loss 6.4807   LearningRate 0.1600   Epoch: 11   Global Step: 60850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:16:46,740-Speed 10509.51 samples/sec   Loss 6.5031   LearningRate 0.1599   Epoch: 11   Global Step: 60860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:16:54,518-Speed 10534.45 samples/sec   Loss 6.5185   LearningRate 0.1598   Epoch: 11   Global Step: 60870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:17:02,308-Speed 10517.58 samples/sec   Loss 6.5293   LearningRate 0.1598   Epoch: 11   Global Step: 60880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:17:10,088-Speed 10531.10 samples/sec   Loss 6.5148   LearningRate 0.1597   Epoch: 11   Global Step: 60890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:17:17,863-Speed 10537.03 samples/sec   Loss 6.5194   LearningRate 0.1596   Epoch: 11   Global Step: 60900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:17:25,648-Speed 10525.06 samples/sec   Loss 6.4990   LearningRate 0.1595   Epoch: 11   Global Step: 60910   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-01-16 04:17:33,473-Speed 10470.86 samples/sec   Loss 6.5033   LearningRate 0.1595   Epoch: 11   Global Step: 60920   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-01-16 04:17:41,270-Speed 10506.91 samples/sec   Loss 6.5223   LearningRate 0.1594   Epoch: 11   Global Step: 60930   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-01-16 04:17:49,073-Speed 10499.84 samples/sec   Loss 6.5410   LearningRate 0.1593   Epoch: 11   Global Step: 60940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:17:56,889-Speed 10483.66 samples/sec   Loss 6.5112   LearningRate 0.1592   Epoch: 11   Global Step: 60950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:18:04,674-Speed 10523.26 samples/sec   Loss 6.4931   LearningRate 0.1592   Epoch: 11   Global Step: 60960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:18:12,488-Speed 10485.41 samples/sec   Loss 6.4936   LearningRate 0.1591   Epoch: 11   Global Step: 60970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:18:20,264-Speed 10535.61 samples/sec   Loss 6.4933   LearningRate 0.1590   Epoch: 11   Global Step: 60980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:18:28,055-Speed 10516.64 samples/sec   Loss 6.5490   LearningRate 0.1589   Epoch: 11   Global Step: 60990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:18:35,853-Speed 10506.75 samples/sec   Loss 6.5427   LearningRate 0.1589   Epoch: 11   Global Step: 61000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:18:43,653-Speed 10504.23 samples/sec   Loss 6.5013   LearningRate 0.1588   Epoch: 11   Global Step: 61010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:18:51,423-Speed 10544.79 samples/sec   Loss 6.5474   LearningRate 0.1587   Epoch: 11   Global Step: 61020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:18:59,224-Speed 10502.06 samples/sec   Loss 6.5786   LearningRate 0.1586   Epoch: 11   Global Step: 61030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:19:07,045-Speed 10476.72 samples/sec   Loss 6.5466   LearningRate 0.1586   Epoch: 11   Global Step: 61040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:19:14,864-Speed 10478.18 samples/sec   Loss 6.4739   LearningRate 0.1585   Epoch: 11   Global Step: 61050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:19:22,673-Speed 10492.59 samples/sec   Loss 6.5254   LearningRate 0.1584   Epoch: 11   Global Step: 61060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:19:30,445-Speed 10542.19 samples/sec   Loss 6.5160   LearningRate 0.1583   Epoch: 11   Global Step: 61070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:19:38,249-Speed 10501.20 samples/sec   Loss 6.5216   LearningRate 0.1583   Epoch: 11   Global Step: 61080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:19:46,026-Speed 10535.25 samples/sec   Loss 6.5370   LearningRate 0.1582   Epoch: 11   Global Step: 61090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:19:53,884-Speed 10427.00 samples/sec   Loss 6.5348   LearningRate 0.1581   Epoch: 11   Global Step: 61100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:20:01,686-Speed 10500.99 samples/sec   Loss 6.5400   LearningRate 0.1580   Epoch: 11   Global Step: 61110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:20:09,502-Speed 10483.27 samples/sec   Loss 6.4978   LearningRate 0.1580   Epoch: 11   Global Step: 61120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:20:17,355-Speed 10432.95 samples/sec   Loss 6.4961   LearningRate 0.1579   Epoch: 11   Global Step: 61130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:20:25,132-Speed 10535.06 samples/sec   Loss 6.4678   LearningRate 0.1578   Epoch: 11   Global Step: 61140   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-01-16 04:20:32,905-Speed 10539.82 samples/sec   Loss 6.4664   LearningRate 0.1578   Epoch: 11   Global Step: 61150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:20:40,687-Speed 10527.58 samples/sec   Loss 6.5021   LearningRate 0.1577   Epoch: 11   Global Step: 61160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:20:48,476-Speed 10519.34 samples/sec   Loss 6.4937   LearningRate 0.1576   Epoch: 11   Global Step: 61170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:20:56,277-Speed 10502.69 samples/sec   Loss 6.4579   LearningRate 0.1575   Epoch: 11   Global Step: 61180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:21:04,063-Speed 10522.49 samples/sec   Loss 6.4765   LearningRate 0.1575   Epoch: 11   Global Step: 61190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:21:11,867-Speed 10498.92 samples/sec   Loss 6.5198   LearningRate 0.1574   Epoch: 11   Global Step: 61200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:21:19,642-Speed 10537.97 samples/sec   Loss 6.4577   LearningRate 0.1573   Epoch: 11   Global Step: 61210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:21:27,435-Speed 10513.30 samples/sec   Loss 6.4868   LearningRate 0.1572   Epoch: 11   Global Step: 61220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:21:35,248-Speed 10486.36 samples/sec   Loss 6.4948   LearningRate 0.1572   Epoch: 11   Global Step: 61230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:21:43,045-Speed 10508.90 samples/sec   Loss 6.4868   LearningRate 0.1571   Epoch: 11   Global Step: 61240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:21:50,831-Speed 10523.40 samples/sec   Loss 6.4966   LearningRate 0.1570   Epoch: 11   Global Step: 61250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:21:58,615-Speed 10525.73 samples/sec   Loss 6.4677   LearningRate 0.1569   Epoch: 11   Global Step: 61260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:22:06,411-Speed 10508.31 samples/sec   Loss 6.4428   LearningRate 0.1569   Epoch: 11   Global Step: 61270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:22:14,246-Speed 10457.42 samples/sec   Loss 6.4670   LearningRate 0.1568   Epoch: 11   Global Step: 61280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:22:22,047-Speed 10502.75 samples/sec   Loss 6.4637   LearningRate 0.1567   Epoch: 11   Global Step: 61290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:22:29,836-Speed 10518.94 samples/sec   Loss 6.4753   LearningRate 0.1566   Epoch: 11   Global Step: 61300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:22:37,624-Speed 10520.75 samples/sec   Loss 6.4726   LearningRate 0.1566   Epoch: 11   Global Step: 61310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:22:45,432-Speed 10493.24 samples/sec   Loss 6.4266   LearningRate 0.1565   Epoch: 11   Global Step: 61320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:22:53,214-Speed 10527.69 samples/sec   Loss 6.4577   LearningRate 0.1564   Epoch: 11   Global Step: 61330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:23:01,021-Speed 10494.76 samples/sec   Loss 6.4762   LearningRate 0.1563   Epoch: 11   Global Step: 61340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:23:08,813-Speed 10515.65 samples/sec   Loss 6.4899   LearningRate 0.1563   Epoch: 11   Global Step: 61350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:23:16,622-Speed 10491.26 samples/sec   Loss 6.4595   LearningRate 0.1562   Epoch: 11   Global Step: 61360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:23:24,433-Speed 10489.15 samples/sec   Loss 6.4997   LearningRate 0.1561   Epoch: 11   Global Step: 61370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:23:32,271-Speed 10453.33 samples/sec   Loss 6.4493   LearningRate 0.1560   Epoch: 11   Global Step: 61380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:23:40,077-Speed 10495.81 samples/sec   Loss 6.4768   LearningRate 0.1560   Epoch: 11   Global Step: 61390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:23:47,883-Speed 10496.52 samples/sec   Loss 6.4359   LearningRate 0.1559   Epoch: 11   Global Step: 61400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:23:55,700-Speed 10481.22 samples/sec   Loss 6.4568   LearningRate 0.1558   Epoch: 11   Global Step: 61410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:24:03,533-Speed 10460.22 samples/sec   Loss 6.3971   LearningRate 0.1558   Epoch: 11   Global Step: 61420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:24:11,305-Speed 10542.04 samples/sec   Loss 6.4410   LearningRate 0.1557   Epoch: 11   Global Step: 61430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:24:19,087-Speed 10527.31 samples/sec   Loss 6.4882   LearningRate 0.1556   Epoch: 11   Global Step: 61440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:24:26,885-Speed 10508.44 samples/sec   Loss 6.4861   LearningRate 0.1555   Epoch: 11   Global Step: 61450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:24:34,687-Speed 10501.79 samples/sec   Loss 6.4923   LearningRate 0.1555   Epoch: 11   Global Step: 61460   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-01-16 04:24:42,474-Speed 10521.86 samples/sec   Loss 6.4047   LearningRate 0.1554   Epoch: 11   Global Step: 61470   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-01-16 04:24:50,277-Speed 10500.02 samples/sec   Loss 6.4715   LearningRate 0.1553   Epoch: 11   Global Step: 61480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:24:58,076-Speed 10506.59 samples/sec   Loss 6.4136   LearningRate 0.1552   Epoch: 11   Global Step: 61490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:25:05,861-Speed 10524.74 samples/sec   Loss 6.4369   LearningRate 0.1552   Epoch: 11   Global Step: 61500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:25:13,657-Speed 10509.98 samples/sec   Loss 6.4549   LearningRate 0.1551   Epoch: 11   Global Step: 61510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:25:21,438-Speed 10529.37 samples/sec   Loss 6.4632   LearningRate 0.1550   Epoch: 11   Global Step: 61520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:25:29,244-Speed 10497.49 samples/sec   Loss 6.4551   LearningRate 0.1549   Epoch: 11   Global Step: 61530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:25:37,047-Speed 10500.17 samples/sec   Loss 6.4496   LearningRate 0.1549   Epoch: 11   Global Step: 61540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:25:44,848-Speed 10501.46 samples/sec   Loss 6.4311   LearningRate 0.1548   Epoch: 11   Global Step: 61550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:25:52,656-Speed 10493.36 samples/sec   Loss 6.3986   LearningRate 0.1547   Epoch: 11   Global Step: 61560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:26:00,443-Speed 10521.58 samples/sec   Loss 6.4452   LearningRate 0.1547   Epoch: 11   Global Step: 61570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:26:08,248-Speed 10497.32 samples/sec   Loss 6.4214   LearningRate 0.1546   Epoch: 11   Global Step: 61580   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-01-16 04:26:16,039-Speed 10516.72 samples/sec   Loss 6.4132   LearningRate 0.1545   Epoch: 11   Global Step: 61590   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-01-16 04:26:23,923-Speed 10391.82 samples/sec   Loss 6.4155   LearningRate 0.1544   Epoch: 11   Global Step: 61600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:26:31,720-Speed 10507.15 samples/sec   Loss 6.4079   LearningRate 0.1544   Epoch: 11   Global Step: 61610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:26:39,511-Speed 10517.30 samples/sec   Loss 6.4265   LearningRate 0.1543   Epoch: 11   Global Step: 61620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:26:47,307-Speed 10508.01 samples/sec   Loss 6.4073   LearningRate 0.1542   Epoch: 11   Global Step: 61630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:26:55,092-Speed 10525.30 samples/sec   Loss 6.4490   LearningRate 0.1541   Epoch: 11   Global Step: 61640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:27:02,873-Speed 10528.68 samples/sec   Loss 6.4561   LearningRate 0.1541   Epoch: 11   Global Step: 61650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:27:10,659-Speed 10523.32 samples/sec   Loss 6.4394   LearningRate 0.1540   Epoch: 11   Global Step: 61660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:27:18,461-Speed 10499.94 samples/sec   Loss 6.4501   LearningRate 0.1539   Epoch: 11   Global Step: 61670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:27:26,241-Speed 10532.09 samples/sec   Loss 6.3542   LearningRate 0.1538   Epoch: 11   Global Step: 61680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:27:34,036-Speed 10510.18 samples/sec   Loss 6.3930   LearningRate 0.1538   Epoch: 11   Global Step: 61690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:27:41,827-Speed 10516.60 samples/sec   Loss 6.4234   LearningRate 0.1537   Epoch: 11   Global Step: 61700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:27:49,628-Speed 10502.73 samples/sec   Loss 6.4127   LearningRate 0.1536   Epoch: 11   Global Step: 61710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:27:57,412-Speed 10525.68 samples/sec   Loss 6.4690   LearningRate 0.1536   Epoch: 11   Global Step: 61720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:28:05,211-Speed 10505.01 samples/sec   Loss 6.4516   LearningRate 0.1535   Epoch: 11   Global Step: 61730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:28:13,019-Speed 10493.03 samples/sec   Loss 6.4235   LearningRate 0.1534   Epoch: 11   Global Step: 61740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:28:20,830-Speed 10488.78 samples/sec   Loss 6.3808   LearningRate 0.1533   Epoch: 11   Global Step: 61750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:28:28,635-Speed 10498.34 samples/sec   Loss 6.3927   LearningRate 0.1533   Epoch: 11   Global Step: 61760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:28:36,439-Speed 10497.92 samples/sec   Loss 6.3640   LearningRate 0.1532   Epoch: 11   Global Step: 61770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:28:44,275-Speed 10455.98 samples/sec   Loss 6.3798   LearningRate 0.1531   Epoch: 11   Global Step: 61780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:28:52,087-Speed 10488.58 samples/sec   Loss 6.3963   LearningRate 0.1530   Epoch: 11   Global Step: 61790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:28:59,857-Speed 10544.78 samples/sec   Loss 6.4183   LearningRate 0.1530   Epoch: 11   Global Step: 61800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:29:07,664-Speed 10494.34 samples/sec   Loss 6.4016   LearningRate 0.1529   Epoch: 11   Global Step: 61810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:29:15,452-Speed 10519.71 samples/sec   Loss 6.4280   LearningRate 0.1528   Epoch: 11   Global Step: 61820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:29:23,252-Speed 10503.95 samples/sec   Loss 6.4171   LearningRate 0.1527   Epoch: 11   Global Step: 61830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:29:31,058-Speed 10496.84 samples/sec   Loss 6.3862   LearningRate 0.1527   Epoch: 11   Global Step: 61840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:29:38,851-Speed 10513.58 samples/sec   Loss 6.4089   LearningRate 0.1526   Epoch: 11   Global Step: 61850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:29:46,628-Speed 10534.87 samples/sec   Loss 6.3945   LearningRate 0.1525   Epoch: 11   Global Step: 61860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:29:54,430-Speed 10501.32 samples/sec   Loss 6.4085   LearningRate 0.1525   Epoch: 11   Global Step: 61870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:30:02,257-Speed 10467.07 samples/sec   Loss 6.3815   LearningRate 0.1524   Epoch: 11   Global Step: 61880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:30:10,061-Speed 10499.29 samples/sec   Loss 6.4082   LearningRate 0.1523   Epoch: 11   Global Step: 61890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:30:17,842-Speed 10530.24 samples/sec   Loss 6.3631   LearningRate 0.1522   Epoch: 11   Global Step: 61900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:30:25,624-Speed 10527.47 samples/sec   Loss 6.3549   LearningRate 0.1522   Epoch: 11   Global Step: 61910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:30:33,434-Speed 10491.45 samples/sec   Loss 6.3860   LearningRate 0.1521   Epoch: 11   Global Step: 61920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:30:41,238-Speed 10498.79 samples/sec   Loss 6.3885   LearningRate 0.1520   Epoch: 11   Global Step: 61930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:30:49,036-Speed 10507.28 samples/sec   Loss 6.3606   LearningRate 0.1519   Epoch: 11   Global Step: 61940   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-01-16 04:30:56,837-Speed 10503.65 samples/sec   Loss 6.3753   LearningRate 0.1519   Epoch: 11   Global Step: 61950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:31:04,631-Speed 10511.34 samples/sec   Loss 6.3153   LearningRate 0.1518   Epoch: 11   Global Step: 61960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:31:12,429-Speed 10507.32 samples/sec   Loss 6.3582   LearningRate 0.1517   Epoch: 11   Global Step: 61970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:31:20,235-Speed 10495.25 samples/sec   Loss 6.3597   LearningRate 0.1517   Epoch: 11   Global Step: 61980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:31:28,057-Speed 10478.15 samples/sec   Loss 6.3702   LearningRate 0.1516   Epoch: 11   Global Step: 61990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:31:35,869-Speed 10487.79 samples/sec   Loss 6.3922   LearningRate 0.1515   Epoch: 11   Global Step: 62000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:31:43,667-Speed 10512.08 samples/sec   Loss 6.3377   LearningRate 0.1514   Epoch: 11   Global Step: 62010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:31:51,453-Speed 10523.86 samples/sec   Loss 6.3650   LearningRate 0.1514   Epoch: 11   Global Step: 62020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:31:59,231-Speed 10534.00 samples/sec   Loss 6.4043   LearningRate 0.1513   Epoch: 11   Global Step: 62030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:32:07,023-Speed 10514.37 samples/sec   Loss 6.4297   LearningRate 0.1512   Epoch: 11   Global Step: 62040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:32:14,817-Speed 10511.98 samples/sec   Loss 6.3594   LearningRate 0.1511   Epoch: 11   Global Step: 62050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:32:22,602-Speed 10524.88 samples/sec   Loss 6.3739   LearningRate 0.1511   Epoch: 11   Global Step: 62060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:32:30,396-Speed 10512.34 samples/sec   Loss 6.3790   LearningRate 0.1510   Epoch: 11   Global Step: 62070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:32:38,201-Speed 10496.45 samples/sec   Loss 6.3344   LearningRate 0.1509   Epoch: 11   Global Step: 62080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:32:45,990-Speed 10519.54 samples/sec   Loss 6.3528   LearningRate 0.1509   Epoch: 11   Global Step: 62090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:32:53,777-Speed 10520.94 samples/sec   Loss 6.4105   LearningRate 0.1508   Epoch: 11   Global Step: 62100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:33:01,571-Speed 10512.07 samples/sec   Loss 6.3642   LearningRate 0.1507   Epoch: 11   Global Step: 62110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:33:09,372-Speed 10503.22 samples/sec   Loss 6.4091   LearningRate 0.1506   Epoch: 11   Global Step: 62120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:33:17,197-Speed 10469.84 samples/sec   Loss 6.3552   LearningRate 0.1506   Epoch: 11   Global Step: 62130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:33:25,094-Speed 10375.95 samples/sec   Loss 6.3828   LearningRate 0.1505   Epoch: 11   Global Step: 62140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:33:32,902-Speed 10492.34 samples/sec   Loss 6.3804   LearningRate 0.1504   Epoch: 11   Global Step: 62150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:33:40,710-Speed 10495.38 samples/sec   Loss 6.3924   LearningRate 0.1503   Epoch: 11   Global Step: 62160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:33:48,505-Speed 10509.73 samples/sec   Loss 6.3772   LearningRate 0.1503   Epoch: 11   Global Step: 62170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:33:56,317-Speed 10488.84 samples/sec   Loss 6.3149   LearningRate 0.1502   Epoch: 11   Global Step: 62180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:34:04,124-Speed 10493.50 samples/sec   Loss 6.3852   LearningRate 0.1501   Epoch: 11   Global Step: 62190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:34:11,952-Speed 10467.00 samples/sec   Loss 6.3338   LearningRate 0.1501   Epoch: 11   Global Step: 62200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:34:19,767-Speed 10483.39 samples/sec   Loss 6.3641   LearningRate 0.1500   Epoch: 11   Global Step: 62210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:34:42,690-Speed 3573.90 samples/sec   Loss 6.4050   LearningRate 0.1499   Epoch: 12   Global Step: 62220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:34:50,497-Speed 10494.53 samples/sec   Loss 6.3650   LearningRate 0.1498   Epoch: 12   Global Step: 62230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:34:58,294-Speed 10508.16 samples/sec   Loss 6.3505   LearningRate 0.1498   Epoch: 12   Global Step: 62240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:35:06,160-Speed 10414.97 samples/sec   Loss 6.3559   LearningRate 0.1497   Epoch: 12   Global Step: 62250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:35:13,975-Speed 10484.30 samples/sec   Loss 6.3614   LearningRate 0.1496   Epoch: 12   Global Step: 62260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:35:21,758-Speed 10525.94 samples/sec   Loss 6.3445   LearningRate 0.1496   Epoch: 12   Global Step: 62270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:35:29,541-Speed 10527.70 samples/sec   Loss 6.3362   LearningRate 0.1495   Epoch: 12   Global Step: 62280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:35:37,320-Speed 10532.59 samples/sec   Loss 6.3327   LearningRate 0.1494   Epoch: 12   Global Step: 62290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:35:45,098-Speed 10533.73 samples/sec   Loss 6.3103   LearningRate 0.1493   Epoch: 12   Global Step: 62300   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-01-16 04:35:52,922-Speed 10471.25 samples/sec   Loss 6.3202   LearningRate 0.1493   Epoch: 12   Global Step: 62310   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-01-16 04:36:00,732-Speed 10491.60 samples/sec   Loss 6.2826   LearningRate 0.1492   Epoch: 12   Global Step: 62320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:36:08,574-Speed 10447.96 samples/sec   Loss 6.3023   LearningRate 0.1491   Epoch: 12   Global Step: 62330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:36:16,379-Speed 10497.27 samples/sec   Loss 6.3563   LearningRate 0.1490   Epoch: 12   Global Step: 62340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:36:24,191-Speed 10488.75 samples/sec   Loss 6.2942   LearningRate 0.1490   Epoch: 12   Global Step: 62350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:36:31,990-Speed 10505.31 samples/sec   Loss 6.3123   LearningRate 0.1489   Epoch: 12   Global Step: 62360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:36:39,779-Speed 10518.91 samples/sec   Loss 6.3126   LearningRate 0.1488   Epoch: 12   Global Step: 62370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:36:47,559-Speed 10530.83 samples/sec   Loss 6.3167   LearningRate 0.1488   Epoch: 12   Global Step: 62380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:36:55,360-Speed 10502.89 samples/sec   Loss 6.2817   LearningRate 0.1487   Epoch: 12   Global Step: 62390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:37:03,163-Speed 10499.70 samples/sec   Loss 6.3263   LearningRate 0.1486   Epoch: 12   Global Step: 62400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:37:10,990-Speed 10469.20 samples/sec   Loss 6.3237   LearningRate 0.1485   Epoch: 12   Global Step: 62410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:37:18,776-Speed 10522.76 samples/sec   Loss 6.3102   LearningRate 0.1485   Epoch: 12   Global Step: 62420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:37:26,586-Speed 10489.43 samples/sec   Loss 6.3012   LearningRate 0.1484   Epoch: 12   Global Step: 62430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:37:34,371-Speed 10524.66 samples/sec   Loss 6.2980   LearningRate 0.1483   Epoch: 12   Global Step: 62440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:37:42,166-Speed 10514.73 samples/sec   Loss 6.3183   LearningRate 0.1483   Epoch: 12   Global Step: 62450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:37:49,973-Speed 10494.04 samples/sec   Loss 6.3005   LearningRate 0.1482   Epoch: 12   Global Step: 62460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:37:57,776-Speed 10499.14 samples/sec   Loss 6.2830   LearningRate 0.1481   Epoch: 12   Global Step: 62470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:38:05,565-Speed 10520.51 samples/sec   Loss 6.3281   LearningRate 0.1480   Epoch: 12   Global Step: 62480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:38:13,359-Speed 10513.97 samples/sec   Loss 6.2731   LearningRate 0.1480   Epoch: 12   Global Step: 62490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:38:21,160-Speed 10502.36 samples/sec   Loss 6.3311   LearningRate 0.1479   Epoch: 12   Global Step: 62500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:38:28,972-Speed 10486.99 samples/sec   Loss 6.3207   LearningRate 0.1478   Epoch: 12   Global Step: 62510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:38:36,757-Speed 10526.09 samples/sec   Loss 6.2902   LearningRate 0.1478   Epoch: 12   Global Step: 62520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:38:44,572-Speed 10483.50 samples/sec   Loss 6.3002   LearningRate 0.1477   Epoch: 12   Global Step: 62530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:38:52,356-Speed 10525.97 samples/sec   Loss 6.3025   LearningRate 0.1476   Epoch: 12   Global Step: 62540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:39:00,175-Speed 10478.30 samples/sec   Loss 6.3759   LearningRate 0.1475   Epoch: 12   Global Step: 62550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:39:07,980-Speed 10498.13 samples/sec   Loss 6.2805   LearningRate 0.1475   Epoch: 12   Global Step: 62560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:39:15,786-Speed 10495.84 samples/sec   Loss 6.2806   LearningRate 0.1474   Epoch: 12   Global Step: 62570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:39:23,614-Speed 10465.87 samples/sec   Loss 6.2548   LearningRate 0.1473   Epoch: 12   Global Step: 62580   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-01-16 04:39:31,410-Speed 10508.98 samples/sec   Loss 6.2771   LearningRate 0.1472   Epoch: 12   Global Step: 62590   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-01-16 04:39:39,216-Speed 10497.30 samples/sec   Loss 6.2633   LearningRate 0.1472   Epoch: 12   Global Step: 62600   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-01-16 04:39:47,007-Speed 10515.49 samples/sec   Loss 6.2926   LearningRate 0.1471   Epoch: 12   Global Step: 62610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:39:54,830-Speed 10473.08 samples/sec   Loss 6.2955   LearningRate 0.1470   Epoch: 12   Global Step: 62620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:40:02,662-Speed 10461.29 samples/sec   Loss 6.2936   LearningRate 0.1470   Epoch: 12   Global Step: 62630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:40:10,523-Speed 10428.60 samples/sec   Loss 6.2713   LearningRate 0.1469   Epoch: 12   Global Step: 62640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:40:18,329-Speed 10497.34 samples/sec   Loss 6.3066   LearningRate 0.1468   Epoch: 12   Global Step: 62650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:40:26,154-Speed 10470.10 samples/sec   Loss 6.3096   LearningRate 0.1467   Epoch: 12   Global Step: 62660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:40:33,981-Speed 10470.31 samples/sec   Loss 6.2970   LearningRate 0.1467   Epoch: 12   Global Step: 62670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:40:41,827-Speed 10442.44 samples/sec   Loss 6.2683   LearningRate 0.1466   Epoch: 12   Global Step: 62680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:40:49,683-Speed 10428.55 samples/sec   Loss 6.2270   LearningRate 0.1465   Epoch: 12   Global Step: 62690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:40:57,532-Speed 10439.47 samples/sec   Loss 6.2574   LearningRate 0.1465   Epoch: 12   Global Step: 62700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:41:05,373-Speed 10449.87 samples/sec   Loss 6.2772   LearningRate 0.1464   Epoch: 12   Global Step: 62710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:41:13,184-Speed 10488.51 samples/sec   Loss 6.2903   LearningRate 0.1463   Epoch: 12   Global Step: 62720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:41:21,009-Speed 10470.69 samples/sec   Loss 6.3178   LearningRate 0.1462   Epoch: 12   Global Step: 62730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:41:28,824-Speed 10483.87 samples/sec   Loss 6.2894   LearningRate 0.1462   Epoch: 12   Global Step: 62740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:41:36,658-Speed 10457.82 samples/sec   Loss 6.2888   LearningRate 0.1461   Epoch: 12   Global Step: 62750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:41:44,497-Speed 10452.61 samples/sec   Loss 6.2740   LearningRate 0.1460   Epoch: 12   Global Step: 62760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:41:52,331-Speed 10457.57 samples/sec   Loss 6.2803   LearningRate 0.1460   Epoch: 12   Global Step: 62770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:42:00,176-Speed 10444.37 samples/sec   Loss 6.2757   LearningRate 0.1459   Epoch: 12   Global Step: 62780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:42:08,005-Speed 10463.79 samples/sec   Loss 6.2642   LearningRate 0.1458   Epoch: 12   Global Step: 62790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:42:15,830-Speed 10470.97 samples/sec   Loss 6.2033   LearningRate 0.1457   Epoch: 12   Global Step: 62800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:42:23,666-Speed 10455.26 samples/sec   Loss 6.2467   LearningRate 0.1457   Epoch: 12   Global Step: 62810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:42:31,504-Speed 10453.43 samples/sec   Loss 6.2711   LearningRate 0.1456   Epoch: 12   Global Step: 62820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:42:39,392-Speed 10386.49 samples/sec   Loss 6.2235   LearningRate 0.1455   Epoch: 12   Global Step: 62830   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-01-16 04:42:47,258-Speed 10415.81 samples/sec   Loss 6.2386   LearningRate 0.1455   Epoch: 12   Global Step: 62840   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-01-16 04:42:55,095-Speed 10454.87 samples/sec   Loss 6.2830   LearningRate 0.1454   Epoch: 12   Global Step: 62850   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-01-16 04:43:02,934-Speed 10450.45 samples/sec   Loss 6.3175   LearningRate 0.1453   Epoch: 12   Global Step: 62860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:43:10,784-Speed 10437.60 samples/sec   Loss 6.2526   LearningRate 0.1452   Epoch: 12   Global Step: 62870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:43:18,621-Speed 10454.88 samples/sec   Loss 6.2316   LearningRate 0.1452   Epoch: 12   Global Step: 62880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:43:26,446-Speed 10469.79 samples/sec   Loss 6.2482   LearningRate 0.1451   Epoch: 12   Global Step: 62890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:43:34,307-Speed 10422.03 samples/sec   Loss 6.2281   LearningRate 0.1450   Epoch: 12   Global Step: 62900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:43:42,115-Speed 10494.05 samples/sec   Loss 6.2509   LearningRate 0.1450   Epoch: 12   Global Step: 62910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:43:49,953-Speed 10453.28 samples/sec   Loss 6.2622   LearningRate 0.1449   Epoch: 12   Global Step: 62920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:43:57,752-Speed 10505.30 samples/sec   Loss 6.2246   LearningRate 0.1448   Epoch: 12   Global Step: 62930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:44:05,556-Speed 10498.99 samples/sec   Loss 6.2273   LearningRate 0.1448   Epoch: 12   Global Step: 62940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:44:13,333-Speed 10534.98 samples/sec   Loss 6.2745   LearningRate 0.1447   Epoch: 12   Global Step: 62950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:44:21,147-Speed 10485.47 samples/sec   Loss 6.2822   LearningRate 0.1446   Epoch: 12   Global Step: 62960   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-01-16 04:44:28,945-Speed 10506.24 samples/sec   Loss 6.2118   LearningRate 0.1445   Epoch: 12   Global Step: 62970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:44:36,765-Speed 10476.58 samples/sec   Loss 6.2637   LearningRate 0.1445   Epoch: 12   Global Step: 62980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:44:44,583-Speed 10480.66 samples/sec   Loss 6.2823   LearningRate 0.1444   Epoch: 12   Global Step: 62990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:44:52,389-Speed 10496.23 samples/sec   Loss 6.2111   LearningRate 0.1443   Epoch: 12   Global Step: 63000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:45:00,197-Speed 10491.96 samples/sec   Loss 6.2363   LearningRate 0.1443   Epoch: 12   Global Step: 63010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:45:08,013-Speed 10482.14 samples/sec   Loss 6.1829   LearningRate 0.1442   Epoch: 12   Global Step: 63020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:45:15,811-Speed 10508.39 samples/sec   Loss 6.2250   LearningRate 0.1441   Epoch: 12   Global Step: 63030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:45:23,594-Speed 10525.77 samples/sec   Loss 6.2585   LearningRate 0.1440   Epoch: 12   Global Step: 63040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:45:31,386-Speed 10514.42 samples/sec   Loss 6.1986   LearningRate 0.1440   Epoch: 12   Global Step: 63050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:45:39,186-Speed 10503.78 samples/sec   Loss 6.1936   LearningRate 0.1439   Epoch: 12   Global Step: 63060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:45:46,999-Speed 10489.33 samples/sec   Loss 6.1970   LearningRate 0.1438   Epoch: 12   Global Step: 63070   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-01-16 04:45:54,871-Speed 10407.30 samples/sec   Loss 6.2010   LearningRate 0.1438   Epoch: 12   Global Step: 63080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:46:02,661-Speed 10517.70 samples/sec   Loss 6.2527   LearningRate 0.1437   Epoch: 12   Global Step: 63090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:46:10,426-Speed 10551.78 samples/sec   Loss 6.2123   LearningRate 0.1436   Epoch: 12   Global Step: 63100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:46:18,236-Speed 10489.40 samples/sec   Loss 6.2054   LearningRate 0.1435   Epoch: 12   Global Step: 63110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:46:26,038-Speed 10502.45 samples/sec   Loss 6.2097   LearningRate 0.1435   Epoch: 12   Global Step: 63120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:46:33,822-Speed 10525.49 samples/sec   Loss 6.2475   LearningRate 0.1434   Epoch: 12   Global Step: 63130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:46:41,632-Speed 10490.56 samples/sec   Loss 6.2481   LearningRate 0.1433   Epoch: 12   Global Step: 63140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:46:49,440-Speed 10492.65 samples/sec   Loss 6.2092   LearningRate 0.1433   Epoch: 12   Global Step: 63150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:46:57,232-Speed 10515.22 samples/sec   Loss 6.2260   LearningRate 0.1432   Epoch: 12   Global Step: 63160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:47:05,013-Speed 10531.79 samples/sec   Loss 6.2081   LearningRate 0.1431   Epoch: 12   Global Step: 63170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:47:12,802-Speed 10521.42 samples/sec   Loss 6.1642   LearningRate 0.1431   Epoch: 12   Global Step: 63180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:47:20,588-Speed 10523.32 samples/sec   Loss 6.2307   LearningRate 0.1430   Epoch: 12   Global Step: 63190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:47:28,398-Speed 10490.78 samples/sec   Loss 6.1456   LearningRate 0.1429   Epoch: 12   Global Step: 63200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:47:36,210-Speed 10487.24 samples/sec   Loss 6.2360   LearningRate 0.1428   Epoch: 12   Global Step: 63210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:47:44,139-Speed 10332.40 samples/sec   Loss 6.2192   LearningRate 0.1428   Epoch: 12   Global Step: 63220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:47:51,935-Speed 10510.52 samples/sec   Loss 6.2080   LearningRate 0.1427   Epoch: 12   Global Step: 63230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:47:59,714-Speed 10530.84 samples/sec   Loss 6.1638   LearningRate 0.1426   Epoch: 12   Global Step: 63240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:48:07,515-Speed 10503.14 samples/sec   Loss 6.1705   LearningRate 0.1426   Epoch: 12   Global Step: 63250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:48:15,300-Speed 10523.89 samples/sec   Loss 6.1753   LearningRate 0.1425   Epoch: 12   Global Step: 63260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:48:23,086-Speed 10523.59 samples/sec   Loss 6.2106   LearningRate 0.1424   Epoch: 12   Global Step: 63270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:48:30,884-Speed 10505.94 samples/sec   Loss 6.2243   LearningRate 0.1423   Epoch: 12   Global Step: 63280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:48:38,694-Speed 10490.73 samples/sec   Loss 6.1868   LearningRate 0.1423   Epoch: 12   Global Step: 63290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:48:46,478-Speed 10526.24 samples/sec   Loss 6.1886   LearningRate 0.1422   Epoch: 12   Global Step: 63300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:48:54,255-Speed 10537.64 samples/sec   Loss 6.2130   LearningRate 0.1421   Epoch: 12   Global Step: 63310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:49:02,084-Speed 10464.92 samples/sec   Loss 6.1962   LearningRate 0.1421   Epoch: 12   Global Step: 63320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:49:09,894-Speed 10489.27 samples/sec   Loss 6.2096   LearningRate 0.1420   Epoch: 12   Global Step: 63330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:49:17,703-Speed 10492.00 samples/sec   Loss 6.1329   LearningRate 0.1419   Epoch: 12   Global Step: 63340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:49:25,521-Speed 10480.73 samples/sec   Loss 6.1640   LearningRate 0.1419   Epoch: 12   Global Step: 63350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:49:33,305-Speed 10525.00 samples/sec   Loss 6.2506   LearningRate 0.1418   Epoch: 12   Global Step: 63360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:49:41,117-Speed 10488.35 samples/sec   Loss 6.1684   LearningRate 0.1417   Epoch: 12   Global Step: 63370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:49:48,910-Speed 10513.00 samples/sec   Loss 6.1979   LearningRate 0.1416   Epoch: 12   Global Step: 63380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:49:56,688-Speed 10533.97 samples/sec   Loss 6.1906   LearningRate 0.1416   Epoch: 12   Global Step: 63390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:50:04,473-Speed 10524.65 samples/sec   Loss 6.2121   LearningRate 0.1415   Epoch: 12   Global Step: 63400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:50:12,242-Speed 10545.87 samples/sec   Loss 6.1724   LearningRate 0.1414   Epoch: 12   Global Step: 63410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:50:20,036-Speed 10511.42 samples/sec   Loss 6.1894   LearningRate 0.1414   Epoch: 12   Global Step: 63420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:50:27,807-Speed 10543.28 samples/sec   Loss 6.1814   LearningRate 0.1413   Epoch: 12   Global Step: 63430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:50:35,584-Speed 10535.15 samples/sec   Loss 6.2155   LearningRate 0.1412   Epoch: 12   Global Step: 63440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:50:43,398-Speed 10484.28 samples/sec   Loss 6.2167   LearningRate 0.1412   Epoch: 12   Global Step: 63450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:50:51,191-Speed 10514.60 samples/sec   Loss 6.1460   LearningRate 0.1411   Epoch: 12   Global Step: 63460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:50:59,004-Speed 10486.56 samples/sec   Loss 6.1414   LearningRate 0.1410   Epoch: 12   Global Step: 63470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:51:06,816-Speed 10487.46 samples/sec   Loss 6.1797   LearningRate 0.1409   Epoch: 12   Global Step: 63480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:51:14,612-Speed 10509.88 samples/sec   Loss 6.1418   LearningRate 0.1409   Epoch: 12   Global Step: 63490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:51:22,381-Speed 10545.83 samples/sec   Loss 6.1628   LearningRate 0.1408   Epoch: 12   Global Step: 63500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:51:30,172-Speed 10515.74 samples/sec   Loss 6.1574   LearningRate 0.1407   Epoch: 12   Global Step: 63510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:51:37,960-Speed 10520.36 samples/sec   Loss 6.1872   LearningRate 0.1407   Epoch: 12   Global Step: 63520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:51:45,760-Speed 10504.10 samples/sec   Loss 6.1596   LearningRate 0.1406   Epoch: 12   Global Step: 63530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:51:53,554-Speed 10512.36 samples/sec   Loss 6.1895   LearningRate 0.1405   Epoch: 12   Global Step: 63540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:52:01,381-Speed 10468.24 samples/sec   Loss 6.1835   LearningRate 0.1404   Epoch: 12   Global Step: 63550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:52:09,178-Speed 10507.78 samples/sec   Loss 6.1553   LearningRate 0.1404   Epoch: 12   Global Step: 63560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:52:16,987-Speed 10491.91 samples/sec   Loss 6.1232   LearningRate 0.1403   Epoch: 12   Global Step: 63570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:52:24,800-Speed 10486.23 samples/sec   Loss 6.1247   LearningRate 0.1402   Epoch: 12   Global Step: 63580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:52:32,583-Speed 10526.83 samples/sec   Loss 6.1554   LearningRate 0.1402   Epoch: 12   Global Step: 63590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:52:40,396-Speed 10487.92 samples/sec   Loss 6.1185   LearningRate 0.1401   Epoch: 12   Global Step: 63600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:52:48,243-Speed 10440.61 samples/sec   Loss 6.1149   LearningRate 0.1400   Epoch: 12   Global Step: 63610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:52:56,019-Speed 10536.68 samples/sec   Loss 6.1751   LearningRate 0.1400   Epoch: 12   Global Step: 63620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:53:03,847-Speed 10470.18 samples/sec   Loss 6.1740   LearningRate 0.1399   Epoch: 12   Global Step: 63630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:53:11,647-Speed 10504.52 samples/sec   Loss 6.1875   LearningRate 0.1398   Epoch: 12   Global Step: 63640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:53:19,415-Speed 10547.92 samples/sec   Loss 6.2081   LearningRate 0.1398   Epoch: 12   Global Step: 63650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:53:27,212-Speed 10507.81 samples/sec   Loss 6.1490   LearningRate 0.1397   Epoch: 12   Global Step: 63660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:53:34,993-Speed 10530.32 samples/sec   Loss 6.1446   LearningRate 0.1396   Epoch: 12   Global Step: 63670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:53:42,802-Speed 10492.63 samples/sec   Loss 6.1779   LearningRate 0.1395   Epoch: 12   Global Step: 63680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:53:50,578-Speed 10535.40 samples/sec   Loss 6.1315   LearningRate 0.1395   Epoch: 12   Global Step: 63690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:53:58,368-Speed 10520.46 samples/sec   Loss 6.1400   LearningRate 0.1394   Epoch: 12   Global Step: 63700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:54:06,176-Speed 10493.42 samples/sec   Loss 6.1188   LearningRate 0.1393   Epoch: 12   Global Step: 63710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:54:13,948-Speed 10542.12 samples/sec   Loss 6.1115   LearningRate 0.1393   Epoch: 12   Global Step: 63720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:54:21,733-Speed 10524.13 samples/sec   Loss 6.1872   LearningRate 0.1392   Epoch: 12   Global Step: 63730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:54:29,512-Speed 10533.71 samples/sec   Loss 6.1021   LearningRate 0.1391   Epoch: 12   Global Step: 63740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:54:37,344-Speed 10460.71 samples/sec   Loss 6.1091   LearningRate 0.1391   Epoch: 12   Global Step: 63750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:54:45,141-Speed 10509.11 samples/sec   Loss 6.1449   LearningRate 0.1390   Epoch: 12   Global Step: 63760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:54:52,929-Speed 10520.03 samples/sec   Loss 6.1653   LearningRate 0.1389   Epoch: 12   Global Step: 63770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:55:00,753-Speed 10470.85 samples/sec   Loss 6.1389   LearningRate 0.1388   Epoch: 12   Global Step: 63780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:55:08,570-Speed 10481.89 samples/sec   Loss 6.1301   LearningRate 0.1388   Epoch: 12   Global Step: 63790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:55:16,425-Speed 10430.38 samples/sec   Loss 6.1609   LearningRate 0.1387   Epoch: 12   Global Step: 63800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:55:24,214-Speed 10517.78 samples/sec   Loss 6.1396   LearningRate 0.1386   Epoch: 12   Global Step: 63810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:55:32,009-Speed 10511.60 samples/sec   Loss 6.1374   LearningRate 0.1386   Epoch: 12   Global Step: 63820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:55:39,803-Speed 10511.17 samples/sec   Loss 6.1205   LearningRate 0.1385   Epoch: 12   Global Step: 63830   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-01-16 04:55:47,602-Speed 10505.74 samples/sec   Loss 6.1224   LearningRate 0.1384   Epoch: 12   Global Step: 63840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:55:55,424-Speed 10481.19 samples/sec   Loss 6.1392   LearningRate 0.1384   Epoch: 12   Global Step: 63850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:56:03,212-Speed 10519.63 samples/sec   Loss 6.1266   LearningRate 0.1383   Epoch: 12   Global Step: 63860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:56:11,049-Speed 10454.03 samples/sec   Loss 6.1209   LearningRate 0.1382   Epoch: 12   Global Step: 63870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:56:18,908-Speed 10425.44 samples/sec   Loss 6.1032   LearningRate 0.1381   Epoch: 12   Global Step: 63880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:56:26,688-Speed 10531.24 samples/sec   Loss 6.0810   LearningRate 0.1381   Epoch: 12   Global Step: 63890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:56:34,467-Speed 10532.32 samples/sec   Loss 6.1386   LearningRate 0.1380   Epoch: 12   Global Step: 63900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:56:42,256-Speed 10518.44 samples/sec   Loss 6.0669   LearningRate 0.1379   Epoch: 12   Global Step: 63910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:56:50,028-Speed 10543.25 samples/sec   Loss 6.0648   LearningRate 0.1379   Epoch: 12   Global Step: 63920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:56:57,833-Speed 10497.35 samples/sec   Loss 6.1099   LearningRate 0.1378   Epoch: 12   Global Step: 63930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:57:05,636-Speed 10500.70 samples/sec   Loss 6.0695   LearningRate 0.1377   Epoch: 12   Global Step: 63940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:57:13,416-Speed 10530.98 samples/sec   Loss 6.1316   LearningRate 0.1377   Epoch: 12   Global Step: 63950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:57:21,263-Speed 10441.20 samples/sec   Loss 6.1365   LearningRate 0.1376   Epoch: 12   Global Step: 63960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:57:29,050-Speed 10520.63 samples/sec   Loss 6.1020   LearningRate 0.1375   Epoch: 12   Global Step: 63970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:57:36,845-Speed 10510.98 samples/sec   Loss 6.1292   LearningRate 0.1375   Epoch: 12   Global Step: 63980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:57:44,663-Speed 10480.37 samples/sec   Loss 6.0918   LearningRate 0.1374   Epoch: 12   Global Step: 63990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:57:52,453-Speed 10516.73 samples/sec   Loss 6.0977   LearningRate 0.1373   Epoch: 12   Global Step: 64000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:58:00,245-Speed 10514.57 samples/sec   Loss 6.1059   LearningRate 0.1372   Epoch: 12   Global Step: 64010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:58:08,070-Speed 10471.25 samples/sec   Loss 6.0809   LearningRate 0.1372   Epoch: 12   Global Step: 64020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:58:15,855-Speed 10523.26 samples/sec   Loss 6.0827   LearningRate 0.1371   Epoch: 12   Global Step: 64030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:58:23,642-Speed 10522.05 samples/sec   Loss 6.0850   LearningRate 0.1370   Epoch: 12   Global Step: 64040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:58:31,486-Speed 10445.13 samples/sec   Loss 6.1067   LearningRate 0.1370   Epoch: 12   Global Step: 64050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:58:39,325-Speed 10451.03 samples/sec   Loss 6.0893   LearningRate 0.1369   Epoch: 12   Global Step: 64060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:58:47,123-Speed 10507.93 samples/sec   Loss 6.0949   LearningRate 0.1368   Epoch: 12   Global Step: 64070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:58:54,918-Speed 10510.61 samples/sec   Loss 6.0908   LearningRate 0.1368   Epoch: 12   Global Step: 64080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:59:02,743-Speed 10471.24 samples/sec   Loss 6.0826   LearningRate 0.1367   Epoch: 12   Global Step: 64090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:59:10,559-Speed 10482.33 samples/sec   Loss 6.1038   LearningRate 0.1366   Epoch: 12   Global Step: 64100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:59:18,353-Speed 10511.87 samples/sec   Loss 6.0823   LearningRate 0.1366   Epoch: 12   Global Step: 64110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:59:26,173-Speed 10477.71 samples/sec   Loss 6.0955   LearningRate 0.1365   Epoch: 12   Global Step: 64120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:59:33,979-Speed 10496.39 samples/sec   Loss 6.0912   LearningRate 0.1364   Epoch: 12   Global Step: 64130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:59:41,781-Speed 10500.26 samples/sec   Loss 6.0844   LearningRate 0.1363   Epoch: 12   Global Step: 64140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 04:59:49,569-Speed 10519.87 samples/sec   Loss 6.0857   LearningRate 0.1363   Epoch: 12   Global Step: 64150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 04:59:57,341-Speed 10542.50 samples/sec   Loss 6.0514   LearningRate 0.1362   Epoch: 12   Global Step: 64160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:00:05,120-Speed 10533.78 samples/sec   Loss 6.0375   LearningRate 0.1361   Epoch: 12   Global Step: 64170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:00:12,909-Speed 10517.32 samples/sec   Loss 6.0705   LearningRate 0.1361   Epoch: 12   Global Step: 64180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:00:20,722-Speed 10486.70 samples/sec   Loss 6.0105   LearningRate 0.1360   Epoch: 12   Global Step: 64190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:00:28,515-Speed 10513.44 samples/sec   Loss 6.0717   LearningRate 0.1359   Epoch: 12   Global Step: 64200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:00:36,332-Speed 10483.01 samples/sec   Loss 6.0596   LearningRate 0.1359   Epoch: 12   Global Step: 64210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:00:44,125-Speed 10512.76 samples/sec   Loss 6.0578   LearningRate 0.1358   Epoch: 12   Global Step: 64220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:00:51,915-Speed 10517.10 samples/sec   Loss 6.0119   LearningRate 0.1357   Epoch: 12   Global Step: 64230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:00:59,698-Speed 10526.99 samples/sec   Loss 6.0835   LearningRate 0.1357   Epoch: 12   Global Step: 64240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:01:07,479-Speed 10529.67 samples/sec   Loss 6.0380   LearningRate 0.1356   Epoch: 12   Global Step: 64250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:01:15,272-Speed 10514.11 samples/sec   Loss 6.1047   LearningRate 0.1355   Epoch: 12   Global Step: 64260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:01:23,101-Speed 10463.47 samples/sec   Loss 6.0451   LearningRate 0.1355   Epoch: 12   Global Step: 64270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:01:30,905-Speed 10499.55 samples/sec   Loss 6.0857   LearningRate 0.1354   Epoch: 12   Global Step: 64280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:01:38,698-Speed 10512.67 samples/sec   Loss 6.0275   LearningRate 0.1353   Epoch: 12   Global Step: 64290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:01:46,485-Speed 10521.78 samples/sec   Loss 6.0298   LearningRate 0.1352   Epoch: 12   Global Step: 64300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:01:54,282-Speed 10508.39 samples/sec   Loss 6.0550   LearningRate 0.1352   Epoch: 12   Global Step: 64310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:02:02,124-Speed 10447.80 samples/sec   Loss 6.0070   LearningRate 0.1351   Epoch: 12   Global Step: 64320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:02:09,928-Speed 10498.25 samples/sec   Loss 6.0489   LearningRate 0.1350   Epoch: 12   Global Step: 64330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:02:17,739-Speed 10489.53 samples/sec   Loss 6.0755   LearningRate 0.1350   Epoch: 12   Global Step: 64340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:02:25,520-Speed 10529.89 samples/sec   Loss 6.0622   LearningRate 0.1349   Epoch: 12   Global Step: 64350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:02:33,305-Speed 10523.86 samples/sec   Loss 6.0592   LearningRate 0.1348   Epoch: 12   Global Step: 64360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:02:41,123-Speed 10478.99 samples/sec   Loss 6.0032   LearningRate 0.1348   Epoch: 12   Global Step: 64370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:02:48,929-Speed 10495.93 samples/sec   Loss 5.9985   LearningRate 0.1347   Epoch: 12   Global Step: 64380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:02:56,744-Speed 10484.98 samples/sec   Loss 6.0601   LearningRate 0.1346   Epoch: 12   Global Step: 64390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:03:04,554-Speed 10491.36 samples/sec   Loss 6.0438   LearningRate 0.1346   Epoch: 12   Global Step: 64400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:03:12,364-Speed 10490.81 samples/sec   Loss 6.0247   LearningRate 0.1345   Epoch: 12   Global Step: 64410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:03:20,154-Speed 10517.67 samples/sec   Loss 6.0475   LearningRate 0.1344   Epoch: 12   Global Step: 64420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:03:28,006-Speed 10435.97 samples/sec   Loss 6.0318   LearningRate 0.1344   Epoch: 12   Global Step: 64430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:03:35,797-Speed 10515.52 samples/sec   Loss 6.0451   LearningRate 0.1343   Epoch: 12   Global Step: 64440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:03:43,610-Speed 10490.60 samples/sec   Loss 6.0393   LearningRate 0.1342   Epoch: 12   Global Step: 64450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:03:51,390-Speed 10531.30 samples/sec   Loss 6.0488   LearningRate 0.1342   Epoch: 12   Global Step: 64460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:03:59,198-Speed 10493.44 samples/sec   Loss 6.0299   LearningRate 0.1341   Epoch: 12   Global Step: 64470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:04:06,987-Speed 10519.51 samples/sec   Loss 6.0530   LearningRate 0.1340   Epoch: 12   Global Step: 64480   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-01-16 05:04:14,773-Speed 10522.37 samples/sec   Loss 6.0033   LearningRate 0.1339   Epoch: 12   Global Step: 64490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:04:22,553-Speed 10534.30 samples/sec   Loss 6.0535   LearningRate 0.1339   Epoch: 12   Global Step: 64500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:04:30,368-Speed 10484.46 samples/sec   Loss 5.9896   LearningRate 0.1338   Epoch: 12   Global Step: 64510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:04:38,148-Speed 10530.78 samples/sec   Loss 6.0428   LearningRate 0.1337   Epoch: 12   Global Step: 64520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:04:45,927-Speed 10533.37 samples/sec   Loss 6.0391   LearningRate 0.1337   Epoch: 12   Global Step: 64530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:04:53,722-Speed 10510.27 samples/sec   Loss 6.0098   LearningRate 0.1336   Epoch: 12   Global Step: 64540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:05:01,514-Speed 10515.20 samples/sec   Loss 6.0183   LearningRate 0.1335   Epoch: 12   Global Step: 64550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:05:09,386-Speed 10406.62 samples/sec   Loss 6.0582   LearningRate 0.1335   Epoch: 12   Global Step: 64560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:05:17,171-Speed 10524.59 samples/sec   Loss 6.0561   LearningRate 0.1334   Epoch: 12   Global Step: 64570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:05:24,991-Speed 10477.12 samples/sec   Loss 6.0004   LearningRate 0.1333   Epoch: 12   Global Step: 64580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:05:32,759-Speed 10547.64 samples/sec   Loss 6.0507   LearningRate 0.1333   Epoch: 12   Global Step: 64590   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-01-16 05:05:40,550-Speed 10515.37 samples/sec   Loss 5.9773   LearningRate 0.1332   Epoch: 12   Global Step: 64600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:05:48,353-Speed 10500.40 samples/sec   Loss 6.0295   LearningRate 0.1331   Epoch: 12   Global Step: 64610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:05:56,137-Speed 10524.87 samples/sec   Loss 6.0029   LearningRate 0.1331   Epoch: 12   Global Step: 64620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:06:03,949-Speed 10488.40 samples/sec   Loss 6.0145   LearningRate 0.1330   Epoch: 12   Global Step: 64630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:06:11,734-Speed 10524.49 samples/sec   Loss 6.0100   LearningRate 0.1329   Epoch: 12   Global Step: 64640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:06:19,528-Speed 10513.43 samples/sec   Loss 5.9975   LearningRate 0.1329   Epoch: 12   Global Step: 64650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:06:27,322-Speed 10511.24 samples/sec   Loss 6.0047   LearningRate 0.1328   Epoch: 12   Global Step: 64660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:06:35,105-Speed 10527.61 samples/sec   Loss 6.0161   LearningRate 0.1327   Epoch: 12   Global Step: 64670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:06:42,936-Speed 10462.33 samples/sec   Loss 6.0165   LearningRate 0.1327   Epoch: 12   Global Step: 64680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:06:50,740-Speed 10497.96 samples/sec   Loss 6.0192   LearningRate 0.1326   Epoch: 12   Global Step: 64690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:06:58,544-Speed 10499.02 samples/sec   Loss 6.0114   LearningRate 0.1325   Epoch: 12   Global Step: 64700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:07:06,386-Speed 10447.74 samples/sec   Loss 6.0289   LearningRate 0.1324   Epoch: 12   Global Step: 64710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:07:14,175-Speed 10518.68 samples/sec   Loss 5.9980   LearningRate 0.1324   Epoch: 12   Global Step: 64720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:07:21,985-Speed 10490.12 samples/sec   Loss 5.9790   LearningRate 0.1323   Epoch: 12   Global Step: 64730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:07:29,779-Speed 10513.05 samples/sec   Loss 6.0014   LearningRate 0.1322   Epoch: 12   Global Step: 64740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:07:37,578-Speed 10505.07 samples/sec   Loss 6.0380   LearningRate 0.1322   Epoch: 12   Global Step: 64750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:07:45,363-Speed 10524.78 samples/sec   Loss 5.9734   LearningRate 0.1321   Epoch: 12   Global Step: 64760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:07:53,177-Speed 10485.06 samples/sec   Loss 5.9819   LearningRate 0.1320   Epoch: 12   Global Step: 64770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:08:00,968-Speed 10517.43 samples/sec   Loss 6.0399   LearningRate 0.1320   Epoch: 12   Global Step: 64780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:08:08,783-Speed 10483.88 samples/sec   Loss 6.0186   LearningRate 0.1319   Epoch: 12   Global Step: 64790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:08:16,572-Speed 10519.25 samples/sec   Loss 5.9583   LearningRate 0.1318   Epoch: 12   Global Step: 64800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:08:24,366-Speed 10512.66 samples/sec   Loss 6.0165   LearningRate 0.1318   Epoch: 12   Global Step: 64810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:08:32,164-Speed 10506.47 samples/sec   Loss 5.9474   LearningRate 0.1317   Epoch: 12   Global Step: 64820   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-01-16 05:08:39,966-Speed 10503.51 samples/sec   Loss 5.9685   LearningRate 0.1316   Epoch: 12   Global Step: 64830   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-01-16 05:08:47,748-Speed 10528.04 samples/sec   Loss 5.9828   LearningRate 0.1316   Epoch: 12   Global Step: 64840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:08:55,534-Speed 10522.90 samples/sec   Loss 5.9595   LearningRate 0.1315   Epoch: 12   Global Step: 64850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:09:03,370-Speed 10456.46 samples/sec   Loss 6.0058   LearningRate 0.1314   Epoch: 12   Global Step: 64860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:09:11,152-Speed 10528.59 samples/sec   Loss 5.9467   LearningRate 0.1314   Epoch: 12   Global Step: 64870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:09:18,928-Speed 10537.07 samples/sec   Loss 5.9577   LearningRate 0.1313   Epoch: 12   Global Step: 64880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:09:26,731-Speed 10499.89 samples/sec   Loss 5.9242   LearningRate 0.1312   Epoch: 12   Global Step: 64890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:09:34,514-Speed 10526.37 samples/sec   Loss 5.9989   LearningRate 0.1312   Epoch: 12   Global Step: 64900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:09:42,295-Speed 10529.92 samples/sec   Loss 6.0147   LearningRate 0.1311   Epoch: 12   Global Step: 64910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:09:50,102-Speed 10494.22 samples/sec   Loss 5.9469   LearningRate 0.1310   Epoch: 12   Global Step: 64920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:09:57,923-Speed 10475.98 samples/sec   Loss 5.9468   LearningRate 0.1310   Epoch: 12   Global Step: 64930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:10:05,703-Speed 10530.66 samples/sec   Loss 5.9966   LearningRate 0.1309   Epoch: 12   Global Step: 64940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:10:13,487-Speed 10525.38 samples/sec   Loss 5.9747   LearningRate 0.1308   Epoch: 12   Global Step: 64950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:10:21,267-Speed 10531.17 samples/sec   Loss 5.9772   LearningRate 0.1308   Epoch: 12   Global Step: 64960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-16 05:10:29,042-Speed 10537.42 samples/sec   Loss 5.9289   LearningRate 0.1307   Epoch: 12   Global Step: 64970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:10:36,837-Speed 10512.08 samples/sec   Loss 5.9258   LearningRate 0.1306   Epoch: 12   Global Step: 64980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:10:44,626-Speed 10518.81 samples/sec   Loss 5.9696   LearningRate 0.1306   Epoch: 12   Global Step: 64990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:10:52,425-Speed 10506.05 samples/sec   Loss 5.9915   LearningRate 0.1305   Epoch: 12   Global Step: 65000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:11:00,213-Speed 10519.53 samples/sec   Loss 6.0320   LearningRate 0.1304   Epoch: 12   Global Step: 65010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:11:07,998-Speed 10524.85 samples/sec   Loss 5.9960   LearningRate 0.1303   Epoch: 12   Global Step: 65020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:11:15,780-Speed 10528.02 samples/sec   Loss 5.9837   LearningRate 0.1303   Epoch: 12   Global Step: 65030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-16 05:11:23,591-Speed 10493.88 samples/sec   Loss 5.9684   LearningRate 0.1302   Epoch: 12   Global Step: 65040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:11:31,375-Speed 10525.33 samples/sec   Loss 5.9711   LearningRate 0.1301   Epoch: 12   Global Step: 65050   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-16 05:11:39,177-Speed 10500.87 samples/sec   Loss 5.9305   LearningRate 0.1301   Epoch: 12   Global Step: 65060   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-16 05:11:46,968-Speed 10525.84 samples/sec   Loss 5.9615   LearningRate 0.1300   Epoch: 12   Global Step: 65070   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-16 05:11:54,792-Speed 10472.21 samples/sec   Loss 5.9423   LearningRate 0.1299   Epoch: 12   Global Step: 65080   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-16 05:12:02,590-Speed 10506.04 samples/sec   Loss 5.9558   LearningRate 0.1299   Epoch: 12   Global Step: 65090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-16 05:12:10,353-Speed 10554.65 samples/sec   Loss 5.9657   LearningRate 0.1298   Epoch: 12   Global Step: 65100   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-16 05:12:18,123-Speed 10544.64 samples/sec   Loss 5.9846   LearningRate 0.1297   Epoch: 12   Global Step: 65110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-16 05:12:25,918-Speed 10510.68 samples/sec   Loss 5.9791   LearningRate 0.1297   Epoch: 12   Global Step: 65120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-16 05:12:33,697-Speed 10532.53 samples/sec   Loss 5.9658   LearningRate 0.1296   Epoch: 12   Global Step: 65130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-16 05:12:41,513-Speed 10481.82 samples/sec   Loss 5.9378   LearningRate 0.1295   Epoch: 12   Global Step: 65140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-16 05:12:49,276-Speed 10555.08 samples/sec   Loss 5.9593   LearningRate 0.1295   Epoch: 12   Global Step: 65150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:12:57,078-Speed 10500.90 samples/sec   Loss 5.9467   LearningRate 0.1294   Epoch: 12   Global Step: 65160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:13:04,873-Speed 10510.80 samples/sec   Loss 5.9186   LearningRate 0.1293   Epoch: 12   Global Step: 65170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:13:12,691-Speed 10479.66 samples/sec   Loss 5.9428   LearningRate 0.1293   Epoch: 12   Global Step: 65180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:13:20,479-Speed 10520.26 samples/sec   Loss 5.9184   LearningRate 0.1292   Epoch: 12   Global Step: 65190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:13:28,250-Speed 10543.32 samples/sec   Loss 5.8943   LearningRate 0.1291   Epoch: 12   Global Step: 65200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:13:36,053-Speed 10499.57 samples/sec   Loss 5.9552   LearningRate 0.1291   Epoch: 12   Global Step: 65210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:13:43,844-Speed 10515.89 samples/sec   Loss 5.9885   LearningRate 0.1290   Epoch: 12   Global Step: 65220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:13:51,658-Speed 10486.66 samples/sec   Loss 5.9540   LearningRate 0.1289   Epoch: 12   Global Step: 65230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:13:59,450-Speed 10514.86 samples/sec   Loss 5.9206   LearningRate 0.1289   Epoch: 12   Global Step: 65240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:14:07,268-Speed 10479.61 samples/sec   Loss 5.8961   LearningRate 0.1288   Epoch: 12   Global Step: 65250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:14:15,076-Speed 10493.76 samples/sec   Loss 5.9172   LearningRate 0.1287   Epoch: 12   Global Step: 65260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:14:22,873-Speed 10506.84 samples/sec   Loss 5.9217   LearningRate 0.1287   Epoch: 12   Global Step: 65270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:14:30,676-Speed 10501.18 samples/sec   Loss 5.9323   LearningRate 0.1286   Epoch: 12   Global Step: 65280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:14:38,476-Speed 10503.61 samples/sec   Loss 5.9294   LearningRate 0.1285   Epoch: 12   Global Step: 65290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:14:46,260-Speed 10526.41 samples/sec   Loss 5.9465   LearningRate 0.1285   Epoch: 12   Global Step: 65300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:14:54,043-Speed 10527.30 samples/sec   Loss 5.9084   LearningRate 0.1284   Epoch: 12   Global Step: 65310   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-16 05:15:01,835-Speed 10513.85 samples/sec   Loss 5.9278   LearningRate 0.1283   Epoch: 12   Global Step: 65320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-16 05:15:09,618-Speed 10527.82 samples/sec   Loss 5.8987   LearningRate 0.1283   Epoch: 12   Global Step: 65330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-16 05:15:17,402-Speed 10525.00 samples/sec   Loss 5.9746   LearningRate 0.1282   Epoch: 12   Global Step: 65340   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-16 05:15:25,182-Speed 10531.02 samples/sec   Loss 5.8508   LearningRate 0.1281   Epoch: 12   Global Step: 65350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-16 05:15:33,000-Speed 10480.02 samples/sec   Loss 5.8846   LearningRate 0.1281   Epoch: 12   Global Step: 65360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-16 05:15:40,772-Speed 10541.13 samples/sec   Loss 5.8985   LearningRate 0.1280   Epoch: 12   Global Step: 65370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-16 05:15:48,598-Speed 10469.41 samples/sec   Loss 5.9006   LearningRate 0.1279   Epoch: 12   Global Step: 65380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-16 05:15:56,370-Speed 10542.45 samples/sec   Loss 5.9523   LearningRate 0.1279   Epoch: 12   Global Step: 65390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-16 05:16:04,152-Speed 10527.14 samples/sec   Loss 5.9456   LearningRate 0.1278   Epoch: 12   Global Step: 65400   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-16 05:16:11,952-Speed 10503.97 samples/sec   Loss 5.9011   LearningRate 0.1277   Epoch: 12   Global Step: 65410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:16:19,737-Speed 10525.64 samples/sec   Loss 5.8576   LearningRate 0.1277   Epoch: 12   Global Step: 65420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:16:27,537-Speed 10503.39 samples/sec   Loss 5.9320   LearningRate 0.1276   Epoch: 12   Global Step: 65430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:16:35,344-Speed 10494.41 samples/sec   Loss 5.9243   LearningRate 0.1275   Epoch: 12   Global Step: 65440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:16:43,168-Speed 10472.77 samples/sec   Loss 5.9090   LearningRate 0.1275   Epoch: 12   Global Step: 65450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:16:50,935-Speed 10547.36 samples/sec   Loss 5.9246   LearningRate 0.1274   Epoch: 12   Global Step: 65460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:16:58,707-Speed 10541.61 samples/sec   Loss 5.8772   LearningRate 0.1273   Epoch: 12   Global Step: 65470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:17:06,512-Speed 10498.16 samples/sec   Loss 5.9071   LearningRate 0.1273   Epoch: 12   Global Step: 65480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:17:14,286-Speed 10538.76 samples/sec   Loss 5.9144   LearningRate 0.1272   Epoch: 12   Global Step: 65490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:17:22,072-Speed 10523.15 samples/sec   Loss 5.8499   LearningRate 0.1271   Epoch: 12   Global Step: 65500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:17:29,854-Speed 10528.17 samples/sec   Loss 5.8761   LearningRate 0.1271   Epoch: 12   Global Step: 65510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:17:37,667-Speed 10486.83 samples/sec   Loss 5.8491   LearningRate 0.1270   Epoch: 12   Global Step: 65520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:17:45,468-Speed 10502.37 samples/sec   Loss 5.8770   LearningRate 0.1269   Epoch: 12   Global Step: 65530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:17:53,272-Speed 10499.57 samples/sec   Loss 5.9002   LearningRate 0.1269   Epoch: 12   Global Step: 65540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:18:01,044-Speed 10545.46 samples/sec   Loss 5.9235   LearningRate 0.1268   Epoch: 12   Global Step: 65550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:18:08,859-Speed 10483.28 samples/sec   Loss 5.9033   LearningRate 0.1267   Epoch: 12   Global Step: 65560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:18:16,645-Speed 10525.28 samples/sec   Loss 5.8726   LearningRate 0.1267   Epoch: 12   Global Step: 65570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:18:24,437-Speed 10514.98 samples/sec   Loss 5.8942   LearningRate 0.1266   Epoch: 12   Global Step: 65580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:18:32,230-Speed 10513.83 samples/sec   Loss 5.8772   LearningRate 0.1265   Epoch: 12   Global Step: 65590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:18:40,032-Speed 10501.75 samples/sec   Loss 5.8662   LearningRate 0.1265   Epoch: 12   Global Step: 65600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:18:47,833-Speed 10503.02 samples/sec   Loss 5.8679   LearningRate 0.1264   Epoch: 12   Global Step: 65610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:18:55,618-Speed 10524.75 samples/sec   Loss 5.8525   LearningRate 0.1263   Epoch: 12   Global Step: 65620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:19:03,417-Speed 10504.73 samples/sec   Loss 5.8735   LearningRate 0.1263   Epoch: 12   Global Step: 65630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:19:11,189-Speed 10541.99 samples/sec   Loss 5.8913   LearningRate 0.1262   Epoch: 12   Global Step: 65640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:19:18,946-Speed 10562.30 samples/sec   Loss 5.8951   LearningRate 0.1261   Epoch: 12   Global Step: 65650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:19:26,755-Speed 10492.59 samples/sec   Loss 5.8956   LearningRate 0.1261   Epoch: 12   Global Step: 65660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:19:34,546-Speed 10515.71 samples/sec   Loss 5.8509   LearningRate 0.1260   Epoch: 12   Global Step: 65670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:19:42,337-Speed 10517.22 samples/sec   Loss 5.8387   LearningRate 0.1259   Epoch: 12   Global Step: 65680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:19:50,141-Speed 10499.03 samples/sec   Loss 5.8978   LearningRate 0.1259   Epoch: 12   Global Step: 65690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:19:57,985-Speed 10444.72 samples/sec   Loss 5.8764   LearningRate 0.1258   Epoch: 12   Global Step: 65700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:20:05,786-Speed 10502.50 samples/sec   Loss 5.8714   LearningRate 0.1257   Epoch: 12   Global Step: 65710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:20:13,567-Speed 10530.30 samples/sec   Loss 5.8565   LearningRate 0.1257   Epoch: 12   Global Step: 65720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:20:21,344-Speed 10535.59 samples/sec   Loss 5.8254   LearningRate 0.1256   Epoch: 12   Global Step: 65730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:20:29,112-Speed 10550.86 samples/sec   Loss 5.8556   LearningRate 0.1255   Epoch: 12   Global Step: 65740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:20:36,915-Speed 10500.01 samples/sec   Loss 5.8356   LearningRate 0.1255   Epoch: 12   Global Step: 65750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:20:44,748-Speed 10459.25 samples/sec   Loss 5.8481   LearningRate 0.1254   Epoch: 12   Global Step: 65760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:20:52,540-Speed 10516.37 samples/sec   Loss 5.8451   LearningRate 0.1253   Epoch: 12   Global Step: 65770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:21:00,349-Speed 10491.39 samples/sec   Loss 5.8724   LearningRate 0.1253   Epoch: 12   Global Step: 65780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:21:08,179-Speed 10463.47 samples/sec   Loss 5.8834   LearningRate 0.1252   Epoch: 12   Global Step: 65790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:21:15,969-Speed 10518.52 samples/sec   Loss 5.8354   LearningRate 0.1251   Epoch: 12   Global Step: 65800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:21:23,762-Speed 10511.97 samples/sec   Loss 5.8355   LearningRate 0.1251   Epoch: 12   Global Step: 65810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:21:31,552-Speed 10517.60 samples/sec   Loss 5.8448   LearningRate 0.1250   Epoch: 12   Global Step: 65820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:21:39,337-Speed 10525.03 samples/sec   Loss 5.8511   LearningRate 0.1249   Epoch: 12   Global Step: 65830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:21:47,156-Speed 10479.00 samples/sec   Loss 5.8351   LearningRate 0.1249   Epoch: 12   Global Step: 65840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:21:54,950-Speed 10511.37 samples/sec   Loss 5.8347   LearningRate 0.1248   Epoch: 12   Global Step: 65850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:22:02,745-Speed 10511.27 samples/sec   Loss 5.8494   LearningRate 0.1247   Epoch: 12   Global Step: 65860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:22:10,523-Speed 10534.37 samples/sec   Loss 5.8619   LearningRate 0.1247   Epoch: 12   Global Step: 65870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:22:18,315-Speed 10514.04 samples/sec   Loss 5.8475   LearningRate 0.1246   Epoch: 12   Global Step: 65880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:22:26,116-Speed 10502.67 samples/sec   Loss 5.8276   LearningRate 0.1245   Epoch: 12   Global Step: 65890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:22:33,907-Speed 10516.39 samples/sec   Loss 5.8505   LearningRate 0.1245   Epoch: 12   Global Step: 65900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:22:41,683-Speed 10536.27 samples/sec   Loss 5.8327   LearningRate 0.1244   Epoch: 12   Global Step: 65910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:22:49,466-Speed 10526.11 samples/sec   Loss 5.8314   LearningRate 0.1243   Epoch: 12   Global Step: 65920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:22:57,250-Speed 10526.69 samples/sec   Loss 5.8242   LearningRate 0.1243   Epoch: 12   Global Step: 65930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:23:05,060-Speed 10489.70 samples/sec   Loss 5.8240   LearningRate 0.1242   Epoch: 12   Global Step: 65940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:23:12,885-Speed 10471.13 samples/sec   Loss 5.8368   LearningRate 0.1242   Epoch: 12   Global Step: 65950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:23:20,682-Speed 10508.50 samples/sec   Loss 5.8275   LearningRate 0.1241   Epoch: 12   Global Step: 65960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:23:28,474-Speed 10514.89 samples/sec   Loss 5.8392   LearningRate 0.1240   Epoch: 12   Global Step: 65970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:23:36,231-Speed 10561.97 samples/sec   Loss 5.8416   LearningRate 0.1240   Epoch: 12   Global Step: 65980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:23:44,019-Speed 10519.68 samples/sec   Loss 5.8254   LearningRate 0.1239   Epoch: 12   Global Step: 65990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:23:51,829-Speed 10490.35 samples/sec   Loss 5.7885   LearningRate 0.1238   Epoch: 12   Global Step: 66000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:23:59,662-Speed 10459.87 samples/sec   Loss 5.7909   LearningRate 0.1238   Epoch: 12   Global Step: 66010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:24:07,444-Speed 10528.63 samples/sec   Loss 5.8041   LearningRate 0.1237   Epoch: 12   Global Step: 66020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:24:15,238-Speed 10512.00 samples/sec   Loss 5.7890   LearningRate 0.1236   Epoch: 12   Global Step: 66030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:24:23,020-Speed 10527.88 samples/sec   Loss 5.7890   LearningRate 0.1236   Epoch: 12   Global Step: 66040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:24:30,793-Speed 10539.86 samples/sec   Loss 5.8295   LearningRate 0.1235   Epoch: 12   Global Step: 66050   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-01-16 05:24:38,632-Speed 10452.31 samples/sec   Loss 5.7994   LearningRate 0.1234   Epoch: 12   Global Step: 66060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:24:46,441-Speed 10492.29 samples/sec   Loss 5.8264   LearningRate 0.1234   Epoch: 12   Global Step: 66070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:24:54,245-Speed 10498.29 samples/sec   Loss 5.8265   LearningRate 0.1233   Epoch: 12   Global Step: 66080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:25:02,078-Speed 10461.92 samples/sec   Loss 5.7776   LearningRate 0.1232   Epoch: 12   Global Step: 66090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:25:09,879-Speed 10503.03 samples/sec   Loss 5.8077   LearningRate 0.1232   Epoch: 12   Global Step: 66100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:25:17,687-Speed 10493.46 samples/sec   Loss 5.7849   LearningRate 0.1231   Epoch: 12   Global Step: 66110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:25:25,492-Speed 10497.56 samples/sec   Loss 5.7864   LearningRate 0.1230   Epoch: 12   Global Step: 66120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:25:33,290-Speed 10506.10 samples/sec   Loss 5.7867   LearningRate 0.1230   Epoch: 12   Global Step: 66130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:25:41,079-Speed 10520.23 samples/sec   Loss 5.7808   LearningRate 0.1229   Epoch: 12   Global Step: 66140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:25:48,857-Speed 10532.44 samples/sec   Loss 5.7935   LearningRate 0.1228   Epoch: 12   Global Step: 66150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:25:56,671-Speed 10491.45 samples/sec   Loss 5.8101   LearningRate 0.1228   Epoch: 12   Global Step: 66160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:26:04,476-Speed 10498.18 samples/sec   Loss 5.8387   LearningRate 0.1227   Epoch: 12   Global Step: 66170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:26:12,277-Speed 10502.34 samples/sec   Loss 5.8243   LearningRate 0.1226   Epoch: 12   Global Step: 66180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:26:20,099-Speed 10473.92 samples/sec   Loss 5.7999   LearningRate 0.1226   Epoch: 12   Global Step: 66190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:26:27,930-Speed 10462.48 samples/sec   Loss 5.7965   LearningRate 0.1225   Epoch: 12   Global Step: 66200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:26:35,782-Speed 10435.30 samples/sec   Loss 5.7953   LearningRate 0.1224   Epoch: 12   Global Step: 66210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:26:43,638-Speed 10428.23 samples/sec   Loss 5.8199   LearningRate 0.1224   Epoch: 12   Global Step: 66220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:26:51,426-Speed 10520.77 samples/sec   Loss 5.8081   LearningRate 0.1223   Epoch: 12   Global Step: 66230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:26:59,214-Speed 10519.91 samples/sec   Loss 5.7971   LearningRate 0.1223   Epoch: 12   Global Step: 66240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:27:07,010-Speed 10510.61 samples/sec   Loss 5.7873   LearningRate 0.1222   Epoch: 12   Global Step: 66250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:27:14,862-Speed 10433.47 samples/sec   Loss 5.7617   LearningRate 0.1221   Epoch: 12   Global Step: 66260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:27:22,694-Speed 10460.88 samples/sec   Loss 5.7909   LearningRate 0.1221   Epoch: 12   Global Step: 66270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:27:30,498-Speed 10499.74 samples/sec   Loss 5.8018   LearningRate 0.1220   Epoch: 12   Global Step: 66280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:27:38,321-Speed 10473.49 samples/sec   Loss 5.7524   LearningRate 0.1219   Epoch: 12   Global Step: 66290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:27:46,122-Speed 10502.52 samples/sec   Loss 5.7567   LearningRate 0.1219   Epoch: 12   Global Step: 66300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:27:53,936-Speed 10484.12 samples/sec   Loss 5.8009   LearningRate 0.1218   Epoch: 12   Global Step: 66310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:28:01,732-Speed 10510.20 samples/sec   Loss 5.7878   LearningRate 0.1217   Epoch: 12   Global Step: 66320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:28:09,538-Speed 10499.37 samples/sec   Loss 5.8113   LearningRate 0.1217   Epoch: 12   Global Step: 66330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:28:17,358-Speed 10475.83 samples/sec   Loss 5.7709   LearningRate 0.1216   Epoch: 12   Global Step: 66340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:28:25,141-Speed 10527.71 samples/sec   Loss 5.8004   LearningRate 0.1215   Epoch: 12   Global Step: 66350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:28:32,923-Speed 10528.98 samples/sec   Loss 5.7292   LearningRate 0.1215   Epoch: 12   Global Step: 66360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:28:40,744-Speed 10475.93 samples/sec   Loss 5.7474   LearningRate 0.1214   Epoch: 12   Global Step: 66370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:28:48,541-Speed 10509.07 samples/sec   Loss 5.7820   LearningRate 0.1213   Epoch: 12   Global Step: 66380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:28:56,335-Speed 10511.24 samples/sec   Loss 5.7864   LearningRate 0.1213   Epoch: 12   Global Step: 66390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:29:04,132-Speed 10511.57 samples/sec   Loss 5.7372   LearningRate 0.1212   Epoch: 12   Global Step: 66400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:29:11,958-Speed 10468.33 samples/sec   Loss 5.7797   LearningRate 0.1211   Epoch: 12   Global Step: 66410   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-01-16 05:29:19,757-Speed 10506.07 samples/sec   Loss 5.7629   LearningRate 0.1211   Epoch: 12   Global Step: 66420   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-01-16 05:29:27,560-Speed 10500.51 samples/sec   Loss 5.7822   LearningRate 0.1210   Epoch: 12   Global Step: 66430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:29:35,362-Speed 10502.13 samples/sec   Loss 5.7609   LearningRate 0.1209   Epoch: 12   Global Step: 66440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:29:43,168-Speed 10495.70 samples/sec   Loss 5.7569   LearningRate 0.1209   Epoch: 12   Global Step: 66450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:29:50,985-Speed 10480.97 samples/sec   Loss 5.7406   LearningRate 0.1208   Epoch: 12   Global Step: 66460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:29:58,808-Speed 10472.82 samples/sec   Loss 5.7423   LearningRate 0.1208   Epoch: 12   Global Step: 66470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:30:06,621-Speed 10486.84 samples/sec   Loss 5.7755   LearningRate 0.1207   Epoch: 12   Global Step: 66480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:30:14,410-Speed 10519.09 samples/sec   Loss 5.7315   LearningRate 0.1206   Epoch: 12   Global Step: 66490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:30:22,212-Speed 10501.43 samples/sec   Loss 5.7515   LearningRate 0.1206   Epoch: 12   Global Step: 66500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:30:30,004-Speed 10516.15 samples/sec   Loss 5.7518   LearningRate 0.1205   Epoch: 12   Global Step: 66510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:30:37,786-Speed 10528.01 samples/sec   Loss 5.7448   LearningRate 0.1204   Epoch: 12   Global Step: 66520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:30:45,574-Speed 10519.08 samples/sec   Loss 5.7742   LearningRate 0.1204   Epoch: 12   Global Step: 66530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:30:53,352-Speed 10534.05 samples/sec   Loss 5.7379   LearningRate 0.1203   Epoch: 12   Global Step: 66540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:31:01,132-Speed 10532.06 samples/sec   Loss 5.7643   LearningRate 0.1202   Epoch: 12   Global Step: 66550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:31:08,916-Speed 10525.50 samples/sec   Loss 5.7246   LearningRate 0.1202   Epoch: 12   Global Step: 66560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:31:16,705-Speed 10518.26 samples/sec   Loss 5.7234   LearningRate 0.1201   Epoch: 12   Global Step: 66570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:31:24,493-Speed 10519.45 samples/sec   Loss 5.6937   LearningRate 0.1200   Epoch: 12   Global Step: 66580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:31:32,321-Speed 10467.05 samples/sec   Loss 5.7509   LearningRate 0.1200   Epoch: 12   Global Step: 66590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:31:40,128-Speed 10494.31 samples/sec   Loss 5.7451   LearningRate 0.1199   Epoch: 12   Global Step: 66600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:31:47,933-Speed 10497.64 samples/sec   Loss 5.7601   LearningRate 0.1198   Epoch: 12   Global Step: 66610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:31:55,744-Speed 10489.87 samples/sec   Loss 5.7236   LearningRate 0.1198   Epoch: 12   Global Step: 66620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:32:03,557-Speed 10486.36 samples/sec   Loss 5.7508   LearningRate 0.1197   Epoch: 12   Global Step: 66630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:32:11,357-Speed 10504.66 samples/sec   Loss 5.7526   LearningRate 0.1197   Epoch: 12   Global Step: 66640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:32:19,197-Speed 10449.27 samples/sec   Loss 5.7213   LearningRate 0.1196   Epoch: 12   Global Step: 66650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:32:26,988-Speed 10516.68 samples/sec   Loss 5.7754   LearningRate 0.1195   Epoch: 12   Global Step: 66660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:32:34,808-Speed 10475.98 samples/sec   Loss 5.7418   LearningRate 0.1195   Epoch: 12   Global Step: 66670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:32:42,660-Speed 10434.52 samples/sec   Loss 5.7141   LearningRate 0.1194   Epoch: 12   Global Step: 66680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:32:50,450-Speed 10518.35 samples/sec   Loss 5.7212   LearningRate 0.1193   Epoch: 12   Global Step: 66690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:32:58,238-Speed 10520.14 samples/sec   Loss 5.7429   LearningRate 0.1193   Epoch: 12   Global Step: 66700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:33:06,031-Speed 10513.73 samples/sec   Loss 5.7092   LearningRate 0.1192   Epoch: 12   Global Step: 66710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:33:13,886-Speed 10431.10 samples/sec   Loss 5.7236   LearningRate 0.1191   Epoch: 12   Global Step: 66720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:33:21,683-Speed 10508.72 samples/sec   Loss 5.7312   LearningRate 0.1191   Epoch: 12   Global Step: 66730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:33:29,482-Speed 10504.87 samples/sec   Loss 5.7234   LearningRate 0.1190   Epoch: 12   Global Step: 66740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:33:37,272-Speed 10517.28 samples/sec   Loss 5.7121   LearningRate 0.1189   Epoch: 12   Global Step: 66750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:33:45,081-Speed 10491.45 samples/sec   Loss 5.7109   LearningRate 0.1189   Epoch: 12   Global Step: 66760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:33:52,940-Speed 10427.14 samples/sec   Loss 5.6824   LearningRate 0.1188   Epoch: 12   Global Step: 66770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:34:00,728-Speed 10520.02 samples/sec   Loss 5.7137   LearningRate 0.1188   Epoch: 12   Global Step: 66780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:34:08,544-Speed 10482.49 samples/sec   Loss 5.7181   LearningRate 0.1187   Epoch: 12   Global Step: 66790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:34:16,338-Speed 10511.21 samples/sec   Loss 5.6659   LearningRate 0.1186   Epoch: 12   Global Step: 66800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:34:24,143-Speed 10497.69 samples/sec   Loss 5.7456   LearningRate 0.1186   Epoch: 12   Global Step: 66810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:34:31,932-Speed 10518.76 samples/sec   Loss 5.7176   LearningRate 0.1185   Epoch: 12   Global Step: 66820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:34:39,751-Speed 10477.64 samples/sec   Loss 5.7327   LearningRate 0.1184   Epoch: 12   Global Step: 66830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:34:47,602-Speed 10438.33 samples/sec   Loss 5.7380   LearningRate 0.1184   Epoch: 12   Global Step: 66840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:34:55,432-Speed 10464.10 samples/sec   Loss 5.6792   LearningRate 0.1183   Epoch: 12   Global Step: 66850   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-01-16 05:35:03,269-Speed 10454.16 samples/sec   Loss 5.7019   LearningRate 0.1182   Epoch: 12   Global Step: 66860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:35:11,079-Speed 10490.02 samples/sec   Loss 5.7119   LearningRate 0.1182   Epoch: 12   Global Step: 66870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:35:18,889-Speed 10491.85 samples/sec   Loss 5.7074   LearningRate 0.1181   Epoch: 12   Global Step: 66880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:35:26,697-Speed 10492.14 samples/sec   Loss 5.6783   LearningRate 0.1180   Epoch: 12   Global Step: 66890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:35:34,511-Speed 10485.57 samples/sec   Loss 5.7233   LearningRate 0.1180   Epoch: 12   Global Step: 66900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:35:42,304-Speed 10513.27 samples/sec   Loss 5.6741   LearningRate 0.1179   Epoch: 12   Global Step: 66910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:35:50,104-Speed 10504.56 samples/sec   Loss 5.6999   LearningRate 0.1179   Epoch: 12   Global Step: 66920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:35:57,941-Speed 10454.45 samples/sec   Loss 5.7001   LearningRate 0.1178   Epoch: 12   Global Step: 66930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:36:05,715-Speed 10538.78 samples/sec   Loss 5.7036   LearningRate 0.1177   Epoch: 12   Global Step: 66940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:36:13,521-Speed 10496.72 samples/sec   Loss 5.6696   LearningRate 0.1177   Epoch: 12   Global Step: 66950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:36:21,328-Speed 10494.23 samples/sec   Loss 5.6465   LearningRate 0.1176   Epoch: 12   Global Step: 66960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:36:29,120-Speed 10515.40 samples/sec   Loss 5.6730   LearningRate 0.1175   Epoch: 12   Global Step: 66970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:36:36,914-Speed 10511.29 samples/sec   Loss 5.7069   LearningRate 0.1175   Epoch: 12   Global Step: 66980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:36:44,698-Speed 10524.60 samples/sec   Loss 5.6824   LearningRate 0.1174   Epoch: 12   Global Step: 66990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:36:52,523-Speed 10471.85 samples/sec   Loss 5.6850   LearningRate 0.1173   Epoch: 12   Global Step: 67000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:37:00,300-Speed 10535.14 samples/sec   Loss 5.6618   LearningRate 0.1173   Epoch: 12   Global Step: 67010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:37:08,088-Speed 10518.82 samples/sec   Loss 5.6896   LearningRate 0.1172   Epoch: 12   Global Step: 67020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:37:15,867-Speed 10532.63 samples/sec   Loss 5.6609   LearningRate 0.1171   Epoch: 12   Global Step: 67030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:37:23,670-Speed 10499.98 samples/sec   Loss 5.6577   LearningRate 0.1171   Epoch: 12   Global Step: 67040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:37:31,465-Speed 10512.17 samples/sec   Loss 5.6717   LearningRate 0.1170   Epoch: 12   Global Step: 67050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:37:39,233-Speed 10546.53 samples/sec   Loss 5.6498   LearningRate 0.1170   Epoch: 12   Global Step: 67060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:37:47,018-Speed 10524.63 samples/sec   Loss 5.6953   LearningRate 0.1169   Epoch: 12   Global Step: 67070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:37:54,835-Speed 10481.70 samples/sec   Loss 5.6551   LearningRate 0.1168   Epoch: 12   Global Step: 67080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:38:02,640-Speed 10497.26 samples/sec   Loss 5.6753   LearningRate 0.1168   Epoch: 12   Global Step: 67090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:38:10,433-Speed 10513.37 samples/sec   Loss 5.6673   LearningRate 0.1167   Epoch: 12   Global Step: 67100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:38:18,273-Speed 10450.60 samples/sec   Loss 5.6868   LearningRate 0.1166   Epoch: 12   Global Step: 67110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:38:26,062-Speed 10519.70 samples/sec   Loss 5.6818   LearningRate 0.1166   Epoch: 12   Global Step: 67120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:38:33,864-Speed 10500.73 samples/sec   Loss 5.6081   LearningRate 0.1165   Epoch: 12   Global Step: 67130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:38:41,648-Speed 10525.35 samples/sec   Loss 5.6611   LearningRate 0.1164   Epoch: 12   Global Step: 67140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:38:49,439-Speed 10516.93 samples/sec   Loss 5.6914   LearningRate 0.1164   Epoch: 12   Global Step: 67150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:38:57,240-Speed 10502.57 samples/sec   Loss 5.7050   LearningRate 0.1163   Epoch: 12   Global Step: 67160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:39:05,041-Speed 10502.39 samples/sec   Loss 5.6571   LearningRate 0.1163   Epoch: 12   Global Step: 67170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:39:12,836-Speed 10509.98 samples/sec   Loss 5.6428   LearningRate 0.1162   Epoch: 12   Global Step: 67180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:39:20,625-Speed 10519.60 samples/sec   Loss 5.6855   LearningRate 0.1161   Epoch: 12   Global Step: 67190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:39:28,417-Speed 10513.97 samples/sec   Loss 5.6514   LearningRate 0.1161   Epoch: 12   Global Step: 67200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:39:36,233-Speed 10483.50 samples/sec   Loss 5.6893   LearningRate 0.1160   Epoch: 12   Global Step: 67210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:39:44,072-Speed 10451.42 samples/sec   Loss 5.6482   LearningRate 0.1159   Epoch: 12   Global Step: 67220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:39:51,917-Speed 10444.00 samples/sec   Loss 5.6230   LearningRate 0.1159   Epoch: 12   Global Step: 67230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:39:59,737-Speed 10476.61 samples/sec   Loss 5.6768   LearningRate 0.1158   Epoch: 12   Global Step: 67240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:40:07,555-Speed 10479.73 samples/sec   Loss 5.6560   LearningRate 0.1157   Epoch: 12   Global Step: 67250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:40:15,390-Speed 10457.29 samples/sec   Loss 5.6370   LearningRate 0.1157   Epoch: 12   Global Step: 67260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:40:23,236-Speed 10443.48 samples/sec   Loss 5.6622   LearningRate 0.1156   Epoch: 12   Global Step: 67270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:40:31,050-Speed 10484.93 samples/sec   Loss 5.6241   LearningRate 0.1156   Epoch: 12   Global Step: 67280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:40:38,862-Speed 10488.03 samples/sec   Loss 5.6697   LearningRate 0.1155   Epoch: 12   Global Step: 67290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:40:46,679-Speed 10482.02 samples/sec   Loss 5.6654   LearningRate 0.1154   Epoch: 12   Global Step: 67300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:40:54,469-Speed 10519.24 samples/sec   Loss 5.6528   LearningRate 0.1154   Epoch: 12   Global Step: 67310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:41:02,280-Speed 10488.57 samples/sec   Loss 5.6461   LearningRate 0.1153   Epoch: 12   Global Step: 67320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:41:10,088-Speed 10493.18 samples/sec   Loss 5.6395   LearningRate 0.1152   Epoch: 12   Global Step: 67330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:41:17,870-Speed 10528.36 samples/sec   Loss 5.6430   LearningRate 0.1152   Epoch: 12   Global Step: 67340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:41:25,671-Speed 10502.71 samples/sec   Loss 5.6217   LearningRate 0.1151   Epoch: 12   Global Step: 67350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:41:33,435-Speed 10552.20 samples/sec   Loss 5.6255   LearningRate 0.1150   Epoch: 12   Global Step: 67360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:41:41,211-Speed 10535.95 samples/sec   Loss 5.6623   LearningRate 0.1150   Epoch: 12   Global Step: 67370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:41:48,995-Speed 10526.26 samples/sec   Loss 5.6492   LearningRate 0.1149   Epoch: 12   Global Step: 67380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:41:56,805-Speed 10490.92 samples/sec   Loss 5.6729   LearningRate 0.1149   Epoch: 12   Global Step: 67390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:42:04,620-Speed 10484.32 samples/sec   Loss 5.6423   LearningRate 0.1148   Epoch: 12   Global Step: 67400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:42:27,292-Speed 3613.38 samples/sec   Loss 5.6380   LearningRate 0.1147   Epoch: 13   Global Step: 67410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:42:35,098-Speed 10497.15 samples/sec   Loss 5.6487   LearningRate 0.1147   Epoch: 13   Global Step: 67420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:42:42,878-Speed 10533.27 samples/sec   Loss 5.6219   LearningRate 0.1146   Epoch: 13   Global Step: 67430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:42:50,678-Speed 10504.96 samples/sec   Loss 5.6204   LearningRate 0.1145   Epoch: 13   Global Step: 67440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:42:58,486-Speed 10492.14 samples/sec   Loss 5.5890   LearningRate 0.1145   Epoch: 13   Global Step: 67450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:43:06,294-Speed 10496.66 samples/sec   Loss 5.6230   LearningRate 0.1144   Epoch: 13   Global Step: 67460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:43:14,109-Speed 10485.23 samples/sec   Loss 5.6228   LearningRate 0.1144   Epoch: 13   Global Step: 67470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:43:21,930-Speed 10475.67 samples/sec   Loss 5.5924   LearningRate 0.1143   Epoch: 13   Global Step: 67480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:43:29,755-Speed 10471.31 samples/sec   Loss 5.6022   LearningRate 0.1142   Epoch: 13   Global Step: 67490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:43:37,584-Speed 10464.34 samples/sec   Loss 5.6021   LearningRate 0.1142   Epoch: 13   Global Step: 67500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:43:45,395-Speed 10488.61 samples/sec   Loss 5.6528   LearningRate 0.1141   Epoch: 13   Global Step: 67510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:43:53,217-Speed 10474.66 samples/sec   Loss 5.5783   LearningRate 0.1140   Epoch: 13   Global Step: 67520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:44:01,032-Speed 10484.02 samples/sec   Loss 5.5591   LearningRate 0.1140   Epoch: 13   Global Step: 67530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:44:08,812-Speed 10530.92 samples/sec   Loss 5.5711   LearningRate 0.1139   Epoch: 13   Global Step: 67540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:44:16,595-Speed 10527.13 samples/sec   Loss 5.5767   LearningRate 0.1138   Epoch: 13   Global Step: 67550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:44:24,373-Speed 10534.99 samples/sec   Loss 5.6157   LearningRate 0.1138   Epoch: 13   Global Step: 67560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:44:32,155-Speed 10528.25 samples/sec   Loss 5.6418   LearningRate 0.1137   Epoch: 13   Global Step: 67570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:44:39,914-Speed 10559.32 samples/sec   Loss 5.5973   LearningRate 0.1137   Epoch: 13   Global Step: 67580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:44:47,674-Speed 10559.32 samples/sec   Loss 5.6295   LearningRate 0.1136   Epoch: 13   Global Step: 67590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:44:55,424-Speed 10570.45 samples/sec   Loss 5.5674   LearningRate 0.1135   Epoch: 13   Global Step: 67600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:45:03,248-Speed 10472.16 samples/sec   Loss 5.5792   LearningRate 0.1135   Epoch: 13   Global Step: 67610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:45:11,038-Speed 10517.41 samples/sec   Loss 5.5671   LearningRate 0.1134   Epoch: 13   Global Step: 67620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:45:18,824-Speed 10523.19 samples/sec   Loss 5.5907   LearningRate 0.1133   Epoch: 13   Global Step: 67630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:45:26,626-Speed 10500.83 samples/sec   Loss 5.5700   LearningRate 0.1133   Epoch: 13   Global Step: 67640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:45:34,448-Speed 10474.87 samples/sec   Loss 5.5764   LearningRate 0.1132   Epoch: 13   Global Step: 67650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:45:42,228-Speed 10531.00 samples/sec   Loss 5.6142   LearningRate 0.1132   Epoch: 13   Global Step: 67660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:45:50,007-Speed 10532.12 samples/sec   Loss 5.5938   LearningRate 0.1131   Epoch: 13   Global Step: 67670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:45:57,824-Speed 10481.64 samples/sec   Loss 5.5983   LearningRate 0.1130   Epoch: 13   Global Step: 67680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:46:05,598-Speed 10539.17 samples/sec   Loss 5.6174   LearningRate 0.1130   Epoch: 13   Global Step: 67690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:46:13,396-Speed 10505.86 samples/sec   Loss 5.5872   LearningRate 0.1129   Epoch: 13   Global Step: 67700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:46:21,185-Speed 10520.74 samples/sec   Loss 5.6021   LearningRate 0.1128   Epoch: 13   Global Step: 67710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:46:28,973-Speed 10519.39 samples/sec   Loss 5.6066   LearningRate 0.1128   Epoch: 13   Global Step: 67720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:46:36,755-Speed 10527.59 samples/sec   Loss 5.6031   LearningRate 0.1127   Epoch: 13   Global Step: 67730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:46:44,540-Speed 10525.53 samples/sec   Loss 5.6013   LearningRate 0.1127   Epoch: 13   Global Step: 67740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:46:52,308-Speed 10546.78 samples/sec   Loss 5.5957   LearningRate 0.1126   Epoch: 13   Global Step: 67750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:47:00,088-Speed 10530.65 samples/sec   Loss 5.5784   LearningRate 0.1125   Epoch: 13   Global Step: 67760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:47:07,867-Speed 10531.28 samples/sec   Loss 5.5734   LearningRate 0.1125   Epoch: 13   Global Step: 67770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:47:15,650-Speed 10528.39 samples/sec   Loss 5.5983   LearningRate 0.1124   Epoch: 13   Global Step: 67780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:47:23,442-Speed 10514.90 samples/sec   Loss 5.5792   LearningRate 0.1123   Epoch: 13   Global Step: 67790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:47:31,250-Speed 10492.17 samples/sec   Loss 5.5326   LearningRate 0.1123   Epoch: 13   Global Step: 67800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:47:39,074-Speed 10471.98 samples/sec   Loss 5.6108   LearningRate 0.1122   Epoch: 13   Global Step: 67810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:47:46,902-Speed 10467.22 samples/sec   Loss 5.5589   LearningRate 0.1122   Epoch: 13   Global Step: 67820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:47:54,737-Speed 10457.70 samples/sec   Loss 5.5602   LearningRate 0.1121   Epoch: 13   Global Step: 67830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:48:02,574-Speed 10454.32 samples/sec   Loss 5.5794   LearningRate 0.1120   Epoch: 13   Global Step: 67840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:48:10,386-Speed 10488.52 samples/sec   Loss 5.5598   LearningRate 0.1120   Epoch: 13   Global Step: 67850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:48:18,229-Speed 10446.33 samples/sec   Loss 5.5607   LearningRate 0.1119   Epoch: 13   Global Step: 67860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:48:26,056-Speed 10468.34 samples/sec   Loss 5.5704   LearningRate 0.1118   Epoch: 13   Global Step: 67870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:48:33,872-Speed 10482.91 samples/sec   Loss 5.6049   LearningRate 0.1118   Epoch: 13   Global Step: 67880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:48:41,699-Speed 10467.26 samples/sec   Loss 5.6046   LearningRate 0.1117   Epoch: 13   Global Step: 67890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:48:49,522-Speed 10473.93 samples/sec   Loss 5.5960   LearningRate 0.1117   Epoch: 13   Global Step: 67900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:48:57,368-Speed 10442.38 samples/sec   Loss 5.5549   LearningRate 0.1116   Epoch: 13   Global Step: 67910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:49:05,203-Speed 10456.39 samples/sec   Loss 5.5145   LearningRate 0.1115   Epoch: 13   Global Step: 67920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:49:13,022-Speed 10489.74 samples/sec   Loss 5.5636   LearningRate 0.1115   Epoch: 13   Global Step: 67930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:49:20,841-Speed 10483.57 samples/sec   Loss 5.5689   LearningRate 0.1114   Epoch: 13   Global Step: 67940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:49:28,707-Speed 10415.23 samples/sec   Loss 5.5136   LearningRate 0.1113   Epoch: 13   Global Step: 67950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:49:36,549-Speed 10447.86 samples/sec   Loss 5.5461   LearningRate 0.1113   Epoch: 13   Global Step: 67960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:49:44,403-Speed 10432.34 samples/sec   Loss 5.5246   LearningRate 0.1112   Epoch: 13   Global Step: 67970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:49:52,326-Speed 10341.63 samples/sec   Loss 5.5204   LearningRate 0.1112   Epoch: 13   Global Step: 67980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:50:00,165-Speed 10451.34 samples/sec   Loss 5.5519   LearningRate 0.1111   Epoch: 13   Global Step: 67990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:50:08,027-Speed 10420.19 samples/sec   Loss 5.5299   LearningRate 0.1110   Epoch: 13   Global Step: 68000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:50:15,868-Speed 10449.84 samples/sec   Loss 5.5030   LearningRate 0.1110   Epoch: 13   Global Step: 68010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:50:23,693-Speed 10469.99 samples/sec   Loss 5.5545   LearningRate 0.1109   Epoch: 13   Global Step: 68020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:50:31,499-Speed 10495.96 samples/sec   Loss 5.5342   LearningRate 0.1108   Epoch: 13   Global Step: 68030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:50:39,320-Speed 10476.11 samples/sec   Loss 5.5475   LearningRate 0.1108   Epoch: 13   Global Step: 68040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:50:47,126-Speed 10495.87 samples/sec   Loss 5.5434   LearningRate 0.1107   Epoch: 13   Global Step: 68050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:50:54,926-Speed 10503.67 samples/sec   Loss 5.5333   LearningRate 0.1107   Epoch: 13   Global Step: 68060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:51:02,731-Speed 10498.37 samples/sec   Loss 5.5067   LearningRate 0.1106   Epoch: 13   Global Step: 68070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:51:10,541-Speed 10489.91 samples/sec   Loss 5.5123   LearningRate 0.1105   Epoch: 13   Global Step: 68080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:51:18,354-Speed 10485.64 samples/sec   Loss 5.5190   LearningRate 0.1105   Epoch: 13   Global Step: 68090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:51:26,200-Speed 10442.29 samples/sec   Loss 5.5125   LearningRate 0.1104   Epoch: 13   Global Step: 68100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:51:34,005-Speed 10497.52 samples/sec   Loss 5.5182   LearningRate 0.1103   Epoch: 13   Global Step: 68110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:51:41,859-Speed 10431.50 samples/sec   Loss 5.4770   LearningRate 0.1103   Epoch: 13   Global Step: 68120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:51:49,656-Speed 10508.22 samples/sec   Loss 5.5428   LearningRate 0.1102   Epoch: 13   Global Step: 68130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:51:57,490-Speed 10459.14 samples/sec   Loss 5.5447   LearningRate 0.1102   Epoch: 13   Global Step: 68140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:52:05,303-Speed 10486.01 samples/sec   Loss 5.4964   LearningRate 0.1101   Epoch: 13   Global Step: 68150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:52:13,141-Speed 10453.68 samples/sec   Loss 5.5468   LearningRate 0.1100   Epoch: 13   Global Step: 68160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:52:20,960-Speed 10480.67 samples/sec   Loss 5.5267   LearningRate 0.1100   Epoch: 13   Global Step: 68170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:52:28,757-Speed 10507.20 samples/sec   Loss 5.5511   LearningRate 0.1099   Epoch: 13   Global Step: 68180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:52:36,559-Speed 10501.70 samples/sec   Loss 5.5174   LearningRate 0.1098   Epoch: 13   Global Step: 68190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:52:44,352-Speed 10513.21 samples/sec   Loss 5.4954   LearningRate 0.1098   Epoch: 13   Global Step: 68200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:52:52,137-Speed 10523.93 samples/sec   Loss 5.4581   LearningRate 0.1097   Epoch: 13   Global Step: 68210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:52:59,955-Speed 10480.12 samples/sec   Loss 5.4985   LearningRate 0.1097   Epoch: 13   Global Step: 68220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:53:07,756-Speed 10502.47 samples/sec   Loss 5.5437   LearningRate 0.1096   Epoch: 13   Global Step: 68230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:53:15,548-Speed 10515.12 samples/sec   Loss 5.5371   LearningRate 0.1095   Epoch: 13   Global Step: 68240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:53:23,365-Speed 10482.18 samples/sec   Loss 5.5042   LearningRate 0.1095   Epoch: 13   Global Step: 68250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:53:31,169-Speed 10497.84 samples/sec   Loss 5.4712   LearningRate 0.1094   Epoch: 13   Global Step: 68260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:53:38,939-Speed 10545.37 samples/sec   Loss 5.5349   LearningRate 0.1094   Epoch: 13   Global Step: 68270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:53:46,720-Speed 10529.14 samples/sec   Loss 5.5037   LearningRate 0.1093   Epoch: 13   Global Step: 68280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:53:54,516-Speed 10509.57 samples/sec   Loss 5.5069   LearningRate 0.1092   Epoch: 13   Global Step: 68290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:54:02,292-Speed 10535.84 samples/sec   Loss 5.5319   LearningRate 0.1092   Epoch: 13   Global Step: 68300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:54:10,067-Speed 10537.09 samples/sec   Loss 5.4944   LearningRate 0.1091   Epoch: 13   Global Step: 68310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:54:17,883-Speed 10483.75 samples/sec   Loss 5.4849   LearningRate 0.1090   Epoch: 13   Global Step: 68320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:54:25,709-Speed 10468.65 samples/sec   Loss 5.4904   LearningRate 0.1090   Epoch: 13   Global Step: 68330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:54:33,497-Speed 10521.10 samples/sec   Loss 5.4816   LearningRate 0.1089   Epoch: 13   Global Step: 68340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:54:41,290-Speed 10514.67 samples/sec   Loss 5.4622   LearningRate 0.1089   Epoch: 13   Global Step: 68350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:54:49,075-Speed 10522.96 samples/sec   Loss 5.5262   LearningRate 0.1088   Epoch: 13   Global Step: 68360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:54:56,864-Speed 10519.06 samples/sec   Loss 5.4779   LearningRate 0.1087   Epoch: 13   Global Step: 68370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:55:04,667-Speed 10500.21 samples/sec   Loss 5.4574   LearningRate 0.1087   Epoch: 13   Global Step: 68380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:55:12,461-Speed 10513.09 samples/sec   Loss 5.4743   LearningRate 0.1086   Epoch: 13   Global Step: 68390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:55:20,266-Speed 10497.06 samples/sec   Loss 5.5233   LearningRate 0.1086   Epoch: 13   Global Step: 68400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:55:28,079-Speed 10485.33 samples/sec   Loss 5.4692   LearningRate 0.1085   Epoch: 13   Global Step: 68410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:55:35,895-Speed 10484.99 samples/sec   Loss 5.4528   LearningRate 0.1084   Epoch: 13   Global Step: 68420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:55:43,716-Speed 10476.76 samples/sec   Loss 5.4381   LearningRate 0.1084   Epoch: 13   Global Step: 68430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:55:51,533-Speed 10480.30 samples/sec   Loss 5.4599   LearningRate 0.1083   Epoch: 13   Global Step: 68440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:55:59,361-Speed 10468.96 samples/sec   Loss 5.4856   LearningRate 0.1082   Epoch: 13   Global Step: 68450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:56:07,171-Speed 10490.27 samples/sec   Loss 5.4774   LearningRate 0.1082   Epoch: 13   Global Step: 68460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:56:14,966-Speed 10511.06 samples/sec   Loss 5.4666   LearningRate 0.1081   Epoch: 13   Global Step: 68470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:56:22,757-Speed 10516.46 samples/sec   Loss 5.4503   LearningRate 0.1081   Epoch: 13   Global Step: 68480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:56:30,548-Speed 10515.07 samples/sec   Loss 5.5015   LearningRate 0.1080   Epoch: 13   Global Step: 68490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:56:38,395-Speed 10441.92 samples/sec   Loss 5.4460   LearningRate 0.1079   Epoch: 13   Global Step: 68500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:56:46,178-Speed 10527.91 samples/sec   Loss 5.4598   LearningRate 0.1079   Epoch: 13   Global Step: 68510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:56:53,961-Speed 10527.05 samples/sec   Loss 5.4953   LearningRate 0.1078   Epoch: 13   Global Step: 68520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:57:01,810-Speed 10436.93 samples/sec   Loss 5.5119   LearningRate 0.1078   Epoch: 13   Global Step: 68530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:57:09,620-Speed 10491.33 samples/sec   Loss 5.4743   LearningRate 0.1077   Epoch: 13   Global Step: 68540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:57:17,404-Speed 10526.52 samples/sec   Loss 5.4891   LearningRate 0.1076   Epoch: 13   Global Step: 68550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:57:25,228-Speed 10470.77 samples/sec   Loss 5.4549   LearningRate 0.1076   Epoch: 13   Global Step: 68560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:57:33,017-Speed 10519.53 samples/sec   Loss 5.4801   LearningRate 0.1075   Epoch: 13   Global Step: 68570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:57:40,810-Speed 10513.90 samples/sec   Loss 5.4597   LearningRate 0.1074   Epoch: 13   Global Step: 68580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:57:48,601-Speed 10515.60 samples/sec   Loss 5.4748   LearningRate 0.1074   Epoch: 13   Global Step: 68590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:57:56,397-Speed 10510.48 samples/sec   Loss 5.4666   LearningRate 0.1073   Epoch: 13   Global Step: 68600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:58:04,194-Speed 10506.92 samples/sec   Loss 5.4897   LearningRate 0.1073   Epoch: 13   Global Step: 68610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:58:11,978-Speed 10529.24 samples/sec   Loss 5.4568   LearningRate 0.1072   Epoch: 13   Global Step: 68620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:58:19,774-Speed 10509.19 samples/sec   Loss 5.4754   LearningRate 0.1071   Epoch: 13   Global Step: 68630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:58:27,561-Speed 10521.73 samples/sec   Loss 5.4213   LearningRate 0.1071   Epoch: 13   Global Step: 68640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:58:35,359-Speed 10506.00 samples/sec   Loss 5.4779   LearningRate 0.1070   Epoch: 13   Global Step: 68650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:58:43,191-Speed 10461.68 samples/sec   Loss 5.4171   LearningRate 0.1070   Epoch: 13   Global Step: 68660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:58:50,992-Speed 10502.38 samples/sec   Loss 5.4499   LearningRate 0.1069   Epoch: 13   Global Step: 68670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 05:58:58,791-Speed 10506.01 samples/sec   Loss 5.4313   LearningRate 0.1068   Epoch: 13   Global Step: 68680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:59:06,612-Speed 10474.72 samples/sec   Loss 5.4746   LearningRate 0.1068   Epoch: 13   Global Step: 68690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:59:14,412-Speed 10503.85 samples/sec   Loss 5.4542   LearningRate 0.1067   Epoch: 13   Global Step: 68700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:59:22,207-Speed 10511.65 samples/sec   Loss 5.4360   LearningRate 0.1067   Epoch: 13   Global Step: 68710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:59:30,014-Speed 10494.32 samples/sec   Loss 5.4351   LearningRate 0.1066   Epoch: 13   Global Step: 68720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:59:37,806-Speed 10514.44 samples/sec   Loss 5.4682   LearningRate 0.1065   Epoch: 13   Global Step: 68730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:59:45,591-Speed 10524.93 samples/sec   Loss 5.4283   LearningRate 0.1065   Epoch: 13   Global Step: 68740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 05:59:53,407-Speed 10481.67 samples/sec   Loss 5.4388   LearningRate 0.1064   Epoch: 13   Global Step: 68750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:00:01,203-Speed 10509.72 samples/sec   Loss 5.4357   LearningRate 0.1063   Epoch: 13   Global Step: 68760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:00:09,010-Speed 10494.53 samples/sec   Loss 5.3798   LearningRate 0.1063   Epoch: 13   Global Step: 68770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:00:16,842-Speed 10461.94 samples/sec   Loss 5.4327   LearningRate 0.1062   Epoch: 13   Global Step: 68780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:00:24,659-Speed 10481.17 samples/sec   Loss 5.4597   LearningRate 0.1062   Epoch: 13   Global Step: 68790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:00:32,468-Speed 10490.68 samples/sec   Loss 5.4207   LearningRate 0.1061   Epoch: 13   Global Step: 68800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:00:40,289-Speed 10476.19 samples/sec   Loss 5.4613   LearningRate 0.1060   Epoch: 13   Global Step: 68810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:00:48,094-Speed 10497.94 samples/sec   Loss 5.4559   LearningRate 0.1060   Epoch: 13   Global Step: 68820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:00:55,894-Speed 10503.06 samples/sec   Loss 5.3935   LearningRate 0.1059   Epoch: 13   Global Step: 68830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:01:03,694-Speed 10504.94 samples/sec   Loss 5.4204   LearningRate 0.1059   Epoch: 13   Global Step: 68840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:01:11,506-Speed 10486.21 samples/sec   Loss 5.4144   LearningRate 0.1058   Epoch: 13   Global Step: 68850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:01:19,331-Speed 10471.42 samples/sec   Loss 5.4371   LearningRate 0.1057   Epoch: 13   Global Step: 68860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:01:27,134-Speed 10500.40 samples/sec   Loss 5.4135   LearningRate 0.1057   Epoch: 13   Global Step: 68870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:01:34,985-Speed 10435.27 samples/sec   Loss 5.4554   LearningRate 0.1056   Epoch: 13   Global Step: 68880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:01:42,767-Speed 10528.46 samples/sec   Loss 5.3818   LearningRate 0.1056   Epoch: 13   Global Step: 68890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:01:50,594-Speed 10467.17 samples/sec   Loss 5.4575   LearningRate 0.1055   Epoch: 13   Global Step: 68900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:01:58,427-Speed 10460.91 samples/sec   Loss 5.4134   LearningRate 0.1054   Epoch: 13   Global Step: 68910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:02:06,229-Speed 10499.76 samples/sec   Loss 5.4417   LearningRate 0.1054   Epoch: 13   Global Step: 68920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:02:14,036-Speed 10495.14 samples/sec   Loss 5.4127   LearningRate 0.1053   Epoch: 13   Global Step: 68930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:02:21,834-Speed 10507.34 samples/sec   Loss 5.4312   LearningRate 0.1053   Epoch: 13   Global Step: 68940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:02:29,629-Speed 10510.84 samples/sec   Loss 5.4286   LearningRate 0.1052   Epoch: 13   Global Step: 68950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:02:37,414-Speed 10523.98 samples/sec   Loss 5.4271   LearningRate 0.1051   Epoch: 13   Global Step: 68960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:02:45,250-Speed 10455.37 samples/sec   Loss 5.4439   LearningRate 0.1051   Epoch: 13   Global Step: 68970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:02:53,106-Speed 10429.76 samples/sec   Loss 5.4363   LearningRate 0.1050   Epoch: 13   Global Step: 68980   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:03:00,926-Speed 10477.21 samples/sec   Loss 5.4698   LearningRate 0.1050   Epoch: 13   Global Step: 68990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:03:08,736-Speed 10489.36 samples/sec   Loss 5.4142   LearningRate 0.1049   Epoch: 13   Global Step: 69000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:03:16,573-Speed 10455.10 samples/sec   Loss 5.3955   LearningRate 0.1048   Epoch: 13   Global Step: 69010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:03:24,353-Speed 10531.51 samples/sec   Loss 5.3994   LearningRate 0.1048   Epoch: 13   Global Step: 69020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:03:32,148-Speed 10510.92 samples/sec   Loss 5.3819   LearningRate 0.1047   Epoch: 13   Global Step: 69030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:03:39,942-Speed 10511.56 samples/sec   Loss 5.3519   LearningRate 0.1046   Epoch: 13   Global Step: 69040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:03:47,771-Speed 10465.18 samples/sec   Loss 5.3640   LearningRate 0.1046   Epoch: 13   Global Step: 69050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:03:55,565-Speed 10512.07 samples/sec   Loss 5.4215   LearningRate 0.1045   Epoch: 13   Global Step: 69060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:04:03,361-Speed 10508.80 samples/sec   Loss 5.4297   LearningRate 0.1045   Epoch: 13   Global Step: 69070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:04:11,154-Speed 10513.17 samples/sec   Loss 5.3675   LearningRate 0.1044   Epoch: 13   Global Step: 69080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:04:18,946-Speed 10514.92 samples/sec   Loss 5.3721   LearningRate 0.1043   Epoch: 13   Global Step: 69090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:04:26,745-Speed 10505.00 samples/sec   Loss 5.3933   LearningRate 0.1043   Epoch: 13   Global Step: 69100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:04:34,572-Speed 10474.94 samples/sec   Loss 5.4090   LearningRate 0.1042   Epoch: 13   Global Step: 69110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:04:42,389-Speed 10480.61 samples/sec   Loss 5.4210   LearningRate 0.1042   Epoch: 13   Global Step: 69120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:04:50,187-Speed 10508.54 samples/sec   Loss 5.3665   LearningRate 0.1041   Epoch: 13   Global Step: 69130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:04:58,007-Speed 10478.50 samples/sec   Loss 5.4019   LearningRate 0.1040   Epoch: 13   Global Step: 69140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:05:05,819-Speed 10487.88 samples/sec   Loss 5.3847   LearningRate 0.1040   Epoch: 13   Global Step: 69150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:05:13,628-Speed 10492.52 samples/sec   Loss 5.3484   LearningRate 0.1039   Epoch: 13   Global Step: 69160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:05:21,406-Speed 10534.01 samples/sec   Loss 5.3587   LearningRate 0.1039   Epoch: 13   Global Step: 69170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:05:29,208-Speed 10500.30 samples/sec   Loss 5.3741   LearningRate 0.1038   Epoch: 13   Global Step: 69180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:05:37,066-Speed 10426.57 samples/sec   Loss 5.3723   LearningRate 0.1037   Epoch: 13   Global Step: 69190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:05:44,874-Speed 10493.33 samples/sec   Loss 5.4028   LearningRate 0.1037   Epoch: 13   Global Step: 69200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:05:52,653-Speed 10532.55 samples/sec   Loss 5.3453   LearningRate 0.1036   Epoch: 13   Global Step: 69210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:06:00,459-Speed 10495.44 samples/sec   Loss 5.3894   LearningRate 0.1036   Epoch: 13   Global Step: 69220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:06:08,234-Speed 10537.56 samples/sec   Loss 5.3858   LearningRate 0.1035   Epoch: 13   Global Step: 69230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:06:16,021-Speed 10521.59 samples/sec   Loss 5.4031   LearningRate 0.1034   Epoch: 13   Global Step: 69240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:06:23,814-Speed 10513.15 samples/sec   Loss 5.3759   LearningRate 0.1034   Epoch: 13   Global Step: 69250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:06:31,588-Speed 10539.24 samples/sec   Loss 5.3459   LearningRate 0.1033   Epoch: 13   Global Step: 69260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:06:39,388-Speed 10504.34 samples/sec   Loss 5.3855   LearningRate 0.1033   Epoch: 13   Global Step: 69270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:06:47,188-Speed 10504.12 samples/sec   Loss 5.3742   LearningRate 0.1032   Epoch: 13   Global Step: 69280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:06:55,015-Speed 10467.32 samples/sec   Loss 5.3480   LearningRate 0.1031   Epoch: 13   Global Step: 69290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:07:02,825-Speed 10491.49 samples/sec   Loss 5.3565   LearningRate 0.1031   Epoch: 13   Global Step: 69300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:07:10,605-Speed 10530.95 samples/sec   Loss 5.4100   LearningRate 0.1030   Epoch: 13   Global Step: 69310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:07:18,426-Speed 10475.89 samples/sec   Loss 5.3598   LearningRate 0.1030   Epoch: 13   Global Step: 69320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:07:26,212-Speed 10523.77 samples/sec   Loss 5.3750   LearningRate 0.1029   Epoch: 13   Global Step: 69330   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-01-16 06:07:34,009-Speed 10507.49 samples/sec   Loss 5.3248   LearningRate 0.1028   Epoch: 13   Global Step: 69340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:07:41,813-Speed 10498.83 samples/sec   Loss 5.3439   LearningRate 0.1028   Epoch: 13   Global Step: 69350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:07:49,610-Speed 10507.60 samples/sec   Loss 5.3673   LearningRate 0.1027   Epoch: 13   Global Step: 69360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:07:57,493-Speed 10393.83 samples/sec   Loss 5.3674   LearningRate 0.1027   Epoch: 13   Global Step: 69370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:08:05,286-Speed 10513.80 samples/sec   Loss 5.3399   LearningRate 0.1026   Epoch: 13   Global Step: 69380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:08:13,102-Speed 10482.54 samples/sec   Loss 5.2941   LearningRate 0.1025   Epoch: 13   Global Step: 69390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:08:20,905-Speed 10499.37 samples/sec   Loss 5.3526   LearningRate 0.1025   Epoch: 13   Global Step: 69400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:08:28,713-Speed 10493.85 samples/sec   Loss 5.3382   LearningRate 0.1024   Epoch: 13   Global Step: 69410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:08:36,513-Speed 10504.03 samples/sec   Loss 5.3653   LearningRate 0.1024   Epoch: 13   Global Step: 69420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:08:44,305-Speed 10514.11 samples/sec   Loss 5.3272   LearningRate 0.1023   Epoch: 13   Global Step: 69430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:08:52,127-Speed 10474.64 samples/sec   Loss 5.3726   LearningRate 0.1022   Epoch: 13   Global Step: 69440   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-01-16 06:08:59,931-Speed 10499.13 samples/sec   Loss 5.3563   LearningRate 0.1022   Epoch: 13   Global Step: 69450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:09:07,727-Speed 10509.61 samples/sec   Loss 5.3724   LearningRate 0.1021   Epoch: 13   Global Step: 69460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:09:15,541-Speed 10485.21 samples/sec   Loss 5.2990   LearningRate 0.1021   Epoch: 13   Global Step: 69470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:09:23,325-Speed 10525.29 samples/sec   Loss 5.3425   LearningRate 0.1020   Epoch: 13   Global Step: 69480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-16 06:09:31,109-Speed 10524.93 samples/sec   Loss 5.2990   LearningRate 0.1019   Epoch: 13   Global Step: 69490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:09:38,955-Speed 10442.69 samples/sec   Loss 5.3258   LearningRate 0.1019   Epoch: 13   Global Step: 69500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:09:46,755-Speed 10504.09 samples/sec   Loss 5.3229   LearningRate 0.1018   Epoch: 13   Global Step: 69510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:09:54,557-Speed 10501.62 samples/sec   Loss 5.3030   LearningRate 0.1018   Epoch: 13   Global Step: 69520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:10:02,381-Speed 10472.00 samples/sec   Loss 5.3092   LearningRate 0.1017   Epoch: 13   Global Step: 69530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:10:10,158-Speed 10534.58 samples/sec   Loss 5.3220   LearningRate 0.1017   Epoch: 13   Global Step: 69540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:10:17,978-Speed 10476.51 samples/sec   Loss 5.3144   LearningRate 0.1016   Epoch: 13   Global Step: 69550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-16 06:10:25,790-Speed 10489.50 samples/sec   Loss 5.3237   LearningRate 0.1015   Epoch: 13   Global Step: 69560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:10:33,587-Speed 10507.14 samples/sec   Loss 5.3326   LearningRate 0.1015   Epoch: 13   Global Step: 69570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:10:41,449-Speed 10420.73 samples/sec   Loss 5.3324   LearningRate 0.1014   Epoch: 13   Global Step: 69580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:10:49,249-Speed 10505.17 samples/sec   Loss 5.2811   LearningRate 0.1014   Epoch: 13   Global Step: 69590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:10:57,076-Speed 10466.98 samples/sec   Loss 5.3061   LearningRate 0.1013   Epoch: 13   Global Step: 69600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:11:04,855-Speed 10533.32 samples/sec   Loss 5.3435   LearningRate 0.1012   Epoch: 13   Global Step: 69610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:11:12,655-Speed 10504.55 samples/sec   Loss 5.3233   LearningRate 0.1012   Epoch: 13   Global Step: 69620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:11:20,476-Speed 10475.55 samples/sec   Loss 5.3016   LearningRate 0.1011   Epoch: 13   Global Step: 69630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:11:28,325-Speed 10438.40 samples/sec   Loss 5.3056   LearningRate 0.1011   Epoch: 13   Global Step: 69640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:11:36,141-Speed 10482.29 samples/sec   Loss 5.2926   LearningRate 0.1010   Epoch: 13   Global Step: 69650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:11:43,964-Speed 10473.55 samples/sec   Loss 5.2954   LearningRate 0.1009   Epoch: 13   Global Step: 69660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:11:51,777-Speed 10486.08 samples/sec   Loss 5.2798   LearningRate 0.1009   Epoch: 13   Global Step: 69670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:11:59,587-Speed 10490.86 samples/sec   Loss 5.3089   LearningRate 0.1008   Epoch: 13   Global Step: 69680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:12:07,385-Speed 10507.20 samples/sec   Loss 5.3585   LearningRate 0.1008   Epoch: 13   Global Step: 69690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:12:15,179-Speed 10511.47 samples/sec   Loss 5.3364   LearningRate 0.1007   Epoch: 13   Global Step: 69700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:12:22,995-Speed 10482.68 samples/sec   Loss 5.2966   LearningRate 0.1006   Epoch: 13   Global Step: 69710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:12:30,817-Speed 10473.69 samples/sec   Loss 5.2822   LearningRate 0.1006   Epoch: 13   Global Step: 69720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:12:38,618-Speed 10503.37 samples/sec   Loss 5.2869   LearningRate 0.1005   Epoch: 13   Global Step: 69730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:12:46,402-Speed 10525.31 samples/sec   Loss 5.3096   LearningRate 0.1005   Epoch: 13   Global Step: 69740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:12:54,224-Speed 10474.01 samples/sec   Loss 5.3434   LearningRate 0.1004   Epoch: 13   Global Step: 69750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:13:02,016-Speed 10515.29 samples/sec   Loss 5.3344   LearningRate 0.1003   Epoch: 13   Global Step: 69760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:13:09,856-Speed 10450.77 samples/sec   Loss 5.3374   LearningRate 0.1003   Epoch: 13   Global Step: 69770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:13:17,671-Speed 10483.84 samples/sec   Loss 5.3117   LearningRate 0.1002   Epoch: 13   Global Step: 69780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:13:25,461-Speed 10517.88 samples/sec   Loss 5.2762   LearningRate 0.1002   Epoch: 13   Global Step: 69790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:13:33,279-Speed 10480.19 samples/sec   Loss 5.2856   LearningRate 0.1001   Epoch: 13   Global Step: 69800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:13:41,088-Speed 10491.89 samples/sec   Loss 5.2500   LearningRate 0.1000   Epoch: 13   Global Step: 69810   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-01-16 06:13:48,909-Speed 10474.69 samples/sec   Loss 5.3334   LearningRate 0.1000   Epoch: 13   Global Step: 69820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:13:56,709-Speed 10504.59 samples/sec   Loss 5.2597   LearningRate 0.0999   Epoch: 13   Global Step: 69830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:14:04,508-Speed 10505.17 samples/sec   Loss 5.2802   LearningRate 0.0999   Epoch: 13   Global Step: 69840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:14:12,332-Speed 10471.80 samples/sec   Loss 5.3152   LearningRate 0.0998   Epoch: 13   Global Step: 69850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:14:20,127-Speed 10509.46 samples/sec   Loss 5.2958   LearningRate 0.0998   Epoch: 13   Global Step: 69860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:14:27,937-Speed 10491.74 samples/sec   Loss 5.2960   LearningRate 0.0997   Epoch: 13   Global Step: 69870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:14:35,742-Speed 10497.67 samples/sec   Loss 5.2774   LearningRate 0.0996   Epoch: 13   Global Step: 69880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:14:43,530-Speed 10518.93 samples/sec   Loss 5.2489   LearningRate 0.0996   Epoch: 13   Global Step: 69890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:14:51,349-Speed 10479.45 samples/sec   Loss 5.3146   LearningRate 0.0995   Epoch: 13   Global Step: 69900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:14:59,167-Speed 10479.37 samples/sec   Loss 5.2797   LearningRate 0.0995   Epoch: 13   Global Step: 69910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:15:06,952-Speed 10523.97 samples/sec   Loss 5.2931   LearningRate 0.0994   Epoch: 13   Global Step: 69920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:15:14,753-Speed 10502.59 samples/sec   Loss 5.2639   LearningRate 0.0993   Epoch: 13   Global Step: 69930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:15:22,547-Speed 10512.21 samples/sec   Loss 5.2749   LearningRate 0.0993   Epoch: 13   Global Step: 69940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:15:30,394-Speed 10441.96 samples/sec   Loss 5.2759   LearningRate 0.0992   Epoch: 13   Global Step: 69950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:15:38,198-Speed 10497.35 samples/sec   Loss 5.3047   LearningRate 0.0992   Epoch: 13   Global Step: 69960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:15:46,017-Speed 10479.24 samples/sec   Loss 5.2873   LearningRate 0.0991   Epoch: 13   Global Step: 69970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:15:53,841-Speed 10472.85 samples/sec   Loss 5.2943   LearningRate 0.0990   Epoch: 13   Global Step: 69980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:16:01,638-Speed 10508.03 samples/sec   Loss 5.2893   LearningRate 0.0990   Epoch: 13   Global Step: 69990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:16:09,442-Speed 10497.93 samples/sec   Loss 5.2927   LearningRate 0.0989   Epoch: 13   Global Step: 70000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:16:37,631-[lfw][70000]XNorm: 24.195342
Training: 2022-01-16 06:16:37,632-[lfw][70000]Accuracy-Flip: 0.99783+-0.00269
Training: 2022-01-16 06:16:37,633-[lfw][70000]Accuracy-Highest: 0.99783
Training: 2022-01-16 06:17:10,348-[cfp_fp][70000]XNorm: 21.559299
Training: 2022-01-16 06:17:10,349-[cfp_fp][70000]Accuracy-Flip: 0.98700+-0.00536
Training: 2022-01-16 06:17:10,349-[cfp_fp][70000]Accuracy-Highest: 0.98700
Training: 2022-01-16 06:17:38,163-[agedb_30][70000]XNorm: 23.580380
Training: 2022-01-16 06:17:38,164-[agedb_30][70000]Accuracy-Flip: 0.97667+-0.00500
Training: 2022-01-16 06:17:38,164-[agedb_30][70000]Accuracy-Highest: 0.97667
Training: 2022-01-16 06:17:45,920-Speed 849.12 samples/sec   Loss 5.2332   LearningRate 0.0989   Epoch: 13   Global Step: 70010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:17:53,650-Speed 10599.18 samples/sec   Loss 5.2801   LearningRate 0.0988   Epoch: 13   Global Step: 70020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:18:01,395-Speed 10577.56 samples/sec   Loss 5.2409   LearningRate 0.0988   Epoch: 13   Global Step: 70030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:18:09,151-Speed 10567.45 samples/sec   Loss 5.2373   LearningRate 0.0987   Epoch: 13   Global Step: 70040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:18:16,918-Speed 10549.60 samples/sec   Loss 5.2341   LearningRate 0.0986   Epoch: 13   Global Step: 70050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:18:24,687-Speed 10545.82 samples/sec   Loss 5.2561   LearningRate 0.0986   Epoch: 13   Global Step: 70060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:18:32,481-Speed 10512.71 samples/sec   Loss 5.2528   LearningRate 0.0985   Epoch: 13   Global Step: 70070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:18:40,277-Speed 10509.22 samples/sec   Loss 5.2593   LearningRate 0.0985   Epoch: 13   Global Step: 70080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:18:48,057-Speed 10530.43 samples/sec   Loss 5.2665   LearningRate 0.0984   Epoch: 13   Global Step: 70090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:18:55,815-Speed 10560.93 samples/sec   Loss 5.2714   LearningRate 0.0983   Epoch: 13   Global Step: 70100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:19:03,648-Speed 10460.59 samples/sec   Loss 5.2613   LearningRate 0.0983   Epoch: 13   Global Step: 70110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:19:11,403-Speed 10564.90 samples/sec   Loss 5.2310   LearningRate 0.0982   Epoch: 13   Global Step: 70120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:19:19,173-Speed 10552.83 samples/sec   Loss 5.2395   LearningRate 0.0982   Epoch: 13   Global Step: 70130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:19:26,966-Speed 10513.11 samples/sec   Loss 5.2580   LearningRate 0.0981   Epoch: 13   Global Step: 70140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:19:34,750-Speed 10525.01 samples/sec   Loss 5.2394   LearningRate 0.0981   Epoch: 13   Global Step: 70150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:19:42,500-Speed 10572.18 samples/sec   Loss 5.2410   LearningRate 0.0980   Epoch: 13   Global Step: 70160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:19:50,281-Speed 10530.50 samples/sec   Loss 5.2310   LearningRate 0.0979   Epoch: 13   Global Step: 70170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:19:58,073-Speed 10514.90 samples/sec   Loss 5.2709   LearningRate 0.0979   Epoch: 13   Global Step: 70180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:20:05,858-Speed 10523.84 samples/sec   Loss 5.2517   LearningRate 0.0978   Epoch: 13   Global Step: 70190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:20:13,626-Speed 10546.92 samples/sec   Loss 5.2402   LearningRate 0.0978   Epoch: 13   Global Step: 70200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:20:21,384-Speed 10561.54 samples/sec   Loss 5.2394   LearningRate 0.0977   Epoch: 13   Global Step: 70210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:20:29,168-Speed 10525.06 samples/sec   Loss 5.2651   LearningRate 0.0976   Epoch: 13   Global Step: 70220   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-01-16 06:20:36,973-Speed 10498.39 samples/sec   Loss 5.2609   LearningRate 0.0976   Epoch: 13   Global Step: 70230   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-01-16 06:20:44,800-Speed 10467.89 samples/sec   Loss 5.2576   LearningRate 0.0975   Epoch: 13   Global Step: 70240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:20:52,606-Speed 10494.66 samples/sec   Loss 5.2773   LearningRate 0.0975   Epoch: 13   Global Step: 70250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:21:00,421-Speed 10484.13 samples/sec   Loss 5.2451   LearningRate 0.0974   Epoch: 13   Global Step: 70260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:21:08,246-Speed 10470.01 samples/sec   Loss 5.2248   LearningRate 0.0973   Epoch: 13   Global Step: 70270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:21:16,072-Speed 10472.78 samples/sec   Loss 5.2646   LearningRate 0.0973   Epoch: 13   Global Step: 70280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:21:23,896-Speed 10471.71 samples/sec   Loss 5.2443   LearningRate 0.0972   Epoch: 13   Global Step: 70290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:21:31,708-Speed 10487.85 samples/sec   Loss 5.2774   LearningRate 0.0972   Epoch: 13   Global Step: 70300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:21:39,535-Speed 10468.45 samples/sec   Loss 5.2085   LearningRate 0.0971   Epoch: 13   Global Step: 70310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:21:47,299-Speed 10552.19 samples/sec   Loss 5.1932   LearningRate 0.0971   Epoch: 13   Global Step: 70320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:21:55,106-Speed 10494.76 samples/sec   Loss 5.2204   LearningRate 0.0970   Epoch: 13   Global Step: 70330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:22:02,890-Speed 10529.18 samples/sec   Loss 5.2262   LearningRate 0.0969   Epoch: 13   Global Step: 70340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:22:10,679-Speed 10519.10 samples/sec   Loss 5.2270   LearningRate 0.0969   Epoch: 13   Global Step: 70350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:22:18,461-Speed 10528.57 samples/sec   Loss 5.2280   LearningRate 0.0968   Epoch: 13   Global Step: 70360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:22:26,223-Speed 10554.41 samples/sec   Loss 5.2306   LearningRate 0.0968   Epoch: 13   Global Step: 70370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:22:34,009-Speed 10524.10 samples/sec   Loss 5.2193   LearningRate 0.0967   Epoch: 13   Global Step: 70380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:22:41,779-Speed 10544.85 samples/sec   Loss 5.1937   LearningRate 0.0967   Epoch: 13   Global Step: 70390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:22:49,546-Speed 10547.92 samples/sec   Loss 5.2266   LearningRate 0.0966   Epoch: 13   Global Step: 70400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:22:57,327-Speed 10529.31 samples/sec   Loss 5.1991   LearningRate 0.0965   Epoch: 13   Global Step: 70410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:23:05,116-Speed 10518.96 samples/sec   Loss 5.2008   LearningRate 0.0965   Epoch: 13   Global Step: 70420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:23:12,927-Speed 10489.27 samples/sec   Loss 5.2252   LearningRate 0.0964   Epoch: 13   Global Step: 70430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:23:20,689-Speed 10554.96 samples/sec   Loss 5.1821   LearningRate 0.0964   Epoch: 13   Global Step: 70440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:23:28,453-Speed 10552.18 samples/sec   Loss 5.1902   LearningRate 0.0963   Epoch: 13   Global Step: 70450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:23:36,226-Speed 10541.11 samples/sec   Loss 5.1999   LearningRate 0.0962   Epoch: 13   Global Step: 70460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:23:44,029-Speed 10500.86 samples/sec   Loss 5.2031   LearningRate 0.0962   Epoch: 13   Global Step: 70470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:23:51,799-Speed 10543.66 samples/sec   Loss 5.2532   LearningRate 0.0961   Epoch: 13   Global Step: 70480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:23:59,583-Speed 10525.92 samples/sec   Loss 5.2066   LearningRate 0.0961   Epoch: 13   Global Step: 70490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:24:07,363-Speed 10530.49 samples/sec   Loss 5.2173   LearningRate 0.0960   Epoch: 13   Global Step: 70500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:24:15,161-Speed 10507.79 samples/sec   Loss 5.2023   LearningRate 0.0960   Epoch: 13   Global Step: 70510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:24:22,935-Speed 10538.39 samples/sec   Loss 5.1809   LearningRate 0.0959   Epoch: 13   Global Step: 70520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:24:30,774-Speed 10452.24 samples/sec   Loss 5.1864   LearningRate 0.0958   Epoch: 13   Global Step: 70530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:24:38,611-Speed 10455.63 samples/sec   Loss 5.1917   LearningRate 0.0958   Epoch: 13   Global Step: 70540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:24:46,431-Speed 10476.48 samples/sec   Loss 5.1752   LearningRate 0.0957   Epoch: 13   Global Step: 70550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:24:54,232-Speed 10503.42 samples/sec   Loss 5.1625   LearningRate 0.0957   Epoch: 13   Global Step: 70560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:25:02,011-Speed 10531.79 samples/sec   Loss 5.1749   LearningRate 0.0956   Epoch: 13   Global Step: 70570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:25:09,774-Speed 10554.77 samples/sec   Loss 5.2231   LearningRate 0.0956   Epoch: 13   Global Step: 70580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:25:17,539-Speed 10550.76 samples/sec   Loss 5.2065   LearningRate 0.0955   Epoch: 13   Global Step: 70590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:25:25,300-Speed 10556.37 samples/sec   Loss 5.1758   LearningRate 0.0954   Epoch: 13   Global Step: 70600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:25:33,077-Speed 10535.10 samples/sec   Loss 5.2035   LearningRate 0.0954   Epoch: 13   Global Step: 70610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:25:40,845-Speed 10548.09 samples/sec   Loss 5.1661   LearningRate 0.0953   Epoch: 13   Global Step: 70620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:25:48,614-Speed 10544.78 samples/sec   Loss 5.1959   LearningRate 0.0953   Epoch: 13   Global Step: 70630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:25:56,405-Speed 10516.65 samples/sec   Loss 5.1741   LearningRate 0.0952   Epoch: 13   Global Step: 70640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:26:04,193-Speed 10520.26 samples/sec   Loss 5.1699   LearningRate 0.0951   Epoch: 13   Global Step: 70650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:26:11,960-Speed 10547.77 samples/sec   Loss 5.1927   LearningRate 0.0951   Epoch: 13   Global Step: 70660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:26:19,754-Speed 10511.41 samples/sec   Loss 5.1222   LearningRate 0.0950   Epoch: 13   Global Step: 70670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:26:27,540-Speed 10523.77 samples/sec   Loss 5.1627   LearningRate 0.0950   Epoch: 13   Global Step: 70680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:26:35,299-Speed 10559.45 samples/sec   Loss 5.1746   LearningRate 0.0949   Epoch: 13   Global Step: 70690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:26:43,095-Speed 10508.61 samples/sec   Loss 5.1795   LearningRate 0.0949   Epoch: 13   Global Step: 70700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:26:50,895-Speed 10505.34 samples/sec   Loss 5.1435   LearningRate 0.0948   Epoch: 13   Global Step: 70710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:26:58,688-Speed 10512.63 samples/sec   Loss 5.1587   LearningRate 0.0947   Epoch: 13   Global Step: 70720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:27:06,527-Speed 10451.96 samples/sec   Loss 5.1685   LearningRate 0.0947   Epoch: 13   Global Step: 70730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:27:14,334-Speed 10494.77 samples/sec   Loss 5.1620   LearningRate 0.0946   Epoch: 13   Global Step: 70740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:27:22,113-Speed 10531.45 samples/sec   Loss 5.1974   LearningRate 0.0946   Epoch: 13   Global Step: 70750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:27:29,927-Speed 10484.80 samples/sec   Loss 5.1667   LearningRate 0.0945   Epoch: 13   Global Step: 70760   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-01-16 06:27:37,693-Speed 10551.42 samples/sec   Loss 5.1641   LearningRate 0.0945   Epoch: 13   Global Step: 70770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:27:45,507-Speed 10486.76 samples/sec   Loss 5.2051   LearningRate 0.0944   Epoch: 13   Global Step: 70780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:27:53,323-Speed 10482.35 samples/sec   Loss 5.1111   LearningRate 0.0943   Epoch: 13   Global Step: 70790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:28:01,113-Speed 10516.94 samples/sec   Loss 5.1506   LearningRate 0.0943   Epoch: 13   Global Step: 70800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:28:08,899-Speed 10523.17 samples/sec   Loss 5.1959   LearningRate 0.0942   Epoch: 13   Global Step: 70810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:28:16,715-Speed 10483.83 samples/sec   Loss 5.1476   LearningRate 0.0942   Epoch: 13   Global Step: 70820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:28:24,507-Speed 10513.48 samples/sec   Loss 5.1247   LearningRate 0.0941   Epoch: 13   Global Step: 70830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:28:32,328-Speed 10477.20 samples/sec   Loss 5.1744   LearningRate 0.0941   Epoch: 13   Global Step: 70840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:28:40,102-Speed 10539.31 samples/sec   Loss 5.1526   LearningRate 0.0940   Epoch: 13   Global Step: 70850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:28:47,909-Speed 10494.24 samples/sec   Loss 5.1490   LearningRate 0.0939   Epoch: 13   Global Step: 70860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:28:55,680-Speed 10542.21 samples/sec   Loss 5.1384   LearningRate 0.0939   Epoch: 13   Global Step: 70870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:29:03,465-Speed 10525.15 samples/sec   Loss 5.1158   LearningRate 0.0938   Epoch: 13   Global Step: 70880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:29:11,248-Speed 10527.61 samples/sec   Loss 5.1660   LearningRate 0.0938   Epoch: 13   Global Step: 70890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:29:19,032-Speed 10524.58 samples/sec   Loss 5.1342   LearningRate 0.0937   Epoch: 13   Global Step: 70900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:29:26,828-Speed 10509.19 samples/sec   Loss 5.1131   LearningRate 0.0937   Epoch: 13   Global Step: 70910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:29:34,632-Speed 10498.34 samples/sec   Loss 5.1248   LearningRate 0.0936   Epoch: 13   Global Step: 70920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:29:42,426-Speed 10512.03 samples/sec   Loss 5.1755   LearningRate 0.0935   Epoch: 13   Global Step: 70930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:29:50,239-Speed 10486.99 samples/sec   Loss 5.1520   LearningRate 0.0935   Epoch: 13   Global Step: 70940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:29:58,005-Speed 10549.55 samples/sec   Loss 5.1929   LearningRate 0.0934   Epoch: 13   Global Step: 70950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:30:05,781-Speed 10536.87 samples/sec   Loss 5.1492   LearningRate 0.0934   Epoch: 13   Global Step: 70960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:30:13,613-Speed 10460.47 samples/sec   Loss 5.1570   LearningRate 0.0933   Epoch: 13   Global Step: 70970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:30:21,424-Speed 10489.46 samples/sec   Loss 5.1131   LearningRate 0.0933   Epoch: 13   Global Step: 70980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:30:29,195-Speed 10542.36 samples/sec   Loss 5.1529   LearningRate 0.0932   Epoch: 13   Global Step: 70990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:30:37,002-Speed 10494.82 samples/sec   Loss 5.1229   LearningRate 0.0931   Epoch: 13   Global Step: 71000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:30:44,802-Speed 10504.08 samples/sec   Loss 5.1032   LearningRate 0.0931   Epoch: 13   Global Step: 71010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:30:52,584-Speed 10527.86 samples/sec   Loss 5.1424   LearningRate 0.0930   Epoch: 13   Global Step: 71020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:31:00,370-Speed 10523.18 samples/sec   Loss 5.1355   LearningRate 0.0930   Epoch: 13   Global Step: 71030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:31:08,146-Speed 10536.65 samples/sec   Loss 5.1496   LearningRate 0.0929   Epoch: 13   Global Step: 71040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:31:15,954-Speed 10493.70 samples/sec   Loss 5.1078   LearningRate 0.0929   Epoch: 13   Global Step: 71050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:31:23,803-Speed 10437.95 samples/sec   Loss 5.0982   LearningRate 0.0928   Epoch: 13   Global Step: 71060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:31:31,590-Speed 10521.58 samples/sec   Loss 5.1527   LearningRate 0.0927   Epoch: 13   Global Step: 71070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:31:39,369-Speed 10532.88 samples/sec   Loss 5.1728   LearningRate 0.0927   Epoch: 13   Global Step: 71080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:31:47,149-Speed 10531.85 samples/sec   Loss 5.1203   LearningRate 0.0926   Epoch: 13   Global Step: 71090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:31:54,909-Speed 10557.77 samples/sec   Loss 5.1581   LearningRate 0.0926   Epoch: 13   Global Step: 71100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:32:02,692-Speed 10526.51 samples/sec   Loss 5.0702   LearningRate 0.0925   Epoch: 13   Global Step: 71110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:32:10,464-Speed 10542.13 samples/sec   Loss 5.0852   LearningRate 0.0925   Epoch: 13   Global Step: 71120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:32:18,244-Speed 10531.72 samples/sec   Loss 5.1099   LearningRate 0.0924   Epoch: 13   Global Step: 71130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:32:26,030-Speed 10522.42 samples/sec   Loss 5.1266   LearningRate 0.0923   Epoch: 13   Global Step: 71140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:32:33,839-Speed 10490.60 samples/sec   Loss 5.0968   LearningRate 0.0923   Epoch: 13   Global Step: 71150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:32:41,638-Speed 10506.74 samples/sec   Loss 5.1239   LearningRate 0.0922   Epoch: 13   Global Step: 71160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:32:49,412-Speed 10539.39 samples/sec   Loss 5.0935   LearningRate 0.0922   Epoch: 13   Global Step: 71170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:32:57,210-Speed 10505.57 samples/sec   Loss 5.0954   LearningRate 0.0921   Epoch: 13   Global Step: 71180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:33:05,038-Speed 10466.24 samples/sec   Loss 5.1087   LearningRate 0.0921   Epoch: 13   Global Step: 71190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:33:12,806-Speed 10547.23 samples/sec   Loss 5.0701   LearningRate 0.0920   Epoch: 13   Global Step: 71200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:33:20,595-Speed 10520.20 samples/sec   Loss 5.0854   LearningRate 0.0919   Epoch: 13   Global Step: 71210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:33:28,445-Speed 10436.90 samples/sec   Loss 5.0980   LearningRate 0.0919   Epoch: 13   Global Step: 71220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:33:36,241-Speed 10509.68 samples/sec   Loss 5.0848   LearningRate 0.0918   Epoch: 13   Global Step: 71230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:33:44,031-Speed 10518.73 samples/sec   Loss 5.1365   LearningRate 0.0918   Epoch: 13   Global Step: 71240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:33:51,835-Speed 10498.01 samples/sec   Loss 5.1130   LearningRate 0.0917   Epoch: 13   Global Step: 71250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:33:59,618-Speed 10526.66 samples/sec   Loss 5.0834   LearningRate 0.0917   Epoch: 13   Global Step: 71260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:34:07,500-Speed 10394.32 samples/sec   Loss 5.0706   LearningRate 0.0916   Epoch: 13   Global Step: 71270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:34:15,293-Speed 10514.00 samples/sec   Loss 5.1239   LearningRate 0.0916   Epoch: 13   Global Step: 71280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:34:23,083-Speed 10517.77 samples/sec   Loss 5.1162   LearningRate 0.0915   Epoch: 13   Global Step: 71290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:34:30,860-Speed 10535.19 samples/sec   Loss 5.0812   LearningRate 0.0914   Epoch: 13   Global Step: 71300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:34:38,637-Speed 10535.09 samples/sec   Loss 5.0823   LearningRate 0.0914   Epoch: 13   Global Step: 71310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:34:46,441-Speed 10499.73 samples/sec   Loss 5.0708   LearningRate 0.0913   Epoch: 13   Global Step: 71320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:34:54,245-Speed 10497.88 samples/sec   Loss 5.0830   LearningRate 0.0913   Epoch: 13   Global Step: 71330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:35:02,039-Speed 10511.26 samples/sec   Loss 5.0908   LearningRate 0.0912   Epoch: 13   Global Step: 71340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:35:09,838-Speed 10506.65 samples/sec   Loss 5.0440   LearningRate 0.0912   Epoch: 13   Global Step: 71350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:35:17,630-Speed 10514.97 samples/sec   Loss 5.0862   LearningRate 0.0911   Epoch: 13   Global Step: 71360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:35:25,413-Speed 10526.68 samples/sec   Loss 5.0681   LearningRate 0.0910   Epoch: 13   Global Step: 71370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:35:33,212-Speed 10506.69 samples/sec   Loss 5.0493   LearningRate 0.0910   Epoch: 13   Global Step: 71380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:35:41,030-Speed 10479.10 samples/sec   Loss 5.0858   LearningRate 0.0909   Epoch: 13   Global Step: 71390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:35:48,844-Speed 10492.45 samples/sec   Loss 5.1105   LearningRate 0.0909   Epoch: 13   Global Step: 71400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:35:56,677-Speed 10460.21 samples/sec   Loss 5.0795   LearningRate 0.0908   Epoch: 13   Global Step: 71410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:36:04,506-Speed 10464.81 samples/sec   Loss 5.0643   LearningRate 0.0908   Epoch: 13   Global Step: 71420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:36:12,314-Speed 10493.32 samples/sec   Loss 5.0893   LearningRate 0.0907   Epoch: 13   Global Step: 71430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:36:20,111-Speed 10507.69 samples/sec   Loss 5.0855   LearningRate 0.0907   Epoch: 13   Global Step: 71440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:36:27,897-Speed 10523.55 samples/sec   Loss 5.0792   LearningRate 0.0906   Epoch: 13   Global Step: 71450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:36:35,699-Speed 10501.05 samples/sec   Loss 5.0650   LearningRate 0.0905   Epoch: 13   Global Step: 71460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:36:43,510-Speed 10488.98 samples/sec   Loss 5.0875   LearningRate 0.0905   Epoch: 13   Global Step: 71470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:36:51,372-Speed 10420.25 samples/sec   Loss 5.0693   LearningRate 0.0904   Epoch: 13   Global Step: 71480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:36:59,189-Speed 10481.66 samples/sec   Loss 5.0542   LearningRate 0.0904   Epoch: 13   Global Step: 71490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:37:07,014-Speed 10469.93 samples/sec   Loss 5.0438   LearningRate 0.0903   Epoch: 13   Global Step: 71500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:37:14,813-Speed 10505.00 samples/sec   Loss 5.0711   LearningRate 0.0903   Epoch: 13   Global Step: 71510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:37:22,629-Speed 10483.21 samples/sec   Loss 5.0496   LearningRate 0.0902   Epoch: 13   Global Step: 71520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:37:30,413-Speed 10524.76 samples/sec   Loss 5.0279   LearningRate 0.0901   Epoch: 13   Global Step: 71530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:37:38,210-Speed 10507.79 samples/sec   Loss 5.0558   LearningRate 0.0901   Epoch: 13   Global Step: 71540   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-01-16 06:37:46,016-Speed 10496.27 samples/sec   Loss 5.0569   LearningRate 0.0900   Epoch: 13   Global Step: 71550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:37:53,809-Speed 10513.81 samples/sec   Loss 5.0379   LearningRate 0.0900   Epoch: 13   Global Step: 71560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:38:01,608-Speed 10507.99 samples/sec   Loss 5.0743   LearningRate 0.0899   Epoch: 13   Global Step: 71570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:38:09,408-Speed 10504.17 samples/sec   Loss 5.0037   LearningRate 0.0899   Epoch: 13   Global Step: 71580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:38:17,217-Speed 10492.29 samples/sec   Loss 5.0613   LearningRate 0.0898   Epoch: 13   Global Step: 71590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:38:24,999-Speed 10527.86 samples/sec   Loss 5.0060   LearningRate 0.0898   Epoch: 13   Global Step: 71600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:38:32,779-Speed 10532.10 samples/sec   Loss 5.0437   LearningRate 0.0897   Epoch: 13   Global Step: 71610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:38:40,570-Speed 10515.09 samples/sec   Loss 5.0008   LearningRate 0.0896   Epoch: 13   Global Step: 71620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:38:48,383-Speed 10486.91 samples/sec   Loss 5.0541   LearningRate 0.0896   Epoch: 13   Global Step: 71630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:38:56,188-Speed 10497.68 samples/sec   Loss 5.0518   LearningRate 0.0895   Epoch: 13   Global Step: 71640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:39:03,967-Speed 10534.28 samples/sec   Loss 5.0693   LearningRate 0.0895   Epoch: 13   Global Step: 71650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:39:11,759-Speed 10514.91 samples/sec   Loss 5.0190   LearningRate 0.0894   Epoch: 13   Global Step: 71660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:39:19,572-Speed 10485.74 samples/sec   Loss 5.0320   LearningRate 0.0894   Epoch: 13   Global Step: 71670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:39:27,356-Speed 10526.34 samples/sec   Loss 5.0529   LearningRate 0.0893   Epoch: 13   Global Step: 71680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:39:35,156-Speed 10504.47 samples/sec   Loss 5.0164   LearningRate 0.0893   Epoch: 13   Global Step: 71690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:39:42,947-Speed 10515.42 samples/sec   Loss 5.0414   LearningRate 0.0892   Epoch: 13   Global Step: 71700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:39:50,745-Speed 10506.68 samples/sec   Loss 5.0150   LearningRate 0.0891   Epoch: 13   Global Step: 71710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:39:58,538-Speed 10513.97 samples/sec   Loss 5.0060   LearningRate 0.0891   Epoch: 13   Global Step: 71720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:40:06,330-Speed 10514.33 samples/sec   Loss 5.0391   LearningRate 0.0890   Epoch: 13   Global Step: 71730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:40:14,122-Speed 10514.70 samples/sec   Loss 5.0335   LearningRate 0.0890   Epoch: 13   Global Step: 71740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:40:21,906-Speed 10525.60 samples/sec   Loss 5.0068   LearningRate 0.0889   Epoch: 13   Global Step: 71750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:40:29,690-Speed 10525.13 samples/sec   Loss 4.9859   LearningRate 0.0889   Epoch: 13   Global Step: 71760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:40:37,460-Speed 10551.81 samples/sec   Loss 4.9771   LearningRate 0.0888   Epoch: 13   Global Step: 71770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-16 06:40:45,232-Speed 10542.95 samples/sec   Loss 5.0609   LearningRate 0.0887   Epoch: 13   Global Step: 71780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-16 06:40:53,031-Speed 10505.61 samples/sec   Loss 5.0197   LearningRate 0.0887   Epoch: 13   Global Step: 71790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-16 06:41:00,843-Speed 10490.35 samples/sec   Loss 5.0300   LearningRate 0.0886   Epoch: 13   Global Step: 71800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-16 06:41:08,629-Speed 10523.62 samples/sec   Loss 5.0220   LearningRate 0.0886   Epoch: 13   Global Step: 71810   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-16 06:41:16,413-Speed 10525.94 samples/sec   Loss 5.0238   LearningRate 0.0885   Epoch: 13   Global Step: 71820   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-16 06:41:24,227-Speed 10485.42 samples/sec   Loss 5.0269   LearningRate 0.0885   Epoch: 13   Global Step: 71830   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-16 06:41:32,057-Speed 10463.92 samples/sec   Loss 5.0203   LearningRate 0.0884   Epoch: 13   Global Step: 71840   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-16 06:41:39,874-Speed 10481.85 samples/sec   Loss 5.0440   LearningRate 0.0884   Epoch: 13   Global Step: 71850   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-16 06:41:47,699-Speed 10471.11 samples/sec   Loss 5.0236   LearningRate 0.0883   Epoch: 13   Global Step: 71860   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-16 06:41:55,482-Speed 10526.98 samples/sec   Loss 4.9915   LearningRate 0.0882   Epoch: 13   Global Step: 71870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:42:03,262-Speed 10530.39 samples/sec   Loss 5.0474   LearningRate 0.0882   Epoch: 13   Global Step: 71880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:42:11,054-Speed 10514.76 samples/sec   Loss 5.0074   LearningRate 0.0881   Epoch: 13   Global Step: 71890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:42:18,922-Speed 10413.05 samples/sec   Loss 4.9878   LearningRate 0.0881   Epoch: 13   Global Step: 71900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:42:26,743-Speed 10479.25 samples/sec   Loss 4.9981   LearningRate 0.0880   Epoch: 13   Global Step: 71910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:42:34,533-Speed 10515.88 samples/sec   Loss 5.0042   LearningRate 0.0880   Epoch: 13   Global Step: 71920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:42:42,325-Speed 10515.38 samples/sec   Loss 4.9972   LearningRate 0.0879   Epoch: 13   Global Step: 71930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:42:50,121-Speed 10510.02 samples/sec   Loss 5.0176   LearningRate 0.0879   Epoch: 13   Global Step: 71940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:42:57,950-Speed 10464.99 samples/sec   Loss 4.9774   LearningRate 0.0878   Epoch: 13   Global Step: 71950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:43:05,748-Speed 10505.66 samples/sec   Loss 4.9944   LearningRate 0.0878   Epoch: 13   Global Step: 71960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:43:13,544-Speed 10510.46 samples/sec   Loss 4.9781   LearningRate 0.0877   Epoch: 13   Global Step: 71970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:43:21,311-Speed 10547.90 samples/sec   Loss 5.0158   LearningRate 0.0876   Epoch: 13   Global Step: 71980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:43:29,118-Speed 10494.16 samples/sec   Loss 4.9966   LearningRate 0.0876   Epoch: 13   Global Step: 71990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:43:36,904-Speed 10524.46 samples/sec   Loss 4.9998   LearningRate 0.0875   Epoch: 13   Global Step: 72000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:43:44,704-Speed 10510.16 samples/sec   Loss 4.9621   LearningRate 0.0875   Epoch: 13   Global Step: 72010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:43:52,508-Speed 10498.42 samples/sec   Loss 5.0106   LearningRate 0.0874   Epoch: 13   Global Step: 72020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:44:00,320-Speed 10487.70 samples/sec   Loss 5.0107   LearningRate 0.0874   Epoch: 13   Global Step: 72030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:44:08,116-Speed 10509.44 samples/sec   Loss 4.9871   LearningRate 0.0873   Epoch: 13   Global Step: 72040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:44:15,892-Speed 10540.22 samples/sec   Loss 4.9791   LearningRate 0.0873   Epoch: 13   Global Step: 72050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:44:23,678-Speed 10522.86 samples/sec   Loss 4.9976   LearningRate 0.0872   Epoch: 13   Global Step: 72060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:44:31,451-Speed 10540.00 samples/sec   Loss 4.9931   LearningRate 0.0871   Epoch: 13   Global Step: 72070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:44:39,233-Speed 10529.14 samples/sec   Loss 5.0247   LearningRate 0.0871   Epoch: 13   Global Step: 72080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:44:47,023-Speed 10516.85 samples/sec   Loss 5.0086   LearningRate 0.0870   Epoch: 13   Global Step: 72090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:44:54,823-Speed 10504.48 samples/sec   Loss 4.9751   LearningRate 0.0870   Epoch: 13   Global Step: 72100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:45:02,680-Speed 10427.25 samples/sec   Loss 4.9847   LearningRate 0.0869   Epoch: 13   Global Step: 72110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:45:10,493-Speed 10486.00 samples/sec   Loss 4.9585   LearningRate 0.0869   Epoch: 13   Global Step: 72120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:45:18,270-Speed 10536.09 samples/sec   Loss 4.9959   LearningRate 0.0868   Epoch: 13   Global Step: 72130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:45:26,046-Speed 10535.89 samples/sec   Loss 4.9565   LearningRate 0.0868   Epoch: 13   Global Step: 72140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:45:33,856-Speed 10489.71 samples/sec   Loss 4.9868   LearningRate 0.0867   Epoch: 13   Global Step: 72150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:45:41,682-Speed 10469.32 samples/sec   Loss 4.9486   LearningRate 0.0866   Epoch: 13   Global Step: 72160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:45:49,462-Speed 10531.25 samples/sec   Loss 4.9771   LearningRate 0.0866   Epoch: 13   Global Step: 72170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:45:57,280-Speed 10479.70 samples/sec   Loss 4.9770   LearningRate 0.0865   Epoch: 13   Global Step: 72180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:46:05,072-Speed 10517.26 samples/sec   Loss 4.9784   LearningRate 0.0865   Epoch: 13   Global Step: 72190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:46:12,856-Speed 10525.62 samples/sec   Loss 4.9557   LearningRate 0.0864   Epoch: 13   Global Step: 72200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:46:20,651-Speed 10511.04 samples/sec   Loss 4.9269   LearningRate 0.0864   Epoch: 13   Global Step: 72210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:46:28,461-Speed 10489.51 samples/sec   Loss 4.9456   LearningRate 0.0863   Epoch: 13   Global Step: 72220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:46:36,252-Speed 10516.63 samples/sec   Loss 4.9677   LearningRate 0.0863   Epoch: 13   Global Step: 72230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:46:44,064-Speed 10488.33 samples/sec   Loss 4.9285   LearningRate 0.0862   Epoch: 13   Global Step: 72240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:46:51,863-Speed 10505.14 samples/sec   Loss 4.9605   LearningRate 0.0862   Epoch: 13   Global Step: 72250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:46:59,691-Speed 10468.94 samples/sec   Loss 4.9323   LearningRate 0.0861   Epoch: 13   Global Step: 72260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:47:07,490-Speed 10507.20 samples/sec   Loss 4.9811   LearningRate 0.0860   Epoch: 13   Global Step: 72270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:47:15,306-Speed 10482.25 samples/sec   Loss 4.9567   LearningRate 0.0860   Epoch: 13   Global Step: 72280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:47:23,122-Speed 10482.11 samples/sec   Loss 4.9707   LearningRate 0.0859   Epoch: 13   Global Step: 72290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:47:30,923-Speed 10501.55 samples/sec   Loss 4.9842   LearningRate 0.0859   Epoch: 13   Global Step: 72300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:47:38,722-Speed 10506.67 samples/sec   Loss 4.9711   LearningRate 0.0858   Epoch: 13   Global Step: 72310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:47:46,514-Speed 10515.03 samples/sec   Loss 4.9386   LearningRate 0.0858   Epoch: 13   Global Step: 72320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:47:54,334-Speed 10475.77 samples/sec   Loss 4.9136   LearningRate 0.0857   Epoch: 13   Global Step: 72330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:48:02,133-Speed 10505.81 samples/sec   Loss 4.9418   LearningRate 0.0857   Epoch: 13   Global Step: 72340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:48:09,943-Speed 10491.24 samples/sec   Loss 4.9593   LearningRate 0.0856   Epoch: 13   Global Step: 72350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:48:17,757-Speed 10484.69 samples/sec   Loss 4.9410   LearningRate 0.0856   Epoch: 13   Global Step: 72360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:48:25,601-Speed 10444.29 samples/sec   Loss 4.9307   LearningRate 0.0855   Epoch: 13   Global Step: 72370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:48:33,392-Speed 10515.89 samples/sec   Loss 4.9246   LearningRate 0.0854   Epoch: 13   Global Step: 72380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:48:41,176-Speed 10525.99 samples/sec   Loss 4.9564   LearningRate 0.0854   Epoch: 13   Global Step: 72390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:48:48,965-Speed 10519.30 samples/sec   Loss 4.9599   LearningRate 0.0853   Epoch: 13   Global Step: 72400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:48:56,761-Speed 10509.60 samples/sec   Loss 4.9433   LearningRate 0.0853   Epoch: 13   Global Step: 72410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:49:04,568-Speed 10493.50 samples/sec   Loss 4.8929   LearningRate 0.0852   Epoch: 13   Global Step: 72420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:49:12,351-Speed 10528.00 samples/sec   Loss 4.9139   LearningRate 0.0852   Epoch: 13   Global Step: 72430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:49:20,145-Speed 10517.33 samples/sec   Loss 4.9232   LearningRate 0.0851   Epoch: 13   Global Step: 72440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:49:27,947-Speed 10500.28 samples/sec   Loss 4.9502   LearningRate 0.0851   Epoch: 13   Global Step: 72450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:49:35,733-Speed 10522.84 samples/sec   Loss 4.9349   LearningRate 0.0850   Epoch: 13   Global Step: 72460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:49:43,506-Speed 10541.78 samples/sec   Loss 4.9526   LearningRate 0.0850   Epoch: 13   Global Step: 72470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:49:51,305-Speed 10504.66 samples/sec   Loss 4.9200   LearningRate 0.0849   Epoch: 13   Global Step: 72480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:49:59,115-Speed 10490.00 samples/sec   Loss 4.9407   LearningRate 0.0848   Epoch: 13   Global Step: 72490   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-01-16 06:50:06,927-Speed 10487.68 samples/sec   Loss 4.9731   LearningRate 0.0848   Epoch: 13   Global Step: 72500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:50:14,763-Speed 10455.91 samples/sec   Loss 4.9388   LearningRate 0.0847   Epoch: 13   Global Step: 72510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:50:22,540-Speed 10534.57 samples/sec   Loss 4.9458   LearningRate 0.0847   Epoch: 13   Global Step: 72520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:50:30,344-Speed 10499.14 samples/sec   Loss 4.8657   LearningRate 0.0846   Epoch: 13   Global Step: 72530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:50:38,134-Speed 10516.48 samples/sec   Loss 4.9116   LearningRate 0.0846   Epoch: 13   Global Step: 72540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:50:45,916-Speed 10528.36 samples/sec   Loss 4.9212   LearningRate 0.0845   Epoch: 13   Global Step: 72550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:50:53,709-Speed 10513.38 samples/sec   Loss 4.9612   LearningRate 0.0845   Epoch: 13   Global Step: 72560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:51:01,476-Speed 10548.66 samples/sec   Loss 4.9459   LearningRate 0.0844   Epoch: 13   Global Step: 72570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:51:09,270-Speed 10512.90 samples/sec   Loss 4.9123   LearningRate 0.0844   Epoch: 13   Global Step: 72580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:51:32,516-Speed 3524.09 samples/sec   Loss 4.9469   LearningRate 0.0843   Epoch: 14   Global Step: 72590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:51:40,276-Speed 10559.16 samples/sec   Loss 4.9391   LearningRate 0.0842   Epoch: 14   Global Step: 72600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:51:48,080-Speed 10498.94 samples/sec   Loss 4.8938   LearningRate 0.0842   Epoch: 14   Global Step: 72610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:51:55,855-Speed 10540.50 samples/sec   Loss 4.8671   LearningRate 0.0841   Epoch: 14   Global Step: 72620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:52:03,633-Speed 10533.78 samples/sec   Loss 4.8702   LearningRate 0.0841   Epoch: 14   Global Step: 72630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:52:11,384-Speed 10569.17 samples/sec   Loss 4.8750   LearningRate 0.0840   Epoch: 14   Global Step: 72640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:52:19,187-Speed 10500.93 samples/sec   Loss 4.9102   LearningRate 0.0840   Epoch: 14   Global Step: 72650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:52:26,959-Speed 10542.49 samples/sec   Loss 4.8967   LearningRate 0.0839   Epoch: 14   Global Step: 72660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:52:34,720-Speed 10555.63 samples/sec   Loss 4.8839   LearningRate 0.0839   Epoch: 14   Global Step: 72670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:52:42,515-Speed 10511.19 samples/sec   Loss 4.8719   LearningRate 0.0838   Epoch: 14   Global Step: 72680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:52:50,307-Speed 10514.80 samples/sec   Loss 4.8891   LearningRate 0.0838   Epoch: 14   Global Step: 72690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:52:58,082-Speed 10537.27 samples/sec   Loss 4.8811   LearningRate 0.0837   Epoch: 14   Global Step: 72700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:53:05,859-Speed 10534.08 samples/sec   Loss 4.8624   LearningRate 0.0836   Epoch: 14   Global Step: 72710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:53:13,649-Speed 10517.64 samples/sec   Loss 4.8436   LearningRate 0.0836   Epoch: 14   Global Step: 72720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:53:21,427-Speed 10534.83 samples/sec   Loss 4.8924   LearningRate 0.0835   Epoch: 14   Global Step: 72730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:53:29,241-Speed 10485.04 samples/sec   Loss 4.8293   LearningRate 0.0835   Epoch: 14   Global Step: 72740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:53:37,025-Speed 10525.03 samples/sec   Loss 4.8494   LearningRate 0.0834   Epoch: 14   Global Step: 72750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:53:44,799-Speed 10539.29 samples/sec   Loss 4.8787   LearningRate 0.0834   Epoch: 14   Global Step: 72760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:53:52,580-Speed 10529.44 samples/sec   Loss 4.8932   LearningRate 0.0833   Epoch: 14   Global Step: 72770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:54:00,371-Speed 10516.64 samples/sec   Loss 4.8808   LearningRate 0.0833   Epoch: 14   Global Step: 72780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:54:08,138-Speed 10547.68 samples/sec   Loss 4.8602   LearningRate 0.0832   Epoch: 14   Global Step: 72790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:54:15,911-Speed 10540.81 samples/sec   Loss 4.8914   LearningRate 0.0832   Epoch: 14   Global Step: 72800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:54:23,677-Speed 10550.50 samples/sec   Loss 4.8469   LearningRate 0.0831   Epoch: 14   Global Step: 72810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:54:31,447-Speed 10544.61 samples/sec   Loss 4.8523   LearningRate 0.0831   Epoch: 14   Global Step: 72820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:54:39,228-Speed 10528.47 samples/sec   Loss 4.8826   LearningRate 0.0830   Epoch: 14   Global Step: 72830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:54:47,008-Speed 10531.79 samples/sec   Loss 4.8952   LearningRate 0.0829   Epoch: 14   Global Step: 72840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:54:54,793-Speed 10523.94 samples/sec   Loss 4.8835   LearningRate 0.0829   Epoch: 14   Global Step: 72850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:55:02,573-Speed 10531.21 samples/sec   Loss 4.8926   LearningRate 0.0828   Epoch: 14   Global Step: 72860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:55:10,363-Speed 10517.05 samples/sec   Loss 4.8860   LearningRate 0.0828   Epoch: 14   Global Step: 72870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:55:18,158-Speed 10510.70 samples/sec   Loss 4.9020   LearningRate 0.0827   Epoch: 14   Global Step: 72880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:55:25,937-Speed 10532.88 samples/sec   Loss 4.8622   LearningRate 0.0827   Epoch: 14   Global Step: 72890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:55:33,706-Speed 10546.07 samples/sec   Loss 4.8759   LearningRate 0.0826   Epoch: 14   Global Step: 72900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:55:41,482-Speed 10534.77 samples/sec   Loss 4.8712   LearningRate 0.0826   Epoch: 14   Global Step: 72910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:55:49,249-Speed 10549.72 samples/sec   Loss 4.8483   LearningRate 0.0825   Epoch: 14   Global Step: 72920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:55:57,021-Speed 10541.54 samples/sec   Loss 4.8555   LearningRate 0.0825   Epoch: 14   Global Step: 72930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:56:04,790-Speed 10545.54 samples/sec   Loss 4.8498   LearningRate 0.0824   Epoch: 14   Global Step: 72940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:56:12,587-Speed 10507.90 samples/sec   Loss 4.8565   LearningRate 0.0824   Epoch: 14   Global Step: 72950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:56:20,417-Speed 10464.17 samples/sec   Loss 4.8286   LearningRate 0.0823   Epoch: 14   Global Step: 72960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:56:28,251-Speed 10458.97 samples/sec   Loss 4.8736   LearningRate 0.0823   Epoch: 14   Global Step: 72970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:56:36,069-Speed 10479.61 samples/sec   Loss 4.8905   LearningRate 0.0822   Epoch: 14   Global Step: 72980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:56:43,881-Speed 10486.67 samples/sec   Loss 4.8887   LearningRate 0.0821   Epoch: 14   Global Step: 72990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:56:51,697-Speed 10483.34 samples/sec   Loss 4.8757   LearningRate 0.0821   Epoch: 14   Global Step: 73000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:56:59,521-Speed 10470.83 samples/sec   Loss 4.8267   LearningRate 0.0820   Epoch: 14   Global Step: 73010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:57:07,368-Speed 10441.41 samples/sec   Loss 4.8662   LearningRate 0.0820   Epoch: 14   Global Step: 73020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:57:15,227-Speed 10425.24 samples/sec   Loss 4.8377   LearningRate 0.0819   Epoch: 14   Global Step: 73030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:57:23,102-Speed 10404.41 samples/sec   Loss 4.8530   LearningRate 0.0819   Epoch: 14   Global Step: 73040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:57:30,924-Speed 10474.59 samples/sec   Loss 4.8377   LearningRate 0.0818   Epoch: 14   Global Step: 73050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:57:38,775-Speed 10435.25 samples/sec   Loss 4.8821   LearningRate 0.0818   Epoch: 14   Global Step: 73060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:57:46,593-Speed 10480.38 samples/sec   Loss 4.8420   LearningRate 0.0817   Epoch: 14   Global Step: 73070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:57:54,438-Speed 10444.18 samples/sec   Loss 4.8642   LearningRate 0.0817   Epoch: 14   Global Step: 73080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:58:02,244-Speed 10494.73 samples/sec   Loss 4.8736   LearningRate 0.0816   Epoch: 14   Global Step: 73090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:58:10,063-Speed 10479.18 samples/sec   Loss 4.8442   LearningRate 0.0816   Epoch: 14   Global Step: 73100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:58:17,881-Speed 10480.65 samples/sec   Loss 4.8093   LearningRate 0.0815   Epoch: 14   Global Step: 73110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:58:25,700-Speed 10478.35 samples/sec   Loss 4.7942   LearningRate 0.0814   Epoch: 14   Global Step: 73120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:58:33,510-Speed 10490.25 samples/sec   Loss 4.8054   LearningRate 0.0814   Epoch: 14   Global Step: 73130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:58:41,336-Speed 10469.47 samples/sec   Loss 4.8295   LearningRate 0.0813   Epoch: 14   Global Step: 73140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 06:58:49,168-Speed 10460.19 samples/sec   Loss 4.8391   LearningRate 0.0813   Epoch: 14   Global Step: 73150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:58:57,000-Speed 10462.14 samples/sec   Loss 4.7924   LearningRate 0.0812   Epoch: 14   Global Step: 73160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:59:04,833-Speed 10459.40 samples/sec   Loss 4.8519   LearningRate 0.0812   Epoch: 14   Global Step: 73170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:59:12,653-Speed 10477.07 samples/sec   Loss 4.8641   LearningRate 0.0811   Epoch: 14   Global Step: 73180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:59:20,474-Speed 10475.37 samples/sec   Loss 4.7960   LearningRate 0.0811   Epoch: 14   Global Step: 73190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:59:28,297-Speed 10473.52 samples/sec   Loss 4.8172   LearningRate 0.0810   Epoch: 14   Global Step: 73200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:59:36,093-Speed 10509.22 samples/sec   Loss 4.8093   LearningRate 0.0810   Epoch: 14   Global Step: 73210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:59:43,902-Speed 10495.00 samples/sec   Loss 4.8316   LearningRate 0.0809   Epoch: 14   Global Step: 73220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:59:51,711-Speed 10491.52 samples/sec   Loss 4.8802   LearningRate 0.0809   Epoch: 14   Global Step: 73230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 06:59:59,517-Speed 10496.27 samples/sec   Loss 4.8232   LearningRate 0.0808   Epoch: 14   Global Step: 73240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:00:07,302-Speed 10522.74 samples/sec   Loss 4.8139   LearningRate 0.0808   Epoch: 14   Global Step: 73250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:00:15,115-Speed 10489.13 samples/sec   Loss 4.7796   LearningRate 0.0807   Epoch: 14   Global Step: 73260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:00:22,901-Speed 10522.36 samples/sec   Loss 4.8291   LearningRate 0.0807   Epoch: 14   Global Step: 73270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:00:30,688-Speed 10521.46 samples/sec   Loss 4.8058   LearningRate 0.0806   Epoch: 14   Global Step: 73280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:00:38,510-Speed 10474.51 samples/sec   Loss 4.8023   LearningRate 0.0805   Epoch: 14   Global Step: 73290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:00:46,322-Speed 10487.49 samples/sec   Loss 4.8153   LearningRate 0.0805   Epoch: 14   Global Step: 73300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:00:54,152-Speed 10463.05 samples/sec   Loss 4.8104   LearningRate 0.0804   Epoch: 14   Global Step: 73310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:01:01,968-Speed 10483.22 samples/sec   Loss 4.8428   LearningRate 0.0804   Epoch: 14   Global Step: 73320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:01:09,788-Speed 10476.64 samples/sec   Loss 4.7981   LearningRate 0.0803   Epoch: 14   Global Step: 73330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:01:17,604-Speed 10483.12 samples/sec   Loss 4.8409   LearningRate 0.0803   Epoch: 14   Global Step: 73340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:01:25,438-Speed 10457.90 samples/sec   Loss 4.8111   LearningRate 0.0802   Epoch: 14   Global Step: 73350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:01:33,265-Speed 10467.98 samples/sec   Loss 4.7786   LearningRate 0.0802   Epoch: 14   Global Step: 73360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:01:41,065-Speed 10503.19 samples/sec   Loss 4.8208   LearningRate 0.0801   Epoch: 14   Global Step: 73370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:01:48,864-Speed 10506.61 samples/sec   Loss 4.7953   LearningRate 0.0801   Epoch: 14   Global Step: 73380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:01:56,652-Speed 10519.60 samples/sec   Loss 4.7710   LearningRate 0.0800   Epoch: 14   Global Step: 73390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:02:04,458-Speed 10495.82 samples/sec   Loss 4.7074   LearningRate 0.0800   Epoch: 14   Global Step: 73400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:02:12,247-Speed 10518.97 samples/sec   Loss 4.7437   LearningRate 0.0799   Epoch: 14   Global Step: 73410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:02:20,042-Speed 10513.91 samples/sec   Loss 4.7953   LearningRate 0.0799   Epoch: 14   Global Step: 73420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:02:27,826-Speed 10524.65 samples/sec   Loss 4.7921   LearningRate 0.0798   Epoch: 14   Global Step: 73430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:02:35,598-Speed 10541.97 samples/sec   Loss 4.7622   LearningRate 0.0798   Epoch: 14   Global Step: 73440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:02:43,416-Speed 10480.30 samples/sec   Loss 4.7980   LearningRate 0.0797   Epoch: 14   Global Step: 73450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:02:51,239-Speed 10473.60 samples/sec   Loss 4.7841   LearningRate 0.0796   Epoch: 14   Global Step: 73460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:02:59,055-Speed 10481.66 samples/sec   Loss 4.7922   LearningRate 0.0796   Epoch: 14   Global Step: 73470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:03:06,836-Speed 10530.54 samples/sec   Loss 4.8175   LearningRate 0.0795   Epoch: 14   Global Step: 73480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:03:14,616-Speed 10530.25 samples/sec   Loss 4.7660   LearningRate 0.0795   Epoch: 14   Global Step: 73490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:03:22,409-Speed 10514.82 samples/sec   Loss 4.8010   LearningRate 0.0794   Epoch: 14   Global Step: 73500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:03:30,234-Speed 10469.84 samples/sec   Loss 4.7685   LearningRate 0.0794   Epoch: 14   Global Step: 73510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:03:38,025-Speed 10516.40 samples/sec   Loss 4.7785   LearningRate 0.0793   Epoch: 14   Global Step: 73520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:03:45,823-Speed 10507.30 samples/sec   Loss 4.7494   LearningRate 0.0793   Epoch: 14   Global Step: 73530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:03:53,624-Speed 10502.25 samples/sec   Loss 4.8039   LearningRate 0.0792   Epoch: 14   Global Step: 73540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:04:01,449-Speed 10470.49 samples/sec   Loss 4.7684   LearningRate 0.0792   Epoch: 14   Global Step: 73550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:04:09,253-Speed 10499.63 samples/sec   Loss 4.7692   LearningRate 0.0791   Epoch: 14   Global Step: 73560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:04:17,058-Speed 10496.47 samples/sec   Loss 4.8061   LearningRate 0.0791   Epoch: 14   Global Step: 73570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:04:24,860-Speed 10501.21 samples/sec   Loss 4.7638   LearningRate 0.0790   Epoch: 14   Global Step: 73580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:04:32,632-Speed 10541.62 samples/sec   Loss 4.7630   LearningRate 0.0790   Epoch: 14   Global Step: 73590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:04:40,461-Speed 10465.72 samples/sec   Loss 4.7376   LearningRate 0.0789   Epoch: 14   Global Step: 73600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:04:48,263-Speed 10500.38 samples/sec   Loss 4.7668   LearningRate 0.0789   Epoch: 14   Global Step: 73610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:04:56,032-Speed 10545.84 samples/sec   Loss 4.7645   LearningRate 0.0788   Epoch: 14   Global Step: 73620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:05:03,851-Speed 10478.53 samples/sec   Loss 4.7585   LearningRate 0.0788   Epoch: 14   Global Step: 73630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:05:11,714-Speed 10423.22 samples/sec   Loss 4.7517   LearningRate 0.0787   Epoch: 14   Global Step: 73640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:05:19,599-Speed 10390.33 samples/sec   Loss 4.7371   LearningRate 0.0786   Epoch: 14   Global Step: 73650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:05:27,386-Speed 10521.95 samples/sec   Loss 4.7638   LearningRate 0.0786   Epoch: 14   Global Step: 73660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:05:35,169-Speed 10526.14 samples/sec   Loss 4.7489   LearningRate 0.0785   Epoch: 14   Global Step: 73670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:05:42,977-Speed 10493.37 samples/sec   Loss 4.7441   LearningRate 0.0785   Epoch: 14   Global Step: 73680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:05:50,772-Speed 10511.34 samples/sec   Loss 4.7749   LearningRate 0.0784   Epoch: 14   Global Step: 73690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:05:58,592-Speed 10476.30 samples/sec   Loss 4.7483   LearningRate 0.0784   Epoch: 14   Global Step: 73700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:06:06,418-Speed 10469.11 samples/sec   Loss 4.7572   LearningRate 0.0783   Epoch: 14   Global Step: 73710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:06:14,220-Speed 10501.78 samples/sec   Loss 4.7602   LearningRate 0.0783   Epoch: 14   Global Step: 73720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:06:21,997-Speed 10534.72 samples/sec   Loss 4.7528   LearningRate 0.0782   Epoch: 14   Global Step: 73730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:06:29,770-Speed 10539.93 samples/sec   Loss 4.7742   LearningRate 0.0782   Epoch: 14   Global Step: 73740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:06:37,572-Speed 10501.91 samples/sec   Loss 4.7642   LearningRate 0.0781   Epoch: 14   Global Step: 73750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:06:45,381-Speed 10491.84 samples/sec   Loss 4.7070   LearningRate 0.0781   Epoch: 14   Global Step: 73760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:06:53,170-Speed 10518.88 samples/sec   Loss 4.7430   LearningRate 0.0780   Epoch: 14   Global Step: 73770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:07:00,972-Speed 10512.94 samples/sec   Loss 4.7376   LearningRate 0.0780   Epoch: 14   Global Step: 73780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:07:08,749-Speed 10535.28 samples/sec   Loss 4.7380   LearningRate 0.0779   Epoch: 14   Global Step: 73790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:07:16,535-Speed 10523.35 samples/sec   Loss 4.7697   LearningRate 0.0779   Epoch: 14   Global Step: 73800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:07:24,323-Speed 10518.67 samples/sec   Loss 4.8024   LearningRate 0.0778   Epoch: 14   Global Step: 73810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:07:32,138-Speed 10483.77 samples/sec   Loss 4.7486   LearningRate 0.0778   Epoch: 14   Global Step: 73820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:07:39,946-Speed 10493.83 samples/sec   Loss 4.6983   LearningRate 0.0777   Epoch: 14   Global Step: 73830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:07:47,723-Speed 10534.94 samples/sec   Loss 4.7076   LearningRate 0.0777   Epoch: 14   Global Step: 73840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:07:55,526-Speed 10499.15 samples/sec   Loss 4.7509   LearningRate 0.0776   Epoch: 14   Global Step: 73850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:08:03,344-Speed 10480.13 samples/sec   Loss 4.7374   LearningRate 0.0776   Epoch: 14   Global Step: 73860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:08:11,129-Speed 10524.58 samples/sec   Loss 4.7181   LearningRate 0.0775   Epoch: 14   Global Step: 73870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:08:18,946-Speed 10482.86 samples/sec   Loss 4.7146   LearningRate 0.0774   Epoch: 14   Global Step: 73880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:08:26,742-Speed 10508.88 samples/sec   Loss 4.6995   LearningRate 0.0774   Epoch: 14   Global Step: 73890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:08:34,535-Speed 10513.40 samples/sec   Loss 4.7125   LearningRate 0.0773   Epoch: 14   Global Step: 73900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:08:42,333-Speed 10507.37 samples/sec   Loss 4.7021   LearningRate 0.0773   Epoch: 14   Global Step: 73910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:08:50,140-Speed 10495.77 samples/sec   Loss 4.7368   LearningRate 0.0772   Epoch: 14   Global Step: 73920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:08:57,948-Speed 10493.48 samples/sec   Loss 4.7162   LearningRate 0.0772   Epoch: 14   Global Step: 73930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:09:05,753-Speed 10498.78 samples/sec   Loss 4.7211   LearningRate 0.0771   Epoch: 14   Global Step: 73940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:09:13,545-Speed 10513.01 samples/sec   Loss 4.7220   LearningRate 0.0771   Epoch: 14   Global Step: 73950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:09:21,321-Speed 10537.03 samples/sec   Loss 4.7108   LearningRate 0.0770   Epoch: 14   Global Step: 73960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:09:29,170-Speed 10439.02 samples/sec   Loss 4.6963   LearningRate 0.0770   Epoch: 14   Global Step: 73970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:09:36,978-Speed 10493.59 samples/sec   Loss 4.6720   LearningRate 0.0769   Epoch: 14   Global Step: 73980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:09:44,787-Speed 10492.59 samples/sec   Loss 4.6983   LearningRate 0.0769   Epoch: 14   Global Step: 73990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:09:52,614-Speed 10466.75 samples/sec   Loss 4.6990   LearningRate 0.0768   Epoch: 14   Global Step: 74000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:10:00,405-Speed 10517.57 samples/sec   Loss 4.6846   LearningRate 0.0768   Epoch: 14   Global Step: 74010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:10:08,208-Speed 10499.53 samples/sec   Loss 4.7162   LearningRate 0.0767   Epoch: 14   Global Step: 74020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:10:16,036-Speed 10466.27 samples/sec   Loss 4.7319   LearningRate 0.0767   Epoch: 14   Global Step: 74030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:10:23,836-Speed 10503.13 samples/sec   Loss 4.7381   LearningRate 0.0766   Epoch: 14   Global Step: 74040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:10:31,630-Speed 10512.03 samples/sec   Loss 4.6748   LearningRate 0.0766   Epoch: 14   Global Step: 74050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:10:39,438-Speed 10494.28 samples/sec   Loss 4.6989   LearningRate 0.0765   Epoch: 14   Global Step: 74060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:10:47,227-Speed 10517.67 samples/sec   Loss 4.7145   LearningRate 0.0765   Epoch: 14   Global Step: 74070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:10:55,004-Speed 10541.30 samples/sec   Loss 4.6984   LearningRate 0.0764   Epoch: 14   Global Step: 74080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:11:02,787-Speed 10526.48 samples/sec   Loss 4.7128   LearningRate 0.0764   Epoch: 14   Global Step: 74090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-16 07:11:10,592-Speed 10497.18 samples/sec   Loss 4.7474   LearningRate 0.0763   Epoch: 14   Global Step: 74100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:11:18,391-Speed 10504.97 samples/sec   Loss 4.6992   LearningRate 0.0763   Epoch: 14   Global Step: 74110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:11:26,200-Speed 10491.97 samples/sec   Loss 4.7049   LearningRate 0.0762   Epoch: 14   Global Step: 74120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:11:33,987-Speed 10522.06 samples/sec   Loss 4.6947   LearningRate 0.0762   Epoch: 14   Global Step: 74130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-16 07:11:41,781-Speed 10512.29 samples/sec   Loss 4.6868   LearningRate 0.0761   Epoch: 14   Global Step: 74140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:11:49,594-Speed 10486.71 samples/sec   Loss 4.6814   LearningRate 0.0761   Epoch: 14   Global Step: 74150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:11:57,411-Speed 10480.51 samples/sec   Loss 4.6785   LearningRate 0.0760   Epoch: 14   Global Step: 74160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:12:05,246-Speed 10456.68 samples/sec   Loss 4.7334   LearningRate 0.0759   Epoch: 14   Global Step: 74170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:12:13,041-Speed 10510.93 samples/sec   Loss 4.6826   LearningRate 0.0759   Epoch: 14   Global Step: 74180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:12:20,849-Speed 10492.98 samples/sec   Loss 4.6550   LearningRate 0.0758   Epoch: 14   Global Step: 74190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:12:28,635-Speed 10523.40 samples/sec   Loss 4.6571   LearningRate 0.0758   Epoch: 14   Global Step: 74200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:12:36,460-Speed 10470.15 samples/sec   Loss 4.6672   LearningRate 0.0757   Epoch: 14   Global Step: 74210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:12:44,248-Speed 10520.32 samples/sec   Loss 4.6928   LearningRate 0.0757   Epoch: 14   Global Step: 74220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:12:52,050-Speed 10501.76 samples/sec   Loss 4.7244   LearningRate 0.0756   Epoch: 14   Global Step: 74230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:12:59,846-Speed 10509.59 samples/sec   Loss 4.6723   LearningRate 0.0756   Epoch: 14   Global Step: 74240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:13:07,630-Speed 10524.68 samples/sec   Loss 4.7083   LearningRate 0.0755   Epoch: 14   Global Step: 74250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:13:15,406-Speed 10536.02 samples/sec   Loss 4.6814   LearningRate 0.0755   Epoch: 14   Global Step: 74260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:13:23,195-Speed 10519.09 samples/sec   Loss 4.6447   LearningRate 0.0754   Epoch: 14   Global Step: 74270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:13:30,998-Speed 10499.79 samples/sec   Loss 4.6175   LearningRate 0.0754   Epoch: 14   Global Step: 74280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:13:38,787-Speed 10519.21 samples/sec   Loss 4.6375   LearningRate 0.0753   Epoch: 14   Global Step: 74290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:13:46,553-Speed 10551.15 samples/sec   Loss 4.6444   LearningRate 0.0753   Epoch: 14   Global Step: 74300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:13:54,366-Speed 10485.70 samples/sec   Loss 4.6421   LearningRate 0.0752   Epoch: 14   Global Step: 74310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:14:02,182-Speed 10481.85 samples/sec   Loss 4.6340   LearningRate 0.0752   Epoch: 14   Global Step: 74320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:14:09,969-Speed 10521.75 samples/sec   Loss 4.6754   LearningRate 0.0751   Epoch: 14   Global Step: 74330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:14:17,807-Speed 10453.43 samples/sec   Loss 4.6517   LearningRate 0.0751   Epoch: 14   Global Step: 74340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:14:25,613-Speed 10495.58 samples/sec   Loss 4.6640   LearningRate 0.0750   Epoch: 14   Global Step: 74350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:14:33,449-Speed 10455.31 samples/sec   Loss 4.6218   LearningRate 0.0750   Epoch: 14   Global Step: 74360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:14:41,304-Speed 10431.80 samples/sec   Loss 4.6624   LearningRate 0.0749   Epoch: 14   Global Step: 74370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:14:49,090-Speed 10522.44 samples/sec   Loss 4.6695   LearningRate 0.0749   Epoch: 14   Global Step: 74380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:14:56,875-Speed 10525.06 samples/sec   Loss 4.6721   LearningRate 0.0748   Epoch: 14   Global Step: 74390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:15:04,665-Speed 10516.02 samples/sec   Loss 4.6904   LearningRate 0.0748   Epoch: 14   Global Step: 74400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:15:12,452-Speed 10522.62 samples/sec   Loss 4.6611   LearningRate 0.0747   Epoch: 14   Global Step: 74410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:15:20,242-Speed 10516.43 samples/sec   Loss 4.6704   LearningRate 0.0747   Epoch: 14   Global Step: 74420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:15:28,042-Speed 10504.98 samples/sec   Loss 4.6399   LearningRate 0.0746   Epoch: 14   Global Step: 74430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:15:35,833-Speed 10516.00 samples/sec   Loss 4.6505   LearningRate 0.0746   Epoch: 14   Global Step: 74440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:15:43,608-Speed 10536.94 samples/sec   Loss 4.6448   LearningRate 0.0745   Epoch: 14   Global Step: 74450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:15:51,418-Speed 10493.91 samples/sec   Loss 4.6039   LearningRate 0.0745   Epoch: 14   Global Step: 74460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:15:59,192-Speed 10538.36 samples/sec   Loss 4.6397   LearningRate 0.0744   Epoch: 14   Global Step: 74470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:16:06,972-Speed 10530.94 samples/sec   Loss 4.6565   LearningRate 0.0744   Epoch: 14   Global Step: 74480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:16:14,768-Speed 10509.49 samples/sec   Loss 4.6599   LearningRate 0.0743   Epoch: 14   Global Step: 74490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:16:22,596-Speed 10467.27 samples/sec   Loss 4.6626   LearningRate 0.0743   Epoch: 14   Global Step: 74500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:16:30,419-Speed 10477.53 samples/sec   Loss 4.6417   LearningRate 0.0742   Epoch: 14   Global Step: 74510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:16:38,226-Speed 10495.17 samples/sec   Loss 4.6287   LearningRate 0.0742   Epoch: 14   Global Step: 74520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:16:46,061-Speed 10456.04 samples/sec   Loss 4.6534   LearningRate 0.0741   Epoch: 14   Global Step: 74530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:16:53,869-Speed 10494.53 samples/sec   Loss 4.6492   LearningRate 0.0741   Epoch: 14   Global Step: 74540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:17:01,645-Speed 10535.34 samples/sec   Loss 4.5901   LearningRate 0.0740   Epoch: 14   Global Step: 74550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:17:09,426-Speed 10530.38 samples/sec   Loss 4.6235   LearningRate 0.0740   Epoch: 14   Global Step: 74560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:17:17,204-Speed 10534.08 samples/sec   Loss 4.6175   LearningRate 0.0739   Epoch: 14   Global Step: 74570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:17:24,983-Speed 10533.12 samples/sec   Loss 4.6544   LearningRate 0.0739   Epoch: 14   Global Step: 74580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:17:32,779-Speed 10508.95 samples/sec   Loss 4.6454   LearningRate 0.0738   Epoch: 14   Global Step: 74590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:17:40,592-Speed 10503.31 samples/sec   Loss 4.6495   LearningRate 0.0738   Epoch: 14   Global Step: 74600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:17:48,398-Speed 10495.63 samples/sec   Loss 4.6031   LearningRate 0.0737   Epoch: 14   Global Step: 74610   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-01-16 07:17:56,179-Speed 10529.72 samples/sec   Loss 4.6663   LearningRate 0.0736   Epoch: 14   Global Step: 74620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:18:03,953-Speed 10538.68 samples/sec   Loss 4.6424   LearningRate 0.0736   Epoch: 14   Global Step: 74630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:18:11,745-Speed 10514.98 samples/sec   Loss 4.6318   LearningRate 0.0735   Epoch: 14   Global Step: 74640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:18:19,517-Speed 10542.82 samples/sec   Loss 4.6347   LearningRate 0.0735   Epoch: 14   Global Step: 74650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:18:27,298-Speed 10528.78 samples/sec   Loss 4.6225   LearningRate 0.0734   Epoch: 14   Global Step: 74660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:18:35,100-Speed 10501.01 samples/sec   Loss 4.6396   LearningRate 0.0734   Epoch: 14   Global Step: 74670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:18:42,902-Speed 10501.17 samples/sec   Loss 4.6339   LearningRate 0.0733   Epoch: 14   Global Step: 74680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:18:50,669-Speed 10548.51 samples/sec   Loss 4.5678   LearningRate 0.0733   Epoch: 14   Global Step: 74690   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:18:58,464-Speed 10511.18 samples/sec   Loss 4.6140   LearningRate 0.0732   Epoch: 14   Global Step: 74700   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:19:06,272-Speed 10493.09 samples/sec   Loss 4.5846   LearningRate 0.0732   Epoch: 14   Global Step: 74710   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:19:14,062-Speed 10517.25 samples/sec   Loss 4.6001   LearningRate 0.0731   Epoch: 14   Global Step: 74720   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:19:21,834-Speed 10545.84 samples/sec   Loss 4.6261   LearningRate 0.0731   Epoch: 14   Global Step: 74730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:19:29,604-Speed 10544.89 samples/sec   Loss 4.6058   LearningRate 0.0730   Epoch: 14   Global Step: 74740   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:19:37,399-Speed 10514.01 samples/sec   Loss 4.5853   LearningRate 0.0730   Epoch: 14   Global Step: 74750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:19:45,183-Speed 10525.39 samples/sec   Loss 4.6234   LearningRate 0.0729   Epoch: 14   Global Step: 74760   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:19:52,972-Speed 10518.38 samples/sec   Loss 4.6147   LearningRate 0.0729   Epoch: 14   Global Step: 74770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:20:00,754-Speed 10528.56 samples/sec   Loss 4.5677   LearningRate 0.0728   Epoch: 14   Global Step: 74780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:20:08,559-Speed 10497.85 samples/sec   Loss 4.6336   LearningRate 0.0728   Epoch: 14   Global Step: 74790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:20:16,402-Speed 10446.08 samples/sec   Loss 4.5752   LearningRate 0.0727   Epoch: 14   Global Step: 74800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:20:24,195-Speed 10513.79 samples/sec   Loss 4.5813   LearningRate 0.0727   Epoch: 14   Global Step: 74810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:20:31,983-Speed 10519.61 samples/sec   Loss 4.5896   LearningRate 0.0726   Epoch: 14   Global Step: 74820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:20:39,761-Speed 10534.52 samples/sec   Loss 4.5728   LearningRate 0.0726   Epoch: 14   Global Step: 74830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:20:47,614-Speed 10432.46 samples/sec   Loss 4.5843   LearningRate 0.0725   Epoch: 14   Global Step: 74840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:20:55,411-Speed 10507.79 samples/sec   Loss 4.5994   LearningRate 0.0725   Epoch: 14   Global Step: 74850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:21:03,182-Speed 10542.23 samples/sec   Loss 4.6023   LearningRate 0.0724   Epoch: 14   Global Step: 74860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:21:10,970-Speed 10522.10 samples/sec   Loss 4.5894   LearningRate 0.0724   Epoch: 14   Global Step: 74870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:21:18,727-Speed 10561.48 samples/sec   Loss 4.5858   LearningRate 0.0723   Epoch: 14   Global Step: 74880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:21:26,501-Speed 10539.42 samples/sec   Loss 4.5649   LearningRate 0.0723   Epoch: 14   Global Step: 74890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:21:34,283-Speed 10528.59 samples/sec   Loss 4.5644   LearningRate 0.0722   Epoch: 14   Global Step: 74900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:21:42,070-Speed 10526.37 samples/sec   Loss 4.6080   LearningRate 0.0722   Epoch: 14   Global Step: 74910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:21:49,858-Speed 10520.67 samples/sec   Loss 4.5853   LearningRate 0.0721   Epoch: 14   Global Step: 74920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:21:57,660-Speed 10500.10 samples/sec   Loss 4.5719   LearningRate 0.0721   Epoch: 14   Global Step: 74930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:22:05,446-Speed 10523.54 samples/sec   Loss 4.5804   LearningRate 0.0720   Epoch: 14   Global Step: 74940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:22:13,233-Speed 10521.36 samples/sec   Loss 4.6030   LearningRate 0.0720   Epoch: 14   Global Step: 74950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:22:21,062-Speed 10465.26 samples/sec   Loss 4.6020   LearningRate 0.0719   Epoch: 14   Global Step: 74960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:22:28,860-Speed 10505.97 samples/sec   Loss 4.5647   LearningRate 0.0719   Epoch: 14   Global Step: 74970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:22:36,656-Speed 10515.48 samples/sec   Loss 4.5539   LearningRate 0.0718   Epoch: 14   Global Step: 74980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:22:44,449-Speed 10519.00 samples/sec   Loss 4.5435   LearningRate 0.0718   Epoch: 14   Global Step: 74990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:22:52,237-Speed 10520.11 samples/sec   Loss 4.5734   LearningRate 0.0717   Epoch: 14   Global Step: 75000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:23:00,022-Speed 10523.85 samples/sec   Loss 4.5342   LearningRate 0.0717   Epoch: 14   Global Step: 75010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:23:07,818-Speed 10508.78 samples/sec   Loss 4.5967   LearningRate 0.0716   Epoch: 14   Global Step: 75020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:23:15,609-Speed 10517.08 samples/sec   Loss 4.6034   LearningRate 0.0716   Epoch: 14   Global Step: 75030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:23:23,402-Speed 10513.60 samples/sec   Loss 4.5656   LearningRate 0.0715   Epoch: 14   Global Step: 75040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:23:31,218-Speed 10481.84 samples/sec   Loss 4.5738   LearningRate 0.0715   Epoch: 14   Global Step: 75050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:23:39,026-Speed 10492.98 samples/sec   Loss 4.5710   LearningRate 0.0714   Epoch: 14   Global Step: 75060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:23:46,838-Speed 10488.53 samples/sec   Loss 4.5396   LearningRate 0.0714   Epoch: 14   Global Step: 75070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:23:54,623-Speed 10524.27 samples/sec   Loss 4.6115   LearningRate 0.0713   Epoch: 14   Global Step: 75080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:24:02,419-Speed 10509.08 samples/sec   Loss 4.5201   LearningRate 0.0713   Epoch: 14   Global Step: 75090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:24:10,265-Speed 10441.86 samples/sec   Loss 4.5265   LearningRate 0.0712   Epoch: 14   Global Step: 75100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:24:18,059-Speed 10512.78 samples/sec   Loss 4.5574   LearningRate 0.0712   Epoch: 14   Global Step: 75110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:24:25,846-Speed 10521.56 samples/sec   Loss 4.5457   LearningRate 0.0711   Epoch: 14   Global Step: 75120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:24:33,660-Speed 10484.30 samples/sec   Loss 4.5705   LearningRate 0.0711   Epoch: 14   Global Step: 75130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:24:41,473-Speed 10486.97 samples/sec   Loss 4.5859   LearningRate 0.0710   Epoch: 14   Global Step: 75140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:24:49,254-Speed 10530.30 samples/sec   Loss 4.5781   LearningRate 0.0710   Epoch: 14   Global Step: 75150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:24:57,050-Speed 10508.73 samples/sec   Loss 4.5596   LearningRate 0.0709   Epoch: 14   Global Step: 75160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:25:04,842-Speed 10515.19 samples/sec   Loss 4.5725   LearningRate 0.0709   Epoch: 14   Global Step: 75170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:25:12,661-Speed 10478.09 samples/sec   Loss 4.5604   LearningRate 0.0708   Epoch: 14   Global Step: 75180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:25:20,466-Speed 10498.78 samples/sec   Loss 4.5846   LearningRate 0.0708   Epoch: 14   Global Step: 75190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:25:28,238-Speed 10541.41 samples/sec   Loss 4.5565   LearningRate 0.0707   Epoch: 14   Global Step: 75200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:25:36,022-Speed 10525.63 samples/sec   Loss 4.5471   LearningRate 0.0707   Epoch: 14   Global Step: 75210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:25:43,803-Speed 10528.94 samples/sec   Loss 4.5430   LearningRate 0.0706   Epoch: 14   Global Step: 75220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:25:51,589-Speed 10522.97 samples/sec   Loss 4.5303   LearningRate 0.0706   Epoch: 14   Global Step: 75230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:25:59,380-Speed 10516.29 samples/sec   Loss 4.5205   LearningRate 0.0705   Epoch: 14   Global Step: 75240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:26:07,184-Speed 10498.20 samples/sec   Loss 4.5119   LearningRate 0.0705   Epoch: 14   Global Step: 75250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:26:14,986-Speed 10501.86 samples/sec   Loss 4.5372   LearningRate 0.0704   Epoch: 14   Global Step: 75260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:26:22,770-Speed 10525.68 samples/sec   Loss 4.5306   LearningRate 0.0704   Epoch: 14   Global Step: 75270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:26:30,583-Speed 10485.91 samples/sec   Loss 4.5874   LearningRate 0.0703   Epoch: 14   Global Step: 75280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:26:38,400-Speed 10482.08 samples/sec   Loss 4.4766   LearningRate 0.0703   Epoch: 14   Global Step: 75290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:26:46,176-Speed 10536.13 samples/sec   Loss 4.5001   LearningRate 0.0702   Epoch: 14   Global Step: 75300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:26:53,992-Speed 10481.67 samples/sec   Loss 4.5204   LearningRate 0.0702   Epoch: 14   Global Step: 75310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:27:01,795-Speed 10499.49 samples/sec   Loss 4.5580   LearningRate 0.0701   Epoch: 14   Global Step: 75320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:27:09,604-Speed 10492.81 samples/sec   Loss 4.5613   LearningRate 0.0701   Epoch: 14   Global Step: 75330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:27:17,417-Speed 10485.98 samples/sec   Loss 4.5051   LearningRate 0.0700   Epoch: 14   Global Step: 75340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:27:25,215-Speed 10507.59 samples/sec   Loss 4.5275   LearningRate 0.0700   Epoch: 14   Global Step: 75350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:27:33,022-Speed 10493.36 samples/sec   Loss 4.5226   LearningRate 0.0699   Epoch: 14   Global Step: 75360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:27:40,797-Speed 10538.92 samples/sec   Loss 4.5055   LearningRate 0.0699   Epoch: 14   Global Step: 75370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:27:48,565-Speed 10546.60 samples/sec   Loss 4.4991   LearningRate 0.0698   Epoch: 14   Global Step: 75380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:27:56,355-Speed 10517.74 samples/sec   Loss 4.5611   LearningRate 0.0698   Epoch: 14   Global Step: 75390   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:28:04,161-Speed 10495.28 samples/sec   Loss 4.5115   LearningRate 0.0697   Epoch: 14   Global Step: 75400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:28:11,938-Speed 10536.31 samples/sec   Loss 4.5066   LearningRate 0.0697   Epoch: 14   Global Step: 75410   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:28:19,751-Speed 10485.20 samples/sec   Loss 4.5301   LearningRate 0.0697   Epoch: 14   Global Step: 75420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:28:27,566-Speed 10484.09 samples/sec   Loss 4.5095   LearningRate 0.0696   Epoch: 14   Global Step: 75430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:28:35,362-Speed 10509.08 samples/sec   Loss 4.5352   LearningRate 0.0696   Epoch: 14   Global Step: 75440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:28:43,191-Speed 10466.05 samples/sec   Loss 4.5041   LearningRate 0.0695   Epoch: 14   Global Step: 75450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:28:50,977-Speed 10523.06 samples/sec   Loss 4.5361   LearningRate 0.0695   Epoch: 14   Global Step: 75460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:28:58,791-Speed 10484.70 samples/sec   Loss 4.4982   LearningRate 0.0694   Epoch: 14   Global Step: 75470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:29:06,587-Speed 10510.06 samples/sec   Loss 4.5056   LearningRate 0.0694   Epoch: 14   Global Step: 75480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:29:14,407-Speed 10477.04 samples/sec   Loss 4.4706   LearningRate 0.0693   Epoch: 14   Global Step: 75490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:29:22,218-Speed 10488.06 samples/sec   Loss 4.5316   LearningRate 0.0693   Epoch: 14   Global Step: 75500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:29:30,035-Speed 10481.16 samples/sec   Loss 4.5276   LearningRate 0.0692   Epoch: 14   Global Step: 75510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:29:37,816-Speed 10529.71 samples/sec   Loss 4.5413   LearningRate 0.0692   Epoch: 14   Global Step: 75520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:29:45,627-Speed 10490.63 samples/sec   Loss 4.5347   LearningRate 0.0691   Epoch: 14   Global Step: 75530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:29:53,431-Speed 10497.03 samples/sec   Loss 4.5194   LearningRate 0.0691   Epoch: 14   Global Step: 75540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:30:01,203-Speed 10541.99 samples/sec   Loss 4.4952   LearningRate 0.0690   Epoch: 14   Global Step: 75550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:30:08,973-Speed 10544.17 samples/sec   Loss 4.4754   LearningRate 0.0690   Epoch: 14   Global Step: 75560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:30:16,820-Speed 10441.32 samples/sec   Loss 4.5391   LearningRate 0.0689   Epoch: 14   Global Step: 75570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:30:24,662-Speed 10446.83 samples/sec   Loss 4.4925   LearningRate 0.0689   Epoch: 14   Global Step: 75580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:30:32,488-Speed 10469.93 samples/sec   Loss 4.5122   LearningRate 0.0688   Epoch: 14   Global Step: 75590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:30:40,279-Speed 10515.79 samples/sec   Loss 4.5014   LearningRate 0.0688   Epoch: 14   Global Step: 75600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:30:48,069-Speed 10517.96 samples/sec   Loss 4.4871   LearningRate 0.0687   Epoch: 14   Global Step: 75610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:30:55,927-Speed 10426.40 samples/sec   Loss 4.4993   LearningRate 0.0687   Epoch: 14   Global Step: 75620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:31:03,768-Speed 10449.00 samples/sec   Loss 4.4881   LearningRate 0.0686   Epoch: 14   Global Step: 75630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:31:11,563-Speed 10510.01 samples/sec   Loss 4.4877   LearningRate 0.0686   Epoch: 14   Global Step: 75640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:31:19,378-Speed 10484.77 samples/sec   Loss 4.5050   LearningRate 0.0685   Epoch: 14   Global Step: 75650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:31:27,238-Speed 10423.36 samples/sec   Loss 4.4637   LearningRate 0.0685   Epoch: 14   Global Step: 75660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:31:35,058-Speed 10477.55 samples/sec   Loss 4.5052   LearningRate 0.0684   Epoch: 14   Global Step: 75670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:31:42,876-Speed 10479.28 samples/sec   Loss 4.4766   LearningRate 0.0684   Epoch: 14   Global Step: 75680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:31:50,669-Speed 10514.44 samples/sec   Loss 4.5031   LearningRate 0.0683   Epoch: 14   Global Step: 75690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:31:58,495-Speed 10469.16 samples/sec   Loss 4.4715   LearningRate 0.0683   Epoch: 14   Global Step: 75700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:32:06,296-Speed 10501.44 samples/sec   Loss 4.4833   LearningRate 0.0682   Epoch: 14   Global Step: 75710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:32:14,085-Speed 10518.53 samples/sec   Loss 4.4700   LearningRate 0.0682   Epoch: 14   Global Step: 75720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:32:21,899-Speed 10485.81 samples/sec   Loss 4.4487   LearningRate 0.0681   Epoch: 14   Global Step: 75730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:32:29,709-Speed 10493.95 samples/sec   Loss 4.4832   LearningRate 0.0681   Epoch: 14   Global Step: 75740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:32:37,512-Speed 10500.75 samples/sec   Loss 4.5085   LearningRate 0.0680   Epoch: 14   Global Step: 75750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:32:45,316-Speed 10497.89 samples/sec   Loss 4.4872   LearningRate 0.0680   Epoch: 14   Global Step: 75760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:32:53,131-Speed 10484.87 samples/sec   Loss 4.4562   LearningRate 0.0679   Epoch: 14   Global Step: 75770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:33:00,927-Speed 10509.77 samples/sec   Loss 4.4595   LearningRate 0.0679   Epoch: 14   Global Step: 75780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:33:08,765-Speed 10453.27 samples/sec   Loss 4.4665   LearningRate 0.0678   Epoch: 14   Global Step: 75790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:33:16,609-Speed 10445.02 samples/sec   Loss 4.4630   LearningRate 0.0678   Epoch: 14   Global Step: 75800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:33:24,422-Speed 10485.83 samples/sec   Loss 4.4653   LearningRate 0.0677   Epoch: 14   Global Step: 75810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:33:32,217-Speed 10510.94 samples/sec   Loss 4.4160   LearningRate 0.0677   Epoch: 14   Global Step: 75820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:33:40,010-Speed 10513.37 samples/sec   Loss 4.4446   LearningRate 0.0676   Epoch: 14   Global Step: 75830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:33:47,790-Speed 10530.12 samples/sec   Loss 4.4636   LearningRate 0.0676   Epoch: 14   Global Step: 75840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:33:55,620-Speed 10464.18 samples/sec   Loss 4.4893   LearningRate 0.0675   Epoch: 14   Global Step: 75850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:34:03,477-Speed 10427.84 samples/sec   Loss 4.4208   LearningRate 0.0675   Epoch: 14   Global Step: 75860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:34:11,295-Speed 10480.25 samples/sec   Loss 4.4672   LearningRate 0.0675   Epoch: 14   Global Step: 75870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:34:19,086-Speed 10515.50 samples/sec   Loss 4.4611   LearningRate 0.0674   Epoch: 14   Global Step: 75880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:34:26,887-Speed 10502.50 samples/sec   Loss 4.4247   LearningRate 0.0674   Epoch: 14   Global Step: 75890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:34:34,654-Speed 10548.00 samples/sec   Loss 4.4367   LearningRate 0.0673   Epoch: 14   Global Step: 75900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:34:42,438-Speed 10525.75 samples/sec   Loss 4.4648   LearningRate 0.0673   Epoch: 14   Global Step: 75910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:34:50,226-Speed 10521.49 samples/sec   Loss 4.4381   LearningRate 0.0672   Epoch: 14   Global Step: 75920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:34:58,011-Speed 10523.18 samples/sec   Loss 4.3900   LearningRate 0.0672   Epoch: 14   Global Step: 75930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:35:05,788-Speed 10535.63 samples/sec   Loss 4.4570   LearningRate 0.0671   Epoch: 14   Global Step: 75940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:35:13,599-Speed 10488.95 samples/sec   Loss 4.4027   LearningRate 0.0671   Epoch: 14   Global Step: 75950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:35:21,404-Speed 10497.24 samples/sec   Loss 4.3731   LearningRate 0.0670   Epoch: 14   Global Step: 75960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:35:29,214-Speed 10491.07 samples/sec   Loss 4.4453   LearningRate 0.0670   Epoch: 14   Global Step: 75970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:35:37,001-Speed 10524.11 samples/sec   Loss 4.4215   LearningRate 0.0669   Epoch: 14   Global Step: 75980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:35:44,784-Speed 10526.45 samples/sec   Loss 4.4488   LearningRate 0.0669   Epoch: 14   Global Step: 75990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:35:52,573-Speed 10518.69 samples/sec   Loss 4.4658   LearningRate 0.0668   Epoch: 14   Global Step: 76000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:36:00,355-Speed 10528.72 samples/sec   Loss 4.4220   LearningRate 0.0668   Epoch: 14   Global Step: 76010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:36:08,148-Speed 10513.62 samples/sec   Loss 4.4488   LearningRate 0.0667   Epoch: 14   Global Step: 76020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:36:15,918-Speed 10544.41 samples/sec   Loss 4.3900   LearningRate 0.0667   Epoch: 14   Global Step: 76030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:36:23,734-Speed 10483.69 samples/sec   Loss 4.4002   LearningRate 0.0666   Epoch: 14   Global Step: 76040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:36:31,562-Speed 10471.86 samples/sec   Loss 4.4399   LearningRate 0.0666   Epoch: 14   Global Step: 76050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:36:39,402-Speed 10454.98 samples/sec   Loss 4.4367   LearningRate 0.0665   Epoch: 14   Global Step: 76060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:36:47,255-Speed 10433.90 samples/sec   Loss 4.4289   LearningRate 0.0665   Epoch: 14   Global Step: 76070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:36:55,053-Speed 10505.63 samples/sec   Loss 4.4405   LearningRate 0.0664   Epoch: 14   Global Step: 76080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:37:02,862-Speed 10492.60 samples/sec   Loss 4.4186   LearningRate 0.0664   Epoch: 14   Global Step: 76090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:37:10,655-Speed 10513.60 samples/sec   Loss 4.4212   LearningRate 0.0663   Epoch: 14   Global Step: 76100   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-01-16 07:37:18,453-Speed 10505.84 samples/sec   Loss 4.3923   LearningRate 0.0663   Epoch: 14   Global Step: 76110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:37:26,272-Speed 10479.08 samples/sec   Loss 4.4037   LearningRate 0.0662   Epoch: 14   Global Step: 76120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:37:34,097-Speed 10469.90 samples/sec   Loss 4.4153   LearningRate 0.0662   Epoch: 14   Global Step: 76130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:37:41,900-Speed 10500.58 samples/sec   Loss 4.4256   LearningRate 0.0661   Epoch: 14   Global Step: 76140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:37:49,710-Speed 10490.44 samples/sec   Loss 4.4284   LearningRate 0.0661   Epoch: 14   Global Step: 76150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:37:57,490-Speed 10536.27 samples/sec   Loss 4.4138   LearningRate 0.0661   Epoch: 14   Global Step: 76160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:38:05,281-Speed 10516.44 samples/sec   Loss 4.3932   LearningRate 0.0660   Epoch: 14   Global Step: 76170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:38:13,075-Speed 10511.76 samples/sec   Loss 4.3835   LearningRate 0.0660   Epoch: 14   Global Step: 76180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:38:20,863-Speed 10520.96 samples/sec   Loss 4.4104   LearningRate 0.0659   Epoch: 14   Global Step: 76190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:38:28,677-Speed 10484.03 samples/sec   Loss 4.4367   LearningRate 0.0659   Epoch: 14   Global Step: 76200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:38:36,508-Speed 10462.83 samples/sec   Loss 4.3986   LearningRate 0.0658   Epoch: 14   Global Step: 76210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:38:44,363-Speed 10431.40 samples/sec   Loss 4.4179   LearningRate 0.0658   Epoch: 14   Global Step: 76220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:38:52,177-Speed 10484.92 samples/sec   Loss 4.3799   LearningRate 0.0657   Epoch: 14   Global Step: 76230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:38:59,986-Speed 10491.73 samples/sec   Loss 4.3854   LearningRate 0.0657   Epoch: 14   Global Step: 76240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:39:07,774-Speed 10519.75 samples/sec   Loss 4.3560   LearningRate 0.0656   Epoch: 14   Global Step: 76250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:39:15,566-Speed 10515.75 samples/sec   Loss 4.4141   LearningRate 0.0656   Epoch: 14   Global Step: 76260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:39:23,363-Speed 10507.10 samples/sec   Loss 4.3862   LearningRate 0.0655   Epoch: 14   Global Step: 76270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:39:31,150-Speed 10522.12 samples/sec   Loss 4.3836   LearningRate 0.0655   Epoch: 14   Global Step: 76280   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:39:38,957-Speed 10496.38 samples/sec   Loss 4.4113   LearningRate 0.0654   Epoch: 14   Global Step: 76290   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:39:46,788-Speed 10464.78 samples/sec   Loss 4.4113   LearningRate 0.0654   Epoch: 14   Global Step: 76300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:39:54,604-Speed 10482.46 samples/sec   Loss 4.4022   LearningRate 0.0653   Epoch: 14   Global Step: 76310   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:40:02,398-Speed 10512.72 samples/sec   Loss 4.3880   LearningRate 0.0653   Epoch: 14   Global Step: 76320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:40:10,200-Speed 10501.27 samples/sec   Loss 4.3828   LearningRate 0.0652   Epoch: 14   Global Step: 76330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:40:17,976-Speed 10537.29 samples/sec   Loss 4.3828   LearningRate 0.0652   Epoch: 14   Global Step: 76340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:40:25,763-Speed 10520.09 samples/sec   Loss 4.3551   LearningRate 0.0651   Epoch: 14   Global Step: 76350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:40:33,578-Speed 10483.73 samples/sec   Loss 4.3801   LearningRate 0.0651   Epoch: 14   Global Step: 76360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:40:41,393-Speed 10485.25 samples/sec   Loss 4.3864   LearningRate 0.0650   Epoch: 14   Global Step: 76370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-16 07:40:49,206-Speed 10486.19 samples/sec   Loss 4.3872   LearningRate 0.0650   Epoch: 14   Global Step: 76380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:40:56,993-Speed 10520.97 samples/sec   Loss 4.3719   LearningRate 0.0650   Epoch: 14   Global Step: 76390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:41:04,816-Speed 10472.06 samples/sec   Loss 4.3771   LearningRate 0.0649   Epoch: 14   Global Step: 76400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:41:12,597-Speed 10530.97 samples/sec   Loss 4.3796   LearningRate 0.0649   Epoch: 14   Global Step: 76410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:41:20,355-Speed 10561.40 samples/sec   Loss 4.3885   LearningRate 0.0648   Epoch: 14   Global Step: 76420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:41:28,186-Speed 10462.08 samples/sec   Loss 4.3917   LearningRate 0.0648   Epoch: 14   Global Step: 76430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:41:35,970-Speed 10525.60 samples/sec   Loss 4.3541   LearningRate 0.0647   Epoch: 14   Global Step: 76440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:41:43,751-Speed 10529.94 samples/sec   Loss 4.3901   LearningRate 0.0647   Epoch: 14   Global Step: 76450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:41:51,562-Speed 10489.23 samples/sec   Loss 4.3530   LearningRate 0.0646   Epoch: 14   Global Step: 76460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:41:59,389-Speed 10467.94 samples/sec   Loss 4.3201   LearningRate 0.0646   Epoch: 14   Global Step: 76470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:42:07,222-Speed 10459.79 samples/sec   Loss 4.3531   LearningRate 0.0645   Epoch: 14   Global Step: 76480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:42:15,026-Speed 10498.34 samples/sec   Loss 4.3321   LearningRate 0.0645   Epoch: 14   Global Step: 76490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:42:22,826-Speed 10503.75 samples/sec   Loss 4.3485   LearningRate 0.0644   Epoch: 14   Global Step: 76500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:42:30,645-Speed 10478.43 samples/sec   Loss 4.3716   LearningRate 0.0644   Epoch: 14   Global Step: 76510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:42:38,418-Speed 10539.75 samples/sec   Loss 4.3419   LearningRate 0.0643   Epoch: 14   Global Step: 76520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:42:46,241-Speed 10473.90 samples/sec   Loss 4.3424   LearningRate 0.0643   Epoch: 14   Global Step: 76530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:42:54,040-Speed 10505.54 samples/sec   Loss 4.3578   LearningRate 0.0642   Epoch: 14   Global Step: 76540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:43:01,861-Speed 10476.16 samples/sec   Loss 4.2993   LearningRate 0.0642   Epoch: 14   Global Step: 76550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:43:09,673-Speed 10487.27 samples/sec   Loss 4.3698   LearningRate 0.0641   Epoch: 14   Global Step: 76560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:43:17,477-Speed 10498.54 samples/sec   Loss 4.3612   LearningRate 0.0641   Epoch: 14   Global Step: 76570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:43:25,285-Speed 10492.87 samples/sec   Loss 4.3875   LearningRate 0.0641   Epoch: 14   Global Step: 76580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:43:33,102-Speed 10482.05 samples/sec   Loss 4.3465   LearningRate 0.0640   Epoch: 14   Global Step: 76590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:43:40,884-Speed 10527.97 samples/sec   Loss 4.3516   LearningRate 0.0640   Epoch: 14   Global Step: 76600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:43:48,679-Speed 10511.54 samples/sec   Loss 4.3669   LearningRate 0.0639   Epoch: 14   Global Step: 76610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:43:56,465-Speed 10522.42 samples/sec   Loss 4.3470   LearningRate 0.0639   Epoch: 14   Global Step: 76620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:44:04,245-Speed 10531.55 samples/sec   Loss 4.3332   LearningRate 0.0638   Epoch: 14   Global Step: 76630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:44:12,029-Speed 10524.97 samples/sec   Loss 4.3418   LearningRate 0.0638   Epoch: 14   Global Step: 76640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:44:19,820-Speed 10516.12 samples/sec   Loss 4.3302   LearningRate 0.0637   Epoch: 14   Global Step: 76650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:44:27,615-Speed 10509.51 samples/sec   Loss 4.3442   LearningRate 0.0637   Epoch: 14   Global Step: 76660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:44:35,399-Speed 10526.52 samples/sec   Loss 4.3611   LearningRate 0.0636   Epoch: 14   Global Step: 76670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:44:43,178-Speed 10531.94 samples/sec   Loss 4.3289   LearningRate 0.0636   Epoch: 14   Global Step: 76680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:44:50,973-Speed 10510.22 samples/sec   Loss 4.3422   LearningRate 0.0635   Epoch: 14   Global Step: 76690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:44:58,764-Speed 10515.99 samples/sec   Loss 4.3411   LearningRate 0.0635   Epoch: 14   Global Step: 76700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:45:06,553-Speed 10519.09 samples/sec   Loss 4.3493   LearningRate 0.0634   Epoch: 14   Global Step: 76710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:45:14,325-Speed 10541.95 samples/sec   Loss 4.3609   LearningRate 0.0634   Epoch: 14   Global Step: 76720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:45:22,108-Speed 10526.43 samples/sec   Loss 4.3570   LearningRate 0.0633   Epoch: 14   Global Step: 76730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:45:29,969-Speed 10421.45 samples/sec   Loss 4.3464   LearningRate 0.0633   Epoch: 14   Global Step: 76740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:45:37,782-Speed 10488.37 samples/sec   Loss 4.3408   LearningRate 0.0632   Epoch: 14   Global Step: 76750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:45:45,560-Speed 10532.60 samples/sec   Loss 4.3290   LearningRate 0.0632   Epoch: 14   Global Step: 76760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:45:53,327-Speed 10548.93 samples/sec   Loss 4.3409   LearningRate 0.0632   Epoch: 14   Global Step: 76770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:46:01,141-Speed 10486.18 samples/sec   Loss 4.3545   LearningRate 0.0631   Epoch: 14   Global Step: 76780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:46:08,952-Speed 10488.32 samples/sec   Loss 4.3215   LearningRate 0.0631   Epoch: 14   Global Step: 76790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:46:16,773-Speed 10475.66 samples/sec   Loss 4.3039   LearningRate 0.0630   Epoch: 14   Global Step: 76800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:46:24,576-Speed 10500.04 samples/sec   Loss 4.3397   LearningRate 0.0630   Epoch: 14   Global Step: 76810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:46:32,365-Speed 10519.79 samples/sec   Loss 4.2709   LearningRate 0.0629   Epoch: 14   Global Step: 76820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:46:40,159-Speed 10511.87 samples/sec   Loss 4.3335   LearningRate 0.0629   Epoch: 14   Global Step: 76830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:46:47,963-Speed 10498.45 samples/sec   Loss 4.3389   LearningRate 0.0628   Epoch: 14   Global Step: 76840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:46:55,760-Speed 10506.49 samples/sec   Loss 4.3275   LearningRate 0.0628   Epoch: 14   Global Step: 76850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:47:03,552-Speed 10516.45 samples/sec   Loss 4.3233   LearningRate 0.0627   Epoch: 14   Global Step: 76860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:47:11,348-Speed 10509.54 samples/sec   Loss 4.3004   LearningRate 0.0627   Epoch: 14   Global Step: 76870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:47:19,134-Speed 10521.74 samples/sec   Loss 4.2626   LearningRate 0.0626   Epoch: 14   Global Step: 76880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:47:26,918-Speed 10530.47 samples/sec   Loss 4.3035   LearningRate 0.0626   Epoch: 14   Global Step: 76890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:47:34,707-Speed 10519.15 samples/sec   Loss 4.2876   LearningRate 0.0625   Epoch: 14   Global Step: 76900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:47:42,504-Speed 10509.03 samples/sec   Loss 4.3370   LearningRate 0.0625   Epoch: 14   Global Step: 76910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:47:50,329-Speed 10475.02 samples/sec   Loss 4.3219   LearningRate 0.0625   Epoch: 14   Global Step: 76920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:47:58,109-Speed 10530.85 samples/sec   Loss 4.3249   LearningRate 0.0624   Epoch: 14   Global Step: 76930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:48:05,893-Speed 10524.47 samples/sec   Loss 4.2553   LearningRate 0.0624   Epoch: 14   Global Step: 76940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:48:13,682-Speed 10519.18 samples/sec   Loss 4.3025   LearningRate 0.0623   Epoch: 14   Global Step: 76950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:48:21,471-Speed 10519.99 samples/sec   Loss 4.3049   LearningRate 0.0623   Epoch: 14   Global Step: 76960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:48:29,247-Speed 10535.72 samples/sec   Loss 4.2981   LearningRate 0.0622   Epoch: 14   Global Step: 76970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:48:37,028-Speed 10530.57 samples/sec   Loss 4.2853   LearningRate 0.0622   Epoch: 14   Global Step: 76980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:48:44,816-Speed 10520.31 samples/sec   Loss 4.3074   LearningRate 0.0621   Epoch: 14   Global Step: 76990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:48:52,595-Speed 10531.01 samples/sec   Loss 4.2818   LearningRate 0.0621   Epoch: 14   Global Step: 77000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:49:00,382-Speed 10522.03 samples/sec   Loss 4.2849   LearningRate 0.0620   Epoch: 14   Global Step: 77010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:49:08,181-Speed 10506.51 samples/sec   Loss 4.2719   LearningRate 0.0620   Epoch: 14   Global Step: 77020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:49:15,971-Speed 10517.28 samples/sec   Loss 4.2802   LearningRate 0.0619   Epoch: 14   Global Step: 77030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:49:23,823-Speed 10435.15 samples/sec   Loss 4.2898   LearningRate 0.0619   Epoch: 14   Global Step: 77040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:49:31,623-Speed 10503.82 samples/sec   Loss 4.3113   LearningRate 0.0618   Epoch: 14   Global Step: 77050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:49:39,432-Speed 10492.47 samples/sec   Loss 4.2782   LearningRate 0.0618   Epoch: 14   Global Step: 77060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:49:47,235-Speed 10499.31 samples/sec   Loss 4.3200   LearningRate 0.0618   Epoch: 14   Global Step: 77070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:49:55,078-Speed 10447.00 samples/sec   Loss 4.2845   LearningRate 0.0617   Epoch: 14   Global Step: 77080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:50:02,905-Speed 10467.45 samples/sec   Loss 4.2647   LearningRate 0.0617   Epoch: 14   Global Step: 77090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:50:10,696-Speed 10515.63 samples/sec   Loss 4.2857   LearningRate 0.0616   Epoch: 14   Global Step: 77100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:50:18,508-Speed 10488.99 samples/sec   Loss 4.2839   LearningRate 0.0616   Epoch: 14   Global Step: 77110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:50:26,294-Speed 10522.55 samples/sec   Loss 4.2635   LearningRate 0.0615   Epoch: 14   Global Step: 77120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:50:34,080-Speed 10522.13 samples/sec   Loss 4.2631   LearningRate 0.0615   Epoch: 14   Global Step: 77130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:50:41,866-Speed 10522.44 samples/sec   Loss 4.2780   LearningRate 0.0614   Epoch: 14   Global Step: 77140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:50:49,669-Speed 10500.63 samples/sec   Loss 4.2741   LearningRate 0.0614   Epoch: 14   Global Step: 77150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:50:57,457-Speed 10519.91 samples/sec   Loss 4.2296   LearningRate 0.0613   Epoch: 14   Global Step: 77160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:51:05,248-Speed 10515.99 samples/sec   Loss 4.2717   LearningRate 0.0613   Epoch: 14   Global Step: 77170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:51:13,054-Speed 10495.64 samples/sec   Loss 4.2741   LearningRate 0.0612   Epoch: 14   Global Step: 77180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:51:20,850-Speed 10509.04 samples/sec   Loss 4.2787   LearningRate 0.0612   Epoch: 14   Global Step: 77190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:51:28,643-Speed 10514.26 samples/sec   Loss 4.2442   LearningRate 0.0612   Epoch: 14   Global Step: 77200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:51:36,468-Speed 10470.23 samples/sec   Loss 4.2379   LearningRate 0.0611   Epoch: 14   Global Step: 77210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:51:44,295-Speed 10468.29 samples/sec   Loss 4.2788   LearningRate 0.0611   Epoch: 14   Global Step: 77220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:51:52,123-Speed 10466.19 samples/sec   Loss 4.2544   LearningRate 0.0610   Epoch: 14   Global Step: 77230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:51:59,923-Speed 10503.88 samples/sec   Loss 4.2577   LearningRate 0.0610   Epoch: 14   Global Step: 77240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:52:07,708-Speed 10524.54 samples/sec   Loss 4.2472   LearningRate 0.0609   Epoch: 14   Global Step: 77250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:52:15,531-Speed 10473.90 samples/sec   Loss 4.2449   LearningRate 0.0609   Epoch: 14   Global Step: 77260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:52:23,342-Speed 10489.34 samples/sec   Loss 4.2776   LearningRate 0.0608   Epoch: 14   Global Step: 77270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:52:31,135-Speed 10512.34 samples/sec   Loss 4.2608   LearningRate 0.0608   Epoch: 14   Global Step: 77280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:52:38,908-Speed 10540.68 samples/sec   Loss 4.2519   LearningRate 0.0607   Epoch: 14   Global Step: 77290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:52:46,714-Speed 10496.05 samples/sec   Loss 4.2679   LearningRate 0.0607   Epoch: 14   Global Step: 77300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:52:54,531-Speed 10482.58 samples/sec   Loss 4.2289   LearningRate 0.0606   Epoch: 14   Global Step: 77310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:53:02,332-Speed 10502.77 samples/sec   Loss 4.2316   LearningRate 0.0606   Epoch: 14   Global Step: 77320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:53:10,138-Speed 10495.74 samples/sec   Loss 4.2442   LearningRate 0.0606   Epoch: 14   Global Step: 77330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:53:17,959-Speed 10474.86 samples/sec   Loss 4.2348   LearningRate 0.0605   Epoch: 14   Global Step: 77340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:53:25,757-Speed 10506.62 samples/sec   Loss 4.2803   LearningRate 0.0605   Epoch: 14   Global Step: 77350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:53:33,545-Speed 10520.85 samples/sec   Loss 4.2742   LearningRate 0.0604   Epoch: 14   Global Step: 77360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:53:41,360-Speed 10483.86 samples/sec   Loss 4.2473   LearningRate 0.0604   Epoch: 14   Global Step: 77370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:53:49,207-Speed 10440.60 samples/sec   Loss 4.2623   LearningRate 0.0603   Epoch: 14   Global Step: 77380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:53:56,999-Speed 10516.00 samples/sec   Loss 4.2457   LearningRate 0.0603   Epoch: 14   Global Step: 77390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:54:04,822-Speed 10472.24 samples/sec   Loss 4.2570   LearningRate 0.0602   Epoch: 14   Global Step: 77400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:54:12,645-Speed 10473.02 samples/sec   Loss 4.2224   LearningRate 0.0602   Epoch: 14   Global Step: 77410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:54:20,517-Speed 10408.53 samples/sec   Loss 4.2451   LearningRate 0.0601   Epoch: 14   Global Step: 77420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:54:28,330-Speed 10486.65 samples/sec   Loss 4.1834   LearningRate 0.0601   Epoch: 14   Global Step: 77430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:54:36,138-Speed 10492.67 samples/sec   Loss 4.2125   LearningRate 0.0600   Epoch: 14   Global Step: 77440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:54:43,942-Speed 10498.54 samples/sec   Loss 4.2629   LearningRate 0.0600   Epoch: 14   Global Step: 77450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:54:51,740-Speed 10507.98 samples/sec   Loss 4.2191   LearningRate 0.0600   Epoch: 14   Global Step: 77460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:54:59,519-Speed 10533.01 samples/sec   Loss 4.2462   LearningRate 0.0599   Epoch: 14   Global Step: 77470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:55:07,294-Speed 10536.46 samples/sec   Loss 4.2275   LearningRate 0.0599   Epoch: 14   Global Step: 77480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:55:15,085-Speed 10516.06 samples/sec   Loss 4.1991   LearningRate 0.0598   Epoch: 14   Global Step: 77490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:55:22,882-Speed 10508.39 samples/sec   Loss 4.2055   LearningRate 0.0598   Epoch: 14   Global Step: 77500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:55:30,678-Speed 10510.54 samples/sec   Loss 4.1805   LearningRate 0.0597   Epoch: 14   Global Step: 77510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:55:38,460-Speed 10526.89 samples/sec   Loss 4.2163   LearningRate 0.0597   Epoch: 14   Global Step: 77520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:55:46,237-Speed 10534.40 samples/sec   Loss 4.1957   LearningRate 0.0596   Epoch: 14   Global Step: 77530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:55:54,072-Speed 10457.47 samples/sec   Loss 4.2441   LearningRate 0.0596   Epoch: 14   Global Step: 77540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:56:01,868-Speed 10509.79 samples/sec   Loss 4.1762   LearningRate 0.0595   Epoch: 14   Global Step: 77550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:56:09,641-Speed 10539.57 samples/sec   Loss 4.2297   LearningRate 0.0595   Epoch: 14   Global Step: 77560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:56:17,419-Speed 10534.87 samples/sec   Loss 4.2404   LearningRate 0.0595   Epoch: 14   Global Step: 77570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:56:25,199-Speed 10531.95 samples/sec   Loss 4.2395   LearningRate 0.0594   Epoch: 14   Global Step: 77580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:56:32,981-Speed 10527.82 samples/sec   Loss 4.2202   LearningRate 0.0594   Epoch: 14   Global Step: 77590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:56:40,790-Speed 10492.26 samples/sec   Loss 4.2020   LearningRate 0.0593   Epoch: 14   Global Step: 77600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:56:48,586-Speed 10509.07 samples/sec   Loss 4.1711   LearningRate 0.0593   Epoch: 14   Global Step: 77610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:56:56,398-Speed 10487.64 samples/sec   Loss 4.2094   LearningRate 0.0592   Epoch: 14   Global Step: 77620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:57:04,215-Speed 10485.69 samples/sec   Loss 4.2333   LearningRate 0.0592   Epoch: 14   Global Step: 77630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:57:11,991-Speed 10536.75 samples/sec   Loss 4.2183   LearningRate 0.0591   Epoch: 14   Global Step: 77640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:57:19,782-Speed 10515.33 samples/sec   Loss 4.2040   LearningRate 0.0591   Epoch: 14   Global Step: 77650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:57:27,573-Speed 10517.45 samples/sec   Loss 4.1665   LearningRate 0.0590   Epoch: 14   Global Step: 77660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:57:35,353-Speed 10530.94 samples/sec   Loss 4.1809   LearningRate 0.0590   Epoch: 14   Global Step: 77670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:57:43,141-Speed 10520.27 samples/sec   Loss 4.1685   LearningRate 0.0590   Epoch: 14   Global Step: 77680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:57:50,936-Speed 10510.61 samples/sec   Loss 4.2309   LearningRate 0.0589   Epoch: 14   Global Step: 77690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:57:58,720-Speed 10525.42 samples/sec   Loss 4.2279   LearningRate 0.0589   Epoch: 14   Global Step: 77700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:58:06,501-Speed 10529.39 samples/sec   Loss 4.1888   LearningRate 0.0588   Epoch: 14   Global Step: 77710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:58:14,298-Speed 10507.08 samples/sec   Loss 4.2296   LearningRate 0.0588   Epoch: 14   Global Step: 77720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:58:22,152-Speed 10432.57 samples/sec   Loss 4.2259   LearningRate 0.0587   Epoch: 14   Global Step: 77730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:58:29,965-Speed 10486.88 samples/sec   Loss 4.2046   LearningRate 0.0587   Epoch: 14   Global Step: 77740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:58:37,770-Speed 10497.50 samples/sec   Loss 4.2123   LearningRate 0.0586   Epoch: 14   Global Step: 77750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:58:45,562-Speed 10514.61 samples/sec   Loss 4.1742   LearningRate 0.0586   Epoch: 14   Global Step: 77760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:58:53,381-Speed 10479.06 samples/sec   Loss 4.1924   LearningRate 0.0585   Epoch: 14   Global Step: 77770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:59:16,015-Speed 3619.41 samples/sec   Loss 4.1958   LearningRate 0.0585   Epoch: 15   Global Step: 77780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:59:23,773-Speed 10561.38 samples/sec   Loss 4.1777   LearningRate 0.0585   Epoch: 15   Global Step: 77790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 07:59:31,526-Speed 10567.40 samples/sec   Loss 4.2187   LearningRate 0.0584   Epoch: 15   Global Step: 77800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:59:39,278-Speed 10569.69 samples/sec   Loss 4.1509   LearningRate 0.0584   Epoch: 15   Global Step: 77810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:59:47,044-Speed 10548.84 samples/sec   Loss 4.1587   LearningRate 0.0583   Epoch: 15   Global Step: 77820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 07:59:54,850-Speed 10497.29 samples/sec   Loss 4.1704   LearningRate 0.0583   Epoch: 15   Global Step: 77830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:00:02,622-Speed 10541.74 samples/sec   Loss 4.1956   LearningRate 0.0582   Epoch: 15   Global Step: 77840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:00:10,397-Speed 10536.86 samples/sec   Loss 4.1794   LearningRate 0.0582   Epoch: 15   Global Step: 77850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:00:18,197-Speed 10505.16 samples/sec   Loss 4.1592   LearningRate 0.0581   Epoch: 15   Global Step: 77860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:00:25,994-Speed 10508.05 samples/sec   Loss 4.1525   LearningRate 0.0581   Epoch: 15   Global Step: 77870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:00:33,785-Speed 10516.31 samples/sec   Loss 4.1803   LearningRate 0.0581   Epoch: 15   Global Step: 77880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:00:41,568-Speed 10526.15 samples/sec   Loss 4.1120   LearningRate 0.0580   Epoch: 15   Global Step: 77890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:00:49,396-Speed 10466.97 samples/sec   Loss 4.1452   LearningRate 0.0580   Epoch: 15   Global Step: 77900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:00:57,197-Speed 10502.46 samples/sec   Loss 4.1417   LearningRate 0.0579   Epoch: 15   Global Step: 77910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:01:05,058-Speed 10422.77 samples/sec   Loss 4.0973   LearningRate 0.0579   Epoch: 15   Global Step: 77920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:01:12,895-Speed 10453.28 samples/sec   Loss 4.1404   LearningRate 0.0578   Epoch: 15   Global Step: 77930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:01:20,677-Speed 10529.04 samples/sec   Loss 4.1623   LearningRate 0.0578   Epoch: 15   Global Step: 77940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:01:28,482-Speed 10496.90 samples/sec   Loss 4.1376   LearningRate 0.0577   Epoch: 15   Global Step: 77950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:01:36,284-Speed 10501.41 samples/sec   Loss 4.1628   LearningRate 0.0577   Epoch: 15   Global Step: 77960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:01:44,077-Speed 10512.37 samples/sec   Loss 4.1496   LearningRate 0.0576   Epoch: 15   Global Step: 77970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:01:51,842-Speed 10551.40 samples/sec   Loss 4.1688   LearningRate 0.0576   Epoch: 15   Global Step: 77980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:01:59,673-Speed 10463.80 samples/sec   Loss 4.1561   LearningRate 0.0576   Epoch: 15   Global Step: 77990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:02:07,459-Speed 10521.68 samples/sec   Loss 4.1575   LearningRate 0.0575   Epoch: 15   Global Step: 78000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 08:02:15,276-Speed 10480.61 samples/sec   Loss 4.1601   LearningRate 0.0575   Epoch: 15   Global Step: 78010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 08:02:23,078-Speed 10502.62 samples/sec   Loss 4.1479   LearningRate 0.0574   Epoch: 15   Global Step: 78020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 08:02:30,874-Speed 10509.20 samples/sec   Loss 4.1421   LearningRate 0.0574   Epoch: 15   Global Step: 78030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:02:38,693-Speed 10477.84 samples/sec   Loss 4.1476   LearningRate 0.0573   Epoch: 15   Global Step: 78040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:02:46,500-Speed 10495.04 samples/sec   Loss 4.1755   LearningRate 0.0573   Epoch: 15   Global Step: 78050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:02:54,289-Speed 10519.86 samples/sec   Loss 4.1545   LearningRate 0.0572   Epoch: 15   Global Step: 78060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:03:02,088-Speed 10505.07 samples/sec   Loss 4.1631   LearningRate 0.0572   Epoch: 15   Global Step: 78070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:03:09,880-Speed 10515.23 samples/sec   Loss 4.1405   LearningRate 0.0572   Epoch: 15   Global Step: 78080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:03:17,675-Speed 10510.05 samples/sec   Loss 4.1471   LearningRate 0.0571   Epoch: 15   Global Step: 78090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:03:25,484-Speed 10493.33 samples/sec   Loss 4.1539   LearningRate 0.0571   Epoch: 15   Global Step: 78100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:03:33,301-Speed 10480.81 samples/sec   Loss 4.1476   LearningRate 0.0570   Epoch: 15   Global Step: 78110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:03:41,086-Speed 10524.92 samples/sec   Loss 4.1275   LearningRate 0.0570   Epoch: 15   Global Step: 78120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:03:48,881-Speed 10510.99 samples/sec   Loss 4.1194   LearningRate 0.0569   Epoch: 15   Global Step: 78130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 08:03:56,690-Speed 10492.10 samples/sec   Loss 4.1250   LearningRate 0.0569   Epoch: 15   Global Step: 78140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 08:04:04,510-Speed 10477.55 samples/sec   Loss 4.0985   LearningRate 0.0568   Epoch: 15   Global Step: 78150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:04:12,314-Speed 10498.90 samples/sec   Loss 4.1299   LearningRate 0.0568   Epoch: 15   Global Step: 78160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:04:20,127-Speed 10485.85 samples/sec   Loss 4.0832   LearningRate 0.0568   Epoch: 15   Global Step: 78170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:04:27,937-Speed 10490.61 samples/sec   Loss 4.1295   LearningRate 0.0567   Epoch: 15   Global Step: 78180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:04:35,769-Speed 10461.53 samples/sec   Loss 4.1011   LearningRate 0.0567   Epoch: 15   Global Step: 78190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:04:43,603-Speed 10463.78 samples/sec   Loss 4.1181   LearningRate 0.0566   Epoch: 15   Global Step: 78200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:04:51,442-Speed 10451.58 samples/sec   Loss 4.1344   LearningRate 0.0566   Epoch: 15   Global Step: 78210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:04:59,276-Speed 10459.57 samples/sec   Loss 4.1392   LearningRate 0.0565   Epoch: 15   Global Step: 78220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:05:07,138-Speed 10419.69 samples/sec   Loss 4.1595   LearningRate 0.0565   Epoch: 15   Global Step: 78230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:05:14,967-Speed 10466.17 samples/sec   Loss 4.1395   LearningRate 0.0564   Epoch: 15   Global Step: 78240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:05:22,791-Speed 10471.94 samples/sec   Loss 4.1348   LearningRate 0.0564   Epoch: 15   Global Step: 78250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 08:05:30,646-Speed 10429.99 samples/sec   Loss 4.0785   LearningRate 0.0564   Epoch: 15   Global Step: 78260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 08:05:38,509-Speed 10419.32 samples/sec   Loss 4.0961   LearningRate 0.0563   Epoch: 15   Global Step: 78270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:05:46,367-Speed 10427.56 samples/sec   Loss 4.1129   LearningRate 0.0563   Epoch: 15   Global Step: 78280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:05:54,244-Speed 10401.16 samples/sec   Loss 4.0963   LearningRate 0.0562   Epoch: 15   Global Step: 78290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:06:02,061-Speed 10482.20 samples/sec   Loss 4.0838   LearningRate 0.0562   Epoch: 15   Global Step: 78300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:06:09,872-Speed 10489.47 samples/sec   Loss 4.1250   LearningRate 0.0561   Epoch: 15   Global Step: 78310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:06:17,668-Speed 10510.02 samples/sec   Loss 4.1253   LearningRate 0.0561   Epoch: 15   Global Step: 78320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:06:25,491-Speed 10472.34 samples/sec   Loss 4.1223   LearningRate 0.0560   Epoch: 15   Global Step: 78330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:06:33,325-Speed 10457.97 samples/sec   Loss 4.1300   LearningRate 0.0560   Epoch: 15   Global Step: 78340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:06:41,136-Speed 10489.05 samples/sec   Loss 4.1259   LearningRate 0.0560   Epoch: 15   Global Step: 78350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:06:48,950-Speed 10485.42 samples/sec   Loss 4.1130   LearningRate 0.0559   Epoch: 15   Global Step: 78360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:06:56,767-Speed 10480.27 samples/sec   Loss 4.1089   LearningRate 0.0559   Epoch: 15   Global Step: 78370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 08:07:04,596-Speed 10469.65 samples/sec   Loss 4.1276   LearningRate 0.0558   Epoch: 15   Global Step: 78380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 08:07:12,406-Speed 10490.10 samples/sec   Loss 4.0904   LearningRate 0.0558   Epoch: 15   Global Step: 78390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 08:07:20,210-Speed 10499.40 samples/sec   Loss 4.0440   LearningRate 0.0557   Epoch: 15   Global Step: 78400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 08:07:28,005-Speed 10509.96 samples/sec   Loss 4.0895   LearningRate 0.0557   Epoch: 15   Global Step: 78410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 08:07:35,789-Speed 10525.71 samples/sec   Loss 4.0592   LearningRate 0.0556   Epoch: 15   Global Step: 78420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 08:07:43,594-Speed 10497.36 samples/sec   Loss 4.0606   LearningRate 0.0556   Epoch: 15   Global Step: 78430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 08:07:51,393-Speed 10509.20 samples/sec   Loss 4.0705   LearningRate 0.0556   Epoch: 15   Global Step: 78440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 08:07:59,173-Speed 10530.30 samples/sec   Loss 4.0792   LearningRate 0.0555   Epoch: 15   Global Step: 78450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:08:06,965-Speed 10516.12 samples/sec   Loss 4.0677   LearningRate 0.0555   Epoch: 15   Global Step: 78460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:08:14,786-Speed 10477.48 samples/sec   Loss 4.0843   LearningRate 0.0554   Epoch: 15   Global Step: 78470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:08:22,575-Speed 10518.97 samples/sec   Loss 4.0635   LearningRate 0.0554   Epoch: 15   Global Step: 78480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:08:30,367-Speed 10514.36 samples/sec   Loss 4.0781   LearningRate 0.0553   Epoch: 15   Global Step: 78490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:08:38,159-Speed 10514.55 samples/sec   Loss 4.0901   LearningRate 0.0553   Epoch: 15   Global Step: 78500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:08:45,957-Speed 10506.40 samples/sec   Loss 4.1280   LearningRate 0.0553   Epoch: 15   Global Step: 78510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:08:53,749-Speed 10515.73 samples/sec   Loss 4.0678   LearningRate 0.0552   Epoch: 15   Global Step: 78520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:09:01,539-Speed 10517.21 samples/sec   Loss 4.0681   LearningRate 0.0552   Epoch: 15   Global Step: 78530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:09:09,363-Speed 10471.05 samples/sec   Loss 4.0755   LearningRate 0.0551   Epoch: 15   Global Step: 78540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:09:17,160-Speed 10508.92 samples/sec   Loss 4.0587   LearningRate 0.0551   Epoch: 15   Global Step: 78550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 08:09:24,957-Speed 10508.45 samples/sec   Loss 4.0778   LearningRate 0.0550   Epoch: 15   Global Step: 78560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 08:09:32,745-Speed 10519.68 samples/sec   Loss 4.0799   LearningRate 0.0550   Epoch: 15   Global Step: 78570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-16 08:09:40,505-Speed 10558.35 samples/sec   Loss 4.0414   LearningRate 0.0549   Epoch: 15   Global Step: 78580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:09:48,290-Speed 10524.22 samples/sec   Loss 4.0648   LearningRate 0.0549   Epoch: 15   Global Step: 78590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:09:56,078-Speed 10524.40 samples/sec   Loss 4.0718   LearningRate 0.0549   Epoch: 15   Global Step: 78600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:10:03,869-Speed 10516.29 samples/sec   Loss 4.0708   LearningRate 0.0548   Epoch: 15   Global Step: 78610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:10:11,673-Speed 10498.44 samples/sec   Loss 4.0436   LearningRate 0.0548   Epoch: 15   Global Step: 78620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:10:19,457-Speed 10525.31 samples/sec   Loss 4.0733   LearningRate 0.0547   Epoch: 15   Global Step: 78630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:10:27,260-Speed 10500.59 samples/sec   Loss 4.0683   LearningRate 0.0547   Epoch: 15   Global Step: 78640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:10:35,050-Speed 10516.04 samples/sec   Loss 4.0861   LearningRate 0.0546   Epoch: 15   Global Step: 78650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:10:42,835-Speed 10524.87 samples/sec   Loss 4.0547   LearningRate 0.0546   Epoch: 15   Global Step: 78660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-16 08:10:50,688-Speed 10433.94 samples/sec   Loss 4.0704   LearningRate 0.0546   Epoch: 15   Global Step: 78670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:10:58,481-Speed 10512.26 samples/sec   Loss 4.0343   LearningRate 0.0545   Epoch: 15   Global Step: 78680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:11:06,288-Speed 10494.76 samples/sec   Loss 4.0484   LearningRate 0.0545   Epoch: 15   Global Step: 78690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:11:14,086-Speed 10510.23 samples/sec   Loss 4.0505   LearningRate 0.0544   Epoch: 15   Global Step: 78700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:11:21,905-Speed 10477.52 samples/sec   Loss 4.0734   LearningRate 0.0544   Epoch: 15   Global Step: 78710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:11:29,719-Speed 10485.22 samples/sec   Loss 4.0983   LearningRate 0.0543   Epoch: 15   Global Step: 78720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:11:37,532-Speed 10486.85 samples/sec   Loss 4.0531   LearningRate 0.0543   Epoch: 15   Global Step: 78730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:11:45,323-Speed 10516.02 samples/sec   Loss 4.0624   LearningRate 0.0542   Epoch: 15   Global Step: 78740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:11:53,105-Speed 10527.69 samples/sec   Loss 3.9916   LearningRate 0.0542   Epoch: 15   Global Step: 78750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:12:00,944-Speed 10451.64 samples/sec   Loss 3.9848   LearningRate 0.0542   Epoch: 15   Global Step: 78760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:12:08,737-Speed 10514.61 samples/sec   Loss 4.0329   LearningRate 0.0541   Epoch: 15   Global Step: 78770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:12:16,528-Speed 10516.73 samples/sec   Loss 4.0329   LearningRate 0.0541   Epoch: 15   Global Step: 78780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:12:24,351-Speed 10473.85 samples/sec   Loss 4.0108   LearningRate 0.0540   Epoch: 15   Global Step: 78790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:12:32,161-Speed 10490.70 samples/sec   Loss 4.0393   LearningRate 0.0540   Epoch: 15   Global Step: 78800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:12:40,040-Speed 10398.11 samples/sec   Loss 4.0389   LearningRate 0.0539   Epoch: 15   Global Step: 78810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:12:47,860-Speed 10476.89 samples/sec   Loss 4.0280   LearningRate 0.0539   Epoch: 15   Global Step: 78820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:12:55,678-Speed 10480.25 samples/sec   Loss 4.0256   LearningRate 0.0539   Epoch: 15   Global Step: 78830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:13:03,509-Speed 10461.82 samples/sec   Loss 4.0172   LearningRate 0.0538   Epoch: 15   Global Step: 78840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:13:11,306-Speed 10508.62 samples/sec   Loss 4.0175   LearningRate 0.0538   Epoch: 15   Global Step: 78850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:13:19,128-Speed 10475.02 samples/sec   Loss 4.0017   LearningRate 0.0537   Epoch: 15   Global Step: 78860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:13:26,950-Speed 10474.49 samples/sec   Loss 4.0207   LearningRate 0.0537   Epoch: 15   Global Step: 78870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:13:34,772-Speed 10474.62 samples/sec   Loss 4.0677   LearningRate 0.0536   Epoch: 15   Global Step: 78880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:13:42,575-Speed 10500.13 samples/sec   Loss 4.0437   LearningRate 0.0536   Epoch: 15   Global Step: 78890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:13:50,399-Speed 10474.89 samples/sec   Loss 4.0665   LearningRate 0.0536   Epoch: 15   Global Step: 78900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:13:58,210-Speed 10489.15 samples/sec   Loss 4.0443   LearningRate 0.0535   Epoch: 15   Global Step: 78910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:14:06,041-Speed 10462.97 samples/sec   Loss 4.0336   LearningRate 0.0535   Epoch: 15   Global Step: 78920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:14:13,847-Speed 10496.19 samples/sec   Loss 4.0511   LearningRate 0.0534   Epoch: 15   Global Step: 78930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:14:21,694-Speed 10441.54 samples/sec   Loss 4.0378   LearningRate 0.0534   Epoch: 15   Global Step: 78940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:14:29,555-Speed 10422.25 samples/sec   Loss 3.9995   LearningRate 0.0533   Epoch: 15   Global Step: 78950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:14:37,365-Speed 10493.61 samples/sec   Loss 4.0238   LearningRate 0.0533   Epoch: 15   Global Step: 78960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:14:45,166-Speed 10503.77 samples/sec   Loss 4.0305   LearningRate 0.0533   Epoch: 15   Global Step: 78970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:14:52,989-Speed 10471.68 samples/sec   Loss 4.0398   LearningRate 0.0532   Epoch: 15   Global Step: 78980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:15:00,801-Speed 10488.07 samples/sec   Loss 4.0232   LearningRate 0.0532   Epoch: 15   Global Step: 78990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:15:08,646-Speed 10443.76 samples/sec   Loss 4.0229   LearningRate 0.0531   Epoch: 15   Global Step: 79000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:15:16,461-Speed 10487.81 samples/sec   Loss 4.0114   LearningRate 0.0531   Epoch: 15   Global Step: 79010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:15:24,277-Speed 10481.49 samples/sec   Loss 4.0137   LearningRate 0.0530   Epoch: 15   Global Step: 79020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:15:32,110-Speed 10460.23 samples/sec   Loss 3.9885   LearningRate 0.0530   Epoch: 15   Global Step: 79030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:15:39,925-Speed 10483.76 samples/sec   Loss 3.9930   LearningRate 0.0529   Epoch: 15   Global Step: 79040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:15:47,713-Speed 10520.95 samples/sec   Loss 4.0123   LearningRate 0.0529   Epoch: 15   Global Step: 79050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:15:55,513-Speed 10502.89 samples/sec   Loss 3.9926   LearningRate 0.0529   Epoch: 15   Global Step: 79060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:16:03,307-Speed 10512.10 samples/sec   Loss 4.0204   LearningRate 0.0528   Epoch: 15   Global Step: 79070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:16:11,109-Speed 10501.15 samples/sec   Loss 4.0038   LearningRate 0.0528   Epoch: 15   Global Step: 79080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:16:18,929-Speed 10477.34 samples/sec   Loss 3.9780   LearningRate 0.0527   Epoch: 15   Global Step: 79090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:16:26,764-Speed 10456.81 samples/sec   Loss 3.9963   LearningRate 0.0527   Epoch: 15   Global Step: 79100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:16:34,576-Speed 10487.14 samples/sec   Loss 4.0140   LearningRate 0.0526   Epoch: 15   Global Step: 79110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:16:42,397-Speed 10476.08 samples/sec   Loss 3.9790   LearningRate 0.0526   Epoch: 15   Global Step: 79120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:16:50,198-Speed 10503.59 samples/sec   Loss 3.9874   LearningRate 0.0526   Epoch: 15   Global Step: 79130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:16:58,004-Speed 10495.21 samples/sec   Loss 3.9775   LearningRate 0.0525   Epoch: 15   Global Step: 79140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:17:05,784-Speed 10531.26 samples/sec   Loss 3.9908   LearningRate 0.0525   Epoch: 15   Global Step: 79150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:17:13,571-Speed 10522.16 samples/sec   Loss 3.9793   LearningRate 0.0524   Epoch: 15   Global Step: 79160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:17:21,369-Speed 10507.01 samples/sec   Loss 4.0238   LearningRate 0.0524   Epoch: 15   Global Step: 79170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:17:29,206-Speed 10452.98 samples/sec   Loss 3.9964   LearningRate 0.0523   Epoch: 15   Global Step: 79180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:17:36,992-Speed 10522.93 samples/sec   Loss 3.9999   LearningRate 0.0523   Epoch: 15   Global Step: 79190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:17:44,799-Speed 10495.71 samples/sec   Loss 3.9734   LearningRate 0.0523   Epoch: 15   Global Step: 79200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:17:52,594-Speed 10510.93 samples/sec   Loss 3.9909   LearningRate 0.0522   Epoch: 15   Global Step: 79210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:18:00,405-Speed 10488.73 samples/sec   Loss 3.9526   LearningRate 0.0522   Epoch: 15   Global Step: 79220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:18:08,212-Speed 10494.35 samples/sec   Loss 3.9399   LearningRate 0.0521   Epoch: 15   Global Step: 79230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:18:16,002-Speed 10518.11 samples/sec   Loss 3.9839   LearningRate 0.0521   Epoch: 15   Global Step: 79240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:18:23,795-Speed 10512.58 samples/sec   Loss 3.9368   LearningRate 0.0521   Epoch: 15   Global Step: 79250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:18:31,600-Speed 10497.16 samples/sec   Loss 4.0137   LearningRate 0.0520   Epoch: 15   Global Step: 79260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:18:39,380-Speed 10530.33 samples/sec   Loss 3.9947   LearningRate 0.0520   Epoch: 15   Global Step: 79270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:18:47,166-Speed 10523.15 samples/sec   Loss 3.9568   LearningRate 0.0519   Epoch: 15   Global Step: 79280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:18:54,970-Speed 10499.14 samples/sec   Loss 3.9927   LearningRate 0.0519   Epoch: 15   Global Step: 79290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:19:02,776-Speed 10495.65 samples/sec   Loss 3.9593   LearningRate 0.0518   Epoch: 15   Global Step: 79300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:19:10,579-Speed 10500.37 samples/sec   Loss 3.9829   LearningRate 0.0518   Epoch: 15   Global Step: 79310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:19:18,352-Speed 10541.90 samples/sec   Loss 3.9720   LearningRate 0.0518   Epoch: 15   Global Step: 79320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:19:26,154-Speed 10502.20 samples/sec   Loss 3.9483   LearningRate 0.0517   Epoch: 15   Global Step: 79330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:19:33,957-Speed 10499.47 samples/sec   Loss 3.9737   LearningRate 0.0517   Epoch: 15   Global Step: 79340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:19:41,758-Speed 10502.81 samples/sec   Loss 3.9525   LearningRate 0.0516   Epoch: 15   Global Step: 79350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:19:49,569-Speed 10489.48 samples/sec   Loss 3.9843   LearningRate 0.0516   Epoch: 15   Global Step: 79360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:19:57,368-Speed 10504.65 samples/sec   Loss 3.9631   LearningRate 0.0515   Epoch: 15   Global Step: 79370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:20:05,209-Speed 10449.64 samples/sec   Loss 3.9537   LearningRate 0.0515   Epoch: 15   Global Step: 79380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:20:13,057-Speed 10444.35 samples/sec   Loss 3.9469   LearningRate 0.0515   Epoch: 15   Global Step: 79390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:20:20,861-Speed 10498.73 samples/sec   Loss 3.9751   LearningRate 0.0514   Epoch: 15   Global Step: 79400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:20:28,676-Speed 10484.56 samples/sec   Loss 3.9758   LearningRate 0.0514   Epoch: 15   Global Step: 79410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:20:36,504-Speed 10464.91 samples/sec   Loss 3.9436   LearningRate 0.0513   Epoch: 15   Global Step: 79420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:20:44,305-Speed 10503.10 samples/sec   Loss 3.9378   LearningRate 0.0513   Epoch: 15   Global Step: 79430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:20:52,120-Speed 10483.87 samples/sec   Loss 3.9524   LearningRate 0.0512   Epoch: 15   Global Step: 79440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:20:59,936-Speed 10482.22 samples/sec   Loss 3.9343   LearningRate 0.0512   Epoch: 15   Global Step: 79450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:21:07,766-Speed 10463.83 samples/sec   Loss 3.9390   LearningRate 0.0512   Epoch: 15   Global Step: 79460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:21:15,577-Speed 10490.18 samples/sec   Loss 3.9358   LearningRate 0.0511   Epoch: 15   Global Step: 79470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:21:23,373-Speed 10508.51 samples/sec   Loss 3.9527   LearningRate 0.0511   Epoch: 15   Global Step: 79480   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-01-16 08:21:31,167-Speed 10519.26 samples/sec   Loss 3.9494   LearningRate 0.0510   Epoch: 15   Global Step: 79490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:21:38,960-Speed 10513.02 samples/sec   Loss 3.9412   LearningRate 0.0510   Epoch: 15   Global Step: 79500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:21:46,773-Speed 10485.96 samples/sec   Loss 3.9420   LearningRate 0.0509   Epoch: 15   Global Step: 79510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:21:54,585-Speed 10487.74 samples/sec   Loss 3.9258   LearningRate 0.0509   Epoch: 15   Global Step: 79520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:22:02,387-Speed 10501.50 samples/sec   Loss 3.9130   LearningRate 0.0509   Epoch: 15   Global Step: 79530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:22:10,178-Speed 10516.78 samples/sec   Loss 3.9434   LearningRate 0.0508   Epoch: 15   Global Step: 79540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:22:17,986-Speed 10493.13 samples/sec   Loss 3.9598   LearningRate 0.0508   Epoch: 15   Global Step: 79550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:22:25,796-Speed 10490.21 samples/sec   Loss 3.9356   LearningRate 0.0507   Epoch: 15   Global Step: 79560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:22:33,575-Speed 10532.68 samples/sec   Loss 3.8879   LearningRate 0.0507   Epoch: 15   Global Step: 79570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:22:41,385-Speed 10490.28 samples/sec   Loss 3.9382   LearningRate 0.0507   Epoch: 15   Global Step: 79580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:22:49,181-Speed 10509.62 samples/sec   Loss 3.9066   LearningRate 0.0506   Epoch: 15   Global Step: 79590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:22:56,987-Speed 10495.96 samples/sec   Loss 3.9274   LearningRate 0.0506   Epoch: 15   Global Step: 79600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-16 08:23:04,851-Speed 10421.72 samples/sec   Loss 3.9415   LearningRate 0.0505   Epoch: 15   Global Step: 79610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-16 08:23:12,673-Speed 10473.89 samples/sec   Loss 3.9252   LearningRate 0.0505   Epoch: 15   Global Step: 79620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-16 08:23:20,477-Speed 10499.05 samples/sec   Loss 3.9276   LearningRate 0.0504   Epoch: 15   Global Step: 79630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-16 08:23:28,281-Speed 10499.65 samples/sec   Loss 3.9292   LearningRate 0.0504   Epoch: 15   Global Step: 79640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-16 08:23:36,091-Speed 10490.95 samples/sec   Loss 3.9129   LearningRate 0.0504   Epoch: 15   Global Step: 79650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-16 08:23:43,913-Speed 10473.90 samples/sec   Loss 3.9046   LearningRate 0.0503   Epoch: 15   Global Step: 79660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-16 08:23:51,708-Speed 10511.35 samples/sec   Loss 3.9350   LearningRate 0.0503   Epoch: 15   Global Step: 79670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-16 08:23:59,491-Speed 10525.99 samples/sec   Loss 3.9087   LearningRate 0.0502   Epoch: 15   Global Step: 79680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-16 08:24:07,271-Speed 10532.28 samples/sec   Loss 3.9090   LearningRate 0.0502   Epoch: 15   Global Step: 79690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-16 08:24:15,082-Speed 10488.39 samples/sec   Loss 3.8747   LearningRate 0.0502   Epoch: 15   Global Step: 79700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:24:22,897-Speed 10484.18 samples/sec   Loss 3.9152   LearningRate 0.0501   Epoch: 15   Global Step: 79710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:24:30,705-Speed 10495.09 samples/sec   Loss 3.9269   LearningRate 0.0501   Epoch: 15   Global Step: 79720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:24:38,547-Speed 10446.63 samples/sec   Loss 3.9023   LearningRate 0.0500   Epoch: 15   Global Step: 79730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:24:46,363-Speed 10483.10 samples/sec   Loss 3.9029   LearningRate 0.0500   Epoch: 15   Global Step: 79740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:24:54,144-Speed 10528.41 samples/sec   Loss 3.9128   LearningRate 0.0499   Epoch: 15   Global Step: 79750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:25:01,945-Speed 10503.49 samples/sec   Loss 3.8808   LearningRate 0.0499   Epoch: 15   Global Step: 79760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:25:09,766-Speed 10475.41 samples/sec   Loss 3.9013   LearningRate 0.0499   Epoch: 15   Global Step: 79770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:25:17,567-Speed 10502.88 samples/sec   Loss 3.8749   LearningRate 0.0498   Epoch: 15   Global Step: 79780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:25:25,373-Speed 10495.83 samples/sec   Loss 3.8793   LearningRate 0.0498   Epoch: 15   Global Step: 79790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:25:33,169-Speed 10510.60 samples/sec   Loss 3.8810   LearningRate 0.0497   Epoch: 15   Global Step: 79800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:25:40,985-Speed 10486.16 samples/sec   Loss 3.9076   LearningRate 0.0497   Epoch: 15   Global Step: 79810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:25:48,776-Speed 10515.32 samples/sec   Loss 3.8968   LearningRate 0.0497   Epoch: 15   Global Step: 79820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:25:56,547-Speed 10543.99 samples/sec   Loss 3.9209   LearningRate 0.0496   Epoch: 15   Global Step: 79830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:26:04,339-Speed 10515.11 samples/sec   Loss 3.8858   LearningRate 0.0496   Epoch: 15   Global Step: 79840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:26:12,141-Speed 10503.17 samples/sec   Loss 3.9125   LearningRate 0.0495   Epoch: 15   Global Step: 79850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:26:19,961-Speed 10476.68 samples/sec   Loss 3.8818   LearningRate 0.0495   Epoch: 15   Global Step: 79860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:26:27,761-Speed 10504.85 samples/sec   Loss 3.8928   LearningRate 0.0494   Epoch: 15   Global Step: 79870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:26:35,588-Speed 10470.77 samples/sec   Loss 3.8892   LearningRate 0.0494   Epoch: 15   Global Step: 79880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:26:43,417-Speed 10465.23 samples/sec   Loss 3.8482   LearningRate 0.0494   Epoch: 15   Global Step: 79890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:26:51,217-Speed 10503.82 samples/sec   Loss 3.8622   LearningRate 0.0493   Epoch: 15   Global Step: 79900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:26:59,022-Speed 10497.38 samples/sec   Loss 3.8573   LearningRate 0.0493   Epoch: 15   Global Step: 79910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:27:06,821-Speed 10504.91 samples/sec   Loss 3.8775   LearningRate 0.0492   Epoch: 15   Global Step: 79920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:27:14,640-Speed 10479.54 samples/sec   Loss 3.8663   LearningRate 0.0492   Epoch: 15   Global Step: 79930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:27:22,408-Speed 10547.98 samples/sec   Loss 3.9348   LearningRate 0.0492   Epoch: 15   Global Step: 79940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:27:30,261-Speed 10433.28 samples/sec   Loss 3.8792   LearningRate 0.0491   Epoch: 15   Global Step: 79950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:27:38,094-Speed 10460.29 samples/sec   Loss 3.8578   LearningRate 0.0491   Epoch: 15   Global Step: 79960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:27:45,933-Speed 10452.69 samples/sec   Loss 3.8475   LearningRate 0.0490   Epoch: 15   Global Step: 79970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:27:53,744-Speed 10488.85 samples/sec   Loss 3.8606   LearningRate 0.0490   Epoch: 15   Global Step: 79980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:28:01,560-Speed 10481.26 samples/sec   Loss 3.8720   LearningRate 0.0489   Epoch: 15   Global Step: 79990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:28:09,386-Speed 10468.87 samples/sec   Loss 3.8733   LearningRate 0.0489   Epoch: 15   Global Step: 80000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:28:36,733-[lfw][80000]XNorm: 22.776289
Training: 2022-01-16 08:28:36,734-[lfw][80000]Accuracy-Flip: 0.99783+-0.00289
Training: 2022-01-16 08:28:36,734-[lfw][80000]Accuracy-Highest: 0.99783
Training: 2022-01-16 08:29:09,832-[cfp_fp][80000]XNorm: 20.435738
Training: 2022-01-16 08:29:09,833-[cfp_fp][80000]Accuracy-Flip: 0.99129+-0.00364
Training: 2022-01-16 08:29:09,833-[cfp_fp][80000]Accuracy-Highest: 0.99129
Training: 2022-01-16 08:29:37,972-[agedb_30][80000]XNorm: 22.371951
Training: 2022-01-16 08:29:37,972-[agedb_30][80000]Accuracy-Flip: 0.97950+-0.00495
Training: 2022-01-16 08:29:37,973-[agedb_30][80000]Accuracy-Highest: 0.97950
Training: 2022-01-16 08:29:45,718-Speed 850.41 samples/sec   Loss 3.8905   LearningRate 0.0489   Epoch: 15   Global Step: 80010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:29:53,512-Speed 10512.95 samples/sec   Loss 3.8427   LearningRate 0.0488   Epoch: 15   Global Step: 80020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:30:01,292-Speed 10530.72 samples/sec   Loss 3.8742   LearningRate 0.0488   Epoch: 15   Global Step: 80030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:30:09,052-Speed 10557.88 samples/sec   Loss 3.8619   LearningRate 0.0487   Epoch: 15   Global Step: 80040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:30:16,846-Speed 10512.15 samples/sec   Loss 3.8878   LearningRate 0.0487   Epoch: 15   Global Step: 80050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:30:24,669-Speed 10473.70 samples/sec   Loss 3.8689   LearningRate 0.0487   Epoch: 15   Global Step: 80060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:30:32,488-Speed 10478.73 samples/sec   Loss 3.8256   LearningRate 0.0486   Epoch: 15   Global Step: 80070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:30:40,258-Speed 10543.96 samples/sec   Loss 3.8570   LearningRate 0.0486   Epoch: 15   Global Step: 80080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:30:48,018-Speed 10557.99 samples/sec   Loss 3.8517   LearningRate 0.0485   Epoch: 15   Global Step: 80090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:30:55,804-Speed 10522.58 samples/sec   Loss 3.8599   LearningRate 0.0485   Epoch: 15   Global Step: 80100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:31:03,595-Speed 10516.38 samples/sec   Loss 3.8615   LearningRate 0.0485   Epoch: 15   Global Step: 80110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:31:11,379-Speed 10526.26 samples/sec   Loss 3.8226   LearningRate 0.0484   Epoch: 15   Global Step: 80120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:31:19,166-Speed 10521.49 samples/sec   Loss 3.8242   LearningRate 0.0484   Epoch: 15   Global Step: 80130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:31:27,012-Speed 10443.15 samples/sec   Loss 3.8500   LearningRate 0.0483   Epoch: 15   Global Step: 80140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:31:34,789-Speed 10535.34 samples/sec   Loss 3.8597   LearningRate 0.0483   Epoch: 15   Global Step: 80150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:31:42,570-Speed 10530.06 samples/sec   Loss 3.8768   LearningRate 0.0482   Epoch: 15   Global Step: 80160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:31:50,366-Speed 10509.32 samples/sec   Loss 3.8559   LearningRate 0.0482   Epoch: 15   Global Step: 80170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:31:58,169-Speed 10500.87 samples/sec   Loss 3.8299   LearningRate 0.0482   Epoch: 15   Global Step: 80180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:32:06,003-Speed 10458.27 samples/sec   Loss 3.8507   LearningRate 0.0481   Epoch: 15   Global Step: 80190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:32:13,807-Speed 10500.29 samples/sec   Loss 3.8271   LearningRate 0.0481   Epoch: 15   Global Step: 80200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:32:21,600-Speed 10513.82 samples/sec   Loss 3.8337   LearningRate 0.0480   Epoch: 15   Global Step: 80210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:32:29,443-Speed 10447.35 samples/sec   Loss 3.8257   LearningRate 0.0480   Epoch: 15   Global Step: 80220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:32:37,309-Speed 10415.68 samples/sec   Loss 3.8409   LearningRate 0.0480   Epoch: 15   Global Step: 80230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:32:45,104-Speed 10510.74 samples/sec   Loss 3.8649   LearningRate 0.0479   Epoch: 15   Global Step: 80240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:32:52,871-Speed 10548.95 samples/sec   Loss 3.8346   LearningRate 0.0479   Epoch: 15   Global Step: 80250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:33:00,670-Speed 10506.87 samples/sec   Loss 3.8459   LearningRate 0.0478   Epoch: 15   Global Step: 80260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:33:08,452-Speed 10528.35 samples/sec   Loss 3.8368   LearningRate 0.0478   Epoch: 15   Global Step: 80270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:33:16,253-Speed 10501.69 samples/sec   Loss 3.8201   LearningRate 0.0478   Epoch: 15   Global Step: 80280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:33:24,051-Speed 10506.82 samples/sec   Loss 3.7984   LearningRate 0.0477   Epoch: 15   Global Step: 80290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:33:31,823-Speed 10542.28 samples/sec   Loss 3.8287   LearningRate 0.0477   Epoch: 15   Global Step: 80300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:33:39,627-Speed 10497.49 samples/sec   Loss 3.8521   LearningRate 0.0476   Epoch: 15   Global Step: 80310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:33:47,422-Speed 10511.66 samples/sec   Loss 3.8485   LearningRate 0.0476   Epoch: 15   Global Step: 80320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:33:55,195-Speed 10539.86 samples/sec   Loss 3.8234   LearningRate 0.0476   Epoch: 15   Global Step: 80330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:34:02,974-Speed 10534.12 samples/sec   Loss 3.8489   LearningRate 0.0475   Epoch: 15   Global Step: 80340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:34:10,771-Speed 10508.59 samples/sec   Loss 3.8269   LearningRate 0.0475   Epoch: 15   Global Step: 80350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:34:18,612-Speed 10450.14 samples/sec   Loss 3.8334   LearningRate 0.0474   Epoch: 15   Global Step: 80360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:34:26,400-Speed 10519.86 samples/sec   Loss 3.8252   LearningRate 0.0474   Epoch: 15   Global Step: 80370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:34:34,194-Speed 10512.37 samples/sec   Loss 3.8368   LearningRate 0.0473   Epoch: 15   Global Step: 80380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:34:41,984-Speed 10518.33 samples/sec   Loss 3.8399   LearningRate 0.0473   Epoch: 15   Global Step: 80390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:34:49,779-Speed 10510.58 samples/sec   Loss 3.7935   LearningRate 0.0473   Epoch: 15   Global Step: 80400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:34:57,591-Speed 10488.79 samples/sec   Loss 3.7854   LearningRate 0.0472   Epoch: 15   Global Step: 80410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:35:05,371-Speed 10532.35 samples/sec   Loss 3.7830   LearningRate 0.0472   Epoch: 15   Global Step: 80420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:35:13,171-Speed 10502.61 samples/sec   Loss 3.7960   LearningRate 0.0471   Epoch: 15   Global Step: 80430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:35:20,939-Speed 10546.75 samples/sec   Loss 3.7856   LearningRate 0.0471   Epoch: 15   Global Step: 80440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:35:28,731-Speed 10516.07 samples/sec   Loss 3.7980   LearningRate 0.0471   Epoch: 15   Global Step: 80450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:35:36,509-Speed 10532.82 samples/sec   Loss 3.8138   LearningRate 0.0470   Epoch: 15   Global Step: 80460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:35:44,302-Speed 10512.60 samples/sec   Loss 3.7870   LearningRate 0.0470   Epoch: 15   Global Step: 80470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:35:52,079-Speed 10536.70 samples/sec   Loss 3.7669   LearningRate 0.0469   Epoch: 15   Global Step: 80480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:35:59,856-Speed 10534.49 samples/sec   Loss 3.7939   LearningRate 0.0469   Epoch: 15   Global Step: 80490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:36:07,665-Speed 10492.66 samples/sec   Loss 3.8067   LearningRate 0.0469   Epoch: 15   Global Step: 80500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:36:15,459-Speed 10511.61 samples/sec   Loss 3.7799   LearningRate 0.0468   Epoch: 15   Global Step: 80510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:36:23,252-Speed 10513.26 samples/sec   Loss 3.8071   LearningRate 0.0468   Epoch: 15   Global Step: 80520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:36:31,034-Speed 10528.88 samples/sec   Loss 3.7773   LearningRate 0.0467   Epoch: 15   Global Step: 80530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:36:38,825-Speed 10516.58 samples/sec   Loss 3.7667   LearningRate 0.0467   Epoch: 15   Global Step: 80540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:36:46,617-Speed 10514.59 samples/sec   Loss 3.8025   LearningRate 0.0467   Epoch: 15   Global Step: 80550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:36:54,413-Speed 10509.54 samples/sec   Loss 3.7552   LearningRate 0.0466   Epoch: 15   Global Step: 80560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:37:02,193-Speed 10531.04 samples/sec   Loss 3.7857   LearningRate 0.0466   Epoch: 15   Global Step: 80570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:37:09,991-Speed 10507.46 samples/sec   Loss 3.8021   LearningRate 0.0465   Epoch: 15   Global Step: 80580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:37:17,779-Speed 10520.00 samples/sec   Loss 3.7853   LearningRate 0.0465   Epoch: 15   Global Step: 80590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:37:25,553-Speed 10538.65 samples/sec   Loss 3.8172   LearningRate 0.0465   Epoch: 15   Global Step: 80600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:37:33,368-Speed 10484.39 samples/sec   Loss 3.8026   LearningRate 0.0464   Epoch: 15   Global Step: 80610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:37:41,151-Speed 10528.03 samples/sec   Loss 3.7706   LearningRate 0.0464   Epoch: 15   Global Step: 80620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:37:48,933-Speed 10528.59 samples/sec   Loss 3.7930   LearningRate 0.0463   Epoch: 15   Global Step: 80630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:37:56,745-Speed 10487.44 samples/sec   Loss 3.8168   LearningRate 0.0463   Epoch: 15   Global Step: 80640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:38:04,548-Speed 10499.81 samples/sec   Loss 3.8106   LearningRate 0.0463   Epoch: 15   Global Step: 80650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:38:12,380-Speed 10461.76 samples/sec   Loss 3.7980   LearningRate 0.0462   Epoch: 15   Global Step: 80660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:38:20,168-Speed 10519.39 samples/sec   Loss 3.7755   LearningRate 0.0462   Epoch: 15   Global Step: 80670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:38:27,992-Speed 10471.01 samples/sec   Loss 3.8030   LearningRate 0.0461   Epoch: 15   Global Step: 80680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:38:35,779-Speed 10522.46 samples/sec   Loss 3.7760   LearningRate 0.0461   Epoch: 15   Global Step: 80690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:38:43,583-Speed 10498.52 samples/sec   Loss 3.7663   LearningRate 0.0461   Epoch: 15   Global Step: 80700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:38:51,384-Speed 10501.17 samples/sec   Loss 3.7539   LearningRate 0.0460   Epoch: 15   Global Step: 80710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:38:59,220-Speed 10456.55 samples/sec   Loss 3.7406   LearningRate 0.0460   Epoch: 15   Global Step: 80720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:39:07,064-Speed 10445.73 samples/sec   Loss 3.7579   LearningRate 0.0459   Epoch: 15   Global Step: 80730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:39:14,919-Speed 10430.49 samples/sec   Loss 3.7900   LearningRate 0.0459   Epoch: 15   Global Step: 80740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:39:22,723-Speed 10497.37 samples/sec   Loss 3.7727   LearningRate 0.0459   Epoch: 15   Global Step: 80750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:39:30,518-Speed 10511.54 samples/sec   Loss 3.7955   LearningRate 0.0458   Epoch: 15   Global Step: 80760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:39:38,318-Speed 10505.06 samples/sec   Loss 3.7411   LearningRate 0.0458   Epoch: 15   Global Step: 80770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:39:46,134-Speed 10482.34 samples/sec   Loss 3.7723   LearningRate 0.0457   Epoch: 15   Global Step: 80780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:39:53,942-Speed 10494.61 samples/sec   Loss 3.7507   LearningRate 0.0457   Epoch: 15   Global Step: 80790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:40:01,757-Speed 10483.87 samples/sec   Loss 3.7725   LearningRate 0.0457   Epoch: 15   Global Step: 80800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:40:09,567-Speed 10490.89 samples/sec   Loss 3.7736   LearningRate 0.0456   Epoch: 15   Global Step: 80810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:40:17,387-Speed 10479.59 samples/sec   Loss 3.7859   LearningRate 0.0456   Epoch: 15   Global Step: 80820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:40:25,188-Speed 10501.76 samples/sec   Loss 3.7758   LearningRate 0.0455   Epoch: 15   Global Step: 80830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:40:32,981-Speed 10514.95 samples/sec   Loss 3.7467   LearningRate 0.0455   Epoch: 15   Global Step: 80840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:40:40,784-Speed 10499.88 samples/sec   Loss 3.7182   LearningRate 0.0455   Epoch: 15   Global Step: 80850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:40:48,581-Speed 10508.45 samples/sec   Loss 3.7558   LearningRate 0.0454   Epoch: 15   Global Step: 80860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:40:56,381-Speed 10502.78 samples/sec   Loss 3.7772   LearningRate 0.0454   Epoch: 15   Global Step: 80870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:41:04,195-Speed 10485.55 samples/sec   Loss 3.7304   LearningRate 0.0453   Epoch: 15   Global Step: 80880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:41:11,995-Speed 10503.90 samples/sec   Loss 3.7458   LearningRate 0.0453   Epoch: 15   Global Step: 80890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:41:19,809-Speed 10484.65 samples/sec   Loss 3.7577   LearningRate 0.0453   Epoch: 15   Global Step: 80900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:41:27,597-Speed 10524.40 samples/sec   Loss 3.7575   LearningRate 0.0452   Epoch: 15   Global Step: 80910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:41:35,370-Speed 10539.59 samples/sec   Loss 3.7585   LearningRate 0.0452   Epoch: 15   Global Step: 80920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:41:43,161-Speed 10516.06 samples/sec   Loss 3.7172   LearningRate 0.0451   Epoch: 15   Global Step: 80930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:41:50,970-Speed 10490.99 samples/sec   Loss 3.7290   LearningRate 0.0451   Epoch: 15   Global Step: 80940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:41:58,800-Speed 10464.07 samples/sec   Loss 3.7423   LearningRate 0.0451   Epoch: 15   Global Step: 80950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:42:06,593-Speed 10512.93 samples/sec   Loss 3.7183   LearningRate 0.0450   Epoch: 15   Global Step: 80960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:42:14,394-Speed 10503.59 samples/sec   Loss 3.7511   LearningRate 0.0450   Epoch: 15   Global Step: 80970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:42:22,182-Speed 10520.69 samples/sec   Loss 3.7402   LearningRate 0.0449   Epoch: 15   Global Step: 80980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:42:29,981-Speed 10504.99 samples/sec   Loss 3.7458   LearningRate 0.0449   Epoch: 15   Global Step: 80990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:42:37,767-Speed 10524.05 samples/sec   Loss 3.7145   LearningRate 0.0449   Epoch: 15   Global Step: 81000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:42:45,552-Speed 10524.49 samples/sec   Loss 3.7183   LearningRate 0.0448   Epoch: 15   Global Step: 81010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:42:53,372-Speed 10476.68 samples/sec   Loss 3.7057   LearningRate 0.0448   Epoch: 15   Global Step: 81020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:43:01,172-Speed 10503.45 samples/sec   Loss 3.7129   LearningRate 0.0447   Epoch: 15   Global Step: 81030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:43:08,975-Speed 10501.32 samples/sec   Loss 3.7542   LearningRate 0.0447   Epoch: 15   Global Step: 81040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:43:16,789-Speed 10486.67 samples/sec   Loss 3.7101   LearningRate 0.0447   Epoch: 15   Global Step: 81050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:43:24,653-Speed 10418.73 samples/sec   Loss 3.7340   LearningRate 0.0446   Epoch: 15   Global Step: 81060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:43:32,453-Speed 10503.43 samples/sec   Loss 3.7085   LearningRate 0.0446   Epoch: 15   Global Step: 81070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:43:40,268-Speed 10487.82 samples/sec   Loss 3.7223   LearningRate 0.0445   Epoch: 15   Global Step: 81080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:43:48,082-Speed 10486.08 samples/sec   Loss 3.7135   LearningRate 0.0445   Epoch: 15   Global Step: 81090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:43:55,882-Speed 10503.34 samples/sec   Loss 3.6979   LearningRate 0.0445   Epoch: 15   Global Step: 81100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:44:03,697-Speed 10483.22 samples/sec   Loss 3.7087   LearningRate 0.0444   Epoch: 15   Global Step: 81110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:44:11,490-Speed 10514.32 samples/sec   Loss 3.7244   LearningRate 0.0444   Epoch: 15   Global Step: 81120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:44:19,277-Speed 10521.33 samples/sec   Loss 3.7159   LearningRate 0.0443   Epoch: 15   Global Step: 81130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:44:27,067-Speed 10517.61 samples/sec   Loss 3.6615   LearningRate 0.0443   Epoch: 15   Global Step: 81140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:44:34,850-Speed 10526.51 samples/sec   Loss 3.6969   LearningRate 0.0443   Epoch: 15   Global Step: 81150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:44:42,658-Speed 10493.40 samples/sec   Loss 3.7080   LearningRate 0.0442   Epoch: 15   Global Step: 81160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:44:50,459-Speed 10502.50 samples/sec   Loss 3.6911   LearningRate 0.0442   Epoch: 15   Global Step: 81170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:44:58,241-Speed 10528.13 samples/sec   Loss 3.7046   LearningRate 0.0442   Epoch: 15   Global Step: 81180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:45:06,044-Speed 10500.81 samples/sec   Loss 3.7168   LearningRate 0.0441   Epoch: 15   Global Step: 81190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:45:13,874-Speed 10463.13 samples/sec   Loss 3.6782   LearningRate 0.0441   Epoch: 15   Global Step: 81200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:45:21,658-Speed 10526.05 samples/sec   Loss 3.7294   LearningRate 0.0440   Epoch: 15   Global Step: 81210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:45:29,446-Speed 10520.74 samples/sec   Loss 3.6860   LearningRate 0.0440   Epoch: 15   Global Step: 81220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:45:37,237-Speed 10515.68 samples/sec   Loss 3.6945   LearningRate 0.0440   Epoch: 15   Global Step: 81230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:45:45,047-Speed 10490.14 samples/sec   Loss 3.6940   LearningRate 0.0439   Epoch: 15   Global Step: 81240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:45:52,859-Speed 10487.53 samples/sec   Loss 3.6818   LearningRate 0.0439   Epoch: 15   Global Step: 81250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:46:00,666-Speed 10494.50 samples/sec   Loss 3.7047   LearningRate 0.0438   Epoch: 15   Global Step: 81260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:46:08,459-Speed 10513.66 samples/sec   Loss 3.7504   LearningRate 0.0438   Epoch: 15   Global Step: 81270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:46:16,255-Speed 10510.68 samples/sec   Loss 3.7158   LearningRate 0.0438   Epoch: 15   Global Step: 81280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:46:24,044-Speed 10518.45 samples/sec   Loss 3.6635   LearningRate 0.0437   Epoch: 15   Global Step: 81290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:46:31,824-Speed 10531.12 samples/sec   Loss 3.6951   LearningRate 0.0437   Epoch: 15   Global Step: 81300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:46:39,603-Speed 10532.83 samples/sec   Loss 3.6493   LearningRate 0.0436   Epoch: 15   Global Step: 81310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:46:47,394-Speed 10515.30 samples/sec   Loss 3.6916   LearningRate 0.0436   Epoch: 15   Global Step: 81320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:46:55,183-Speed 10519.67 samples/sec   Loss 3.7077   LearningRate 0.0436   Epoch: 15   Global Step: 81330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:47:03,003-Speed 10476.53 samples/sec   Loss 3.7018   LearningRate 0.0435   Epoch: 15   Global Step: 81340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:47:10,797-Speed 10512.50 samples/sec   Loss 3.6828   LearningRate 0.0435   Epoch: 15   Global Step: 81350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:47:18,582-Speed 10523.27 samples/sec   Loss 3.6897   LearningRate 0.0434   Epoch: 15   Global Step: 81360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:47:26,389-Speed 10494.95 samples/sec   Loss 3.6633   LearningRate 0.0434   Epoch: 15   Global Step: 81370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:47:34,181-Speed 10515.45 samples/sec   Loss 3.6658   LearningRate 0.0434   Epoch: 15   Global Step: 81380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:47:41,978-Speed 10507.48 samples/sec   Loss 3.6619   LearningRate 0.0433   Epoch: 15   Global Step: 81390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:47:49,783-Speed 10498.03 samples/sec   Loss 3.6933   LearningRate 0.0433   Epoch: 15   Global Step: 81400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:47:57,574-Speed 10516.94 samples/sec   Loss 3.7001   LearningRate 0.0433   Epoch: 15   Global Step: 81410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:48:05,434-Speed 10422.90 samples/sec   Loss 3.6975   LearningRate 0.0432   Epoch: 15   Global Step: 81420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:48:13,277-Speed 10446.96 samples/sec   Loss 3.6349   LearningRate 0.0432   Epoch: 15   Global Step: 81430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:48:21,088-Speed 10490.44 samples/sec   Loss 3.6706   LearningRate 0.0431   Epoch: 15   Global Step: 81440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:48:28,893-Speed 10497.61 samples/sec   Loss 3.6434   LearningRate 0.0431   Epoch: 15   Global Step: 81450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:48:36,686-Speed 10514.25 samples/sec   Loss 3.6576   LearningRate 0.0431   Epoch: 15   Global Step: 81460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:48:44,491-Speed 10498.26 samples/sec   Loss 3.6712   LearningRate 0.0430   Epoch: 15   Global Step: 81470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:48:52,315-Speed 10470.65 samples/sec   Loss 3.6730   LearningRate 0.0430   Epoch: 15   Global Step: 81480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:49:00,145-Speed 10464.24 samples/sec   Loss 3.6845   LearningRate 0.0429   Epoch: 15   Global Step: 81490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:49:07,991-Speed 10443.57 samples/sec   Loss 3.6451   LearningRate 0.0429   Epoch: 15   Global Step: 81500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:49:15,778-Speed 10522.04 samples/sec   Loss 3.6735   LearningRate 0.0429   Epoch: 15   Global Step: 81510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:49:23,564-Speed 10522.56 samples/sec   Loss 3.6425   LearningRate 0.0428   Epoch: 15   Global Step: 81520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:49:31,376-Speed 10489.98 samples/sec   Loss 3.6686   LearningRate 0.0428   Epoch: 15   Global Step: 81530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:49:39,168-Speed 10513.86 samples/sec   Loss 3.6409   LearningRate 0.0428   Epoch: 15   Global Step: 81540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:49:46,974-Speed 10497.24 samples/sec   Loss 3.6422   LearningRate 0.0427   Epoch: 15   Global Step: 81550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:49:54,780-Speed 10494.30 samples/sec   Loss 3.6544   LearningRate 0.0427   Epoch: 15   Global Step: 81560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:50:02,572-Speed 10515.60 samples/sec   Loss 3.6013   LearningRate 0.0426   Epoch: 15   Global Step: 81570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:50:10,356-Speed 10525.68 samples/sec   Loss 3.6389   LearningRate 0.0426   Epoch: 15   Global Step: 81580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:50:18,184-Speed 10466.30 samples/sec   Loss 3.6498   LearningRate 0.0426   Epoch: 15   Global Step: 81590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:50:26,018-Speed 10457.94 samples/sec   Loss 3.6471   LearningRate 0.0425   Epoch: 15   Global Step: 81600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:50:33,813-Speed 10511.20 samples/sec   Loss 3.6252   LearningRate 0.0425   Epoch: 15   Global Step: 81610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:50:41,630-Speed 10487.47 samples/sec   Loss 3.6422   LearningRate 0.0424   Epoch: 15   Global Step: 81620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:50:49,435-Speed 10497.00 samples/sec   Loss 3.6317   LearningRate 0.0424   Epoch: 15   Global Step: 81630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:50:57,211-Speed 10537.78 samples/sec   Loss 3.6097   LearningRate 0.0424   Epoch: 15   Global Step: 81640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:51:05,011-Speed 10503.75 samples/sec   Loss 3.6516   LearningRate 0.0423   Epoch: 15   Global Step: 81650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:51:12,824-Speed 10486.24 samples/sec   Loss 3.6419   LearningRate 0.0423   Epoch: 15   Global Step: 81660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:51:20,609-Speed 10524.15 samples/sec   Loss 3.6417   LearningRate 0.0422   Epoch: 15   Global Step: 81670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:51:28,401-Speed 10515.47 samples/sec   Loss 3.5997   LearningRate 0.0422   Epoch: 15   Global Step: 81680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:51:36,187-Speed 10522.22 samples/sec   Loss 3.6154   LearningRate 0.0422   Epoch: 15   Global Step: 81690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:51:43,953-Speed 10550.45 samples/sec   Loss 3.6719   LearningRate 0.0421   Epoch: 15   Global Step: 81700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:51:51,761-Speed 10495.10 samples/sec   Loss 3.6033   LearningRate 0.0421   Epoch: 15   Global Step: 81710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:51:59,556-Speed 10511.24 samples/sec   Loss 3.6218   LearningRate 0.0421   Epoch: 15   Global Step: 81720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:52:07,358-Speed 10500.27 samples/sec   Loss 3.6403   LearningRate 0.0420   Epoch: 15   Global Step: 81730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:52:15,140-Speed 10528.93 samples/sec   Loss 3.6451   LearningRate 0.0420   Epoch: 15   Global Step: 81740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:52:22,935-Speed 10510.34 samples/sec   Loss 3.6420   LearningRate 0.0419   Epoch: 15   Global Step: 81750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:52:30,723-Speed 10519.71 samples/sec   Loss 3.6223   LearningRate 0.0419   Epoch: 15   Global Step: 81760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:52:38,518-Speed 10511.38 samples/sec   Loss 3.6064   LearningRate 0.0419   Epoch: 15   Global Step: 81770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:52:46,315-Speed 10508.61 samples/sec   Loss 3.6292   LearningRate 0.0418   Epoch: 15   Global Step: 81780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:52:54,121-Speed 10495.59 samples/sec   Loss 3.6261   LearningRate 0.0418   Epoch: 15   Global Step: 81790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:53:01,911-Speed 10518.22 samples/sec   Loss 3.6257   LearningRate 0.0418   Epoch: 15   Global Step: 81800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:53:09,726-Speed 10483.22 samples/sec   Loss 3.6193   LearningRate 0.0417   Epoch: 15   Global Step: 81810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:53:17,609-Speed 10394.60 samples/sec   Loss 3.6458   LearningRate 0.0417   Epoch: 15   Global Step: 81820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:53:25,438-Speed 10465.72 samples/sec   Loss 3.6043   LearningRate 0.0416   Epoch: 15   Global Step: 81830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:53:33,235-Speed 10506.39 samples/sec   Loss 3.6049   LearningRate 0.0416   Epoch: 15   Global Step: 81840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:53:41,064-Speed 10465.79 samples/sec   Loss 3.5904   LearningRate 0.0416   Epoch: 15   Global Step: 81850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:53:48,877-Speed 10486.55 samples/sec   Loss 3.6164   LearningRate 0.0415   Epoch: 15   Global Step: 81860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:53:56,706-Speed 10465.25 samples/sec   Loss 3.6221   LearningRate 0.0415   Epoch: 15   Global Step: 81870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:54:04,491-Speed 10524.40 samples/sec   Loss 3.6101   LearningRate 0.0414   Epoch: 15   Global Step: 81880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:54:12,322-Speed 10461.49 samples/sec   Loss 3.6016   LearningRate 0.0414   Epoch: 15   Global Step: 81890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:54:20,144-Speed 10475.63 samples/sec   Loss 3.6184   LearningRate 0.0414   Epoch: 15   Global Step: 81900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:54:27,942-Speed 10506.45 samples/sec   Loss 3.6330   LearningRate 0.0413   Epoch: 15   Global Step: 81910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:54:35,747-Speed 10496.14 samples/sec   Loss 3.5893   LearningRate 0.0413   Epoch: 15   Global Step: 81920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:54:43,562-Speed 10484.15 samples/sec   Loss 3.5789   LearningRate 0.0413   Epoch: 15   Global Step: 81930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:54:51,372-Speed 10496.76 samples/sec   Loss 3.6065   LearningRate 0.0412   Epoch: 15   Global Step: 81940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:54:59,183-Speed 10489.02 samples/sec   Loss 3.5975   LearningRate 0.0412   Epoch: 15   Global Step: 81950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:55:06,987-Speed 10499.95 samples/sec   Loss 3.5870   LearningRate 0.0411   Epoch: 15   Global Step: 81960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:55:14,775-Speed 10519.57 samples/sec   Loss 3.5925   LearningRate 0.0411   Epoch: 15   Global Step: 81970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:55:22,573-Speed 10506.80 samples/sec   Loss 3.5934   LearningRate 0.0411   Epoch: 15   Global Step: 81980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:55:30,368-Speed 10509.97 samples/sec   Loss 3.5853   LearningRate 0.0410   Epoch: 15   Global Step: 81990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:55:38,151-Speed 10527.19 samples/sec   Loss 3.6086   LearningRate 0.0410   Epoch: 15   Global Step: 82000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:55:45,952-Speed 10502.40 samples/sec   Loss 3.5697   LearningRate 0.0410   Epoch: 15   Global Step: 82010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:55:53,763-Speed 10489.45 samples/sec   Loss 3.5719   LearningRate 0.0409   Epoch: 15   Global Step: 82020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:56:01,587-Speed 10472.35 samples/sec   Loss 3.5907   LearningRate 0.0409   Epoch: 15   Global Step: 82030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:56:09,389-Speed 10501.79 samples/sec   Loss 3.5649   LearningRate 0.0408   Epoch: 15   Global Step: 82040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:56:17,201-Speed 10487.73 samples/sec   Loss 3.5913   LearningRate 0.0408   Epoch: 15   Global Step: 82050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:56:24,998-Speed 10507.57 samples/sec   Loss 3.5755   LearningRate 0.0408   Epoch: 15   Global Step: 82060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:56:32,814-Speed 10481.20 samples/sec   Loss 3.5940   LearningRate 0.0407   Epoch: 15   Global Step: 82070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:56:40,609-Speed 10511.53 samples/sec   Loss 3.5708   LearningRate 0.0407   Epoch: 15   Global Step: 82080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:56:48,404-Speed 10511.03 samples/sec   Loss 3.5446   LearningRate 0.0407   Epoch: 15   Global Step: 82090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:56:56,231-Speed 10467.47 samples/sec   Loss 3.5632   LearningRate 0.0406   Epoch: 15   Global Step: 82100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:57:04,036-Speed 10497.38 samples/sec   Loss 3.5532   LearningRate 0.0406   Epoch: 15   Global Step: 82110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:57:11,844-Speed 10493.37 samples/sec   Loss 3.5843   LearningRate 0.0405   Epoch: 15   Global Step: 82120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:57:19,638-Speed 10512.26 samples/sec   Loss 3.5657   LearningRate 0.0405   Epoch: 15   Global Step: 82130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:57:27,458-Speed 10476.02 samples/sec   Loss 3.5538   LearningRate 0.0405   Epoch: 15   Global Step: 82140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:57:35,243-Speed 10525.35 samples/sec   Loss 3.5665   LearningRate 0.0404   Epoch: 15   Global Step: 82150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:57:43,021-Speed 10538.69 samples/sec   Loss 3.5796   LearningRate 0.0404   Epoch: 15   Global Step: 82160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:57:50,824-Speed 10498.65 samples/sec   Loss 3.6057   LearningRate 0.0404   Epoch: 15   Global Step: 82170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:57:58,644-Speed 10477.21 samples/sec   Loss 3.5643   LearningRate 0.0403   Epoch: 15   Global Step: 82180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:58:06,446-Speed 10501.24 samples/sec   Loss 3.5645   LearningRate 0.0403   Epoch: 15   Global Step: 82190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:58:14,279-Speed 10460.68 samples/sec   Loss 3.5189   LearningRate 0.0402   Epoch: 15   Global Step: 82200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:58:22,066-Speed 10520.73 samples/sec   Loss 3.6055   LearningRate 0.0402   Epoch: 15   Global Step: 82210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:58:29,860-Speed 10512.21 samples/sec   Loss 3.5593   LearningRate 0.0402   Epoch: 15   Global Step: 82220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:58:37,647-Speed 10521.33 samples/sec   Loss 3.5601   LearningRate 0.0401   Epoch: 15   Global Step: 82230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:58:45,440-Speed 10514.41 samples/sec   Loss 3.5912   LearningRate 0.0401   Epoch: 15   Global Step: 82240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:58:53,218-Speed 10532.34 samples/sec   Loss 3.5600   LearningRate 0.0401   Epoch: 15   Global Step: 82250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:59:01,002-Speed 10525.82 samples/sec   Loss 3.5495   LearningRate 0.0400   Epoch: 15   Global Step: 82260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:59:08,792-Speed 10518.18 samples/sec   Loss 3.5585   LearningRate 0.0400   Epoch: 15   Global Step: 82270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:59:16,594-Speed 10500.65 samples/sec   Loss 3.5654   LearningRate 0.0399   Epoch: 15   Global Step: 82280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:59:24,413-Speed 10478.29 samples/sec   Loss 3.5608   LearningRate 0.0399   Epoch: 15   Global Step: 82290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:59:32,252-Speed 10452.43 samples/sec   Loss 3.5655   LearningRate 0.0399   Epoch: 15   Global Step: 82300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:59:40,064-Speed 10487.04 samples/sec   Loss 3.5577   LearningRate 0.0398   Epoch: 15   Global Step: 82310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 08:59:47,852-Speed 10520.62 samples/sec   Loss 3.5369   LearningRate 0.0398   Epoch: 15   Global Step: 82320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 08:59:55,647-Speed 10510.03 samples/sec   Loss 3.5125   LearningRate 0.0398   Epoch: 15   Global Step: 82330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:00:03,448-Speed 10503.40 samples/sec   Loss 3.5604   LearningRate 0.0397   Epoch: 15   Global Step: 82340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:00:11,282-Speed 10458.48 samples/sec   Loss 3.5546   LearningRate 0.0397   Epoch: 15   Global Step: 82350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:00:19,143-Speed 10422.37 samples/sec   Loss 3.5145   LearningRate 0.0396   Epoch: 15   Global Step: 82360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:00:26,987-Speed 10445.20 samples/sec   Loss 3.5650   LearningRate 0.0396   Epoch: 15   Global Step: 82370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:00:34,793-Speed 10496.05 samples/sec   Loss 3.5435   LearningRate 0.0396   Epoch: 15   Global Step: 82380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:00:42,594-Speed 10502.76 samples/sec   Loss 3.5124   LearningRate 0.0395   Epoch: 15   Global Step: 82390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:00:50,385-Speed 10516.80 samples/sec   Loss 3.5253   LearningRate 0.0395   Epoch: 15   Global Step: 82400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:00:58,210-Speed 10469.38 samples/sec   Loss 3.5106   LearningRate 0.0395   Epoch: 15   Global Step: 82410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:01:06,032-Speed 10475.48 samples/sec   Loss 3.5177   LearningRate 0.0394   Epoch: 15   Global Step: 82420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 09:01:13,841-Speed 10491.11 samples/sec   Loss 3.5276   LearningRate 0.0394   Epoch: 15   Global Step: 82430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 09:01:21,646-Speed 10497.81 samples/sec   Loss 3.5563   LearningRate 0.0393   Epoch: 15   Global Step: 82440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 09:01:29,436-Speed 10518.39 samples/sec   Loss 3.5473   LearningRate 0.0393   Epoch: 15   Global Step: 82450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 09:01:37,222-Speed 10522.54 samples/sec   Loss 3.4926   LearningRate 0.0393   Epoch: 15   Global Step: 82460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 09:01:45,005-Speed 10527.45 samples/sec   Loss 3.4913   LearningRate 0.0392   Epoch: 15   Global Step: 82470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 09:01:52,785-Speed 10531.40 samples/sec   Loss 3.4950   LearningRate 0.0392   Epoch: 15   Global Step: 82480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:02:00,588-Speed 10500.28 samples/sec   Loss 3.4935   LearningRate 0.0392   Epoch: 15   Global Step: 82490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:02:08,407-Speed 10478.95 samples/sec   Loss 3.5227   LearningRate 0.0391   Epoch: 15   Global Step: 82500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:02:16,182-Speed 10536.23 samples/sec   Loss 3.5171   LearningRate 0.0391   Epoch: 15   Global Step: 82510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:02:23,977-Speed 10511.29 samples/sec   Loss 3.5143   LearningRate 0.0390   Epoch: 15   Global Step: 82520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:02:31,753-Speed 10535.66 samples/sec   Loss 3.5382   LearningRate 0.0390   Epoch: 15   Global Step: 82530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:02:39,538-Speed 10524.58 samples/sec   Loss 3.5253   LearningRate 0.0390   Epoch: 15   Global Step: 82540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:02:47,351-Speed 10486.11 samples/sec   Loss 3.5331   LearningRate 0.0389   Epoch: 15   Global Step: 82550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:02:55,177-Speed 10469.35 samples/sec   Loss 3.5327   LearningRate 0.0389   Epoch: 15   Global Step: 82560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:03:02,955-Speed 10533.22 samples/sec   Loss 3.5166   LearningRate 0.0389   Epoch: 15   Global Step: 82570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:03:10,723-Speed 10548.63 samples/sec   Loss 3.5104   LearningRate 0.0388   Epoch: 15   Global Step: 82580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:03:18,529-Speed 10495.61 samples/sec   Loss 3.5142   LearningRate 0.0388   Epoch: 15   Global Step: 82590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:03:26,333-Speed 10497.67 samples/sec   Loss 3.4867   LearningRate 0.0388   Epoch: 15   Global Step: 82600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:03:34,127-Speed 10514.28 samples/sec   Loss 3.5084   LearningRate 0.0387   Epoch: 15   Global Step: 82610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:03:41,920-Speed 10513.89 samples/sec   Loss 3.4875   LearningRate 0.0387   Epoch: 15   Global Step: 82620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:03:49,728-Speed 10492.70 samples/sec   Loss 3.5066   LearningRate 0.0386   Epoch: 15   Global Step: 82630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:03:57,523-Speed 10510.93 samples/sec   Loss 3.4960   LearningRate 0.0386   Epoch: 15   Global Step: 82640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:04:05,315-Speed 10514.09 samples/sec   Loss 3.4968   LearningRate 0.0386   Epoch: 15   Global Step: 82650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:04:13,108-Speed 10513.84 samples/sec   Loss 3.5092   LearningRate 0.0385   Epoch: 15   Global Step: 82660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:04:20,896-Speed 10519.64 samples/sec   Loss 3.4783   LearningRate 0.0385   Epoch: 15   Global Step: 82670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:04:28,696-Speed 10504.22 samples/sec   Loss 3.5128   LearningRate 0.0385   Epoch: 15   Global Step: 82680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 09:04:36,485-Speed 10519.17 samples/sec   Loss 3.4867   LearningRate 0.0384   Epoch: 15   Global Step: 82690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 09:04:44,284-Speed 10505.10 samples/sec   Loss 3.5052   LearningRate 0.0384   Epoch: 15   Global Step: 82700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 09:04:52,079-Speed 10510.85 samples/sec   Loss 3.4973   LearningRate 0.0384   Epoch: 15   Global Step: 82710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 09:04:59,878-Speed 10505.28 samples/sec   Loss 3.5092   LearningRate 0.0383   Epoch: 15   Global Step: 82720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 09:05:07,683-Speed 10497.43 samples/sec   Loss 3.4696   LearningRate 0.0383   Epoch: 15   Global Step: 82730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 09:05:15,468-Speed 10524.84 samples/sec   Loss 3.4832   LearningRate 0.0382   Epoch: 15   Global Step: 82740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:05:23,270-Speed 10500.30 samples/sec   Loss 3.4925   LearningRate 0.0382   Epoch: 15   Global Step: 82750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:05:31,076-Speed 10496.36 samples/sec   Loss 3.4837   LearningRate 0.0382   Epoch: 15   Global Step: 82760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:05:38,865-Speed 10518.93 samples/sec   Loss 3.4871   LearningRate 0.0381   Epoch: 15   Global Step: 82770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:05:46,657-Speed 10515.22 samples/sec   Loss 3.4878   LearningRate 0.0381   Epoch: 15   Global Step: 82780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:05:54,463-Speed 10495.85 samples/sec   Loss 3.4754   LearningRate 0.0381   Epoch: 15   Global Step: 82790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:06:02,299-Speed 10456.29 samples/sec   Loss 3.4740   LearningRate 0.0380   Epoch: 15   Global Step: 82800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:06:10,113-Speed 10486.50 samples/sec   Loss 3.4834   LearningRate 0.0380   Epoch: 15   Global Step: 82810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:06:17,894-Speed 10530.06 samples/sec   Loss 3.4868   LearningRate 0.0379   Epoch: 15   Global Step: 82820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:06:25,724-Speed 10462.67 samples/sec   Loss 3.4470   LearningRate 0.0379   Epoch: 15   Global Step: 82830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:06:33,523-Speed 10505.81 samples/sec   Loss 3.4708   LearningRate 0.0379   Epoch: 15   Global Step: 82840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 09:06:41,363-Speed 10450.37 samples/sec   Loss 3.4911   LearningRate 0.0378   Epoch: 15   Global Step: 82850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 09:06:49,183-Speed 10477.73 samples/sec   Loss 3.4717   LearningRate 0.0378   Epoch: 15   Global Step: 82860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 09:06:56,974-Speed 10515.81 samples/sec   Loss 3.4537   LearningRate 0.0378   Epoch: 15   Global Step: 82870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 09:07:04,771-Speed 10508.72 samples/sec   Loss 3.4731   LearningRate 0.0377   Epoch: 15   Global Step: 82880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:07:12,570-Speed 10505.16 samples/sec   Loss 3.4578   LearningRate 0.0377   Epoch: 15   Global Step: 82890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:07:20,355-Speed 10523.85 samples/sec   Loss 3.4822   LearningRate 0.0377   Epoch: 15   Global Step: 82900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:07:28,147-Speed 10516.20 samples/sec   Loss 3.4539   LearningRate 0.0376   Epoch: 15   Global Step: 82910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:07:35,953-Speed 10494.83 samples/sec   Loss 3.4604   LearningRate 0.0376   Epoch: 15   Global Step: 82920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:07:43,754-Speed 10501.88 samples/sec   Loss 3.4982   LearningRate 0.0376   Epoch: 15   Global Step: 82930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:07:51,554-Speed 10505.02 samples/sec   Loss 3.4631   LearningRate 0.0375   Epoch: 15   Global Step: 82940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:07:59,370-Speed 10482.97 samples/sec   Loss 3.4939   LearningRate 0.0375   Epoch: 15   Global Step: 82950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:08:22,166-Speed 3593.74 samples/sec   Loss 3.4383   LearningRate 0.0374   Epoch: 16   Global Step: 82960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:08:29,922-Speed 10563.20 samples/sec   Loss 3.4692   LearningRate 0.0374   Epoch: 16   Global Step: 82970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:08:37,690-Speed 10547.57 samples/sec   Loss 3.4382   LearningRate 0.0374   Epoch: 16   Global Step: 82980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:08:45,437-Speed 10574.91 samples/sec   Loss 3.4489   LearningRate 0.0373   Epoch: 16   Global Step: 82990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:08:53,188-Speed 10570.16 samples/sec   Loss 3.4567   LearningRate 0.0373   Epoch: 16   Global Step: 83000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:09:00,958-Speed 10545.18 samples/sec   Loss 3.4499   LearningRate 0.0373   Epoch: 16   Global Step: 83010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:09:08,724-Speed 10549.39 samples/sec   Loss 3.4274   LearningRate 0.0372   Epoch: 16   Global Step: 83020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:09:16,508-Speed 10526.59 samples/sec   Loss 3.4461   LearningRate 0.0372   Epoch: 16   Global Step: 83030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:09:24,306-Speed 10505.84 samples/sec   Loss 3.4673   LearningRate 0.0372   Epoch: 16   Global Step: 83040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:09:32,101-Speed 10511.14 samples/sec   Loss 3.4445   LearningRate 0.0371   Epoch: 16   Global Step: 83050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:09:39,873-Speed 10541.61 samples/sec   Loss 3.4175   LearningRate 0.0371   Epoch: 16   Global Step: 83060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:09:47,659-Speed 10523.46 samples/sec   Loss 3.4337   LearningRate 0.0370   Epoch: 16   Global Step: 83070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:09:55,454-Speed 10510.78 samples/sec   Loss 3.4295   LearningRate 0.0370   Epoch: 16   Global Step: 83080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 09:10:03,228-Speed 10540.09 samples/sec   Loss 3.4475   LearningRate 0.0370   Epoch: 16   Global Step: 83090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:10:11,011-Speed 10527.14 samples/sec   Loss 3.4010   LearningRate 0.0369   Epoch: 16   Global Step: 83100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:10:18,791-Speed 10529.36 samples/sec   Loss 3.4176   LearningRate 0.0369   Epoch: 16   Global Step: 83110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:10:26,594-Speed 10500.63 samples/sec   Loss 3.4205   LearningRate 0.0369   Epoch: 16   Global Step: 83120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:10:34,383-Speed 10518.66 samples/sec   Loss 3.4148   LearningRate 0.0368   Epoch: 16   Global Step: 83130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:10:42,181-Speed 10507.37 samples/sec   Loss 3.3880   LearningRate 0.0368   Epoch: 16   Global Step: 83140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:10:49,959-Speed 10533.14 samples/sec   Loss 3.4061   LearningRate 0.0368   Epoch: 16   Global Step: 83150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:10:57,746-Speed 10521.14 samples/sec   Loss 3.4323   LearningRate 0.0367   Epoch: 16   Global Step: 83160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:11:05,534-Speed 10520.46 samples/sec   Loss 3.4157   LearningRate 0.0367   Epoch: 16   Global Step: 83170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:11:13,299-Speed 10550.91 samples/sec   Loss 3.4353   LearningRate 0.0367   Epoch: 16   Global Step: 83180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:11:21,075-Speed 10536.22 samples/sec   Loss 3.4268   LearningRate 0.0366   Epoch: 16   Global Step: 83190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-16 09:11:28,842-Speed 10548.96 samples/sec   Loss 3.3846   LearningRate 0.0366   Epoch: 16   Global Step: 83200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:11:36,631-Speed 10522.24 samples/sec   Loss 3.4121   LearningRate 0.0365   Epoch: 16   Global Step: 83210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:11:44,422-Speed 10515.35 samples/sec   Loss 3.4253   LearningRate 0.0365   Epoch: 16   Global Step: 83220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-16 09:11:52,196-Speed 10539.88 samples/sec   Loss 3.4164   LearningRate 0.0365   Epoch: 16   Global Step: 83230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:11:59,983-Speed 10521.83 samples/sec   Loss 3.3996   LearningRate 0.0364   Epoch: 16   Global Step: 83240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:12:07,752-Speed 10545.79 samples/sec   Loss 3.3922   LearningRate 0.0364   Epoch: 16   Global Step: 83250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:12:15,554-Speed 10502.19 samples/sec   Loss 3.4259   LearningRate 0.0364   Epoch: 16   Global Step: 83260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:12:23,328-Speed 10538.87 samples/sec   Loss 3.3995   LearningRate 0.0363   Epoch: 16   Global Step: 83270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:12:31,107-Speed 10533.12 samples/sec   Loss 3.4443   LearningRate 0.0363   Epoch: 16   Global Step: 83280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:12:38,871-Speed 10552.22 samples/sec   Loss 3.4318   LearningRate 0.0363   Epoch: 16   Global Step: 83290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:12:46,659-Speed 10520.03 samples/sec   Loss 3.3967   LearningRate 0.0362   Epoch: 16   Global Step: 83300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:12:54,442-Speed 10527.66 samples/sec   Loss 3.4152   LearningRate 0.0362   Epoch: 16   Global Step: 83310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:13:02,220-Speed 10536.36 samples/sec   Loss 3.4038   LearningRate 0.0362   Epoch: 16   Global Step: 83320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:13:10,039-Speed 10478.38 samples/sec   Loss 3.4206   LearningRate 0.0361   Epoch: 16   Global Step: 83330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:13:17,838-Speed 10505.09 samples/sec   Loss 3.4451   LearningRate 0.0361   Epoch: 16   Global Step: 83340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:13:25,646-Speed 10492.73 samples/sec   Loss 3.4426   LearningRate 0.0360   Epoch: 16   Global Step: 83350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:13:33,456-Speed 10493.45 samples/sec   Loss 3.4282   LearningRate 0.0360   Epoch: 16   Global Step: 83360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:13:41,286-Speed 10463.76 samples/sec   Loss 3.4085   LearningRate 0.0360   Epoch: 16   Global Step: 83370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:13:49,091-Speed 10497.09 samples/sec   Loss 3.3957   LearningRate 0.0359   Epoch: 16   Global Step: 83380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:13:56,903-Speed 10487.06 samples/sec   Loss 3.3726   LearningRate 0.0359   Epoch: 16   Global Step: 83390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:14:04,742-Speed 10451.59 samples/sec   Loss 3.3608   LearningRate 0.0359   Epoch: 16   Global Step: 83400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:14:12,556-Speed 10486.13 samples/sec   Loss 3.3807   LearningRate 0.0358   Epoch: 16   Global Step: 83410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:14:20,364-Speed 10492.26 samples/sec   Loss 3.3782   LearningRate 0.0358   Epoch: 16   Global Step: 83420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:14:28,184-Speed 10477.84 samples/sec   Loss 3.4156   LearningRate 0.0358   Epoch: 16   Global Step: 83430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:14:36,017-Speed 10459.19 samples/sec   Loss 3.3838   LearningRate 0.0357   Epoch: 16   Global Step: 83440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:14:43,850-Speed 10460.17 samples/sec   Loss 3.3891   LearningRate 0.0357   Epoch: 16   Global Step: 83450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:14:51,696-Speed 10441.57 samples/sec   Loss 3.3788   LearningRate 0.0357   Epoch: 16   Global Step: 83460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:14:59,579-Speed 10393.37 samples/sec   Loss 3.3625   LearningRate 0.0356   Epoch: 16   Global Step: 83470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:15:07,409-Speed 10464.55 samples/sec   Loss 3.3636   LearningRate 0.0356   Epoch: 16   Global Step: 83480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:15:15,250-Speed 10449.17 samples/sec   Loss 3.3802   LearningRate 0.0356   Epoch: 16   Global Step: 83490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:15:23,089-Speed 10451.10 samples/sec   Loss 3.3983   LearningRate 0.0355   Epoch: 16   Global Step: 83500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:15:30,932-Speed 10446.12 samples/sec   Loss 3.3882   LearningRate 0.0355   Epoch: 16   Global Step: 83510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:15:38,783-Speed 10436.75 samples/sec   Loss 3.3918   LearningRate 0.0354   Epoch: 16   Global Step: 83520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:15:46,640-Speed 10429.24 samples/sec   Loss 3.3457   LearningRate 0.0354   Epoch: 16   Global Step: 83530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:15:54,499-Speed 10425.24 samples/sec   Loss 3.3657   LearningRate 0.0354   Epoch: 16   Global Step: 83540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:16:02,362-Speed 10421.22 samples/sec   Loss 3.3685   LearningRate 0.0353   Epoch: 16   Global Step: 83550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:16:10,200-Speed 10452.88 samples/sec   Loss 3.3543   LearningRate 0.0353   Epoch: 16   Global Step: 83560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:16:18,046-Speed 10442.41 samples/sec   Loss 3.3690   LearningRate 0.0353   Epoch: 16   Global Step: 83570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:16:25,871-Speed 10471.12 samples/sec   Loss 3.3640   LearningRate 0.0352   Epoch: 16   Global Step: 83580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:16:33,723-Speed 10433.42 samples/sec   Loss 3.3496   LearningRate 0.0352   Epoch: 16   Global Step: 83590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:16:41,579-Speed 10430.18 samples/sec   Loss 3.3840   LearningRate 0.0352   Epoch: 16   Global Step: 83600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:16:49,410-Speed 10462.68 samples/sec   Loss 3.3310   LearningRate 0.0351   Epoch: 16   Global Step: 83610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:16:57,262-Speed 10434.50 samples/sec   Loss 3.3267   LearningRate 0.0351   Epoch: 16   Global Step: 83620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:17:05,123-Speed 10421.71 samples/sec   Loss 3.3925   LearningRate 0.0351   Epoch: 16   Global Step: 83630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:17:12,992-Speed 10412.40 samples/sec   Loss 3.3432   LearningRate 0.0350   Epoch: 16   Global Step: 83640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:17:20,844-Speed 10434.08 samples/sec   Loss 3.3468   LearningRate 0.0350   Epoch: 16   Global Step: 83650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:17:28,676-Speed 10460.62 samples/sec   Loss 3.3735   LearningRate 0.0350   Epoch: 16   Global Step: 83660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:17:36,523-Speed 10439.70 samples/sec   Loss 3.3417   LearningRate 0.0349   Epoch: 16   Global Step: 83670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:17:44,364-Speed 10449.76 samples/sec   Loss 3.3474   LearningRate 0.0349   Epoch: 16   Global Step: 83680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:17:52,210-Speed 10443.62 samples/sec   Loss 3.3429   LearningRate 0.0349   Epoch: 16   Global Step: 83690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:18:00,042-Speed 10459.68 samples/sec   Loss 3.3785   LearningRate 0.0348   Epoch: 16   Global Step: 83700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:18:07,850-Speed 10494.21 samples/sec   Loss 3.3596   LearningRate 0.0348   Epoch: 16   Global Step: 83710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:18:15,694-Speed 10446.01 samples/sec   Loss 3.3432   LearningRate 0.0347   Epoch: 16   Global Step: 83720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:18:23,508-Speed 10483.66 samples/sec   Loss 3.3067   LearningRate 0.0347   Epoch: 16   Global Step: 83730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:18:31,315-Speed 10494.76 samples/sec   Loss 3.3450   LearningRate 0.0347   Epoch: 16   Global Step: 83740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:18:39,129-Speed 10486.05 samples/sec   Loss 3.3161   LearningRate 0.0346   Epoch: 16   Global Step: 83750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:18:46,946-Speed 10480.67 samples/sec   Loss 3.3189   LearningRate 0.0346   Epoch: 16   Global Step: 83760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:18:54,743-Speed 10508.80 samples/sec   Loss 3.3403   LearningRate 0.0346   Epoch: 16   Global Step: 83770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:19:02,563-Speed 10475.75 samples/sec   Loss 3.3466   LearningRate 0.0345   Epoch: 16   Global Step: 83780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:19:10,364-Speed 10502.63 samples/sec   Loss 3.3323   LearningRate 0.0345   Epoch: 16   Global Step: 83790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:19:18,182-Speed 10480.26 samples/sec   Loss 3.3280   LearningRate 0.0345   Epoch: 16   Global Step: 83800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:19:25,997-Speed 10483.52 samples/sec   Loss 3.3465   LearningRate 0.0344   Epoch: 16   Global Step: 83810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:19:33,796-Speed 10505.72 samples/sec   Loss 3.3335   LearningRate 0.0344   Epoch: 16   Global Step: 83820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:19:41,596-Speed 10503.45 samples/sec   Loss 3.3359   LearningRate 0.0344   Epoch: 16   Global Step: 83830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:19:49,405-Speed 10492.63 samples/sec   Loss 3.3400   LearningRate 0.0343   Epoch: 16   Global Step: 83840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:19:57,210-Speed 10497.50 samples/sec   Loss 3.3202   LearningRate 0.0343   Epoch: 16   Global Step: 83850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:20:05,024-Speed 10485.34 samples/sec   Loss 3.3234   LearningRate 0.0343   Epoch: 16   Global Step: 83860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:20:12,868-Speed 10444.30 samples/sec   Loss 3.3088   LearningRate 0.0342   Epoch: 16   Global Step: 83870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:20:20,705-Speed 10454.88 samples/sec   Loss 3.3268   LearningRate 0.0342   Epoch: 16   Global Step: 83880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:20:28,509-Speed 10497.59 samples/sec   Loss 3.3550   LearningRate 0.0342   Epoch: 16   Global Step: 83890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:20:36,318-Speed 10492.96 samples/sec   Loss 3.3369   LearningRate 0.0341   Epoch: 16   Global Step: 83900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:20:44,131-Speed 10485.93 samples/sec   Loss 3.3279   LearningRate 0.0341   Epoch: 16   Global Step: 83910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:20:51,959-Speed 10467.17 samples/sec   Loss 3.3147   LearningRate 0.0341   Epoch: 16   Global Step: 83920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:20:59,764-Speed 10497.46 samples/sec   Loss 3.3078   LearningRate 0.0340   Epoch: 16   Global Step: 83930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:21:07,582-Speed 10479.67 samples/sec   Loss 3.3156   LearningRate 0.0340   Epoch: 16   Global Step: 83940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:21:15,386-Speed 10498.38 samples/sec   Loss 3.3183   LearningRate 0.0339   Epoch: 16   Global Step: 83950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:21:23,179-Speed 10515.33 samples/sec   Loss 3.3109   LearningRate 0.0339   Epoch: 16   Global Step: 83960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:21:30,975-Speed 10509.12 samples/sec   Loss 3.3094   LearningRate 0.0339   Epoch: 16   Global Step: 83970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:21:38,800-Speed 10470.01 samples/sec   Loss 3.3217   LearningRate 0.0338   Epoch: 16   Global Step: 83980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:21:46,603-Speed 10500.97 samples/sec   Loss 3.2865   LearningRate 0.0338   Epoch: 16   Global Step: 83990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:21:54,418-Speed 10482.98 samples/sec   Loss 3.2974   LearningRate 0.0338   Epoch: 16   Global Step: 84000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:22:02,200-Speed 10528.51 samples/sec   Loss 3.2964   LearningRate 0.0337   Epoch: 16   Global Step: 84010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:22:09,995-Speed 10510.00 samples/sec   Loss 3.2877   LearningRate 0.0337   Epoch: 16   Global Step: 84020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:22:17,791-Speed 10510.03 samples/sec   Loss 3.2967   LearningRate 0.0337   Epoch: 16   Global Step: 84030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:22:25,571-Speed 10530.42 samples/sec   Loss 3.3288   LearningRate 0.0336   Epoch: 16   Global Step: 84040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:22:33,349-Speed 10533.48 samples/sec   Loss 3.3127   LearningRate 0.0336   Epoch: 16   Global Step: 84050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:22:41,141-Speed 10515.86 samples/sec   Loss 3.3219   LearningRate 0.0336   Epoch: 16   Global Step: 84060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:22:48,945-Speed 10498.54 samples/sec   Loss 3.3041   LearningRate 0.0335   Epoch: 16   Global Step: 84070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:22:56,748-Speed 10499.75 samples/sec   Loss 3.3230   LearningRate 0.0335   Epoch: 16   Global Step: 84080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:23:04,541-Speed 10514.40 samples/sec   Loss 3.2888   LearningRate 0.0335   Epoch: 16   Global Step: 84090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:23:12,336-Speed 10511.07 samples/sec   Loss 3.3109   LearningRate 0.0334   Epoch: 16   Global Step: 84100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:23:20,134-Speed 10506.58 samples/sec   Loss 3.2862   LearningRate 0.0334   Epoch: 16   Global Step: 84110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:23:27,951-Speed 10480.77 samples/sec   Loss 3.2766   LearningRate 0.0334   Epoch: 16   Global Step: 84120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:23:35,756-Speed 10498.79 samples/sec   Loss 3.2916   LearningRate 0.0333   Epoch: 16   Global Step: 84130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:23:43,542-Speed 10522.20 samples/sec   Loss 3.3078   LearningRate 0.0333   Epoch: 16   Global Step: 84140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:23:51,348-Speed 10495.61 samples/sec   Loss 3.2741   LearningRate 0.0333   Epoch: 16   Global Step: 84150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:23:59,162-Speed 10485.07 samples/sec   Loss 3.2856   LearningRate 0.0332   Epoch: 16   Global Step: 84160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:24:06,980-Speed 10480.27 samples/sec   Loss 3.2859   LearningRate 0.0332   Epoch: 16   Global Step: 84170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:24:14,795-Speed 10483.85 samples/sec   Loss 3.2875   LearningRate 0.0332   Epoch: 16   Global Step: 84180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:24:22,573-Speed 10534.39 samples/sec   Loss 3.2836   LearningRate 0.0331   Epoch: 16   Global Step: 84190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:24:30,371-Speed 10505.80 samples/sec   Loss 3.3000   LearningRate 0.0331   Epoch: 16   Global Step: 84200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:24:38,158-Speed 10522.99 samples/sec   Loss 3.2715   LearningRate 0.0331   Epoch: 16   Global Step: 84210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:24:45,946-Speed 10519.69 samples/sec   Loss 3.2528   LearningRate 0.0330   Epoch: 16   Global Step: 84220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:24:53,738-Speed 10515.15 samples/sec   Loss 3.3035   LearningRate 0.0330   Epoch: 16   Global Step: 84230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:25:01,531-Speed 10512.60 samples/sec   Loss 3.2522   LearningRate 0.0330   Epoch: 16   Global Step: 84240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:25:09,332-Speed 10503.68 samples/sec   Loss 3.2621   LearningRate 0.0329   Epoch: 16   Global Step: 84250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:25:17,135-Speed 10499.22 samples/sec   Loss 3.2808   LearningRate 0.0329   Epoch: 16   Global Step: 84260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:25:24,922-Speed 10521.60 samples/sec   Loss 3.2557   LearningRate 0.0329   Epoch: 16   Global Step: 84270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:25:32,718-Speed 10509.51 samples/sec   Loss 3.2769   LearningRate 0.0328   Epoch: 16   Global Step: 84280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:25:40,503-Speed 10524.14 samples/sec   Loss 3.2694   LearningRate 0.0328   Epoch: 16   Global Step: 84290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:25:48,329-Speed 10469.00 samples/sec   Loss 3.2925   LearningRate 0.0328   Epoch: 16   Global Step: 84300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:25:56,127-Speed 10507.38 samples/sec   Loss 3.2615   LearningRate 0.0327   Epoch: 16   Global Step: 84310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:26:03,925-Speed 10505.62 samples/sec   Loss 3.2643   LearningRate 0.0327   Epoch: 16   Global Step: 84320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:26:11,741-Speed 10483.09 samples/sec   Loss 3.2885   LearningRate 0.0327   Epoch: 16   Global Step: 84330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:26:19,572-Speed 10461.83 samples/sec   Loss 3.2519   LearningRate 0.0326   Epoch: 16   Global Step: 84340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:26:27,377-Speed 10497.65 samples/sec   Loss 3.2635   LearningRate 0.0326   Epoch: 16   Global Step: 84350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:26:35,153-Speed 10535.98 samples/sec   Loss 3.2567   LearningRate 0.0326   Epoch: 16   Global Step: 84360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:26:42,932-Speed 10533.09 samples/sec   Loss 3.2575   LearningRate 0.0325   Epoch: 16   Global Step: 84370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:26:50,728-Speed 10509.10 samples/sec   Loss 3.2356   LearningRate 0.0325   Epoch: 16   Global Step: 84380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:26:58,534-Speed 10496.40 samples/sec   Loss 3.2390   LearningRate 0.0325   Epoch: 16   Global Step: 84390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:27:06,336-Speed 10501.04 samples/sec   Loss 3.2387   LearningRate 0.0324   Epoch: 16   Global Step: 84400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:27:14,136-Speed 10504.51 samples/sec   Loss 3.2424   LearningRate 0.0324   Epoch: 16   Global Step: 84410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:27:21,939-Speed 10500.21 samples/sec   Loss 3.2343   LearningRate 0.0324   Epoch: 16   Global Step: 84420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:27:29,728-Speed 10518.50 samples/sec   Loss 3.2509   LearningRate 0.0323   Epoch: 16   Global Step: 84430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:27:37,564-Speed 10454.75 samples/sec   Loss 3.2644   LearningRate 0.0323   Epoch: 16   Global Step: 84440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:27:45,350-Speed 10523.17 samples/sec   Loss 3.2644   LearningRate 0.0323   Epoch: 16   Global Step: 84450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:27:53,142-Speed 10515.76 samples/sec   Loss 3.2624   LearningRate 0.0322   Epoch: 16   Global Step: 84460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:28:00,934-Speed 10515.01 samples/sec   Loss 3.2404   LearningRate 0.0322   Epoch: 16   Global Step: 84470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:28:08,736-Speed 10500.41 samples/sec   Loss 3.2223   LearningRate 0.0322   Epoch: 16   Global Step: 84480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:28:16,539-Speed 10499.37 samples/sec   Loss 3.2365   LearningRate 0.0321   Epoch: 16   Global Step: 84490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:28:24,353-Speed 10485.84 samples/sec   Loss 3.2460   LearningRate 0.0321   Epoch: 16   Global Step: 84500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:28:32,150-Speed 10508.04 samples/sec   Loss 3.2405   LearningRate 0.0320   Epoch: 16   Global Step: 84510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:28:39,976-Speed 10468.14 samples/sec   Loss 3.2289   LearningRate 0.0320   Epoch: 16   Global Step: 84520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:28:47,789-Speed 10486.24 samples/sec   Loss 3.2500   LearningRate 0.0320   Epoch: 16   Global Step: 84530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:28:55,557-Speed 10547.87 samples/sec   Loss 3.2582   LearningRate 0.0319   Epoch: 16   Global Step: 84540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:29:03,351-Speed 10511.30 samples/sec   Loss 3.2318   LearningRate 0.0319   Epoch: 16   Global Step: 84550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:29:11,141-Speed 10518.39 samples/sec   Loss 3.2569   LearningRate 0.0319   Epoch: 16   Global Step: 84560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:29:18,963-Speed 10474.30 samples/sec   Loss 3.2003   LearningRate 0.0318   Epoch: 16   Global Step: 84570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:29:26,743-Speed 10531.79 samples/sec   Loss 3.2236   LearningRate 0.0318   Epoch: 16   Global Step: 84580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:29:34,533-Speed 10516.41 samples/sec   Loss 3.2360   LearningRate 0.0318   Epoch: 16   Global Step: 84590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:29:42,342-Speed 10492.31 samples/sec   Loss 3.2295   LearningRate 0.0317   Epoch: 16   Global Step: 84600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:29:50,166-Speed 10471.93 samples/sec   Loss 3.1992   LearningRate 0.0317   Epoch: 16   Global Step: 84610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:29:57,985-Speed 10478.99 samples/sec   Loss 3.2023   LearningRate 0.0317   Epoch: 16   Global Step: 84620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:30:05,781-Speed 10509.02 samples/sec   Loss 3.2018   LearningRate 0.0316   Epoch: 16   Global Step: 84630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:30:13,587-Speed 10495.93 samples/sec   Loss 3.2198   LearningRate 0.0316   Epoch: 16   Global Step: 84640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:30:21,367-Speed 10530.60 samples/sec   Loss 3.2237   LearningRate 0.0316   Epoch: 16   Global Step: 84650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:30:29,139-Speed 10543.07 samples/sec   Loss 3.1977   LearningRate 0.0316   Epoch: 16   Global Step: 84660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:30:36,967-Speed 10466.83 samples/sec   Loss 3.2227   LearningRate 0.0315   Epoch: 16   Global Step: 84670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:30:44,769-Speed 10501.67 samples/sec   Loss 3.2070   LearningRate 0.0315   Epoch: 16   Global Step: 84680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:30:52,559-Speed 10517.86 samples/sec   Loss 3.2284   LearningRate 0.0315   Epoch: 16   Global Step: 84690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:31:00,340-Speed 10530.21 samples/sec   Loss 3.2265   LearningRate 0.0314   Epoch: 16   Global Step: 84700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:31:08,119-Speed 10530.54 samples/sec   Loss 3.2240   LearningRate 0.0314   Epoch: 16   Global Step: 84710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:31:15,906-Speed 10521.78 samples/sec   Loss 3.1771   LearningRate 0.0314   Epoch: 16   Global Step: 84720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:31:23,689-Speed 10528.20 samples/sec   Loss 3.2001   LearningRate 0.0313   Epoch: 16   Global Step: 84730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:31:31,486-Speed 10507.98 samples/sec   Loss 3.2058   LearningRate 0.0313   Epoch: 16   Global Step: 84740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:31:39,289-Speed 10500.38 samples/sec   Loss 3.2015   LearningRate 0.0313   Epoch: 16   Global Step: 84750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:31:47,130-Speed 10449.12 samples/sec   Loss 3.1930   LearningRate 0.0312   Epoch: 16   Global Step: 84760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:31:54,939-Speed 10492.24 samples/sec   Loss 3.1996   LearningRate 0.0312   Epoch: 16   Global Step: 84770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:32:02,756-Speed 10481.97 samples/sec   Loss 3.2463   LearningRate 0.0312   Epoch: 16   Global Step: 84780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:32:10,552-Speed 10508.16 samples/sec   Loss 3.2002   LearningRate 0.0311   Epoch: 16   Global Step: 84790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:32:18,401-Speed 10438.14 samples/sec   Loss 3.2048   LearningRate 0.0311   Epoch: 16   Global Step: 84800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:32:26,226-Speed 10470.43 samples/sec   Loss 3.2073   LearningRate 0.0311   Epoch: 16   Global Step: 84810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:32:34,028-Speed 10502.46 samples/sec   Loss 3.1992   LearningRate 0.0310   Epoch: 16   Global Step: 84820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:32:41,827-Speed 10504.58 samples/sec   Loss 3.1921   LearningRate 0.0310   Epoch: 16   Global Step: 84830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:32:49,645-Speed 10479.81 samples/sec   Loss 3.1777   LearningRate 0.0310   Epoch: 16   Global Step: 84840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:32:57,458-Speed 10485.78 samples/sec   Loss 3.1925   LearningRate 0.0309   Epoch: 16   Global Step: 84850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:33:05,257-Speed 10506.68 samples/sec   Loss 3.1921   LearningRate 0.0309   Epoch: 16   Global Step: 84860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:33:13,052-Speed 10509.66 samples/sec   Loss 3.1866   LearningRate 0.0309   Epoch: 16   Global Step: 84870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:33:20,839-Speed 10521.11 samples/sec   Loss 3.1803   LearningRate 0.0308   Epoch: 16   Global Step: 84880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:33:28,632-Speed 10513.61 samples/sec   Loss 3.1620   LearningRate 0.0308   Epoch: 16   Global Step: 84890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:33:36,433-Speed 10503.63 samples/sec   Loss 3.1800   LearningRate 0.0308   Epoch: 16   Global Step: 84900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:33:44,225-Speed 10513.28 samples/sec   Loss 3.1844   LearningRate 0.0307   Epoch: 16   Global Step: 84910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:33:52,040-Speed 10486.59 samples/sec   Loss 3.1630   LearningRate 0.0307   Epoch: 16   Global Step: 84920   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:33:59,833-Speed 10514.48 samples/sec   Loss 3.1367   LearningRate 0.0307   Epoch: 16   Global Step: 84930   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:34:07,640-Speed 10493.42 samples/sec   Loss 3.1599   LearningRate 0.0306   Epoch: 16   Global Step: 84940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:34:15,446-Speed 10496.66 samples/sec   Loss 3.1565   LearningRate 0.0306   Epoch: 16   Global Step: 84950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:34:23,257-Speed 10488.89 samples/sec   Loss 3.1867   LearningRate 0.0306   Epoch: 16   Global Step: 84960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:34:31,051-Speed 10513.30 samples/sec   Loss 3.1637   LearningRate 0.0305   Epoch: 16   Global Step: 84970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:34:38,850-Speed 10505.37 samples/sec   Loss 3.1989   LearningRate 0.0305   Epoch: 16   Global Step: 84980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:34:46,647-Speed 10506.97 samples/sec   Loss 3.1657   LearningRate 0.0305   Epoch: 16   Global Step: 84990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:34:54,434-Speed 10521.23 samples/sec   Loss 3.1589   LearningRate 0.0304   Epoch: 16   Global Step: 85000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:35:02,254-Speed 10482.20 samples/sec   Loss 3.1794   LearningRate 0.0304   Epoch: 16   Global Step: 85010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:35:10,079-Speed 10469.79 samples/sec   Loss 3.1633   LearningRate 0.0304   Epoch: 16   Global Step: 85020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:35:17,884-Speed 10496.53 samples/sec   Loss 3.1485   LearningRate 0.0303   Epoch: 16   Global Step: 85030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:35:25,689-Speed 10497.38 samples/sec   Loss 3.1336   LearningRate 0.0303   Epoch: 16   Global Step: 85040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:35:33,484-Speed 10510.53 samples/sec   Loss 3.1480   LearningRate 0.0303   Epoch: 16   Global Step: 85050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:35:41,300-Speed 10482.94 samples/sec   Loss 3.1662   LearningRate 0.0302   Epoch: 16   Global Step: 85060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:35:49,097-Speed 10508.23 samples/sec   Loss 3.1405   LearningRate 0.0302   Epoch: 16   Global Step: 85070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:35:56,911-Speed 10484.13 samples/sec   Loss 3.1416   LearningRate 0.0302   Epoch: 16   Global Step: 85080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:36:04,725-Speed 10488.62 samples/sec   Loss 3.1660   LearningRate 0.0301   Epoch: 16   Global Step: 85090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:36:12,533-Speed 10492.33 samples/sec   Loss 3.1594   LearningRate 0.0301   Epoch: 16   Global Step: 85100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:36:20,338-Speed 10497.50 samples/sec   Loss 3.1601   LearningRate 0.0301   Epoch: 16   Global Step: 85110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:36:28,145-Speed 10494.99 samples/sec   Loss 3.1525   LearningRate 0.0300   Epoch: 16   Global Step: 85120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:36:35,974-Speed 10465.48 samples/sec   Loss 3.1504   LearningRate 0.0300   Epoch: 16   Global Step: 85130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:36:43,760-Speed 10522.80 samples/sec   Loss 3.1360   LearningRate 0.0300   Epoch: 16   Global Step: 85140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:36:51,564-Speed 10500.92 samples/sec   Loss 3.1602   LearningRate 0.0299   Epoch: 16   Global Step: 85150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:36:59,369-Speed 10503.23 samples/sec   Loss 3.1538   LearningRate 0.0299   Epoch: 16   Global Step: 85160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:37:07,147-Speed 10534.26 samples/sec   Loss 3.1328   LearningRate 0.0299   Epoch: 16   Global Step: 85170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:37:14,938-Speed 10516.14 samples/sec   Loss 3.1325   LearningRate 0.0298   Epoch: 16   Global Step: 85180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:37:22,713-Speed 10538.15 samples/sec   Loss 3.1604   LearningRate 0.0298   Epoch: 16   Global Step: 85190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:37:30,542-Speed 10464.00 samples/sec   Loss 3.1545   LearningRate 0.0298   Epoch: 16   Global Step: 85200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:37:38,383-Speed 10450.74 samples/sec   Loss 3.1482   LearningRate 0.0298   Epoch: 16   Global Step: 85210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:37:46,181-Speed 10506.19 samples/sec   Loss 3.1538   LearningRate 0.0297   Epoch: 16   Global Step: 85220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:37:53,994-Speed 10485.13 samples/sec   Loss 3.1378   LearningRate 0.0297   Epoch: 16   Global Step: 85230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:38:01,813-Speed 10478.59 samples/sec   Loss 3.1058   LearningRate 0.0297   Epoch: 16   Global Step: 85240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:38:09,614-Speed 10503.92 samples/sec   Loss 3.1249   LearningRate 0.0296   Epoch: 16   Global Step: 85250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:38:17,404-Speed 10516.90 samples/sec   Loss 3.1089   LearningRate 0.0296   Epoch: 16   Global Step: 85260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:38:25,195-Speed 10519.16 samples/sec   Loss 3.1226   LearningRate 0.0296   Epoch: 16   Global Step: 85270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:38:32,992-Speed 10508.19 samples/sec   Loss 3.1069   LearningRate 0.0295   Epoch: 16   Global Step: 85280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:38:40,835-Speed 10447.74 samples/sec   Loss 3.1259   LearningRate 0.0295   Epoch: 16   Global Step: 85290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:38:48,644-Speed 10491.78 samples/sec   Loss 3.1364   LearningRate 0.0295   Epoch: 16   Global Step: 85300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:38:56,454-Speed 10489.91 samples/sec   Loss 3.1472   LearningRate 0.0294   Epoch: 16   Global Step: 85310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:39:04,299-Speed 10443.07 samples/sec   Loss 3.1279   LearningRate 0.0294   Epoch: 16   Global Step: 85320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:39:12,127-Speed 10467.47 samples/sec   Loss 3.1406   LearningRate 0.0294   Epoch: 16   Global Step: 85330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:39:19,943-Speed 10486.73 samples/sec   Loss 3.1509   LearningRate 0.0293   Epoch: 16   Global Step: 85340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:39:27,810-Speed 10413.16 samples/sec   Loss 3.1415   LearningRate 0.0293   Epoch: 16   Global Step: 85350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:39:35,602-Speed 10515.58 samples/sec   Loss 3.1262   LearningRate 0.0293   Epoch: 16   Global Step: 85360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:39:43,417-Speed 10483.94 samples/sec   Loss 3.1162   LearningRate 0.0292   Epoch: 16   Global Step: 85370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:39:51,221-Speed 10498.00 samples/sec   Loss 3.1098   LearningRate 0.0292   Epoch: 16   Global Step: 85380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:39:59,034-Speed 10487.64 samples/sec   Loss 3.1097   LearningRate 0.0292   Epoch: 16   Global Step: 85390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:40:06,818-Speed 10524.61 samples/sec   Loss 3.1449   LearningRate 0.0291   Epoch: 16   Global Step: 85400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:40:14,618-Speed 10506.94 samples/sec   Loss 3.1056   LearningRate 0.0291   Epoch: 16   Global Step: 85410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:40:22,453-Speed 10458.00 samples/sec   Loss 3.0964   LearningRate 0.0291   Epoch: 16   Global Step: 85420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:40:30,267-Speed 10485.25 samples/sec   Loss 3.1075   LearningRate 0.0290   Epoch: 16   Global Step: 85430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:40:38,057-Speed 10517.92 samples/sec   Loss 3.0993   LearningRate 0.0290   Epoch: 16   Global Step: 85440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:40:45,861-Speed 10498.25 samples/sec   Loss 3.1041   LearningRate 0.0290   Epoch: 16   Global Step: 85450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:40:53,674-Speed 10486.76 samples/sec   Loss 3.1133   LearningRate 0.0290   Epoch: 16   Global Step: 85460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:41:01,481-Speed 10494.69 samples/sec   Loss 3.0962   LearningRate 0.0289   Epoch: 16   Global Step: 85470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:41:09,297-Speed 10481.83 samples/sec   Loss 3.0980   LearningRate 0.0289   Epoch: 16   Global Step: 85480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:41:17,107-Speed 10490.70 samples/sec   Loss 3.0912   LearningRate 0.0289   Epoch: 16   Global Step: 85490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:41:24,911-Speed 10499.35 samples/sec   Loss 3.1082   LearningRate 0.0288   Epoch: 16   Global Step: 85500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:41:32,704-Speed 10513.00 samples/sec   Loss 3.1243   LearningRate 0.0288   Epoch: 16   Global Step: 85510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:41:40,491-Speed 10521.11 samples/sec   Loss 3.1023   LearningRate 0.0288   Epoch: 16   Global Step: 85520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:41:48,293-Speed 10500.82 samples/sec   Loss 3.1271   LearningRate 0.0287   Epoch: 16   Global Step: 85530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:41:56,105-Speed 10488.83 samples/sec   Loss 3.0951   LearningRate 0.0287   Epoch: 16   Global Step: 85540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:42:03,891-Speed 10522.87 samples/sec   Loss 3.1131   LearningRate 0.0287   Epoch: 16   Global Step: 85550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:42:11,705-Speed 10484.33 samples/sec   Loss 3.0978   LearningRate 0.0286   Epoch: 16   Global Step: 85560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:42:19,473-Speed 10547.82 samples/sec   Loss 3.0651   LearningRate 0.0286   Epoch: 16   Global Step: 85570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:42:27,285-Speed 10487.54 samples/sec   Loss 3.0937   LearningRate 0.0286   Epoch: 16   Global Step: 85580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:42:35,079-Speed 10512.33 samples/sec   Loss 3.0824   LearningRate 0.0285   Epoch: 16   Global Step: 85590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:42:42,870-Speed 10515.52 samples/sec   Loss 3.0751   LearningRate 0.0285   Epoch: 16   Global Step: 85600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:42:50,660-Speed 10517.64 samples/sec   Loss 3.0830   LearningRate 0.0285   Epoch: 16   Global Step: 85610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:42:58,445-Speed 10523.92 samples/sec   Loss 3.0727   LearningRate 0.0284   Epoch: 16   Global Step: 85620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:43:06,235-Speed 10518.29 samples/sec   Loss 3.0624   LearningRate 0.0284   Epoch: 16   Global Step: 85630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:43:14,023-Speed 10520.66 samples/sec   Loss 3.0688   LearningRate 0.0284   Epoch: 16   Global Step: 85640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:43:21,803-Speed 10529.93 samples/sec   Loss 3.0671   LearningRate 0.0284   Epoch: 16   Global Step: 85650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:43:29,616-Speed 10487.39 samples/sec   Loss 3.0942   LearningRate 0.0283   Epoch: 16   Global Step: 85660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:43:37,392-Speed 10536.03 samples/sec   Loss 3.0844   LearningRate 0.0283   Epoch: 16   Global Step: 85670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:43:45,183-Speed 10515.69 samples/sec   Loss 3.0434   LearningRate 0.0283   Epoch: 16   Global Step: 85680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:43:52,985-Speed 10501.47 samples/sec   Loss 3.0887   LearningRate 0.0282   Epoch: 16   Global Step: 85690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:44:00,770-Speed 10524.78 samples/sec   Loss 3.0690   LearningRate 0.0282   Epoch: 16   Global Step: 85700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:44:08,588-Speed 10480.15 samples/sec   Loss 3.0870   LearningRate 0.0282   Epoch: 16   Global Step: 85710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:44:16,380-Speed 10514.49 samples/sec   Loss 3.0801   LearningRate 0.0281   Epoch: 16   Global Step: 85720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:44:24,161-Speed 10529.93 samples/sec   Loss 3.0731   LearningRate 0.0281   Epoch: 16   Global Step: 85730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:44:31,938-Speed 10535.14 samples/sec   Loss 3.0978   LearningRate 0.0281   Epoch: 16   Global Step: 85740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:44:39,732-Speed 10512.05 samples/sec   Loss 3.0545   LearningRate 0.0280   Epoch: 16   Global Step: 85750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:44:47,513-Speed 10529.54 samples/sec   Loss 3.0458   LearningRate 0.0280   Epoch: 16   Global Step: 85760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:44:55,306-Speed 10512.02 samples/sec   Loss 3.1056   LearningRate 0.0280   Epoch: 16   Global Step: 85770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:45:03,093-Speed 10522.42 samples/sec   Loss 3.0977   LearningRate 0.0279   Epoch: 16   Global Step: 85780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:45:10,876-Speed 10525.97 samples/sec   Loss 3.0620   LearningRate 0.0279   Epoch: 16   Global Step: 85790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:45:18,681-Speed 10497.08 samples/sec   Loss 3.0835   LearningRate 0.0279   Epoch: 16   Global Step: 85800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:45:26,484-Speed 10500.84 samples/sec   Loss 3.0712   LearningRate 0.0279   Epoch: 16   Global Step: 85810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:45:34,260-Speed 10541.84 samples/sec   Loss 3.0554   LearningRate 0.0278   Epoch: 16   Global Step: 85820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:45:42,043-Speed 10532.07 samples/sec   Loss 3.0469   LearningRate 0.0278   Epoch: 16   Global Step: 85830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:45:49,824-Speed 10529.53 samples/sec   Loss 3.0332   LearningRate 0.0278   Epoch: 16   Global Step: 85840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:45:57,642-Speed 10480.06 samples/sec   Loss 3.0488   LearningRate 0.0277   Epoch: 16   Global Step: 85850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:46:05,433-Speed 10515.80 samples/sec   Loss 3.0597   LearningRate 0.0277   Epoch: 16   Global Step: 85860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:46:13,226-Speed 10514.22 samples/sec   Loss 3.0605   LearningRate 0.0277   Epoch: 16   Global Step: 85870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:46:21,009-Speed 10526.61 samples/sec   Loss 3.0626   LearningRate 0.0276   Epoch: 16   Global Step: 85880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:46:28,800-Speed 10515.21 samples/sec   Loss 3.0667   LearningRate 0.0276   Epoch: 16   Global Step: 85890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:46:36,581-Speed 10530.17 samples/sec   Loss 3.0426   LearningRate 0.0276   Epoch: 16   Global Step: 85900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:46:44,374-Speed 10513.01 samples/sec   Loss 3.0514   LearningRate 0.0275   Epoch: 16   Global Step: 85910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:46:52,152-Speed 10534.01 samples/sec   Loss 3.0509   LearningRate 0.0275   Epoch: 16   Global Step: 85920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:46:59,935-Speed 10526.61 samples/sec   Loss 3.0789   LearningRate 0.0275   Epoch: 16   Global Step: 85930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:47:07,719-Speed 10525.46 samples/sec   Loss 3.0649   LearningRate 0.0274   Epoch: 16   Global Step: 85940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:47:15,532-Speed 10485.82 samples/sec   Loss 3.0613   LearningRate 0.0274   Epoch: 16   Global Step: 85950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:47:23,340-Speed 10493.85 samples/sec   Loss 3.0211   LearningRate 0.0274   Epoch: 16   Global Step: 85960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:47:31,147-Speed 10493.92 samples/sec   Loss 3.0247   LearningRate 0.0274   Epoch: 16   Global Step: 85970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:47:38,941-Speed 10512.06 samples/sec   Loss 3.0454   LearningRate 0.0273   Epoch: 16   Global Step: 85980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:47:46,747-Speed 10496.43 samples/sec   Loss 3.0226   LearningRate 0.0273   Epoch: 16   Global Step: 85990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:47:54,536-Speed 10518.87 samples/sec   Loss 3.0236   LearningRate 0.0273   Epoch: 16   Global Step: 86000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:48:02,353-Speed 10481.94 samples/sec   Loss 3.0392   LearningRate 0.0272   Epoch: 16   Global Step: 86010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:48:10,168-Speed 10484.05 samples/sec   Loss 3.0079   LearningRate 0.0272   Epoch: 16   Global Step: 86020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:48:17,963-Speed 10510.63 samples/sec   Loss 3.0577   LearningRate 0.0272   Epoch: 16   Global Step: 86030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:48:25,744-Speed 10529.77 samples/sec   Loss 3.0192   LearningRate 0.0271   Epoch: 16   Global Step: 86040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:48:33,566-Speed 10473.53 samples/sec   Loss 3.0256   LearningRate 0.0271   Epoch: 16   Global Step: 86050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:48:41,387-Speed 10477.51 samples/sec   Loss 3.0345   LearningRate 0.0271   Epoch: 16   Global Step: 86060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:48:49,221-Speed 10458.23 samples/sec   Loss 3.0358   LearningRate 0.0270   Epoch: 16   Global Step: 86070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:48:57,047-Speed 10469.32 samples/sec   Loss 3.0340   LearningRate 0.0270   Epoch: 16   Global Step: 86080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:49:04,868-Speed 10475.20 samples/sec   Loss 3.0208   LearningRate 0.0270   Epoch: 16   Global Step: 86090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:49:12,638-Speed 10543.40 samples/sec   Loss 2.9830   LearningRate 0.0270   Epoch: 16   Global Step: 86100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:49:20,421-Speed 10528.27 samples/sec   Loss 3.0356   LearningRate 0.0269   Epoch: 16   Global Step: 86110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:49:28,208-Speed 10521.08 samples/sec   Loss 3.0143   LearningRate 0.0269   Epoch: 16   Global Step: 86120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:49:36,004-Speed 10510.24 samples/sec   Loss 3.0030   LearningRate 0.0269   Epoch: 16   Global Step: 86130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:49:43,763-Speed 10559.14 samples/sec   Loss 2.9737   LearningRate 0.0268   Epoch: 16   Global Step: 86140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:49:51,544-Speed 10531.50 samples/sec   Loss 3.0049   LearningRate 0.0268   Epoch: 16   Global Step: 86150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:49:59,328-Speed 10524.81 samples/sec   Loss 2.9847   LearningRate 0.0268   Epoch: 16   Global Step: 86160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:50:07,117-Speed 10519.45 samples/sec   Loss 3.0256   LearningRate 0.0267   Epoch: 16   Global Step: 86170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:50:14,912-Speed 10510.55 samples/sec   Loss 2.9954   LearningRate 0.0267   Epoch: 16   Global Step: 86180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:50:22,696-Speed 10524.48 samples/sec   Loss 3.0110   LearningRate 0.0267   Epoch: 16   Global Step: 86190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:50:30,477-Speed 10530.14 samples/sec   Loss 2.9891   LearningRate 0.0266   Epoch: 16   Global Step: 86200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:50:38,305-Speed 10467.11 samples/sec   Loss 3.0046   LearningRate 0.0266   Epoch: 16   Global Step: 86210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:50:46,109-Speed 10498.05 samples/sec   Loss 3.0305   LearningRate 0.0266   Epoch: 16   Global Step: 86220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:50:53,913-Speed 10498.99 samples/sec   Loss 3.0017   LearningRate 0.0266   Epoch: 16   Global Step: 86230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:51:01,707-Speed 10511.53 samples/sec   Loss 2.9915   LearningRate 0.0265   Epoch: 16   Global Step: 86240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:51:09,475-Speed 10547.69 samples/sec   Loss 2.9802   LearningRate 0.0265   Epoch: 16   Global Step: 86250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:51:17,260-Speed 10523.74 samples/sec   Loss 2.9774   LearningRate 0.0265   Epoch: 16   Global Step: 86260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:51:25,034-Speed 10537.96 samples/sec   Loss 3.0012   LearningRate 0.0264   Epoch: 16   Global Step: 86270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:51:32,827-Speed 10513.72 samples/sec   Loss 2.9864   LearningRate 0.0264   Epoch: 16   Global Step: 86280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:51:40,630-Speed 10505.28 samples/sec   Loss 3.0044   LearningRate 0.0264   Epoch: 16   Global Step: 86290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:51:48,428-Speed 10507.89 samples/sec   Loss 2.9720   LearningRate 0.0263   Epoch: 16   Global Step: 86300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:51:56,239-Speed 10487.83 samples/sec   Loss 2.9978   LearningRate 0.0263   Epoch: 16   Global Step: 86310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:52:04,084-Speed 10443.81 samples/sec   Loss 2.9621   LearningRate 0.0263   Epoch: 16   Global Step: 86320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:52:11,895-Speed 10490.60 samples/sec   Loss 2.9658   LearningRate 0.0263   Epoch: 16   Global Step: 86330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:52:19,738-Speed 10446.19 samples/sec   Loss 2.9875   LearningRate 0.0262   Epoch: 16   Global Step: 86340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:52:27,543-Speed 10496.17 samples/sec   Loss 2.9860   LearningRate 0.0262   Epoch: 16   Global Step: 86350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:52:35,330-Speed 10521.46 samples/sec   Loss 2.9840   LearningRate 0.0262   Epoch: 16   Global Step: 86360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:52:43,099-Speed 10546.85 samples/sec   Loss 2.9982   LearningRate 0.0261   Epoch: 16   Global Step: 86370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:52:50,889-Speed 10517.50 samples/sec   Loss 2.9831   LearningRate 0.0261   Epoch: 16   Global Step: 86380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:52:58,691-Speed 10501.17 samples/sec   Loss 3.0005   LearningRate 0.0261   Epoch: 16   Global Step: 86390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:53:06,472-Speed 10532.49 samples/sec   Loss 2.9625   LearningRate 0.0260   Epoch: 16   Global Step: 86400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:53:14,255-Speed 10527.91 samples/sec   Loss 3.0002   LearningRate 0.0260   Epoch: 16   Global Step: 86410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:53:22,040-Speed 10523.60 samples/sec   Loss 3.0037   LearningRate 0.0260   Epoch: 16   Global Step: 86420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:53:29,889-Speed 10437.61 samples/sec   Loss 2.9454   LearningRate 0.0260   Epoch: 16   Global Step: 86430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:53:37,695-Speed 10499.17 samples/sec   Loss 2.9668   LearningRate 0.0259   Epoch: 16   Global Step: 86440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:53:45,510-Speed 10483.42 samples/sec   Loss 2.9694   LearningRate 0.0259   Epoch: 16   Global Step: 86450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:53:53,354-Speed 10445.44 samples/sec   Loss 2.9675   LearningRate 0.0259   Epoch: 16   Global Step: 86460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:54:01,179-Speed 10471.10 samples/sec   Loss 2.9654   LearningRate 0.0258   Epoch: 16   Global Step: 86470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:54:08,983-Speed 10498.23 samples/sec   Loss 2.9505   LearningRate 0.0258   Epoch: 16   Global Step: 86480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:54:16,789-Speed 10495.92 samples/sec   Loss 2.9702   LearningRate 0.0258   Epoch: 16   Global Step: 86490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:54:24,619-Speed 10463.36 samples/sec   Loss 2.9564   LearningRate 0.0257   Epoch: 16   Global Step: 86500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:54:32,408-Speed 10519.32 samples/sec   Loss 2.9561   LearningRate 0.0257   Epoch: 16   Global Step: 86510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:54:40,215-Speed 10495.04 samples/sec   Loss 2.9509   LearningRate 0.0257   Epoch: 16   Global Step: 86520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 09:54:48,007-Speed 10514.12 samples/sec   Loss 3.0091   LearningRate 0.0257   Epoch: 16   Global Step: 86530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:54:55,853-Speed 10443.18 samples/sec   Loss 2.9661   LearningRate 0.0256   Epoch: 16   Global Step: 86540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:55:03,661-Speed 10492.85 samples/sec   Loss 2.9621   LearningRate 0.0256   Epoch: 16   Global Step: 86550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:55:11,469-Speed 10493.97 samples/sec   Loss 2.9604   LearningRate 0.0256   Epoch: 16   Global Step: 86560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:55:19,309-Speed 10449.73 samples/sec   Loss 2.9802   LearningRate 0.0255   Epoch: 16   Global Step: 86570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:55:27,103-Speed 10512.42 samples/sec   Loss 2.9497   LearningRate 0.0255   Epoch: 16   Global Step: 86580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:55:34,889-Speed 10523.38 samples/sec   Loss 2.9264   LearningRate 0.0255   Epoch: 16   Global Step: 86590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:55:42,669-Speed 10531.36 samples/sec   Loss 2.9226   LearningRate 0.0254   Epoch: 16   Global Step: 86600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:55:50,445-Speed 10536.05 samples/sec   Loss 2.9464   LearningRate 0.0254   Epoch: 16   Global Step: 86610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:55:58,220-Speed 10537.75 samples/sec   Loss 2.9431   LearningRate 0.0254   Epoch: 16   Global Step: 86620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:56:06,006-Speed 10522.49 samples/sec   Loss 2.9430   LearningRate 0.0254   Epoch: 16   Global Step: 86630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:56:13,818-Speed 10488.34 samples/sec   Loss 2.9013   LearningRate 0.0253   Epoch: 16   Global Step: 86640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:56:21,604-Speed 10523.20 samples/sec   Loss 2.9530   LearningRate 0.0253   Epoch: 16   Global Step: 86650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:56:29,408-Speed 10498.59 samples/sec   Loss 2.9189   LearningRate 0.0253   Epoch: 16   Global Step: 86660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:56:37,212-Speed 10498.77 samples/sec   Loss 2.9857   LearningRate 0.0252   Epoch: 16   Global Step: 86670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:56:45,005-Speed 10513.14 samples/sec   Loss 2.9511   LearningRate 0.0252   Epoch: 16   Global Step: 86680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:56:52,789-Speed 10525.33 samples/sec   Loss 2.9320   LearningRate 0.0252   Epoch: 16   Global Step: 86690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:57:00,610-Speed 10475.78 samples/sec   Loss 2.9258   LearningRate 0.0251   Epoch: 16   Global Step: 86700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:57:08,424-Speed 10485.68 samples/sec   Loss 2.9511   LearningRate 0.0251   Epoch: 16   Global Step: 86710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:57:16,230-Speed 10495.63 samples/sec   Loss 2.9312   LearningRate 0.0251   Epoch: 16   Global Step: 86720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:57:24,020-Speed 10516.91 samples/sec   Loss 2.9158   LearningRate 0.0251   Epoch: 16   Global Step: 86730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:57:31,846-Speed 10469.62 samples/sec   Loss 2.9253   LearningRate 0.0250   Epoch: 16   Global Step: 86740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:57:39,661-Speed 10483.21 samples/sec   Loss 2.8981   LearningRate 0.0250   Epoch: 16   Global Step: 86750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:57:47,459-Speed 10506.61 samples/sec   Loss 2.9211   LearningRate 0.0250   Epoch: 16   Global Step: 86760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:57:55,265-Speed 10496.23 samples/sec   Loss 2.9290   LearningRate 0.0249   Epoch: 16   Global Step: 86770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:58:03,071-Speed 10495.94 samples/sec   Loss 2.9185   LearningRate 0.0249   Epoch: 16   Global Step: 86780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:58:10,864-Speed 10513.97 samples/sec   Loss 2.9131   LearningRate 0.0249   Epoch: 16   Global Step: 86790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:58:18,682-Speed 10479.36 samples/sec   Loss 2.9194   LearningRate 0.0248   Epoch: 16   Global Step: 86800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:58:26,484-Speed 10502.47 samples/sec   Loss 2.9161   LearningRate 0.0248   Epoch: 16   Global Step: 86810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:58:34,304-Speed 10477.23 samples/sec   Loss 2.9265   LearningRate 0.0248   Epoch: 16   Global Step: 86820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 09:58:42,145-Speed 10447.98 samples/sec   Loss 2.8967   LearningRate 0.0248   Epoch: 16   Global Step: 86830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:58:49,962-Speed 10481.61 samples/sec   Loss 2.9234   LearningRate 0.0247   Epoch: 16   Global Step: 86840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:58:57,767-Speed 10497.20 samples/sec   Loss 2.8972   LearningRate 0.0247   Epoch: 16   Global Step: 86850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:59:05,570-Speed 10500.75 samples/sec   Loss 2.8989   LearningRate 0.0247   Epoch: 16   Global Step: 86860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:59:13,366-Speed 10508.27 samples/sec   Loss 2.9003   LearningRate 0.0246   Epoch: 16   Global Step: 86870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:59:21,174-Speed 10494.45 samples/sec   Loss 2.9178   LearningRate 0.0246   Epoch: 16   Global Step: 86880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:59:28,989-Speed 10483.24 samples/sec   Loss 2.9234   LearningRate 0.0246   Epoch: 16   Global Step: 86890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:59:36,763-Speed 10539.68 samples/sec   Loss 2.9098   LearningRate 0.0246   Epoch: 16   Global Step: 86900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:59:44,582-Speed 10479.06 samples/sec   Loss 2.9010   LearningRate 0.0245   Epoch: 16   Global Step: 86910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 09:59:52,376-Speed 10510.76 samples/sec   Loss 2.9159   LearningRate 0.0245   Epoch: 16   Global Step: 86920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:00:00,163-Speed 10521.76 samples/sec   Loss 2.8777   LearningRate 0.0245   Epoch: 16   Global Step: 86930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 10:00:07,934-Speed 10543.71 samples/sec   Loss 2.8936   LearningRate 0.0244   Epoch: 16   Global Step: 86940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:00:15,722-Speed 10520.10 samples/sec   Loss 2.9160   LearningRate 0.0244   Epoch: 16   Global Step: 86950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:00:23,511-Speed 10518.16 samples/sec   Loss 2.9066   LearningRate 0.0244   Epoch: 16   Global Step: 86960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:00:31,304-Speed 10513.55 samples/sec   Loss 2.8962   LearningRate 0.0244   Epoch: 16   Global Step: 86970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:00:39,120-Speed 10483.62 samples/sec   Loss 2.8987   LearningRate 0.0243   Epoch: 16   Global Step: 86980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:00:46,925-Speed 10496.93 samples/sec   Loss 2.8961   LearningRate 0.0243   Epoch: 16   Global Step: 86990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:00:54,755-Speed 10464.25 samples/sec   Loss 2.9015   LearningRate 0.0243   Epoch: 16   Global Step: 87000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:01:02,552-Speed 10508.55 samples/sec   Loss 2.9002   LearningRate 0.0242   Epoch: 16   Global Step: 87010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:01:10,352-Speed 10503.85 samples/sec   Loss 2.9128   LearningRate 0.0242   Epoch: 16   Global Step: 87020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:01:18,167-Speed 10484.49 samples/sec   Loss 2.8904   LearningRate 0.0242   Epoch: 16   Global Step: 87030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:01:25,981-Speed 10485.17 samples/sec   Loss 2.9218   LearningRate 0.0241   Epoch: 16   Global Step: 87040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 10:01:33,765-Speed 10526.82 samples/sec   Loss 2.8869   LearningRate 0.0241   Epoch: 16   Global Step: 87050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 10:01:41,573-Speed 10492.40 samples/sec   Loss 2.8606   LearningRate 0.0241   Epoch: 16   Global Step: 87060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:01:49,422-Speed 10440.07 samples/sec   Loss 2.9029   LearningRate 0.0241   Epoch: 16   Global Step: 87070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:01:57,247-Speed 10469.35 samples/sec   Loss 2.8804   LearningRate 0.0240   Epoch: 16   Global Step: 87080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:02:05,055-Speed 10494.11 samples/sec   Loss 2.8868   LearningRate 0.0240   Epoch: 16   Global Step: 87090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:02:12,845-Speed 10519.77 samples/sec   Loss 2.8905   LearningRate 0.0240   Epoch: 16   Global Step: 87100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:02:20,632-Speed 10521.34 samples/sec   Loss 2.8783   LearningRate 0.0239   Epoch: 16   Global Step: 87110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:02:28,417-Speed 10524.66 samples/sec   Loss 2.8980   LearningRate 0.0239   Epoch: 16   Global Step: 87120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:02:36,244-Speed 10468.14 samples/sec   Loss 2.8862   LearningRate 0.0239   Epoch: 16   Global Step: 87130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:02:44,040-Speed 10508.87 samples/sec   Loss 2.8554   LearningRate 0.0239   Epoch: 16   Global Step: 87140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:02:51,882-Speed 10446.74 samples/sec   Loss 2.8552   LearningRate 0.0238   Epoch: 16   Global Step: 87150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:02:59,709-Speed 10468.38 samples/sec   Loss 2.8735   LearningRate 0.0238   Epoch: 16   Global Step: 87160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 10:03:07,529-Speed 10477.46 samples/sec   Loss 2.8673   LearningRate 0.0238   Epoch: 16   Global Step: 87170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:03:15,361-Speed 10460.11 samples/sec   Loss 2.8933   LearningRate 0.0237   Epoch: 16   Global Step: 87180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:03:23,185-Speed 10472.69 samples/sec   Loss 2.8559   LearningRate 0.0237   Epoch: 16   Global Step: 87190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:03:30,990-Speed 10498.18 samples/sec   Loss 2.8761   LearningRate 0.0237   Epoch: 16   Global Step: 87200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:03:38,787-Speed 10507.58 samples/sec   Loss 2.8718   LearningRate 0.0237   Epoch: 16   Global Step: 87210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:03:46,597-Speed 10489.67 samples/sec   Loss 2.8664   LearningRate 0.0236   Epoch: 16   Global Step: 87220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:03:54,422-Speed 10474.61 samples/sec   Loss 2.8477   LearningRate 0.0236   Epoch: 16   Global Step: 87230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:04:02,235-Speed 10488.27 samples/sec   Loss 2.8699   LearningRate 0.0236   Epoch: 16   Global Step: 87240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 10:04:10,038-Speed 10499.05 samples/sec   Loss 2.8400   LearningRate 0.0235   Epoch: 16   Global Step: 87250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 10:04:17,884-Speed 10442.10 samples/sec   Loss 2.8915   LearningRate 0.0235   Epoch: 16   Global Step: 87260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 10:04:25,720-Speed 10455.77 samples/sec   Loss 2.8701   LearningRate 0.0235   Epoch: 16   Global Step: 87270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 10:04:33,502-Speed 10530.11 samples/sec   Loss 2.8728   LearningRate 0.0235   Epoch: 16   Global Step: 87280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 10:04:41,296-Speed 10511.94 samples/sec   Loss 2.8755   LearningRate 0.0234   Epoch: 16   Global Step: 87290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 10:04:49,095-Speed 10504.85 samples/sec   Loss 2.8720   LearningRate 0.0234   Epoch: 16   Global Step: 87300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 10:04:56,897-Speed 10502.26 samples/sec   Loss 2.8383   LearningRate 0.0234   Epoch: 16   Global Step: 87310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 10:05:04,684-Speed 10521.31 samples/sec   Loss 2.8133   LearningRate 0.0233   Epoch: 16   Global Step: 87320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 10:05:12,501-Speed 10481.56 samples/sec   Loss 2.8652   LearningRate 0.0233   Epoch: 16   Global Step: 87330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-16 10:05:20,309-Speed 10492.68 samples/sec   Loss 2.8215   LearningRate 0.0233   Epoch: 16   Global Step: 87340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:05:28,210-Speed 10369.96 samples/sec   Loss 2.8465   LearningRate 0.0233   Epoch: 16   Global Step: 87350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:05:36,007-Speed 10508.43 samples/sec   Loss 2.8578   LearningRate 0.0232   Epoch: 16   Global Step: 87360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:05:43,818-Speed 10491.18 samples/sec   Loss 2.8438   LearningRate 0.0232   Epoch: 16   Global Step: 87370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:05:51,624-Speed 10494.99 samples/sec   Loss 2.8375   LearningRate 0.0232   Epoch: 16   Global Step: 87380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:05:59,418-Speed 10512.29 samples/sec   Loss 2.8453   LearningRate 0.0231   Epoch: 16   Global Step: 87390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:06:07,227-Speed 10492.58 samples/sec   Loss 2.8410   LearningRate 0.0231   Epoch: 16   Global Step: 87400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:06:15,045-Speed 10478.42 samples/sec   Loss 2.8401   LearningRate 0.0231   Epoch: 16   Global Step: 87410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:06:22,894-Speed 10439.78 samples/sec   Loss 2.8240   LearningRate 0.0231   Epoch: 16   Global Step: 87420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:06:30,728-Speed 10457.17 samples/sec   Loss 2.8341   LearningRate 0.0230   Epoch: 16   Global Step: 87430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:06:38,530-Speed 10501.61 samples/sec   Loss 2.8560   LearningRate 0.0230   Epoch: 16   Global Step: 87440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 10:06:46,336-Speed 10496.57 samples/sec   Loss 2.8359   LearningRate 0.0230   Epoch: 16   Global Step: 87450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:06:54,131-Speed 10511.80 samples/sec   Loss 2.8578   LearningRate 0.0229   Epoch: 16   Global Step: 87460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:07:01,940-Speed 10492.11 samples/sec   Loss 2.8581   LearningRate 0.0229   Epoch: 16   Global Step: 87470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:07:09,734-Speed 10510.85 samples/sec   Loss 2.8207   LearningRate 0.0229   Epoch: 16   Global Step: 87480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:07:17,533-Speed 10504.90 samples/sec   Loss 2.8292   LearningRate 0.0229   Epoch: 16   Global Step: 87490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:07:25,350-Speed 10481.59 samples/sec   Loss 2.8379   LearningRate 0.0228   Epoch: 16   Global Step: 87500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:07:33,134-Speed 10526.42 samples/sec   Loss 2.8009   LearningRate 0.0228   Epoch: 16   Global Step: 87510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:07:40,964-Speed 10463.57 samples/sec   Loss 2.8227   LearningRate 0.0228   Epoch: 16   Global Step: 87520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:07:48,803-Speed 10452.13 samples/sec   Loss 2.8192   LearningRate 0.0227   Epoch: 16   Global Step: 87530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:07:56,613-Speed 10490.54 samples/sec   Loss 2.8225   LearningRate 0.0227   Epoch: 16   Global Step: 87540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:08:04,458-Speed 10443.83 samples/sec   Loss 2.8258   LearningRate 0.0227   Epoch: 16   Global Step: 87550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 10:08:12,283-Speed 10470.70 samples/sec   Loss 2.8135   LearningRate 0.0227   Epoch: 16   Global Step: 87560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 10:08:20,084-Speed 10502.57 samples/sec   Loss 2.7968   LearningRate 0.0226   Epoch: 16   Global Step: 87570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 10:08:27,903-Speed 10479.27 samples/sec   Loss 2.8390   LearningRate 0.0226   Epoch: 16   Global Step: 87580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:08:35,709-Speed 10495.70 samples/sec   Loss 2.7973   LearningRate 0.0226   Epoch: 16   Global Step: 87590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:08:43,519-Speed 10491.10 samples/sec   Loss 2.7957   LearningRate 0.0226   Epoch: 16   Global Step: 87600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:08:51,336-Speed 10480.56 samples/sec   Loss 2.8151   LearningRate 0.0225   Epoch: 16   Global Step: 87610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:08:59,153-Speed 10481.34 samples/sec   Loss 2.8152   LearningRate 0.0225   Epoch: 16   Global Step: 87620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:09:06,957-Speed 10498.66 samples/sec   Loss 2.8006   LearningRate 0.0225   Epoch: 16   Global Step: 87630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:09:14,748-Speed 10515.31 samples/sec   Loss 2.8180   LearningRate 0.0224   Epoch: 16   Global Step: 87640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:09:22,546-Speed 10506.91 samples/sec   Loss 2.8054   LearningRate 0.0224   Epoch: 16   Global Step: 87650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:09:30,334-Speed 10521.19 samples/sec   Loss 2.8006   LearningRate 0.0224   Epoch: 16   Global Step: 87660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:09:38,133-Speed 10506.09 samples/sec   Loss 2.8132   LearningRate 0.0224   Epoch: 16   Global Step: 87670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:09:45,942-Speed 10491.83 samples/sec   Loss 2.8074   LearningRate 0.0223   Epoch: 16   Global Step: 87680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-16 10:09:53,749-Speed 10494.08 samples/sec   Loss 2.7963   LearningRate 0.0223   Epoch: 16   Global Step: 87690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:10:01,564-Speed 10484.97 samples/sec   Loss 2.7932   LearningRate 0.0223   Epoch: 16   Global Step: 87700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:10:09,369-Speed 10497.14 samples/sec   Loss 2.8178   LearningRate 0.0222   Epoch: 16   Global Step: 87710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:10:17,153-Speed 10524.91 samples/sec   Loss 2.7855   LearningRate 0.0222   Epoch: 16   Global Step: 87720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:10:24,962-Speed 10492.43 samples/sec   Loss 2.7935   LearningRate 0.0222   Epoch: 16   Global Step: 87730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:10:32,752-Speed 10517.28 samples/sec   Loss 2.7695   LearningRate 0.0222   Epoch: 16   Global Step: 87740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:10:40,561-Speed 10492.66 samples/sec   Loss 2.7894   LearningRate 0.0221   Epoch: 16   Global Step: 87750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:10:48,387-Speed 10468.40 samples/sec   Loss 2.8214   LearningRate 0.0221   Epoch: 16   Global Step: 87760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-16 10:10:56,182-Speed 10516.09 samples/sec   Loss 2.8041   LearningRate 0.0221   Epoch: 16   Global Step: 87770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:11:03,987-Speed 10498.18 samples/sec   Loss 2.7909   LearningRate 0.0220   Epoch: 16   Global Step: 87780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:11:11,767-Speed 10531.44 samples/sec   Loss 2.7754   LearningRate 0.0220   Epoch: 16   Global Step: 87790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:11:19,566-Speed 10503.93 samples/sec   Loss 2.7827   LearningRate 0.0220   Epoch: 16   Global Step: 87800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:11:27,369-Speed 10499.89 samples/sec   Loss 2.7944   LearningRate 0.0220   Epoch: 16   Global Step: 87810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:11:35,199-Speed 10464.77 samples/sec   Loss 2.7906   LearningRate 0.0219   Epoch: 16   Global Step: 87820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:11:42,992-Speed 10513.93 samples/sec   Loss 2.7931   LearningRate 0.0219   Epoch: 16   Global Step: 87830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:11:50,796-Speed 10498.25 samples/sec   Loss 2.7938   LearningRate 0.0219   Epoch: 16   Global Step: 87840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:11:58,572-Speed 10536.55 samples/sec   Loss 2.8035   LearningRate 0.0219   Epoch: 16   Global Step: 87850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:12:06,377-Speed 10501.35 samples/sec   Loss 2.7905   LearningRate 0.0218   Epoch: 16   Global Step: 87860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:12:14,178-Speed 10502.46 samples/sec   Loss 2.7837   LearningRate 0.0218   Epoch: 16   Global Step: 87870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:12:21,998-Speed 10477.50 samples/sec   Loss 2.7903   LearningRate 0.0218   Epoch: 16   Global Step: 87880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:12:29,793-Speed 10510.78 samples/sec   Loss 2.7663   LearningRate 0.0217   Epoch: 16   Global Step: 87890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:12:37,590-Speed 10507.40 samples/sec   Loss 2.7909   LearningRate 0.0217   Epoch: 16   Global Step: 87900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:12:45,389-Speed 10506.02 samples/sec   Loss 2.8026   LearningRate 0.0217   Epoch: 16   Global Step: 87910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:12:53,206-Speed 10486.19 samples/sec   Loss 2.7649   LearningRate 0.0217   Epoch: 16   Global Step: 87920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:13:00,988-Speed 10527.33 samples/sec   Loss 2.7726   LearningRate 0.0216   Epoch: 16   Global Step: 87930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:13:08,809-Speed 10477.06 samples/sec   Loss 2.7778   LearningRate 0.0216   Epoch: 16   Global Step: 87940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:13:16,639-Speed 10464.09 samples/sec   Loss 2.7658   LearningRate 0.0216   Epoch: 16   Global Step: 87950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:13:24,432-Speed 10513.15 samples/sec   Loss 2.7621   LearningRate 0.0216   Epoch: 16   Global Step: 87960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:13:32,214-Speed 10527.57 samples/sec   Loss 2.7608   LearningRate 0.0215   Epoch: 16   Global Step: 87970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:13:39,990-Speed 10537.29 samples/sec   Loss 2.7835   LearningRate 0.0215   Epoch: 16   Global Step: 87980   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:13:47,779-Speed 10519.17 samples/sec   Loss 2.7762   LearningRate 0.0215   Epoch: 16   Global Step: 87990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:13:55,592-Speed 10486.70 samples/sec   Loss 2.7422   LearningRate 0.0214   Epoch: 16   Global Step: 88000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:14:03,395-Speed 10499.28 samples/sec   Loss 2.7756   LearningRate 0.0214   Epoch: 16   Global Step: 88010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:14:11,170-Speed 10538.17 samples/sec   Loss 2.7700   LearningRate 0.0214   Epoch: 16   Global Step: 88020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:14:18,971-Speed 10502.68 samples/sec   Loss 2.7593   LearningRate 0.0214   Epoch: 16   Global Step: 88030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:14:26,787-Speed 10482.51 samples/sec   Loss 2.7541   LearningRate 0.0213   Epoch: 16   Global Step: 88040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:14:34,589-Speed 10501.49 samples/sec   Loss 2.7587   LearningRate 0.0213   Epoch: 16   Global Step: 88050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:14:42,422-Speed 10459.47 samples/sec   Loss 2.7839   LearningRate 0.0213   Epoch: 16   Global Step: 88060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:14:50,226-Speed 10498.36 samples/sec   Loss 2.7471   LearningRate 0.0213   Epoch: 16   Global Step: 88070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:14:58,020-Speed 10512.11 samples/sec   Loss 2.7617   LearningRate 0.0212   Epoch: 16   Global Step: 88080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:15:05,805-Speed 10524.40 samples/sec   Loss 2.7181   LearningRate 0.0212   Epoch: 16   Global Step: 88090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:15:13,606-Speed 10501.36 samples/sec   Loss 2.7524   LearningRate 0.0212   Epoch: 16   Global Step: 88100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:15:21,413-Speed 10494.91 samples/sec   Loss 2.7562   LearningRate 0.0211   Epoch: 16   Global Step: 88110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:15:29,270-Speed 10428.55 samples/sec   Loss 2.7553   LearningRate 0.0211   Epoch: 16   Global Step: 88120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:15:37,102-Speed 10461.06 samples/sec   Loss 2.7560   LearningRate 0.0211   Epoch: 16   Global Step: 88130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:15:44,887-Speed 10523.67 samples/sec   Loss 2.7777   LearningRate 0.0211   Epoch: 16   Global Step: 88140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:16:07,405-Speed 3638.07 samples/sec   Loss 2.8004   LearningRate 0.0210   Epoch: 17   Global Step: 88150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:16:15,171-Speed 10550.69 samples/sec   Loss 2.7609   LearningRate 0.0210   Epoch: 17   Global Step: 88160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:16:22,944-Speed 10540.73 samples/sec   Loss 2.7338   LearningRate 0.0210   Epoch: 17   Global Step: 88170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:16:30,736-Speed 10515.97 samples/sec   Loss 2.7665   LearningRate 0.0210   Epoch: 17   Global Step: 88180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:16:38,521-Speed 10522.95 samples/sec   Loss 2.7015   LearningRate 0.0209   Epoch: 17   Global Step: 88190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:16:46,331-Speed 10491.80 samples/sec   Loss 2.7395   LearningRate 0.0209   Epoch: 17   Global Step: 88200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:16:54,115-Speed 10525.17 samples/sec   Loss 2.7075   LearningRate 0.0209   Epoch: 17   Global Step: 88210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:17:01,946-Speed 10462.19 samples/sec   Loss 2.7362   LearningRate 0.0208   Epoch: 17   Global Step: 88220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:17:09,730-Speed 10526.30 samples/sec   Loss 2.7108   LearningRate 0.0208   Epoch: 17   Global Step: 88230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:17:17,561-Speed 10462.32 samples/sec   Loss 2.7311   LearningRate 0.0208   Epoch: 17   Global Step: 88240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:17:25,360-Speed 10505.22 samples/sec   Loss 2.7251   LearningRate 0.0208   Epoch: 17   Global Step: 88250   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:17:33,183-Speed 10473.16 samples/sec   Loss 2.7281   LearningRate 0.0207   Epoch: 17   Global Step: 88260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:17:40,988-Speed 10498.19 samples/sec   Loss 2.7197   LearningRate 0.0207   Epoch: 17   Global Step: 88270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:17:48,830-Speed 10447.82 samples/sec   Loss 2.7130   LearningRate 0.0207   Epoch: 17   Global Step: 88280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:17:56,616-Speed 10522.66 samples/sec   Loss 2.7086   LearningRate 0.0207   Epoch: 17   Global Step: 88290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:18:04,409-Speed 10513.60 samples/sec   Loss 2.7242   LearningRate 0.0206   Epoch: 17   Global Step: 88300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:18:12,221-Speed 10488.64 samples/sec   Loss 2.7311   LearningRate 0.0206   Epoch: 17   Global Step: 88310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:18:19,996-Speed 10537.64 samples/sec   Loss 2.7004   LearningRate 0.0206   Epoch: 17   Global Step: 88320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:18:27,779-Speed 10526.32 samples/sec   Loss 2.7070   LearningRate 0.0205   Epoch: 17   Global Step: 88330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:18:35,577-Speed 10506.75 samples/sec   Loss 2.7118   LearningRate 0.0205   Epoch: 17   Global Step: 88340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:18:43,387-Speed 10491.19 samples/sec   Loss 2.7128   LearningRate 0.0205   Epoch: 17   Global Step: 88350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:18:51,184-Speed 10507.88 samples/sec   Loss 2.7002   LearningRate 0.0205   Epoch: 17   Global Step: 88360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:18:59,025-Speed 10448.74 samples/sec   Loss 2.7060   LearningRate 0.0204   Epoch: 17   Global Step: 88370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:19:06,846-Speed 10474.84 samples/sec   Loss 2.7021   LearningRate 0.0204   Epoch: 17   Global Step: 88380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:19:14,633-Speed 10522.74 samples/sec   Loss 2.7095   LearningRate 0.0204   Epoch: 17   Global Step: 88390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:19:22,408-Speed 10539.43 samples/sec   Loss 2.7101   LearningRate 0.0204   Epoch: 17   Global Step: 88400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:19:30,225-Speed 10482.03 samples/sec   Loss 2.6870   LearningRate 0.0203   Epoch: 17   Global Step: 88410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:19:38,003-Speed 10533.06 samples/sec   Loss 2.7248   LearningRate 0.0203   Epoch: 17   Global Step: 88420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:19:45,783-Speed 10531.39 samples/sec   Loss 2.7151   LearningRate 0.0203   Epoch: 17   Global Step: 88430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:19:53,582-Speed 10504.85 samples/sec   Loss 2.6873   LearningRate 0.0203   Epoch: 17   Global Step: 88440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:20:01,385-Speed 10499.65 samples/sec   Loss 2.7308   LearningRate 0.0202   Epoch: 17   Global Step: 88450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:20:09,223-Speed 10452.67 samples/sec   Loss 2.6989   LearningRate 0.0202   Epoch: 17   Global Step: 88460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:20:17,050-Speed 10467.82 samples/sec   Loss 2.7239   LearningRate 0.0202   Epoch: 17   Global Step: 88470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:20:24,873-Speed 10472.73 samples/sec   Loss 2.7095   LearningRate 0.0201   Epoch: 17   Global Step: 88480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:20:32,724-Speed 10435.84 samples/sec   Loss 2.7003   LearningRate 0.0201   Epoch: 17   Global Step: 88490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:20:40,550-Speed 10469.53 samples/sec   Loss 2.7013   LearningRate 0.0201   Epoch: 17   Global Step: 88500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:20:48,410-Speed 10424.12 samples/sec   Loss 2.6644   LearningRate 0.0201   Epoch: 17   Global Step: 88510   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:20:56,250-Speed 10450.53 samples/sec   Loss 2.7002   LearningRate 0.0200   Epoch: 17   Global Step: 88520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:21:04,086-Speed 10455.30 samples/sec   Loss 2.7143   LearningRate 0.0200   Epoch: 17   Global Step: 88530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:21:11,931-Speed 10444.01 samples/sec   Loss 2.6696   LearningRate 0.0200   Epoch: 17   Global Step: 88540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:21:19,781-Speed 10437.27 samples/sec   Loss 2.6830   LearningRate 0.0200   Epoch: 17   Global Step: 88550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:21:27,609-Speed 10465.62 samples/sec   Loss 2.6995   LearningRate 0.0199   Epoch: 17   Global Step: 88560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:21:35,442-Speed 10459.33 samples/sec   Loss 2.6702   LearningRate 0.0199   Epoch: 17   Global Step: 88570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:21:43,276-Speed 10458.46 samples/sec   Loss 2.6744   LearningRate 0.0199   Epoch: 17   Global Step: 88580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:21:51,093-Speed 10481.25 samples/sec   Loss 2.6966   LearningRate 0.0199   Epoch: 17   Global Step: 88590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:21:58,944-Speed 10436.11 samples/sec   Loss 2.6845   LearningRate 0.0198   Epoch: 17   Global Step: 88600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:22:06,763-Speed 10478.30 samples/sec   Loss 2.6880   LearningRate 0.0198   Epoch: 17   Global Step: 88610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:22:14,594-Speed 10462.96 samples/sec   Loss 2.6906   LearningRate 0.0198   Epoch: 17   Global Step: 88620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:22:22,426-Speed 10461.48 samples/sec   Loss 2.6817   LearningRate 0.0198   Epoch: 17   Global Step: 88630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:22:30,272-Speed 10441.98 samples/sec   Loss 2.6844   LearningRate 0.0197   Epoch: 17   Global Step: 88640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:22:38,125-Speed 10433.79 samples/sec   Loss 2.6728   LearningRate 0.0197   Epoch: 17   Global Step: 88650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:22:45,988-Speed 10419.28 samples/sec   Loss 2.6934   LearningRate 0.0197   Epoch: 17   Global Step: 88660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:22:53,816-Speed 10466.15 samples/sec   Loss 2.6678   LearningRate 0.0196   Epoch: 17   Global Step: 88670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:23:01,673-Speed 10427.43 samples/sec   Loss 2.6816   LearningRate 0.0196   Epoch: 17   Global Step: 88680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:23:09,523-Speed 10438.18 samples/sec   Loss 2.6788   LearningRate 0.0196   Epoch: 17   Global Step: 88690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:23:17,360-Speed 10454.30 samples/sec   Loss 2.6754   LearningRate 0.0196   Epoch: 17   Global Step: 88700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:23:25,228-Speed 10414.38 samples/sec   Loss 2.6715   LearningRate 0.0195   Epoch: 17   Global Step: 88710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:23:33,069-Speed 10449.16 samples/sec   Loss 2.6495   LearningRate 0.0195   Epoch: 17   Global Step: 88720   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:23:40,903-Speed 10459.43 samples/sec   Loss 2.6562   LearningRate 0.0195   Epoch: 17   Global Step: 88730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:23:48,747-Speed 10444.91 samples/sec   Loss 2.6848   LearningRate 0.0195   Epoch: 17   Global Step: 88740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:23:56,593-Speed 10441.92 samples/sec   Loss 2.6345   LearningRate 0.0194   Epoch: 17   Global Step: 88750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:24:04,422-Speed 10465.23 samples/sec   Loss 2.6582   LearningRate 0.0194   Epoch: 17   Global Step: 88760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:24:12,241-Speed 10477.87 samples/sec   Loss 2.6580   LearningRate 0.0194   Epoch: 17   Global Step: 88770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:24:20,080-Speed 10452.21 samples/sec   Loss 2.6945   LearningRate 0.0194   Epoch: 17   Global Step: 88780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:24:27,939-Speed 10423.85 samples/sec   Loss 2.6460   LearningRate 0.0193   Epoch: 17   Global Step: 88790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:24:35,784-Speed 10446.40 samples/sec   Loss 2.6284   LearningRate 0.0193   Epoch: 17   Global Step: 88800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:24:43,650-Speed 10415.53 samples/sec   Loss 2.6326   LearningRate 0.0193   Epoch: 17   Global Step: 88810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:24:51,477-Speed 10468.47 samples/sec   Loss 2.6216   LearningRate 0.0193   Epoch: 17   Global Step: 88820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:24:59,276-Speed 10503.71 samples/sec   Loss 2.6619   LearningRate 0.0192   Epoch: 17   Global Step: 88830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:25:07,110-Speed 10459.28 samples/sec   Loss 2.6581   LearningRate 0.0192   Epoch: 17   Global Step: 88840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:25:14,952-Speed 10447.23 samples/sec   Loss 2.6541   LearningRate 0.0192   Epoch: 17   Global Step: 88850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:25:22,781-Speed 10465.08 samples/sec   Loss 2.6486   LearningRate 0.0192   Epoch: 17   Global Step: 88860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:25:30,646-Speed 10417.34 samples/sec   Loss 2.6172   LearningRate 0.0191   Epoch: 17   Global Step: 88870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:25:38,486-Speed 10450.41 samples/sec   Loss 2.6659   LearningRate 0.0191   Epoch: 17   Global Step: 88880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:25:46,311-Speed 10470.79 samples/sec   Loss 2.6536   LearningRate 0.0191   Epoch: 17   Global Step: 88890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:25:54,123-Speed 10488.23 samples/sec   Loss 2.6362   LearningRate 0.0191   Epoch: 17   Global Step: 88900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:26:01,903-Speed 10531.73 samples/sec   Loss 2.6340   LearningRate 0.0190   Epoch: 17   Global Step: 88910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:26:09,674-Speed 10542.41 samples/sec   Loss 2.6323   LearningRate 0.0190   Epoch: 17   Global Step: 88920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:26:17,478-Speed 10498.43 samples/sec   Loss 2.6260   LearningRate 0.0190   Epoch: 17   Global Step: 88930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:26:25,275-Speed 10508.22 samples/sec   Loss 2.6299   LearningRate 0.0189   Epoch: 17   Global Step: 88940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:26:33,053-Speed 10533.78 samples/sec   Loss 2.6149   LearningRate 0.0189   Epoch: 17   Global Step: 88950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:26:40,839-Speed 10522.44 samples/sec   Loss 2.6421   LearningRate 0.0189   Epoch: 17   Global Step: 88960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:26:48,627-Speed 10519.18 samples/sec   Loss 2.6404   LearningRate 0.0189   Epoch: 17   Global Step: 88970   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:26:56,442-Speed 10484.86 samples/sec   Loss 2.6360   LearningRate 0.0188   Epoch: 17   Global Step: 88980   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:27:04,223-Speed 10529.60 samples/sec   Loss 2.6433   LearningRate 0.0188   Epoch: 17   Global Step: 88990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:27:12,014-Speed 10516.51 samples/sec   Loss 2.6081   LearningRate 0.0188   Epoch: 17   Global Step: 89000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:27:19,792-Speed 10533.14 samples/sec   Loss 2.6282   LearningRate 0.0188   Epoch: 17   Global Step: 89010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:27:27,604-Speed 10487.73 samples/sec   Loss 2.6523   LearningRate 0.0187   Epoch: 17   Global Step: 89020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:27:35,408-Speed 10498.35 samples/sec   Loss 2.6237   LearningRate 0.0187   Epoch: 17   Global Step: 89030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:27:43,208-Speed 10504.91 samples/sec   Loss 2.6193   LearningRate 0.0187   Epoch: 17   Global Step: 89040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:27:51,010-Speed 10500.99 samples/sec   Loss 2.6176   LearningRate 0.0187   Epoch: 17   Global Step: 89050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:27:58,805-Speed 10510.81 samples/sec   Loss 2.6288   LearningRate 0.0186   Epoch: 17   Global Step: 89060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:28:06,606-Speed 10502.49 samples/sec   Loss 2.6244   LearningRate 0.0186   Epoch: 17   Global Step: 89070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:28:14,410-Speed 10498.09 samples/sec   Loss 2.5954   LearningRate 0.0186   Epoch: 17   Global Step: 89080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:28:22,221-Speed 10489.69 samples/sec   Loss 2.6184   LearningRate 0.0186   Epoch: 17   Global Step: 89090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:28:30,028-Speed 10495.05 samples/sec   Loss 2.6232   LearningRate 0.0185   Epoch: 17   Global Step: 89100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:28:37,839-Speed 10488.81 samples/sec   Loss 2.6114   LearningRate 0.0185   Epoch: 17   Global Step: 89110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:28:45,661-Speed 10473.27 samples/sec   Loss 2.6311   LearningRate 0.0185   Epoch: 17   Global Step: 89120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:28:53,458-Speed 10508.22 samples/sec   Loss 2.5935   LearningRate 0.0185   Epoch: 17   Global Step: 89130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:29:01,285-Speed 10468.65 samples/sec   Loss 2.5981   LearningRate 0.0184   Epoch: 17   Global Step: 89140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:29:09,136-Speed 10435.14 samples/sec   Loss 2.6073   LearningRate 0.0184   Epoch: 17   Global Step: 89150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:29:16,900-Speed 10552.64 samples/sec   Loss 2.5943   LearningRate 0.0184   Epoch: 17   Global Step: 89160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:29:24,680-Speed 10530.26 samples/sec   Loss 2.5902   LearningRate 0.0184   Epoch: 17   Global Step: 89170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:29:32,470-Speed 10518.75 samples/sec   Loss 2.5705   LearningRate 0.0183   Epoch: 17   Global Step: 89180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:29:40,246-Speed 10537.09 samples/sec   Loss 2.6106   LearningRate 0.0183   Epoch: 17   Global Step: 89190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:29:48,062-Speed 10481.90 samples/sec   Loss 2.5871   LearningRate 0.0183   Epoch: 17   Global Step: 89200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:29:55,844-Speed 10527.82 samples/sec   Loss 2.6005   LearningRate 0.0183   Epoch: 17   Global Step: 89210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:30:03,648-Speed 10499.30 samples/sec   Loss 2.5942   LearningRate 0.0182   Epoch: 17   Global Step: 89220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:30:11,469-Speed 10476.13 samples/sec   Loss 2.6099   LearningRate 0.0182   Epoch: 17   Global Step: 89230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:30:19,269-Speed 10502.66 samples/sec   Loss 2.6080   LearningRate 0.0182   Epoch: 17   Global Step: 89240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:30:27,074-Speed 10498.01 samples/sec   Loss 2.5784   LearningRate 0.0182   Epoch: 17   Global Step: 89250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:30:34,882-Speed 10492.77 samples/sec   Loss 2.6141   LearningRate 0.0181   Epoch: 17   Global Step: 89260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:30:42,672-Speed 10517.03 samples/sec   Loss 2.6155   LearningRate 0.0181   Epoch: 17   Global Step: 89270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:30:50,478-Speed 10495.57 samples/sec   Loss 2.6125   LearningRate 0.0181   Epoch: 17   Global Step: 89280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:30:58,276-Speed 10508.18 samples/sec   Loss 2.6256   LearningRate 0.0181   Epoch: 17   Global Step: 89290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:31:06,102-Speed 10468.37 samples/sec   Loss 2.5870   LearningRate 0.0180   Epoch: 17   Global Step: 89300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:31:13,893-Speed 10515.90 samples/sec   Loss 2.5892   LearningRate 0.0180   Epoch: 17   Global Step: 89310   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:31:21,709-Speed 10482.30 samples/sec   Loss 2.5982   LearningRate 0.0180   Epoch: 17   Global Step: 89320   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:31:29,498-Speed 10519.65 samples/sec   Loss 2.6049   LearningRate 0.0180   Epoch: 17   Global Step: 89330   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:31:37,277-Speed 10531.60 samples/sec   Loss 2.6287   LearningRate 0.0179   Epoch: 17   Global Step: 89340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:31:45,104-Speed 10467.31 samples/sec   Loss 2.5736   LearningRate 0.0179   Epoch: 17   Global Step: 89350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:31:52,921-Speed 10481.37 samples/sec   Loss 2.5643   LearningRate 0.0179   Epoch: 17   Global Step: 89360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:32:00,709-Speed 10521.00 samples/sec   Loss 2.5794   LearningRate 0.0179   Epoch: 17   Global Step: 89370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:32:08,491-Speed 10527.95 samples/sec   Loss 2.5813   LearningRate 0.0178   Epoch: 17   Global Step: 89380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:32:16,315-Speed 10472.03 samples/sec   Loss 2.5953   LearningRate 0.0178   Epoch: 17   Global Step: 89390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:32:24,127-Speed 10487.35 samples/sec   Loss 2.5587   LearningRate 0.0178   Epoch: 17   Global Step: 89400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:32:31,931-Speed 10499.71 samples/sec   Loss 2.5688   LearningRate 0.0178   Epoch: 17   Global Step: 89410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:32:39,723-Speed 10514.91 samples/sec   Loss 2.5672   LearningRate 0.0177   Epoch: 17   Global Step: 89420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:32:47,508-Speed 10523.53 samples/sec   Loss 2.5727   LearningRate 0.0177   Epoch: 17   Global Step: 89430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:32:55,304-Speed 10508.19 samples/sec   Loss 2.5831   LearningRate 0.0177   Epoch: 17   Global Step: 89440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:33:03,099-Speed 10519.54 samples/sec   Loss 2.5777   LearningRate 0.0177   Epoch: 17   Global Step: 89450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:33:10,894-Speed 10510.89 samples/sec   Loss 2.5455   LearningRate 0.0176   Epoch: 17   Global Step: 89460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:33:18,717-Speed 10472.65 samples/sec   Loss 2.5879   LearningRate 0.0176   Epoch: 17   Global Step: 89470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:33:26,516-Speed 10505.47 samples/sec   Loss 2.5576   LearningRate 0.0176   Epoch: 17   Global Step: 89480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:33:34,323-Speed 10495.86 samples/sec   Loss 2.5791   LearningRate 0.0176   Epoch: 17   Global Step: 89490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:33:42,147-Speed 10471.40 samples/sec   Loss 2.5559   LearningRate 0.0175   Epoch: 17   Global Step: 89500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:33:49,941-Speed 10512.83 samples/sec   Loss 2.5448   LearningRate 0.0175   Epoch: 17   Global Step: 89510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:33:57,731-Speed 10517.79 samples/sec   Loss 2.5435   LearningRate 0.0175   Epoch: 17   Global Step: 89520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:34:05,523-Speed 10514.03 samples/sec   Loss 2.5586   LearningRate 0.0175   Epoch: 17   Global Step: 89530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:34:13,308-Speed 10524.91 samples/sec   Loss 2.5649   LearningRate 0.0174   Epoch: 17   Global Step: 89540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:34:21,095-Speed 10521.46 samples/sec   Loss 2.5650   LearningRate 0.0174   Epoch: 17   Global Step: 89550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:34:28,923-Speed 10465.37 samples/sec   Loss 2.5557   LearningRate 0.0174   Epoch: 17   Global Step: 89560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:34:36,773-Speed 10444.15 samples/sec   Loss 2.5565   LearningRate 0.0174   Epoch: 17   Global Step: 89570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:34:44,568-Speed 10511.48 samples/sec   Loss 2.5623   LearningRate 0.0173   Epoch: 17   Global Step: 89580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:34:52,363-Speed 10510.43 samples/sec   Loss 2.5637   LearningRate 0.0173   Epoch: 17   Global Step: 89590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:35:00,195-Speed 10460.70 samples/sec   Loss 2.5655   LearningRate 0.0173   Epoch: 17   Global Step: 89600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:35:07,997-Speed 10501.98 samples/sec   Loss 2.5583   LearningRate 0.0173   Epoch: 17   Global Step: 89610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:35:15,818-Speed 10475.04 samples/sec   Loss 2.5468   LearningRate 0.0172   Epoch: 17   Global Step: 89620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:35:23,633-Speed 10483.75 samples/sec   Loss 2.5680   LearningRate 0.0172   Epoch: 17   Global Step: 89630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:35:31,422-Speed 10519.44 samples/sec   Loss 2.5487   LearningRate 0.0172   Epoch: 17   Global Step: 89640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:35:39,275-Speed 10432.57 samples/sec   Loss 2.5520   LearningRate 0.0172   Epoch: 17   Global Step: 89650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:35:47,082-Speed 10495.54 samples/sec   Loss 2.5936   LearningRate 0.0171   Epoch: 17   Global Step: 89660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:35:54,935-Speed 10432.99 samples/sec   Loss 2.5657   LearningRate 0.0171   Epoch: 17   Global Step: 89670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:36:02,726-Speed 10515.58 samples/sec   Loss 2.5365   LearningRate 0.0171   Epoch: 17   Global Step: 89680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:36:10,520-Speed 10511.22 samples/sec   Loss 2.5481   LearningRate 0.0171   Epoch: 17   Global Step: 89690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:36:18,359-Speed 10452.48 samples/sec   Loss 2.5289   LearningRate 0.0170   Epoch: 17   Global Step: 89700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:36:26,144-Speed 10527.60 samples/sec   Loss 2.5340   LearningRate 0.0170   Epoch: 17   Global Step: 89710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:36:33,951-Speed 10494.90 samples/sec   Loss 2.5408   LearningRate 0.0170   Epoch: 17   Global Step: 89720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:36:41,775-Speed 10470.93 samples/sec   Loss 2.5543   LearningRate 0.0170   Epoch: 17   Global Step: 89730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:36:49,586-Speed 10490.75 samples/sec   Loss 2.5539   LearningRate 0.0169   Epoch: 17   Global Step: 89740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:36:57,370-Speed 10524.46 samples/sec   Loss 2.4892   LearningRate 0.0169   Epoch: 17   Global Step: 89750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:37:05,195-Speed 10470.40 samples/sec   Loss 2.5505   LearningRate 0.0169   Epoch: 17   Global Step: 89760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:37:12,982-Speed 10522.53 samples/sec   Loss 2.5319   LearningRate 0.0169   Epoch: 17   Global Step: 89770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:37:20,774-Speed 10514.31 samples/sec   Loss 2.5387   LearningRate 0.0169   Epoch: 17   Global Step: 89780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:37:28,602-Speed 10466.05 samples/sec   Loss 2.5361   LearningRate 0.0168   Epoch: 17   Global Step: 89790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:37:36,405-Speed 10504.41 samples/sec   Loss 2.5235   LearningRate 0.0168   Epoch: 17   Global Step: 89800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:37:44,191-Speed 10523.28 samples/sec   Loss 2.5221   LearningRate 0.0168   Epoch: 17   Global Step: 89810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:37:52,005-Speed 10484.63 samples/sec   Loss 2.5140   LearningRate 0.0168   Epoch: 17   Global Step: 89820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:37:59,799-Speed 10512.83 samples/sec   Loss 2.5509   LearningRate 0.0167   Epoch: 17   Global Step: 89830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:38:07,610-Speed 10488.54 samples/sec   Loss 2.5069   LearningRate 0.0167   Epoch: 17   Global Step: 89840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:38:15,428-Speed 10479.56 samples/sec   Loss 2.5158   LearningRate 0.0167   Epoch: 17   Global Step: 89850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:38:23,236-Speed 10493.46 samples/sec   Loss 2.5190   LearningRate 0.0167   Epoch: 17   Global Step: 89860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:38:31,039-Speed 10500.46 samples/sec   Loss 2.4876   LearningRate 0.0166   Epoch: 17   Global Step: 89870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:38:38,839-Speed 10504.30 samples/sec   Loss 2.5198   LearningRate 0.0166   Epoch: 17   Global Step: 89880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:38:46,644-Speed 10496.73 samples/sec   Loss 2.5063   LearningRate 0.0166   Epoch: 17   Global Step: 89890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:38:54,467-Speed 10473.57 samples/sec   Loss 2.5145   LearningRate 0.0166   Epoch: 17   Global Step: 89900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:39:02,266-Speed 10505.05 samples/sec   Loss 2.5193   LearningRate 0.0165   Epoch: 17   Global Step: 89910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:39:10,070-Speed 10498.72 samples/sec   Loss 2.5317   LearningRate 0.0165   Epoch: 17   Global Step: 89920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:39:17,878-Speed 10492.80 samples/sec   Loss 2.5299   LearningRate 0.0165   Epoch: 17   Global Step: 89930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:39:25,690-Speed 10488.21 samples/sec   Loss 2.5409   LearningRate 0.0165   Epoch: 17   Global Step: 89940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:39:33,474-Speed 10525.46 samples/sec   Loss 2.5200   LearningRate 0.0164   Epoch: 17   Global Step: 89950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:39:41,305-Speed 10462.21 samples/sec   Loss 2.5019   LearningRate 0.0164   Epoch: 17   Global Step: 89960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:39:49,129-Speed 10471.74 samples/sec   Loss 2.4828   LearningRate 0.0164   Epoch: 17   Global Step: 89970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:39:56,925-Speed 10508.97 samples/sec   Loss 2.5088   LearningRate 0.0164   Epoch: 17   Global Step: 89980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:40:04,747-Speed 10478.75 samples/sec   Loss 2.4963   LearningRate 0.0163   Epoch: 17   Global Step: 89990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:40:12,575-Speed 10465.61 samples/sec   Loss 2.5064   LearningRate 0.0163   Epoch: 17   Global Step: 90000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:40:40,215-[lfw][90000]XNorm: 23.583664
Training: 2022-01-16 10:40:40,216-[lfw][90000]Accuracy-Flip: 0.99783+-0.00248
Training: 2022-01-16 10:40:40,216-[lfw][90000]Accuracy-Highest: 0.99783
Training: 2022-01-16 10:41:12,288-[cfp_fp][90000]XNorm: 21.463575
Training: 2022-01-16 10:41:12,288-[cfp_fp][90000]Accuracy-Flip: 0.99257+-0.00393
Training: 2022-01-16 10:41:12,288-[cfp_fp][90000]Accuracy-Highest: 0.99257
Training: 2022-01-16 10:41:39,973-[agedb_30][90000]XNorm: 23.186239
Training: 2022-01-16 10:41:39,974-[agedb_30][90000]Accuracy-Flip: 0.98083+-0.00569
Training: 2022-01-16 10:41:39,974-[agedb_30][90000]Accuracy-Highest: 0.98083
Training: 2022-01-16 10:41:47,722-Speed 861.01 samples/sec   Loss 2.5005   LearningRate 0.0163   Epoch: 17   Global Step: 90010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:41:55,454-Speed 10596.87 samples/sec   Loss 2.5265   LearningRate 0.0163   Epoch: 17   Global Step: 90020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:42:03,209-Speed 10564.52 samples/sec   Loss 2.4929   LearningRate 0.0162   Epoch: 17   Global Step: 90030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:42:10,974-Speed 10551.70 samples/sec   Loss 2.5398   LearningRate 0.0162   Epoch: 17   Global Step: 90040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:42:18,768-Speed 10511.61 samples/sec   Loss 2.5043   LearningRate 0.0162   Epoch: 17   Global Step: 90050   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:42:26,577-Speed 10491.83 samples/sec   Loss 2.4913   LearningRate 0.0162   Epoch: 17   Global Step: 90060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:42:34,345-Speed 10547.27 samples/sec   Loss 2.4706   LearningRate 0.0162   Epoch: 17   Global Step: 90070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:42:42,103-Speed 10560.11 samples/sec   Loss 2.4965   LearningRate 0.0161   Epoch: 17   Global Step: 90080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:42:49,886-Speed 10527.12 samples/sec   Loss 2.4993   LearningRate 0.0161   Epoch: 17   Global Step: 90090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:42:57,635-Speed 10573.37 samples/sec   Loss 2.4939   LearningRate 0.0161   Epoch: 17   Global Step: 90100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:43:05,392-Speed 10562.48 samples/sec   Loss 2.4855   LearningRate 0.0161   Epoch: 17   Global Step: 90110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:43:13,158-Speed 10549.60 samples/sec   Loss 2.4633   LearningRate 0.0160   Epoch: 17   Global Step: 90120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:43:20,939-Speed 10529.41 samples/sec   Loss 2.4618   LearningRate 0.0160   Epoch: 17   Global Step: 90130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:43:28,767-Speed 10466.98 samples/sec   Loss 2.4715   LearningRate 0.0160   Epoch: 17   Global Step: 90140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:43:36,562-Speed 10511.43 samples/sec   Loss 2.4439   LearningRate 0.0160   Epoch: 17   Global Step: 90150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:43:44,332-Speed 10543.06 samples/sec   Loss 2.4765   LearningRate 0.0159   Epoch: 17   Global Step: 90160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:43:52,109-Speed 10534.94 samples/sec   Loss 2.5049   LearningRate 0.0159   Epoch: 17   Global Step: 90170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:43:59,890-Speed 10530.36 samples/sec   Loss 2.4925   LearningRate 0.0159   Epoch: 17   Global Step: 90180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:44:07,660-Speed 10544.56 samples/sec   Loss 2.5037   LearningRate 0.0159   Epoch: 17   Global Step: 90190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:44:15,436-Speed 10536.72 samples/sec   Loss 2.4735   LearningRate 0.0158   Epoch: 17   Global Step: 90200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:44:23,223-Speed 10521.18 samples/sec   Loss 2.4702   LearningRate 0.0158   Epoch: 17   Global Step: 90210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:44:30,981-Speed 10564.62 samples/sec   Loss 2.4772   LearningRate 0.0158   Epoch: 17   Global Step: 90220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:44:38,752-Speed 10543.24 samples/sec   Loss 2.4805   LearningRate 0.0158   Epoch: 17   Global Step: 90230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:44:46,541-Speed 10517.65 samples/sec   Loss 2.4544   LearningRate 0.0158   Epoch: 17   Global Step: 90240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:44:54,312-Speed 10544.25 samples/sec   Loss 2.4370   LearningRate 0.0157   Epoch: 17   Global Step: 90250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:45:02,089-Speed 10535.65 samples/sec   Loss 2.4481   LearningRate 0.0157   Epoch: 17   Global Step: 90260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:45:09,880-Speed 10515.91 samples/sec   Loss 2.4508   LearningRate 0.0157   Epoch: 17   Global Step: 90270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:45:17,700-Speed 10477.28 samples/sec   Loss 2.4495   LearningRate 0.0157   Epoch: 17   Global Step: 90280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:45:25,475-Speed 10538.24 samples/sec   Loss 2.4576   LearningRate 0.0156   Epoch: 17   Global Step: 90290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:45:33,242-Speed 10549.01 samples/sec   Loss 2.4547   LearningRate 0.0156   Epoch: 17   Global Step: 90300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:45:41,007-Speed 10550.83 samples/sec   Loss 2.4670   LearningRate 0.0156   Epoch: 17   Global Step: 90310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:45:48,772-Speed 10552.18 samples/sec   Loss 2.4722   LearningRate 0.0156   Epoch: 17   Global Step: 90320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:45:56,550-Speed 10533.81 samples/sec   Loss 2.4543   LearningRate 0.0155   Epoch: 17   Global Step: 90330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:46:04,317-Speed 10548.44 samples/sec   Loss 2.4587   LearningRate 0.0155   Epoch: 17   Global Step: 90340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:46:12,085-Speed 10547.66 samples/sec   Loss 2.4196   LearningRate 0.0155   Epoch: 17   Global Step: 90350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:46:19,867-Speed 10527.07 samples/sec   Loss 2.4605   LearningRate 0.0155   Epoch: 17   Global Step: 90360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:46:27,655-Speed 10520.72 samples/sec   Loss 2.4683   LearningRate 0.0155   Epoch: 17   Global Step: 90370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:46:35,418-Speed 10556.87 samples/sec   Loss 2.4651   LearningRate 0.0154   Epoch: 17   Global Step: 90380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:46:43,188-Speed 10544.52 samples/sec   Loss 2.4618   LearningRate 0.0154   Epoch: 17   Global Step: 90390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:46:50,977-Speed 10517.76 samples/sec   Loss 2.4609   LearningRate 0.0154   Epoch: 17   Global Step: 90400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:46:58,765-Speed 10520.67 samples/sec   Loss 2.4615   LearningRate 0.0154   Epoch: 17   Global Step: 90410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:47:06,568-Speed 10500.28 samples/sec   Loss 2.4220   LearningRate 0.0153   Epoch: 17   Global Step: 90420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:47:14,334-Speed 10549.81 samples/sec   Loss 2.4423   LearningRate 0.0153   Epoch: 17   Global Step: 90430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:47:22,120-Speed 10522.26 samples/sec   Loss 2.4345   LearningRate 0.0153   Epoch: 17   Global Step: 90440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:47:29,901-Speed 10530.43 samples/sec   Loss 2.4155   LearningRate 0.0153   Epoch: 17   Global Step: 90450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:47:37,665-Speed 10552.52 samples/sec   Loss 2.4472   LearningRate 0.0152   Epoch: 17   Global Step: 90460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:47:45,455-Speed 10517.47 samples/sec   Loss 2.4297   LearningRate 0.0152   Epoch: 17   Global Step: 90470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:47:53,252-Speed 10508.06 samples/sec   Loss 2.4344   LearningRate 0.0152   Epoch: 17   Global Step: 90480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:48:01,050-Speed 10506.94 samples/sec   Loss 2.4362   LearningRate 0.0152   Epoch: 17   Global Step: 90490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:48:08,873-Speed 10472.99 samples/sec   Loss 2.4427   LearningRate 0.0151   Epoch: 17   Global Step: 90500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:48:16,675-Speed 10501.30 samples/sec   Loss 2.4245   LearningRate 0.0151   Epoch: 17   Global Step: 90510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:48:24,463-Speed 10520.22 samples/sec   Loss 2.4347   LearningRate 0.0151   Epoch: 17   Global Step: 90520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:48:32,234-Speed 10543.91 samples/sec   Loss 2.4355   LearningRate 0.0151   Epoch: 17   Global Step: 90530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:48:39,998-Speed 10551.55 samples/sec   Loss 2.4275   LearningRate 0.0151   Epoch: 17   Global Step: 90540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:48:47,757-Speed 10559.63 samples/sec   Loss 2.4354   LearningRate 0.0150   Epoch: 17   Global Step: 90550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:48:55,531-Speed 10539.71 samples/sec   Loss 2.4562   LearningRate 0.0150   Epoch: 17   Global Step: 90560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:49:03,321-Speed 10517.54 samples/sec   Loss 2.4325   LearningRate 0.0150   Epoch: 17   Global Step: 90570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:49:11,103-Speed 10528.16 samples/sec   Loss 2.4200   LearningRate 0.0150   Epoch: 17   Global Step: 90580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:49:18,877-Speed 10537.63 samples/sec   Loss 2.4386   LearningRate 0.0149   Epoch: 17   Global Step: 90590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:49:26,658-Speed 10530.94 samples/sec   Loss 2.4231   LearningRate 0.0149   Epoch: 17   Global Step: 90600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:49:34,428-Speed 10543.92 samples/sec   Loss 2.4148   LearningRate 0.0149   Epoch: 17   Global Step: 90610   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:49:42,188-Speed 10558.15 samples/sec   Loss 2.4298   LearningRate 0.0149   Epoch: 17   Global Step: 90620   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:49:49,970-Speed 10528.27 samples/sec   Loss 2.4190   LearningRate 0.0149   Epoch: 17   Global Step: 90630   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:49:57,743-Speed 10541.07 samples/sec   Loss 2.4292   LearningRate 0.0148   Epoch: 17   Global Step: 90640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:50:05,532-Speed 10518.39 samples/sec   Loss 2.4282   LearningRate 0.0148   Epoch: 17   Global Step: 90650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:50:13,330-Speed 10505.81 samples/sec   Loss 2.4092   LearningRate 0.0148   Epoch: 17   Global Step: 90660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:50:21,103-Speed 10540.90 samples/sec   Loss 2.4246   LearningRate 0.0148   Epoch: 17   Global Step: 90670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:50:28,895-Speed 10514.15 samples/sec   Loss 2.4038   LearningRate 0.0147   Epoch: 17   Global Step: 90680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:50:36,662-Speed 10551.61 samples/sec   Loss 2.4094   LearningRate 0.0147   Epoch: 17   Global Step: 90690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:50:44,438-Speed 10535.56 samples/sec   Loss 2.4257   LearningRate 0.0147   Epoch: 17   Global Step: 90700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:50:52,213-Speed 10543.47 samples/sec   Loss 2.4169   LearningRate 0.0147   Epoch: 17   Global Step: 90710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:51:00,003-Speed 10518.52 samples/sec   Loss 2.4411   LearningRate 0.0146   Epoch: 17   Global Step: 90720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:51:07,782-Speed 10532.12 samples/sec   Loss 2.4229   LearningRate 0.0146   Epoch: 17   Global Step: 90730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:51:15,593-Speed 10488.85 samples/sec   Loss 2.4146   LearningRate 0.0146   Epoch: 17   Global Step: 90740   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:51:23,379-Speed 10523.37 samples/sec   Loss 2.4229   LearningRate 0.0146   Epoch: 17   Global Step: 90750   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:51:31,194-Speed 10484.43 samples/sec   Loss 2.4350   LearningRate 0.0146   Epoch: 17   Global Step: 90760   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:51:38,994-Speed 10503.59 samples/sec   Loss 2.4040   LearningRate 0.0145   Epoch: 17   Global Step: 90770   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:51:46,777-Speed 10526.18 samples/sec   Loss 2.4053   LearningRate 0.0145   Epoch: 17   Global Step: 90780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:51:54,626-Speed 10438.15 samples/sec   Loss 2.4238   LearningRate 0.0145   Epoch: 17   Global Step: 90790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:52:02,405-Speed 10532.23 samples/sec   Loss 2.4210   LearningRate 0.0145   Epoch: 17   Global Step: 90800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:52:10,201-Speed 10509.67 samples/sec   Loss 2.4014   LearningRate 0.0144   Epoch: 17   Global Step: 90810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:52:17,975-Speed 10539.27 samples/sec   Loss 2.4064   LearningRate 0.0144   Epoch: 17   Global Step: 90820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:52:25,755-Speed 10532.28 samples/sec   Loss 2.3903   LearningRate 0.0144   Epoch: 17   Global Step: 90830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:52:33,539-Speed 10524.82 samples/sec   Loss 2.3538   LearningRate 0.0144   Epoch: 17   Global Step: 90840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:52:41,335-Speed 10509.93 samples/sec   Loss 2.4091   LearningRate 0.0144   Epoch: 17   Global Step: 90850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:52:49,108-Speed 10540.49 samples/sec   Loss 2.3830   LearningRate 0.0143   Epoch: 17   Global Step: 90860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:52:56,898-Speed 10517.70 samples/sec   Loss 2.3966   LearningRate 0.0143   Epoch: 17   Global Step: 90870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:53:04,691-Speed 10512.40 samples/sec   Loss 2.4218   LearningRate 0.0143   Epoch: 17   Global Step: 90880   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 10:53:12,484-Speed 10519.34 samples/sec   Loss 2.3581   LearningRate 0.0143   Epoch: 17   Global Step: 90890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:53:20,295-Speed 10489.96 samples/sec   Loss 2.3728   LearningRate 0.0142   Epoch: 17   Global Step: 90900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:53:28,094-Speed 10505.04 samples/sec   Loss 2.3779   LearningRate 0.0142   Epoch: 17   Global Step: 90910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:53:35,877-Speed 10527.06 samples/sec   Loss 2.4117   LearningRate 0.0142   Epoch: 17   Global Step: 90920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:53:43,670-Speed 10512.93 samples/sec   Loss 2.3899   LearningRate 0.0142   Epoch: 17   Global Step: 90930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:53:51,438-Speed 10547.45 samples/sec   Loss 2.4034   LearningRate 0.0142   Epoch: 17   Global Step: 90940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:53:59,214-Speed 10537.17 samples/sec   Loss 2.3710   LearningRate 0.0141   Epoch: 17   Global Step: 90950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:54:06,990-Speed 10534.80 samples/sec   Loss 2.3849   LearningRate 0.0141   Epoch: 17   Global Step: 90960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:54:14,816-Speed 10468.92 samples/sec   Loss 2.3912   LearningRate 0.0141   Epoch: 17   Global Step: 90970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:54:22,604-Speed 10520.72 samples/sec   Loss 2.4018   LearningRate 0.0141   Epoch: 17   Global Step: 90980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:54:30,384-Speed 10531.48 samples/sec   Loss 2.3606   LearningRate 0.0140   Epoch: 17   Global Step: 90990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:54:38,170-Speed 10522.77 samples/sec   Loss 2.3863   LearningRate 0.0140   Epoch: 17   Global Step: 91000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:54:45,987-Speed 10480.06 samples/sec   Loss 2.3933   LearningRate 0.0140   Epoch: 17   Global Step: 91010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:54:53,757-Speed 10545.10 samples/sec   Loss 2.3663   LearningRate 0.0140   Epoch: 17   Global Step: 91020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:55:01,535-Speed 10533.91 samples/sec   Loss 2.3863   LearningRate 0.0140   Epoch: 17   Global Step: 91030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:55:09,308-Speed 10540.55 samples/sec   Loss 2.3856   LearningRate 0.0139   Epoch: 17   Global Step: 91040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:55:17,075-Speed 10547.90 samples/sec   Loss 2.3639   LearningRate 0.0139   Epoch: 17   Global Step: 91050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:55:24,865-Speed 10518.38 samples/sec   Loss 2.3855   LearningRate 0.0139   Epoch: 17   Global Step: 91060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:55:32,632-Speed 10548.21 samples/sec   Loss 2.3716   LearningRate 0.0139   Epoch: 17   Global Step: 91070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:55:40,406-Speed 10537.98 samples/sec   Loss 2.3636   LearningRate 0.0138   Epoch: 17   Global Step: 91080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:55:48,213-Speed 10502.07 samples/sec   Loss 2.3905   LearningRate 0.0138   Epoch: 17   Global Step: 91090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:55:55,990-Speed 10535.31 samples/sec   Loss 2.3595   LearningRate 0.0138   Epoch: 17   Global Step: 91100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:56:03,758-Speed 10546.39 samples/sec   Loss 2.3572   LearningRate 0.0138   Epoch: 17   Global Step: 91110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:56:11,526-Speed 10547.01 samples/sec   Loss 2.3571   LearningRate 0.0138   Epoch: 17   Global Step: 91120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:56:19,362-Speed 10456.43 samples/sec   Loss 2.3783   LearningRate 0.0137   Epoch: 17   Global Step: 91130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:56:27,156-Speed 10512.69 samples/sec   Loss 2.3731   LearningRate 0.0137   Epoch: 17   Global Step: 91140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:56:34,936-Speed 10529.98 samples/sec   Loss 2.3519   LearningRate 0.0137   Epoch: 17   Global Step: 91150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:56:42,733-Speed 10508.51 samples/sec   Loss 2.3559   LearningRate 0.0137   Epoch: 17   Global Step: 91160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:56:50,519-Speed 10522.16 samples/sec   Loss 2.3444   LearningRate 0.0136   Epoch: 17   Global Step: 91170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:56:58,315-Speed 10509.75 samples/sec   Loss 2.3629   LearningRate 0.0136   Epoch: 17   Global Step: 91180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:57:06,124-Speed 10492.44 samples/sec   Loss 2.3670   LearningRate 0.0136   Epoch: 17   Global Step: 91190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:57:13,909-Speed 10524.13 samples/sec   Loss 2.3488   LearningRate 0.0136   Epoch: 17   Global Step: 91200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:57:21,690-Speed 10530.55 samples/sec   Loss 2.3448   LearningRate 0.0136   Epoch: 17   Global Step: 91210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:57:29,527-Speed 10454.60 samples/sec   Loss 2.3266   LearningRate 0.0135   Epoch: 17   Global Step: 91220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:57:37,306-Speed 10532.06 samples/sec   Loss 2.3607   LearningRate 0.0135   Epoch: 17   Global Step: 91230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:57:45,108-Speed 10501.08 samples/sec   Loss 2.3476   LearningRate 0.0135   Epoch: 17   Global Step: 91240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:57:52,893-Speed 10525.04 samples/sec   Loss 2.3607   LearningRate 0.0135   Epoch: 17   Global Step: 91250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:58:00,676-Speed 10527.33 samples/sec   Loss 2.3256   LearningRate 0.0135   Epoch: 17   Global Step: 91260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:58:08,464-Speed 10520.07 samples/sec   Loss 2.3611   LearningRate 0.0134   Epoch: 17   Global Step: 91270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:58:16,260-Speed 10510.84 samples/sec   Loss 2.3506   LearningRate 0.0134   Epoch: 17   Global Step: 91280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:58:24,031-Speed 10542.69 samples/sec   Loss 2.3497   LearningRate 0.0134   Epoch: 17   Global Step: 91290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:58:31,822-Speed 10517.49 samples/sec   Loss 2.3509   LearningRate 0.0134   Epoch: 17   Global Step: 91300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:58:39,671-Speed 10438.37 samples/sec   Loss 2.3411   LearningRate 0.0133   Epoch: 17   Global Step: 91310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:58:47,481-Speed 10489.18 samples/sec   Loss 2.3209   LearningRate 0.0133   Epoch: 17   Global Step: 91320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 10:58:55,288-Speed 10494.30 samples/sec   Loss 2.3354   LearningRate 0.0133   Epoch: 17   Global Step: 91330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:59:03,082-Speed 10512.63 samples/sec   Loss 2.3388   LearningRate 0.0133   Epoch: 17   Global Step: 91340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:59:10,892-Speed 10490.46 samples/sec   Loss 2.3409   LearningRate 0.0133   Epoch: 17   Global Step: 91350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:59:18,719-Speed 10467.83 samples/sec   Loss 2.3325   LearningRate 0.0132   Epoch: 17   Global Step: 91360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:59:26,516-Speed 10508.91 samples/sec   Loss 2.3121   LearningRate 0.0132   Epoch: 17   Global Step: 91370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:59:34,305-Speed 10519.67 samples/sec   Loss 2.3270   LearningRate 0.0132   Epoch: 17   Global Step: 91380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:59:42,078-Speed 10539.76 samples/sec   Loss 2.3307   LearningRate 0.0132   Epoch: 17   Global Step: 91390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:59:49,851-Speed 10541.80 samples/sec   Loss 2.3346   LearningRate 0.0132   Epoch: 17   Global Step: 91400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 10:59:57,639-Speed 10519.17 samples/sec   Loss 2.3384   LearningRate 0.0131   Epoch: 17   Global Step: 91410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:00:05,416-Speed 10534.42 samples/sec   Loss 2.3028   LearningRate 0.0131   Epoch: 17   Global Step: 91420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:00:13,211-Speed 10511.00 samples/sec   Loss 2.3207   LearningRate 0.0131   Epoch: 17   Global Step: 91430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:00:21,013-Speed 10501.97 samples/sec   Loss 2.3374   LearningRate 0.0131   Epoch: 17   Global Step: 91440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:00:28,799-Speed 10523.00 samples/sec   Loss 2.3072   LearningRate 0.0130   Epoch: 17   Global Step: 91450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:00:36,597-Speed 10506.17 samples/sec   Loss 2.3361   LearningRate 0.0130   Epoch: 17   Global Step: 91460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:00:44,425-Speed 10469.51 samples/sec   Loss 2.3025   LearningRate 0.0130   Epoch: 17   Global Step: 91470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:00:52,204-Speed 10533.47 samples/sec   Loss 2.3430   LearningRate 0.0130   Epoch: 17   Global Step: 91480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:01:00,026-Speed 10472.97 samples/sec   Loss 2.2978   LearningRate 0.0130   Epoch: 17   Global Step: 91490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:01:07,818-Speed 10514.21 samples/sec   Loss 2.2976   LearningRate 0.0129   Epoch: 17   Global Step: 91500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:01:15,602-Speed 10526.42 samples/sec   Loss 2.3223   LearningRate 0.0129   Epoch: 17   Global Step: 91510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:01:23,388-Speed 10523.93 samples/sec   Loss 2.3260   LearningRate 0.0129   Epoch: 17   Global Step: 91520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:01:31,206-Speed 10479.13 samples/sec   Loss 2.3302   LearningRate 0.0129   Epoch: 17   Global Step: 91530   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 11:01:38,997-Speed 10519.35 samples/sec   Loss 2.3109   LearningRate 0.0129   Epoch: 17   Global Step: 91540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:01:46,798-Speed 10502.74 samples/sec   Loss 2.3278   LearningRate 0.0128   Epoch: 17   Global Step: 91550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:01:54,628-Speed 10463.57 samples/sec   Loss 2.3301   LearningRate 0.0128   Epoch: 17   Global Step: 91560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:02:02,418-Speed 10517.56 samples/sec   Loss 2.2973   LearningRate 0.0128   Epoch: 17   Global Step: 91570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:02:10,221-Speed 10498.91 samples/sec   Loss 2.2991   LearningRate 0.0128   Epoch: 17   Global Step: 91580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:02:18,003-Speed 10529.16 samples/sec   Loss 2.2943   LearningRate 0.0127   Epoch: 17   Global Step: 91590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:02:25,794-Speed 10516.68 samples/sec   Loss 2.2933   LearningRate 0.0127   Epoch: 17   Global Step: 91600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:02:33,584-Speed 10517.17 samples/sec   Loss 2.3046   LearningRate 0.0127   Epoch: 17   Global Step: 91610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:02:41,381-Speed 10508.21 samples/sec   Loss 2.3164   LearningRate 0.0127   Epoch: 17   Global Step: 91620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:02:49,173-Speed 10518.58 samples/sec   Loss 2.3016   LearningRate 0.0127   Epoch: 17   Global Step: 91630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:02:56,998-Speed 10469.58 samples/sec   Loss 2.3053   LearningRate 0.0126   Epoch: 17   Global Step: 91640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:03:04,774-Speed 10537.24 samples/sec   Loss 2.3216   LearningRate 0.0126   Epoch: 17   Global Step: 91650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:03:12,564-Speed 10519.87 samples/sec   Loss 2.2818   LearningRate 0.0126   Epoch: 17   Global Step: 91660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:03:20,408-Speed 10446.31 samples/sec   Loss 2.2999   LearningRate 0.0126   Epoch: 17   Global Step: 91670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:03:28,203-Speed 10509.71 samples/sec   Loss 2.2950   LearningRate 0.0126   Epoch: 17   Global Step: 91680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:03:36,003-Speed 10502.97 samples/sec   Loss 2.2739   LearningRate 0.0125   Epoch: 17   Global Step: 91690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:03:43,824-Speed 10476.65 samples/sec   Loss 2.3099   LearningRate 0.0125   Epoch: 17   Global Step: 91700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:03:51,604-Speed 10531.71 samples/sec   Loss 2.3116   LearningRate 0.0125   Epoch: 17   Global Step: 91710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:03:59,463-Speed 10424.03 samples/sec   Loss 2.3037   LearningRate 0.0125   Epoch: 17   Global Step: 91720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:04:07,265-Speed 10501.16 samples/sec   Loss 2.3256   LearningRate 0.0125   Epoch: 17   Global Step: 91730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:04:15,088-Speed 10474.00 samples/sec   Loss 2.2874   LearningRate 0.0124   Epoch: 17   Global Step: 91740   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 11:04:22,865-Speed 10535.72 samples/sec   Loss 2.2800   LearningRate 0.0124   Epoch: 17   Global Step: 91750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:04:30,654-Speed 10519.72 samples/sec   Loss 2.2733   LearningRate 0.0124   Epoch: 17   Global Step: 91760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:04:38,449-Speed 10510.30 samples/sec   Loss 2.3020   LearningRate 0.0124   Epoch: 17   Global Step: 91770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:04:46,260-Speed 10488.23 samples/sec   Loss 2.2669   LearningRate 0.0124   Epoch: 17   Global Step: 91780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:04:54,072-Speed 10488.35 samples/sec   Loss 2.2822   LearningRate 0.0123   Epoch: 17   Global Step: 91790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:05:01,873-Speed 10502.93 samples/sec   Loss 2.2735   LearningRate 0.0123   Epoch: 17   Global Step: 91800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:05:09,686-Speed 10486.06 samples/sec   Loss 2.2835   LearningRate 0.0123   Epoch: 17   Global Step: 91810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:05:17,479-Speed 10514.01 samples/sec   Loss 2.2644   LearningRate 0.0123   Epoch: 17   Global Step: 91820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:05:25,274-Speed 10511.15 samples/sec   Loss 2.2823   LearningRate 0.0122   Epoch: 17   Global Step: 91830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:05:33,046-Speed 10541.52 samples/sec   Loss 2.2654   LearningRate 0.0122   Epoch: 17   Global Step: 91840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:05:40,850-Speed 10498.79 samples/sec   Loss 2.2807   LearningRate 0.0122   Epoch: 17   Global Step: 91850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:05:48,641-Speed 10515.77 samples/sec   Loss 2.2799   LearningRate 0.0122   Epoch: 17   Global Step: 91860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:05:56,452-Speed 10488.04 samples/sec   Loss 2.2713   LearningRate 0.0122   Epoch: 17   Global Step: 91870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:06:04,248-Speed 10510.20 samples/sec   Loss 2.2435   LearningRate 0.0121   Epoch: 17   Global Step: 91880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:06:12,079-Speed 10462.09 samples/sec   Loss 2.2765   LearningRate 0.0121   Epoch: 17   Global Step: 91890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:06:19,889-Speed 10491.28 samples/sec   Loss 2.2696   LearningRate 0.0121   Epoch: 17   Global Step: 91900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:06:27,697-Speed 10492.22 samples/sec   Loss 2.2596   LearningRate 0.0121   Epoch: 17   Global Step: 91910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:06:35,510-Speed 10487.18 samples/sec   Loss 2.2754   LearningRate 0.0121   Epoch: 17   Global Step: 91920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:06:43,304-Speed 10512.76 samples/sec   Loss 2.2652   LearningRate 0.0120   Epoch: 17   Global Step: 91930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:06:51,134-Speed 10464.16 samples/sec   Loss 2.2747   LearningRate 0.0120   Epoch: 17   Global Step: 91940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:06:58,951-Speed 10481.70 samples/sec   Loss 2.2975   LearningRate 0.0120   Epoch: 17   Global Step: 91950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:07:06,752-Speed 10504.89 samples/sec   Loss 2.2415   LearningRate 0.0120   Epoch: 17   Global Step: 91960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:07:14,567-Speed 10483.72 samples/sec   Loss 2.2483   LearningRate 0.0120   Epoch: 17   Global Step: 91970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:07:22,410-Speed 10446.22 samples/sec   Loss 2.2473   LearningRate 0.0119   Epoch: 17   Global Step: 91980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:07:30,230-Speed 10477.82 samples/sec   Loss 2.2547   LearningRate 0.0119   Epoch: 17   Global Step: 91990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-16 11:07:38,012-Speed 10527.53 samples/sec   Loss 2.2695   LearningRate 0.0119   Epoch: 17   Global Step: 92000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:07:45,818-Speed 10495.92 samples/sec   Loss 2.2307   LearningRate 0.0119   Epoch: 17   Global Step: 92010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:07:53,664-Speed 10444.54 samples/sec   Loss 2.2702   LearningRate 0.0119   Epoch: 17   Global Step: 92020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-16 11:08:01,445-Speed 10530.04 samples/sec   Loss 2.2510   LearningRate 0.0118   Epoch: 17   Global Step: 92030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:08:09,246-Speed 10501.92 samples/sec   Loss 2.2517   LearningRate 0.0118   Epoch: 17   Global Step: 92040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:08:17,069-Speed 10474.52 samples/sec   Loss 2.2462   LearningRate 0.0118   Epoch: 17   Global Step: 92050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:08:24,924-Speed 10430.53 samples/sec   Loss 2.1995   LearningRate 0.0118   Epoch: 17   Global Step: 92060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:08:32,723-Speed 10506.21 samples/sec   Loss 2.2429   LearningRate 0.0118   Epoch: 17   Global Step: 92070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:08:40,528-Speed 10499.14 samples/sec   Loss 2.2357   LearningRate 0.0117   Epoch: 17   Global Step: 92080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:08:48,310-Speed 10527.10 samples/sec   Loss 2.2412   LearningRate 0.0117   Epoch: 17   Global Step: 92090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:08:56,103-Speed 10513.14 samples/sec   Loss 2.2484   LearningRate 0.0117   Epoch: 17   Global Step: 92100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:09:03,886-Speed 10527.02 samples/sec   Loss 2.2583   LearningRate 0.0117   Epoch: 17   Global Step: 92110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:09:11,665-Speed 10533.07 samples/sec   Loss 2.2286   LearningRate 0.0117   Epoch: 17   Global Step: 92120   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-16 11:09:19,468-Speed 10499.18 samples/sec   Loss 2.2502   LearningRate 0.0116   Epoch: 17   Global Step: 92130   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-16 11:09:27,286-Speed 10479.76 samples/sec   Loss 2.2365   LearningRate 0.0116   Epoch: 17   Global Step: 92140   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-16 11:09:35,073-Speed 10522.73 samples/sec   Loss 2.2294   LearningRate 0.0116   Epoch: 17   Global Step: 92150   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-16 11:09:42,863-Speed 10525.20 samples/sec   Loss 2.2417   LearningRate 0.0116   Epoch: 17   Global Step: 92160   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-16 11:09:50,645-Speed 10528.06 samples/sec   Loss 2.2514   LearningRate 0.0116   Epoch: 17   Global Step: 92170   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-16 11:09:58,427-Speed 10527.39 samples/sec   Loss 2.2252   LearningRate 0.0115   Epoch: 17   Global Step: 92180   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-16 11:10:06,242-Speed 10483.99 samples/sec   Loss 2.2405   LearningRate 0.0115   Epoch: 17   Global Step: 92190   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-16 11:10:14,068-Speed 10469.67 samples/sec   Loss 2.2529   LearningRate 0.0115   Epoch: 17   Global Step: 92200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-16 11:10:21,891-Speed 10472.65 samples/sec   Loss 2.2504   LearningRate 0.0115   Epoch: 17   Global Step: 92210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-16 11:10:29,695-Speed 10498.51 samples/sec   Loss 2.2364   LearningRate 0.0115   Epoch: 17   Global Step: 92220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:10:37,499-Speed 10501.87 samples/sec   Loss 2.2282   LearningRate 0.0114   Epoch: 17   Global Step: 92230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:10:45,306-Speed 10495.58 samples/sec   Loss 2.2491   LearningRate 0.0114   Epoch: 17   Global Step: 92240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:10:53,091-Speed 10524.17 samples/sec   Loss 2.2179   LearningRate 0.0114   Epoch: 17   Global Step: 92250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:11:00,879-Speed 10519.66 samples/sec   Loss 2.2171   LearningRate 0.0114   Epoch: 17   Global Step: 92260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:11:08,674-Speed 10510.95 samples/sec   Loss 2.2442   LearningRate 0.0114   Epoch: 17   Global Step: 92270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:11:16,513-Speed 10451.76 samples/sec   Loss 2.2326   LearningRate 0.0113   Epoch: 17   Global Step: 92280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:11:24,327-Speed 10484.85 samples/sec   Loss 2.2577   LearningRate 0.0113   Epoch: 17   Global Step: 92290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:11:32,130-Speed 10499.90 samples/sec   Loss 2.2497   LearningRate 0.0113   Epoch: 17   Global Step: 92300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:11:39,936-Speed 10496.42 samples/sec   Loss 2.2217   LearningRate 0.0113   Epoch: 17   Global Step: 92310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-16 11:11:47,729-Speed 10513.65 samples/sec   Loss 2.2043   LearningRate 0.0113   Epoch: 17   Global Step: 92320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:11:55,532-Speed 10499.57 samples/sec   Loss 2.1998   LearningRate 0.0112   Epoch: 17   Global Step: 92330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:12:03,335-Speed 10500.66 samples/sec   Loss 2.1954   LearningRate 0.0112   Epoch: 17   Global Step: 92340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:12:11,127-Speed 10515.19 samples/sec   Loss 2.2285   LearningRate 0.0112   Epoch: 17   Global Step: 92350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:12:18,924-Speed 10507.17 samples/sec   Loss 2.2133   LearningRate 0.0112   Epoch: 17   Global Step: 92360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:12:26,725-Speed 10503.02 samples/sec   Loss 2.2218   LearningRate 0.0112   Epoch: 17   Global Step: 92370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:12:34,524-Speed 10504.62 samples/sec   Loss 2.2196   LearningRate 0.0111   Epoch: 17   Global Step: 92380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:12:42,322-Speed 10507.37 samples/sec   Loss 2.2062   LearningRate 0.0111   Epoch: 17   Global Step: 92390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:12:50,121-Speed 10504.89 samples/sec   Loss 2.2213   LearningRate 0.0111   Epoch: 17   Global Step: 92400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:12:57,912-Speed 10515.23 samples/sec   Loss 2.2206   LearningRate 0.0111   Epoch: 17   Global Step: 92410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:13:05,696-Speed 10532.86 samples/sec   Loss 2.2126   LearningRate 0.0111   Epoch: 17   Global Step: 92420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:13:13,487-Speed 10517.95 samples/sec   Loss 2.2088   LearningRate 0.0110   Epoch: 17   Global Step: 92430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:13:21,289-Speed 10500.35 samples/sec   Loss 2.2191   LearningRate 0.0110   Epoch: 17   Global Step: 92440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:13:29,099-Speed 10490.06 samples/sec   Loss 2.2109   LearningRate 0.0110   Epoch: 17   Global Step: 92450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:13:36,965-Speed 10415.58 samples/sec   Loss 2.2099   LearningRate 0.0110   Epoch: 17   Global Step: 92460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:13:44,881-Speed 10350.70 samples/sec   Loss 2.1875   LearningRate 0.0110   Epoch: 17   Global Step: 92470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:13:52,679-Speed 10506.57 samples/sec   Loss 2.1948   LearningRate 0.0109   Epoch: 17   Global Step: 92480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:14:00,478-Speed 10504.53 samples/sec   Loss 2.1542   LearningRate 0.0109   Epoch: 17   Global Step: 92490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:14:08,251-Speed 10540.72 samples/sec   Loss 2.1847   LearningRate 0.0109   Epoch: 17   Global Step: 92500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:14:16,048-Speed 10508.38 samples/sec   Loss 2.1954   LearningRate 0.0109   Epoch: 17   Global Step: 92510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:14:23,831-Speed 10527.89 samples/sec   Loss 2.1696   LearningRate 0.0109   Epoch: 17   Global Step: 92520   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-01-16 11:14:31,671-Speed 10451.82 samples/sec   Loss 2.2333   LearningRate 0.0108   Epoch: 17   Global Step: 92530   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-01-16 11:14:39,471-Speed 10503.85 samples/sec   Loss 2.1789   LearningRate 0.0108   Epoch: 17   Global Step: 92540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:14:47,289-Speed 10479.80 samples/sec   Loss 2.2024   LearningRate 0.0108   Epoch: 17   Global Step: 92550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:14:55,109-Speed 10478.02 samples/sec   Loss 2.2085   LearningRate 0.0108   Epoch: 17   Global Step: 92560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:15:02,901-Speed 10513.72 samples/sec   Loss 2.2072   LearningRate 0.0108   Epoch: 17   Global Step: 92570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:15:10,739-Speed 10454.19 samples/sec   Loss 2.1880   LearningRate 0.0107   Epoch: 17   Global Step: 92580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:15:18,534-Speed 10511.16 samples/sec   Loss 2.2152   LearningRate 0.0107   Epoch: 17   Global Step: 92590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:15:26,313-Speed 10532.39 samples/sec   Loss 2.1938   LearningRate 0.0107   Epoch: 17   Global Step: 92600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:15:34,091-Speed 10535.68 samples/sec   Loss 2.1933   LearningRate 0.0107   Epoch: 17   Global Step: 92610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:15:41,897-Speed 10495.19 samples/sec   Loss 2.1880   LearningRate 0.0107   Epoch: 17   Global Step: 92620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:15:49,696-Speed 10506.25 samples/sec   Loss 2.1987   LearningRate 0.0106   Epoch: 17   Global Step: 92630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:15:57,501-Speed 10496.60 samples/sec   Loss 2.2085   LearningRate 0.0106   Epoch: 17   Global Step: 92640   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-01-16 11:16:05,311-Speed 10490.06 samples/sec   Loss 2.1870   LearningRate 0.0106   Epoch: 17   Global Step: 92650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:16:13,102-Speed 10517.90 samples/sec   Loss 2.1790   LearningRate 0.0106   Epoch: 17   Global Step: 92660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:16:20,887-Speed 10524.27 samples/sec   Loss 2.1739   LearningRate 0.0106   Epoch: 17   Global Step: 92670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:16:28,648-Speed 10556.06 samples/sec   Loss 2.1753   LearningRate 0.0106   Epoch: 17   Global Step: 92680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:16:36,451-Speed 10501.18 samples/sec   Loss 2.1547   LearningRate 0.0105   Epoch: 17   Global Step: 92690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:16:44,294-Speed 10446.14 samples/sec   Loss 2.1541   LearningRate 0.0105   Epoch: 17   Global Step: 92700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:16:52,071-Speed 10535.59 samples/sec   Loss 2.1863   LearningRate 0.0105   Epoch: 17   Global Step: 92710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:16:59,865-Speed 10511.46 samples/sec   Loss 2.1723   LearningRate 0.0105   Epoch: 17   Global Step: 92720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:17:07,664-Speed 10505.12 samples/sec   Loss 2.1621   LearningRate 0.0105   Epoch: 17   Global Step: 92730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:17:15,448-Speed 10525.15 samples/sec   Loss 2.1768   LearningRate 0.0104   Epoch: 17   Global Step: 92740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:17:23,255-Speed 10494.52 samples/sec   Loss 2.1501   LearningRate 0.0104   Epoch: 17   Global Step: 92750   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-01-16 11:17:31,063-Speed 10492.67 samples/sec   Loss 2.1764   LearningRate 0.0104   Epoch: 17   Global Step: 92760   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-01-16 11:17:38,837-Speed 10540.34 samples/sec   Loss 2.1677   LearningRate 0.0104   Epoch: 17   Global Step: 92770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:17:46,609-Speed 10542.25 samples/sec   Loss 2.1678   LearningRate 0.0104   Epoch: 17   Global Step: 92780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:17:54,393-Speed 10525.09 samples/sec   Loss 2.1805   LearningRate 0.0103   Epoch: 17   Global Step: 92790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:18:02,194-Speed 10502.32 samples/sec   Loss 2.2089   LearningRate 0.0103   Epoch: 17   Global Step: 92800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:18:09,987-Speed 10513.24 samples/sec   Loss 2.1762   LearningRate 0.0103   Epoch: 17   Global Step: 92810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:18:17,760-Speed 10539.92 samples/sec   Loss 2.1509   LearningRate 0.0103   Epoch: 17   Global Step: 92820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:18:25,533-Speed 10540.43 samples/sec   Loss 2.1656   LearningRate 0.0103   Epoch: 17   Global Step: 92830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:18:33,321-Speed 10520.92 samples/sec   Loss 2.1577   LearningRate 0.0102   Epoch: 17   Global Step: 92840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:18:41,120-Speed 10504.94 samples/sec   Loss 2.1634   LearningRate 0.0102   Epoch: 17   Global Step: 92850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:18:48,926-Speed 10495.65 samples/sec   Loss 2.1512   LearningRate 0.0102   Epoch: 17   Global Step: 92860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:18:56,737-Speed 10489.65 samples/sec   Loss 2.1776   LearningRate 0.0102   Epoch: 17   Global Step: 92870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:19:04,527-Speed 10517.64 samples/sec   Loss 2.1581   LearningRate 0.0102   Epoch: 17   Global Step: 92880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:19:12,315-Speed 10520.23 samples/sec   Loss 2.1469   LearningRate 0.0102   Epoch: 17   Global Step: 92890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:19:20,090-Speed 10537.63 samples/sec   Loss 2.1510   LearningRate 0.0101   Epoch: 17   Global Step: 92900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:19:27,898-Speed 10495.75 samples/sec   Loss 2.1429   LearningRate 0.0101   Epoch: 17   Global Step: 92910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:19:35,683-Speed 10523.69 samples/sec   Loss 2.1239   LearningRate 0.0101   Epoch: 17   Global Step: 92920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:19:43,488-Speed 10497.83 samples/sec   Loss 2.1598   LearningRate 0.0101   Epoch: 17   Global Step: 92930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:19:51,336-Speed 10439.81 samples/sec   Loss 2.1643   LearningRate 0.0101   Epoch: 17   Global Step: 92940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:19:59,116-Speed 10531.76 samples/sec   Loss 2.1499   LearningRate 0.0100   Epoch: 17   Global Step: 92950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:20:06,913-Speed 10508.73 samples/sec   Loss 2.1476   LearningRate 0.0100   Epoch: 17   Global Step: 92960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:20:14,699-Speed 10523.56 samples/sec   Loss 2.1533   LearningRate 0.0100   Epoch: 17   Global Step: 92970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:20:22,510-Speed 10491.00 samples/sec   Loss 2.1480   LearningRate 0.0100   Epoch: 17   Global Step: 92980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:20:30,299-Speed 10519.45 samples/sec   Loss 2.1509   LearningRate 0.0100   Epoch: 17   Global Step: 92990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:20:38,103-Speed 10498.16 samples/sec   Loss 2.1199   LearningRate 0.0099   Epoch: 17   Global Step: 93000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:20:45,931-Speed 10466.54 samples/sec   Loss 2.1419   LearningRate 0.0099   Epoch: 17   Global Step: 93010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:20:53,712-Speed 10530.42 samples/sec   Loss 2.1442   LearningRate 0.0099   Epoch: 17   Global Step: 93020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:21:01,501-Speed 10518.33 samples/sec   Loss 2.1410   LearningRate 0.0099   Epoch: 17   Global Step: 93030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:21:09,316-Speed 10484.58 samples/sec   Loss 2.1573   LearningRate 0.0099   Epoch: 17   Global Step: 93040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:21:17,117-Speed 10502.82 samples/sec   Loss 2.1267   LearningRate 0.0099   Epoch: 17   Global Step: 93050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:21:24,926-Speed 10491.13 samples/sec   Loss 2.1380   LearningRate 0.0098   Epoch: 17   Global Step: 93060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:21:32,719-Speed 10513.26 samples/sec   Loss 2.1329   LearningRate 0.0098   Epoch: 17   Global Step: 93070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:21:40,510-Speed 10516.84 samples/sec   Loss 2.1295   LearningRate 0.0098   Epoch: 17   Global Step: 93080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:21:48,290-Speed 10530.60 samples/sec   Loss 2.1066   LearningRate 0.0098   Epoch: 17   Global Step: 93090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:21:56,096-Speed 10496.01 samples/sec   Loss 2.1256   LearningRate 0.0098   Epoch: 17   Global Step: 93100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:22:03,908-Speed 10487.57 samples/sec   Loss 2.1451   LearningRate 0.0097   Epoch: 17   Global Step: 93110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:22:11,730-Speed 10479.34 samples/sec   Loss 2.1371   LearningRate 0.0097   Epoch: 17   Global Step: 93120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:22:19,546-Speed 10482.56 samples/sec   Loss 2.1343   LearningRate 0.0097   Epoch: 17   Global Step: 93130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:22:27,368-Speed 10473.80 samples/sec   Loss 2.1203   LearningRate 0.0097   Epoch: 17   Global Step: 93140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:22:35,161-Speed 10514.34 samples/sec   Loss 2.1358   LearningRate 0.0097   Epoch: 17   Global Step: 93150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:22:42,943-Speed 10528.69 samples/sec   Loss 2.1232   LearningRate 0.0097   Epoch: 17   Global Step: 93160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:22:50,762-Speed 10477.82 samples/sec   Loss 2.1465   LearningRate 0.0096   Epoch: 17   Global Step: 93170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:22:58,543-Speed 10528.65 samples/sec   Loss 2.1363   LearningRate 0.0096   Epoch: 17   Global Step: 93180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:23:06,351-Speed 10494.07 samples/sec   Loss 2.1483   LearningRate 0.0096   Epoch: 17   Global Step: 93190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:23:14,133-Speed 10528.00 samples/sec   Loss 2.1195   LearningRate 0.0096   Epoch: 17   Global Step: 93200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:23:21,953-Speed 10477.27 samples/sec   Loss 2.1149   LearningRate 0.0096   Epoch: 17   Global Step: 93210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:23:29,746-Speed 10513.67 samples/sec   Loss 2.1238   LearningRate 0.0095   Epoch: 17   Global Step: 93220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:23:37,534-Speed 10520.16 samples/sec   Loss 2.1201   LearningRate 0.0095   Epoch: 17   Global Step: 93230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:23:45,323-Speed 10519.36 samples/sec   Loss 2.1114   LearningRate 0.0095   Epoch: 17   Global Step: 93240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:23:53,141-Speed 10479.35 samples/sec   Loss 2.1032   LearningRate 0.0095   Epoch: 17   Global Step: 93250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:24:00,972-Speed 10462.26 samples/sec   Loss 2.1176   LearningRate 0.0095   Epoch: 17   Global Step: 93260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:24:08,779-Speed 10495.36 samples/sec   Loss 2.1173   LearningRate 0.0095   Epoch: 17   Global Step: 93270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:24:16,588-Speed 10492.06 samples/sec   Loss 2.1155   LearningRate 0.0094   Epoch: 17   Global Step: 93280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:24:24,388-Speed 10503.94 samples/sec   Loss 2.1178   LearningRate 0.0094   Epoch: 17   Global Step: 93290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:24:32,177-Speed 10518.93 samples/sec   Loss 2.1351   LearningRate 0.0094   Epoch: 17   Global Step: 93300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:24:39,988-Speed 10490.20 samples/sec   Loss 2.1333   LearningRate 0.0094   Epoch: 17   Global Step: 93310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:24:47,793-Speed 10497.70 samples/sec   Loss 2.1126   LearningRate 0.0094   Epoch: 17   Global Step: 93320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:25:10,502-Speed 3607.46 samples/sec   Loss 2.1130   LearningRate 0.0093   Epoch: 18   Global Step: 93330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:25:18,278-Speed 10536.71 samples/sec   Loss 2.1021   LearningRate 0.0093   Epoch: 18   Global Step: 93340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:25:26,048-Speed 10546.45 samples/sec   Loss 2.1271   LearningRate 0.0093   Epoch: 18   Global Step: 93350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:25:33,831-Speed 10525.62 samples/sec   Loss 2.0976   LearningRate 0.0093   Epoch: 18   Global Step: 93360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:25:41,616-Speed 10524.40 samples/sec   Loss 2.1074   LearningRate 0.0093   Epoch: 18   Global Step: 93370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:25:49,386-Speed 10544.41 samples/sec   Loss 2.0954   LearningRate 0.0093   Epoch: 18   Global Step: 93380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:25:57,172-Speed 10522.41 samples/sec   Loss 2.0894   LearningRate 0.0092   Epoch: 18   Global Step: 93390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:26:04,963-Speed 10516.23 samples/sec   Loss 2.1136   LearningRate 0.0092   Epoch: 18   Global Step: 93400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:26:12,782-Speed 10478.84 samples/sec   Loss 2.1019   LearningRate 0.0092   Epoch: 18   Global Step: 93410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:26:20,592-Speed 10490.55 samples/sec   Loss 2.0913   LearningRate 0.0092   Epoch: 18   Global Step: 93420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:26:28,391-Speed 10505.51 samples/sec   Loss 2.0710   LearningRate 0.0092   Epoch: 18   Global Step: 93430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:26:36,170-Speed 10531.06 samples/sec   Loss 2.0746   LearningRate 0.0091   Epoch: 18   Global Step: 93440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:26:43,954-Speed 10527.36 samples/sec   Loss 2.1291   LearningRate 0.0091   Epoch: 18   Global Step: 93450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:26:51,740-Speed 10522.27 samples/sec   Loss 2.1121   LearningRate 0.0091   Epoch: 18   Global Step: 93460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:26:59,529-Speed 10522.24 samples/sec   Loss 2.0729   LearningRate 0.0091   Epoch: 18   Global Step: 93470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:27:07,321-Speed 10515.19 samples/sec   Loss 2.0894   LearningRate 0.0091   Epoch: 18   Global Step: 93480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:27:15,102-Speed 10533.14 samples/sec   Loss 2.0664   LearningRate 0.0091   Epoch: 18   Global Step: 93490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:27:22,854-Speed 10568.60 samples/sec   Loss 2.0738   LearningRate 0.0090   Epoch: 18   Global Step: 93500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:27:30,633-Speed 10534.94 samples/sec   Loss 2.0648   LearningRate 0.0090   Epoch: 18   Global Step: 93510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:27:38,401-Speed 10546.80 samples/sec   Loss 2.0879   LearningRate 0.0090   Epoch: 18   Global Step: 93520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:27:46,171-Speed 10544.88 samples/sec   Loss 2.0858   LearningRate 0.0090   Epoch: 18   Global Step: 93530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:27:53,989-Speed 10479.26 samples/sec   Loss 2.0904   LearningRate 0.0090   Epoch: 18   Global Step: 93540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:28:01,783-Speed 10512.01 samples/sec   Loss 2.0726   LearningRate 0.0089   Epoch: 18   Global Step: 93550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:28:09,580-Speed 10507.54 samples/sec   Loss 2.0837   LearningRate 0.0089   Epoch: 18   Global Step: 93560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:28:17,381-Speed 10504.19 samples/sec   Loss 2.0841   LearningRate 0.0089   Epoch: 18   Global Step: 93570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:28:25,157-Speed 10536.96 samples/sec   Loss 2.0925   LearningRate 0.0089   Epoch: 18   Global Step: 93580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:28:32,953-Speed 10508.47 samples/sec   Loss 2.0831   LearningRate 0.0089   Epoch: 18   Global Step: 93590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:28:40,728-Speed 10537.38 samples/sec   Loss 2.0909   LearningRate 0.0089   Epoch: 18   Global Step: 93600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:28:48,549-Speed 10476.01 samples/sec   Loss 2.0783   LearningRate 0.0088   Epoch: 18   Global Step: 93610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:28:56,333-Speed 10526.87 samples/sec   Loss 2.0778   LearningRate 0.0088   Epoch: 18   Global Step: 93620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:29:04,104-Speed 10542.24 samples/sec   Loss 2.1039   LearningRate 0.0088   Epoch: 18   Global Step: 93630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:29:11,896-Speed 10515.28 samples/sec   Loss 2.0587   LearningRate 0.0088   Epoch: 18   Global Step: 93640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:29:19,662-Speed 10548.64 samples/sec   Loss 2.0644   LearningRate 0.0088   Epoch: 18   Global Step: 93650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:29:27,444-Speed 10528.59 samples/sec   Loss 2.0835   LearningRate 0.0088   Epoch: 18   Global Step: 93660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:29:35,222-Speed 10533.53 samples/sec   Loss 2.0833   LearningRate 0.0087   Epoch: 18   Global Step: 93670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:29:42,982-Speed 10558.91 samples/sec   Loss 2.0629   LearningRate 0.0087   Epoch: 18   Global Step: 93680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:29:50,751-Speed 10544.93 samples/sec   Loss 2.0697   LearningRate 0.0087   Epoch: 18   Global Step: 93690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:29:58,534-Speed 10527.65 samples/sec   Loss 2.0906   LearningRate 0.0087   Epoch: 18   Global Step: 93700   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-01-16 11:30:06,310-Speed 10536.54 samples/sec   Loss 2.0619   LearningRate 0.0087   Epoch: 18   Global Step: 93710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:30:14,099-Speed 10517.91 samples/sec   Loss 2.0752   LearningRate 0.0087   Epoch: 18   Global Step: 93720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:30:21,887-Speed 10520.21 samples/sec   Loss 2.0645   LearningRate 0.0086   Epoch: 18   Global Step: 93730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:30:29,698-Speed 10489.89 samples/sec   Loss 2.0543   LearningRate 0.0086   Epoch: 18   Global Step: 93740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:30:37,514-Speed 10482.69 samples/sec   Loss 2.0682   LearningRate 0.0086   Epoch: 18   Global Step: 93750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:30:45,367-Speed 10432.82 samples/sec   Loss 2.0666   LearningRate 0.0086   Epoch: 18   Global Step: 93760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:30:53,148-Speed 10530.04 samples/sec   Loss 2.0455   LearningRate 0.0086   Epoch: 18   Global Step: 93770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:31:01,000-Speed 10433.65 samples/sec   Loss 2.0712   LearningRate 0.0085   Epoch: 18   Global Step: 93780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:31:08,882-Speed 10394.97 samples/sec   Loss 2.0533   LearningRate 0.0085   Epoch: 18   Global Step: 93790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:31:16,712-Speed 10463.97 samples/sec   Loss 2.0528   LearningRate 0.0085   Epoch: 18   Global Step: 93800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:31:24,564-Speed 10434.84 samples/sec   Loss 2.0795   LearningRate 0.0085   Epoch: 18   Global Step: 93810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:31:32,389-Speed 10471.84 samples/sec   Loss 2.0688   LearningRate 0.0085   Epoch: 18   Global Step: 93820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:31:40,212-Speed 10474.31 samples/sec   Loss 2.0485   LearningRate 0.0085   Epoch: 18   Global Step: 93830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:31:48,041-Speed 10465.08 samples/sec   Loss 2.0509   LearningRate 0.0084   Epoch: 18   Global Step: 93840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:31:55,908-Speed 10413.05 samples/sec   Loss 2.0567   LearningRate 0.0084   Epoch: 18   Global Step: 93850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:32:03,735-Speed 10468.71 samples/sec   Loss 2.0637   LearningRate 0.0084   Epoch: 18   Global Step: 93860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:32:11,556-Speed 10476.03 samples/sec   Loss 2.0585   LearningRate 0.0084   Epoch: 18   Global Step: 93870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:32:19,414-Speed 10425.78 samples/sec   Loss 2.0586   LearningRate 0.0084   Epoch: 18   Global Step: 93880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:32:27,250-Speed 10456.51 samples/sec   Loss 2.0450   LearningRate 0.0084   Epoch: 18   Global Step: 93890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:32:35,077-Speed 10467.24 samples/sec   Loss 2.0346   LearningRate 0.0083   Epoch: 18   Global Step: 93900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:32:42,887-Speed 10491.05 samples/sec   Loss 2.0345   LearningRate 0.0083   Epoch: 18   Global Step: 93910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:32:50,685-Speed 10510.97 samples/sec   Loss 2.0511   LearningRate 0.0083   Epoch: 18   Global Step: 93920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:32:58,533-Speed 10439.34 samples/sec   Loss 2.0862   LearningRate 0.0083   Epoch: 18   Global Step: 93930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:33:06,337-Speed 10499.98 samples/sec   Loss 2.0272   LearningRate 0.0083   Epoch: 18   Global Step: 93940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:33:14,125-Speed 10519.09 samples/sec   Loss 2.0523   LearningRate 0.0083   Epoch: 18   Global Step: 93950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:33:21,929-Speed 10499.49 samples/sec   Loss 2.0679   LearningRate 0.0082   Epoch: 18   Global Step: 93960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:33:29,748-Speed 10477.71 samples/sec   Loss 2.0141   LearningRate 0.0082   Epoch: 18   Global Step: 93970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:33:37,557-Speed 10493.24 samples/sec   Loss 2.0520   LearningRate 0.0082   Epoch: 18   Global Step: 93980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:33:45,355-Speed 10505.60 samples/sec   Loss 2.0028   LearningRate 0.0082   Epoch: 18   Global Step: 93990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:33:53,148-Speed 10513.62 samples/sec   Loss 2.0538   LearningRate 0.0082   Epoch: 18   Global Step: 94000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:34:00,973-Speed 10470.15 samples/sec   Loss 2.0272   LearningRate 0.0082   Epoch: 18   Global Step: 94010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:34:08,758-Speed 10525.69 samples/sec   Loss 2.0288   LearningRate 0.0081   Epoch: 18   Global Step: 94020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:34:16,536-Speed 10532.84 samples/sec   Loss 2.0444   LearningRate 0.0081   Epoch: 18   Global Step: 94030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:34:24,332-Speed 10509.12 samples/sec   Loss 2.0153   LearningRate 0.0081   Epoch: 18   Global Step: 94040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:34:32,134-Speed 10503.85 samples/sec   Loss 2.0122   LearningRate 0.0081   Epoch: 18   Global Step: 94050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:34:39,949-Speed 10485.23 samples/sec   Loss 2.0263   LearningRate 0.0081   Epoch: 18   Global Step: 94060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:34:47,769-Speed 10475.81 samples/sec   Loss 2.0258   LearningRate 0.0081   Epoch: 18   Global Step: 94070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:34:55,573-Speed 10506.66 samples/sec   Loss 2.0054   LearningRate 0.0080   Epoch: 18   Global Step: 94080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:35:03,395-Speed 10474.88 samples/sec   Loss 2.0190   LearningRate 0.0080   Epoch: 18   Global Step: 94090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:35:11,193-Speed 10506.56 samples/sec   Loss 2.0191   LearningRate 0.0080   Epoch: 18   Global Step: 94100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:35:18,966-Speed 10540.17 samples/sec   Loss 2.0036   LearningRate 0.0080   Epoch: 18   Global Step: 94110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:35:26,752-Speed 10523.30 samples/sec   Loss 2.0009   LearningRate 0.0080   Epoch: 18   Global Step: 94120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:35:34,521-Speed 10545.12 samples/sec   Loss 2.0321   LearningRate 0.0080   Epoch: 18   Global Step: 94130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:35:42,314-Speed 10514.33 samples/sec   Loss 2.0483   LearningRate 0.0079   Epoch: 18   Global Step: 94140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:35:50,103-Speed 10518.86 samples/sec   Loss 2.0227   LearningRate 0.0079   Epoch: 18   Global Step: 94150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:35:57,905-Speed 10500.53 samples/sec   Loss 2.0338   LearningRate 0.0079   Epoch: 18   Global Step: 94160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:36:05,724-Speed 10479.11 samples/sec   Loss 2.0053   LearningRate 0.0079   Epoch: 18   Global Step: 94170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:36:13,515-Speed 10515.44 samples/sec   Loss 2.0114   LearningRate 0.0079   Epoch: 18   Global Step: 94180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:36:21,306-Speed 10516.71 samples/sec   Loss 1.9967   LearningRate 0.0079   Epoch: 18   Global Step: 94190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:36:29,110-Speed 10497.85 samples/sec   Loss 2.0130   LearningRate 0.0078   Epoch: 18   Global Step: 94200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:36:36,912-Speed 10501.96 samples/sec   Loss 2.0242   LearningRate 0.0078   Epoch: 18   Global Step: 94210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:36:44,717-Speed 10497.05 samples/sec   Loss 2.0110   LearningRate 0.0078   Epoch: 18   Global Step: 94220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:36:52,513-Speed 10508.51 samples/sec   Loss 2.0145   LearningRate 0.0078   Epoch: 18   Global Step: 94230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:37:00,295-Speed 10528.63 samples/sec   Loss 2.0008   LearningRate 0.0078   Epoch: 18   Global Step: 94240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:37:08,122-Speed 10468.36 samples/sec   Loss 1.9943   LearningRate 0.0078   Epoch: 18   Global Step: 94250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:37:15,920-Speed 10506.22 samples/sec   Loss 2.0103   LearningRate 0.0077   Epoch: 18   Global Step: 94260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:37:23,735-Speed 10483.01 samples/sec   Loss 2.0032   LearningRate 0.0077   Epoch: 18   Global Step: 94270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:37:31,548-Speed 10487.87 samples/sec   Loss 2.0026   LearningRate 0.0077   Epoch: 18   Global Step: 94280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:37:39,347-Speed 10505.33 samples/sec   Loss 1.9840   LearningRate 0.0077   Epoch: 18   Global Step: 94290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:37:47,161-Speed 10486.84 samples/sec   Loss 2.0045   LearningRate 0.0077   Epoch: 18   Global Step: 94300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:37:54,964-Speed 10499.93 samples/sec   Loss 1.9948   LearningRate 0.0077   Epoch: 18   Global Step: 94310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:38:02,768-Speed 10498.31 samples/sec   Loss 2.0020   LearningRate 0.0076   Epoch: 18   Global Step: 94320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:38:10,552-Speed 10526.57 samples/sec   Loss 1.9984   LearningRate 0.0076   Epoch: 18   Global Step: 94330   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-01-16 11:38:18,352-Speed 10503.84 samples/sec   Loss 1.9950   LearningRate 0.0076   Epoch: 18   Global Step: 94340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:38:26,169-Speed 10481.04 samples/sec   Loss 1.9827   LearningRate 0.0076   Epoch: 18   Global Step: 94350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:38:33,952-Speed 10526.44 samples/sec   Loss 2.0102   LearningRate 0.0076   Epoch: 18   Global Step: 94360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:38:41,755-Speed 10500.15 samples/sec   Loss 1.9867   LearningRate 0.0076   Epoch: 18   Global Step: 94370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:38:49,542-Speed 10521.38 samples/sec   Loss 1.9835   LearningRate 0.0075   Epoch: 18   Global Step: 94380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:38:57,336-Speed 10511.36 samples/sec   Loss 1.9953   LearningRate 0.0075   Epoch: 18   Global Step: 94390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:39:05,169-Speed 10460.36 samples/sec   Loss 1.9808   LearningRate 0.0075   Epoch: 18   Global Step: 94400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:39:12,964-Speed 10511.38 samples/sec   Loss 1.9988   LearningRate 0.0075   Epoch: 18   Global Step: 94410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:39:20,743-Speed 10532.39 samples/sec   Loss 1.9890   LearningRate 0.0075   Epoch: 18   Global Step: 94420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:39:28,535-Speed 10514.66 samples/sec   Loss 1.9804   LearningRate 0.0075   Epoch: 18   Global Step: 94430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:39:36,321-Speed 10522.44 samples/sec   Loss 1.9934   LearningRate 0.0074   Epoch: 18   Global Step: 94440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:39:44,116-Speed 10511.13 samples/sec   Loss 2.0018   LearningRate 0.0074   Epoch: 18   Global Step: 94450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:39:51,908-Speed 10515.28 samples/sec   Loss 1.9910   LearningRate 0.0074   Epoch: 18   Global Step: 94460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:39:59,709-Speed 10503.02 samples/sec   Loss 1.9868   LearningRate 0.0074   Epoch: 18   Global Step: 94470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:40:07,501-Speed 10515.37 samples/sec   Loss 1.9911   LearningRate 0.0074   Epoch: 18   Global Step: 94480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:40:15,288-Speed 10519.91 samples/sec   Loss 1.9867   LearningRate 0.0074   Epoch: 18   Global Step: 94490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:40:23,078-Speed 10518.02 samples/sec   Loss 1.9901   LearningRate 0.0073   Epoch: 18   Global Step: 94500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:40:30,876-Speed 10507.79 samples/sec   Loss 1.9818   LearningRate 0.0073   Epoch: 18   Global Step: 94510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:40:38,678-Speed 10500.11 samples/sec   Loss 1.9850   LearningRate 0.0073   Epoch: 18   Global Step: 94520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:40:46,478-Speed 10504.08 samples/sec   Loss 1.9755   LearningRate 0.0073   Epoch: 18   Global Step: 94530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:40:54,252-Speed 10539.35 samples/sec   Loss 1.9689   LearningRate 0.0073   Epoch: 18   Global Step: 94540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:41:02,052-Speed 10504.22 samples/sec   Loss 1.9699   LearningRate 0.0073   Epoch: 18   Global Step: 94550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:41:09,839-Speed 10520.67 samples/sec   Loss 1.9959   LearningRate 0.0073   Epoch: 18   Global Step: 94560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:41:17,612-Speed 10541.71 samples/sec   Loss 1.9801   LearningRate 0.0072   Epoch: 18   Global Step: 94570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:41:25,420-Speed 10493.03 samples/sec   Loss 1.9561   LearningRate 0.0072   Epoch: 18   Global Step: 94580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:41:33,200-Speed 10530.52 samples/sec   Loss 1.9867   LearningRate 0.0072   Epoch: 18   Global Step: 94590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:41:41,006-Speed 10496.97 samples/sec   Loss 1.9706   LearningRate 0.0072   Epoch: 18   Global Step: 94600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:41:48,808-Speed 10500.98 samples/sec   Loss 1.9798   LearningRate 0.0072   Epoch: 18   Global Step: 94610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:41:56,624-Speed 10484.64 samples/sec   Loss 1.9660   LearningRate 0.0072   Epoch: 18   Global Step: 94620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:42:04,464-Speed 10450.77 samples/sec   Loss 1.9917   LearningRate 0.0071   Epoch: 18   Global Step: 94630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:42:12,301-Speed 10455.10 samples/sec   Loss 1.9685   LearningRate 0.0071   Epoch: 18   Global Step: 94640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:42:20,099-Speed 10506.10 samples/sec   Loss 1.9710   LearningRate 0.0071   Epoch: 18   Global Step: 94650   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-01-16 11:42:27,886-Speed 10525.19 samples/sec   Loss 1.9808   LearningRate 0.0071   Epoch: 18   Global Step: 94660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:42:35,668-Speed 10528.95 samples/sec   Loss 1.9569   LearningRate 0.0071   Epoch: 18   Global Step: 94670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:42:43,455-Speed 10521.85 samples/sec   Loss 1.9499   LearningRate 0.0071   Epoch: 18   Global Step: 94680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:42:51,250-Speed 10513.27 samples/sec   Loss 1.9595   LearningRate 0.0070   Epoch: 18   Global Step: 94690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:42:59,073-Speed 10474.22 samples/sec   Loss 1.9897   LearningRate 0.0070   Epoch: 18   Global Step: 94700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:43:06,858-Speed 10523.55 samples/sec   Loss 1.9620   LearningRate 0.0070   Epoch: 18   Global Step: 94710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:43:14,628-Speed 10544.72 samples/sec   Loss 1.9789   LearningRate 0.0070   Epoch: 18   Global Step: 94720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:43:22,420-Speed 10514.63 samples/sec   Loss 1.9440   LearningRate 0.0070   Epoch: 18   Global Step: 94730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:43:30,208-Speed 10519.88 samples/sec   Loss 1.9717   LearningRate 0.0070   Epoch: 18   Global Step: 94740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:43:38,002-Speed 10512.24 samples/sec   Loss 1.9525   LearningRate 0.0070   Epoch: 18   Global Step: 94750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:43:45,791-Speed 10519.21 samples/sec   Loss 1.9715   LearningRate 0.0069   Epoch: 18   Global Step: 94760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:43:53,643-Speed 10434.61 samples/sec   Loss 1.9557   LearningRate 0.0069   Epoch: 18   Global Step: 94770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:44:01,452-Speed 10490.80 samples/sec   Loss 1.9845   LearningRate 0.0069   Epoch: 18   Global Step: 94780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:44:09,275-Speed 10476.49 samples/sec   Loss 1.9543   LearningRate 0.0069   Epoch: 18   Global Step: 94790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:44:17,100-Speed 10469.88 samples/sec   Loss 1.9624   LearningRate 0.0069   Epoch: 18   Global Step: 94800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:44:24,907-Speed 10494.35 samples/sec   Loss 1.9868   LearningRate 0.0069   Epoch: 18   Global Step: 94810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:44:32,697-Speed 10518.50 samples/sec   Loss 1.9364   LearningRate 0.0068   Epoch: 18   Global Step: 94820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:44:40,497-Speed 10503.56 samples/sec   Loss 1.9481   LearningRate 0.0068   Epoch: 18   Global Step: 94830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:44:48,272-Speed 10537.21 samples/sec   Loss 1.9384   LearningRate 0.0068   Epoch: 18   Global Step: 94840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:44:56,116-Speed 10445.22 samples/sec   Loss 1.9467   LearningRate 0.0068   Epoch: 18   Global Step: 94850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:45:03,913-Speed 10517.96 samples/sec   Loss 1.9616   LearningRate 0.0068   Epoch: 18   Global Step: 94860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:45:11,714-Speed 10509.09 samples/sec   Loss 1.9314   LearningRate 0.0068   Epoch: 18   Global Step: 94870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:45:19,517-Speed 10499.78 samples/sec   Loss 1.9440   LearningRate 0.0068   Epoch: 18   Global Step: 94880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:45:27,311-Speed 10512.31 samples/sec   Loss 1.9531   LearningRate 0.0067   Epoch: 18   Global Step: 94890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:45:35,109-Speed 10506.48 samples/sec   Loss 1.9365   LearningRate 0.0067   Epoch: 18   Global Step: 94900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:45:42,926-Speed 10481.64 samples/sec   Loss 1.9478   LearningRate 0.0067   Epoch: 18   Global Step: 94910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:45:50,713-Speed 10521.31 samples/sec   Loss 1.9517   LearningRate 0.0067   Epoch: 18   Global Step: 94920   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-01-16 11:45:58,502-Speed 10517.55 samples/sec   Loss 1.9705   LearningRate 0.0067   Epoch: 18   Global Step: 94930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:46:06,288-Speed 10523.17 samples/sec   Loss 1.9399   LearningRate 0.0067   Epoch: 18   Global Step: 94940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:46:14,160-Speed 10408.40 samples/sec   Loss 1.9361   LearningRate 0.0066   Epoch: 18   Global Step: 94950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:46:21,998-Speed 10458.52 samples/sec   Loss 1.9596   LearningRate 0.0066   Epoch: 18   Global Step: 94960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:46:29,780-Speed 10528.36 samples/sec   Loss 1.9125   LearningRate 0.0066   Epoch: 18   Global Step: 94970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:46:37,570-Speed 10528.29 samples/sec   Loss 1.9387   LearningRate 0.0066   Epoch: 18   Global Step: 94980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:46:45,350-Speed 10530.54 samples/sec   Loss 1.9421   LearningRate 0.0066   Epoch: 18   Global Step: 94990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:46:53,193-Speed 10446.53 samples/sec   Loss 1.9261   LearningRate 0.0066   Epoch: 18   Global Step: 95000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:47:01,000-Speed 10494.42 samples/sec   Loss 1.9441   LearningRate 0.0066   Epoch: 18   Global Step: 95010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:47:08,795-Speed 10510.58 samples/sec   Loss 1.9397   LearningRate 0.0065   Epoch: 18   Global Step: 95020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:47:16,613-Speed 10481.03 samples/sec   Loss 1.9245   LearningRate 0.0065   Epoch: 18   Global Step: 95030   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-01-16 11:47:24,450-Speed 10457.95 samples/sec   Loss 1.9199   LearningRate 0.0065   Epoch: 18   Global Step: 95040   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-01-16 11:47:32,286-Speed 10454.86 samples/sec   Loss 1.9138   LearningRate 0.0065   Epoch: 18   Global Step: 95050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:47:40,086-Speed 10504.25 samples/sec   Loss 1.9248   LearningRate 0.0065   Epoch: 18   Global Step: 95060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:47:47,878-Speed 10515.02 samples/sec   Loss 1.9239   LearningRate 0.0065   Epoch: 18   Global Step: 95070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:47:55,681-Speed 10499.90 samples/sec   Loss 1.9360   LearningRate 0.0065   Epoch: 18   Global Step: 95080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:48:03,476-Speed 10510.07 samples/sec   Loss 1.9227   LearningRate 0.0064   Epoch: 18   Global Step: 95090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:48:11,328-Speed 10434.83 samples/sec   Loss 1.9164   LearningRate 0.0064   Epoch: 18   Global Step: 95100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:48:19,153-Speed 10470.41 samples/sec   Loss 1.9129   LearningRate 0.0064   Epoch: 18   Global Step: 95110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:48:26,966-Speed 10485.85 samples/sec   Loss 1.9264   LearningRate 0.0064   Epoch: 18   Global Step: 95120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:48:34,791-Speed 10471.12 samples/sec   Loss 1.9366   LearningRate 0.0064   Epoch: 18   Global Step: 95130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:48:42,604-Speed 10485.89 samples/sec   Loss 1.9365   LearningRate 0.0064   Epoch: 18   Global Step: 95140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:48:50,376-Speed 10542.41 samples/sec   Loss 1.9236   LearningRate 0.0063   Epoch: 18   Global Step: 95150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:48:58,141-Speed 10551.22 samples/sec   Loss 1.9256   LearningRate 0.0063   Epoch: 18   Global Step: 95160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:49:05,955-Speed 10484.91 samples/sec   Loss 1.9198   LearningRate 0.0063   Epoch: 18   Global Step: 95170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:49:13,738-Speed 10526.08 samples/sec   Loss 1.8868   LearningRate 0.0063   Epoch: 18   Global Step: 95180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:49:21,547-Speed 10493.47 samples/sec   Loss 1.9208   LearningRate 0.0063   Epoch: 18   Global Step: 95190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:49:29,323-Speed 10535.67 samples/sec   Loss 1.8966   LearningRate 0.0063   Epoch: 18   Global Step: 95200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:49:37,117-Speed 10512.59 samples/sec   Loss 1.9209   LearningRate 0.0063   Epoch: 18   Global Step: 95210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:49:44,902-Speed 10524.14 samples/sec   Loss 1.9209   LearningRate 0.0062   Epoch: 18   Global Step: 95220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:49:52,680-Speed 10533.17 samples/sec   Loss 1.9200   LearningRate 0.0062   Epoch: 18   Global Step: 95230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:50:00,467-Speed 10520.90 samples/sec   Loss 1.9306   LearningRate 0.0062   Epoch: 18   Global Step: 95240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:50:08,301-Speed 10459.07 samples/sec   Loss 1.9111   LearningRate 0.0062   Epoch: 18   Global Step: 95250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:50:16,142-Speed 10449.12 samples/sec   Loss 1.9175   LearningRate 0.0062   Epoch: 18   Global Step: 95260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:50:23,935-Speed 10513.30 samples/sec   Loss 1.9235   LearningRate 0.0062   Epoch: 18   Global Step: 95270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:50:31,736-Speed 10502.46 samples/sec   Loss 1.9147   LearningRate 0.0062   Epoch: 18   Global Step: 95280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:50:39,533-Speed 10508.05 samples/sec   Loss 1.9170   LearningRate 0.0061   Epoch: 18   Global Step: 95290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:50:47,340-Speed 10494.17 samples/sec   Loss 1.9045   LearningRate 0.0061   Epoch: 18   Global Step: 95300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:50:55,106-Speed 10549.98 samples/sec   Loss 1.8929   LearningRate 0.0061   Epoch: 18   Global Step: 95310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:51:02,903-Speed 10508.45 samples/sec   Loss 1.8839   LearningRate 0.0061   Epoch: 18   Global Step: 95320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:51:10,694-Speed 10515.34 samples/sec   Loss 1.9298   LearningRate 0.0061   Epoch: 18   Global Step: 95330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:51:18,496-Speed 10501.78 samples/sec   Loss 1.8918   LearningRate 0.0061   Epoch: 18   Global Step: 95340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:51:26,306-Speed 10490.53 samples/sec   Loss 1.8896   LearningRate 0.0061   Epoch: 18   Global Step: 95350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:51:34,124-Speed 10480.42 samples/sec   Loss 1.9016   LearningRate 0.0060   Epoch: 18   Global Step: 95360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:51:41,911-Speed 10521.48 samples/sec   Loss 1.8830   LearningRate 0.0060   Epoch: 18   Global Step: 95370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:51:49,736-Speed 10470.30 samples/sec   Loss 1.8744   LearningRate 0.0060   Epoch: 18   Global Step: 95380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:51:57,522-Speed 10522.48 samples/sec   Loss 1.8994   LearningRate 0.0060   Epoch: 18   Global Step: 95390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:52:05,301-Speed 10531.67 samples/sec   Loss 1.9155   LearningRate 0.0060   Epoch: 18   Global Step: 95400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:52:13,109-Speed 10493.72 samples/sec   Loss 1.8857   LearningRate 0.0060   Epoch: 18   Global Step: 95410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:52:20,895-Speed 10524.30 samples/sec   Loss 1.8951   LearningRate 0.0060   Epoch: 18   Global Step: 95420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:52:28,676-Speed 10528.64 samples/sec   Loss 1.8903   LearningRate 0.0059   Epoch: 18   Global Step: 95430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:52:36,461-Speed 10523.02 samples/sec   Loss 1.9031   LearningRate 0.0059   Epoch: 18   Global Step: 95440   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-01-16 11:52:44,253-Speed 10515.02 samples/sec   Loss 1.8737   LearningRate 0.0059   Epoch: 18   Global Step: 95450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:52:52,057-Speed 10498.74 samples/sec   Loss 1.8880   LearningRate 0.0059   Epoch: 18   Global Step: 95460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:52:59,857-Speed 10503.96 samples/sec   Loss 1.8750   LearningRate 0.0059   Epoch: 18   Global Step: 95470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:53:07,662-Speed 10496.78 samples/sec   Loss 1.8643   LearningRate 0.0059   Epoch: 18   Global Step: 95480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:53:15,450-Speed 10520.97 samples/sec   Loss 1.8731   LearningRate 0.0058   Epoch: 18   Global Step: 95490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:53:23,242-Speed 10514.40 samples/sec   Loss 1.8916   LearningRate 0.0058   Epoch: 18   Global Step: 95500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:53:31,046-Speed 10498.10 samples/sec   Loss 1.9026   LearningRate 0.0058   Epoch: 18   Global Step: 95510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:53:38,878-Speed 10461.26 samples/sec   Loss 1.8786   LearningRate 0.0058   Epoch: 18   Global Step: 95520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:53:46,659-Speed 10530.44 samples/sec   Loss 1.8664   LearningRate 0.0058   Epoch: 18   Global Step: 95530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:53:54,434-Speed 10537.65 samples/sec   Loss 1.8691   LearningRate 0.0058   Epoch: 18   Global Step: 95540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:54:02,221-Speed 10521.22 samples/sec   Loss 1.8754   LearningRate 0.0058   Epoch: 18   Global Step: 95550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:54:10,028-Speed 10494.78 samples/sec   Loss 1.8831   LearningRate 0.0058   Epoch: 18   Global Step: 95560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:54:17,871-Speed 10446.31 samples/sec   Loss 1.8881   LearningRate 0.0057   Epoch: 18   Global Step: 95570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:54:25,656-Speed 10524.09 samples/sec   Loss 1.8788   LearningRate 0.0057   Epoch: 18   Global Step: 95580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:54:33,457-Speed 10502.19 samples/sec   Loss 1.8818   LearningRate 0.0057   Epoch: 18   Global Step: 95590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:54:41,244-Speed 10522.97 samples/sec   Loss 1.9130   LearningRate 0.0057   Epoch: 18   Global Step: 95600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:54:49,021-Speed 10535.29 samples/sec   Loss 1.8676   LearningRate 0.0057   Epoch: 18   Global Step: 95610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:54:56,798-Speed 10534.64 samples/sec   Loss 1.8717   LearningRate 0.0057   Epoch: 18   Global Step: 95620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:55:04,589-Speed 10515.77 samples/sec   Loss 1.8747   LearningRate 0.0057   Epoch: 18   Global Step: 95630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:55:12,395-Speed 10496.62 samples/sec   Loss 1.8705   LearningRate 0.0056   Epoch: 18   Global Step: 95640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:55:20,198-Speed 10500.49 samples/sec   Loss 1.8843   LearningRate 0.0056   Epoch: 18   Global Step: 95650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:55:28,054-Speed 10428.28 samples/sec   Loss 1.8775   LearningRate 0.0056   Epoch: 18   Global Step: 95660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:55:35,882-Speed 10465.85 samples/sec   Loss 1.8744   LearningRate 0.0056   Epoch: 18   Global Step: 95670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:55:43,686-Speed 10499.89 samples/sec   Loss 1.8505   LearningRate 0.0056   Epoch: 18   Global Step: 95680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:55:51,493-Speed 10494.40 samples/sec   Loss 1.8694   LearningRate 0.0056   Epoch: 18   Global Step: 95690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:55:59,298-Speed 10496.96 samples/sec   Loss 1.8725   LearningRate 0.0056   Epoch: 18   Global Step: 95700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:56:07,081-Speed 10527.97 samples/sec   Loss 1.8870   LearningRate 0.0055   Epoch: 18   Global Step: 95710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:56:14,878-Speed 10508.63 samples/sec   Loss 1.8684   LearningRate 0.0055   Epoch: 18   Global Step: 95720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:56:22,676-Speed 10505.82 samples/sec   Loss 1.8937   LearningRate 0.0055   Epoch: 18   Global Step: 95730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:56:30,452-Speed 10536.11 samples/sec   Loss 1.8676   LearningRate 0.0055   Epoch: 18   Global Step: 95740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:56:38,234-Speed 10528.29 samples/sec   Loss 1.8897   LearningRate 0.0055   Epoch: 18   Global Step: 95750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:56:46,013-Speed 10533.05 samples/sec   Loss 1.8731   LearningRate 0.0055   Epoch: 18   Global Step: 95760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:56:53,795-Speed 10527.95 samples/sec   Loss 1.8512   LearningRate 0.0055   Epoch: 18   Global Step: 95770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:57:01,597-Speed 10501.22 samples/sec   Loss 1.8600   LearningRate 0.0054   Epoch: 18   Global Step: 95780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:57:09,442-Speed 10444.13 samples/sec   Loss 1.8707   LearningRate 0.0054   Epoch: 18   Global Step: 95790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 11:57:17,232-Speed 10517.44 samples/sec   Loss 1.8497   LearningRate 0.0054   Epoch: 18   Global Step: 95800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:57:25,020-Speed 10520.13 samples/sec   Loss 1.8347   LearningRate 0.0054   Epoch: 18   Global Step: 95810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:57:32,810-Speed 10517.23 samples/sec   Loss 1.8765   LearningRate 0.0054   Epoch: 18   Global Step: 95820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:57:40,586-Speed 10537.48 samples/sec   Loss 1.8498   LearningRate 0.0054   Epoch: 18   Global Step: 95830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:57:48,368-Speed 10527.59 samples/sec   Loss 1.8420   LearningRate 0.0054   Epoch: 18   Global Step: 95840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:57:56,147-Speed 10531.83 samples/sec   Loss 1.8411   LearningRate 0.0053   Epoch: 18   Global Step: 95850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:58:03,925-Speed 10533.50 samples/sec   Loss 1.8598   LearningRate 0.0053   Epoch: 18   Global Step: 95860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:58:11,705-Speed 10532.04 samples/sec   Loss 1.8385   LearningRate 0.0053   Epoch: 18   Global Step: 95870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:58:19,500-Speed 10510.83 samples/sec   Loss 1.8649   LearningRate 0.0053   Epoch: 18   Global Step: 95880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:58:27,280-Speed 10530.45 samples/sec   Loss 1.8431   LearningRate 0.0053   Epoch: 18   Global Step: 95890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:58:35,081-Speed 10505.19 samples/sec   Loss 1.8532   LearningRate 0.0053   Epoch: 18   Global Step: 95900   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-01-16 11:58:42,904-Speed 10474.70 samples/sec   Loss 1.8423   LearningRate 0.0053   Epoch: 18   Global Step: 95910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:58:50,673-Speed 10544.51 samples/sec   Loss 1.8521   LearningRate 0.0053   Epoch: 18   Global Step: 95920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:58:58,453-Speed 10530.62 samples/sec   Loss 1.8624   LearningRate 0.0052   Epoch: 18   Global Step: 95930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:59:06,257-Speed 10499.94 samples/sec   Loss 1.8361   LearningRate 0.0052   Epoch: 18   Global Step: 95940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:59:14,058-Speed 10502.31 samples/sec   Loss 1.8567   LearningRate 0.0052   Epoch: 18   Global Step: 95950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:59:21,848-Speed 10517.47 samples/sec   Loss 1.8578   LearningRate 0.0052   Epoch: 18   Global Step: 95960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:59:29,622-Speed 10544.55 samples/sec   Loss 1.8303   LearningRate 0.0052   Epoch: 18   Global Step: 95970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:59:37,419-Speed 10508.22 samples/sec   Loss 1.8422   LearningRate 0.0052   Epoch: 18   Global Step: 95980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:59:45,209-Speed 10517.78 samples/sec   Loss 1.8446   LearningRate 0.0052   Epoch: 18   Global Step: 95990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 11:59:53,018-Speed 10494.26 samples/sec   Loss 1.8311   LearningRate 0.0051   Epoch: 18   Global Step: 96000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:00:00,821-Speed 10504.43 samples/sec   Loss 1.8422   LearningRate 0.0051   Epoch: 18   Global Step: 96010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:00:08,616-Speed 10516.65 samples/sec   Loss 1.8293   LearningRate 0.0051   Epoch: 18   Global Step: 96020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:00:16,389-Speed 10539.61 samples/sec   Loss 1.8369   LearningRate 0.0051   Epoch: 18   Global Step: 96030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:00:24,178-Speed 10519.74 samples/sec   Loss 1.8292   LearningRate 0.0051   Epoch: 18   Global Step: 96040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:00:31,988-Speed 10491.07 samples/sec   Loss 1.8497   LearningRate 0.0051   Epoch: 18   Global Step: 96050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:00:39,776-Speed 10520.16 samples/sec   Loss 1.8379   LearningRate 0.0051   Epoch: 18   Global Step: 96060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:00:47,555-Speed 10532.77 samples/sec   Loss 1.8294   LearningRate 0.0051   Epoch: 18   Global Step: 96070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:00:55,335-Speed 10530.92 samples/sec   Loss 1.8272   LearningRate 0.0050   Epoch: 18   Global Step: 96080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:01:03,115-Speed 10531.33 samples/sec   Loss 1.8430   LearningRate 0.0050   Epoch: 18   Global Step: 96090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:01:10,903-Speed 10519.49 samples/sec   Loss 1.8433   LearningRate 0.0050   Epoch: 18   Global Step: 96100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:01:18,698-Speed 10510.89 samples/sec   Loss 1.8422   LearningRate 0.0050   Epoch: 18   Global Step: 96110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:01:26,499-Speed 10501.63 samples/sec   Loss 1.8496   LearningRate 0.0050   Epoch: 18   Global Step: 96120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:01:34,321-Speed 10475.63 samples/sec   Loss 1.8481   LearningRate 0.0050   Epoch: 18   Global Step: 96130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:01:42,183-Speed 10422.27 samples/sec   Loss 1.8407   LearningRate 0.0050   Epoch: 18   Global Step: 96140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:01:49,972-Speed 10517.13 samples/sec   Loss 1.8324   LearningRate 0.0049   Epoch: 18   Global Step: 96150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:01:57,778-Speed 10497.56 samples/sec   Loss 1.8455   LearningRate 0.0049   Epoch: 18   Global Step: 96160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:02:05,561-Speed 10527.01 samples/sec   Loss 1.8345   LearningRate 0.0049   Epoch: 18   Global Step: 96170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:02:13,350-Speed 10518.22 samples/sec   Loss 1.8341   LearningRate 0.0049   Epoch: 18   Global Step: 96180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:02:21,139-Speed 10519.14 samples/sec   Loss 1.8323   LearningRate 0.0049   Epoch: 18   Global Step: 96190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:02:28,921-Speed 10528.49 samples/sec   Loss 1.8172   LearningRate 0.0049   Epoch: 18   Global Step: 96200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:02:36,712-Speed 10515.42 samples/sec   Loss 1.8204   LearningRate 0.0049   Epoch: 18   Global Step: 96210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:02:44,536-Speed 10471.81 samples/sec   Loss 1.8203   LearningRate 0.0049   Epoch: 18   Global Step: 96220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:02:52,315-Speed 10532.75 samples/sec   Loss 1.8165   LearningRate 0.0048   Epoch: 18   Global Step: 96230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:03:00,113-Speed 10507.24 samples/sec   Loss 1.8250   LearningRate 0.0048   Epoch: 18   Global Step: 96240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:03:07,914-Speed 10503.40 samples/sec   Loss 1.8296   LearningRate 0.0048   Epoch: 18   Global Step: 96250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:03:15,699-Speed 10524.57 samples/sec   Loss 1.8035   LearningRate 0.0048   Epoch: 18   Global Step: 96260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:03:23,511-Speed 10487.69 samples/sec   Loss 1.8191   LearningRate 0.0048   Epoch: 18   Global Step: 96270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:03:31,294-Speed 10527.17 samples/sec   Loss 1.8497   LearningRate 0.0048   Epoch: 18   Global Step: 96280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:03:39,055-Speed 10556.36 samples/sec   Loss 1.8259   LearningRate 0.0048   Epoch: 18   Global Step: 96290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:03:46,823-Speed 10546.71 samples/sec   Loss 1.8229   LearningRate 0.0048   Epoch: 18   Global Step: 96300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:03:54,599-Speed 10539.65 samples/sec   Loss 1.8195   LearningRate 0.0047   Epoch: 18   Global Step: 96310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:04:02,421-Speed 10475.24 samples/sec   Loss 1.8071   LearningRate 0.0047   Epoch: 18   Global Step: 96320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:04:10,218-Speed 10507.33 samples/sec   Loss 1.8017   LearningRate 0.0047   Epoch: 18   Global Step: 96330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:04:17,995-Speed 10539.40 samples/sec   Loss 1.8267   LearningRate 0.0047   Epoch: 18   Global Step: 96340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:04:25,783-Speed 10520.84 samples/sec   Loss 1.7946   LearningRate 0.0047   Epoch: 18   Global Step: 96350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:04:33,581-Speed 10508.02 samples/sec   Loss 1.8064   LearningRate 0.0047   Epoch: 18   Global Step: 96360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:04:41,395-Speed 10484.52 samples/sec   Loss 1.8147   LearningRate 0.0047   Epoch: 18   Global Step: 96370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:04:49,185-Speed 10516.87 samples/sec   Loss 1.8148   LearningRate 0.0046   Epoch: 18   Global Step: 96380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:04:56,970-Speed 10525.14 samples/sec   Loss 1.7920   LearningRate 0.0046   Epoch: 18   Global Step: 96390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:05:04,786-Speed 10482.08 samples/sec   Loss 1.7928   LearningRate 0.0046   Epoch: 18   Global Step: 96400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:05:12,576-Speed 10517.75 samples/sec   Loss 1.8061   LearningRate 0.0046   Epoch: 18   Global Step: 96410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:05:20,367-Speed 10515.01 samples/sec   Loss 1.8019   LearningRate 0.0046   Epoch: 18   Global Step: 96420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:05:28,217-Speed 10438.29 samples/sec   Loss 1.8140   LearningRate 0.0046   Epoch: 18   Global Step: 96430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:05:36,004-Speed 10521.73 samples/sec   Loss 1.8164   LearningRate 0.0046   Epoch: 18   Global Step: 96440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:05:43,816-Speed 10487.55 samples/sec   Loss 1.8277   LearningRate 0.0046   Epoch: 18   Global Step: 96450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:05:51,612-Speed 10508.88 samples/sec   Loss 1.8026   LearningRate 0.0045   Epoch: 18   Global Step: 96460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:05:59,397-Speed 10527.09 samples/sec   Loss 1.8034   LearningRate 0.0045   Epoch: 18   Global Step: 96470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:06:07,189-Speed 10515.00 samples/sec   Loss 1.7755   LearningRate 0.0045   Epoch: 18   Global Step: 96480   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-01-16 12:06:14,991-Speed 10500.78 samples/sec   Loss 1.8160   LearningRate 0.0045   Epoch: 18   Global Step: 96490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:06:22,807-Speed 10482.68 samples/sec   Loss 1.7819   LearningRate 0.0045   Epoch: 18   Global Step: 96500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:06:30,614-Speed 10497.47 samples/sec   Loss 1.7903   LearningRate 0.0045   Epoch: 18   Global Step: 96510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:06:38,436-Speed 10475.84 samples/sec   Loss 1.7878   LearningRate 0.0045   Epoch: 18   Global Step: 96520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:06:46,227-Speed 10516.22 samples/sec   Loss 1.7973   LearningRate 0.0045   Epoch: 18   Global Step: 96530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:06:54,044-Speed 10480.48 samples/sec   Loss 1.7853   LearningRate 0.0044   Epoch: 18   Global Step: 96540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:07:01,832-Speed 10520.63 samples/sec   Loss 1.7932   LearningRate 0.0044   Epoch: 18   Global Step: 96550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:07:09,636-Speed 10499.22 samples/sec   Loss 1.7783   LearningRate 0.0044   Epoch: 18   Global Step: 96560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:07:17,434-Speed 10507.18 samples/sec   Loss 1.7862   LearningRate 0.0044   Epoch: 18   Global Step: 96570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:07:25,225-Speed 10516.14 samples/sec   Loss 1.7840   LearningRate 0.0044   Epoch: 18   Global Step: 96580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:07:33,026-Speed 10502.26 samples/sec   Loss 1.7858   LearningRate 0.0044   Epoch: 18   Global Step: 96590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:07:40,860-Speed 10458.48 samples/sec   Loss 1.7941   LearningRate 0.0044   Epoch: 18   Global Step: 96600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:07:48,676-Speed 10482.19 samples/sec   Loss 1.7854   LearningRate 0.0044   Epoch: 18   Global Step: 96610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:07:56,480-Speed 10499.99 samples/sec   Loss 1.7747   LearningRate 0.0043   Epoch: 18   Global Step: 96620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:08:04,276-Speed 10509.80 samples/sec   Loss 1.7834   LearningRate 0.0043   Epoch: 18   Global Step: 96630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:08:12,055-Speed 10531.90 samples/sec   Loss 1.7893   LearningRate 0.0043   Epoch: 18   Global Step: 96640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:08:19,839-Speed 10525.69 samples/sec   Loss 1.7806   LearningRate 0.0043   Epoch: 18   Global Step: 96650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:08:27,630-Speed 10515.79 samples/sec   Loss 1.7727   LearningRate 0.0043   Epoch: 18   Global Step: 96660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:08:35,413-Speed 10527.25 samples/sec   Loss 1.7897   LearningRate 0.0043   Epoch: 18   Global Step: 96670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:08:43,207-Speed 10513.35 samples/sec   Loss 1.7707   LearningRate 0.0043   Epoch: 18   Global Step: 96680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:08:51,011-Speed 10498.19 samples/sec   Loss 1.7935   LearningRate 0.0043   Epoch: 18   Global Step: 96690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:08:58,793-Speed 10528.25 samples/sec   Loss 1.7752   LearningRate 0.0042   Epoch: 18   Global Step: 96700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:09:06,634-Speed 10449.07 samples/sec   Loss 1.7642   LearningRate 0.0042   Epoch: 18   Global Step: 96710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:09:14,424-Speed 10517.94 samples/sec   Loss 1.8018   LearningRate 0.0042   Epoch: 18   Global Step: 96720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:09:22,234-Speed 10490.83 samples/sec   Loss 1.7846   LearningRate 0.0042   Epoch: 18   Global Step: 96730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:09:30,068-Speed 10457.17 samples/sec   Loss 1.8029   LearningRate 0.0042   Epoch: 18   Global Step: 96740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:09:37,877-Speed 10492.08 samples/sec   Loss 1.7908   LearningRate 0.0042   Epoch: 18   Global Step: 96750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:09:45,676-Speed 10505.57 samples/sec   Loss 1.7945   LearningRate 0.0042   Epoch: 18   Global Step: 96760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:09:53,479-Speed 10499.81 samples/sec   Loss 1.7603   LearningRate 0.0042   Epoch: 18   Global Step: 96770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:10:01,276-Speed 10508.34 samples/sec   Loss 1.7809   LearningRate 0.0042   Epoch: 18   Global Step: 96780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:10:09,074-Speed 10506.72 samples/sec   Loss 1.7691   LearningRate 0.0041   Epoch: 18   Global Step: 96790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:10:16,890-Speed 10483.25 samples/sec   Loss 1.7585   LearningRate 0.0041   Epoch: 18   Global Step: 96800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:10:24,700-Speed 10490.70 samples/sec   Loss 1.7793   LearningRate 0.0041   Epoch: 18   Global Step: 96810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:10:32,527-Speed 10467.55 samples/sec   Loss 1.7753   LearningRate 0.0041   Epoch: 18   Global Step: 96820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:10:40,316-Speed 10518.76 samples/sec   Loss 1.7603   LearningRate 0.0041   Epoch: 18   Global Step: 96830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:10:48,109-Speed 10519.22 samples/sec   Loss 1.7747   LearningRate 0.0041   Epoch: 18   Global Step: 96840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-16 12:10:55,939-Speed 10464.72 samples/sec   Loss 1.7716   LearningRate 0.0041   Epoch: 18   Global Step: 96850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-16 12:11:03,729-Speed 10516.67 samples/sec   Loss 1.7743   LearningRate 0.0041   Epoch: 18   Global Step: 96860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:11:11,519-Speed 10517.86 samples/sec   Loss 1.7786   LearningRate 0.0040   Epoch: 18   Global Step: 96870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:11:19,315-Speed 10509.64 samples/sec   Loss 1.7527   LearningRate 0.0040   Epoch: 18   Global Step: 96880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:11:27,154-Speed 10451.42 samples/sec   Loss 1.7677   LearningRate 0.0040   Epoch: 18   Global Step: 96890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:11:34,970-Speed 10482.57 samples/sec   Loss 1.7727   LearningRate 0.0040   Epoch: 18   Global Step: 96900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:11:42,755-Speed 10523.86 samples/sec   Loss 1.7499   LearningRate 0.0040   Epoch: 18   Global Step: 96910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:11:50,638-Speed 10501.49 samples/sec   Loss 1.7703   LearningRate 0.0040   Epoch: 18   Global Step: 96920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:11:58,431-Speed 10513.54 samples/sec   Loss 1.7790   LearningRate 0.0040   Epoch: 18   Global Step: 96930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:12:06,271-Speed 10450.10 samples/sec   Loss 1.7619   LearningRate 0.0040   Epoch: 18   Global Step: 96940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:12:14,058-Speed 10521.10 samples/sec   Loss 1.7573   LearningRate 0.0040   Epoch: 18   Global Step: 96950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:12:21,887-Speed 10465.93 samples/sec   Loss 1.7609   LearningRate 0.0039   Epoch: 18   Global Step: 96960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:12:29,711-Speed 10470.62 samples/sec   Loss 1.7756   LearningRate 0.0039   Epoch: 18   Global Step: 96970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:12:37,520-Speed 10491.88 samples/sec   Loss 1.7497   LearningRate 0.0039   Epoch: 18   Global Step: 96980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:12:45,310-Speed 10517.52 samples/sec   Loss 1.7712   LearningRate 0.0039   Epoch: 18   Global Step: 96990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:12:53,109-Speed 10505.46 samples/sec   Loss 1.7352   LearningRate 0.0039   Epoch: 18   Global Step: 97000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:13:00,905-Speed 10509.39 samples/sec   Loss 1.7463   LearningRate 0.0039   Epoch: 18   Global Step: 97010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:13:08,695-Speed 10517.84 samples/sec   Loss 1.7569   LearningRate 0.0039   Epoch: 18   Global Step: 97020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:13:16,527-Speed 10461.69 samples/sec   Loss 1.7535   LearningRate 0.0039   Epoch: 18   Global Step: 97030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:13:24,323-Speed 10508.66 samples/sec   Loss 1.7630   LearningRate 0.0038   Epoch: 18   Global Step: 97040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:13:32,123-Speed 10503.81 samples/sec   Loss 1.7435   LearningRate 0.0038   Epoch: 18   Global Step: 97050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:13:39,942-Speed 10478.49 samples/sec   Loss 1.7850   LearningRate 0.0038   Epoch: 18   Global Step: 97060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:13:47,739-Speed 10507.60 samples/sec   Loss 1.7498   LearningRate 0.0038   Epoch: 18   Global Step: 97070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:13:55,543-Speed 10499.15 samples/sec   Loss 1.7471   LearningRate 0.0038   Epoch: 18   Global Step: 97080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:14:03,338-Speed 10510.49 samples/sec   Loss 1.7454   LearningRate 0.0038   Epoch: 18   Global Step: 97090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:14:11,150-Speed 10487.52 samples/sec   Loss 1.7321   LearningRate 0.0038   Epoch: 18   Global Step: 97100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:14:18,953-Speed 10502.89 samples/sec   Loss 1.7578   LearningRate 0.0038   Epoch: 18   Global Step: 97110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:14:26,756-Speed 10499.03 samples/sec   Loss 1.7427   LearningRate 0.0038   Epoch: 18   Global Step: 97120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:14:34,532-Speed 10537.42 samples/sec   Loss 1.7591   LearningRate 0.0037   Epoch: 18   Global Step: 97130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:14:42,312-Speed 10531.12 samples/sec   Loss 1.7476   LearningRate 0.0037   Epoch: 18   Global Step: 97140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:14:50,079-Speed 10548.48 samples/sec   Loss 1.7335   LearningRate 0.0037   Epoch: 18   Global Step: 97150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:14:57,868-Speed 10517.98 samples/sec   Loss 1.7458   LearningRate 0.0037   Epoch: 18   Global Step: 97160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:15:05,674-Speed 10509.45 samples/sec   Loss 1.7220   LearningRate 0.0037   Epoch: 18   Global Step: 97170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:15:13,476-Speed 10515.39 samples/sec   Loss 1.7464   LearningRate 0.0037   Epoch: 18   Global Step: 97180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:15:21,294-Speed 10479.38 samples/sec   Loss 1.7290   LearningRate 0.0037   Epoch: 18   Global Step: 97190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:15:29,107-Speed 10497.50 samples/sec   Loss 1.7418   LearningRate 0.0037   Epoch: 18   Global Step: 97200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:15:36,924-Speed 10499.88 samples/sec   Loss 1.7304   LearningRate 0.0037   Epoch: 18   Global Step: 97210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:15:44,697-Speed 10556.90 samples/sec   Loss 1.7301   LearningRate 0.0036   Epoch: 18   Global Step: 97220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:15:52,487-Speed 10554.39 samples/sec   Loss 1.7517   LearningRate 0.0036   Epoch: 18   Global Step: 97230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:16:00,281-Speed 10511.31 samples/sec   Loss 1.7466   LearningRate 0.0036   Epoch: 18   Global Step: 97240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:16:08,093-Speed 10487.70 samples/sec   Loss 1.7337   LearningRate 0.0036   Epoch: 18   Global Step: 97250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:16:15,897-Speed 10499.07 samples/sec   Loss 1.7359   LearningRate 0.0036   Epoch: 18   Global Step: 97260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:16:23,740-Speed 10468.49 samples/sec   Loss 1.7219   LearningRate 0.0036   Epoch: 18   Global Step: 97270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:16:31,558-Speed 10480.19 samples/sec   Loss 1.7383   LearningRate 0.0036   Epoch: 18   Global Step: 97280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:16:39,380-Speed 10473.95 samples/sec   Loss 1.7482   LearningRate 0.0036   Epoch: 18   Global Step: 97290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:16:47,179-Speed 10519.56 samples/sec   Loss 1.7364   LearningRate 0.0035   Epoch: 18   Global Step: 97300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:16:55,185-Speed 10548.36 samples/sec   Loss 1.7429   LearningRate 0.0035   Epoch: 18   Global Step: 97310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:17:02,976-Speed 10516.31 samples/sec   Loss 1.7182   LearningRate 0.0035   Epoch: 18   Global Step: 97320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:17:10,766-Speed 10528.52 samples/sec   Loss 1.7222   LearningRate 0.0035   Epoch: 18   Global Step: 97330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:17:18,542-Speed 10547.60 samples/sec   Loss 1.7210   LearningRate 0.0035   Epoch: 18   Global Step: 97340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:17:26,334-Speed 10513.94 samples/sec   Loss 1.7340   LearningRate 0.0035   Epoch: 18   Global Step: 97350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:17:34,145-Speed 10501.25 samples/sec   Loss 1.7529   LearningRate 0.0035   Epoch: 18   Global Step: 97360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:17:41,941-Speed 10524.43 samples/sec   Loss 1.7333   LearningRate 0.0035   Epoch: 18   Global Step: 97370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:17:49,729-Speed 10520.63 samples/sec   Loss 1.7442   LearningRate 0.0035   Epoch: 18   Global Step: 97380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:17:57,509-Speed 10530.54 samples/sec   Loss 1.7478   LearningRate 0.0035   Epoch: 18   Global Step: 97390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:18:05,304-Speed 10521.55 samples/sec   Loss 1.7510   LearningRate 0.0034   Epoch: 18   Global Step: 97400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:18:13,088-Speed 10534.17 samples/sec   Loss 1.7257   LearningRate 0.0034   Epoch: 18   Global Step: 97410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:18:20,863-Speed 10548.53 samples/sec   Loss 1.7308   LearningRate 0.0034   Epoch: 18   Global Step: 97420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:18:28,638-Speed 10537.47 samples/sec   Loss 1.7215   LearningRate 0.0034   Epoch: 18   Global Step: 97430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:18:36,423-Speed 10540.65 samples/sec   Loss 1.7195   LearningRate 0.0034   Epoch: 18   Global Step: 97440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:18:44,223-Speed 10507.72 samples/sec   Loss 1.7324   LearningRate 0.0034   Epoch: 18   Global Step: 97450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:18:52,032-Speed 10520.91 samples/sec   Loss 1.7218   LearningRate 0.0034   Epoch: 18   Global Step: 97460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:18:59,822-Speed 10517.18 samples/sec   Loss 1.7159   LearningRate 0.0034   Epoch: 18   Global Step: 97470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:19:07,702-Speed 10397.19 samples/sec   Loss 1.7291   LearningRate 0.0034   Epoch: 18   Global Step: 97480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:19:15,504-Speed 10522.93 samples/sec   Loss 1.6995   LearningRate 0.0033   Epoch: 18   Global Step: 97490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:19:23,298-Speed 10518.97 samples/sec   Loss 1.7020   LearningRate 0.0033   Epoch: 18   Global Step: 97500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:19:31,102-Speed 10497.99 samples/sec   Loss 1.7066   LearningRate 0.0033   Epoch: 18   Global Step: 97510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:19:38,897-Speed 10510.63 samples/sec   Loss 1.7046   LearningRate 0.0033   Epoch: 18   Global Step: 97520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:19:46,698-Speed 10518.22 samples/sec   Loss 1.7119   LearningRate 0.0033   Epoch: 18   Global Step: 97530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:19:54,483-Speed 10536.30 samples/sec   Loss 1.7118   LearningRate 0.0033   Epoch: 18   Global Step: 97540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:20:02,277-Speed 10511.26 samples/sec   Loss 1.7237   LearningRate 0.0033   Epoch: 18   Global Step: 97550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:20:10,108-Speed 10480.86 samples/sec   Loss 1.7136   LearningRate 0.0033   Epoch: 18   Global Step: 97560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:20:17,885-Speed 10543.20 samples/sec   Loss 1.7237   LearningRate 0.0033   Epoch: 18   Global Step: 97570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:20:25,679-Speed 10524.34 samples/sec   Loss 1.7029   LearningRate 0.0032   Epoch: 18   Global Step: 97580   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-01-16 12:20:33,449-Speed 10544.36 samples/sec   Loss 1.6984   LearningRate 0.0032   Epoch: 18   Global Step: 97590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:20:41,257-Speed 10526.36 samples/sec   Loss 1.7308   LearningRate 0.0032   Epoch: 18   Global Step: 97600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:20:49,036-Speed 10531.55 samples/sec   Loss 1.7262   LearningRate 0.0032   Epoch: 18   Global Step: 97610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:20:56,837-Speed 10517.57 samples/sec   Loss 1.7161   LearningRate 0.0032   Epoch: 18   Global Step: 97620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:21:04,650-Speed 10499.29 samples/sec   Loss 1.7058   LearningRate 0.0032   Epoch: 18   Global Step: 97630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:21:12,462-Speed 10486.78 samples/sec   Loss 1.7119   LearningRate 0.0032   Epoch: 18   Global Step: 97640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:21:20,247-Speed 10524.24 samples/sec   Loss 1.6938   LearningRate 0.0032   Epoch: 18   Global Step: 97650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:21:28,029-Speed 10540.71 samples/sec   Loss 1.6875   LearningRate 0.0032   Epoch: 18   Global Step: 97660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:21:35,827-Speed 10522.77 samples/sec   Loss 1.7025   LearningRate 0.0032   Epoch: 18   Global Step: 97670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:21:43,629-Speed 10501.30 samples/sec   Loss 1.7033   LearningRate 0.0031   Epoch: 18   Global Step: 97680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:21:51,421-Speed 10520.88 samples/sec   Loss 1.6884   LearningRate 0.0031   Epoch: 18   Global Step: 97690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:21:59,240-Speed 10478.76 samples/sec   Loss 1.7102   LearningRate 0.0031   Epoch: 18   Global Step: 97700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:22:07,051-Speed 10488.64 samples/sec   Loss 1.6888   LearningRate 0.0031   Epoch: 18   Global Step: 97710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:22:14,855-Speed 10498.81 samples/sec   Loss 1.6967   LearningRate 0.0031   Epoch: 18   Global Step: 97720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:22:22,659-Speed 10498.98 samples/sec   Loss 1.7211   LearningRate 0.0031   Epoch: 18   Global Step: 97730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:22:30,471-Speed 10486.99 samples/sec   Loss 1.7108   LearningRate 0.0031   Epoch: 18   Global Step: 97740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:22:38,279-Speed 10493.68 samples/sec   Loss 1.6850   LearningRate 0.0031   Epoch: 18   Global Step: 97750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:22:46,067-Speed 10520.92 samples/sec   Loss 1.7141   LearningRate 0.0031   Epoch: 18   Global Step: 97760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:22:53,862-Speed 10510.01 samples/sec   Loss 1.7166   LearningRate 0.0030   Epoch: 18   Global Step: 97770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:23:01,634-Speed 10541.81 samples/sec   Loss 1.6981   LearningRate 0.0030   Epoch: 18   Global Step: 97780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:23:09,427-Speed 10514.02 samples/sec   Loss 1.7145   LearningRate 0.0030   Epoch: 18   Global Step: 97790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:23:17,229-Speed 10501.05 samples/sec   Loss 1.6897   LearningRate 0.0030   Epoch: 18   Global Step: 97800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:23:25,023-Speed 10512.14 samples/sec   Loss 1.7219   LearningRate 0.0030   Epoch: 18   Global Step: 97810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:23:32,839-Speed 10482.78 samples/sec   Loss 1.7243   LearningRate 0.0030   Epoch: 18   Global Step: 97820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:23:40,626-Speed 10522.38 samples/sec   Loss 1.6971   LearningRate 0.0030   Epoch: 18   Global Step: 97830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:23:48,452-Speed 10468.13 samples/sec   Loss 1.6826   LearningRate 0.0030   Epoch: 18   Global Step: 97840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:23:56,294-Speed 10447.46 samples/sec   Loss 1.7021   LearningRate 0.0030   Epoch: 18   Global Step: 97850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:24:04,088-Speed 10511.48 samples/sec   Loss 1.6765   LearningRate 0.0030   Epoch: 18   Global Step: 97860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:24:11,895-Speed 10495.07 samples/sec   Loss 1.6816   LearningRate 0.0029   Epoch: 18   Global Step: 97870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:24:19,702-Speed 10493.90 samples/sec   Loss 1.6911   LearningRate 0.0029   Epoch: 18   Global Step: 97880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:24:27,519-Speed 10482.18 samples/sec   Loss 1.6985   LearningRate 0.0029   Epoch: 18   Global Step: 97890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:24:35,305-Speed 10527.69 samples/sec   Loss 1.6767   LearningRate 0.0029   Epoch: 18   Global Step: 97900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:24:43,092-Speed 10520.61 samples/sec   Loss 1.6941   LearningRate 0.0029   Epoch: 18   Global Step: 97910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:24:50,883-Speed 10516.62 samples/sec   Loss 1.6908   LearningRate 0.0029   Epoch: 18   Global Step: 97920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:24:58,715-Speed 10461.12 samples/sec   Loss 1.6771   LearningRate 0.0029   Epoch: 18   Global Step: 97930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:25:06,761-Speed 10501.34 samples/sec   Loss 1.6821   LearningRate 0.0029   Epoch: 18   Global Step: 97940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:25:14,536-Speed 10537.38 samples/sec   Loss 1.6765   LearningRate 0.0029   Epoch: 18   Global Step: 97950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:25:22,334-Speed 10513.85 samples/sec   Loss 1.6694   LearningRate 0.0029   Epoch: 18   Global Step: 97960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:25:30,127-Speed 10512.95 samples/sec   Loss 1.6949   LearningRate 0.0028   Epoch: 18   Global Step: 97970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:25:37,954-Speed 10468.11 samples/sec   Loss 1.6936   LearningRate 0.0028   Epoch: 18   Global Step: 97980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:25:45,761-Speed 10551.76 samples/sec   Loss 1.7013   LearningRate 0.0028   Epoch: 18   Global Step: 97990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:25:53,552-Speed 10515.65 samples/sec   Loss 1.6909   LearningRate 0.0028   Epoch: 18   Global Step: 98000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:26:01,349-Speed 10507.29 samples/sec   Loss 1.6726   LearningRate 0.0028   Epoch: 18   Global Step: 98010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:26:09,156-Speed 10495.73 samples/sec   Loss 1.6765   LearningRate 0.0028   Epoch: 18   Global Step: 98020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:26:16,967-Speed 10492.59 samples/sec   Loss 1.6726   LearningRate 0.0028   Epoch: 18   Global Step: 98030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:26:24,754-Speed 10522.28 samples/sec   Loss 1.6807   LearningRate 0.0028   Epoch: 18   Global Step: 98040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:26:32,544-Speed 10517.75 samples/sec   Loss 1.6868   LearningRate 0.0028   Epoch: 18   Global Step: 98050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:26:40,326-Speed 10527.32 samples/sec   Loss 1.6871   LearningRate 0.0028   Epoch: 18   Global Step: 98060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:26:48,158-Speed 10461.60 samples/sec   Loss 1.6984   LearningRate 0.0027   Epoch: 18   Global Step: 98070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:26:55,924-Speed 10549.78 samples/sec   Loss 1.6800   LearningRate 0.0027   Epoch: 18   Global Step: 98080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:27:03,721-Speed 10507.95 samples/sec   Loss 1.6654   LearningRate 0.0027   Epoch: 18   Global Step: 98090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:27:11,499-Speed 10537.66 samples/sec   Loss 1.6746   LearningRate 0.0027   Epoch: 18   Global Step: 98100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:27:19,288-Speed 10519.36 samples/sec   Loss 1.7022   LearningRate 0.0027   Epoch: 18   Global Step: 98110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:27:27,090-Speed 10501.35 samples/sec   Loss 1.6690   LearningRate 0.0027   Epoch: 18   Global Step: 98120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:27:34,872-Speed 10526.68 samples/sec   Loss 1.6729   LearningRate 0.0027   Epoch: 18   Global Step: 98130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:27:42,689-Speed 10531.91 samples/sec   Loss 1.6830   LearningRate 0.0027   Epoch: 18   Global Step: 98140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:27:50,527-Speed 10454.06 samples/sec   Loss 1.6694   LearningRate 0.0027   Epoch: 18   Global Step: 98150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:27:58,310-Speed 10526.68 samples/sec   Loss 1.6783   LearningRate 0.0027   Epoch: 18   Global Step: 98160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:28:06,115-Speed 10496.49 samples/sec   Loss 1.6593   LearningRate 0.0026   Epoch: 18   Global Step: 98170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:28:13,974-Speed 10425.69 samples/sec   Loss 1.6640   LearningRate 0.0026   Epoch: 18   Global Step: 98180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:28:21,785-Speed 10490.64 samples/sec   Loss 1.6686   LearningRate 0.0026   Epoch: 18   Global Step: 98190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:28:29,569-Speed 10524.26 samples/sec   Loss 1.6706   LearningRate 0.0026   Epoch: 18   Global Step: 98200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:28:37,352-Speed 10529.07 samples/sec   Loss 1.6808   LearningRate 0.0026   Epoch: 18   Global Step: 98210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:28:45,240-Speed 10385.79 samples/sec   Loss 1.6532   LearningRate 0.0026   Epoch: 18   Global Step: 98220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:28:53,030-Speed 10517.97 samples/sec   Loss 1.6800   LearningRate 0.0026   Epoch: 18   Global Step: 98230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:29:00,827-Speed 10507.88 samples/sec   Loss 1.6731   LearningRate 0.0026   Epoch: 18   Global Step: 98240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:29:08,637-Speed 10490.58 samples/sec   Loss 1.6623   LearningRate 0.0026   Epoch: 18   Global Step: 98250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:29:16,431-Speed 10513.71 samples/sec   Loss 1.6726   LearningRate 0.0026   Epoch: 18   Global Step: 98260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:29:24,254-Speed 10528.12 samples/sec   Loss 1.6610   LearningRate 0.0026   Epoch: 18   Global Step: 98270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:29:32,042-Speed 10519.83 samples/sec   Loss 1.6734   LearningRate 0.0025   Epoch: 18   Global Step: 98280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:29:39,871-Speed 10464.92 samples/sec   Loss 1.6677   LearningRate 0.0025   Epoch: 18   Global Step: 98290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:29:47,662-Speed 10517.34 samples/sec   Loss 1.6772   LearningRate 0.0025   Epoch: 18   Global Step: 98300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:29:55,441-Speed 10532.62 samples/sec   Loss 1.6763   LearningRate 0.0025   Epoch: 18   Global Step: 98310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:30:03,233-Speed 10513.15 samples/sec   Loss 1.6675   LearningRate 0.0025   Epoch: 18   Global Step: 98320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:30:11,029-Speed 10510.42 samples/sec   Loss 1.6726   LearningRate 0.0025   Epoch: 18   Global Step: 98330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:30:18,799-Speed 10544.94 samples/sec   Loss 1.6616   LearningRate 0.0025   Epoch: 18   Global Step: 98340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:30:26,598-Speed 10505.20 samples/sec   Loss 1.6751   LearningRate 0.0025   Epoch: 18   Global Step: 98350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:30:34,430-Speed 10464.08 samples/sec   Loss 1.6724   LearningRate 0.0025   Epoch: 18   Global Step: 98360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:30:42,237-Speed 10493.55 samples/sec   Loss 1.6672   LearningRate 0.0025   Epoch: 18   Global Step: 98370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:30:50,067-Speed 10463.48 samples/sec   Loss 1.6762   LearningRate 0.0024   Epoch: 18   Global Step: 98380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:30:58,162-Speed 10465.89 samples/sec   Loss 1.6370   LearningRate 0.0024   Epoch: 18   Global Step: 98390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:31:05,987-Speed 10471.59 samples/sec   Loss 1.6586   LearningRate 0.0024   Epoch: 18   Global Step: 98400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:31:13,852-Speed 10417.05 samples/sec   Loss 1.6691   LearningRate 0.0024   Epoch: 18   Global Step: 98410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:31:21,673-Speed 10481.72 samples/sec   Loss 1.6722   LearningRate 0.0024   Epoch: 18   Global Step: 98420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:31:29,477-Speed 10499.09 samples/sec   Loss 1.6334   LearningRate 0.0024   Epoch: 18   Global Step: 98430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:31:37,306-Speed 10465.06 samples/sec   Loss 1.6436   LearningRate 0.0024   Epoch: 18   Global Step: 98440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:31:45,110-Speed 10497.87 samples/sec   Loss 1.6498   LearningRate 0.0024   Epoch: 18   Global Step: 98450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:31:52,922-Speed 10488.69 samples/sec   Loss 1.6613   LearningRate 0.0024   Epoch: 18   Global Step: 98460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:32:00,751-Speed 10464.97 samples/sec   Loss 1.6512   LearningRate 0.0024   Epoch: 18   Global Step: 98470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:32:08,579-Speed 10467.17 samples/sec   Loss 1.6622   LearningRate 0.0024   Epoch: 18   Global Step: 98480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:32:16,403-Speed 10470.92 samples/sec   Loss 1.6486   LearningRate 0.0023   Epoch: 18   Global Step: 98490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:32:24,238-Speed 10460.69 samples/sec   Loss 1.6617   LearningRate 0.0023   Epoch: 18   Global Step: 98500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:32:32,077-Speed 10452.27 samples/sec   Loss 1.6664   LearningRate 0.0023   Epoch: 18   Global Step: 98510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:33:00,044-Speed 2929.73 samples/sec   Loss 1.6659   LearningRate 0.0023   Epoch: 19   Global Step: 98520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:33:07,808-Speed 10553.19 samples/sec   Loss 1.6732   LearningRate 0.0023   Epoch: 19   Global Step: 98530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:33:15,587-Speed 10531.48 samples/sec   Loss 1.6466   LearningRate 0.0023   Epoch: 19   Global Step: 98540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:33:23,362-Speed 10537.66 samples/sec   Loss 1.6474   LearningRate 0.0023   Epoch: 19   Global Step: 98550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:33:31,130-Speed 10548.48 samples/sec   Loss 1.6584   LearningRate 0.0023   Epoch: 19   Global Step: 98560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:33:38,905-Speed 10536.93 samples/sec   Loss 1.6512   LearningRate 0.0023   Epoch: 19   Global Step: 98570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:33:46,689-Speed 10525.71 samples/sec   Loss 1.6320   LearningRate 0.0023   Epoch: 19   Global Step: 98580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:33:54,463-Speed 10538.38 samples/sec   Loss 1.6495   LearningRate 0.0023   Epoch: 19   Global Step: 98590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:34:02,237-Speed 10540.04 samples/sec   Loss 1.6454   LearningRate 0.0023   Epoch: 19   Global Step: 98600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:34:10,074-Speed 10544.17 samples/sec   Loss 1.6502   LearningRate 0.0022   Epoch: 19   Global Step: 98610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:34:17,878-Speed 10498.70 samples/sec   Loss 1.6330   LearningRate 0.0022   Epoch: 19   Global Step: 98620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:34:25,675-Speed 10507.81 samples/sec   Loss 1.6435   LearningRate 0.0022   Epoch: 19   Global Step: 98630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:34:33,497-Speed 10475.86 samples/sec   Loss 1.6449   LearningRate 0.0022   Epoch: 19   Global Step: 98640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:34:41,394-Speed 10521.78 samples/sec   Loss 1.6375   LearningRate 0.0022   Epoch: 19   Global Step: 98650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:34:49,193-Speed 10504.10 samples/sec   Loss 1.6393   LearningRate 0.0022   Epoch: 19   Global Step: 98660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:34:56,980-Speed 10521.96 samples/sec   Loss 1.6401   LearningRate 0.0022   Epoch: 19   Global Step: 98670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:35:04,779-Speed 10510.58 samples/sec   Loss 1.6332   LearningRate 0.0022   Epoch: 19   Global Step: 98680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:35:12,566-Speed 10521.39 samples/sec   Loss 1.6262   LearningRate 0.0022   Epoch: 19   Global Step: 98690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:35:20,346-Speed 10530.65 samples/sec   Loss 1.6113   LearningRate 0.0022   Epoch: 19   Global Step: 98700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:35:28,121-Speed 10537.20 samples/sec   Loss 1.6318   LearningRate 0.0022   Epoch: 19   Global Step: 98710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:35:35,911-Speed 10531.20 samples/sec   Loss 1.6368   LearningRate 0.0021   Epoch: 19   Global Step: 98720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:35:43,700-Speed 10518.56 samples/sec   Loss 1.6314   LearningRate 0.0021   Epoch: 19   Global Step: 98730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:35:51,501-Speed 10502.92 samples/sec   Loss 1.6429   LearningRate 0.0021   Epoch: 19   Global Step: 98740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:35:59,269-Speed 10548.38 samples/sec   Loss 1.6492   LearningRate 0.0021   Epoch: 19   Global Step: 98750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:36:07,037-Speed 10548.31 samples/sec   Loss 1.6374   LearningRate 0.0021   Epoch: 19   Global Step: 98760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:36:14,814-Speed 10535.87 samples/sec   Loss 1.6416   LearningRate 0.0021   Epoch: 19   Global Step: 98770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:36:22,612-Speed 10506.54 samples/sec   Loss 1.6377   LearningRate 0.0021   Epoch: 19   Global Step: 98780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:36:30,438-Speed 10469.90 samples/sec   Loss 1.6305   LearningRate 0.0021   Epoch: 19   Global Step: 98790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:36:38,232-Speed 10513.25 samples/sec   Loss 1.6194   LearningRate 0.0021   Epoch: 19   Global Step: 98800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:36:46,017-Speed 10523.02 samples/sec   Loss 1.6271   LearningRate 0.0021   Epoch: 19   Global Step: 98810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:36:53,798-Speed 10530.63 samples/sec   Loss 1.6490   LearningRate 0.0021   Epoch: 19   Global Step: 98820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:37:01,635-Speed 10529.99 samples/sec   Loss 1.6316   LearningRate 0.0021   Epoch: 19   Global Step: 98830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:37:09,415-Speed 10530.25 samples/sec   Loss 1.6431   LearningRate 0.0020   Epoch: 19   Global Step: 98840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:37:17,203-Speed 10520.69 samples/sec   Loss 1.6432   LearningRate 0.0020   Epoch: 19   Global Step: 98850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:37:24,976-Speed 10540.59 samples/sec   Loss 1.6175   LearningRate 0.0020   Epoch: 19   Global Step: 98860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:37:32,769-Speed 10511.91 samples/sec   Loss 1.6288   LearningRate 0.0020   Epoch: 19   Global Step: 98870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:37:40,569-Speed 10504.50 samples/sec   Loss 1.6114   LearningRate 0.0020   Epoch: 19   Global Step: 98880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:37:48,355-Speed 10522.82 samples/sec   Loss 1.6352   LearningRate 0.0020   Epoch: 19   Global Step: 98890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:37:56,140-Speed 10523.60 samples/sec   Loss 1.6366   LearningRate 0.0020   Epoch: 19   Global Step: 98900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:38:03,931-Speed 10516.85 samples/sec   Loss 1.6370   LearningRate 0.0020   Epoch: 19   Global Step: 98910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:38:11,762-Speed 10463.23 samples/sec   Loss 1.6246   LearningRate 0.0020   Epoch: 19   Global Step: 98920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:38:19,601-Speed 10451.39 samples/sec   Loss 1.6336   LearningRate 0.0020   Epoch: 19   Global Step: 98930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:38:27,443-Speed 10449.33 samples/sec   Loss 1.6109   LearningRate 0.0020   Epoch: 19   Global Step: 98940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:38:35,304-Speed 10422.77 samples/sec   Loss 1.6159   LearningRate 0.0020   Epoch: 19   Global Step: 98950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:38:43,164-Speed 10424.09 samples/sec   Loss 1.6358   LearningRate 0.0019   Epoch: 19   Global Step: 98960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:38:51,045-Speed 10395.93 samples/sec   Loss 1.6232   LearningRate 0.0019   Epoch: 19   Global Step: 98970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:38:58,895-Speed 10436.84 samples/sec   Loss 1.6211   LearningRate 0.0019   Epoch: 19   Global Step: 98980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:39:06,742-Speed 10441.93 samples/sec   Loss 1.6312   LearningRate 0.0019   Epoch: 19   Global Step: 98990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:39:14,580-Speed 10452.85 samples/sec   Loss 1.6217   LearningRate 0.0019   Epoch: 19   Global Step: 99000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:39:22,434-Speed 10431.05 samples/sec   Loss 1.6288   LearningRate 0.0019   Epoch: 19   Global Step: 99010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:39:30,276-Speed 10448.25 samples/sec   Loss 1.6063   LearningRate 0.0019   Epoch: 19   Global Step: 99020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:39:38,117-Speed 10456.25 samples/sec   Loss 1.6251   LearningRate 0.0019   Epoch: 19   Global Step: 99030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:39:45,931-Speed 10484.83 samples/sec   Loss 1.6212   LearningRate 0.0019   Epoch: 19   Global Step: 99040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:39:53,790-Speed 10424.50 samples/sec   Loss 1.6048   LearningRate 0.0019   Epoch: 19   Global Step: 99050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:40:01,626-Speed 10456.13 samples/sec   Loss 1.5989   LearningRate 0.0019   Epoch: 19   Global Step: 99060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:40:09,475-Speed 10437.58 samples/sec   Loss 1.6192   LearningRate 0.0019   Epoch: 19   Global Step: 99070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:40:17,297-Speed 10475.51 samples/sec   Loss 1.6216   LearningRate 0.0018   Epoch: 19   Global Step: 99080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:40:25,165-Speed 10411.97 samples/sec   Loss 1.5978   LearningRate 0.0018   Epoch: 19   Global Step: 99090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:40:32,994-Speed 10464.96 samples/sec   Loss 1.5963   LearningRate 0.0018   Epoch: 19   Global Step: 99100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:40:40,844-Speed 10437.66 samples/sec   Loss 1.6176   LearningRate 0.0018   Epoch: 19   Global Step: 99110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:40:48,702-Speed 10426.58 samples/sec   Loss 1.6238   LearningRate 0.0018   Epoch: 19   Global Step: 99120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:40:56,529-Speed 10468.51 samples/sec   Loss 1.6480   LearningRate 0.0018   Epoch: 19   Global Step: 99130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:41:04,412-Speed 10461.97 samples/sec   Loss 1.6316   LearningRate 0.0018   Epoch: 19   Global Step: 99140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:41:12,238-Speed 10468.57 samples/sec   Loss 1.5966   LearningRate 0.0018   Epoch: 19   Global Step: 99150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:41:20,057-Speed 10478.65 samples/sec   Loss 1.6142   LearningRate 0.0018   Epoch: 19   Global Step: 99160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:41:27,893-Speed 10456.84 samples/sec   Loss 1.6255   LearningRate 0.0018   Epoch: 19   Global Step: 99170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:41:35,734-Speed 10448.70 samples/sec   Loss 1.6156   LearningRate 0.0018   Epoch: 19   Global Step: 99180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:41:43,561-Speed 10466.91 samples/sec   Loss 1.6059   LearningRate 0.0018   Epoch: 19   Global Step: 99190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:41:51,442-Speed 10396.77 samples/sec   Loss 1.6196   LearningRate 0.0018   Epoch: 19   Global Step: 99200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:41:59,401-Speed 10504.82 samples/sec   Loss 1.5750   LearningRate 0.0017   Epoch: 19   Global Step: 99210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:42:07,256-Speed 10429.39 samples/sec   Loss 1.6009   LearningRate 0.0017   Epoch: 19   Global Step: 99220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:42:16,297-Speed 10458.17 samples/sec   Loss 1.6054   LearningRate 0.0017   Epoch: 19   Global Step: 99230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:42:24,123-Speed 10469.39 samples/sec   Loss 1.5915   LearningRate 0.0017   Epoch: 19   Global Step: 99240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:42:31,943-Speed 10476.16 samples/sec   Loss 1.5876   LearningRate 0.0017   Epoch: 19   Global Step: 99250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:42:39,776-Speed 10460.50 samples/sec   Loss 1.6096   LearningRate 0.0017   Epoch: 19   Global Step: 99260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:42:47,640-Speed 10418.93 samples/sec   Loss 1.6325   LearningRate 0.0017   Epoch: 19   Global Step: 99270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:42:55,449-Speed 10491.47 samples/sec   Loss 1.5825   LearningRate 0.0017   Epoch: 19   Global Step: 99280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:43:03,289-Speed 10450.69 samples/sec   Loss 1.5855   LearningRate 0.0017   Epoch: 19   Global Step: 99290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:43:11,119-Speed 10464.47 samples/sec   Loss 1.5879   LearningRate 0.0017   Epoch: 19   Global Step: 99300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:43:18,967-Speed 10438.83 samples/sec   Loss 1.6112   LearningRate 0.0017   Epoch: 19   Global Step: 99310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:43:26,809-Speed 10447.52 samples/sec   Loss 1.6012   LearningRate 0.0017   Epoch: 19   Global Step: 99320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:43:34,636-Speed 10476.47 samples/sec   Loss 1.6218   LearningRate 0.0017   Epoch: 19   Global Step: 99330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:43:42,458-Speed 10473.71 samples/sec   Loss 1.6045   LearningRate 0.0016   Epoch: 19   Global Step: 99340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:43:50,269-Speed 10490.06 samples/sec   Loss 1.5968   LearningRate 0.0016   Epoch: 19   Global Step: 99350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:43:58,051-Speed 10528.04 samples/sec   Loss 1.5761   LearningRate 0.0016   Epoch: 19   Global Step: 99360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:44:05,833-Speed 10529.06 samples/sec   Loss 1.6012   LearningRate 0.0016   Epoch: 19   Global Step: 99370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:44:13,632-Speed 10505.06 samples/sec   Loss 1.6155   LearningRate 0.0016   Epoch: 19   Global Step: 99380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:44:21,414-Speed 10527.92 samples/sec   Loss 1.6171   LearningRate 0.0016   Epoch: 19   Global Step: 99390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:44:29,216-Speed 10501.19 samples/sec   Loss 1.6105   LearningRate 0.0016   Epoch: 19   Global Step: 99400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:44:37,024-Speed 10492.76 samples/sec   Loss 1.5906   LearningRate 0.0016   Epoch: 19   Global Step: 99410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:44:44,838-Speed 10485.56 samples/sec   Loss 1.5865   LearningRate 0.0016   Epoch: 19   Global Step: 99420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:44:52,646-Speed 10493.95 samples/sec   Loss 1.6088   LearningRate 0.0016   Epoch: 19   Global Step: 99430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:45:00,436-Speed 10517.18 samples/sec   Loss 1.5913   LearningRate 0.0016   Epoch: 19   Global Step: 99440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:45:08,247-Speed 10489.47 samples/sec   Loss 1.6031   LearningRate 0.0016   Epoch: 19   Global Step: 99450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:45:16,052-Speed 10497.77 samples/sec   Loss 1.5878   LearningRate 0.0016   Epoch: 19   Global Step: 99460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:45:23,864-Speed 10488.79 samples/sec   Loss 1.5858   LearningRate 0.0015   Epoch: 19   Global Step: 99470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:45:33,231-Speed 10532.78 samples/sec   Loss 1.5899   LearningRate 0.0015   Epoch: 19   Global Step: 99480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:45:41,031-Speed 10503.59 samples/sec   Loss 1.5940   LearningRate 0.0015   Epoch: 19   Global Step: 99490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:45:48,830-Speed 10504.94 samples/sec   Loss 1.5769   LearningRate 0.0015   Epoch: 19   Global Step: 99500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:45:56,636-Speed 10496.81 samples/sec   Loss 1.6082   LearningRate 0.0015   Epoch: 19   Global Step: 99510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:46:04,437-Speed 10502.66 samples/sec   Loss 1.5942   LearningRate 0.0015   Epoch: 19   Global Step: 99520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:46:12,230-Speed 10512.74 samples/sec   Loss 1.5845   LearningRate 0.0015   Epoch: 19   Global Step: 99530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:46:20,018-Speed 10520.54 samples/sec   Loss 1.5882   LearningRate 0.0015   Epoch: 19   Global Step: 99540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:46:27,813-Speed 10509.96 samples/sec   Loss 1.5830   LearningRate 0.0015   Epoch: 19   Global Step: 99550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:46:35,610-Speed 10508.83 samples/sec   Loss 1.5950   LearningRate 0.0015   Epoch: 19   Global Step: 99560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:46:43,402-Speed 10513.82 samples/sec   Loss 1.5822   LearningRate 0.0015   Epoch: 19   Global Step: 99570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:46:51,183-Speed 10530.65 samples/sec   Loss 1.5924   LearningRate 0.0015   Epoch: 19   Global Step: 99580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:46:58,973-Speed 10516.00 samples/sec   Loss 1.5763   LearningRate 0.0015   Epoch: 19   Global Step: 99590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:47:06,845-Speed 10492.45 samples/sec   Loss 1.5969   LearningRate 0.0015   Epoch: 19   Global Step: 99600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:47:14,686-Speed 10450.09 samples/sec   Loss 1.5910   LearningRate 0.0014   Epoch: 19   Global Step: 99610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:47:22,478-Speed 10514.68 samples/sec   Loss 1.5792   LearningRate 0.0014   Epoch: 19   Global Step: 99620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:47:30,269-Speed 10516.05 samples/sec   Loss 1.6032   LearningRate 0.0014   Epoch: 19   Global Step: 99630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:47:38,052-Speed 10527.30 samples/sec   Loss 1.5922   LearningRate 0.0014   Epoch: 19   Global Step: 99640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:47:45,983-Speed 10523.13 samples/sec   Loss 1.5784   LearningRate 0.0014   Epoch: 19   Global Step: 99650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:47:53,786-Speed 10500.20 samples/sec   Loss 1.5924   LearningRate 0.0014   Epoch: 19   Global Step: 99660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:48:01,591-Speed 10496.41 samples/sec   Loss 1.5842   LearningRate 0.0014   Epoch: 19   Global Step: 99670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:48:09,384-Speed 10514.16 samples/sec   Loss 1.5789   LearningRate 0.0014   Epoch: 19   Global Step: 99680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:48:17,185-Speed 10502.98 samples/sec   Loss 1.5739   LearningRate 0.0014   Epoch: 19   Global Step: 99690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:48:25,000-Speed 10484.23 samples/sec   Loss 1.5860   LearningRate 0.0014   Epoch: 19   Global Step: 99700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:48:33,076-Speed 10533.84 samples/sec   Loss 1.5754   LearningRate 0.0014   Epoch: 19   Global Step: 99710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:48:40,874-Speed 10506.26 samples/sec   Loss 1.5871   LearningRate 0.0014   Epoch: 19   Global Step: 99720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:48:48,695-Speed 10475.22 samples/sec   Loss 1.5828   LearningRate 0.0014   Epoch: 19   Global Step: 99730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:48:58,188-Speed 10488.97 samples/sec   Loss 1.5746   LearningRate 0.0014   Epoch: 19   Global Step: 99740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:49:05,974-Speed 10532.22 samples/sec   Loss 1.5777   LearningRate 0.0013   Epoch: 19   Global Step: 99750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:49:13,753-Speed 10531.90 samples/sec   Loss 1.5885   LearningRate 0.0013   Epoch: 19   Global Step: 99760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:49:21,540-Speed 10520.79 samples/sec   Loss 1.5819   LearningRate 0.0013   Epoch: 19   Global Step: 99770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:49:29,323-Speed 10527.82 samples/sec   Loss 1.5748   LearningRate 0.0013   Epoch: 19   Global Step: 99780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:49:37,113-Speed 10516.95 samples/sec   Loss 1.5852   LearningRate 0.0013   Epoch: 19   Global Step: 99790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:49:44,931-Speed 10480.86 samples/sec   Loss 1.5620   LearningRate 0.0013   Epoch: 19   Global Step: 99800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:49:52,719-Speed 10518.41 samples/sec   Loss 1.5926   LearningRate 0.0013   Epoch: 19   Global Step: 99810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:50:00,512-Speed 10514.96 samples/sec   Loss 1.5594   LearningRate 0.0013   Epoch: 19   Global Step: 99820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:50:08,313-Speed 10502.10 samples/sec   Loss 1.5687   LearningRate 0.0013   Epoch: 19   Global Step: 99830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:50:16,092-Speed 10533.37 samples/sec   Loss 1.5615   LearningRate 0.0013   Epoch: 19   Global Step: 99840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:50:23,960-Speed 10412.38 samples/sec   Loss 1.5777   LearningRate 0.0013   Epoch: 19   Global Step: 99850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:50:31,743-Speed 10527.08 samples/sec   Loss 1.5987   LearningRate 0.0013   Epoch: 19   Global Step: 99860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:50:39,556-Speed 10486.41 samples/sec   Loss 1.5997   LearningRate 0.0013   Epoch: 19   Global Step: 99870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:50:47,341-Speed 10524.70 samples/sec   Loss 1.5657   LearningRate 0.0013   Epoch: 19   Global Step: 99880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:50:55,138-Speed 10507.76 samples/sec   Loss 1.5765   LearningRate 0.0013   Epoch: 19   Global Step: 99890   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-01-16 12:51:02,931-Speed 10513.70 samples/sec   Loss 1.5820   LearningRate 0.0012   Epoch: 19   Global Step: 99900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:51:10,714-Speed 10526.37 samples/sec   Loss 1.5915   LearningRate 0.0012   Epoch: 19   Global Step: 99910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:51:18,531-Speed 10482.51 samples/sec   Loss 1.5588   LearningRate 0.0012   Epoch: 19   Global Step: 99920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:51:26,361-Speed 10464.12 samples/sec   Loss 1.5926   LearningRate 0.0012   Epoch: 19   Global Step: 99930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:51:34,169-Speed 10494.41 samples/sec   Loss 1.5753   LearningRate 0.0012   Epoch: 19   Global Step: 99940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:51:41,942-Speed 10540.03 samples/sec   Loss 1.5873   LearningRate 0.0012   Epoch: 19   Global Step: 99950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:51:49,746-Speed 10508.21 samples/sec   Loss 1.5665   LearningRate 0.0012   Epoch: 19   Global Step: 99960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:51:57,535-Speed 10518.95 samples/sec   Loss 1.5669   LearningRate 0.0012   Epoch: 19   Global Step: 99970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:52:05,383-Speed 10440.05 samples/sec   Loss 1.5713   LearningRate 0.0012   Epoch: 19   Global Step: 99980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:52:13,190-Speed 10493.78 samples/sec   Loss 1.5690   LearningRate 0.0012   Epoch: 19   Global Step: 99990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:52:20,996-Speed 10496.03 samples/sec   Loss 1.5840   LearningRate 0.0012   Epoch: 19   Global Step: 100000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:52:48,849-[lfw][100000]XNorm: 23.508723
Training: 2022-01-16 12:52:48,849-[lfw][100000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-01-16 12:52:48,850-[lfw][100000]Accuracy-Highest: 0.99817
Training: 2022-01-16 12:53:21,017-[cfp_fp][100000]XNorm: 21.667794
Training: 2022-01-16 12:53:21,018-[cfp_fp][100000]Accuracy-Flip: 0.99243+-0.00350
Training: 2022-01-16 12:53:21,018-[cfp_fp][100000]Accuracy-Highest: 0.99257
Training: 2022-01-16 12:53:49,284-[agedb_30][100000]XNorm: 23.007740
Training: 2022-01-16 12:53:49,285-[agedb_30][100000]Accuracy-Flip: 0.98083+-0.00638
Training: 2022-01-16 12:53:49,285-[agedb_30][100000]Accuracy-Highest: 0.98083
Training: 2022-01-16 12:53:57,034-Speed 853.01 samples/sec   Loss 1.5561   LearningRate 0.0012   Epoch: 19   Global Step: 100010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:54:04,745-Speed 10625.24 samples/sec   Loss 1.5865   LearningRate 0.0012   Epoch: 19   Global Step: 100020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:54:12,475-Speed 10599.23 samples/sec   Loss 1.5903   LearningRate 0.0012   Epoch: 19   Global Step: 100030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:54:20,225-Speed 10571.53 samples/sec   Loss 1.5743   LearningRate 0.0012   Epoch: 19   Global Step: 100040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:54:27,993-Speed 10547.99 samples/sec   Loss 1.5639   LearningRate 0.0011   Epoch: 19   Global Step: 100050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:54:35,782-Speed 10517.91 samples/sec   Loss 1.5509   LearningRate 0.0011   Epoch: 19   Global Step: 100060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:54:43,552-Speed 10545.72 samples/sec   Loss 1.5774   LearningRate 0.0011   Epoch: 19   Global Step: 100070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:54:51,307-Speed 10564.05 samples/sec   Loss 1.5535   LearningRate 0.0011   Epoch: 19   Global Step: 100080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:54:59,080-Speed 10541.25 samples/sec   Loss 1.5794   LearningRate 0.0011   Epoch: 19   Global Step: 100090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:55:06,833-Speed 10568.00 samples/sec   Loss 1.5663   LearningRate 0.0011   Epoch: 19   Global Step: 100100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:55:14,595-Speed 10554.39 samples/sec   Loss 1.5630   LearningRate 0.0011   Epoch: 19   Global Step: 100110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:55:22,373-Speed 10534.47 samples/sec   Loss 1.5778   LearningRate 0.0011   Epoch: 19   Global Step: 100120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:55:30,154-Speed 10530.73 samples/sec   Loss 1.5595   LearningRate 0.0011   Epoch: 19   Global Step: 100130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:55:37,918-Speed 10552.67 samples/sec   Loss 1.5760   LearningRate 0.0011   Epoch: 19   Global Step: 100140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:55:45,694-Speed 10537.02 samples/sec   Loss 1.5600   LearningRate 0.0011   Epoch: 19   Global Step: 100150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:55:53,455-Speed 10559.21 samples/sec   Loss 1.5730   LearningRate 0.0011   Epoch: 19   Global Step: 100160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:56:01,213-Speed 10560.58 samples/sec   Loss 1.5401   LearningRate 0.0011   Epoch: 19   Global Step: 100170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:56:09,054-Speed 10449.30 samples/sec   Loss 1.5560   LearningRate 0.0011   Epoch: 19   Global Step: 100180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:56:16,788-Speed 10594.34 samples/sec   Loss 1.5571   LearningRate 0.0011   Epoch: 19   Global Step: 100190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:56:24,527-Speed 10586.46 samples/sec   Loss 1.5768   LearningRate 0.0011   Epoch: 19   Global Step: 100200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:56:32,283-Speed 10562.91 samples/sec   Loss 1.5628   LearningRate 0.0011   Epoch: 19   Global Step: 100210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:56:40,053-Speed 10547.23 samples/sec   Loss 1.5582   LearningRate 0.0010   Epoch: 19   Global Step: 100220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:56:47,808-Speed 10565.44 samples/sec   Loss 1.5716   LearningRate 0.0010   Epoch: 19   Global Step: 100230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:56:55,562-Speed 10566.19 samples/sec   Loss 1.5617   LearningRate 0.0010   Epoch: 19   Global Step: 100240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:57:03,309-Speed 10575.33 samples/sec   Loss 1.5387   LearningRate 0.0010   Epoch: 19   Global Step: 100250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:57:11,054-Speed 10578.99 samples/sec   Loss 1.5609   LearningRate 0.0010   Epoch: 19   Global Step: 100260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:57:18,802-Speed 10574.73 samples/sec   Loss 1.5314   LearningRate 0.0010   Epoch: 19   Global Step: 100270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:57:26,574-Speed 10542.74 samples/sec   Loss 1.5797   LearningRate 0.0010   Epoch: 19   Global Step: 100280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:57:34,334-Speed 10557.28 samples/sec   Loss 1.5385   LearningRate 0.0010   Epoch: 19   Global Step: 100290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:57:42,109-Speed 10537.15 samples/sec   Loss 1.5486   LearningRate 0.0010   Epoch: 19   Global Step: 100300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:57:49,917-Speed 10493.92 samples/sec   Loss 1.5630   LearningRate 0.0010   Epoch: 19   Global Step: 100310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:57:57,712-Speed 10510.89 samples/sec   Loss 1.5460   LearningRate 0.0010   Epoch: 19   Global Step: 100320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:58:05,472-Speed 10556.78 samples/sec   Loss 1.5685   LearningRate 0.0010   Epoch: 19   Global Step: 100330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:58:13,248-Speed 10538.22 samples/sec   Loss 1.5718   LearningRate 0.0010   Epoch: 19   Global Step: 100340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 12:58:21,012-Speed 10557.28 samples/sec   Loss 1.5629   LearningRate 0.0010   Epoch: 19   Global Step: 100350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:58:28,775-Speed 10553.10 samples/sec   Loss 1.5355   LearningRate 0.0010   Epoch: 19   Global Step: 100360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:58:36,541-Speed 10549.56 samples/sec   Loss 1.5440   LearningRate 0.0010   Epoch: 19   Global Step: 100370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:58:44,327-Speed 10523.10 samples/sec   Loss 1.5714   LearningRate 0.0009   Epoch: 19   Global Step: 100380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:58:52,106-Speed 10532.43 samples/sec   Loss 1.5518   LearningRate 0.0009   Epoch: 19   Global Step: 100390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:58:59,882-Speed 10536.06 samples/sec   Loss 1.5693   LearningRate 0.0009   Epoch: 19   Global Step: 100400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:59:07,686-Speed 10498.46 samples/sec   Loss 1.5540   LearningRate 0.0009   Epoch: 19   Global Step: 100410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:59:15,450-Speed 10554.13 samples/sec   Loss 1.5583   LearningRate 0.0009   Epoch: 19   Global Step: 100420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:59:23,209-Speed 10558.18 samples/sec   Loss 1.5580   LearningRate 0.0009   Epoch: 19   Global Step: 100430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:59:30,974-Speed 10551.61 samples/sec   Loss 1.5391   LearningRate 0.0009   Epoch: 19   Global Step: 100440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:59:38,751-Speed 10535.43 samples/sec   Loss 1.5454   LearningRate 0.0009   Epoch: 19   Global Step: 100450   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-01-16 12:59:46,514-Speed 10554.71 samples/sec   Loss 1.5405   LearningRate 0.0009   Epoch: 19   Global Step: 100460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 12:59:54,366-Speed 10434.49 samples/sec   Loss 1.5513   LearningRate 0.0009   Epoch: 19   Global Step: 100470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:00:02,120-Speed 10566.09 samples/sec   Loss 1.5518   LearningRate 0.0009   Epoch: 19   Global Step: 100480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:00:09,890-Speed 10544.84 samples/sec   Loss 1.5346   LearningRate 0.0009   Epoch: 19   Global Step: 100490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:00:17,698-Speed 10492.60 samples/sec   Loss 1.5326   LearningRate 0.0009   Epoch: 19   Global Step: 100500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:00:25,472-Speed 10539.00 samples/sec   Loss 1.5187   LearningRate 0.0009   Epoch: 19   Global Step: 100510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:00:33,264-Speed 10515.57 samples/sec   Loss 1.5265   LearningRate 0.0009   Epoch: 19   Global Step: 100520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:00:41,034-Speed 10548.34 samples/sec   Loss 1.5420   LearningRate 0.0009   Epoch: 19   Global Step: 100530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:00:48,786-Speed 10571.84 samples/sec   Loss 1.5390   LearningRate 0.0009   Epoch: 19   Global Step: 100540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:00:56,554-Speed 10546.40 samples/sec   Loss 1.5517   LearningRate 0.0009   Epoch: 19   Global Step: 100550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:01:04,314-Speed 10558.75 samples/sec   Loss 1.5409   LearningRate 0.0008   Epoch: 19   Global Step: 100560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:01:12,107-Speed 10514.04 samples/sec   Loss 1.5628   LearningRate 0.0008   Epoch: 19   Global Step: 100570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:01:19,886-Speed 10533.37 samples/sec   Loss 1.5592   LearningRate 0.0008   Epoch: 19   Global Step: 100580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:01:27,677-Speed 10515.77 samples/sec   Loss 1.5325   LearningRate 0.0008   Epoch: 19   Global Step: 100590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:01:35,441-Speed 10555.26 samples/sec   Loss 1.5339   LearningRate 0.0008   Epoch: 19   Global Step: 100600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:01:43,229-Speed 10521.31 samples/sec   Loss 1.5361   LearningRate 0.0008   Epoch: 19   Global Step: 100610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:01:50,992-Speed 10553.92 samples/sec   Loss 1.5380   LearningRate 0.0008   Epoch: 19   Global Step: 100620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:01:58,753-Speed 10556.36 samples/sec   Loss 1.5555   LearningRate 0.0008   Epoch: 19   Global Step: 100630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:02:06,520-Speed 10548.91 samples/sec   Loss 1.5310   LearningRate 0.0008   Epoch: 19   Global Step: 100640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:02:14,285-Speed 10553.00 samples/sec   Loss 1.5451   LearningRate 0.0008   Epoch: 19   Global Step: 100650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:02:22,056-Speed 10544.54 samples/sec   Loss 1.5431   LearningRate 0.0008   Epoch: 19   Global Step: 100660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:02:29,829-Speed 10539.25 samples/sec   Loss 1.5467   LearningRate 0.0008   Epoch: 19   Global Step: 100670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:02:37,596-Speed 10549.81 samples/sec   Loss 1.5070   LearningRate 0.0008   Epoch: 19   Global Step: 100680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:02:45,416-Speed 10476.88 samples/sec   Loss 1.5411   LearningRate 0.0008   Epoch: 19   Global Step: 100690   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-01-16 13:02:53,200-Speed 10525.63 samples/sec   Loss 1.5589   LearningRate 0.0008   Epoch: 19   Global Step: 100700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:03:00,990-Speed 10516.43 samples/sec   Loss 1.5524   LearningRate 0.0008   Epoch: 19   Global Step: 100710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:03:08,763-Speed 10541.04 samples/sec   Loss 1.5470   LearningRate 0.0008   Epoch: 19   Global Step: 100720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:03:16,568-Speed 10497.82 samples/sec   Loss 1.5309   LearningRate 0.0008   Epoch: 19   Global Step: 100730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:03:24,362-Speed 10511.30 samples/sec   Loss 1.5439   LearningRate 0.0008   Epoch: 19   Global Step: 100740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:03:32,140-Speed 10534.35 samples/sec   Loss 1.5482   LearningRate 0.0007   Epoch: 19   Global Step: 100750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:03:39,924-Speed 10525.65 samples/sec   Loss 1.5387   LearningRate 0.0007   Epoch: 19   Global Step: 100760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:03:47,718-Speed 10511.04 samples/sec   Loss 1.5367   LearningRate 0.0007   Epoch: 19   Global Step: 100770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:03:55,498-Speed 10535.01 samples/sec   Loss 1.5298   LearningRate 0.0007   Epoch: 19   Global Step: 100780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:04:03,264-Speed 10549.45 samples/sec   Loss 1.5298   LearningRate 0.0007   Epoch: 19   Global Step: 100790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:04:11,065-Speed 10502.75 samples/sec   Loss 1.5311   LearningRate 0.0007   Epoch: 19   Global Step: 100800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:04:18,852-Speed 10528.86 samples/sec   Loss 1.5300   LearningRate 0.0007   Epoch: 19   Global Step: 100810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:04:26,629-Speed 10538.51 samples/sec   Loss 1.5245   LearningRate 0.0007   Epoch: 19   Global Step: 100820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:04:34,423-Speed 10512.80 samples/sec   Loss 1.5366   LearningRate 0.0007   Epoch: 19   Global Step: 100830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:04:42,209-Speed 10522.31 samples/sec   Loss 1.5493   LearningRate 0.0007   Epoch: 19   Global Step: 100840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:04:50,007-Speed 10507.77 samples/sec   Loss 1.5266   LearningRate 0.0007   Epoch: 19   Global Step: 100850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:04:57,772-Speed 10551.47 samples/sec   Loss 1.5282   LearningRate 0.0007   Epoch: 19   Global Step: 100860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:05:05,567-Speed 10510.61 samples/sec   Loss 1.5468   LearningRate 0.0007   Epoch: 19   Global Step: 100870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:05:13,347-Speed 10531.51 samples/sec   Loss 1.5377   LearningRate 0.0007   Epoch: 19   Global Step: 100880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:05:21,138-Speed 10516.12 samples/sec   Loss 1.5296   LearningRate 0.0007   Epoch: 19   Global Step: 100890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:05:28,938-Speed 10503.47 samples/sec   Loss 1.5398   LearningRate 0.0007   Epoch: 19   Global Step: 100900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:05:36,714-Speed 10536.65 samples/sec   Loss 1.5487   LearningRate 0.0007   Epoch: 19   Global Step: 100910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:05:44,501-Speed 10523.52 samples/sec   Loss 1.5322   LearningRate 0.0007   Epoch: 19   Global Step: 100920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:05:52,300-Speed 10505.53 samples/sec   Loss 1.5351   LearningRate 0.0007   Epoch: 19   Global Step: 100930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:06:00,067-Speed 10548.73 samples/sec   Loss 1.5424   LearningRate 0.0007   Epoch: 19   Global Step: 100940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:06:07,842-Speed 10538.16 samples/sec   Loss 1.5456   LearningRate 0.0006   Epoch: 19   Global Step: 100950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:06:15,617-Speed 10537.68 samples/sec   Loss 1.5339   LearningRate 0.0006   Epoch: 19   Global Step: 100960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:06:23,398-Speed 10528.91 samples/sec   Loss 1.5279   LearningRate 0.0006   Epoch: 19   Global Step: 100970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:06:31,197-Speed 10506.53 samples/sec   Loss 1.5271   LearningRate 0.0006   Epoch: 19   Global Step: 100980   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-01-16 13:06:38,978-Speed 10529.16 samples/sec   Loss 1.5333   LearningRate 0.0006   Epoch: 19   Global Step: 100990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:06:46,757-Speed 10532.58 samples/sec   Loss 1.5106   LearningRate 0.0006   Epoch: 19   Global Step: 101000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:06:54,547-Speed 10516.53 samples/sec   Loss 1.5399   LearningRate 0.0006   Epoch: 19   Global Step: 101010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:07:02,338-Speed 10517.08 samples/sec   Loss 1.5282   LearningRate 0.0006   Epoch: 19   Global Step: 101020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:07:10,143-Speed 10496.05 samples/sec   Loss 1.5266   LearningRate 0.0006   Epoch: 19   Global Step: 101030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:07:17,929-Speed 10523.17 samples/sec   Loss 1.5167   LearningRate 0.0006   Epoch: 19   Global Step: 101040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:07:25,771-Speed 10448.56 samples/sec   Loss 1.5102   LearningRate 0.0006   Epoch: 19   Global Step: 101050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:07:33,540-Speed 10546.19 samples/sec   Loss 1.5279   LearningRate 0.0006   Epoch: 19   Global Step: 101060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:07:41,328-Speed 10518.87 samples/sec   Loss 1.5323   LearningRate 0.0006   Epoch: 19   Global Step: 101070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:07:49,114-Speed 10523.84 samples/sec   Loss 1.5328   LearningRate 0.0006   Epoch: 19   Global Step: 101080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:07:56,936-Speed 10475.28 samples/sec   Loss 1.5426   LearningRate 0.0006   Epoch: 19   Global Step: 101090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:08:04,719-Speed 10525.45 samples/sec   Loss 1.5487   LearningRate 0.0006   Epoch: 19   Global Step: 101100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:08:12,593-Speed 10406.49 samples/sec   Loss 1.5500   LearningRate 0.0006   Epoch: 19   Global Step: 101110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:08:20,377-Speed 10524.98 samples/sec   Loss 1.5531   LearningRate 0.0006   Epoch: 19   Global Step: 101120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:08:28,222-Speed 10444.41 samples/sec   Loss 1.5195   LearningRate 0.0006   Epoch: 19   Global Step: 101130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:08:35,997-Speed 10537.55 samples/sec   Loss 1.5223   LearningRate 0.0006   Epoch: 19   Global Step: 101140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:08:43,809-Speed 10488.49 samples/sec   Loss 1.5293   LearningRate 0.0006   Epoch: 19   Global Step: 101150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:08:51,571-Speed 10555.87 samples/sec   Loss 1.5147   LearningRate 0.0006   Epoch: 19   Global Step: 101160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:08:59,362-Speed 10521.36 samples/sec   Loss 1.5261   LearningRate 0.0005   Epoch: 19   Global Step: 101170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:09:07,174-Speed 10487.92 samples/sec   Loss 1.5155   LearningRate 0.0005   Epoch: 19   Global Step: 101180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:09:15,009-Speed 10458.67 samples/sec   Loss 1.5114   LearningRate 0.0005   Epoch: 19   Global Step: 101190   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-01-16 13:09:22,790-Speed 10529.44 samples/sec   Loss 1.5196   LearningRate 0.0005   Epoch: 19   Global Step: 101200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:09:30,559-Speed 10546.05 samples/sec   Loss 1.5277   LearningRate 0.0005   Epoch: 19   Global Step: 101210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:09:38,332-Speed 10540.87 samples/sec   Loss 1.5190   LearningRate 0.0005   Epoch: 19   Global Step: 101220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:09:46,139-Speed 10493.75 samples/sec   Loss 1.5085   LearningRate 0.0005   Epoch: 19   Global Step: 101230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:09:53,929-Speed 10517.72 samples/sec   Loss 1.5042   LearningRate 0.0005   Epoch: 19   Global Step: 101240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:10:01,720-Speed 10516.69 samples/sec   Loss 1.5467   LearningRate 0.0005   Epoch: 19   Global Step: 101250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:10:09,544-Speed 10472.27 samples/sec   Loss 1.5127   LearningRate 0.0005   Epoch: 19   Global Step: 101260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:10:17,345-Speed 10503.30 samples/sec   Loss 1.5314   LearningRate 0.0005   Epoch: 19   Global Step: 101270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:10:25,140-Speed 10510.70 samples/sec   Loss 1.5342   LearningRate 0.0005   Epoch: 19   Global Step: 101280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:10:32,936-Speed 10509.95 samples/sec   Loss 1.5303   LearningRate 0.0005   Epoch: 19   Global Step: 101290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:10:40,731-Speed 10511.09 samples/sec   Loss 1.5133   LearningRate 0.0005   Epoch: 19   Global Step: 101300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:10:48,513-Speed 10528.27 samples/sec   Loss 1.5244   LearningRate 0.0005   Epoch: 19   Global Step: 101310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:10:56,338-Speed 10470.13 samples/sec   Loss 1.5378   LearningRate 0.0005   Epoch: 19   Global Step: 101320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:11:04,150-Speed 10488.04 samples/sec   Loss 1.5226   LearningRate 0.0005   Epoch: 19   Global Step: 101330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:11:11,936-Speed 10524.00 samples/sec   Loss 1.5371   LearningRate 0.0005   Epoch: 19   Global Step: 101340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-16 13:11:19,724-Speed 10519.18 samples/sec   Loss 1.5102   LearningRate 0.0005   Epoch: 19   Global Step: 101350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:11:27,514-Speed 10517.83 samples/sec   Loss 1.5049   LearningRate 0.0005   Epoch: 19   Global Step: 101360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:11:35,323-Speed 10492.85 samples/sec   Loss 1.4977   LearningRate 0.0005   Epoch: 19   Global Step: 101370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:11:43,140-Speed 10480.97 samples/sec   Loss 1.5183   LearningRate 0.0005   Epoch: 19   Global Step: 101380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:11:50,938-Speed 10506.99 samples/sec   Loss 1.5084   LearningRate 0.0005   Epoch: 19   Global Step: 101390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:11:58,732-Speed 10511.73 samples/sec   Loss 1.5176   LearningRate 0.0005   Epoch: 19   Global Step: 101400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-16 13:12:06,511-Speed 10532.27 samples/sec   Loss 1.5149   LearningRate 0.0004   Epoch: 19   Global Step: 101410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:12:14,297-Speed 10526.42 samples/sec   Loss 1.5279   LearningRate 0.0004   Epoch: 19   Global Step: 101420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:12:22,076-Speed 10532.43 samples/sec   Loss 1.5397   LearningRate 0.0004   Epoch: 19   Global Step: 101430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:12:29,883-Speed 10495.80 samples/sec   Loss 1.5200   LearningRate 0.0004   Epoch: 19   Global Step: 101440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:12:37,669-Speed 10522.46 samples/sec   Loss 1.5155   LearningRate 0.0004   Epoch: 19   Global Step: 101450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:12:45,472-Speed 10500.77 samples/sec   Loss 1.5067   LearningRate 0.0004   Epoch: 19   Global Step: 101460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:12:53,246-Speed 10541.73 samples/sec   Loss 1.5250   LearningRate 0.0004   Epoch: 19   Global Step: 101470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:13:01,087-Speed 10449.71 samples/sec   Loss 1.5178   LearningRate 0.0004   Epoch: 19   Global Step: 101480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:13:08,875-Speed 10520.11 samples/sec   Loss 1.4996   LearningRate 0.0004   Epoch: 19   Global Step: 101490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:13:16,674-Speed 10507.01 samples/sec   Loss 1.5155   LearningRate 0.0004   Epoch: 19   Global Step: 101500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:13:24,445-Speed 10542.73 samples/sec   Loss 1.5188   LearningRate 0.0004   Epoch: 19   Global Step: 101510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:13:32,240-Speed 10510.93 samples/sec   Loss 1.5225   LearningRate 0.0004   Epoch: 19   Global Step: 101520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:13:40,024-Speed 10525.22 samples/sec   Loss 1.5257   LearningRate 0.0004   Epoch: 19   Global Step: 101530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:13:47,798-Speed 10540.12 samples/sec   Loss 1.5060   LearningRate 0.0004   Epoch: 19   Global Step: 101540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:13:55,584-Speed 10521.97 samples/sec   Loss 1.5180   LearningRate 0.0004   Epoch: 19   Global Step: 101550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:14:03,356-Speed 10541.50 samples/sec   Loss 1.5231   LearningRate 0.0004   Epoch: 19   Global Step: 101560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:14:11,185-Speed 10465.47 samples/sec   Loss 1.4838   LearningRate 0.0004   Epoch: 19   Global Step: 101570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:14:18,992-Speed 10493.86 samples/sec   Loss 1.5065   LearningRate 0.0004   Epoch: 19   Global Step: 101580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:14:26,779-Speed 10522.82 samples/sec   Loss 1.5294   LearningRate 0.0004   Epoch: 19   Global Step: 101590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:14:34,548-Speed 10545.12 samples/sec   Loss 1.4758   LearningRate 0.0004   Epoch: 19   Global Step: 101600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:14:42,359-Speed 10488.19 samples/sec   Loss 1.5095   LearningRate 0.0004   Epoch: 19   Global Step: 101610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:14:50,168-Speed 10492.86 samples/sec   Loss 1.5097   LearningRate 0.0004   Epoch: 19   Global Step: 101620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:14:57,942-Speed 10538.72 samples/sec   Loss 1.5237   LearningRate 0.0004   Epoch: 19   Global Step: 101630   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-16 13:15:05,738-Speed 10511.06 samples/sec   Loss 1.5161   LearningRate 0.0004   Epoch: 19   Global Step: 101640   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-16 13:15:13,533-Speed 10510.83 samples/sec   Loss 1.5157   LearningRate 0.0004   Epoch: 19   Global Step: 101650   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-16 13:15:21,328-Speed 10511.28 samples/sec   Loss 1.5026   LearningRate 0.0004   Epoch: 19   Global Step: 101660   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-16 13:15:29,118-Speed 10516.79 samples/sec   Loss 1.5198   LearningRate 0.0004   Epoch: 19   Global Step: 101670   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-16 13:15:36,904-Speed 10523.87 samples/sec   Loss 1.5163   LearningRate 0.0003   Epoch: 19   Global Step: 101680   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-16 13:15:44,673-Speed 10545.04 samples/sec   Loss 1.5004   LearningRate 0.0003   Epoch: 19   Global Step: 101690   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-16 13:15:52,511-Speed 10454.91 samples/sec   Loss 1.5174   LearningRate 0.0003   Epoch: 19   Global Step: 101700   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-16 13:16:00,292-Speed 10528.14 samples/sec   Loss 1.5016   LearningRate 0.0003   Epoch: 19   Global Step: 101710   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-16 13:16:08,078-Speed 10527.19 samples/sec   Loss 1.5084   LearningRate 0.0003   Epoch: 19   Global Step: 101720   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-16 13:16:15,870-Speed 10514.96 samples/sec   Loss 1.5016   LearningRate 0.0003   Epoch: 19   Global Step: 101730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:16:23,641-Speed 10544.82 samples/sec   Loss 1.4833   LearningRate 0.0003   Epoch: 19   Global Step: 101740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:16:31,429-Speed 10519.37 samples/sec   Loss 1.4877   LearningRate 0.0003   Epoch: 19   Global Step: 101750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:16:39,226-Speed 10508.40 samples/sec   Loss 1.5301   LearningRate 0.0003   Epoch: 19   Global Step: 101760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:16:47,014-Speed 10519.80 samples/sec   Loss 1.5126   LearningRate 0.0003   Epoch: 19   Global Step: 101770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:16:54,797-Speed 10527.03 samples/sec   Loss 1.5062   LearningRate 0.0003   Epoch: 19   Global Step: 101780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:17:02,598-Speed 10502.93 samples/sec   Loss 1.5015   LearningRate 0.0003   Epoch: 19   Global Step: 101790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:17:10,419-Speed 10475.38 samples/sec   Loss 1.5074   LearningRate 0.0003   Epoch: 19   Global Step: 101800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:17:18,200-Speed 10529.46 samples/sec   Loss 1.5185   LearningRate 0.0003   Epoch: 19   Global Step: 101810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:17:25,969-Speed 10545.22 samples/sec   Loss 1.5141   LearningRate 0.0003   Epoch: 19   Global Step: 101820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:17:33,756-Speed 10521.94 samples/sec   Loss 1.4962   LearningRate 0.0003   Epoch: 19   Global Step: 101830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:17:41,525-Speed 10546.48 samples/sec   Loss 1.5270   LearningRate 0.0003   Epoch: 19   Global Step: 101840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:17:49,335-Speed 10490.17 samples/sec   Loss 1.5045   LearningRate 0.0003   Epoch: 19   Global Step: 101850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:17:57,128-Speed 10514.51 samples/sec   Loss 1.5020   LearningRate 0.0003   Epoch: 19   Global Step: 101860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:18:04,937-Speed 10491.57 samples/sec   Loss 1.4920   LearningRate 0.0003   Epoch: 19   Global Step: 101870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:18:12,749-Speed 10488.97 samples/sec   Loss 1.4940   LearningRate 0.0003   Epoch: 19   Global Step: 101880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:18:20,563-Speed 10485.66 samples/sec   Loss 1.5025   LearningRate 0.0003   Epoch: 19   Global Step: 101890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:18:28,359-Speed 10510.05 samples/sec   Loss 1.5034   LearningRate 0.0003   Epoch: 19   Global Step: 101900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:18:36,169-Speed 10490.40 samples/sec   Loss 1.5205   LearningRate 0.0003   Epoch: 19   Global Step: 101910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:18:43,959-Speed 10518.50 samples/sec   Loss 1.4964   LearningRate 0.0003   Epoch: 19   Global Step: 101920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:18:51,746-Speed 10529.51 samples/sec   Loss 1.5049   LearningRate 0.0003   Epoch: 19   Global Step: 101930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:18:59,569-Speed 10473.11 samples/sec   Loss 1.5227   LearningRate 0.0003   Epoch: 19   Global Step: 101940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:19:07,383-Speed 10485.37 samples/sec   Loss 1.4834   LearningRate 0.0003   Epoch: 19   Global Step: 101950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:19:15,201-Speed 10479.48 samples/sec   Loss 1.5090   LearningRate 0.0003   Epoch: 19   Global Step: 101960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:19:23,001-Speed 10504.70 samples/sec   Loss 1.4971   LearningRate 0.0003   Epoch: 19   Global Step: 101970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:19:30,835-Speed 10458.51 samples/sec   Loss 1.5050   LearningRate 0.0003   Epoch: 19   Global Step: 101980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:19:38,686-Speed 10434.56 samples/sec   Loss 1.5329   LearningRate 0.0002   Epoch: 19   Global Step: 101990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:19:46,477-Speed 10515.94 samples/sec   Loss 1.4883   LearningRate 0.0002   Epoch: 19   Global Step: 102000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:19:54,243-Speed 10554.52 samples/sec   Loss 1.5003   LearningRate 0.0002   Epoch: 19   Global Step: 102010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:20:02,043-Speed 10504.02 samples/sec   Loss 1.4985   LearningRate 0.0002   Epoch: 19   Global Step: 102020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:20:09,855-Speed 10487.16 samples/sec   Loss 1.4951   LearningRate 0.0002   Epoch: 19   Global Step: 102030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:20:17,644-Speed 10522.37 samples/sec   Loss 1.5103   LearningRate 0.0002   Epoch: 19   Global Step: 102040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:20:25,424-Speed 10533.13 samples/sec   Loss 1.4928   LearningRate 0.0002   Epoch: 19   Global Step: 102050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:20:33,217-Speed 10513.55 samples/sec   Loss 1.5239   LearningRate 0.0002   Epoch: 19   Global Step: 102060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:20:41,005-Speed 10520.90 samples/sec   Loss 1.4869   LearningRate 0.0002   Epoch: 19   Global Step: 102070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:20:48,795-Speed 10516.96 samples/sec   Loss 1.5033   LearningRate 0.0002   Epoch: 19   Global Step: 102080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:20:56,617-Speed 10473.42 samples/sec   Loss 1.5108   LearningRate 0.0002   Epoch: 19   Global Step: 102090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:21:04,429-Speed 10492.98 samples/sec   Loss 1.4929   LearningRate 0.0002   Epoch: 19   Global Step: 102100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:21:12,237-Speed 10493.64 samples/sec   Loss 1.4946   LearningRate 0.0002   Epoch: 19   Global Step: 102110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:21:20,059-Speed 10473.05 samples/sec   Loss 1.5218   LearningRate 0.0002   Epoch: 19   Global Step: 102120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:21:27,863-Speed 10501.60 samples/sec   Loss 1.5076   LearningRate 0.0002   Epoch: 19   Global Step: 102130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:21:35,703-Speed 10453.66 samples/sec   Loss 1.5014   LearningRate 0.0002   Epoch: 19   Global Step: 102140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:21:43,516-Speed 10487.40 samples/sec   Loss 1.4876   LearningRate 0.0002   Epoch: 19   Global Step: 102150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:21:51,339-Speed 10472.68 samples/sec   Loss 1.4948   LearningRate 0.0002   Epoch: 19   Global Step: 102160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:21:59,143-Speed 10498.90 samples/sec   Loss 1.4881   LearningRate 0.0002   Epoch: 19   Global Step: 102170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:22:06,940-Speed 10507.58 samples/sec   Loss 1.4732   LearningRate 0.0002   Epoch: 19   Global Step: 102180   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-01-16 13:22:14,717-Speed 10535.06 samples/sec   Loss 1.5031   LearningRate 0.0002   Epoch: 19   Global Step: 102190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:22:22,488-Speed 10543.30 samples/sec   Loss 1.4902   LearningRate 0.0002   Epoch: 19   Global Step: 102200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:22:30,256-Speed 10547.14 samples/sec   Loss 1.4932   LearningRate 0.0002   Epoch: 19   Global Step: 102210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:22:38,087-Speed 10469.33 samples/sec   Loss 1.4888   LearningRate 0.0002   Epoch: 19   Global Step: 102220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:22:45,893-Speed 10495.88 samples/sec   Loss 1.4972   LearningRate 0.0002   Epoch: 19   Global Step: 102230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:22:53,692-Speed 10505.54 samples/sec   Loss 1.5006   LearningRate 0.0002   Epoch: 19   Global Step: 102240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:23:01,495-Speed 10500.71 samples/sec   Loss 1.4736   LearningRate 0.0002   Epoch: 19   Global Step: 102250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:23:09,291-Speed 10508.72 samples/sec   Loss 1.5021   LearningRate 0.0002   Epoch: 19   Global Step: 102260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:23:17,098-Speed 10495.99 samples/sec   Loss 1.5076   LearningRate 0.0002   Epoch: 19   Global Step: 102270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:23:24,888-Speed 10516.71 samples/sec   Loss 1.5011   LearningRate 0.0002   Epoch: 19   Global Step: 102280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:23:32,668-Speed 10530.77 samples/sec   Loss 1.4962   LearningRate 0.0002   Epoch: 19   Global Step: 102290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:23:40,466-Speed 10507.38 samples/sec   Loss 1.4766   LearningRate 0.0002   Epoch: 19   Global Step: 102300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:23:48,297-Speed 10461.72 samples/sec   Loss 1.4812   LearningRate 0.0002   Epoch: 19   Global Step: 102310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:23:56,095-Speed 10507.30 samples/sec   Loss 1.5075   LearningRate 0.0002   Epoch: 19   Global Step: 102320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:24:03,890-Speed 10509.23 samples/sec   Loss 1.4907   LearningRate 0.0002   Epoch: 19   Global Step: 102330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:24:11,683-Speed 10513.71 samples/sec   Loss 1.5069   LearningRate 0.0002   Epoch: 19   Global Step: 102340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:24:19,475-Speed 10515.37 samples/sec   Loss 1.4961   LearningRate 0.0002   Epoch: 19   Global Step: 102350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:24:27,253-Speed 10533.64 samples/sec   Loss 1.4873   LearningRate 0.0002   Epoch: 19   Global Step: 102360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:24:35,051-Speed 10507.16 samples/sec   Loss 1.4886   LearningRate 0.0001   Epoch: 19   Global Step: 102370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:24:42,827-Speed 10536.48 samples/sec   Loss 1.4907   LearningRate 0.0001   Epoch: 19   Global Step: 102380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:24:50,636-Speed 10492.34 samples/sec   Loss 1.4761   LearningRate 0.0001   Epoch: 19   Global Step: 102390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:24:58,424-Speed 10520.47 samples/sec   Loss 1.4914   LearningRate 0.0001   Epoch: 19   Global Step: 102400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:25:06,210-Speed 10523.42 samples/sec   Loss 1.4980   LearningRate 0.0001   Epoch: 19   Global Step: 102410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:25:13,986-Speed 10537.43 samples/sec   Loss 1.4915   LearningRate 0.0001   Epoch: 19   Global Step: 102420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:25:21,760-Speed 10538.07 samples/sec   Loss 1.4806   LearningRate 0.0001   Epoch: 19   Global Step: 102430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:25:29,545-Speed 10525.17 samples/sec   Loss 1.5069   LearningRate 0.0001   Epoch: 19   Global Step: 102440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:25:37,316-Speed 10543.17 samples/sec   Loss 1.4951   LearningRate 0.0001   Epoch: 19   Global Step: 102450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:25:45,108-Speed 10513.51 samples/sec   Loss 1.4956   LearningRate 0.0001   Epoch: 19   Global Step: 102460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:25:52,915-Speed 10495.28 samples/sec   Loss 1.4929   LearningRate 0.0001   Epoch: 19   Global Step: 102470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:26:00,724-Speed 10491.43 samples/sec   Loss 1.4951   LearningRate 0.0001   Epoch: 19   Global Step: 102480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:26:08,570-Speed 10443.33 samples/sec   Loss 1.4822   LearningRate 0.0001   Epoch: 19   Global Step: 102490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:26:16,366-Speed 10508.80 samples/sec   Loss 1.5029   LearningRate 0.0001   Epoch: 19   Global Step: 102500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:26:24,160-Speed 10511.98 samples/sec   Loss 1.5059   LearningRate 0.0001   Epoch: 19   Global Step: 102510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:26:31,953-Speed 10515.11 samples/sec   Loss 1.4899   LearningRate 0.0001   Epoch: 19   Global Step: 102520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:26:39,727-Speed 10538.20 samples/sec   Loss 1.4880   LearningRate 0.0001   Epoch: 19   Global Step: 102530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:26:47,514-Speed 10521.81 samples/sec   Loss 1.5000   LearningRate 0.0001   Epoch: 19   Global Step: 102540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:26:55,304-Speed 10522.13 samples/sec   Loss 1.5015   LearningRate 0.0001   Epoch: 19   Global Step: 102550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:27:03,091-Speed 10521.60 samples/sec   Loss 1.4851   LearningRate 0.0001   Epoch: 19   Global Step: 102560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:27:10,890-Speed 10505.59 samples/sec   Loss 1.5006   LearningRate 0.0001   Epoch: 19   Global Step: 102570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:27:18,678-Speed 10519.62 samples/sec   Loss 1.4834   LearningRate 0.0001   Epoch: 19   Global Step: 102580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:27:26,467-Speed 10518.95 samples/sec   Loss 1.4933   LearningRate 0.0001   Epoch: 19   Global Step: 102590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:27:34,234-Speed 10551.42 samples/sec   Loss 1.4767   LearningRate 0.0001   Epoch: 19   Global Step: 102600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:27:42,042-Speed 10493.95 samples/sec   Loss 1.4827   LearningRate 0.0001   Epoch: 19   Global Step: 102610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:27:49,830-Speed 10519.37 samples/sec   Loss 1.4824   LearningRate 0.0001   Epoch: 19   Global Step: 102620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:27:57,618-Speed 10521.73 samples/sec   Loss 1.4874   LearningRate 0.0001   Epoch: 19   Global Step: 102630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:28:05,418-Speed 10503.11 samples/sec   Loss 1.4814   LearningRate 0.0001   Epoch: 19   Global Step: 102640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:28:13,216-Speed 10506.61 samples/sec   Loss 1.4965   LearningRate 0.0001   Epoch: 19   Global Step: 102650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:28:21,072-Speed 10429.19 samples/sec   Loss 1.4758   LearningRate 0.0001   Epoch: 19   Global Step: 102660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:28:28,878-Speed 10496.23 samples/sec   Loss 1.4749   LearningRate 0.0001   Epoch: 19   Global Step: 102670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:28:36,694-Speed 10483.56 samples/sec   Loss 1.4859   LearningRate 0.0001   Epoch: 19   Global Step: 102680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:28:44,493-Speed 10503.74 samples/sec   Loss 1.4965   LearningRate 0.0001   Epoch: 19   Global Step: 102690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:28:52,293-Speed 10504.36 samples/sec   Loss 1.4879   LearningRate 0.0001   Epoch: 19   Global Step: 102700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:29:00,094-Speed 10502.60 samples/sec   Loss 1.4827   LearningRate 0.0001   Epoch: 19   Global Step: 102710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:29:07,875-Speed 10530.53 samples/sec   Loss 1.4824   LearningRate 0.0001   Epoch: 19   Global Step: 102720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:29:15,673-Speed 10505.70 samples/sec   Loss 1.4867   LearningRate 0.0001   Epoch: 19   Global Step: 102730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:29:23,452-Speed 10533.16 samples/sec   Loss 1.4957   LearningRate 0.0001   Epoch: 19   Global Step: 102740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:29:31,240-Speed 10520.45 samples/sec   Loss 1.4750   LearningRate 0.0001   Epoch: 19   Global Step: 102750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:29:39,024-Speed 10525.75 samples/sec   Loss 1.4922   LearningRate 0.0001   Epoch: 19   Global Step: 102760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:29:46,814-Speed 10517.04 samples/sec   Loss 1.4826   LearningRate 0.0001   Epoch: 19   Global Step: 102770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:29:54,612-Speed 10506.37 samples/sec   Loss 1.4974   LearningRate 0.0001   Epoch: 19   Global Step: 102780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:30:02,399-Speed 10521.83 samples/sec   Loss 1.4940   LearningRate 0.0001   Epoch: 19   Global Step: 102790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:30:10,189-Speed 10516.92 samples/sec   Loss 1.4993   LearningRate 0.0001   Epoch: 19   Global Step: 102800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:30:17,982-Speed 10512.63 samples/sec   Loss 1.4921   LearningRate 0.0001   Epoch: 19   Global Step: 102810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:30:25,786-Speed 10499.49 samples/sec   Loss 1.4786   LearningRate 0.0001   Epoch: 19   Global Step: 102820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:30:33,582-Speed 10509.13 samples/sec   Loss 1.4767   LearningRate 0.0001   Epoch: 19   Global Step: 102830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:30:41,364-Speed 10527.73 samples/sec   Loss 1.4765   LearningRate 0.0001   Epoch: 19   Global Step: 102840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:30:49,139-Speed 10538.80 samples/sec   Loss 1.4866   LearningRate 0.0001   Epoch: 19   Global Step: 102850   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-01-16 13:30:56,920-Speed 10529.75 samples/sec   Loss 1.4921   LearningRate 0.0001   Epoch: 19   Global Step: 102860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:31:04,758-Speed 10455.50 samples/sec   Loss 1.4790   LearningRate 0.0001   Epoch: 19   Global Step: 102870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:31:12,547-Speed 10519.45 samples/sec   Loss 1.4968   LearningRate 0.0001   Epoch: 19   Global Step: 102880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:31:20,347-Speed 10505.07 samples/sec   Loss 1.4966   LearningRate 0.0001   Epoch: 19   Global Step: 102890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:31:28,138-Speed 10515.25 samples/sec   Loss 1.4854   LearningRate 0.0001   Epoch: 19   Global Step: 102900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:31:35,924-Speed 10523.47 samples/sec   Loss 1.4895   LearningRate 0.0001   Epoch: 19   Global Step: 102910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:31:43,704-Speed 10533.31 samples/sec   Loss 1.4908   LearningRate 0.0001   Epoch: 19   Global Step: 102920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:31:51,492-Speed 10519.73 samples/sec   Loss 1.4898   LearningRate 0.0000   Epoch: 19   Global Step: 102930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:31:59,284-Speed 10516.48 samples/sec   Loss 1.4835   LearningRate 0.0000   Epoch: 19   Global Step: 102940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:32:07,058-Speed 10539.47 samples/sec   Loss 1.5050   LearningRate 0.0000   Epoch: 19   Global Step: 102950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:32:14,847-Speed 10518.99 samples/sec   Loss 1.4818   LearningRate 0.0000   Epoch: 19   Global Step: 102960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:32:22,660-Speed 10487.19 samples/sec   Loss 1.4898   LearningRate 0.0000   Epoch: 19   Global Step: 102970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:32:30,465-Speed 10497.03 samples/sec   Loss 1.4967   LearningRate 0.0000   Epoch: 19   Global Step: 102980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:32:38,253-Speed 10521.69 samples/sec   Loss 1.4994   LearningRate 0.0000   Epoch: 19   Global Step: 102990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:32:46,040-Speed 10520.37 samples/sec   Loss 1.4724   LearningRate 0.0000   Epoch: 19   Global Step: 103000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:32:53,829-Speed 10523.51 samples/sec   Loss 1.4843   LearningRate 0.0000   Epoch: 19   Global Step: 103010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:33:01,628-Speed 10505.20 samples/sec   Loss 1.4762   LearningRate 0.0000   Epoch: 19   Global Step: 103020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:33:09,413-Speed 10524.15 samples/sec   Loss 1.4677   LearningRate 0.0000   Epoch: 19   Global Step: 103030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:33:17,195-Speed 10528.46 samples/sec   Loss 1.4792   LearningRate 0.0000   Epoch: 19   Global Step: 103040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:33:25,021-Speed 10470.35 samples/sec   Loss 1.4755   LearningRate 0.0000   Epoch: 19   Global Step: 103050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:33:32,809-Speed 10519.88 samples/sec   Loss 1.4875   LearningRate 0.0000   Epoch: 19   Global Step: 103060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:33:40,606-Speed 10508.78 samples/sec   Loss 1.5104   LearningRate 0.0000   Epoch: 19   Global Step: 103070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:33:48,430-Speed 10471.21 samples/sec   Loss 1.5020   LearningRate 0.0000   Epoch: 19   Global Step: 103080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:33:56,246-Speed 10482.87 samples/sec   Loss 1.5014   LearningRate 0.0000   Epoch: 19   Global Step: 103090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:34:04,053-Speed 10495.38 samples/sec   Loss 1.4766   LearningRate 0.0000   Epoch: 19   Global Step: 103100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:34:11,842-Speed 10519.46 samples/sec   Loss 1.4781   LearningRate 0.0000   Epoch: 19   Global Step: 103110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:34:19,624-Speed 10527.72 samples/sec   Loss 1.4950   LearningRate 0.0000   Epoch: 19   Global Step: 103120   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:34:27,427-Speed 10500.68 samples/sec   Loss 1.4790   LearningRate 0.0000   Epoch: 19   Global Step: 103130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:34:35,245-Speed 10479.61 samples/sec   Loss 1.4812   LearningRate 0.0000   Epoch: 19   Global Step: 103140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:34:43,049-Speed 10498.98 samples/sec   Loss 1.4776   LearningRate 0.0000   Epoch: 19   Global Step: 103150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:34:50,872-Speed 10473.53 samples/sec   Loss 1.4862   LearningRate 0.0000   Epoch: 19   Global Step: 103160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:34:58,697-Speed 10471.22 samples/sec   Loss 1.4803   LearningRate 0.0000   Epoch: 19   Global Step: 103170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:35:06,525-Speed 10466.16 samples/sec   Loss 1.4822   LearningRate 0.0000   Epoch: 19   Global Step: 103180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:35:14,322-Speed 10508.60 samples/sec   Loss 1.4830   LearningRate 0.0000   Epoch: 19   Global Step: 103190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:35:22,112-Speed 10517.35 samples/sec   Loss 1.4935   LearningRate 0.0000   Epoch: 19   Global Step: 103200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:35:29,930-Speed 10480.51 samples/sec   Loss 1.4564   LearningRate 0.0000   Epoch: 19   Global Step: 103210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:35:37,744-Speed 10484.23 samples/sec   Loss 1.4787   LearningRate 0.0000   Epoch: 19   Global Step: 103220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:35:45,541-Speed 10511.45 samples/sec   Loss 1.4762   LearningRate 0.0000   Epoch: 19   Global Step: 103230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:35:53,349-Speed 10497.30 samples/sec   Loss 1.4901   LearningRate 0.0000   Epoch: 19   Global Step: 103240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:36:01,143-Speed 10512.15 samples/sec   Loss 1.4965   LearningRate 0.0000   Epoch: 19   Global Step: 103250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:36:08,941-Speed 10506.72 samples/sec   Loss 1.4916   LearningRate 0.0000   Epoch: 19   Global Step: 103260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:36:16,726-Speed 10523.82 samples/sec   Loss 1.4891   LearningRate 0.0000   Epoch: 19   Global Step: 103270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:36:24,511-Speed 10524.90 samples/sec   Loss 1.4506   LearningRate 0.0000   Epoch: 19   Global Step: 103280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:36:32,315-Speed 10498.33 samples/sec   Loss 1.4768   LearningRate 0.0000   Epoch: 19   Global Step: 103290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:36:40,106-Speed 10516.46 samples/sec   Loss 1.4685   LearningRate 0.0000   Epoch: 19   Global Step: 103300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:36:47,884-Speed 10533.22 samples/sec   Loss 1.4752   LearningRate 0.0000   Epoch: 19   Global Step: 103310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:36:55,661-Speed 10535.09 samples/sec   Loss 1.4830   LearningRate 0.0000   Epoch: 19   Global Step: 103320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:37:03,454-Speed 10513.00 samples/sec   Loss 1.4783   LearningRate 0.0000   Epoch: 19   Global Step: 103330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:37:11,249-Speed 10510.89 samples/sec   Loss 1.4681   LearningRate 0.0000   Epoch: 19   Global Step: 103340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:37:19,033-Speed 10525.20 samples/sec   Loss 1.4808   LearningRate 0.0000   Epoch: 19   Global Step: 103350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:37:26,823-Speed 10518.76 samples/sec   Loss 1.4886   LearningRate 0.0000   Epoch: 19   Global Step: 103360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:37:34,604-Speed 10529.00 samples/sec   Loss 1.4816   LearningRate 0.0000   Epoch: 19   Global Step: 103370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:37:42,438-Speed 10458.73 samples/sec   Loss 1.4730   LearningRate 0.0000   Epoch: 19   Global Step: 103380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:37:50,243-Speed 10496.13 samples/sec   Loss 1.4782   LearningRate 0.0000   Epoch: 19   Global Step: 103390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:37:58,054-Speed 10489.80 samples/sec   Loss 1.5001   LearningRate 0.0000   Epoch: 19   Global Step: 103400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:38:05,845-Speed 10515.90 samples/sec   Loss 1.5069   LearningRate 0.0000   Epoch: 19   Global Step: 103410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:38:13,681-Speed 10456.15 samples/sec   Loss 1.4843   LearningRate 0.0000   Epoch: 19   Global Step: 103420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:38:21,501-Speed 10477.27 samples/sec   Loss 1.4958   LearningRate 0.0000   Epoch: 19   Global Step: 103430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:38:29,322-Speed 10475.97 samples/sec   Loss 1.4883   LearningRate 0.0000   Epoch: 19   Global Step: 103440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:38:37,117-Speed 10510.04 samples/sec   Loss 1.4793   LearningRate 0.0000   Epoch: 19   Global Step: 103450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:38:44,894-Speed 10536.15 samples/sec   Loss 1.4842   LearningRate 0.0000   Epoch: 19   Global Step: 103460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:38:52,682-Speed 10520.58 samples/sec   Loss 1.4741   LearningRate 0.0000   Epoch: 19   Global Step: 103470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:39:00,494-Speed 10486.32 samples/sec   Loss 1.4744   LearningRate 0.0000   Epoch: 19   Global Step: 103480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:39:08,304-Speed 10490.69 samples/sec   Loss 1.5020   LearningRate 0.0000   Epoch: 19   Global Step: 103490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:39:16,110-Speed 10495.10 samples/sec   Loss 1.4779   LearningRate 0.0000   Epoch: 19   Global Step: 103500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-16 13:39:23,890-Speed 10535.50 samples/sec   Loss 1.4748   LearningRate 0.0000   Epoch: 19   Global Step: 103510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:39:31,712-Speed 10473.90 samples/sec   Loss 1.4903   LearningRate 0.0000   Epoch: 19   Global Step: 103520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:39:39,499-Speed 10522.50 samples/sec   Loss 1.4805   LearningRate 0.0000   Epoch: 19   Global Step: 103530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:39:47,301-Speed 10503.47 samples/sec   Loss 1.4746   LearningRate 0.0000   Epoch: 19   Global Step: 103540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:39:55,108-Speed 10494.73 samples/sec   Loss 1.4676   LearningRate 0.0000   Epoch: 19   Global Step: 103550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:40:02,906-Speed 10506.29 samples/sec   Loss 1.4933   LearningRate 0.0000   Epoch: 19   Global Step: 103560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:40:10,747-Speed 10449.03 samples/sec   Loss 1.4835   LearningRate 0.0000   Epoch: 19   Global Step: 103570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:40:18,581-Speed 10459.34 samples/sec   Loss 1.4966   LearningRate 0.0000   Epoch: 19   Global Step: 103580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:40:26,380-Speed 10505.04 samples/sec   Loss 1.4674   LearningRate 0.0000   Epoch: 19   Global Step: 103590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:40:34,202-Speed 10473.47 samples/sec   Loss 1.4757   LearningRate 0.0000   Epoch: 19   Global Step: 103600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:40:42,004-Speed 10501.70 samples/sec   Loss 1.4692   LearningRate 0.0000   Epoch: 19   Global Step: 103610   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-01-16 13:40:49,803-Speed 10505.25 samples/sec   Loss 1.4987   LearningRate 0.0000   Epoch: 19   Global Step: 103620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:40:57,603-Speed 10504.47 samples/sec   Loss 1.5092   LearningRate 0.0000   Epoch: 19   Global Step: 103630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:41:05,402-Speed 10505.40 samples/sec   Loss 1.4787   LearningRate 0.0000   Epoch: 19   Global Step: 103640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:41:13,186-Speed 10525.35 samples/sec   Loss 1.4806   LearningRate 0.0000   Epoch: 19   Global Step: 103650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:41:20,969-Speed 10527.44 samples/sec   Loss 1.4747   LearningRate 0.0000   Epoch: 19   Global Step: 103660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:41:28,757-Speed 10519.85 samples/sec   Loss 1.4651   LearningRate 0.0000   Epoch: 19   Global Step: 103670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-16 13:41:36,581-Speed 10471.65 samples/sec   Loss 1.4816   LearningRate 0.0000   Epoch: 19   Global Step: 103680   Fp16 Grad Scale: 65536   Required: -0 hours
Training: 2022-01-16 13:41:44,374-Speed 10512.75 samples/sec   Loss 1.4793   LearningRate 0.0000   Epoch: 19   Global Step: 103690   Fp16 Grad Scale: 65536   Required: -0 hours