Training: 2022-01-07 20:14:38,618-rank_id: 0
Training: 2022-01-07 20:15:06,564-: loss                     cosface
Training: 2022-01-07 20:15:06,568-: network                  r50
Training: 2022-01-07 20:15:06,568-: resume                   False
Training: 2022-01-07 20:15:06,568-: output                   work_dirs/webface42m_r50_lr01_pfc02
Training: 2022-01-07 20:15:06,568-: embedding_size           512
Training: 2022-01-07 20:15:06,568-: sample_rate              0.2
Training: 2022-01-07 20:15:06,568-: fp16                     True
Training: 2022-01-07 20:15:06,568-: momentum                 0.9
Training: 2022-01-07 20:15:06,568-: weight_decay             0.0005
Training: 2022-01-07 20:15:06,568-: batch_size               512
Training: 2022-01-07 20:15:06,569-: lr                       0.4
Training: 2022-01-07 20:15:06,569-: dali                     True
Training: 2022-01-07 20:15:06,569-: verbose                  5000
Training: 2022-01-07 20:15:06,569-: frequent                 10
Training: 2022-01-07 20:15:06,569-: if_hard_scale            False
Training: 2022-01-07 20:15:06,569-: score                    None
Training: 2022-01-07 20:15:06,569-: rec                      /train_tmp/WebFace42M
Training: 2022-01-07 20:15:06,569-: num_classes              2059906
Training: 2022-01-07 20:15:06,569-: num_image                42474557
Training: 2022-01-07 20:15:06,569-: num_epoch                20
Training: 2022-01-07 20:15:06,569-: warmup_epoch             2
Training: 2022-01-07 20:15:06,569-: val_targets              ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-01-07 20:15:06,569-: warmup_step              20738
Training: 2022-01-07 20:15:06,569-: total_step               207380
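
Note on the derived values above: warmup_step and total_step are not independent settings; they appear to follow from num_image, batch_size, num_epoch, and warmup_epoch. The sketch below reproduces them under two assumptions that the log does not state: that batch_size is per process, and that 8 data-parallel processes were used (a value inferred only because it makes the arithmetic match). The linear warmup formula is likewise an assumption, chosen because it agrees with the logged LearningRate values; it is not taken from the training code.

```python
# Values copied from the logged configuration.
num_image = 42474557
batch_size = 512       # ASSUMPTION: interpreted as the per-process batch size
num_epoch = 20
warmup_epoch = 2
base_lr = 0.4

# ASSUMPTION: 8 data-parallel processes. The log does not state this;
# it is inferred because it reproduces the logged step counts.
world_size = 8

total_batch = batch_size * world_size          # 4096 samples per global step
steps_per_epoch = num_image // total_batch     # integer division drops the partial batch
total_step = steps_per_epoch * num_epoch
warmup_step = steps_per_epoch * warmup_epoch


def warmup_lr(step):
    """ASSUMED linear warmup: scale base_lr by step / warmup_step, capped at base_lr."""
    return base_lr * min(step, warmup_step) / warmup_step


print("steps_per_epoch:", steps_per_epoch)
print("total_step:", total_step)
print("warmup_step:", warmup_step)
print("lr at step 1870:", round(warmup_lr(1870), 4))
```

Under these assumptions the computed total_step and warmup_step equal the logged 207380 and 20738, and the rate at step 1870 rounds to the logged 0.0361.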
Training: 2022-01-07 20:16:14,076-Reducer buckets have been rebuilt in this iteration.
Training: 2022-01-07 20:16:27,104-Speed 5994.74 samples/sec   Loss 42.5193   LearningRate 0.0004   Epoch: 0   Global Step: 20   Fp16 Grad Scale: 32768   Required: 59 hours
Training: 2022-01-07 20:16:33,951-Speed 5984.21 samples/sec   Loss 42.4755   LearningRate 0.0006   Epoch: 0   Global Step: 30   Fp16 Grad Scale: 32768   Required: 53 hours
Training: 2022-01-07 20:16:40,804-Speed 5979.20 samples/sec   Loss 42.4921   LearningRate 0.0008   Epoch: 0   Global Step: 40   Fp16 Grad Scale: 32768   Required: 50 hours
Training: 2022-01-07 20:16:47,708-Speed 5933.39 samples/sec   Loss 42.4951   LearningRate 0.0010   Epoch: 0   Global Step: 50   Fp16 Grad Scale: 32768   Required: 48 hours
Training: 2022-01-07 20:16:54,551-Speed 5989.99 samples/sec   Loss 42.4792   LearningRate 0.0012   Epoch: 0   Global Step: 60   Fp16 Grad Scale: 32768   Required: 46 hours
Training: 2022-01-07 20:17:01,389-Speed 5991.95 samples/sec   Loss 42.4620   LearningRate 0.0014   Epoch: 0   Global Step: 70   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-01-07 20:17:08,285-Speed 5941.03 samples/sec   Loss 42.4866   LearningRate 0.0015   Epoch: 0   Global Step: 80   Fp16 Grad Scale: 32768   Required: 45 hours
Training: 2022-01-07 20:17:15,142-Speed 5978.06 samples/sec   Loss 42.4729   LearningRate 0.0017   Epoch: 0   Global Step: 90   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-01-07 20:17:22,018-Speed 5958.27 samples/sec   Loss 42.4625   LearningRate 0.0019   Epoch: 0   Global Step: 100   Fp16 Grad Scale: 32768   Required: 44 hours
Training: 2022-01-07 20:17:28,860-Speed 5988.37 samples/sec   Loss 42.4581   LearningRate 0.0021   Epoch: 0   Global Step: 110   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 20:17:35,742-Speed 5953.60 samples/sec   Loss 42.4277   LearningRate 0.0023   Epoch: 0   Global Step: 120   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 20:17:42,639-Speed 5940.10 samples/sec   Loss 42.4295   LearningRate 0.0025   Epoch: 0   Global Step: 130   Fp16 Grad Scale: 65536   Required: 43 hours
Training: 2022-01-07 20:17:49,488-Speed 5981.67 samples/sec   Loss 42.4274   LearningRate 0.0027   Epoch: 0   Global Step: 140   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 20:17:56,372-Speed 5952.38 samples/sec   Loss 42.4029   LearningRate 0.0029   Epoch: 0   Global Step: 150   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 20:18:03,314-Speed 5902.03 samples/sec   Loss 42.4134   LearningRate 0.0031   Epoch: 0   Global Step: 160   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 20:18:10,155-Speed 5988.17 samples/sec   Loss 42.3723   LearningRate 0.0033   Epoch: 0   Global Step: 170   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 20:18:17,007-Speed 5981.73 samples/sec   Loss 42.3449   LearningRate 0.0035   Epoch: 0   Global Step: 180   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 20:18:23,854-Speed 5983.36 samples/sec   Loss 42.3115   LearningRate 0.0037   Epoch: 0   Global Step: 190   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 20:18:30,720-Speed 5966.77 samples/sec   Loss 42.2991   LearningRate 0.0039   Epoch: 0   Global Step: 200   Fp16 Grad Scale: 65536   Required: 42 hours
Training: 2022-01-07 20:18:37,558-Speed 5992.28 samples/sec   Loss 42.2836   LearningRate 0.0041   Epoch: 0   Global Step: 210   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-07 20:18:44,410-Speed 5979.41 samples/sec   Loss 42.2356   LearningRate 0.0042   Epoch: 0   Global Step: 220   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-07 20:18:51,256-Speed 5983.55 samples/sec   Loss 42.1778   LearningRate 0.0044   Epoch: 0   Global Step: 230   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-07 20:18:58,092-Speed 5993.67 samples/sec   Loss 42.1115   LearningRate 0.0046   Epoch: 0   Global Step: 240   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-07 20:19:04,964-Speed 5964.23 samples/sec   Loss 42.0501   LearningRate 0.0048   Epoch: 0   Global Step: 250   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-07 20:19:11,795-Speed 5996.75 samples/sec   Loss 41.9939   LearningRate 0.0050   Epoch: 0   Global Step: 260   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-07 20:19:18,657-Speed 5970.68 samples/sec   Loss 41.9613   LearningRate 0.0052   Epoch: 0   Global Step: 270   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-07 20:19:25,491-Speed 5994.74 samples/sec   Loss 41.8977   LearningRate 0.0054   Epoch: 0   Global Step: 280   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-07 20:19:32,338-Speed 5983.68 samples/sec   Loss 41.8052   LearningRate 0.0056   Epoch: 0   Global Step: 290   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-07 20:19:39,209-Speed 5962.38 samples/sec   Loss 41.7056   LearningRate 0.0058   Epoch: 0   Global Step: 300   Fp16 Grad Scale: 65536   Required: 41 hours
Training: 2022-01-07 20:19:46,058-Speed 5982.47 samples/sec   Loss 41.6454   LearningRate 0.0060   Epoch: 0   Global Step: 310   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-07 20:19:52,907-Speed 5980.92 samples/sec   Loss 41.5677   LearningRate 0.0062   Epoch: 0   Global Step: 320   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-07 20:19:59,752-Speed 5985.32 samples/sec   Loss 41.4773   LearningRate 0.0064   Epoch: 0   Global Step: 330   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-07 20:20:06,627-Speed 5959.60 samples/sec   Loss 41.4134   LearningRate 0.0066   Epoch: 0   Global Step: 340   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-07 20:20:13,488-Speed 5970.97 samples/sec   Loss 41.3312   LearningRate 0.0068   Epoch: 0   Global Step: 350   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-07 20:20:20,331-Speed 5987.49 samples/sec   Loss 41.2454   LearningRate 0.0069   Epoch: 0   Global Step: 360   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-07 20:20:27,208-Speed 5958.68 samples/sec   Loss 41.1775   LearningRate 0.0071   Epoch: 0   Global Step: 370   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-07 20:20:34,057-Speed 5981.81 samples/sec   Loss 41.1244   LearningRate 0.0073   Epoch: 0   Global Step: 380   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-07 20:20:40,923-Speed 5966.25 samples/sec   Loss 41.0575   LearningRate 0.0075   Epoch: 0   Global Step: 390   Fp16 Grad Scale: 131072   Required: 41 hours
Training: 2022-01-07 20:20:47,798-Speed 5962.23 samples/sec   Loss 40.9563   LearningRate 0.0077   Epoch: 0   Global Step: 400   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:20:54,624-Speed 6001.49 samples/sec   Loss 40.9201   LearningRate 0.0079   Epoch: 0   Global Step: 410   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:21:01,479-Speed 5978.91 samples/sec   Loss 40.8457   LearningRate 0.0081   Epoch: 0   Global Step: 420   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:21:08,338-Speed 5972.58 samples/sec   Loss 40.8103   LearningRate 0.0083   Epoch: 0   Global Step: 430   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:21:15,210-Speed 5962.13 samples/sec   Loss 40.7411   LearningRate 0.0085   Epoch: 0   Global Step: 440   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:21:22,061-Speed 5979.62 samples/sec   Loss 40.6518   LearningRate 0.0087   Epoch: 0   Global Step: 450   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:21:28,903-Speed 5987.84 samples/sec   Loss 40.6232   LearningRate 0.0089   Epoch: 0   Global Step: 460   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:21:35,753-Speed 5980.73 samples/sec   Loss 40.5723   LearningRate 0.0091   Epoch: 0   Global Step: 470   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:21:42,620-Speed 5966.04 samples/sec   Loss 40.5179   LearningRate 0.0093   Epoch: 0   Global Step: 480   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:21:49,512-Speed 5944.77 samples/sec   Loss 40.4562   LearningRate 0.0095   Epoch: 0   Global Step: 490   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:21:56,370-Speed 5974.20 samples/sec   Loss 40.4086   LearningRate 0.0096   Epoch: 0   Global Step: 500   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:22:03,228-Speed 5973.28 samples/sec   Loss 40.3730   LearningRate 0.0098   Epoch: 0   Global Step: 510   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:22:10,086-Speed 5974.06 samples/sec   Loss 40.2950   LearningRate 0.0100   Epoch: 0   Global Step: 520   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:22:16,940-Speed 5977.79 samples/sec   Loss 40.2581   LearningRate 0.0102   Epoch: 0   Global Step: 530   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:22:23,788-Speed 5982.33 samples/sec   Loss 40.1983   LearningRate 0.0104   Epoch: 0   Global Step: 540   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:22:30,636-Speed 5982.02 samples/sec   Loss 40.1660   LearningRate 0.0106   Epoch: 0   Global Step: 550   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:22:37,491-Speed 5976.74 samples/sec   Loss 40.1292   LearningRate 0.0108   Epoch: 0   Global Step: 560   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:22:44,333-Speed 5987.73 samples/sec   Loss 40.0748   LearningRate 0.0110   Epoch: 0   Global Step: 570   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:22:51,176-Speed 5986.45 samples/sec   Loss 40.0370   LearningRate 0.0112   Epoch: 0   Global Step: 580   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:22:58,048-Speed 5961.44 samples/sec   Loss 39.9905   LearningRate 0.0114   Epoch: 0   Global Step: 590   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:23:04,889-Speed 5988.89 samples/sec   Loss 39.9674   LearningRate 0.0116   Epoch: 0   Global Step: 600   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:23:11,732-Speed 5986.92 samples/sec   Loss 39.8930   LearningRate 0.0118   Epoch: 0   Global Step: 610   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:23:18,598-Speed 5968.27 samples/sec   Loss 39.8293   LearningRate 0.0120   Epoch: 0   Global Step: 620   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:23:25,434-Speed 5992.49 samples/sec   Loss 39.8242   LearningRate 0.0122   Epoch: 0   Global Step: 630   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:23:32,290-Speed 5975.48 samples/sec   Loss 39.7918   LearningRate 0.0123   Epoch: 0   Global Step: 640   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:23:39,154-Speed 5969.05 samples/sec   Loss 39.7562   LearningRate 0.0125   Epoch: 0   Global Step: 650   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:23:46,003-Speed 5981.58 samples/sec   Loss 39.7265   LearningRate 0.0127   Epoch: 0   Global Step: 660   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:23:52,847-Speed 5985.79 samples/sec   Loss 39.6804   LearningRate 0.0129   Epoch: 0   Global Step: 670   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:23:59,727-Speed 5968.73 samples/sec   Loss 39.6473   LearningRate 0.0131   Epoch: 0   Global Step: 680   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:24:06,594-Speed 5966.43 samples/sec   Loss 39.5964   LearningRate 0.0133   Epoch: 0   Global Step: 690   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:24:13,536-Speed 5903.19 samples/sec   Loss 39.5511   LearningRate 0.0135   Epoch: 0   Global Step: 700   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:24:20,396-Speed 5971.88 samples/sec   Loss 39.5207   LearningRate 0.0137   Epoch: 0   Global Step: 710   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:24:27,325-Speed 5914.70 samples/sec   Loss 39.5184   LearningRate 0.0139   Epoch: 0   Global Step: 720   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:24:34,169-Speed 5985.56 samples/sec   Loss 39.4845   LearningRate 0.0141   Epoch: 0   Global Step: 730   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-07 20:24:41,025-Speed 5975.83 samples/sec   Loss 39.4615   LearningRate 0.0143   Epoch: 0   Global Step: 740   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-07 20:24:47,877-Speed 5979.55 samples/sec   Loss 39.4317   LearningRate 0.0145   Epoch: 0   Global Step: 750   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-07 20:24:54,735-Speed 5972.60 samples/sec   Loss 39.3674   LearningRate 0.0147   Epoch: 0   Global Step: 760   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-07 20:25:01,588-Speed 5978.31 samples/sec   Loss 39.3763   LearningRate 0.0149   Epoch: 0   Global Step: 770   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-07 20:25:08,463-Speed 5959.78 samples/sec   Loss 39.3582   LearningRate 0.0150   Epoch: 0   Global Step: 780   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-07 20:25:15,332-Speed 5963.57 samples/sec   Loss 39.3231   LearningRate 0.0152   Epoch: 0   Global Step: 790   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-07 20:25:22,193-Speed 5971.00 samples/sec   Loss 39.2798   LearningRate 0.0154   Epoch: 0   Global Step: 800   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-07 20:25:29,057-Speed 5968.94 samples/sec   Loss 39.2505   LearningRate 0.0156   Epoch: 0   Global Step: 810   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-07 20:25:35,906-Speed 5980.91 samples/sec   Loss 39.2465   LearningRate 0.0158   Epoch: 0   Global Step: 820   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-01-07 20:25:42,781-Speed 5959.54 samples/sec   Loss 39.2197   LearningRate 0.0160   Epoch: 0   Global Step: 830   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:25:49,654-Speed 5961.44 samples/sec   Loss 39.2087   LearningRate 0.0162   Epoch: 0   Global Step: 840   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:25:56,529-Speed 5958.39 samples/sec   Loss 39.1596   LearningRate 0.0164   Epoch: 0   Global Step: 850   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:26:03,402-Speed 5960.83 samples/sec   Loss 39.1576   LearningRate 0.0166   Epoch: 0   Global Step: 860   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:26:10,260-Speed 5973.59 samples/sec   Loss 39.1433   LearningRate 0.0168   Epoch: 0   Global Step: 870   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:26:17,122-Speed 5970.61 samples/sec   Loss 39.1540   LearningRate 0.0170   Epoch: 0   Global Step: 880   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:26:24,008-Speed 5949.25 samples/sec   Loss 39.1294   LearningRate 0.0172   Epoch: 0   Global Step: 890   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:26:30,864-Speed 5975.66 samples/sec   Loss 39.1101   LearningRate 0.0174   Epoch: 0   Global Step: 900   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:26:37,813-Speed 5895.48 samples/sec   Loss 39.0989   LearningRate 0.0176   Epoch: 0   Global Step: 910   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:26:44,677-Speed 5969.08 samples/sec   Loss 39.0847   LearningRate 0.0177   Epoch: 0   Global Step: 920   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:26:51,550-Speed 5961.07 samples/sec   Loss 39.0651   LearningRate 0.0179   Epoch: 0   Global Step: 930   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:26:58,393-Speed 5986.52 samples/sec   Loss 39.0690   LearningRate 0.0181   Epoch: 0   Global Step: 940   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:27:05,255-Speed 5972.38 samples/sec   Loss 39.0476   LearningRate 0.0183   Epoch: 0   Global Step: 950   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:27:12,107-Speed 5978.94 samples/sec   Loss 39.0540   LearningRate 0.0185   Epoch: 0   Global Step: 960   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:27:18,970-Speed 5969.43 samples/sec   Loss 39.0580   LearningRate 0.0187   Epoch: 0   Global Step: 970   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:27:25,853-Speed 5952.76 samples/sec   Loss 39.0359   LearningRate 0.0189   Epoch: 0   Global Step: 980   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:27:32,716-Speed 5969.20 samples/sec   Loss 39.0380   LearningRate 0.0191   Epoch: 0   Global Step: 990   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:27:39,568-Speed 5979.10 samples/sec   Loss 39.0184   LearningRate 0.0193   Epoch: 0   Global Step: 1000   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:27:46,427-Speed 5972.38 samples/sec   Loss 39.0182   LearningRate 0.0195   Epoch: 0   Global Step: 1010   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:27:53,279-Speed 5979.27 samples/sec   Loss 38.9937   LearningRate 0.0197   Epoch: 0   Global Step: 1020   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:28:00,130-Speed 5978.91 samples/sec   Loss 39.0072   LearningRate 0.0199   Epoch: 0   Global Step: 1030   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:28:06,987-Speed 5977.02 samples/sec   Loss 39.0168   LearningRate 0.0201   Epoch: 0   Global Step: 1040   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:28:13,833-Speed 5984.66 samples/sec   Loss 38.9930   LearningRate 0.0203   Epoch: 0   Global Step: 1050   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:28:20,682-Speed 5981.07 samples/sec   Loss 38.9955   LearningRate 0.0204   Epoch: 0   Global Step: 1060   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:28:27,537-Speed 5976.42 samples/sec   Loss 38.9939   LearningRate 0.0206   Epoch: 0   Global Step: 1070   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:28:34,420-Speed 5951.39 samples/sec   Loss 38.9994   LearningRate 0.0208   Epoch: 0   Global Step: 1080   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:28:41,272-Speed 5978.54 samples/sec   Loss 39.0005   LearningRate 0.0210   Epoch: 0   Global Step: 1090   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:28:48,153-Speed 5953.91 samples/sec   Loss 38.9745   LearningRate 0.0212   Epoch: 0   Global Step: 1100   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:28:55,010-Speed 5974.78 samples/sec   Loss 38.9775   LearningRate 0.0214   Epoch: 0   Global Step: 1110   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:29:01,881-Speed 5962.39 samples/sec   Loss 38.9924   LearningRate 0.0216   Epoch: 0   Global Step: 1120   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:29:08,730-Speed 5981.46 samples/sec   Loss 38.9674   LearningRate 0.0218   Epoch: 0   Global Step: 1130   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:29:15,591-Speed 5971.48 samples/sec   Loss 38.9701   LearningRate 0.0220   Epoch: 0   Global Step: 1140   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:29:22,446-Speed 5975.44 samples/sec   Loss 38.9810   LearningRate 0.0222   Epoch: 0   Global Step: 1150   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:29:29,353-Speed 5931.76 samples/sec   Loss 38.9791   LearningRate 0.0224   Epoch: 0   Global Step: 1160   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:29:36,303-Speed 5894.62 samples/sec   Loss 38.9628   LearningRate 0.0226   Epoch: 0   Global Step: 1170   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:29:43,245-Speed 5901.19 samples/sec   Loss 38.9661   LearningRate 0.0228   Epoch: 0   Global Step: 1180   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:29:50,170-Speed 5915.68 samples/sec   Loss 38.9722   LearningRate 0.0230   Epoch: 0   Global Step: 1190   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:29:57,069-Speed 5938.80 samples/sec   Loss 38.9813   LearningRate 0.0231   Epoch: 0   Global Step: 1200   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:30:03,930-Speed 5971.07 samples/sec   Loss 38.9934   LearningRate 0.0233   Epoch: 0   Global Step: 1210   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:30:10,802-Speed 5960.84 samples/sec   Loss 38.9674   LearningRate 0.0235   Epoch: 0   Global Step: 1220   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:30:17,674-Speed 5961.32 samples/sec   Loss 38.9816   LearningRate 0.0237   Epoch: 0   Global Step: 1230   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:30:24,541-Speed 5968.30 samples/sec   Loss 38.9865   LearningRate 0.0239   Epoch: 0   Global Step: 1240   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:30:31,412-Speed 5962.77 samples/sec   Loss 38.9702   LearningRate 0.0241   Epoch: 0   Global Step: 1250   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:30:38,276-Speed 5968.05 samples/sec   Loss 38.9805   LearningRate 0.0243   Epoch: 0   Global Step: 1260   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:30:45,135-Speed 5973.83 samples/sec   Loss 38.9873   LearningRate 0.0245   Epoch: 0   Global Step: 1270   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:30:51,995-Speed 5971.66 samples/sec   Loss 38.9819   LearningRate 0.0247   Epoch: 0   Global Step: 1280   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:30:58,859-Speed 5968.71 samples/sec   Loss 38.9628   LearningRate 0.0249   Epoch: 0   Global Step: 1290   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:31:05,709-Speed 5981.14 samples/sec   Loss 38.9647   LearningRate 0.0251   Epoch: 0   Global Step: 1300   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:31:12,571-Speed 5970.11 samples/sec   Loss 38.9823   LearningRate 0.0253   Epoch: 0   Global Step: 1310   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:31:19,438-Speed 5966.28 samples/sec   Loss 38.9873   LearningRate 0.0255   Epoch: 0   Global Step: 1320   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:31:26,290-Speed 5979.52 samples/sec   Loss 38.9673   LearningRate 0.0257   Epoch: 0   Global Step: 1330   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:31:33,141-Speed 5979.45 samples/sec   Loss 38.9920   LearningRate 0.0258   Epoch: 0   Global Step: 1340   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:31:40,005-Speed 5969.00 samples/sec   Loss 38.9698   LearningRate 0.0260   Epoch: 0   Global Step: 1350   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:31:46,854-Speed 5980.60 samples/sec   Loss 38.9964   LearningRate 0.0262   Epoch: 0   Global Step: 1360   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:31:53,727-Speed 5960.97 samples/sec   Loss 38.9925   LearningRate 0.0264   Epoch: 0   Global Step: 1370   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:32:00,600-Speed 5961.23 samples/sec   Loss 38.9887   LearningRate 0.0266   Epoch: 0   Global Step: 1380   Fp16 Grad Scale: 65536   Required: 40 hours
Training: 2022-01-07 20:32:07,463-Speed 5968.66 samples/sec   Loss 39.0125   LearningRate 0.0268   Epoch: 0   Global Step: 1390   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:32:14,319-Speed 5978.56 samples/sec   Loss 38.9861   LearningRate 0.0270   Epoch: 0   Global Step: 1400   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:32:21,165-Speed 5983.78 samples/sec   Loss 38.9947   LearningRate 0.0272   Epoch: 0   Global Step: 1410   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:32:28,054-Speed 5946.90 samples/sec   Loss 38.9721   LearningRate 0.0274   Epoch: 0   Global Step: 1420   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:32:34,990-Speed 5906.85 samples/sec   Loss 38.9875   LearningRate 0.0276   Epoch: 0   Global Step: 1430   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:32:41,965-Speed 5873.81 samples/sec   Loss 38.9687   LearningRate 0.0278   Epoch: 0   Global Step: 1440   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:32:48,925-Speed 5886.13 samples/sec   Loss 38.9819   LearningRate 0.0280   Epoch: 0   Global Step: 1450   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:32:55,804-Speed 5955.89 samples/sec   Loss 38.9673   LearningRate 0.0282   Epoch: 0   Global Step: 1460   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:33:02,713-Speed 5929.27 samples/sec   Loss 38.9776   LearningRate 0.0284   Epoch: 0   Global Step: 1470   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:33:09,578-Speed 5967.85 samples/sec   Loss 38.9723   LearningRate 0.0285   Epoch: 0   Global Step: 1480   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:33:16,433-Speed 5976.53 samples/sec   Loss 39.0120   LearningRate 0.0287   Epoch: 0   Global Step: 1490   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:33:23,307-Speed 5958.70 samples/sec   Loss 38.9859   LearningRate 0.0289   Epoch: 0   Global Step: 1500   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:33:30,180-Speed 5961.46 samples/sec   Loss 38.9910   LearningRate 0.0291   Epoch: 0   Global Step: 1510   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:33:37,059-Speed 5955.37 samples/sec   Loss 38.9745   LearningRate 0.0293   Epoch: 0   Global Step: 1520   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:33:43,922-Speed 5968.70 samples/sec   Loss 38.9936   LearningRate 0.0295   Epoch: 0   Global Step: 1530   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:33:50,808-Speed 5950.01 samples/sec   Loss 38.9866   LearningRate 0.0297   Epoch: 0   Global Step: 1540   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:33:57,690-Speed 5952.91 samples/sec   Loss 38.9934   LearningRate 0.0299   Epoch: 0   Global Step: 1550   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:34:04,574-Speed 5951.60 samples/sec   Loss 38.9996   LearningRate 0.0301   Epoch: 0   Global Step: 1560   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:34:11,459-Speed 5950.49 samples/sec   Loss 38.9841   LearningRate 0.0303   Epoch: 0   Global Step: 1570   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:34:18,310-Speed 5979.97 samples/sec   Loss 38.9631   LearningRate 0.0305   Epoch: 0   Global Step: 1580   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:34:25,165-Speed 5975.76 samples/sec   Loss 38.9840   LearningRate 0.0307   Epoch: 0   Global Step: 1590   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-07 20:34:32,018-Speed 5978.76 samples/sec   Loss 38.9842   LearningRate 0.0309   Epoch: 0   Global Step: 1600   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:34:38,868-Speed 5980.21 samples/sec   Loss 38.9848   LearningRate 0.0311   Epoch: 0   Global Step: 1610   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 20:34:45,725-Speed 5974.35 samples/sec   Loss 39.0005   LearningRate 0.0312   Epoch: 0   Global Step: 1620   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:34:52,611-Speed 5949.27 samples/sec   Loss 39.0077   LearningRate 0.0314   Epoch: 0   Global Step: 1630   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:34:59,513-Speed 5935.78 samples/sec   Loss 38.9841   LearningRate 0.0316   Epoch: 0   Global Step: 1640   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:35:06,417-Speed 5934.42 samples/sec   Loss 38.9946   LearningRate 0.0318   Epoch: 0   Global Step: 1650   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:35:13,368-Speed 5893.91 samples/sec   Loss 38.9825   LearningRate 0.0320   Epoch: 0   Global Step: 1660   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:35:20,244-Speed 5957.73 samples/sec   Loss 38.9760   LearningRate 0.0322   Epoch: 0   Global Step: 1670   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:35:27,106-Speed 5973.03 samples/sec   Loss 38.9981   LearningRate 0.0324   Epoch: 0   Global Step: 1680   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 20:35:34,002-Speed 5943.96 samples/sec   Loss 38.9854   LearningRate 0.0326   Epoch: 0   Global Step: 1690   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 20:35:40,856-Speed 5977.55 samples/sec   Loss 38.9892   LearningRate 0.0328   Epoch: 0   Global Step: 1700   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 20:35:47,707-Speed 5979.54 samples/sec   Loss 38.9947   LearningRate 0.0330   Epoch: 0   Global Step: 1710   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 20:35:54,564-Speed 5976.54 samples/sec   Loss 38.9936   LearningRate 0.0332   Epoch: 0   Global Step: 1720   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 20:36:01,440-Speed 5962.39 samples/sec   Loss 39.0219   LearningRate 0.0334   Epoch: 0   Global Step: 1730   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 20:36:08,294-Speed 5976.97 samples/sec   Loss 39.0060   LearningRate 0.0336   Epoch: 0   Global Step: 1740   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 20:36:15,188-Speed 5941.97 samples/sec   Loss 39.0153   LearningRate 0.0338   Epoch: 0   Global Step: 1750   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 20:36:22,066-Speed 5957.38 samples/sec   Loss 38.9944   LearningRate 0.0339   Epoch: 0   Global Step: 1760   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 20:36:28,946-Speed 5954.29 samples/sec   Loss 38.9938   LearningRate 0.0341   Epoch: 0   Global Step: 1770   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 20:36:35,799-Speed 5980.33 samples/sec   Loss 38.9996   LearningRate 0.0343   Epoch: 0   Global Step: 1780   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:36:42,652-Speed 5978.25 samples/sec   Loss 38.9743   LearningRate 0.0345   Epoch: 0   Global Step: 1790   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:36:49,516-Speed 5968.25 samples/sec   Loss 39.0266   LearningRate 0.0347   Epoch: 0   Global Step: 1800   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:36:56,390-Speed 5960.19 samples/sec   Loss 39.0039   LearningRate 0.0349   Epoch: 0   Global Step: 1810   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:37:03,254-Speed 5968.43 samples/sec   Loss 38.9995   LearningRate 0.0351   Epoch: 0   Global Step: 1820   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:37:10,131-Speed 5957.24 samples/sec   Loss 38.9976   LearningRate 0.0353   Epoch: 0   Global Step: 1830   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:37:16,998-Speed 5966.42 samples/sec   Loss 39.0207   LearningRate 0.0355   Epoch: 0   Global Step: 1840   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:37:23,862-Speed 5968.51 samples/sec   Loss 39.0049   LearningRate 0.0357   Epoch: 0   Global Step: 1850   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:37:30,735-Speed 5960.15 samples/sec   Loss 38.9927   LearningRate 0.0359   Epoch: 0   Global Step: 1860   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:37:37,596-Speed 5970.89 samples/sec   Loss 38.9925   LearningRate 0.0361   Epoch: 0   Global Step: 1870   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:37:44,458-Speed 5970.18 samples/sec   Loss 39.0170   LearningRate 0.0363   Epoch: 0   Global Step: 1880   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:37:51,325-Speed 5966.14 samples/sec   Loss 38.9966   LearningRate 0.0365   Epoch: 0   Global Step: 1890   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:37:58,195-Speed 5962.77 samples/sec   Loss 38.9791   LearningRate 0.0366   Epoch: 0   Global Step: 1900   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:38:05,066-Speed 5962.20 samples/sec   Loss 38.9851   LearningRate 0.0368   Epoch: 0   Global Step: 1910   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:38:11,997-Speed 5911.79 samples/sec   Loss 38.9756   LearningRate 0.0370   Epoch: 0   Global Step: 1920   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:38:18,856-Speed 5972.90 samples/sec   Loss 39.0197   LearningRate 0.0372   Epoch: 0   Global Step: 1930   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:38:25,739-Speed 5952.25 samples/sec   Loss 38.9839   LearningRate 0.0374   Epoch: 0   Global Step: 1940   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:38:32,608-Speed 5966.03 samples/sec   Loss 38.9746   LearningRate 0.0376   Epoch: 0   Global Step: 1950   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:38:39,470-Speed 5969.82 samples/sec   Loss 39.0004   LearningRate 0.0378   Epoch: 0   Global Step: 1960   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:38:46,351-Speed 5954.11 samples/sec   Loss 38.9728   LearningRate 0.0380   Epoch: 0   Global Step: 1970   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:38:53,198-Speed 5983.69 samples/sec   Loss 38.9963   LearningRate 0.0382   Epoch: 0   Global Step: 1980   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:39:00,076-Speed 5956.53 samples/sec   Loss 39.0164   LearningRate 0.0384   Epoch: 0   Global Step: 1990   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:39:06,947-Speed 5962.53 samples/sec   Loss 38.9966   LearningRate 0.0386   Epoch: 0   Global Step: 2000   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:39:13,812-Speed 5968.27 samples/sec   Loss 38.9955   LearningRate 0.0388   Epoch: 0   Global Step: 2010   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:39:20,679-Speed 5965.57 samples/sec   Loss 38.9830   LearningRate 0.0390   Epoch: 0   Global Step: 2020   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:39:27,577-Speed 5942.10 samples/sec   Loss 39.0091   LearningRate 0.0392   Epoch: 0   Global Step: 2030   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:39:34,448-Speed 5962.44 samples/sec   Loss 38.9936   LearningRate 0.0393   Epoch: 0   Global Step: 2040   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:39:41,321-Speed 5960.94 samples/sec   Loss 38.9832   LearningRate 0.0395   Epoch: 0   Global Step: 2050   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:39:48,189-Speed 5964.92 samples/sec   Loss 38.9807   LearningRate 0.0397   Epoch: 0   Global Step: 2060   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:39:55,104-Speed 5923.93 samples/sec   Loss 38.9631   LearningRate 0.0399   Epoch: 0   Global Step: 2070   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:40:02,045-Speed 5902.40 samples/sec   Loss 38.9817   LearningRate 0.0401   Epoch: 0   Global Step: 2080   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:40:08,967-Speed 5918.78 samples/sec   Loss 38.9700   LearningRate 0.0403   Epoch: 0   Global Step: 2090   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:40:15,874-Speed 5931.38 samples/sec   Loss 38.9316   LearningRate 0.0405   Epoch: 0   Global Step: 2100   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:40:22,728-Speed 5977.14 samples/sec   Loss 38.9720   LearningRate 0.0407   Epoch: 0   Global Step: 2110   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:40:29,600-Speed 5961.56 samples/sec   Loss 38.9500   LearningRate 0.0409   Epoch: 0   Global Step: 2120   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:40:36,462-Speed 5970.60 samples/sec   Loss 38.9639   LearningRate 0.0411   Epoch: 0   Global Step: 2130   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:40:43,324-Speed 5970.83 samples/sec   Loss 38.9435   LearningRate 0.0413   Epoch: 0   Global Step: 2140   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:40:50,188-Speed 5970.73 samples/sec   Loss 38.9259   LearningRate 0.0415   Epoch: 0   Global Step: 2150   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:40:57,043-Speed 5975.85 samples/sec   Loss 38.9296   LearningRate 0.0417   Epoch: 0   Global Step: 2160   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:41:03,897-Speed 5977.55 samples/sec   Loss 38.9367   LearningRate 0.0419   Epoch: 0   Global Step: 2170   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:41:10,752-Speed 5976.39 samples/sec   Loss 38.9845   LearningRate 0.0420   Epoch: 0   Global Step: 2180   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:41:17,607-Speed 5976.15 samples/sec   Loss 38.9526   LearningRate 0.0422   Epoch: 0   Global Step: 2190   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:41:24,467-Speed 5972.22 samples/sec   Loss 38.9232   LearningRate 0.0424   Epoch: 0   Global Step: 2200   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:41:31,323-Speed 5974.93 samples/sec   Loss 38.9251   LearningRate 0.0426   Epoch: 0   Global Step: 2210   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:41:38,182-Speed 5973.49 samples/sec   Loss 38.9275   LearningRate 0.0428   Epoch: 0   Global Step: 2220   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:41:45,050-Speed 5966.95 samples/sec   Loss 38.8835   LearningRate 0.0430   Epoch: 0   Global Step: 2230   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:41:51,913-Speed 5971.99 samples/sec   Loss 38.9009   LearningRate 0.0432   Epoch: 0   Global Step: 2240   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:41:58,805-Speed 5944.24 samples/sec   Loss 38.8685   LearningRate 0.0434   Epoch: 0   Global Step: 2250   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:42:05,677-Speed 5963.05 samples/sec   Loss 38.8964   LearningRate 0.0436   Epoch: 0   Global Step: 2260   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:42:12,535-Speed 5973.44 samples/sec   Loss 38.8614   LearningRate 0.0438   Epoch: 0   Global Step: 2270   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:42:19,404-Speed 5964.55 samples/sec   Loss 38.9026   LearningRate 0.0440   Epoch: 0   Global Step: 2280   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:42:26,290-Speed 5948.95 samples/sec   Loss 38.8550   LearningRate 0.0442   Epoch: 0   Global Step: 2290   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:42:33,149-Speed 5972.57 samples/sec   Loss 38.8624   LearningRate 0.0444   Epoch: 0   Global Step: 2300   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:42:40,046-Speed 5940.62 samples/sec   Loss 38.8621   LearningRate 0.0446   Epoch: 0   Global Step: 2310   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:42:46,916-Speed 5962.71 samples/sec   Loss 38.8372   LearningRate 0.0447   Epoch: 0   Global Step: 2320   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:42:53,758-Speed 5987.97 samples/sec   Loss 38.8022   LearningRate 0.0449   Epoch: 0   Global Step: 2330   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:43:00,624-Speed 5966.72 samples/sec   Loss 38.8195   LearningRate 0.0451   Epoch: 0   Global Step: 2340   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:43:07,481-Speed 5974.43 samples/sec   Loss 38.8139   LearningRate 0.0453   Epoch: 0   Global Step: 2350   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:43:14,372-Speed 5944.93 samples/sec   Loss 38.7939   LearningRate 0.0455   Epoch: 0   Global Step: 2360   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:43:21,228-Speed 5975.97 samples/sec   Loss 38.7874   LearningRate 0.0457   Epoch: 0   Global Step: 2370   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:43:28,096-Speed 5964.68 samples/sec   Loss 38.7595   LearningRate 0.0459   Epoch: 0   Global Step: 2380   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:43:34,948-Speed 5979.26 samples/sec   Loss 38.7434   LearningRate 0.0461   Epoch: 0   Global Step: 2390   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:43:41,816-Speed 5965.46 samples/sec   Loss 38.7468   LearningRate 0.0463   Epoch: 0   Global Step: 2400   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:43:48,676-Speed 5972.28 samples/sec   Loss 38.7252   LearningRate 0.0465   Epoch: 0   Global Step: 2410   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:43:55,530-Speed 5977.09 samples/sec   Loss 38.7193   LearningRate 0.0467   Epoch: 0   Global Step: 2420   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:44:02,388-Speed 5973.16 samples/sec   Loss 38.7122   LearningRate 0.0469   Epoch: 0   Global Step: 2430   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:44:09,262-Speed 5960.45 samples/sec   Loss 38.6918   LearningRate 0.0471   Epoch: 0   Global Step: 2440   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:44:16,122-Speed 5971.47 samples/sec   Loss 38.6639   LearningRate 0.0473   Epoch: 0   Global Step: 2450   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:44:22,980-Speed 5973.35 samples/sec   Loss 38.6667   LearningRate 0.0474   Epoch: 0   Global Step: 2460   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:44:29,848-Speed 5965.41 samples/sec   Loss 38.6632   LearningRate 0.0476   Epoch: 0   Global Step: 2470   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:44:36,737-Speed 5946.59 samples/sec   Loss 38.6093   LearningRate 0.0478   Epoch: 0   Global Step: 2480   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:44:43,607-Speed 5965.16 samples/sec   Loss 38.6256   LearningRate 0.0480   Epoch: 0   Global Step: 2490   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:44:50,470-Speed 5969.16 samples/sec   Loss 38.5960   LearningRate 0.0482   Epoch: 0   Global Step: 2500   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:44:57,355-Speed 5951.86 samples/sec   Loss 38.5823   LearningRate 0.0484   Epoch: 0   Global Step: 2510   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:45:04,223-Speed 5965.52 samples/sec   Loss 38.5503   LearningRate 0.0486   Epoch: 0   Global Step: 2520   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:45:11,083-Speed 5971.56 samples/sec   Loss 38.5346   LearningRate 0.0488   Epoch: 0   Global Step: 2530   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:45:17,947-Speed 5968.23 samples/sec   Loss 38.5030   LearningRate 0.0490   Epoch: 0   Global Step: 2540   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:45:24,841-Speed 5942.75 samples/sec   Loss 38.4794   LearningRate 0.0492   Epoch: 0   Global Step: 2550   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:45:31,693-Speed 5981.49 samples/sec   Loss 38.5025   LearningRate 0.0494   Epoch: 0   Global Step: 2560   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:45:38,587-Speed 5942.91 samples/sec   Loss 38.4676   LearningRate 0.0496   Epoch: 0   Global Step: 2570   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:45:45,456-Speed 5963.82 samples/sec   Loss 38.4305   LearningRate 0.0498   Epoch: 0   Global Step: 2580   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:45:52,309-Speed 5977.81 samples/sec   Loss 38.4588   LearningRate 0.0500   Epoch: 0   Global Step: 2590   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:45:59,161-Speed 5979.32 samples/sec   Loss 38.4301   LearningRate 0.0501   Epoch: 0   Global Step: 2600   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:46:06,027-Speed 5966.46 samples/sec   Loss 38.3997   LearningRate 0.0503   Epoch: 0   Global Step: 2610   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:46:12,988-Speed 5884.96 samples/sec   Loss 38.3583   LearningRate 0.0505   Epoch: 0   Global Step: 2620   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:46:19,851-Speed 5970.05 samples/sec   Loss 38.3270   LearningRate 0.0507   Epoch: 0   Global Step: 2630   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:46:26,714-Speed 5971.09 samples/sec   Loss 38.3352   LearningRate 0.0509   Epoch: 0   Global Step: 2640   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:46:33,587-Speed 5960.52 samples/sec   Loss 38.3132   LearningRate 0.0511   Epoch: 0   Global Step: 2650   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:46:40,444-Speed 5975.41 samples/sec   Loss 38.3094   LearningRate 0.0513   Epoch: 0   Global Step: 2660   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:46:47,317-Speed 5960.01 samples/sec   Loss 38.2479   LearningRate 0.0515   Epoch: 0   Global Step: 2670   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:46:54,197-Speed 5958.34 samples/sec   Loss 38.2519   LearningRate 0.0517   Epoch: 0   Global Step: 2680   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:47:01,050-Speed 5978.47 samples/sec   Loss 38.2418   LearningRate 0.0519   Epoch: 0   Global Step: 2690   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:47:07,901-Speed 5980.02 samples/sec   Loss 38.2228   LearningRate 0.0521   Epoch: 0   Global Step: 2700   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:47:14,751-Speed 5980.48 samples/sec   Loss 38.1595   LearningRate 0.0523   Epoch: 0   Global Step: 2710   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:47:21,609-Speed 5973.26 samples/sec   Loss 38.1601   LearningRate 0.0525   Epoch: 0   Global Step: 2720   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:47:28,503-Speed 5943.38 samples/sec   Loss 38.1444   LearningRate 0.0527   Epoch: 0   Global Step: 2730   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:47:35,362-Speed 5972.79 samples/sec   Loss 38.1072   LearningRate 0.0528   Epoch: 0   Global Step: 2740   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:47:42,243-Speed 5953.13 samples/sec   Loss 38.0634   LearningRate 0.0530   Epoch: 0   Global Step: 2750   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:47:49,097-Speed 5977.58 samples/sec   Loss 38.0635   LearningRate 0.0532   Epoch: 0   Global Step: 2760   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:47:55,958-Speed 5970.77 samples/sec   Loss 38.0383   LearningRate 0.0534   Epoch: 0   Global Step: 2770   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:48:02,844-Speed 5949.90 samples/sec   Loss 38.0355   LearningRate 0.0536   Epoch: 0   Global Step: 2780   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:48:09,716-Speed 5960.67 samples/sec   Loss 38.0058   LearningRate 0.0538   Epoch: 0   Global Step: 2790   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:48:16,567-Speed 5980.30 samples/sec   Loss 37.9565   LearningRate 0.0540   Epoch: 0   Global Step: 2800   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:48:23,414-Speed 5982.77 samples/sec   Loss 37.9802   LearningRate 0.0542   Epoch: 0   Global Step: 2810   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:48:30,258-Speed 5985.97 samples/sec   Loss 37.9396   LearningRate 0.0544   Epoch: 0   Global Step: 2820   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:48:37,120-Speed 5972.37 samples/sec   Loss 37.9317   LearningRate 0.0546   Epoch: 0   Global Step: 2830   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:48:43,971-Speed 5979.81 samples/sec   Loss 37.8873   LearningRate 0.0548   Epoch: 0   Global Step: 2840   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:48:50,823-Speed 5978.55 samples/sec   Loss 37.8206   LearningRate 0.0550   Epoch: 0   Global Step: 2850   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:48:57,675-Speed 5978.95 samples/sec   Loss 37.8006   LearningRate 0.0552   Epoch: 0   Global Step: 2860   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:49:04,527-Speed 5978.80 samples/sec   Loss 37.8381   LearningRate 0.0554   Epoch: 0   Global Step: 2870   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:49:11,373-Speed 5984.45 samples/sec   Loss 37.7962   LearningRate 0.0556   Epoch: 0   Global Step: 2880   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:49:18,257-Speed 5953.61 samples/sec   Loss 37.7702   LearningRate 0.0557   Epoch: 0   Global Step: 2890   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:49:25,112-Speed 5977.13 samples/sec   Loss 37.7147   LearningRate 0.0559   Epoch: 0   Global Step: 2900   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:49:31,980-Speed 5965.34 samples/sec   Loss 37.6737   LearningRate 0.0561   Epoch: 0   Global Step: 2910   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:49:38,832-Speed 5978.24 samples/sec   Loss 37.7134   LearningRate 0.0563   Epoch: 0   Global Step: 2920   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:49:45,700-Speed 5965.77 samples/sec   Loss 37.6625   LearningRate 0.0565   Epoch: 0   Global Step: 2930   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:49:52,564-Speed 5967.94 samples/sec   Loss 37.6372   LearningRate 0.0567   Epoch: 0   Global Step: 2940   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:49:59,412-Speed 5982.88 samples/sec   Loss 37.5586   LearningRate 0.0569   Epoch: 0   Global Step: 2950   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:50:06,280-Speed 5967.44 samples/sec   Loss 37.6291   LearningRate 0.0571   Epoch: 0   Global Step: 2960   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:50:13,131-Speed 5982.19 samples/sec   Loss 37.5445   LearningRate 0.0573   Epoch: 0   Global Step: 2970   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:50:19,996-Speed 5967.45 samples/sec   Loss 37.5529   LearningRate 0.0575   Epoch: 0   Global Step: 2980   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:50:26,869-Speed 5960.64 samples/sec   Loss 37.5017   LearningRate 0.0577   Epoch: 0   Global Step: 2990   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:50:33,715-Speed 5984.18 samples/sec   Loss 37.5059   LearningRate 0.0579   Epoch: 0   Global Step: 3000   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:50:40,696-Speed 5868.38 samples/sec   Loss 37.4628   LearningRate 0.0581   Epoch: 0   Global Step: 3010   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:50:47,559-Speed 5969.46 samples/sec   Loss 37.4272   LearningRate 0.0583   Epoch: 0   Global Step: 3020   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:50:54,506-Speed 5897.03 samples/sec   Loss 37.4054   LearningRate 0.0584   Epoch: 0   Global Step: 3030   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:51:01,355-Speed 5980.67 samples/sec   Loss 37.3898   LearningRate 0.0586   Epoch: 0   Global Step: 3040   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:51:08,343-Speed 5863.02 samples/sec   Loss 37.3195   LearningRate 0.0588   Epoch: 0   Global Step: 3050   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:51:15,196-Speed 5977.90 samples/sec   Loss 37.3005   LearningRate 0.0590   Epoch: 0   Global Step: 3060   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:51:22,070-Speed 5962.61 samples/sec   Loss 37.2665   LearningRate 0.0592   Epoch: 0   Global Step: 3070   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:51:28,925-Speed 5976.11 samples/sec   Loss 37.2734   LearningRate 0.0594   Epoch: 0   Global Step: 3080   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:51:35,773-Speed 5982.24 samples/sec   Loss 37.2562   LearningRate 0.0596   Epoch: 0   Global Step: 3090   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:51:42,622-Speed 5983.00 samples/sec   Loss 37.1916   LearningRate 0.0598   Epoch: 0   Global Step: 3100   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:51:49,469-Speed 5983.61 samples/sec   Loss 37.1790   LearningRate 0.0600   Epoch: 0   Global Step: 3110   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:51:56,321-Speed 5978.35 samples/sec   Loss 37.1711   LearningRate 0.0602   Epoch: 0   Global Step: 3120   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:52:03,196-Speed 5958.78 samples/sec   Loss 37.1202   LearningRate 0.0604   Epoch: 0   Global Step: 3130   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:52:10,039-Speed 5986.35 samples/sec   Loss 37.0971   LearningRate 0.0606   Epoch: 0   Global Step: 3140   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:52:16,879-Speed 5991.28 samples/sec   Loss 37.0651   LearningRate 0.0608   Epoch: 0   Global Step: 3150   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:52:23,748-Speed 5964.49 samples/sec   Loss 37.0270   LearningRate 0.0610   Epoch: 0   Global Step: 3160   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:52:30,602-Speed 5977.70 samples/sec   Loss 36.9910   LearningRate 0.0611   Epoch: 0   Global Step: 3170   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:52:37,466-Speed 5969.93 samples/sec   Loss 36.9284   LearningRate 0.0613   Epoch: 0   Global Step: 3180   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:52:44,322-Speed 5974.60 samples/sec   Loss 36.9566   LearningRate 0.0615   Epoch: 0   Global Step: 3190   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:52:51,189-Speed 5966.37 samples/sec   Loss 36.8858   LearningRate 0.0617   Epoch: 0   Global Step: 3200   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:52:58,040-Speed 5980.39 samples/sec   Loss 36.8811   LearningRate 0.0619   Epoch: 0   Global Step: 3210   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:53:04,906-Speed 5966.51 samples/sec   Loss 36.8102   LearningRate 0.0621   Epoch: 0   Global Step: 3220   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:53:11,772-Speed 5972.62 samples/sec   Loss 36.8026   LearningRate 0.0623   Epoch: 0   Global Step: 3230   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:53:18,626-Speed 5979.33 samples/sec   Loss 36.8288   LearningRate 0.0625   Epoch: 0   Global Step: 3240   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:53:25,479-Speed 5980.60 samples/sec   Loss 36.7670   LearningRate 0.0627   Epoch: 0   Global Step: 3250   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:53:32,347-Speed 5964.99 samples/sec   Loss 36.7325   LearningRate 0.0629   Epoch: 0   Global Step: 3260   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:53:39,188-Speed 5988.94 samples/sec   Loss 36.7056   LearningRate 0.0631   Epoch: 0   Global Step: 3270   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:53:46,054-Speed 5969.45 samples/sec   Loss 36.6984   LearningRate 0.0633   Epoch: 0   Global Step: 3280   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:53:52,906-Speed 5979.08 samples/sec   Loss 36.6671   LearningRate 0.0635   Epoch: 0   Global Step: 3290   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:53:59,758-Speed 5979.15 samples/sec   Loss 36.6037   LearningRate 0.0637   Epoch: 0   Global Step: 3300   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:54:06,623-Speed 5967.68 samples/sec   Loss 36.6051   LearningRate 0.0638   Epoch: 0   Global Step: 3310   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:54:13,502-Speed 5955.37 samples/sec   Loss 36.5251   LearningRate 0.0640   Epoch: 0   Global Step: 3320   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:54:20,358-Speed 5976.06 samples/sec   Loss 36.4742   LearningRate 0.0642   Epoch: 0   Global Step: 3330   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:54:27,216-Speed 5973.72 samples/sec   Loss 36.4280   LearningRate 0.0644   Epoch: 0   Global Step: 3340   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:54:34,082-Speed 5966.71 samples/sec   Loss 36.4358   LearningRate 0.0646   Epoch: 0   Global Step: 3350   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:54:40,937-Speed 5976.38 samples/sec   Loss 36.4167   LearningRate 0.0648   Epoch: 0   Global Step: 3360   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:54:47,791-Speed 5977.81 samples/sec   Loss 36.3575   LearningRate 0.0650   Epoch: 0   Global Step: 3370   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:54:54,645-Speed 5977.04 samples/sec   Loss 36.3310   LearningRate 0.0652   Epoch: 0   Global Step: 3380   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:55:01,508-Speed 5968.80 samples/sec   Loss 36.2775   LearningRate 0.0654   Epoch: 0   Global Step: 3390   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:55:08,356-Speed 5983.13 samples/sec   Loss 36.2646   LearningRate 0.0656   Epoch: 0   Global Step: 3400   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:55:15,221-Speed 5967.96 samples/sec   Loss 36.2401   LearningRate 0.0658   Epoch: 0   Global Step: 3410   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:55:22,090-Speed 5963.55 samples/sec   Loss 36.1861   LearningRate 0.0660   Epoch: 0   Global Step: 3420   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:55:28,954-Speed 5968.67 samples/sec   Loss 36.1800   LearningRate 0.0662   Epoch: 0   Global Step: 3430   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:55:35,827-Speed 5960.44 samples/sec   Loss 36.0995   LearningRate 0.0664   Epoch: 0   Global Step: 3440   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:55:42,694-Speed 5966.29 samples/sec   Loss 36.1242   LearningRate 0.0665   Epoch: 0   Global Step: 3450   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:55:49,590-Speed 5941.10 samples/sec   Loss 36.0335   LearningRate 0.0667   Epoch: 0   Global Step: 3460   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:55:56,459-Speed 5964.30 samples/sec   Loss 35.9773   LearningRate 0.0669   Epoch: 0   Global Step: 3470   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:56:03,322-Speed 5969.28 samples/sec   Loss 36.0212   LearningRate 0.0671   Epoch: 0   Global Step: 3480   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:56:10,175-Speed 5978.15 samples/sec   Loss 35.9200   LearningRate 0.0673   Epoch: 0   Global Step: 3490   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:56:17,037-Speed 5970.32 samples/sec   Loss 35.8686   LearningRate 0.0675   Epoch: 0   Global Step: 3500   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:56:23,893-Speed 5975.41 samples/sec   Loss 35.9164   LearningRate 0.0677   Epoch: 0   Global Step: 3510   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:56:30,768-Speed 5959.34 samples/sec   Loss 35.8524   LearningRate 0.0679   Epoch: 0   Global Step: 3520   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:56:37,628-Speed 5971.48 samples/sec   Loss 35.7934   LearningRate 0.0681   Epoch: 0   Global Step: 3530   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:56:44,488-Speed 5972.11 samples/sec   Loss 35.7463   LearningRate 0.0683   Epoch: 0   Global Step: 3540   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:56:51,360-Speed 5961.59 samples/sec   Loss 35.7327   LearningRate 0.0685   Epoch: 0   Global Step: 3550   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:56:58,211-Speed 5980.33 samples/sec   Loss 35.6677   LearningRate 0.0687   Epoch: 0   Global Step: 3560   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:57:05,065-Speed 5977.64 samples/sec   Loss 35.6797   LearningRate 0.0689   Epoch: 0   Global Step: 3570   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:57:11,908-Speed 5986.24 samples/sec   Loss 35.6242   LearningRate 0.0691   Epoch: 0   Global Step: 3580   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:57:18,773-Speed 5969.99 samples/sec   Loss 35.5924   LearningRate 0.0692   Epoch: 0   Global Step: 3590   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:57:25,622-Speed 5983.30 samples/sec   Loss 35.5608   LearningRate 0.0694   Epoch: 0   Global Step: 3600   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:57:32,504-Speed 5953.51 samples/sec   Loss 35.4761   LearningRate 0.0696   Epoch: 0   Global Step: 3610   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:57:39,367-Speed 5970.04 samples/sec   Loss 35.4137   LearningRate 0.0698   Epoch: 0   Global Step: 3620   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:57:46,216-Speed 5983.49 samples/sec   Loss 35.4462   LearningRate 0.0700   Epoch: 0   Global Step: 3630   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:57:53,081-Speed 5967.91 samples/sec   Loss 35.3985   LearningRate 0.0702   Epoch: 0   Global Step: 3640   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:57:59,947-Speed 5966.44 samples/sec   Loss 35.3569   LearningRate 0.0704   Epoch: 0   Global Step: 3650   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:58:06,813-Speed 5966.51 samples/sec   Loss 35.3448   LearningRate 0.0706   Epoch: 0   Global Step: 3660   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:58:13,692-Speed 5955.77 samples/sec   Loss 35.2712   LearningRate 0.0708   Epoch: 0   Global Step: 3670   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:58:20,562-Speed 5963.61 samples/sec   Loss 35.2860   LearningRate 0.0710   Epoch: 0   Global Step: 3680   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:58:27,411-Speed 5981.36 samples/sec   Loss 35.1319   LearningRate 0.0712   Epoch: 0   Global Step: 3690   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:58:34,264-Speed 5978.02 samples/sec   Loss 35.1569   LearningRate 0.0714   Epoch: 0   Global Step: 3700   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:58:41,123-Speed 5973.68 samples/sec   Loss 35.0992   LearningRate 0.0716   Epoch: 0   Global Step: 3710   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:58:47,994-Speed 5962.36 samples/sec   Loss 35.0593   LearningRate 0.0718   Epoch: 0   Global Step: 3720   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:58:54,857-Speed 5970.11 samples/sec   Loss 35.0803   LearningRate 0.0719   Epoch: 0   Global Step: 3730   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:59:01,704-Speed 5983.01 samples/sec   Loss 35.0500   LearningRate 0.0721   Epoch: 0   Global Step: 3740   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:59:08,555-Speed 5980.50 samples/sec   Loss 34.9635   LearningRate 0.0723   Epoch: 0   Global Step: 3750   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:59:15,417-Speed 5969.60 samples/sec   Loss 34.8776   LearningRate 0.0725   Epoch: 0   Global Step: 3760   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 20:59:22,277-Speed 5972.11 samples/sec   Loss 34.8599   LearningRate 0.0727   Epoch: 0   Global Step: 3770   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:59:29,125-Speed 5983.10 samples/sec   Loss 34.8668   LearningRate 0.0729   Epoch: 0   Global Step: 3780   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:59:35,993-Speed 5965.99 samples/sec   Loss 34.7957   LearningRate 0.0731   Epoch: 0   Global Step: 3790   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:59:42,843-Speed 5979.86 samples/sec   Loss 34.7567   LearningRate 0.0733   Epoch: 0   Global Step: 3800   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:59:49,705-Speed 5970.53 samples/sec   Loss 34.7522   LearningRate 0.0735   Epoch: 0   Global Step: 3810   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 20:59:56,555-Speed 5981.13 samples/sec   Loss 34.6789   LearningRate 0.0737   Epoch: 0   Global Step: 3820   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:00:03,425-Speed 5963.31 samples/sec   Loss 34.6225   LearningRate 0.0739   Epoch: 0   Global Step: 3830   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:00:10,281-Speed 5974.49 samples/sec   Loss 34.6152   LearningRate 0.0741   Epoch: 0   Global Step: 3840   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:00:17,150-Speed 5964.77 samples/sec   Loss 34.5266   LearningRate 0.0743   Epoch: 0   Global Step: 3850   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:00:24,003-Speed 5977.75 samples/sec   Loss 34.5414   LearningRate 0.0745   Epoch: 0   Global Step: 3860   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:00:30,861-Speed 5974.35 samples/sec   Loss 34.4448   LearningRate 0.0746   Epoch: 0   Global Step: 3870   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:00:37,706-Speed 5985.01 samples/sec   Loss 34.4667   LearningRate 0.0748   Epoch: 0   Global Step: 3880   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:00:44,588-Speed 5953.31 samples/sec   Loss 34.4353   LearningRate 0.0750   Epoch: 0   Global Step: 3890   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:00:51,436-Speed 5981.73 samples/sec   Loss 34.3496   LearningRate 0.0752   Epoch: 0   Global Step: 3900   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:00:58,336-Speed 5937.93 samples/sec   Loss 34.3870   LearningRate 0.0754   Epoch: 0   Global Step: 3910   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:01:05,204-Speed 5967.61 samples/sec   Loss 34.2986   LearningRate 0.0756   Epoch: 0   Global Step: 3920   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:01:12,067-Speed 5969.12 samples/sec   Loss 34.2485   LearningRate 0.0758   Epoch: 0   Global Step: 3930   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:01:18,925-Speed 5974.41 samples/sec   Loss 34.1365   LearningRate 0.0760   Epoch: 0   Global Step: 3940   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:01:25,772-Speed 5983.44 samples/sec   Loss 34.1696   LearningRate 0.0762   Epoch: 0   Global Step: 3950   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:01:32,625-Speed 5981.55 samples/sec   Loss 34.0986   LearningRate 0.0764   Epoch: 0   Global Step: 3960   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:01:39,480-Speed 5976.29 samples/sec   Loss 34.1165   LearningRate 0.0766   Epoch: 0   Global Step: 3970   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:01:46,327-Speed 5983.15 samples/sec   Loss 34.1329   LearningRate 0.0768   Epoch: 0   Global Step: 3980   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:01:53,199-Speed 5961.82 samples/sec   Loss 33.9972   LearningRate 0.0770   Epoch: 0   Global Step: 3990   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:02:00,068-Speed 5965.05 samples/sec   Loss 33.9902   LearningRate 0.0772   Epoch: 0   Global Step: 4000   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:02:06,944-Speed 5957.74 samples/sec   Loss 33.9399   LearningRate 0.0773   Epoch: 0   Global Step: 4010   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:02:13,807-Speed 5971.05 samples/sec   Loss 33.8481   LearningRate 0.0775   Epoch: 0   Global Step: 4020   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:02:20,684-Speed 5956.95 samples/sec   Loss 33.8311   LearningRate 0.0777   Epoch: 0   Global Step: 4030   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:02:27,577-Speed 5945.30 samples/sec   Loss 33.8118   LearningRate 0.0779   Epoch: 0   Global Step: 4040   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:02:34,432-Speed 5976.44 samples/sec   Loss 33.7787   LearningRate 0.0781   Epoch: 0   Global Step: 4050   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:02:41,277-Speed 5984.91 samples/sec   Loss 33.7604   LearningRate 0.0783   Epoch: 0   Global Step: 4060   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:02:48,147-Speed 5963.97 samples/sec   Loss 33.6753   LearningRate 0.0785   Epoch: 0   Global Step: 4070   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:02:55,002-Speed 5976.21 samples/sec   Loss 33.6727   LearningRate 0.0787   Epoch: 0   Global Step: 4080   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:03:01,850-Speed 5982.80 samples/sec   Loss 33.5546   LearningRate 0.0789   Epoch: 0   Global Step: 4090   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:03:08,706-Speed 5975.08 samples/sec   Loss 33.5452   LearningRate 0.0791   Epoch: 0   Global Step: 4100   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:03:15,576-Speed 5963.37 samples/sec   Loss 33.4464   LearningRate 0.0793   Epoch: 0   Global Step: 4110   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:03:22,426-Speed 5980.36 samples/sec   Loss 33.4349   LearningRate 0.0795   Epoch: 0   Global Step: 4120   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:03:29,274-Speed 5983.54 samples/sec   Loss 33.4476   LearningRate 0.0797   Epoch: 0   Global Step: 4130   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:03:36,128-Speed 5977.07 samples/sec   Loss 33.4153   LearningRate 0.0799   Epoch: 0   Global Step: 4140   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:03:42,970-Speed 5987.89 samples/sec   Loss 33.2608   LearningRate 0.0800   Epoch: 0   Global Step: 4150   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:03:49,825-Speed 5976.81 samples/sec   Loss 33.3099   LearningRate 0.0802   Epoch: 0   Global Step: 4160   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:03:56,671-Speed 5983.93 samples/sec   Loss 33.2322   LearningRate 0.0804   Epoch: 0   Global Step: 4170   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:04:03,568-Speed 5940.69 samples/sec   Loss 33.1833   LearningRate 0.0806   Epoch: 0   Global Step: 4180   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:04:10,442-Speed 5959.74 samples/sec   Loss 33.0915   LearningRate 0.0808   Epoch: 0   Global Step: 4190   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:04:17,328-Speed 5949.03 samples/sec   Loss 33.0803   LearningRate 0.0810   Epoch: 0   Global Step: 4200   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:04:24,201-Speed 5963.81 samples/sec   Loss 33.1097   LearningRate 0.0812   Epoch: 0   Global Step: 4210   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:04:31,072-Speed 5962.64 samples/sec   Loss 33.0498   LearningRate 0.0814   Epoch: 0   Global Step: 4220   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:04:37,930-Speed 5973.53 samples/sec   Loss 32.9804   LearningRate 0.0816   Epoch: 0   Global Step: 4230   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:04:44,793-Speed 5969.70 samples/sec   Loss 32.9122   LearningRate 0.0818   Epoch: 0   Global Step: 4240   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:04:51,641-Speed 5982.37 samples/sec   Loss 32.8876   LearningRate 0.0820   Epoch: 0   Global Step: 4250   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:04:58,524-Speed 5952.09 samples/sec   Loss 32.9156   LearningRate 0.0822   Epoch: 0   Global Step: 4260   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:05:05,374-Speed 5980.94 samples/sec   Loss 32.8686   LearningRate 0.0824   Epoch: 0   Global Step: 4270   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:05:12,228-Speed 5976.41 samples/sec   Loss 32.7162   LearningRate 0.0826   Epoch: 0   Global Step: 4280   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:05:19,102-Speed 5960.49 samples/sec   Loss 32.7185   LearningRate 0.0827   Epoch: 0   Global Step: 4290   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:05:25,986-Speed 5950.86 samples/sec   Loss 32.6298   LearningRate 0.0829   Epoch: 0   Global Step: 4300   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:05:32,860-Speed 5961.38 samples/sec   Loss 32.6307   LearningRate 0.0831   Epoch: 0   Global Step: 4310   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:05:39,728-Speed 5965.15 samples/sec   Loss 32.5682   LearningRate 0.0833   Epoch: 0   Global Step: 4320   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:05:46,649-Speed 5919.26 samples/sec   Loss 32.5532   LearningRate 0.0835   Epoch: 0   Global Step: 4330   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:05:53,510-Speed 5972.44 samples/sec   Loss 32.5038   LearningRate 0.0837   Epoch: 0   Global Step: 4340   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:06:00,366-Speed 5975.48 samples/sec   Loss 32.4039   LearningRate 0.0839   Epoch: 0   Global Step: 4350   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:06:07,228-Speed 5970.33 samples/sec   Loss 32.3771   LearningRate 0.0841   Epoch: 0   Global Step: 4360   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:06:14,083-Speed 5977.00 samples/sec   Loss 32.4050   LearningRate 0.0843   Epoch: 0   Global Step: 4370   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:06:20,935-Speed 5978.90 samples/sec   Loss 32.3070   LearningRate 0.0845   Epoch: 0   Global Step: 4380   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:06:27,818-Speed 5952.15 samples/sec   Loss 32.2742   LearningRate 0.0847   Epoch: 0   Global Step: 4390   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:06:34,683-Speed 5967.68 samples/sec   Loss 32.2404   LearningRate 0.0849   Epoch: 0   Global Step: 4400   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:06:41,546-Speed 5969.57 samples/sec   Loss 32.1633   LearningRate 0.0851   Epoch: 0   Global Step: 4410   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:06:48,413-Speed 5966.08 samples/sec   Loss 32.1344   LearningRate 0.0853   Epoch: 0   Global Step: 4420   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:06:55,270-Speed 5974.86 samples/sec   Loss 32.1704   LearningRate 0.0854   Epoch: 0   Global Step: 4430   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:07:02,139-Speed 5964.65 samples/sec   Loss 31.9536   LearningRate 0.0856   Epoch: 0   Global Step: 4440   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:07:09,026-Speed 5948.66 samples/sec   Loss 31.9404   LearningRate 0.0858   Epoch: 0   Global Step: 4450   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:07:15,893-Speed 5965.24 samples/sec   Loss 31.9361   LearningRate 0.0860   Epoch: 0   Global Step: 4460   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:07:22,762-Speed 5964.68 samples/sec   Loss 31.9429   LearningRate 0.0862   Epoch: 0   Global Step: 4470   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:07:29,612-Speed 5980.47 samples/sec   Loss 31.7962   LearningRate 0.0864   Epoch: 0   Global Step: 4480   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:07:36,477-Speed 5970.08 samples/sec   Loss 31.8327   LearningRate 0.0866   Epoch: 0   Global Step: 4490   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:07:43,337-Speed 5972.30 samples/sec   Loss 31.8090   LearningRate 0.0868   Epoch: 0   Global Step: 4500   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:07:50,225-Speed 5947.39 samples/sec   Loss 31.7536   LearningRate 0.0870   Epoch: 0   Global Step: 4510   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:07:57,082-Speed 5974.19 samples/sec   Loss 31.6869   LearningRate 0.0872   Epoch: 0   Global Step: 4520   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:08:03,944-Speed 5970.80 samples/sec   Loss 31.7174   LearningRate 0.0874   Epoch: 0   Global Step: 4530   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:08:10,806-Speed 5970.32 samples/sec   Loss 31.5951   LearningRate 0.0876   Epoch: 0   Global Step: 4540   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:08:17,682-Speed 5958.79 samples/sec   Loss 31.4843   LearningRate 0.0878   Epoch: 0   Global Step: 4550   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:08:24,552-Speed 5965.15 samples/sec   Loss 31.4488   LearningRate 0.0880   Epoch: 0   Global Step: 4560   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:08:31,413-Speed 5970.41 samples/sec   Loss 31.4314   LearningRate 0.0881   Epoch: 0   Global Step: 4570   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:08:38,305-Speed 5944.07 samples/sec   Loss 31.4752   LearningRate 0.0883   Epoch: 0   Global Step: 4580   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:08:45,161-Speed 5975.71 samples/sec   Loss 31.3671   LearningRate 0.0885   Epoch: 0   Global Step: 4590   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:08:52,031-Speed 5963.13 samples/sec   Loss 31.2846   LearningRate 0.0887   Epoch: 0   Global Step: 4600   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:08:58,911-Speed 5955.22 samples/sec   Loss 31.2923   LearningRate 0.0889   Epoch: 0   Global Step: 4610   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:09:05,767-Speed 5976.88 samples/sec   Loss 31.2495   LearningRate 0.0891   Epoch: 0   Global Step: 4620   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:09:12,644-Speed 5957.93 samples/sec   Loss 31.2091   LearningRate 0.0893   Epoch: 0   Global Step: 4630   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:09:19,498-Speed 5976.95 samples/sec   Loss 31.1343   LearningRate 0.0895   Epoch: 0   Global Step: 4640   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:09:26,366-Speed 5965.13 samples/sec   Loss 31.0259   LearningRate 0.0897   Epoch: 0   Global Step: 4650   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:09:33,225-Speed 5972.71 samples/sec   Loss 31.0299   LearningRate 0.0899   Epoch: 0   Global Step: 4660   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:09:40,083-Speed 5973.40 samples/sec   Loss 31.0379   LearningRate 0.0901   Epoch: 0   Global Step: 4670   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:09:46,952-Speed 5964.57 samples/sec   Loss 30.9790   LearningRate 0.0903   Epoch: 0   Global Step: 4680   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:09:53,825-Speed 5961.31 samples/sec   Loss 30.8526   LearningRate 0.0905   Epoch: 0   Global Step: 4690   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:10:00,688-Speed 5969.34 samples/sec   Loss 30.8566   LearningRate 0.0907   Epoch: 0   Global Step: 4700   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:10:07,529-Speed 5988.72 samples/sec   Loss 30.8156   LearningRate 0.0908   Epoch: 0   Global Step: 4710   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:10:14,401-Speed 5961.45 samples/sec   Loss 30.7997   LearningRate 0.0910   Epoch: 0   Global Step: 4720   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:10:21,267-Speed 5967.00 samples/sec   Loss 30.7015   LearningRate 0.0912   Epoch: 0   Global Step: 4730   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:10:28,136-Speed 5963.75 samples/sec   Loss 30.6229   LearningRate 0.0914   Epoch: 0   Global Step: 4740   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:10:34,994-Speed 5973.04 samples/sec   Loss 30.6299   LearningRate 0.0916   Epoch: 0   Global Step: 4750   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:10:41,846-Speed 5978.69 samples/sec   Loss 30.5255   LearningRate 0.0918   Epoch: 0   Global Step: 4760   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:10:48,709-Speed 5969.83 samples/sec   Loss 30.4690   LearningRate 0.0920   Epoch: 0   Global Step: 4770   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:10:55,563-Speed 5977.00 samples/sec   Loss 30.4642   LearningRate 0.0922   Epoch: 0   Global Step: 4780   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:11:02,420-Speed 5974.44 samples/sec   Loss 30.4304   LearningRate 0.0924   Epoch: 0   Global Step: 4790   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:11:09,316-Speed 5941.27 samples/sec   Loss 30.4013   LearningRate 0.0926   Epoch: 0   Global Step: 4800   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:11:16,176-Speed 5971.88 samples/sec   Loss 30.3322   LearningRate 0.0928   Epoch: 0   Global Step: 4810   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:11:23,045-Speed 5964.77 samples/sec   Loss 30.2091   LearningRate 0.0930   Epoch: 0   Global Step: 4820   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:11:29,904-Speed 5972.29 samples/sec   Loss 30.2256   LearningRate 0.0932   Epoch: 0   Global Step: 4830   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:11:36,775-Speed 5965.95 samples/sec   Loss 30.1388   LearningRate 0.0934   Epoch: 0   Global Step: 4840   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:11:43,623-Speed 5981.86 samples/sec   Loss 30.1461   LearningRate 0.0935   Epoch: 0   Global Step: 4850   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:11:50,496-Speed 5961.49 samples/sec   Loss 30.0141   LearningRate 0.0937   Epoch: 0   Global Step: 4860   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:11:57,350-Speed 5977.54 samples/sec   Loss 30.0586   LearningRate 0.0939   Epoch: 0   Global Step: 4870   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:12:04,213-Speed 5968.98 samples/sec   Loss 30.0525   LearningRate 0.0941   Epoch: 0   Global Step: 4880   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:12:11,090-Speed 5957.79 samples/sec   Loss 29.9427   LearningRate 0.0943   Epoch: 0   Global Step: 4890   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:12:17,939-Speed 5981.20 samples/sec   Loss 29.8670   LearningRate 0.0945   Epoch: 0   Global Step: 4900   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:12:24,918-Speed 5870.47 samples/sec   Loss 29.7991   LearningRate 0.0947   Epoch: 0   Global Step: 4910   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:12:31,889-Speed 5879.80 samples/sec   Loss 29.7880   LearningRate 0.0949   Epoch: 0   Global Step: 4920   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:12:38,772-Speed 5952.50 samples/sec   Loss 29.8002   LearningRate 0.0951   Epoch: 0   Global Step: 4930   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:12:45,632-Speed 5972.23 samples/sec   Loss 29.6866   LearningRate 0.0953   Epoch: 0   Global Step: 4940   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:12:52,493-Speed 5970.86 samples/sec   Loss 29.6787   LearningRate 0.0955   Epoch: 0   Global Step: 4950   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:12:59,362-Speed 5964.51 samples/sec   Loss 29.5911   LearningRate 0.0957   Epoch: 0   Global Step: 4960   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:13:06,229-Speed 5965.52 samples/sec   Loss 29.5268   LearningRate 0.0959   Epoch: 0   Global Step: 4970   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:13:13,104-Speed 5959.25 samples/sec   Loss 29.4482   LearningRate 0.0961   Epoch: 0   Global Step: 4980   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:13:19,974-Speed 5963.51 samples/sec   Loss 29.3663   LearningRate 0.0962   Epoch: 0   Global Step: 4990   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:13:26,840-Speed 5966.98 samples/sec   Loss 29.4447   LearningRate 0.0964   Epoch: 0   Global Step: 5000   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:13:53,821-[lfw][5000]XNorm: 22.570552
Training: 2022-01-07 21:13:53,822-[lfw][5000]Accuracy-Flip: 0.98167+-0.00695
Training: 2022-01-07 21:13:53,822-[lfw][5000]Accuracy-Highest: 0.98167
Training: 2022-01-07 21:14:24,946-[cfp_fp][5000]XNorm: 20.072500
Training: 2022-01-07 21:14:24,947-[cfp_fp][5000]Accuracy-Flip: 0.89171+-0.00985
Training: 2022-01-07 21:14:24,948-[cfp_fp][5000]Accuracy-Highest: 0.89171
Training: 2022-01-07 21:14:51,791-[agedb_30][5000]XNorm: 22.088100
Training: 2022-01-07 21:14:51,792-[agedb_30][5000]Accuracy-Flip: 0.85750+-0.01241
Training: 2022-01-07 21:14:51,793-[agedb_30][5000]Accuracy-Highest: 0.85750
Training: 2022-01-07 21:14:58,670-Speed 446.05 samples/sec   Loss 29.3444   LearningRate 0.0966   Epoch: 0   Global Step: 5010   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:15:05,524-Speed 5976.56 samples/sec   Loss 29.3053   LearningRate 0.0968   Epoch: 0   Global Step: 5020   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:15:12,382-Speed 5973.87 samples/sec   Loss 29.1861   LearningRate 0.0970   Epoch: 0   Global Step: 5030   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:15:19,252-Speed 5966.44 samples/sec   Loss 29.2594   LearningRate 0.0972   Epoch: 0   Global Step: 5040   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:15:26,140-Speed 5948.35 samples/sec   Loss 29.0620   LearningRate 0.0974   Epoch: 0   Global Step: 5050   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:15:33,026-Speed 5949.21 samples/sec   Loss 29.1541   LearningRate 0.0976   Epoch: 0   Global Step: 5060   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:15:39,910-Speed 5951.21 samples/sec   Loss 29.0856   LearningRate 0.0978   Epoch: 0   Global Step: 5070   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:15:46,804-Speed 5942.84 samples/sec   Loss 28.9425   LearningRate 0.0980   Epoch: 0   Global Step: 5080   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:15:53,693-Speed 5947.67 samples/sec   Loss 29.0103   LearningRate 0.0982   Epoch: 0   Global Step: 5090   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:16:00,595-Speed 5935.11 samples/sec   Loss 28.8262   LearningRate 0.0984   Epoch: 0   Global Step: 5100   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-07 21:16:07,489-Speed 5942.36 samples/sec   Loss 28.8606   LearningRate 0.0986   Epoch: 0   Global Step: 5110   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-07 21:16:14,380-Speed 5945.20 samples/sec   Loss 28.7762   LearningRate 0.0988   Epoch: 0   Global Step: 5120   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-07 21:16:21,255-Speed 5959.55 samples/sec   Loss 28.7434   LearningRate 0.0989   Epoch: 0   Global Step: 5130   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:16:28,143-Speed 5947.61 samples/sec   Loss 28.6580   LearningRate 0.0991   Epoch: 0   Global Step: 5140   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:16:35,045-Speed 5935.38 samples/sec   Loss 28.5890   LearningRate 0.0993   Epoch: 0   Global Step: 5150   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:16:41,923-Speed 5956.87 samples/sec   Loss 28.5379   LearningRate 0.0995   Epoch: 0   Global Step: 5160   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:16:48,806-Speed 5955.93 samples/sec   Loss 28.5709   LearningRate 0.0997   Epoch: 0   Global Step: 5170   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:16:55,671-Speed 5967.75 samples/sec   Loss 28.5357   LearningRate 0.0999   Epoch: 0   Global Step: 5180   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:17:02,522-Speed 5980.25 samples/sec   Loss 28.5321   LearningRate 0.1001   Epoch: 0   Global Step: 5190   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:17:09,371-Speed 5981.13 samples/sec   Loss 28.3559   LearningRate 0.1003   Epoch: 0   Global Step: 5200   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:17:16,245-Speed 5959.54 samples/sec   Loss 28.3976   LearningRate 0.1005   Epoch: 0   Global Step: 5210   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:17:23,133-Speed 5948.74 samples/sec   Loss 28.3089   LearningRate 0.1007   Epoch: 0   Global Step: 5220   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:17:30,019-Speed 5949.51 samples/sec   Loss 28.3166   LearningRate 0.1009   Epoch: 0   Global Step: 5230   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-07 21:17:36,888-Speed 5964.24 samples/sec   Loss 28.1921   LearningRate 0.1011   Epoch: 0   Global Step: 5240   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:17:43,825-Speed 5905.12 samples/sec   Loss 28.1205   LearningRate 0.1013   Epoch: 0   Global Step: 5250   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:17:50,680-Speed 5977.08 samples/sec   Loss 28.0629   LearningRate 0.1015   Epoch: 0   Global Step: 5260   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:17:57,548-Speed 5965.33 samples/sec   Loss 28.1142   LearningRate 0.1016   Epoch: 0   Global Step: 5270   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:18:04,408-Speed 5971.56 samples/sec   Loss 28.0000   LearningRate 0.1018   Epoch: 0   Global Step: 5280   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:18:11,282-Speed 5960.10 samples/sec   Loss 28.0388   LearningRate 0.1020   Epoch: 0   Global Step: 5290   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:18:18,235-Speed 5893.06 samples/sec   Loss 27.8527   LearningRate 0.1022   Epoch: 0   Global Step: 5300   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:18:25,198-Speed 5883.83 samples/sec   Loss 27.8429   LearningRate 0.1024   Epoch: 0   Global Step: 5310   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:18:32,081-Speed 5952.28 samples/sec   Loss 27.8267   LearningRate 0.1026   Epoch: 0   Global Step: 5320   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:18:38,956-Speed 5961.28 samples/sec   Loss 27.8109   LearningRate 0.1028   Epoch: 0   Global Step: 5330   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:18:45,849-Speed 5943.23 samples/sec   Loss 27.6715   LearningRate 0.1030   Epoch: 0   Global Step: 5340   Fp16 Grad Scale: 262144   Required: 40 hours
Training: 2022-01-07 21:18:52,707-Speed 5973.83 samples/sec   Loss 27.7718   LearningRate 0.1032   Epoch: 0   Global Step: 5350   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:18:59,586-Speed 5955.68 samples/sec   Loss 27.6352   LearningRate 0.1034   Epoch: 0   Global Step: 5360   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:19:06,468-Speed 5952.31 samples/sec   Loss 27.5857   LearningRate 0.1036   Epoch: 0   Global Step: 5370   Fp16 Grad Scale: 131072   Required: 40 hours
Training: 2022-01-07 21:19:13,348-Speed 5955.21 samples/sec   Loss 27.5943   LearningRate 0.1038   Epoch: 0   Global Step: 5380   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:19:20,225-Speed 5957.64 samples/sec   Loss 27.4644   LearningRate 0.1040   Epoch: 0   Global Step: 5390   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:19:27,116-Speed 5945.65 samples/sec   Loss 27.4641   LearningRate 0.1042   Epoch: 0   Global Step: 5400   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:19:33,970-Speed 5976.93 samples/sec   Loss 27.4321   LearningRate 0.1043   Epoch: 0   Global Step: 5410   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:19:40,857-Speed 5948.91 samples/sec   Loss 27.3412   LearningRate 0.1045   Epoch: 0   Global Step: 5420   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:19:47,719-Speed 5970.70 samples/sec   Loss 27.3061   LearningRate 0.1047   Epoch: 0   Global Step: 5430   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:19:54,591-Speed 5961.07 samples/sec   Loss 27.1689   LearningRate 0.1049   Epoch: 0   Global Step: 5440   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:20:01,463-Speed 5961.92 samples/sec   Loss 27.1390   LearningRate 0.1051   Epoch: 0   Global Step: 5450   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:20:08,325-Speed 5970.10 samples/sec   Loss 27.2063   LearningRate 0.1053   Epoch: 0   Global Step: 5460   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:20:15,182-Speed 5974.22 samples/sec   Loss 27.0826   LearningRate 0.1055   Epoch: 0   Global Step: 5470   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:20:22,065-Speed 5951.22 samples/sec   Loss 27.1802   LearningRate 0.1057   Epoch: 0   Global Step: 5480   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:20:28,937-Speed 5962.28 samples/sec   Loss 27.0110   LearningRate 0.1059   Epoch: 0   Global Step: 5490   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:20:35,797-Speed 5971.84 samples/sec   Loss 26.8824   LearningRate 0.1061   Epoch: 0   Global Step: 5500   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:20:42,667-Speed 5963.04 samples/sec   Loss 26.8471   LearningRate 0.1063   Epoch: 0   Global Step: 5510   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:20:49,564-Speed 5951.44 samples/sec   Loss 26.8067   LearningRate 0.1065   Epoch: 0   Global Step: 5520   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:20:56,425-Speed 5971.03 samples/sec   Loss 26.8032   LearningRate 0.1067   Epoch: 0   Global Step: 5530   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:21:03,309-Speed 5951.82 samples/sec   Loss 26.7403   LearningRate 0.1069   Epoch: 0   Global Step: 5540   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:21:10,185-Speed 5957.98 samples/sec   Loss 26.7607   LearningRate 0.1070   Epoch: 0   Global Step: 5550   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:21:17,043-Speed 5974.78 samples/sec   Loss 26.6883   LearningRate 0.1072   Epoch: 0   Global Step: 5560   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:21:23,917-Speed 5959.76 samples/sec   Loss 26.6191   LearningRate 0.1074   Epoch: 0   Global Step: 5570   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:21:30,796-Speed 5956.01 samples/sec   Loss 26.4842   LearningRate 0.1076   Epoch: 0   Global Step: 5580   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:21:37,662-Speed 5966.31 samples/sec   Loss 26.5450   LearningRate 0.1078   Epoch: 0   Global Step: 5590   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:21:44,520-Speed 5974.10 samples/sec   Loss 26.5283   LearningRate 0.1080   Epoch: 0   Global Step: 5600   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:21:51,394-Speed 5959.67 samples/sec   Loss 26.3961   LearningRate 0.1082   Epoch: 0   Global Step: 5610   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:21:58,252-Speed 5974.38 samples/sec   Loss 26.3914   LearningRate 0.1084   Epoch: 0   Global Step: 5620   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:22:05,134-Speed 5953.02 samples/sec   Loss 26.4419   LearningRate 0.1086   Epoch: 0   Global Step: 5630   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:22:12,031-Speed 5939.70 samples/sec   Loss 26.3171   LearningRate 0.1088   Epoch: 0   Global Step: 5640   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:22:18,894-Speed 5969.72 samples/sec   Loss 26.2565   LearningRate 0.1090   Epoch: 0   Global Step: 5650   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:22:25,758-Speed 5970.91 samples/sec   Loss 26.1823   LearningRate 0.1092   Epoch: 0   Global Step: 5660   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:22:32,607-Speed 5982.23 samples/sec   Loss 26.1081   LearningRate 0.1094   Epoch: 0   Global Step: 5670   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:22:39,475-Speed 5964.74 samples/sec   Loss 26.0931   LearningRate 0.1096   Epoch: 0   Global Step: 5680   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:22:46,335-Speed 5972.58 samples/sec   Loss 25.9846   LearningRate 0.1098   Epoch: 0   Global Step: 5690   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:22:53,235-Speed 5937.42 samples/sec   Loss 25.8910   LearningRate 0.1099   Epoch: 0   Global Step: 5700   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:23:00,095-Speed 5971.53 samples/sec   Loss 25.8874   LearningRate 0.1101   Epoch: 0   Global Step: 5710   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:23:06,948-Speed 5978.22 samples/sec   Loss 25.8290   LearningRate 0.1103   Epoch: 0   Global Step: 5720   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:23:13,807-Speed 5972.87 samples/sec   Loss 25.8547   LearningRate 0.1105   Epoch: 0   Global Step: 5730   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:23:20,688-Speed 5953.47 samples/sec   Loss 25.6778   LearningRate 0.1107   Epoch: 0   Global Step: 5740   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:23:27,556-Speed 5964.94 samples/sec   Loss 25.7239   LearningRate 0.1109   Epoch: 0   Global Step: 5750   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:23:34,435-Speed 5955.05 samples/sec   Loss 25.6553   LearningRate 0.1111   Epoch: 0   Global Step: 5760   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:23:41,301-Speed 5967.44 samples/sec   Loss 25.5779   LearningRate 0.1113   Epoch: 0   Global Step: 5770   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:23:48,160-Speed 5972.73 samples/sec   Loss 25.5174   LearningRate 0.1115   Epoch: 0   Global Step: 5780   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:23:55,026-Speed 5966.79 samples/sec   Loss 25.5147   LearningRate 0.1117   Epoch: 0   Global Step: 5790   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:24:01,920-Speed 5943.66 samples/sec   Loss 25.4931   LearningRate 0.1119   Epoch: 0   Global Step: 5800   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:24:08,790-Speed 5963.33 samples/sec   Loss 25.4098   LearningRate 0.1121   Epoch: 0   Global Step: 5810   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:24:15,659-Speed 5963.51 samples/sec   Loss 25.3802   LearningRate 0.1123   Epoch: 0   Global Step: 5820   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:24:22,531-Speed 5961.83 samples/sec   Loss 25.2667   LearningRate 0.1125   Epoch: 0   Global Step: 5830   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:24:29,436-Speed 5933.39 samples/sec   Loss 25.3373   LearningRate 0.1126   Epoch: 0   Global Step: 5840   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:24:36,287-Speed 5979.96 samples/sec   Loss 25.1256   LearningRate 0.1128   Epoch: 0   Global Step: 5850   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:24:43,152-Speed 5967.87 samples/sec   Loss 25.2071   LearningRate 0.1130   Epoch: 0   Global Step: 5860   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:24:50,035-Speed 5951.68 samples/sec   Loss 25.1097   LearningRate 0.1132   Epoch: 0   Global Step: 5870   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:24:56,904-Speed 5965.81 samples/sec   Loss 25.0595   LearningRate 0.1134   Epoch: 0   Global Step: 5880   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:25:03,761-Speed 5973.98 samples/sec   Loss 25.0161   LearningRate 0.1136   Epoch: 0   Global Step: 5890   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:25:10,630-Speed 5965.01 samples/sec   Loss 24.9846   LearningRate 0.1138   Epoch: 0   Global Step: 5900   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:25:17,521-Speed 5944.68 samples/sec   Loss 24.9442   LearningRate 0.1140   Epoch: 0   Global Step: 5910   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:25:24,400-Speed 5955.74 samples/sec   Loss 24.9069   LearningRate 0.1142   Epoch: 0   Global Step: 5920   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:25:31,276-Speed 5958.85 samples/sec   Loss 24.7856   LearningRate 0.1144   Epoch: 0   Global Step: 5930   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:25:38,148-Speed 5966.43 samples/sec   Loss 24.8231   LearningRate 0.1146   Epoch: 0   Global Step: 5940   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:25:45,018-Speed 5966.49 samples/sec   Loss 24.7873   LearningRate 0.1148   Epoch: 0   Global Step: 5950   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:25:51,882-Speed 5968.29 samples/sec   Loss 24.7062   LearningRate 0.1150   Epoch: 0   Global Step: 5960   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:25:58,737-Speed 5975.70 samples/sec   Loss 24.6227   LearningRate 0.1152   Epoch: 0   Global Step: 5970   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:26:05,606-Speed 5964.97 samples/sec   Loss 24.5703   LearningRate 0.1153   Epoch: 0   Global Step: 5980   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:26:12,490-Speed 5955.15 samples/sec   Loss 24.5106   LearningRate 0.1155   Epoch: 0   Global Step: 5990   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:26:19,347-Speed 5973.92 samples/sec   Loss 24.5200   LearningRate 0.1157   Epoch: 0   Global Step: 6000   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:26:26,209-Speed 5972.50 samples/sec   Loss 24.4210   LearningRate 0.1159   Epoch: 0   Global Step: 6010   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:26:33,068-Speed 5972.55 samples/sec   Loss 24.3314   LearningRate 0.1161   Epoch: 0   Global Step: 6020   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:26:39,929-Speed 5971.43 samples/sec   Loss 24.3441   LearningRate 0.1163   Epoch: 0   Global Step: 6030   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:26:46,811-Speed 5953.07 samples/sec   Loss 24.2755   LearningRate 0.1165   Epoch: 0   Global Step: 6040   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:26:53,676-Speed 5967.62 samples/sec   Loss 24.3109   LearningRate 0.1167   Epoch: 0   Global Step: 6050   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:27:00,548-Speed 5962.48 samples/sec   Loss 24.2201   LearningRate 0.1169   Epoch: 0   Global Step: 6060   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:27:07,403-Speed 5978.60 samples/sec   Loss 24.1402   LearningRate 0.1171   Epoch: 0   Global Step: 6070   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:27:14,283-Speed 5956.24 samples/sec   Loss 24.2096   LearningRate 0.1173   Epoch: 0   Global Step: 6080   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:27:21,154-Speed 5962.07 samples/sec   Loss 24.0298   LearningRate 0.1175   Epoch: 0   Global Step: 6090   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:27:28,051-Speed 5940.53 samples/sec   Loss 23.9816   LearningRate 0.1177   Epoch: 0   Global Step: 6100   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:27:34,909-Speed 5973.11 samples/sec   Loss 23.9674   LearningRate 0.1179   Epoch: 0   Global Step: 6110   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:27:41,777-Speed 5965.59 samples/sec   Loss 23.9150   LearningRate 0.1180   Epoch: 0   Global Step: 6120   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:27:48,636-Speed 5972.94 samples/sec   Loss 23.9063   LearningRate 0.1182   Epoch: 0   Global Step: 6130   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:27:55,506-Speed 5963.04 samples/sec   Loss 23.7114   LearningRate 0.1184   Epoch: 0   Global Step: 6140   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:28:02,359-Speed 5977.83 samples/sec   Loss 23.7432   LearningRate 0.1186   Epoch: 0   Global Step: 6150   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:28:09,214-Speed 5976.32 samples/sec   Loss 23.7052   LearningRate 0.1188   Epoch: 0   Global Step: 6160   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:28:16,091-Speed 5957.14 samples/sec   Loss 23.6870   LearningRate 0.1190   Epoch: 0   Global Step: 6170   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:28:22,948-Speed 5974.73 samples/sec   Loss 23.6274   LearningRate 0.1192   Epoch: 0   Global Step: 6180   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:28:29,831-Speed 5953.22 samples/sec   Loss 23.5379   LearningRate 0.1194   Epoch: 0   Global Step: 6190   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:28:36,695-Speed 5969.51 samples/sec   Loss 23.5904   LearningRate 0.1196   Epoch: 0   Global Step: 6200   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:28:43,562-Speed 5966.37 samples/sec   Loss 23.4785   LearningRate 0.1198   Epoch: 0   Global Step: 6210   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:28:50,430-Speed 5964.85 samples/sec   Loss 23.4325   LearningRate 0.1200   Epoch: 0   Global Step: 6220   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:28:57,302-Speed 5961.17 samples/sec   Loss 23.4735   LearningRate 0.1202   Epoch: 0   Global Step: 6230   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:29:04,168-Speed 5966.88 samples/sec   Loss 23.2518   LearningRate 0.1204   Epoch: 0   Global Step: 6240   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:29:11,027-Speed 5973.08 samples/sec   Loss 23.2595   LearningRate 0.1206   Epoch: 0   Global Step: 6250   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:29:17,900-Speed 5960.56 samples/sec   Loss 23.2640   LearningRate 0.1207   Epoch: 0   Global Step: 6260   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:29:24,758-Speed 5973.71 samples/sec   Loss 23.2695   LearningRate 0.1209   Epoch: 0   Global Step: 6270   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:29:31,623-Speed 5967.48 samples/sec   Loss 23.1880   LearningRate 0.1211   Epoch: 0   Global Step: 6280   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:29:38,483-Speed 5972.32 samples/sec   Loss 23.0638   LearningRate 0.1213   Epoch: 0   Global Step: 6290   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:29:45,342-Speed 5973.39 samples/sec   Loss 23.0802   LearningRate 0.1215   Epoch: 0   Global Step: 6300   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:29:52,208-Speed 5966.15 samples/sec   Loss 23.1020   LearningRate 0.1217   Epoch: 0   Global Step: 6310   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:29:59,078-Speed 5963.70 samples/sec   Loss 23.1017   LearningRate 0.1219   Epoch: 0   Global Step: 6320   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:30:05,931-Speed 5977.81 samples/sec   Loss 22.9111   LearningRate 0.1221   Epoch: 0   Global Step: 6330   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:30:12,802-Speed 5964.39 samples/sec   Loss 22.9085   LearningRate 0.1223   Epoch: 0   Global Step: 6340   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:30:19,656-Speed 5977.29 samples/sec   Loss 22.8651   LearningRate 0.1225   Epoch: 0   Global Step: 6350   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:30:26,519-Speed 5969.52 samples/sec   Loss 22.8757   LearningRate 0.1227   Epoch: 0   Global Step: 6360   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:30:33,382-Speed 5969.18 samples/sec   Loss 22.8368   LearningRate 0.1229   Epoch: 0   Global Step: 6370   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:30:40,239-Speed 5973.87 samples/sec   Loss 22.8134   LearningRate 0.1231   Epoch: 0   Global Step: 6380   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:30:47,102-Speed 5969.56 samples/sec   Loss 22.7087   LearningRate 0.1233   Epoch: 0   Global Step: 6390   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:30:53,973-Speed 5961.51 samples/sec   Loss 22.5714   LearningRate 0.1234   Epoch: 0   Global Step: 6400   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:31:00,859-Speed 5949.39 samples/sec   Loss 22.6576   LearningRate 0.1236   Epoch: 0   Global Step: 6410   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:31:07,729-Speed 5963.62 samples/sec   Loss 22.5552   LearningRate 0.1238   Epoch: 0   Global Step: 6420   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:31:14,603-Speed 5959.63 samples/sec   Loss 22.4687   LearningRate 0.1240   Epoch: 0   Global Step: 6430   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:31:21,460-Speed 5975.35 samples/sec   Loss 22.5224   LearningRate 0.1242   Epoch: 0   Global Step: 6440   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:31:28,320-Speed 5972.62 samples/sec   Loss 22.4048   LearningRate 0.1244   Epoch: 0   Global Step: 6450   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:31:35,193-Speed 5960.36 samples/sec   Loss 22.2972   LearningRate 0.1246   Epoch: 0   Global Step: 6460   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:31:42,058-Speed 5967.67 samples/sec   Loss 22.4571   LearningRate 0.1248   Epoch: 0   Global Step: 6470   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:31:48,928-Speed 5963.65 samples/sec   Loss 22.2989   LearningRate 0.1250   Epoch: 0   Global Step: 6480   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:31:55,799-Speed 5962.06 samples/sec   Loss 22.2457   LearningRate 0.1252   Epoch: 0   Global Step: 6490   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:32:02,688-Speed 5946.66 samples/sec   Loss 22.1900   LearningRate 0.1254   Epoch: 0   Global Step: 6500   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:32:09,553-Speed 5967.20 samples/sec   Loss 22.1462   LearningRate 0.1256   Epoch: 0   Global Step: 6510   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:32:16,422-Speed 5964.21 samples/sec   Loss 22.1257   LearningRate 0.1258   Epoch: 0   Global Step: 6520   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:32:23,284-Speed 5969.97 samples/sec   Loss 22.0367   LearningRate 0.1260   Epoch: 0   Global Step: 6530   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:32:30,139-Speed 5976.83 samples/sec   Loss 22.0739   LearningRate 0.1261   Epoch: 0   Global Step: 6540   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:32:36,994-Speed 5976.29 samples/sec   Loss 22.0852   LearningRate 0.1263   Epoch: 0   Global Step: 6550   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:32:43,864-Speed 5964.73 samples/sec   Loss 21.8911   LearningRate 0.1265   Epoch: 0   Global Step: 6560   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:32:50,717-Speed 5977.08 samples/sec   Loss 21.9507   LearningRate 0.1267   Epoch: 0   Global Step: 6570   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:32:57,573-Speed 5976.07 samples/sec   Loss 21.9305   LearningRate 0.1269   Epoch: 0   Global Step: 6580   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:33:04,430-Speed 5974.25 samples/sec   Loss 21.7735   LearningRate 0.1271   Epoch: 0   Global Step: 6590   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:33:11,274-Speed 5985.83 samples/sec   Loss 21.8241   LearningRate 0.1273   Epoch: 0   Global Step: 6600   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:33:18,153-Speed 5957.60 samples/sec   Loss 21.7262   LearningRate 0.1275   Epoch: 0   Global Step: 6610   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:33:25,018-Speed 5969.68 samples/sec   Loss 21.6330   LearningRate 0.1277   Epoch: 0   Global Step: 6620   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:33:31,874-Speed 5975.31 samples/sec   Loss 21.7573   LearningRate 0.1279   Epoch: 0   Global Step: 6630   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:33:38,727-Speed 5977.96 samples/sec   Loss 21.5427   LearningRate 0.1281   Epoch: 0   Global Step: 6640   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:33:45,605-Speed 5959.25 samples/sec   Loss 21.5805   LearningRate 0.1283   Epoch: 0   Global Step: 6650   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:33:52,471-Speed 5967.40 samples/sec   Loss 21.5272   LearningRate 0.1285   Epoch: 0   Global Step: 6660   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:33:59,342-Speed 5963.36 samples/sec   Loss 21.5388   LearningRate 0.1287   Epoch: 0   Global Step: 6670   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:34:06,213-Speed 5961.99 samples/sec   Loss 21.4154   LearningRate 0.1288   Epoch: 0   Global Step: 6680   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:34:13,071-Speed 5977.10 samples/sec   Loss 21.3313   LearningRate 0.1290   Epoch: 0   Global Step: 6690   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:34:19,931-Speed 5971.91 samples/sec   Loss 21.3393   LearningRate 0.1292   Epoch: 0   Global Step: 6700   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:34:26,785-Speed 5976.91 samples/sec   Loss 21.2535   LearningRate 0.1294   Epoch: 0   Global Step: 6710   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:34:33,653-Speed 5965.61 samples/sec   Loss 21.3516   LearningRate 0.1296   Epoch: 0   Global Step: 6720   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:34:40,518-Speed 5967.19 samples/sec   Loss 21.1714   LearningRate 0.1298   Epoch: 0   Global Step: 6730   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:34:47,377-Speed 5972.80 samples/sec   Loss 21.1649   LearningRate 0.1300   Epoch: 0   Global Step: 6740   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:34:54,231-Speed 5977.58 samples/sec   Loss 21.1721   LearningRate 0.1302   Epoch: 0   Global Step: 6750   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:35:01,109-Speed 5957.00 samples/sec   Loss 21.0945   LearningRate 0.1304   Epoch: 0   Global Step: 6760   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:35:07,973-Speed 5968.32 samples/sec   Loss 21.0659   LearningRate 0.1306   Epoch: 0   Global Step: 6770   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:35:14,839-Speed 5966.84 samples/sec   Loss 21.0576   LearningRate 0.1308   Epoch: 0   Global Step: 6780   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:35:21,690-Speed 5979.67 samples/sec   Loss 20.9320   LearningRate 0.1310   Epoch: 0   Global Step: 6790   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:35:28,556-Speed 5966.68 samples/sec   Loss 20.9309   LearningRate 0.1312   Epoch: 0   Global Step: 6800   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:35:35,428-Speed 5961.89 samples/sec   Loss 21.0435   LearningRate 0.1314   Epoch: 0   Global Step: 6810   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:35:42,283-Speed 5976.00 samples/sec   Loss 20.8224   LearningRate 0.1315   Epoch: 0   Global Step: 6820   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:35:49,144-Speed 5971.06 samples/sec   Loss 20.9089   LearningRate 0.1317   Epoch: 0   Global Step: 6830   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:35:56,014-Speed 5963.26 samples/sec   Loss 20.7361   LearningRate 0.1319   Epoch: 0   Global Step: 6840   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:36:02,890-Speed 5959.69 samples/sec   Loss 20.6986   LearningRate 0.1321   Epoch: 0   Global Step: 6850   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:36:09,731-Speed 5988.61 samples/sec   Loss 20.7084   LearningRate 0.1323   Epoch: 0   Global Step: 6860   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:36:16,588-Speed 5974.47 samples/sec   Loss 20.6970   LearningRate 0.1325   Epoch: 0   Global Step: 6870   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:36:23,462-Speed 5959.62 samples/sec   Loss 20.7671   LearningRate 0.1327   Epoch: 0   Global Step: 6880   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:36:30,324-Speed 5970.42 samples/sec   Loss 20.6050   LearningRate 0.1329   Epoch: 0   Global Step: 6890   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:36:37,175-Speed 5979.94 samples/sec   Loss 20.5372   LearningRate 0.1331   Epoch: 0   Global Step: 6900   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:36:44,044-Speed 5963.80 samples/sec   Loss 20.5397   LearningRate 0.1333   Epoch: 0   Global Step: 6910   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:36:50,912-Speed 5965.13 samples/sec   Loss 20.5260   LearningRate 0.1335   Epoch: 0   Global Step: 6920   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:36:57,793-Speed 5954.23 samples/sec   Loss 20.4918   LearningRate 0.1337   Epoch: 0   Global Step: 6930   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:37:04,650-Speed 5974.83 samples/sec   Loss 20.4581   LearningRate 0.1339   Epoch: 0   Global Step: 6940   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:37:11,509-Speed 5972.35 samples/sec   Loss 20.4610   LearningRate 0.1341   Epoch: 0   Global Step: 6950   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:37:18,364-Speed 5976.89 samples/sec   Loss 20.3642   LearningRate 0.1342   Epoch: 0   Global Step: 6960   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:37:25,222-Speed 5973.30 samples/sec   Loss 20.2534   LearningRate 0.1344   Epoch: 0   Global Step: 6970   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:37:32,069-Speed 5983.65 samples/sec   Loss 20.2551   LearningRate 0.1346   Epoch: 0   Global Step: 6980   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:37:38,921-Speed 5978.87 samples/sec   Loss 20.2410   LearningRate 0.1348   Epoch: 0   Global Step: 6990   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:37:45,804-Speed 5958.83 samples/sec   Loss 20.2647   LearningRate 0.1350   Epoch: 0   Global Step: 7000   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:37:52,676-Speed 5962.48 samples/sec   Loss 20.0806   LearningRate 0.1352   Epoch: 0   Global Step: 7010   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:37:59,529-Speed 5977.51 samples/sec   Loss 20.0678   LearningRate 0.1354   Epoch: 0   Global Step: 7020   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:38:06,407-Speed 5957.03 samples/sec   Loss 20.0957   LearningRate 0.1356   Epoch: 0   Global Step: 7030   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:38:13,279-Speed 5962.34 samples/sec   Loss 20.1535   LearningRate 0.1358   Epoch: 0   Global Step: 7040   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:38:20,129-Speed 5980.39 samples/sec   Loss 19.9917   LearningRate 0.1360   Epoch: 0   Global Step: 7050   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:38:27,005-Speed 5958.36 samples/sec   Loss 20.0354   LearningRate 0.1362   Epoch: 0   Global Step: 7060   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:38:33,860-Speed 5976.83 samples/sec   Loss 19.9335   LearningRate 0.1364   Epoch: 0   Global Step: 7070   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:38:40,734-Speed 5959.76 samples/sec   Loss 19.9244   LearningRate 0.1366   Epoch: 0   Global Step: 7080   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:38:47,593-Speed 5972.79 samples/sec   Loss 19.7635   LearningRate 0.1368   Epoch: 0   Global Step: 7090   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:38:54,472-Speed 5955.24 samples/sec   Loss 19.8679   LearningRate 0.1369   Epoch: 0   Global Step: 7100   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:39:01,346-Speed 5960.10 samples/sec   Loss 19.8105   LearningRate 0.1371   Epoch: 0   Global Step: 7110   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:39:08,216-Speed 5963.18 samples/sec   Loss 19.7583   LearningRate 0.1373   Epoch: 0   Global Step: 7120   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:39:15,105-Speed 5946.76 samples/sec   Loss 19.8394   LearningRate 0.1375   Epoch: 0   Global Step: 7130   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:39:21,973-Speed 5967.61 samples/sec   Loss 19.6697   LearningRate 0.1377   Epoch: 0   Global Step: 7140   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:39:28,982-Speed 5844.81 samples/sec   Loss 19.6214   LearningRate 0.1379   Epoch: 0   Global Step: 7150   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:39:35,847-Speed 5967.86 samples/sec   Loss 19.5544   LearningRate 0.1381   Epoch: 0   Global Step: 7160   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:39:42,726-Speed 5955.97 samples/sec   Loss 19.5964   LearningRate 0.1383   Epoch: 0   Global Step: 7170   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:39:49,598-Speed 5960.85 samples/sec   Loss 19.5708   LearningRate 0.1385   Epoch: 0   Global Step: 7180   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:39:56,473-Speed 5959.79 samples/sec   Loss 19.4787   LearningRate 0.1387   Epoch: 0   Global Step: 7190   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:40:03,345-Speed 5961.18 samples/sec   Loss 19.4966   LearningRate 0.1389   Epoch: 0   Global Step: 7200   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:40:10,227-Speed 5953.62 samples/sec   Loss 19.4631   LearningRate 0.1391   Epoch: 0   Global Step: 7210   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:40:17,090-Speed 5969.86 samples/sec   Loss 19.3594   LearningRate 0.1393   Epoch: 0   Global Step: 7220   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:40:23,955-Speed 5979.01 samples/sec   Loss 19.3658   LearningRate 0.1395   Epoch: 0   Global Step: 7230   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:40:30,819-Speed 5968.43 samples/sec   Loss 19.3937   LearningRate 0.1396   Epoch: 0   Global Step: 7240   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:40:37,683-Speed 5968.47 samples/sec   Loss 19.2979   LearningRate 0.1398   Epoch: 0   Global Step: 7250   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:40:44,554-Speed 5963.18 samples/sec   Loss 19.2577   LearningRate 0.1400   Epoch: 0   Global Step: 7260   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:40:51,437-Speed 5951.93 samples/sec   Loss 19.1587   LearningRate 0.1402   Epoch: 0   Global Step: 7270   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:40:58,305-Speed 5964.80 samples/sec   Loss 19.2299   LearningRate 0.1404   Epoch: 0   Global Step: 7280   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:41:05,167-Speed 5971.42 samples/sec   Loss 19.0937   LearningRate 0.1406   Epoch: 0   Global Step: 7290   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:41:12,028-Speed 5970.96 samples/sec   Loss 19.0973   LearningRate 0.1408   Epoch: 0   Global Step: 7300   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:41:18,885-Speed 5974.82 samples/sec   Loss 19.1337   LearningRate 0.1410   Epoch: 0   Global Step: 7310   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:41:25,741-Speed 5975.31 samples/sec   Loss 19.0723   LearningRate 0.1412   Epoch: 0   Global Step: 7320   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:41:32,594-Speed 5977.63 samples/sec   Loss 19.0455   LearningRate 0.1414   Epoch: 0   Global Step: 7330   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:41:39,452-Speed 5974.18 samples/sec   Loss 18.9989   LearningRate 0.1416   Epoch: 0   Global Step: 7340   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:41:46,316-Speed 5969.69 samples/sec   Loss 18.9994   LearningRate 0.1418   Epoch: 0   Global Step: 7350   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:41:53,192-Speed 5958.16 samples/sec   Loss 19.0271   LearningRate 0.1420   Epoch: 0   Global Step: 7360   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:42:00,065-Speed 5961.22 samples/sec   Loss 18.8798   LearningRate 0.1422   Epoch: 0   Global Step: 7370   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:42:06,921-Speed 5975.60 samples/sec   Loss 18.9624   LearningRate 0.1423   Epoch: 0   Global Step: 7380   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:42:13,799-Speed 5956.61 samples/sec   Loss 18.7277   LearningRate 0.1425   Epoch: 0   Global Step: 7390   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:42:20,681-Speed 5952.99 samples/sec   Loss 18.8655   LearningRate 0.1427   Epoch: 0   Global Step: 7400   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:42:27,582-Speed 5942.45 samples/sec   Loss 18.7466   LearningRate 0.1429   Epoch: 0   Global Step: 7410   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:42:34,456-Speed 5958.96 samples/sec   Loss 18.7740   LearningRate 0.1431   Epoch: 0   Global Step: 7420   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:42:41,321-Speed 5968.31 samples/sec   Loss 18.7195   LearningRate 0.1433   Epoch: 0   Global Step: 7430   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:42:48,197-Speed 5958.09 samples/sec   Loss 18.7071   LearningRate 0.1435   Epoch: 0   Global Step: 7440   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:42:55,056-Speed 5972.48 samples/sec   Loss 18.6442   LearningRate 0.1437   Epoch: 0   Global Step: 7450   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:43:01,927-Speed 5961.91 samples/sec   Loss 18.5876   LearningRate 0.1439   Epoch: 0   Global Step: 7460   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:43:08,784-Speed 5974.95 samples/sec   Loss 18.6559   LearningRate 0.1441   Epoch: 0   Global Step: 7470   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:43:15,670-Speed 5949.03 samples/sec   Loss 18.5337   LearningRate 0.1443   Epoch: 0   Global Step: 7480   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:43:22,547-Speed 5958.75 samples/sec   Loss 18.4889   LearningRate 0.1445   Epoch: 0   Global Step: 7490   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:43:29,421-Speed 5960.04 samples/sec   Loss 18.5127   LearningRate 0.1447   Epoch: 0   Global Step: 7500   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:43:36,290-Speed 5964.25 samples/sec   Loss 18.4916   LearningRate 0.1449   Epoch: 0   Global Step: 7510   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:43:43,140-Speed 5979.95 samples/sec   Loss 18.4371   LearningRate 0.1450   Epoch: 0   Global Step: 7520   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:43:50,003-Speed 5972.43 samples/sec   Loss 18.4683   LearningRate 0.1452   Epoch: 0   Global Step: 7530   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:43:56,882-Speed 5955.65 samples/sec   Loss 18.4851   LearningRate 0.1454   Epoch: 0   Global Step: 7540   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:44:03,758-Speed 5958.74 samples/sec   Loss 18.4679   LearningRate 0.1456   Epoch: 0   Global Step: 7550   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:44:10,609-Speed 5979.44 samples/sec   Loss 18.3436   LearningRate 0.1458   Epoch: 0   Global Step: 7560   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:44:17,498-Speed 5946.89 samples/sec   Loss 18.3011   LearningRate 0.1460   Epoch: 0   Global Step: 7570   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:44:24,357-Speed 5975.65 samples/sec   Loss 18.3175   LearningRate 0.1462   Epoch: 0   Global Step: 7580   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:44:31,239-Speed 5952.19 samples/sec   Loss 18.2334   LearningRate 0.1464   Epoch: 0   Global Step: 7590   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:44:38,129-Speed 5946.84 samples/sec   Loss 18.2187   LearningRate 0.1466   Epoch: 0   Global Step: 7600   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:44:44,988-Speed 5972.74 samples/sec   Loss 18.1654   LearningRate 0.1468   Epoch: 0   Global Step: 7610   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:44:51,861-Speed 5961.32 samples/sec   Loss 18.1970   LearningRate 0.1470   Epoch: 0   Global Step: 7620   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:44:58,721-Speed 5971.95 samples/sec   Loss 18.1375   LearningRate 0.1472   Epoch: 0   Global Step: 7630   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:45:05,587-Speed 5966.77 samples/sec   Loss 18.0956   LearningRate 0.1474   Epoch: 0   Global Step: 7640   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:45:12,454-Speed 5966.29 samples/sec   Loss 18.1329   LearningRate 0.1476   Epoch: 0   Global Step: 7650   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:45:19,314-Speed 5972.35 samples/sec   Loss 18.0499   LearningRate 0.1477   Epoch: 0   Global Step: 7660   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:45:26,194-Speed 5955.82 samples/sec   Loss 17.9180   LearningRate 0.1479   Epoch: 0   Global Step: 7670   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:45:33,069-Speed 5958.69 samples/sec   Loss 17.9841   LearningRate 0.1481   Epoch: 0   Global Step: 7680   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:45:39,933-Speed 5968.69 samples/sec   Loss 17.9670   LearningRate 0.1483   Epoch: 0   Global Step: 7690   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:45:46,805-Speed 5961.84 samples/sec   Loss 17.9624   LearningRate 0.1485   Epoch: 0   Global Step: 7700   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:45:53,658-Speed 5978.53 samples/sec   Loss 17.9092   LearningRate 0.1487   Epoch: 0   Global Step: 7710   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:46:00,510-Speed 5978.47 samples/sec   Loss 18.0158   LearningRate 0.1489   Epoch: 0   Global Step: 7720   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:46:07,373-Speed 5968.92 samples/sec   Loss 17.8455   LearningRate 0.1491   Epoch: 0   Global Step: 7730   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:46:14,226-Speed 5978.20 samples/sec   Loss 17.9041   LearningRate 0.1493   Epoch: 0   Global Step: 7740   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:46:21,083-Speed 5974.76 samples/sec   Loss 17.7961   LearningRate 0.1495   Epoch: 0   Global Step: 7750   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:46:27,948-Speed 5970.29 samples/sec   Loss 17.8780   LearningRate 0.1497   Epoch: 0   Global Step: 7760   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:46:34,831-Speed 5951.61 samples/sec   Loss 17.8372   LearningRate 0.1499   Epoch: 0   Global Step: 7770   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:46:41,731-Speed 5937.88 samples/sec   Loss 17.7995   LearningRate 0.1501   Epoch: 0   Global Step: 7780   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:46:48,668-Speed 5906.01 samples/sec   Loss 17.6803   LearningRate 0.1503   Epoch: 0   Global Step: 7790   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:46:55,536-Speed 5965.47 samples/sec   Loss 17.5854   LearningRate 0.1504   Epoch: 0   Global Step: 7800   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:47:02,421-Speed 5950.40 samples/sec   Loss 17.6792   LearningRate 0.1506   Epoch: 0   Global Step: 7810   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:47:09,322-Speed 5936.17 samples/sec   Loss 17.6787   LearningRate 0.1508   Epoch: 0   Global Step: 7820   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:47:16,170-Speed 5982.13 samples/sec   Loss 17.5801   LearningRate 0.1510   Epoch: 0   Global Step: 7830   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:47:23,097-Speed 5917.63 samples/sec   Loss 17.6034   LearningRate 0.1512   Epoch: 0   Global Step: 7840   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:47:29,961-Speed 5968.49 samples/sec   Loss 17.5990   LearningRate 0.1514   Epoch: 0   Global Step: 7850   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:47:36,837-Speed 5957.94 samples/sec   Loss 17.5066   LearningRate 0.1516   Epoch: 0   Global Step: 7860   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:47:43,716-Speed 5956.29 samples/sec   Loss 17.5724   LearningRate 0.1518   Epoch: 0   Global Step: 7870   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:47:50,591-Speed 5958.70 samples/sec   Loss 17.4354   LearningRate 0.1520   Epoch: 0   Global Step: 7880   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:47:57,454-Speed 5969.46 samples/sec   Loss 17.4540   LearningRate 0.1522   Epoch: 0   Global Step: 7890   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:48:04,307-Speed 5980.55 samples/sec   Loss 17.4825   LearningRate 0.1524   Epoch: 0   Global Step: 7900   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:48:11,196-Speed 5946.72 samples/sec   Loss 17.4668   LearningRate 0.1526   Epoch: 0   Global Step: 7910   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:48:18,079-Speed 5951.71 samples/sec   Loss 17.3536   LearningRate 0.1528   Epoch: 0   Global Step: 7920   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:48:24,955-Speed 5960.21 samples/sec   Loss 17.3835   LearningRate 0.1530   Epoch: 0   Global Step: 7930   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:48:31,820-Speed 5968.52 samples/sec   Loss 17.2276   LearningRate 0.1531   Epoch: 0   Global Step: 7940   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:48:38,676-Speed 5975.15 samples/sec   Loss 17.3389   LearningRate 0.1533   Epoch: 0   Global Step: 7950   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:48:45,542-Speed 5966.90 samples/sec   Loss 17.3307   LearningRate 0.1535   Epoch: 0   Global Step: 7960   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:48:52,391-Speed 5980.98 samples/sec   Loss 17.2087   LearningRate 0.1537   Epoch: 0   Global Step: 7970   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:48:59,251-Speed 5974.68 samples/sec   Loss 17.2716   LearningRate 0.1539   Epoch: 0   Global Step: 7980   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:49:06,109-Speed 5974.09 samples/sec   Loss 17.1665   LearningRate 0.1541   Epoch: 0   Global Step: 7990   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:49:12,964-Speed 5975.92 samples/sec   Loss 17.1761   LearningRate 0.1543   Epoch: 0   Global Step: 8000   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:49:19,829-Speed 5967.77 samples/sec   Loss 17.1844   LearningRate 0.1545   Epoch: 0   Global Step: 8010   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:49:26,685-Speed 5975.90 samples/sec   Loss 17.1919   LearningRate 0.1547   Epoch: 0   Global Step: 8020   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:49:33,548-Speed 5969.10 samples/sec   Loss 17.1077   LearningRate 0.1549   Epoch: 0   Global Step: 8030   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:49:40,402-Speed 5977.73 samples/sec   Loss 17.1908   LearningRate 0.1551   Epoch: 0   Global Step: 8040   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:49:47,262-Speed 5971.62 samples/sec   Loss 17.1776   LearningRate 0.1553   Epoch: 0   Global Step: 8050   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:49:54,140-Speed 5956.84 samples/sec   Loss 17.1030   LearningRate 0.1555   Epoch: 0   Global Step: 8060   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:50:01,040-Speed 5937.50 samples/sec   Loss 17.0069   LearningRate 0.1557   Epoch: 0   Global Step: 8070   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:50:07,909-Speed 5964.25 samples/sec   Loss 17.0455   LearningRate 0.1558   Epoch: 0   Global Step: 8080   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:50:14,767-Speed 5973.01 samples/sec   Loss 16.9300   LearningRate 0.1560   Epoch: 0   Global Step: 8090   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:50:21,611-Speed 5986.61 samples/sec   Loss 16.9528   LearningRate 0.1562   Epoch: 0   Global Step: 8100   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:50:28,461-Speed 5980.24 samples/sec   Loss 16.9615   LearningRate 0.1564   Epoch: 0   Global Step: 8110   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:50:35,317-Speed 5975.78 samples/sec   Loss 16.9693   LearningRate 0.1566   Epoch: 0   Global Step: 8120   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:50:42,185-Speed 5964.57 samples/sec   Loss 16.9228   LearningRate 0.1568   Epoch: 0   Global Step: 8130   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:50:49,073-Speed 5947.21 samples/sec   Loss 16.9636   LearningRate 0.1570   Epoch: 0   Global Step: 8140   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:50:55,946-Speed 5960.55 samples/sec   Loss 16.8588   LearningRate 0.1572   Epoch: 0   Global Step: 8150   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:51:02,787-Speed 5988.86 samples/sec   Loss 16.7942   LearningRate 0.1574   Epoch: 0   Global Step: 8160   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:51:09,645-Speed 5973.32 samples/sec   Loss 16.7800   LearningRate 0.1576   Epoch: 0   Global Step: 8170   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:51:16,521-Speed 5958.17 samples/sec   Loss 16.8076   LearningRate 0.1578   Epoch: 0   Global Step: 8180   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:51:23,375-Speed 5977.03 samples/sec   Loss 16.7639   LearningRate 0.1580   Epoch: 0   Global Step: 8190   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:51:30,230-Speed 5976.28 samples/sec   Loss 16.8448   LearningRate 0.1582   Epoch: 0   Global Step: 8200   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:51:37,080-Speed 5982.97 samples/sec   Loss 16.7525   LearningRate 0.1584   Epoch: 0   Global Step: 8210   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:51:43,921-Speed 5988.13 samples/sec   Loss 16.7533   LearningRate 0.1585   Epoch: 0   Global Step: 8220   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:51:50,796-Speed 5958.71 samples/sec   Loss 16.6894   LearningRate 0.1587   Epoch: 0   Global Step: 8230   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:51:57,641-Speed 5985.46 samples/sec   Loss 16.5745   LearningRate 0.1589   Epoch: 0   Global Step: 8240   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:52:04,490-Speed 5981.53 samples/sec   Loss 16.6495   LearningRate 0.1591   Epoch: 0   Global Step: 8250   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 21:52:11,358-Speed 5966.40 samples/sec   Loss 16.5943   LearningRate 0.1593   Epoch: 0   Global Step: 8260   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:52:18,238-Speed 5954.73 samples/sec   Loss 16.6616   LearningRate 0.1595   Epoch: 0   Global Step: 8270   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:52:25,118-Speed 5954.69 samples/sec   Loss 16.5690   LearningRate 0.1597   Epoch: 0   Global Step: 8280   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:52:32,001-Speed 5952.08 samples/sec   Loss 16.5439   LearningRate 0.1599   Epoch: 0   Global Step: 8290   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:52:38,858-Speed 5975.25 samples/sec   Loss 16.5347   LearningRate 0.1601   Epoch: 0   Global Step: 8300   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:52:45,721-Speed 5968.65 samples/sec   Loss 16.5297   LearningRate 0.1603   Epoch: 0   Global Step: 8310   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:52:52,566-Speed 5985.52 samples/sec   Loss 16.4169   LearningRate 0.1605   Epoch: 0   Global Step: 8320   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:52:59,412-Speed 5984.15 samples/sec   Loss 16.4549   LearningRate 0.1607   Epoch: 0   Global Step: 8330   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:53:06,271-Speed 5974.67 samples/sec   Loss 16.4238   LearningRate 0.1609   Epoch: 0   Global Step: 8340   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:53:13,130-Speed 5972.61 samples/sec   Loss 16.4734   LearningRate 0.1611   Epoch: 0   Global Step: 8350   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:53:19,988-Speed 5974.36 samples/sec   Loss 16.4284   LearningRate 0.1612   Epoch: 0   Global Step: 8360   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:53:26,873-Speed 5950.03 samples/sec   Loss 16.4370   LearningRate 0.1614   Epoch: 0   Global Step: 8370   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:53:33,735-Speed 5972.10 samples/sec   Loss 16.4053   LearningRate 0.1616   Epoch: 0   Global Step: 8380   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:53:40,596-Speed 5973.49 samples/sec   Loss 16.3828   LearningRate 0.1618   Epoch: 0   Global Step: 8390   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:53:47,478-Speed 5953.03 samples/sec   Loss 16.3081   LearningRate 0.1620   Epoch: 0   Global Step: 8400   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:53:54,348-Speed 5963.56 samples/sec   Loss 16.3545   LearningRate 0.1622   Epoch: 0   Global Step: 8410   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:54:01,206-Speed 5973.75 samples/sec   Loss 16.3820   LearningRate 0.1624   Epoch: 0   Global Step: 8420   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:54:08,110-Speed 5934.46 samples/sec   Loss 16.2807   LearningRate 0.1626   Epoch: 0   Global Step: 8430   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:54:14,975-Speed 5967.41 samples/sec   Loss 16.2073   LearningRate 0.1628   Epoch: 0   Global Step: 8440   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:54:21,873-Speed 5939.34 samples/sec   Loss 16.2582   LearningRate 0.1630   Epoch: 0   Global Step: 8450   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:54:28,761-Speed 5948.37 samples/sec   Loss 16.2800   LearningRate 0.1632   Epoch: 0   Global Step: 8460   Fp16 Grad Scale: 524288   Required: 39 hours
Training: 2022-01-07 21:54:35,635-Speed 5959.67 samples/sec   Loss 16.2336   LearningRate 0.1634   Epoch: 0   Global Step: 8470   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:54:42,498-Speed 5968.84 samples/sec   Loss 16.1895   LearningRate 0.1636   Epoch: 0   Global Step: 8480   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:54:49,359-Speed 5972.03 samples/sec   Loss 16.1190   LearningRate 0.1638   Epoch: 0   Global Step: 8490   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:54:56,231-Speed 5961.46 samples/sec   Loss 16.2384   LearningRate 0.1640   Epoch: 0   Global Step: 8500   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:55:03,130-Speed 5938.22 samples/sec   Loss 16.1496   LearningRate 0.1641   Epoch: 0   Global Step: 8510   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:55:09,997-Speed 5966.15 samples/sec   Loss 16.1222   LearningRate 0.1643   Epoch: 0   Global Step: 8520   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:55:16,885-Speed 5949.13 samples/sec   Loss 16.0454   LearningRate 0.1645   Epoch: 0   Global Step: 8530   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:55:23,731-Speed 5984.32 samples/sec   Loss 16.0432   LearningRate 0.1647   Epoch: 0   Global Step: 8540   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:55:30,585-Speed 5977.50 samples/sec   Loss 16.0517   LearningRate 0.1649   Epoch: 0   Global Step: 8550   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:55:37,437-Speed 5978.66 samples/sec   Loss 16.0492   LearningRate 0.1651   Epoch: 0   Global Step: 8560   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:55:44,311-Speed 5959.58 samples/sec   Loss 16.0011   LearningRate 0.1653   Epoch: 0   Global Step: 8570   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:55:51,160-Speed 5981.20 samples/sec   Loss 16.0361   LearningRate 0.1655   Epoch: 0   Global Step: 8580   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:55:58,009-Speed 5981.78 samples/sec   Loss 16.0504   LearningRate 0.1657   Epoch: 0   Global Step: 8590   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:56:04,860-Speed 5979.13 samples/sec   Loss 16.0179   LearningRate 0.1659   Epoch: 0   Global Step: 8600   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:56:11,744-Speed 5951.00 samples/sec   Loss 15.9946   LearningRate 0.1661   Epoch: 0   Global Step: 8610   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:56:18,591-Speed 5983.49 samples/sec   Loss 15.9184   LearningRate 0.1663   Epoch: 0   Global Step: 8620   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:56:25,459-Speed 5964.86 samples/sec   Loss 15.9306   LearningRate 0.1665   Epoch: 0   Global Step: 8630   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 21:56:32,310-Speed 5979.75 samples/sec   Loss 15.9924   LearningRate 0.1667   Epoch: 0   Global Step: 8640   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:56:39,167-Speed 5974.73 samples/sec   Loss 15.8669   LearningRate 0.1668   Epoch: 0   Global Step: 8650   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 21:56:46,035-Speed 5965.17 samples/sec   Loss 15.8896   LearningRate 0.1670   Epoch: 0   Global Step: 8660   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 21:56:52,905-Speed 5964.67 samples/sec   Loss 15.8988   LearningRate 0.1672   Epoch: 0   Global Step: 8670   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 21:56:59,760-Speed 5976.00 samples/sec   Loss 15.9095   LearningRate 0.1674   Epoch: 0   Global Step: 8680   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 21:57:06,619-Speed 5972.50 samples/sec   Loss 15.7799   LearningRate 0.1676   Epoch: 0   Global Step: 8690   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 21:57:13,482-Speed 5970.02 samples/sec   Loss 15.8239   LearningRate 0.1678   Epoch: 0   Global Step: 8700   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 21:57:20,333-Speed 5979.37 samples/sec   Loss 15.8823   LearningRate 0.1680   Epoch: 0   Global Step: 8710   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 21:57:27,193-Speed 5971.60 samples/sec   Loss 15.8293   LearningRate 0.1682   Epoch: 0   Global Step: 8720   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 21:57:34,058-Speed 5967.45 samples/sec   Loss 15.8105   LearningRate 0.1684   Epoch: 0   Global Step: 8730   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 21:57:40,921-Speed 5970.35 samples/sec   Loss 15.6523   LearningRate 0.1686   Epoch: 0   Global Step: 8740   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 21:57:47,770-Speed 5980.90 samples/sec   Loss 15.7051   LearningRate 0.1688   Epoch: 0   Global Step: 8750   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 21:57:54,640-Speed 5963.30 samples/sec   Loss 15.6780   LearningRate 0.1690   Epoch: 0   Global Step: 8760   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 21:58:01,497-Speed 5974.01 samples/sec   Loss 15.7037   LearningRate 0.1692   Epoch: 0   Global Step: 8770   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 21:58:08,343-Speed 5984.34 samples/sec   Loss 15.6728   LearningRate 0.1694   Epoch: 0   Global Step: 8780   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 21:58:15,206-Speed 5972.38 samples/sec   Loss 15.6778   LearningRate 0.1695   Epoch: 0   Global Step: 8790   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 21:58:22,071-Speed 5967.51 samples/sec   Loss 15.6115   LearningRate 0.1697   Epoch: 0   Global Step: 8800   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 21:58:28,921-Speed 5980.68 samples/sec   Loss 15.6828   LearningRate 0.1699   Epoch: 0   Global Step: 8810   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 21:58:35,797-Speed 5958.19 samples/sec   Loss 15.5774   LearningRate 0.1701   Epoch: 0   Global Step: 8820   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 21:58:42,691-Speed 5944.92 samples/sec   Loss 15.5663   LearningRate 0.1703   Epoch: 0   Global Step: 8830   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 21:58:49,550-Speed 5972.61 samples/sec   Loss 15.6027   LearningRate 0.1705   Epoch: 0   Global Step: 8840   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 21:58:56,427-Speed 5957.80 samples/sec   Loss 15.5618   LearningRate 0.1707   Epoch: 0   Global Step: 8850   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 21:59:03,291-Speed 5968.42 samples/sec   Loss 15.5316   LearningRate 0.1709   Epoch: 0   Global Step: 8860   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 21:59:10,172-Speed 5953.36 samples/sec   Loss 15.5202   LearningRate 0.1711   Epoch: 0   Global Step: 8870   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 21:59:17,062-Speed 5946.22 samples/sec   Loss 15.6317   LearningRate 0.1713   Epoch: 0   Global Step: 8880   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 21:59:23,936-Speed 5961.51 samples/sec   Loss 15.4633   LearningRate 0.1715   Epoch: 0   Global Step: 8890   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 21:59:30,825-Speed 5946.84 samples/sec   Loss 15.5509   LearningRate 0.1717   Epoch: 0   Global Step: 8900   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 21:59:37,678-Speed 5978.01 samples/sec   Loss 15.5200   LearningRate 0.1719   Epoch: 0   Global Step: 8910   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 21:59:44,549-Speed 5962.87 samples/sec   Loss 15.5216   LearningRate 0.1721   Epoch: 0   Global Step: 8920   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 21:59:51,409-Speed 5971.65 samples/sec   Loss 15.5121   LearningRate 0.1722   Epoch: 0   Global Step: 8930   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 21:59:58,251-Speed 5987.55 samples/sec   Loss 15.4882   LearningRate 0.1724   Epoch: 0   Global Step: 8940   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:00:05,137-Speed 5949.23 samples/sec   Loss 15.3887   LearningRate 0.1726   Epoch: 0   Global Step: 8950   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:00:12,060-Speed 5917.53 samples/sec   Loss 15.3911   LearningRate 0.1728   Epoch: 0   Global Step: 8960   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:00:18,920-Speed 5973.38 samples/sec   Loss 15.4255   LearningRate 0.1730   Epoch: 0   Global Step: 8970   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:00:25,800-Speed 5955.86 samples/sec   Loss 15.4783   LearningRate 0.1732   Epoch: 0   Global Step: 8980   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:00:32,659-Speed 5972.04 samples/sec   Loss 15.4224   LearningRate 0.1734   Epoch: 0   Global Step: 8990   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:00:39,622-Speed 5886.68 samples/sec   Loss 15.4192   LearningRate 0.1736   Epoch: 0   Global Step: 9000   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:00:46,478-Speed 5974.87 samples/sec   Loss 15.3679   LearningRate 0.1738   Epoch: 0   Global Step: 9010   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:00:53,436-Speed 5889.08 samples/sec   Loss 15.3293   LearningRate 0.1740   Epoch: 0   Global Step: 9020   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:01:00,413-Speed 5871.70 samples/sec   Loss 15.4114   LearningRate 0.1742   Epoch: 0   Global Step: 9030   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:01:07,338-Speed 5916.54 samples/sec   Loss 15.2618   LearningRate 0.1744   Epoch: 0   Global Step: 9040   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:01:14,208-Speed 5963.45 samples/sec   Loss 15.3206   LearningRate 0.1746   Epoch: 0   Global Step: 9050   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:01:21,061-Speed 5978.01 samples/sec   Loss 15.2302   LearningRate 0.1748   Epoch: 0   Global Step: 9060   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:01:27,913-Speed 5981.51 samples/sec   Loss 15.2747   LearningRate 0.1749   Epoch: 0   Global Step: 9070   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:01:34,759-Speed 5983.42 samples/sec   Loss 15.2203   LearningRate 0.1751   Epoch: 0   Global Step: 9080   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:01:41,607-Speed 5982.45 samples/sec   Loss 15.2309   LearningRate 0.1753   Epoch: 0   Global Step: 9090   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:01:48,460-Speed 5978.22 samples/sec   Loss 15.2254   LearningRate 0.1755   Epoch: 0   Global Step: 9100   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:01:55,332-Speed 5961.88 samples/sec   Loss 15.1388   LearningRate 0.1757   Epoch: 0   Global Step: 9110   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:02:02,184-Speed 5979.45 samples/sec   Loss 15.1683   LearningRate 0.1759   Epoch: 0   Global Step: 9120   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:02:09,045-Speed 5970.56 samples/sec   Loss 15.1888   LearningRate 0.1761   Epoch: 0   Global Step: 9130   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:02:15,896-Speed 5979.68 samples/sec   Loss 15.2158   LearningRate 0.1763   Epoch: 0   Global Step: 9140   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:02:22,752-Speed 5976.27 samples/sec   Loss 15.2421   LearningRate 0.1765   Epoch: 0   Global Step: 9150   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:02:29,621-Speed 5964.50 samples/sec   Loss 15.2884   LearningRate 0.1767   Epoch: 0   Global Step: 9160   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:02:36,499-Speed 5956.47 samples/sec   Loss 15.1859   LearningRate 0.1769   Epoch: 0   Global Step: 9170   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:02:43,353-Speed 5976.57 samples/sec   Loss 15.1560   LearningRate 0.1771   Epoch: 0   Global Step: 9180   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:02:50,226-Speed 5961.16 samples/sec   Loss 15.1291   LearningRate 0.1773   Epoch: 0   Global Step: 9190   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:02:57,084-Speed 5973.72 samples/sec   Loss 15.1348   LearningRate 0.1775   Epoch: 0   Global Step: 9200   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:03:03,963-Speed 5955.48 samples/sec   Loss 15.1061   LearningRate 0.1776   Epoch: 0   Global Step: 9210   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:03:10,805-Speed 5988.06 samples/sec   Loss 15.0376   LearningRate 0.1778   Epoch: 0   Global Step: 9220   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:03:19,506-Speed 4707.76 samples/sec   Loss 15.0068   LearningRate 0.1780   Epoch: 0   Global Step: 9230   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:03:26,370-Speed 5971.57 samples/sec   Loss 15.0558   LearningRate 0.1782   Epoch: 0   Global Step: 9240   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:03:33,237-Speed 5965.14 samples/sec   Loss 15.0669   LearningRate 0.1784   Epoch: 0   Global Step: 9250   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:03:40,089-Speed 5979.11 samples/sec   Loss 15.0292   LearningRate 0.1786   Epoch: 0   Global Step: 9260   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:03:46,938-Speed 5980.95 samples/sec   Loss 15.0814   LearningRate 0.1788   Epoch: 0   Global Step: 9270   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:03:53,805-Speed 5966.18 samples/sec   Loss 15.0248   LearningRate 0.1790   Epoch: 0   Global Step: 9280   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:04:00,657-Speed 5978.95 samples/sec   Loss 15.1007   LearningRate 0.1792   Epoch: 0   Global Step: 9290   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:04:07,533-Speed 5957.96 samples/sec   Loss 15.0560   LearningRate 0.1794   Epoch: 0   Global Step: 9300   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:04:14,411-Speed 5958.42 samples/sec   Loss 15.0078   LearningRate 0.1796   Epoch: 0   Global Step: 9310   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:04:21,287-Speed 5958.40 samples/sec   Loss 14.9496   LearningRate 0.1798   Epoch: 0   Global Step: 9320   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:04:28,191-Speed 5933.90 samples/sec   Loss 15.0092   LearningRate 0.1800   Epoch: 0   Global Step: 9330   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:04:35,043-Speed 5979.85 samples/sec   Loss 15.0581   LearningRate 0.1802   Epoch: 0   Global Step: 9340   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:04:41,915-Speed 5960.88 samples/sec   Loss 14.9066   LearningRate 0.1803   Epoch: 0   Global Step: 9350   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:04:48,772-Speed 5975.30 samples/sec   Loss 14.9653   LearningRate 0.1805   Epoch: 0   Global Step: 9360   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:04:55,616-Speed 5984.97 samples/sec   Loss 14.9535   LearningRate 0.1807   Epoch: 0   Global Step: 9370   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:05:02,490-Speed 5959.58 samples/sec   Loss 14.9437   LearningRate 0.1809   Epoch: 0   Global Step: 9380   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:05:09,358-Speed 5965.26 samples/sec   Loss 14.9131   LearningRate 0.1811   Epoch: 0   Global Step: 9390   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:05:16,212-Speed 5977.78 samples/sec   Loss 14.8616   LearningRate 0.1813   Epoch: 0   Global Step: 9400   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:05:23,099-Speed 5947.67 samples/sec   Loss 14.8862   LearningRate 0.1815   Epoch: 0   Global Step: 9410   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:05:29,972-Speed 5961.38 samples/sec   Loss 14.8848   LearningRate 0.1817   Epoch: 0   Global Step: 9420   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:05:36,832-Speed 5974.62 samples/sec   Loss 14.9471   LearningRate 0.1819   Epoch: 0   Global Step: 9430   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:05:43,717-Speed 5950.09 samples/sec   Loss 14.7083   LearningRate 0.1821   Epoch: 0   Global Step: 9440   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:05:50,617-Speed 5946.64 samples/sec   Loss 14.8556   LearningRate 0.1823   Epoch: 0   Global Step: 9450   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:05:57,466-Speed 5981.99 samples/sec   Loss 14.7837   LearningRate 0.1825   Epoch: 0   Global Step: 9460   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:06:04,331-Speed 5967.87 samples/sec   Loss 14.7805   LearningRate 0.1827   Epoch: 0   Global Step: 9470   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:06:11,198-Speed 5966.21 samples/sec   Loss 14.7731   LearningRate 0.1829   Epoch: 0   Global Step: 9480   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:06:18,043-Speed 5984.43 samples/sec   Loss 14.7736   LearningRate 0.1830   Epoch: 0   Global Step: 9490   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:06:24,896-Speed 5978.25 samples/sec   Loss 14.6963   LearningRate 0.1832   Epoch: 0   Global Step: 9500   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:06:31,769-Speed 5961.05 samples/sec   Loss 14.7555   LearningRate 0.1834   Epoch: 0   Global Step: 9510   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:06:38,632-Speed 5969.31 samples/sec   Loss 14.8794   LearningRate 0.1836   Epoch: 0   Global Step: 9520   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:06:45,503-Speed 5962.42 samples/sec   Loss 14.7393   LearningRate 0.1838   Epoch: 0   Global Step: 9530   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:06:52,371-Speed 5964.55 samples/sec   Loss 14.6987   LearningRate 0.1840   Epoch: 0   Global Step: 9540   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:06:59,231-Speed 5972.35 samples/sec   Loss 14.8191   LearningRate 0.1842   Epoch: 0   Global Step: 9550   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:07:06,118-Speed 5949.69 samples/sec   Loss 14.7644   LearningRate 0.1844   Epoch: 0   Global Step: 9560   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:07:12,970-Speed 5979.39 samples/sec   Loss 14.7596   LearningRate 0.1846   Epoch: 0   Global Step: 9570   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:07:19,842-Speed 5960.79 samples/sec   Loss 14.6877   LearningRate 0.1848   Epoch: 0   Global Step: 9580   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:07:26,735-Speed 5944.03 samples/sec   Loss 14.6889   LearningRate 0.1850   Epoch: 0   Global Step: 9590   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:07:33,607-Speed 5961.82 samples/sec   Loss 14.7247   LearningRate 0.1852   Epoch: 0   Global Step: 9600   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:07:40,473-Speed 5966.84 samples/sec   Loss 14.6631   LearningRate 0.1854   Epoch: 0   Global Step: 9610   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:07:47,337-Speed 5968.04 samples/sec   Loss 14.6609   LearningRate 0.1856   Epoch: 0   Global Step: 9620   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:07:54,211-Speed 5960.12 samples/sec   Loss 14.7423   LearningRate 0.1857   Epoch: 0   Global Step: 9630   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:08:01,080-Speed 5964.56 samples/sec   Loss 14.6019   LearningRate 0.1859   Epoch: 0   Global Step: 9640   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:08:07,942-Speed 5969.92 samples/sec   Loss 14.7086   LearningRate 0.1861   Epoch: 0   Global Step: 9650   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:08:14,808-Speed 5966.55 samples/sec   Loss 14.7254   LearningRate 0.1863   Epoch: 0   Global Step: 9660   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:08:21,671-Speed 5970.03 samples/sec   Loss 14.6023   LearningRate 0.1865   Epoch: 0   Global Step: 9670   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:08:28,576-Speed 5932.58 samples/sec   Loss 14.6784   LearningRate 0.1867   Epoch: 0   Global Step: 9680   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:08:35,432-Speed 5974.64 samples/sec   Loss 14.6009   LearningRate 0.1869   Epoch: 0   Global Step: 9690   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:08:42,294-Speed 5970.17 samples/sec   Loss 14.6094   LearningRate 0.1871   Epoch: 0   Global Step: 9700   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:08:49,151-Speed 5974.48 samples/sec   Loss 14.6191   LearningRate 0.1873   Epoch: 0   Global Step: 9710   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:08:56,052-Speed 5937.18 samples/sec   Loss 14.5669   LearningRate 0.1875   Epoch: 0   Global Step: 9720   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:09:02,974-Speed 5917.40 samples/sec   Loss 14.4762   LearningRate 0.1877   Epoch: 0   Global Step: 9730   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:09:09,859-Speed 5950.68 samples/sec   Loss 14.5649   LearningRate 0.1879   Epoch: 0   Global Step: 9740   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:09:16,707-Speed 5982.81 samples/sec   Loss 14.6165   LearningRate 0.1881   Epoch: 0   Global Step: 9750   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:09:23,556-Speed 5981.32 samples/sec   Loss 14.5682   LearningRate 0.1883   Epoch: 0   Global Step: 9760   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:09:30,404-Speed 5982.34 samples/sec   Loss 14.5815   LearningRate 0.1884   Epoch: 0   Global Step: 9770   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:09:37,260-Speed 5974.96 samples/sec   Loss 14.5272   LearningRate 0.1886   Epoch: 0   Global Step: 9780   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:09:44,112-Speed 5978.71 samples/sec   Loss 14.4669   LearningRate 0.1888   Epoch: 0   Global Step: 9790   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:09:51,012-Speed 5938.28 samples/sec   Loss 14.5058   LearningRate 0.1890   Epoch: 0   Global Step: 9800   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:09:57,865-Speed 5978.57 samples/sec   Loss 14.5419   LearningRate 0.1892   Epoch: 0   Global Step: 9810   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:10:04,699-Speed 5994.10 samples/sec   Loss 14.5219   LearningRate 0.1894   Epoch: 0   Global Step: 9820   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:10:11,560-Speed 5971.40 samples/sec   Loss 14.5208   LearningRate 0.1896   Epoch: 0   Global Step: 9830   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:10:18,425-Speed 5968.15 samples/sec   Loss 14.4499   LearningRate 0.1898   Epoch: 0   Global Step: 9840   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:10:25,277-Speed 5978.81 samples/sec   Loss 14.4971   LearningRate 0.1900   Epoch: 0   Global Step: 9850   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:10:32,121-Speed 5986.56 samples/sec   Loss 14.4689   LearningRate 0.1902   Epoch: 0   Global Step: 9860   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:10:38,968-Speed 5983.50 samples/sec   Loss 14.3801   LearningRate 0.1904   Epoch: 0   Global Step: 9870   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:10:45,821-Speed 5978.25 samples/sec   Loss 14.6264   LearningRate 0.1906   Epoch: 0   Global Step: 9880   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:10:52,679-Speed 5973.39 samples/sec   Loss 14.4772   LearningRate 0.1908   Epoch: 0   Global Step: 9890   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:10:59,531-Speed 5979.28 samples/sec   Loss 14.4862   LearningRate 0.1910   Epoch: 0   Global Step: 9900   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:11:06,380-Speed 5981.58 samples/sec   Loss 14.4451   LearningRate 0.1911   Epoch: 0   Global Step: 9910   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:11:13,240-Speed 5977.03 samples/sec   Loss 14.3839   LearningRate 0.1913   Epoch: 0   Global Step: 9920   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:11:20,110-Speed 5963.20 samples/sec   Loss 14.4148   LearningRate 0.1915   Epoch: 0   Global Step: 9930   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:11:26,966-Speed 5975.77 samples/sec   Loss 14.2782   LearningRate 0.1917   Epoch: 0   Global Step: 9940   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:11:33,827-Speed 5971.22 samples/sec   Loss 14.4280   LearningRate 0.1919   Epoch: 0   Global Step: 9950   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:11:40,674-Speed 5982.52 samples/sec   Loss 14.4327   LearningRate 0.1921   Epoch: 0   Global Step: 9960   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:11:47,548-Speed 5960.80 samples/sec   Loss 14.3736   LearningRate 0.1923   Epoch: 0   Global Step: 9970   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:11:54,408-Speed 5974.15 samples/sec   Loss 14.3185   LearningRate 0.1925   Epoch: 0   Global Step: 9980   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:12:01,279-Speed 5962.80 samples/sec   Loss 14.3071   LearningRate 0.1927   Epoch: 0   Global Step: 9990   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-07 22:12:08,154-Speed 5958.91 samples/sec   Loss 14.3788   LearningRate 0.1929   Epoch: 0   Global Step: 10000   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-07 22:12:35,258-[lfw][10000]XNorm: 23.939884
Training: 2022-01-07 22:12:35,259-[lfw][10000]Accuracy-Flip: 0.99600+-0.00238
Training: 2022-01-07 22:12:35,259-[lfw][10000]Accuracy-Highest: 0.99600
Training: 2022-01-07 22:13:06,053-[cfp_fp][10000]XNorm: 21.784641
Training: 2022-01-07 22:13:06,054-[cfp_fp][10000]Accuracy-Flip: 0.96643+-0.01007
Training: 2022-01-07 22:13:06,055-[cfp_fp][10000]Accuracy-Highest: 0.96643
Training: 2022-01-07 22:13:32,717-[agedb_30][10000]XNorm: 23.338620
Training: 2022-01-07 22:13:32,718-[agedb_30][10000]Accuracy-Flip: 0.94517+-0.01275
Training: 2022-01-07 22:13:32,719-[agedb_30][10000]Accuracy-Highest: 0.94517
Training: 2022-01-07 22:13:39,562-Speed 448.12 samples/sec   Loss 14.3137   LearningRate 0.1931   Epoch: 0   Global Step: 10010   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-07 22:13:46,413-Speed 5979.83 samples/sec   Loss 14.2814   LearningRate 0.1933   Epoch: 0   Global Step: 10020   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-07 22:13:53,246-Speed 5995.44 samples/sec   Loss 14.3279   LearningRate 0.1935   Epoch: 0   Global Step: 10030   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-07 22:14:00,098-Speed 5979.38 samples/sec   Loss 14.2759   LearningRate 0.1937   Epoch: 0   Global Step: 10040   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-07 22:14:06,961-Speed 5969.44 samples/sec   Loss 14.3463   LearningRate 0.1938   Epoch: 0   Global Step: 10050   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-07 22:14:13,807-Speed 5983.98 samples/sec   Loss 14.2758   LearningRate 0.1940   Epoch: 0   Global Step: 10060   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-07 22:14:20,674-Speed 5969.06 samples/sec   Loss 14.3367   LearningRate 0.1942   Epoch: 0   Global Step: 10070   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-07 22:14:27,537-Speed 5971.23 samples/sec   Loss 14.3381   LearningRate 0.1944   Epoch: 0   Global Step: 10080   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-01-07 22:14:34,396-Speed 5975.83 samples/sec   Loss 14.2366   LearningRate 0.1946   Epoch: 0   Global Step: 10090   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 22:14:41,245-Speed 5980.45 samples/sec   Loss 14.3585   LearningRate 0.1948   Epoch: 0   Global Step: 10100   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 22:14:48,120-Speed 5959.91 samples/sec   Loss 14.2282   LearningRate 0.1950   Epoch: 0   Global Step: 10110   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 22:14:54,996-Speed 5957.39 samples/sec   Loss 14.2740   LearningRate 0.1952   Epoch: 0   Global Step: 10120   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 22:15:01,879-Speed 5955.00 samples/sec   Loss 14.2033   LearningRate 0.1954   Epoch: 0   Global Step: 10130   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 22:15:08,723-Speed 5985.40 samples/sec   Loss 14.3284   LearningRate 0.1956   Epoch: 0   Global Step: 10140   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 22:15:15,580-Speed 5975.23 samples/sec   Loss 14.3477   LearningRate 0.1958   Epoch: 0   Global Step: 10150   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 22:15:22,441-Speed 5970.54 samples/sec   Loss 14.2149   LearningRate 0.1960   Epoch: 0   Global Step: 10160   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 22:15:29,319-Speed 5957.12 samples/sec   Loss 14.2125   LearningRate 0.1962   Epoch: 0   Global Step: 10170   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 22:15:36,166-Speed 5984.80 samples/sec   Loss 14.2637   LearningRate 0.1964   Epoch: 0   Global Step: 10180   Fp16 Grad Scale: 65536   Required: 39 hours
Training: 2022-01-07 22:15:43,079-Speed 5925.52 samples/sec   Loss 14.1972   LearningRate 0.1965   Epoch: 0   Global Step: 10190   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:15:50,332-Speed 5649.18 samples/sec   Loss 14.1581   LearningRate 0.1967   Epoch: 0   Global Step: 10200   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:15:57,945-Speed 5381.06 samples/sec   Loss 14.2759   LearningRate 0.1969   Epoch: 0   Global Step: 10210   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:16:04,809-Speed 5968.26 samples/sec   Loss 14.1730   LearningRate 0.1971   Epoch: 0   Global Step: 10220   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:16:11,648-Speed 5991.02 samples/sec   Loss 14.2360   LearningRate 0.1973   Epoch: 0   Global Step: 10230   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:16:18,494-Speed 5984.15 samples/sec   Loss 14.2386   LearningRate 0.1975   Epoch: 0   Global Step: 10240   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:16:25,340-Speed 5984.11 samples/sec   Loss 14.1822   LearningRate 0.1977   Epoch: 0   Global Step: 10250   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:16:32,189-Speed 5981.18 samples/sec   Loss 14.1233   LearningRate 0.1979   Epoch: 0   Global Step: 10260   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:16:39,027-Speed 5991.48 samples/sec   Loss 14.1420   LearningRate 0.1981   Epoch: 0   Global Step: 10270   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:16:45,892-Speed 5966.98 samples/sec   Loss 14.1991   LearningRate 0.1983   Epoch: 0   Global Step: 10280   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:16:52,750-Speed 5973.97 samples/sec   Loss 14.1614   LearningRate 0.1985   Epoch: 0   Global Step: 10290   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 22:16:59,599-Speed 5982.00 samples/sec   Loss 14.1625   LearningRate 0.1987   Epoch: 0   Global Step: 10300   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 22:17:06,462-Speed 5968.66 samples/sec   Loss 14.2343   LearningRate 0.1989   Epoch: 0   Global Step: 10310   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 22:17:13,346-Speed 5951.54 samples/sec   Loss 14.1250   LearningRate 0.1991   Epoch: 0   Global Step: 10320   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 22:17:20,232-Speed 5950.19 samples/sec   Loss 14.1839   LearningRate 0.1992   Epoch: 0   Global Step: 10330   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 22:17:27,104-Speed 5961.20 samples/sec   Loss 14.2074   LearningRate 0.1994   Epoch: 0   Global Step: 10340   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 22:17:33,951-Speed 5983.55 samples/sec   Loss 14.1740   LearningRate 0.1996   Epoch: 0   Global Step: 10350   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:17:40,828-Speed 5957.19 samples/sec   Loss 14.1352   LearningRate 0.1998   Epoch: 0   Global Step: 10360   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:17:47,689-Speed 5971.16 samples/sec   Loss 14.1692   LearningRate 0.2000   Epoch: 0   Global Step: 10370   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:18:13,103-Speed 1611.83 samples/sec   Loss 14.1098   LearningRate 0.2002   Epoch: 1   Global Step: 10380   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:18:19,936-Speed 5995.96 samples/sec   Loss 14.1282   LearningRate 0.2004   Epoch: 1   Global Step: 10390   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:18:26,769-Speed 5995.23 samples/sec   Loss 14.0333   LearningRate 0.2006   Epoch: 1   Global Step: 10400   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:18:33,613-Speed 5985.93 samples/sec   Loss 14.0302   LearningRate 0.2008   Epoch: 1   Global Step: 10410   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:18:40,454-Speed 5988.31 samples/sec   Loss 14.1824   LearningRate 0.2010   Epoch: 1   Global Step: 10420   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:18:47,317-Speed 5970.12 samples/sec   Loss 14.0695   LearningRate 0.2012   Epoch: 1   Global Step: 10430   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:18:54,191-Speed 5960.42 samples/sec   Loss 14.0892   LearningRate 0.2014   Epoch: 1   Global Step: 10440   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:19:01,063-Speed 5961.11 samples/sec   Loss 14.0349   LearningRate 0.2016   Epoch: 1   Global Step: 10450   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 22:19:07,983-Speed 5920.33 samples/sec   Loss 14.0302   LearningRate 0.2018   Epoch: 1   Global Step: 10460   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 22:19:14,907-Speed 5917.19 samples/sec   Loss 13.9972   LearningRate 0.2019   Epoch: 1   Global Step: 10470   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:19:21,816-Speed 5930.00 samples/sec   Loss 14.0190   LearningRate 0.2021   Epoch: 1   Global Step: 10480   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:19:28,710-Speed 5942.67 samples/sec   Loss 13.9766   LearningRate 0.2023   Epoch: 1   Global Step: 10490   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:19:35,592-Speed 5953.25 samples/sec   Loss 13.9245   LearningRate 0.2025   Epoch: 1   Global Step: 10500   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:19:42,472-Speed 5953.85 samples/sec   Loss 14.0812   LearningRate 0.2027   Epoch: 1   Global Step: 10510   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:19:49,340-Speed 5967.27 samples/sec   Loss 14.0322   LearningRate 0.2029   Epoch: 1   Global Step: 10520   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:19:56,199-Speed 5972.80 samples/sec   Loss 13.9873   LearningRate 0.2031   Epoch: 1   Global Step: 10530   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:20:03,048-Speed 5981.00 samples/sec   Loss 13.9751   LearningRate 0.2033   Epoch: 1   Global Step: 10540   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:20:09,899-Speed 5980.57 samples/sec   Loss 14.0550   LearningRate 0.2035   Epoch: 1   Global Step: 10550   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:20:16,770-Speed 5961.62 samples/sec   Loss 14.0107   LearningRate 0.2037   Epoch: 1   Global Step: 10560   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:20:23,640-Speed 5963.87 samples/sec   Loss 14.0037   LearningRate 0.2039   Epoch: 1   Global Step: 10570   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 22:20:30,510-Speed 5963.50 samples/sec   Loss 14.0178   LearningRate 0.2041   Epoch: 1   Global Step: 10580   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 22:20:37,376-Speed 5967.09 samples/sec   Loss 13.9391   LearningRate 0.2043   Epoch: 1   Global Step: 10590   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 22:20:44,241-Speed 5967.57 samples/sec   Loss 13.9715   LearningRate 0.2045   Epoch: 1   Global Step: 10600   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 22:20:51,099-Speed 5973.60 samples/sec   Loss 13.9977   LearningRate 0.2046   Epoch: 1   Global Step: 10610   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 22:20:57,968-Speed 5964.44 samples/sec   Loss 14.0022   LearningRate 0.2048   Epoch: 1   Global Step: 10620   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 22:21:04,834-Speed 5966.62 samples/sec   Loss 13.8824   LearningRate 0.2050   Epoch: 1   Global Step: 10630   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 22:21:11,684-Speed 5980.75 samples/sec   Loss 13.9032   LearningRate 0.2052   Epoch: 1   Global Step: 10640   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 22:21:18,555-Speed 5962.36 samples/sec   Loss 13.8106   LearningRate 0.2054   Epoch: 1   Global Step: 10650   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:21:25,419-Speed 5968.56 samples/sec   Loss 13.9920   LearningRate 0.2056   Epoch: 1   Global Step: 10660   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:21:32,292-Speed 5960.82 samples/sec   Loss 13.9742   LearningRate 0.2058   Epoch: 1   Global Step: 10670   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:21:39,161-Speed 5963.47 samples/sec   Loss 14.0033   LearningRate 0.2060   Epoch: 1   Global Step: 10680   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:21:46,025-Speed 5971.03 samples/sec   Loss 13.9077   LearningRate 0.2062   Epoch: 1   Global Step: 10690   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:21:52,893-Speed 5965.37 samples/sec   Loss 13.9826   LearningRate 0.2064   Epoch: 1   Global Step: 10700   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:21:59,748-Speed 5976.12 samples/sec   Loss 13.9030   LearningRate 0.2066   Epoch: 1   Global Step: 10710   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:22:06,611-Speed 5969.69 samples/sec   Loss 13.9293   LearningRate 0.2068   Epoch: 1   Global Step: 10720   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:22:13,463-Speed 5978.29 samples/sec   Loss 13.9585   LearningRate 0.2070   Epoch: 1   Global Step: 10730   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:22:20,351-Speed 5947.61 samples/sec   Loss 13.8371   LearningRate 0.2072   Epoch: 1   Global Step: 10740   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:22:27,215-Speed 5968.53 samples/sec   Loss 13.8993   LearningRate 0.2073   Epoch: 1   Global Step: 10750   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 22:22:34,077-Speed 5970.22 samples/sec   Loss 13.9453   LearningRate 0.2075   Epoch: 1   Global Step: 10760   Fp16 Grad Scale: 262144   Required: 39 hours
Training: 2022-01-07 22:22:40,953-Speed 5958.02 samples/sec   Loss 13.8497   LearningRate 0.2077   Epoch: 1   Global Step: 10770   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:22:47,817-Speed 5967.72 samples/sec   Loss 13.8728   LearningRate 0.2079   Epoch: 1   Global Step: 10780   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:22:54,670-Speed 5978.14 samples/sec   Loss 13.9127   LearningRate 0.2081   Epoch: 1   Global Step: 10790   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:23:01,526-Speed 5975.35 samples/sec   Loss 13.8699   LearningRate 0.2083   Epoch: 1   Global Step: 10800   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:23:08,392-Speed 5966.19 samples/sec   Loss 13.9828   LearningRate 0.2085   Epoch: 1   Global Step: 10810   Fp16 Grad Scale: 131072   Required: 39 hours
Training: 2022-01-07 22:23:15,249-Speed 5976.57 samples/sec   Loss 13.8497   LearningRate 0.2087   Epoch: 1   Global Step: 10820   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:23:22,107-Speed 5973.53 samples/sec   Loss 13.8933   LearningRate 0.2089   Epoch: 1   Global Step: 10830   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:23:28,970-Speed 5969.43 samples/sec   Loss 13.8995   LearningRate 0.2091   Epoch: 1   Global Step: 10840   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:23:35,833-Speed 5969.54 samples/sec   Loss 13.8354   LearningRate 0.2093   Epoch: 1   Global Step: 10850   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:23:42,688-Speed 5976.34 samples/sec   Loss 13.9544   LearningRate 0.2095   Epoch: 1   Global Step: 10860   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:23:49,548-Speed 5971.00 samples/sec   Loss 13.8224   LearningRate 0.2097   Epoch: 1   Global Step: 10870   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:23:56,425-Speed 5956.78 samples/sec   Loss 13.8277   LearningRate 0.2099   Epoch: 1   Global Step: 10880   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:24:03,331-Speed 5932.72 samples/sec   Loss 13.8511   LearningRate 0.2100   Epoch: 1   Global Step: 10890   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:24:10,288-Speed 5888.49 samples/sec   Loss 13.8512   LearningRate 0.2102   Epoch: 1   Global Step: 10900   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:24:17,139-Speed 5980.03 samples/sec   Loss 13.8121   LearningRate 0.2104   Epoch: 1   Global Step: 10910   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:24:24,029-Speed 5946.24 samples/sec   Loss 13.7923   LearningRate 0.2106   Epoch: 1   Global Step: 10920   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:24:30,902-Speed 5960.13 samples/sec   Loss 13.7511   LearningRate 0.2108   Epoch: 1   Global Step: 10930   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:24:37,788-Speed 5949.03 samples/sec   Loss 13.7990   LearningRate 0.2110   Epoch: 1   Global Step: 10940   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:24:44,658-Speed 5963.98 samples/sec   Loss 13.7352   LearningRate 0.2112   Epoch: 1   Global Step: 10950   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:24:51,519-Speed 5971.69 samples/sec   Loss 13.7823   LearningRate 0.2114   Epoch: 1   Global Step: 10960   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:24:58,422-Speed 5934.40 samples/sec   Loss 13.7727   LearningRate 0.2116   Epoch: 1   Global Step: 10970   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:25:05,293-Speed 5962.18 samples/sec   Loss 13.8628   LearningRate 0.2118   Epoch: 1   Global Step: 10980   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:25:12,166-Speed 5960.59 samples/sec   Loss 13.8466   LearningRate 0.2120   Epoch: 1   Global Step: 10990   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:25:19,018-Speed 5980.37 samples/sec   Loss 13.8052   LearningRate 0.2122   Epoch: 1   Global Step: 11000   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:25:25,878-Speed 5971.92 samples/sec   Loss 13.7807   LearningRate 0.2124   Epoch: 1   Global Step: 11010   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:25:32,757-Speed 5955.87 samples/sec   Loss 13.8322   LearningRate 0.2126   Epoch: 1   Global Step: 11020   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:25:39,625-Speed 5964.38 samples/sec   Loss 13.7183   LearningRate 0.2127   Epoch: 1   Global Step: 11030   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:25:46,490-Speed 5967.75 samples/sec   Loss 13.7025   LearningRate 0.2129   Epoch: 1   Global Step: 11040   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:25:53,364-Speed 5961.81 samples/sec   Loss 13.7508   LearningRate 0.2131   Epoch: 1   Global Step: 11050   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:26:00,223-Speed 5973.37 samples/sec   Loss 13.8307   LearningRate 0.2133   Epoch: 1   Global Step: 11060   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:26:07,096-Speed 5960.46 samples/sec   Loss 13.7563   LearningRate 0.2135   Epoch: 1   Global Step: 11070   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:26:13,962-Speed 5966.99 samples/sec   Loss 13.7717   LearningRate 0.2137   Epoch: 1   Global Step: 11080   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:26:20,833-Speed 5962.38 samples/sec   Loss 13.7175   LearningRate 0.2139   Epoch: 1   Global Step: 11090   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:26:27,706-Speed 5960.15 samples/sec   Loss 13.7015   LearningRate 0.2141   Epoch: 1   Global Step: 11100   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:26:34,564-Speed 5973.76 samples/sec   Loss 13.7830   LearningRate 0.2143   Epoch: 1   Global Step: 11110   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:26:41,442-Speed 5955.80 samples/sec   Loss 13.6656   LearningRate 0.2145   Epoch: 1   Global Step: 11120   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:26:48,308-Speed 5967.57 samples/sec   Loss 13.7459   LearningRate 0.2147   Epoch: 1   Global Step: 11130   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:26:55,167-Speed 5972.09 samples/sec   Loss 13.7614   LearningRate 0.2149   Epoch: 1   Global Step: 11140   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:27:02,030-Speed 5969.55 samples/sec   Loss 13.7711   LearningRate 0.2151   Epoch: 1   Global Step: 11150   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:27:08,916-Speed 5952.13 samples/sec   Loss 13.6937   LearningRate 0.2153   Epoch: 1   Global Step: 11160   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:27:15,783-Speed 5965.31 samples/sec   Loss 13.7464   LearningRate 0.2154   Epoch: 1   Global Step: 11170   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:27:22,639-Speed 5975.79 samples/sec   Loss 13.7480   LearningRate 0.2156   Epoch: 1   Global Step: 11180   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:27:29,524-Speed 5950.25 samples/sec   Loss 13.7492   LearningRate 0.2158   Epoch: 1   Global Step: 11190   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:27:36,377-Speed 5978.14 samples/sec   Loss 13.6757   LearningRate 0.2160   Epoch: 1   Global Step: 11200   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:27:43,252-Speed 5958.29 samples/sec   Loss 13.7197   LearningRate 0.2162   Epoch: 1   Global Step: 11210   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:27:50,120-Speed 5964.44 samples/sec   Loss 13.7666   LearningRate 0.2164   Epoch: 1   Global Step: 11220   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:27:56,993-Speed 5963.02 samples/sec   Loss 13.6873   LearningRate 0.2166   Epoch: 1   Global Step: 11230   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:28:03,875-Speed 5954.47 samples/sec   Loss 13.7739   LearningRate 0.2168   Epoch: 1   Global Step: 11240   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:28:10,729-Speed 5976.34 samples/sec   Loss 13.7123   LearningRate 0.2170   Epoch: 1   Global Step: 11250   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:28:17,606-Speed 5957.70 samples/sec   Loss 13.7345   LearningRate 0.2172   Epoch: 1   Global Step: 11260   Fp16 Grad Scale: 524288   Required: 38 hours
Training: 2022-01-07 22:28:24,456-Speed 5980.81 samples/sec   Loss 13.7241   LearningRate 0.2174   Epoch: 1   Global Step: 11270   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:28:31,338-Speed 5952.07 samples/sec   Loss 13.6636   LearningRate 0.2176   Epoch: 1   Global Step: 11280   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:28:38,214-Speed 5958.97 samples/sec   Loss 13.6032   LearningRate 0.2178   Epoch: 1   Global Step: 11290   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:28:45,095-Speed 5953.01 samples/sec   Loss 13.7058   LearningRate 0.2180   Epoch: 1   Global Step: 11300   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:28:51,940-Speed 5984.94 samples/sec   Loss 13.6561   LearningRate 0.2182   Epoch: 1   Global Step: 11310   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:28:58,805-Speed 5968.09 samples/sec   Loss 13.6144   LearningRate 0.2183   Epoch: 1   Global Step: 11320   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:29:05,679-Speed 5963.07 samples/sec   Loss 13.6600   LearningRate 0.2185   Epoch: 1   Global Step: 11330   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:29:12,558-Speed 5955.44 samples/sec   Loss 13.6718   LearningRate 0.2187   Epoch: 1   Global Step: 11340   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:29:19,423-Speed 5967.43 samples/sec   Loss 13.6897   LearningRate 0.2189   Epoch: 1   Global Step: 11350   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:29:26,284-Speed 5971.75 samples/sec   Loss 13.6609   LearningRate 0.2191   Epoch: 1   Global Step: 11360   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:29:33,138-Speed 5976.50 samples/sec   Loss 13.5745   LearningRate 0.2193   Epoch: 1   Global Step: 11370   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:29:40,008-Speed 5963.34 samples/sec   Loss 13.6890   LearningRate 0.2195   Epoch: 1   Global Step: 11380   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:29:46,882-Speed 5959.90 samples/sec   Loss 13.6210   LearningRate 0.2197   Epoch: 1   Global Step: 11390   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:29:53,730-Speed 5981.62 samples/sec   Loss 13.5843   LearningRate 0.2199   Epoch: 1   Global Step: 11400   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:30:00,584-Speed 5977.12 samples/sec   Loss 13.6330   LearningRate 0.2201   Epoch: 1   Global Step: 11410   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:30:07,448-Speed 5968.12 samples/sec   Loss 13.6152   LearningRate 0.2203   Epoch: 1   Global Step: 11420   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:30:14,312-Speed 5969.20 samples/sec   Loss 13.6855   LearningRate 0.2205   Epoch: 1   Global Step: 11430   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:30:21,189-Speed 5957.01 samples/sec   Loss 13.6872   LearningRate 0.2207   Epoch: 1   Global Step: 11440   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:30:28,061-Speed 5961.52 samples/sec   Loss 13.6349   LearningRate 0.2209   Epoch: 1   Global Step: 11450   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:30:34,936-Speed 5959.30 samples/sec   Loss 13.6502   LearningRate 0.2210   Epoch: 1   Global Step: 11460   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:30:41,779-Speed 5987.51 samples/sec   Loss 13.5804   LearningRate 0.2212   Epoch: 1   Global Step: 11470   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:30:48,633-Speed 5976.37 samples/sec   Loss 13.5876   LearningRate 0.2214   Epoch: 1   Global Step: 11480   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:30:55,499-Speed 5967.24 samples/sec   Loss 13.7085   LearningRate 0.2216   Epoch: 1   Global Step: 11490   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:31:02,348-Speed 5983.46 samples/sec   Loss 13.6242   LearningRate 0.2218   Epoch: 1   Global Step: 11500   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:31:09,206-Speed 5973.30 samples/sec   Loss 13.5619   LearningRate 0.2220   Epoch: 1   Global Step: 11510   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:31:16,060-Speed 5976.67 samples/sec   Loss 13.6250   LearningRate 0.2222   Epoch: 1   Global Step: 11520   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:31:22,926-Speed 5968.93 samples/sec   Loss 13.6700   LearningRate 0.2224   Epoch: 1   Global Step: 11530   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:31:29,814-Speed 5947.58 samples/sec   Loss 13.6171   LearningRate 0.2226   Epoch: 1   Global Step: 11540   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:31:36,679-Speed 5981.81 samples/sec   Loss 13.5750   LearningRate 0.2228   Epoch: 1   Global Step: 11550   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:31:43,550-Speed 5963.28 samples/sec   Loss 13.5104   LearningRate 0.2230   Epoch: 1   Global Step: 11560   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:31:50,419-Speed 5964.11 samples/sec   Loss 13.7035   LearningRate 0.2232   Epoch: 1   Global Step: 11570   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:31:57,293-Speed 5960.50 samples/sec   Loss 13.6351   LearningRate 0.2234   Epoch: 1   Global Step: 11580   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:32:04,168-Speed 5963.96 samples/sec   Loss 13.5961   LearningRate 0.2236   Epoch: 1   Global Step: 11590   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:32:11,129-Speed 5885.91 samples/sec   Loss 13.6058   LearningRate 0.2237   Epoch: 1   Global Step: 11600   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:32:18,001-Speed 5961.66 samples/sec   Loss 13.5534   LearningRate 0.2239   Epoch: 1   Global Step: 11610   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:32:24,854-Speed 5978.06 samples/sec   Loss 13.5400   LearningRate 0.2241   Epoch: 1   Global Step: 11620   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:32:31,731-Speed 5957.53 samples/sec   Loss 13.5385   LearningRate 0.2243   Epoch: 1   Global Step: 11630   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:32:38,605-Speed 5959.84 samples/sec   Loss 13.4874   LearningRate 0.2245   Epoch: 1   Global Step: 11640   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:32:45,466-Speed 5973.12 samples/sec   Loss 13.5296   LearningRate 0.2247   Epoch: 1   Global Step: 11650   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:32:52,335-Speed 5963.84 samples/sec   Loss 13.6287   LearningRate 0.2249   Epoch: 1   Global Step: 11660   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:32:59,185-Speed 5980.59 samples/sec   Loss 13.5643   LearningRate 0.2251   Epoch: 1   Global Step: 11670   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:33:06,038-Speed 5977.60 samples/sec   Loss 13.5808   LearningRate 0.2253   Epoch: 1   Global Step: 11680   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:33:13,036-Speed 5854.65 samples/sec   Loss 13.5032   LearningRate 0.2255   Epoch: 1   Global Step: 11690   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:33:19,981-Speed 5899.26 samples/sec   Loss 13.5169   LearningRate 0.2257   Epoch: 1   Global Step: 11700   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:33:26,928-Speed 5896.85 samples/sec   Loss 13.5284   LearningRate 0.2259   Epoch: 1   Global Step: 11710   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:33:33,819-Speed 5945.87 samples/sec   Loss 13.6204   LearningRate 0.2261   Epoch: 1   Global Step: 11720   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:33:40,679-Speed 5971.84 samples/sec   Loss 13.6002   LearningRate 0.2263   Epoch: 1   Global Step: 11730   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:33:48,159-Speed 5478.21 samples/sec   Loss 13.6105   LearningRate 0.2264   Epoch: 1   Global Step: 11740   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:33:54,997-Speed 5991.91 samples/sec   Loss 13.5823   LearningRate 0.2266   Epoch: 1   Global Step: 11750   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:34:01,846-Speed 5981.34 samples/sec   Loss 13.5175   LearningRate 0.2268   Epoch: 1   Global Step: 11760   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:34:08,798-Speed 5893.69 samples/sec   Loss 13.5645   LearningRate 0.2270   Epoch: 1   Global Step: 11770   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:34:15,664-Speed 5966.18 samples/sec   Loss 13.5154   LearningRate 0.2272   Epoch: 1   Global Step: 11780   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:34:22,556-Speed 5944.54 samples/sec   Loss 13.5497   LearningRate 0.2274   Epoch: 1   Global Step: 11790   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:34:29,404-Speed 5981.89 samples/sec   Loss 13.5809   LearningRate 0.2276   Epoch: 1   Global Step: 11800   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:34:36,263-Speed 5972.61 samples/sec   Loss 13.5916   LearningRate 0.2278   Epoch: 1   Global Step: 11810   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:34:43,131-Speed 5965.27 samples/sec   Loss 13.4983   LearningRate 0.2280   Epoch: 1   Global Step: 11820   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:34:50,003-Speed 5961.51 samples/sec   Loss 13.5457   LearningRate 0.2282   Epoch: 1   Global Step: 11830   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:34:56,861-Speed 5973.31 samples/sec   Loss 13.5313   LearningRate 0.2284   Epoch: 1   Global Step: 11840   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:35:03,760-Speed 5938.58 samples/sec   Loss 13.5980   LearningRate 0.2286   Epoch: 1   Global Step: 11850   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:35:10,630-Speed 5963.31 samples/sec   Loss 13.6129   LearningRate 0.2288   Epoch: 1   Global Step: 11860   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:35:17,505-Speed 5959.77 samples/sec   Loss 13.5270   LearningRate 0.2290   Epoch: 1   Global Step: 11870   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:35:24,390-Speed 5950.37 samples/sec   Loss 13.4769   LearningRate 0.2291   Epoch: 1   Global Step: 11880   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:35:31,250-Speed 5971.91 samples/sec   Loss 13.5047   LearningRate 0.2293   Epoch: 1   Global Step: 11890   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:35:38,098-Speed 5982.49 samples/sec   Loss 13.5638   LearningRate 0.2295   Epoch: 1   Global Step: 11900   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:35:44,986-Speed 5947.88 samples/sec   Loss 13.6013   LearningRate 0.2297   Epoch: 1   Global Step: 11910   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:35:51,842-Speed 5976.81 samples/sec   Loss 13.5010   LearningRate 0.2299   Epoch: 1   Global Step: 11920   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:35:58,790-Speed 5895.58 samples/sec   Loss 13.5016   LearningRate 0.2301   Epoch: 1   Global Step: 11930   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:36:05,661-Speed 5963.04 samples/sec   Loss 13.4794   LearningRate 0.2303   Epoch: 1   Global Step: 11940   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:36:12,543-Speed 5952.62 samples/sec   Loss 13.4979   LearningRate 0.2305   Epoch: 1   Global Step: 11950   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:36:19,403-Speed 5972.04 samples/sec   Loss 13.5020   LearningRate 0.2307   Epoch: 1   Global Step: 11960   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:36:26,252-Speed 5982.15 samples/sec   Loss 13.4064   LearningRate 0.2309   Epoch: 1   Global Step: 11970   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:36:33,104-Speed 5978.89 samples/sec   Loss 13.4889   LearningRate 0.2311   Epoch: 1   Global Step: 11980   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:36:40,002-Speed 5939.28 samples/sec   Loss 13.6197   LearningRate 0.2313   Epoch: 1   Global Step: 11990   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:36:46,860-Speed 5974.12 samples/sec   Loss 13.4884   LearningRate 0.2315   Epoch: 1   Global Step: 12000   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:36:53,703-Speed 5986.62 samples/sec   Loss 13.5918   LearningRate 0.2317   Epoch: 1   Global Step: 12010   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:37:00,563-Speed 5971.28 samples/sec   Loss 13.5334   LearningRate 0.2318   Epoch: 1   Global Step: 12020   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:37:07,414-Speed 5980.76 samples/sec   Loss 13.5287   LearningRate 0.2320   Epoch: 1   Global Step: 12030   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:37:14,279-Speed 5976.89 samples/sec   Loss 13.4388   LearningRate 0.2322   Epoch: 1   Global Step: 12040   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:37:21,174-Speed 5981.20 samples/sec   Loss 13.4080   LearningRate 0.2324   Epoch: 1   Global Step: 12050   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:37:28,025-Speed 5979.86 samples/sec   Loss 13.5209   LearningRate 0.2326   Epoch: 1   Global Step: 12060   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:37:34,880-Speed 5976.64 samples/sec   Loss 13.3983   LearningRate 0.2328   Epoch: 1   Global Step: 12070   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:37:41,731-Speed 5979.33 samples/sec   Loss 13.3840   LearningRate 0.2330   Epoch: 1   Global Step: 12080   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:37:48,663-Speed 5910.43 samples/sec   Loss 13.3153   LearningRate 0.2332   Epoch: 1   Global Step: 12090   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:37:55,529-Speed 5967.04 samples/sec   Loss 13.4354   LearningRate 0.2334   Epoch: 1   Global Step: 12100   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:38:02,389-Speed 5971.70 samples/sec   Loss 13.5573   LearningRate 0.2336   Epoch: 1   Global Step: 12110   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:38:09,258-Speed 5963.76 samples/sec   Loss 13.5579   LearningRate 0.2338   Epoch: 1   Global Step: 12120   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:38:16,113-Speed 5977.32 samples/sec   Loss 13.4677   LearningRate 0.2340   Epoch: 1   Global Step: 12130   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:38:22,976-Speed 5969.89 samples/sec   Loss 13.4408   LearningRate 0.2342   Epoch: 1   Global Step: 12140   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:38:29,833-Speed 5974.16 samples/sec   Loss 13.4713   LearningRate 0.2344   Epoch: 1   Global Step: 12150   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:38:36,695-Speed 5970.32 samples/sec   Loss 13.4582   LearningRate 0.2345   Epoch: 1   Global Step: 12160   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:38:44,289-Speed 5395.19 samples/sec   Loss 13.4835   LearningRate 0.2347   Epoch: 1   Global Step: 12170   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:38:51,146-Speed 5974.77 samples/sec   Loss 13.4743   LearningRate 0.2349   Epoch: 1   Global Step: 12180   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:38:58,023-Speed 5957.12 samples/sec   Loss 13.4596   LearningRate 0.2351   Epoch: 1   Global Step: 12190   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:39:04,893-Speed 5962.84 samples/sec   Loss 13.4159   LearningRate 0.2353   Epoch: 1   Global Step: 12200   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:39:11,741-Speed 5982.10 samples/sec   Loss 13.4644   LearningRate 0.2355   Epoch: 1   Global Step: 12210   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:39:18,702-Speed 5885.35 samples/sec   Loss 13.4724   LearningRate 0.2357   Epoch: 1   Global Step: 12220   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:39:25,556-Speed 5977.63 samples/sec   Loss 13.4566   LearningRate 0.2359   Epoch: 1   Global Step: 12230   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:39:32,416-Speed 5971.38 samples/sec   Loss 13.4031   LearningRate 0.2361   Epoch: 1   Global Step: 12240   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:39:39,310-Speed 5942.73 samples/sec   Loss 13.4809   LearningRate 0.2363   Epoch: 1   Global Step: 12250   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:39:46,174-Speed 5971.69 samples/sec   Loss 13.4125   LearningRate 0.2365   Epoch: 1   Global Step: 12260   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:39:53,045-Speed 5964.48 samples/sec   Loss 13.4082   LearningRate 0.2367   Epoch: 1   Global Step: 12270   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:39:59,897-Speed 5978.60 samples/sec   Loss 13.3818   LearningRate 0.2369   Epoch: 1   Global Step: 12280   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:40:06,779-Speed 5953.38 samples/sec   Loss 13.3761   LearningRate 0.2371   Epoch: 1   Global Step: 12290   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:40:13,645-Speed 5966.96 samples/sec   Loss 13.4921   LearningRate 0.2372   Epoch: 1   Global Step: 12300   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:40:20,534-Speed 5947.11 samples/sec   Loss 13.4065   LearningRate 0.2374   Epoch: 1   Global Step: 12310   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:40:27,414-Speed 5954.63 samples/sec   Loss 13.4754   LearningRate 0.2376   Epoch: 1   Global Step: 12320   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:40:34,292-Speed 5955.89 samples/sec   Loss 13.3684   LearningRate 0.2378   Epoch: 1   Global Step: 12330   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:40:41,219-Speed 5914.70 samples/sec   Loss 13.4700   LearningRate 0.2380   Epoch: 1   Global Step: 12340   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:40:48,082-Speed 5969.56 samples/sec   Loss 13.5197   LearningRate 0.2382   Epoch: 1   Global Step: 12350   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:40:54,942-Speed 5972.24 samples/sec   Loss 13.4776   LearningRate 0.2384   Epoch: 1   Global Step: 12360   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:41:01,797-Speed 5975.76 samples/sec   Loss 13.4367   LearningRate 0.2386   Epoch: 1   Global Step: 12370   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:41:08,643-Speed 5984.10 samples/sec   Loss 13.4597   LearningRate 0.2388   Epoch: 1   Global Step: 12380   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:41:15,483-Speed 5989.75 samples/sec   Loss 13.4099   LearningRate 0.2390   Epoch: 1   Global Step: 12390   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:41:22,344-Speed 5971.40 samples/sec   Loss 13.4023   LearningRate 0.2392   Epoch: 1   Global Step: 12400   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:41:29,208-Speed 5968.43 samples/sec   Loss 13.3859   LearningRate 0.2394   Epoch: 1   Global Step: 12410   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:41:36,073-Speed 5967.26 samples/sec   Loss 13.4308   LearningRate 0.2396   Epoch: 1   Global Step: 12420   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:41:42,926-Speed 5978.23 samples/sec   Loss 13.4157   LearningRate 0.2398   Epoch: 1   Global Step: 12430   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:41:49,778-Speed 5979.00 samples/sec   Loss 13.4859   LearningRate 0.2399   Epoch: 1   Global Step: 12440   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:41:56,672-Speed 5942.50 samples/sec   Loss 13.4966   LearningRate 0.2401   Epoch: 1   Global Step: 12450   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:42:03,527-Speed 5976.23 samples/sec   Loss 13.3919   LearningRate 0.2403   Epoch: 1   Global Step: 12460   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:42:10,411-Speed 5951.07 samples/sec   Loss 13.5871   LearningRate 0.2405   Epoch: 1   Global Step: 12470   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:42:17,273-Speed 5970.62 samples/sec   Loss 13.5010   LearningRate 0.2407   Epoch: 1   Global Step: 12480   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:42:24,128-Speed 5975.31 samples/sec   Loss 13.3839   LearningRate 0.2409   Epoch: 1   Global Step: 12490   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:42:31,057-Speed 5915.63 samples/sec   Loss 13.2821   LearningRate 0.2411   Epoch: 1   Global Step: 12500   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:42:40,241-Speed 4461.10 samples/sec   Loss 13.3748   LearningRate 0.2413   Epoch: 1   Global Step: 12510   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:42:47,135-Speed 5942.57 samples/sec   Loss 13.3135   LearningRate 0.2415   Epoch: 1   Global Step: 12520   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:42:53,981-Speed 5983.55 samples/sec   Loss 13.4362   LearningRate 0.2417   Epoch: 1   Global Step: 12530   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:43:00,830-Speed 5982.56 samples/sec   Loss 13.3551   LearningRate 0.2419   Epoch: 1   Global Step: 12540   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:43:07,703-Speed 5960.75 samples/sec   Loss 13.3940   LearningRate 0.2421   Epoch: 1   Global Step: 12550   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:43:14,548-Speed 5984.80 samples/sec   Loss 13.3821   LearningRate 0.2423   Epoch: 1   Global Step: 12560   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:43:21,397-Speed 5981.28 samples/sec   Loss 13.4084   LearningRate 0.2425   Epoch: 1   Global Step: 12570   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:43:28,253-Speed 5975.92 samples/sec   Loss 13.4886   LearningRate 0.2426   Epoch: 1   Global Step: 12580   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:43:35,097-Speed 5985.71 samples/sec   Loss 13.3540   LearningRate 0.2428   Epoch: 1   Global Step: 12590   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:43:41,958-Speed 5971.84 samples/sec   Loss 13.4508   LearningRate 0.2430   Epoch: 1   Global Step: 12600   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:43:48,824-Speed 5967.34 samples/sec   Loss 13.2850   LearningRate 0.2432   Epoch: 1   Global Step: 12610   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:43:55,678-Speed 5976.82 samples/sec   Loss 13.3951   LearningRate 0.2434   Epoch: 1   Global Step: 12620   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:44:02,621-Speed 5900.97 samples/sec   Loss 13.3448   LearningRate 0.2436   Epoch: 1   Global Step: 12630   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:44:09,478-Speed 5974.67 samples/sec   Loss 13.3857   LearningRate 0.2438   Epoch: 1   Global Step: 12640   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:44:16,375-Speed 5940.30 samples/sec   Loss 13.4477   LearningRate 0.2440   Epoch: 1   Global Step: 12650   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:44:23,238-Speed 5968.79 samples/sec   Loss 13.3595   LearningRate 0.2442   Epoch: 1   Global Step: 12660   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:44:30,089-Speed 5980.02 samples/sec   Loss 13.3928   LearningRate 0.2444   Epoch: 1   Global Step: 12670   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:44:36,945-Speed 5976.07 samples/sec   Loss 13.3819   LearningRate 0.2446   Epoch: 1   Global Step: 12680   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:44:43,797-Speed 5978.72 samples/sec   Loss 13.4128   LearningRate 0.2448   Epoch: 1   Global Step: 12690   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:44:50,658-Speed 5970.78 samples/sec   Loss 13.4026   LearningRate 0.2450   Epoch: 1   Global Step: 12700   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:44:57,518-Speed 5972.35 samples/sec   Loss 13.4589   LearningRate 0.2452   Epoch: 1   Global Step: 12710   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:45:04,370-Speed 5978.92 samples/sec   Loss 13.4362   LearningRate 0.2453   Epoch: 1   Global Step: 12720   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:45:11,263-Speed 5943.73 samples/sec   Loss 13.4275   LearningRate 0.2455   Epoch: 1   Global Step: 12730   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:45:18,142-Speed 5966.25 samples/sec   Loss 13.4018   LearningRate 0.2457   Epoch: 1   Global Step: 12740   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:45:25,005-Speed 5969.37 samples/sec   Loss 13.3887   LearningRate 0.2459   Epoch: 1   Global Step: 12750   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:45:31,853-Speed 5984.42 samples/sec   Loss 13.3583   LearningRate 0.2461   Epoch: 1   Global Step: 12760   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:45:38,697-Speed 5985.73 samples/sec   Loss 13.4276   LearningRate 0.2463   Epoch: 1   Global Step: 12770   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:45:45,549-Speed 5978.64 samples/sec   Loss 13.4564   LearningRate 0.2465   Epoch: 1   Global Step: 12780   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:45:52,396-Speed 5982.71 samples/sec   Loss 13.3388   LearningRate 0.2467   Epoch: 1   Global Step: 12790   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:45:59,270-Speed 5963.51 samples/sec   Loss 13.3192   LearningRate 0.2469   Epoch: 1   Global Step: 12800   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:46:06,123-Speed 5977.80 samples/sec   Loss 13.3994   LearningRate 0.2471   Epoch: 1   Global Step: 12810   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:46:12,993-Speed 5964.22 samples/sec   Loss 13.3300   LearningRate 0.2473   Epoch: 1   Global Step: 12820   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:46:19,844-Speed 5979.87 samples/sec   Loss 13.4093   LearningRate 0.2475   Epoch: 1   Global Step: 12830   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:46:26,697-Speed 5978.11 samples/sec   Loss 13.4447   LearningRate 0.2477   Epoch: 1   Global Step: 12840   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:46:33,586-Speed 5946.54 samples/sec   Loss 13.3710   LearningRate 0.2479   Epoch: 1   Global Step: 12850   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:46:40,428-Speed 5988.21 samples/sec   Loss 13.3203   LearningRate 0.2480   Epoch: 1   Global Step: 12860   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:46:47,283-Speed 5977.66 samples/sec   Loss 13.4673   LearningRate 0.2482   Epoch: 1   Global Step: 12870   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:46:54,132-Speed 5981.52 samples/sec   Loss 13.4230   LearningRate 0.2484   Epoch: 1   Global Step: 12880   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:47:00,983-Speed 5979.98 samples/sec   Loss 13.3872   LearningRate 0.2486   Epoch: 1   Global Step: 12890   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:47:07,838-Speed 5976.80 samples/sec   Loss 13.3727   LearningRate 0.2488   Epoch: 1   Global Step: 12900   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:47:14,695-Speed 5977.19 samples/sec   Loss 13.3786   LearningRate 0.2490   Epoch: 1   Global Step: 12910   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:47:21,545-Speed 5980.22 samples/sec   Loss 13.3260   LearningRate 0.2492   Epoch: 1   Global Step: 12920   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:47:28,390-Speed 5984.45 samples/sec   Loss 13.4114   LearningRate 0.2494   Epoch: 1   Global Step: 12930   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:47:35,242-Speed 5979.37 samples/sec   Loss 13.3893   LearningRate 0.2496   Epoch: 1   Global Step: 12940   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:47:42,090-Speed 5983.28 samples/sec   Loss 13.3332   LearningRate 0.2498   Epoch: 1   Global Step: 12950   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:47:48,933-Speed 5988.42 samples/sec   Loss 13.3690   LearningRate 0.2500   Epoch: 1   Global Step: 12960   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:47:55,793-Speed 5972.33 samples/sec   Loss 13.3407   LearningRate 0.2502   Epoch: 1   Global Step: 12970   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:48:02,640-Speed 5983.01 samples/sec   Loss 13.3842   LearningRate 0.2504   Epoch: 1   Global Step: 12980   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:48:09,495-Speed 5975.89 samples/sec   Loss 13.3845   LearningRate 0.2506   Epoch: 1   Global Step: 12990   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:48:16,381-Speed 5949.62 samples/sec   Loss 13.4218   LearningRate 0.2507   Epoch: 1   Global Step: 13000   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:48:23,241-Speed 5972.12 samples/sec   Loss 13.3755   LearningRate 0.2509   Epoch: 1   Global Step: 13010   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:48:30,109-Speed 5964.75 samples/sec   Loss 13.4353   LearningRate 0.2511   Epoch: 1   Global Step: 13020   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:48:36,978-Speed 5965.01 samples/sec   Loss 13.3840   LearningRate 0.2513   Epoch: 1   Global Step: 13030   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:48:43,836-Speed 5973.93 samples/sec   Loss 13.3657   LearningRate 0.2515   Epoch: 1   Global Step: 13040   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:48:50,735-Speed 5937.95 samples/sec   Loss 13.4059   LearningRate 0.2517   Epoch: 1   Global Step: 13050   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:48:57,591-Speed 5976.07 samples/sec   Loss 13.4297   LearningRate 0.2519   Epoch: 1   Global Step: 13060   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:49:04,454-Speed 5968.96 samples/sec   Loss 13.4385   LearningRate 0.2521   Epoch: 1   Global Step: 13070   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:49:11,310-Speed 5977.29 samples/sec   Loss 13.3193   LearningRate 0.2523   Epoch: 1   Global Step: 13080   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:49:18,172-Speed 5969.52 samples/sec   Loss 13.3120   LearningRate 0.2525   Epoch: 1   Global Step: 13090   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:49:25,026-Speed 5978.31 samples/sec   Loss 13.4094   LearningRate 0.2527   Epoch: 1   Global Step: 13100   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:49:31,872-Speed 5983.77 samples/sec   Loss 13.3162   LearningRate 0.2529   Epoch: 1   Global Step: 13110   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:49:38,725-Speed 5977.23 samples/sec   Loss 13.4074   LearningRate 0.2531   Epoch: 1   Global Step: 13120   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:49:45,624-Speed 5938.81 samples/sec   Loss 13.2848   LearningRate 0.2533   Epoch: 1   Global Step: 13130   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:49:52,504-Speed 5955.16 samples/sec   Loss 13.3886   LearningRate 0.2534   Epoch: 1   Global Step: 13140   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:49:59,463-Speed 5887.96 samples/sec   Loss 13.3250   LearningRate 0.2536   Epoch: 1   Global Step: 13150   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:50:06,319-Speed 5975.67 samples/sec   Loss 13.3919   LearningRate 0.2538   Epoch: 1   Global Step: 13160   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:50:13,151-Speed 5996.27 samples/sec   Loss 13.4338   LearningRate 0.2540   Epoch: 1   Global Step: 13170   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:50:20,031-Speed 5954.95 samples/sec   Loss 13.4173   LearningRate 0.2542   Epoch: 1   Global Step: 13180   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:50:26,909-Speed 5956.13 samples/sec   Loss 13.4546   LearningRate 0.2544   Epoch: 1   Global Step: 13190   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:50:33,769-Speed 5972.57 samples/sec   Loss 13.4122   LearningRate 0.2546   Epoch: 1   Global Step: 13200   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:50:40,619-Speed 5980.19 samples/sec   Loss 13.4007   LearningRate 0.2548   Epoch: 1   Global Step: 13210   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:50:47,467-Speed 5982.65 samples/sec   Loss 13.3444   LearningRate 0.2550   Epoch: 1   Global Step: 13220   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:50:54,323-Speed 5975.62 samples/sec   Loss 13.3490   LearningRate 0.2552   Epoch: 1   Global Step: 13230   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:51:01,166-Speed 5986.04 samples/sec   Loss 13.3728   LearningRate 0.2554   Epoch: 1   Global Step: 13240   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:51:08,015-Speed 5981.84 samples/sec   Loss 13.3331   LearningRate 0.2556   Epoch: 1   Global Step: 13250   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:51:14,893-Speed 5957.57 samples/sec   Loss 13.3943   LearningRate 0.2558   Epoch: 1   Global Step: 13260   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 22:51:21,747-Speed 5977.01 samples/sec   Loss 13.2752   LearningRate 0.2560   Epoch: 1   Global Step: 13270   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:51:28,603-Speed 5976.15 samples/sec   Loss 13.2742   LearningRate 0.2561   Epoch: 1   Global Step: 13280   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:51:35,460-Speed 5974.04 samples/sec   Loss 13.3283   LearningRate 0.2563   Epoch: 1   Global Step: 13290   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:51:42,309-Speed 5981.66 samples/sec   Loss 13.2997   LearningRate 0.2565   Epoch: 1   Global Step: 13300   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:51:49,151-Speed 5987.25 samples/sec   Loss 13.4128   LearningRate 0.2567   Epoch: 1   Global Step: 13310   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:51:56,010-Speed 5973.04 samples/sec   Loss 13.2805   LearningRate 0.2569   Epoch: 1   Global Step: 13320   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:52:02,865-Speed 5976.49 samples/sec   Loss 13.3333   LearningRate 0.2571   Epoch: 1   Global Step: 13330   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:52:09,721-Speed 5975.21 samples/sec   Loss 13.4299   LearningRate 0.2573   Epoch: 1   Global Step: 13340   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:52:16,581-Speed 5972.14 samples/sec   Loss 13.3732   LearningRate 0.2575   Epoch: 1   Global Step: 13350   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:52:23,430-Speed 5981.56 samples/sec   Loss 13.3717   LearningRate 0.2577   Epoch: 1   Global Step: 13360   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:52:30,285-Speed 5976.44 samples/sec   Loss 13.4000   LearningRate 0.2579   Epoch: 1   Global Step: 13370   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:52:37,149-Speed 5968.72 samples/sec   Loss 13.3502   LearningRate 0.2581   Epoch: 1   Global Step: 13380   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:52:44,024-Speed 5961.92 samples/sec   Loss 13.3221   LearningRate 0.2583   Epoch: 1   Global Step: 13390   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:52:50,879-Speed 5975.53 samples/sec   Loss 13.3590   LearningRate 0.2585   Epoch: 1   Global Step: 13400   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:52:57,736-Speed 5975.28 samples/sec   Loss 13.3513   LearningRate 0.2587   Epoch: 1   Global Step: 13410   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:53:04,581-Speed 5985.10 samples/sec   Loss 13.3000   LearningRate 0.2588   Epoch: 1   Global Step: 13420   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:53:11,432-Speed 5979.22 samples/sec   Loss 13.3401   LearningRate 0.2590   Epoch: 1   Global Step: 13430   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:53:18,306-Speed 5961.32 samples/sec   Loss 13.4202   LearningRate 0.2592   Epoch: 1   Global Step: 13440   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:53:25,159-Speed 5978.05 samples/sec   Loss 13.3750   LearningRate 0.2594   Epoch: 1   Global Step: 13450   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:53:32,010-Speed 5979.46 samples/sec   Loss 13.3453   LearningRate 0.2596   Epoch: 1   Global Step: 13460   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:53:38,881-Speed 5962.76 samples/sec   Loss 13.3172   LearningRate 0.2598   Epoch: 1   Global Step: 13470   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:53:45,740-Speed 5972.15 samples/sec   Loss 13.3339   LearningRate 0.2600   Epoch: 1   Global Step: 13480   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:53:52,601-Speed 5971.21 samples/sec   Loss 13.3951   LearningRate 0.2602   Epoch: 1   Global Step: 13490   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:53:59,463-Speed 5970.62 samples/sec   Loss 13.3484   LearningRate 0.2604   Epoch: 1   Global Step: 13500   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:54:06,327-Speed 5968.48 samples/sec   Loss 13.3612   LearningRate 0.2606   Epoch: 1   Global Step: 13510   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:54:13,188-Speed 5971.19 samples/sec   Loss 13.4038   LearningRate 0.2608   Epoch: 1   Global Step: 13520   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:54:20,047-Speed 5972.76 samples/sec   Loss 13.3616   LearningRate 0.2610   Epoch: 1   Global Step: 13530   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:54:26,917-Speed 5969.03 samples/sec   Loss 13.3115   LearningRate 0.2612   Epoch: 1   Global Step: 13540   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:54:33,800-Speed 5951.65 samples/sec   Loss 13.3324   LearningRate 0.2614   Epoch: 1   Global Step: 13550   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:54:40,639-Speed 5990.33 samples/sec   Loss 13.3609   LearningRate 0.2615   Epoch: 1   Global Step: 13560   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:54:47,502-Speed 5969.59 samples/sec   Loss 13.3426   LearningRate 0.2617   Epoch: 1   Global Step: 13570   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:54:54,360-Speed 5972.99 samples/sec   Loss 13.4526   LearningRate 0.2619   Epoch: 1   Global Step: 13580   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:55:01,217-Speed 5974.74 samples/sec   Loss 13.3280   LearningRate 0.2621   Epoch: 1   Global Step: 13590   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:55:08,131-Speed 5924.90 samples/sec   Loss 13.3066   LearningRate 0.2623   Epoch: 1   Global Step: 13600   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:55:14,982-Speed 5980.39 samples/sec   Loss 13.2876   LearningRate 0.2625   Epoch: 1   Global Step: 13610   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:55:21,857-Speed 5964.24 samples/sec   Loss 13.3827   LearningRate 0.2627   Epoch: 1   Global Step: 13620   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:55:28,720-Speed 5971.12 samples/sec   Loss 13.3172   LearningRate 0.2629   Epoch: 1   Global Step: 13630   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:55:35,585-Speed 5967.75 samples/sec   Loss 13.2808   LearningRate 0.2631   Epoch: 1   Global Step: 13640   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:55:42,535-Speed 5894.48 samples/sec   Loss 13.3140   LearningRate 0.2633   Epoch: 1   Global Step: 13650   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:55:49,391-Speed 5976.33 samples/sec   Loss 13.3583   LearningRate 0.2635   Epoch: 1   Global Step: 13660   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:55:56,276-Speed 5949.08 samples/sec   Loss 13.3435   LearningRate 0.2637   Epoch: 1   Global Step: 13670   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:56:03,155-Speed 5958.29 samples/sec   Loss 13.2707   LearningRate 0.2639   Epoch: 1   Global Step: 13680   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:56:10,010-Speed 5976.05 samples/sec   Loss 13.3229   LearningRate 0.2641   Epoch: 1   Global Step: 13690   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:56:16,856-Speed 5984.32 samples/sec   Loss 13.3930   LearningRate 0.2642   Epoch: 1   Global Step: 13700   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:56:23,710-Speed 5977.13 samples/sec   Loss 13.3522   LearningRate 0.2644   Epoch: 1   Global Step: 13710   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:56:30,570-Speed 5971.74 samples/sec   Loss 13.3831   LearningRate 0.2646   Epoch: 1   Global Step: 13720   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:56:37,420-Speed 5981.00 samples/sec   Loss 13.2492   LearningRate 0.2648   Epoch: 1   Global Step: 13730   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:56:44,282-Speed 5970.47 samples/sec   Loss 13.4487   LearningRate 0.2650   Epoch: 1   Global Step: 13740   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:56:51,130-Speed 5982.35 samples/sec   Loss 13.3583   LearningRate 0.2652   Epoch: 1   Global Step: 13750   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:56:58,018-Speed 5947.17 samples/sec   Loss 13.3249   LearningRate 0.2654   Epoch: 1   Global Step: 13760   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:57:04,888-Speed 5964.85 samples/sec   Loss 13.3623   LearningRate 0.2656   Epoch: 1   Global Step: 13770   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:57:11,774-Speed 5950.40 samples/sec   Loss 13.3219   LearningRate 0.2658   Epoch: 1   Global Step: 13780   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:57:18,633-Speed 5972.47 samples/sec   Loss 13.3856   LearningRate 0.2660   Epoch: 1   Global Step: 13790   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:57:25,498-Speed 5967.84 samples/sec   Loss 13.3582   LearningRate 0.2662   Epoch: 1   Global Step: 13800   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:57:32,350-Speed 5978.57 samples/sec   Loss 13.3538   LearningRate 0.2664   Epoch: 1   Global Step: 13810   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:57:39,204-Speed 5977.68 samples/sec   Loss 13.3269   LearningRate 0.2666   Epoch: 1   Global Step: 13820   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:57:46,074-Speed 5962.95 samples/sec   Loss 13.3918   LearningRate 0.2668   Epoch: 1   Global Step: 13830   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:57:52,924-Speed 5979.96 samples/sec   Loss 13.2970   LearningRate 0.2669   Epoch: 1   Global Step: 13840   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:57:59,812-Speed 5950.85 samples/sec   Loss 13.3519   LearningRate 0.2671   Epoch: 1   Global Step: 13850   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:58:06,677-Speed 5967.41 samples/sec   Loss 13.4387   LearningRate 0.2673   Epoch: 1   Global Step: 13860   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:58:13,545-Speed 5965.19 samples/sec   Loss 13.3872   LearningRate 0.2675   Epoch: 1   Global Step: 13870   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:58:20,423-Speed 5956.67 samples/sec   Loss 13.3093   LearningRate 0.2677   Epoch: 1   Global Step: 13880   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:58:27,267-Speed 5985.41 samples/sec   Loss 13.4046   LearningRate 0.2679   Epoch: 1   Global Step: 13890   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:58:34,129-Speed 5970.57 samples/sec   Loss 13.3765   LearningRate 0.2681   Epoch: 1   Global Step: 13900   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:58:40,988-Speed 5973.44 samples/sec   Loss 13.3732   LearningRate 0.2683   Epoch: 1   Global Step: 13910   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:58:47,845-Speed 5974.23 samples/sec   Loss 13.2572   LearningRate 0.2685   Epoch: 1   Global Step: 13920   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:58:54,707-Speed 5970.40 samples/sec   Loss 13.4115   LearningRate 0.2687   Epoch: 1   Global Step: 13930   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:59:01,571-Speed 5968.04 samples/sec   Loss 13.2762   LearningRate 0.2689   Epoch: 1   Global Step: 13940   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 22:59:08,478-Speed 5931.91 samples/sec   Loss 13.3598   LearningRate 0.2691   Epoch: 1   Global Step: 13950   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:59:15,401-Speed 5919.65 samples/sec   Loss 13.3261   LearningRate 0.2693   Epoch: 1   Global Step: 13960   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:59:22,332-Speed 5910.08 samples/sec   Loss 13.3511   LearningRate 0.2695   Epoch: 1   Global Step: 13970   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:59:29,247-Speed 5925.25 samples/sec   Loss 13.3789   LearningRate 0.2696   Epoch: 1   Global Step: 13980   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:59:36,108-Speed 5970.95 samples/sec   Loss 13.4042   LearningRate 0.2698   Epoch: 1   Global Step: 13990   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:59:42,972-Speed 5968.56 samples/sec   Loss 13.5184   LearningRate 0.2700   Epoch: 1   Global Step: 14000   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:59:49,834-Speed 5969.72 samples/sec   Loss 13.3888   LearningRate 0.2702   Epoch: 1   Global Step: 14010   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 22:59:56,728-Speed 5943.13 samples/sec   Loss 13.3770   LearningRate 0.2704   Epoch: 1   Global Step: 14020   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:00:03,678-Speed 5895.47 samples/sec   Loss 13.3109   LearningRate 0.2706   Epoch: 1   Global Step: 14030   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:00:10,624-Speed 5900.27 samples/sec   Loss 13.4707   LearningRate 0.2708   Epoch: 1   Global Step: 14040   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:00:17,528-Speed 5938.79 samples/sec   Loss 13.3500   LearningRate 0.2710   Epoch: 1   Global Step: 14050   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:00:24,408-Speed 5955.28 samples/sec   Loss 13.3226   LearningRate 0.2712   Epoch: 1   Global Step: 14060   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:00:31,297-Speed 5946.97 samples/sec   Loss 13.3664   LearningRate 0.2714   Epoch: 1   Global Step: 14070   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:00:38,219-Speed 5918.91 samples/sec   Loss 13.4132   LearningRate 0.2716   Epoch: 1   Global Step: 14080   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:00:45,147-Speed 5913.33 samples/sec   Loss 13.3968   LearningRate 0.2718   Epoch: 1   Global Step: 14090   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:00:52,010-Speed 5969.48 samples/sec   Loss 13.2517   LearningRate 0.2720   Epoch: 1   Global Step: 14100   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:00:58,863-Speed 5979.34 samples/sec   Loss 13.3662   LearningRate 0.2722   Epoch: 1   Global Step: 14110   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:01:05,719-Speed 5978.00 samples/sec   Loss 13.3320   LearningRate 0.2724   Epoch: 1   Global Step: 14120   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:01:12,566-Speed 5985.14 samples/sec   Loss 13.3620   LearningRate 0.2725   Epoch: 1   Global Step: 14130   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:01:19,416-Speed 5980.40 samples/sec   Loss 13.4128   LearningRate 0.2727   Epoch: 1   Global Step: 14140   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:01:26,284-Speed 5967.00 samples/sec   Loss 13.3255   LearningRate 0.2729   Epoch: 1   Global Step: 14150   Fp16 Grad Scale: 524288   Required: 38 hours
Training: 2022-01-07 23:01:33,136-Speed 5979.39 samples/sec   Loss 13.3650   LearningRate 0.2731   Epoch: 1   Global Step: 14160   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:01:40,007-Speed 5962.45 samples/sec   Loss 13.3501   LearningRate 0.2733   Epoch: 1   Global Step: 14170   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:01:46,860-Speed 5978.11 samples/sec   Loss 13.2552   LearningRate 0.2735   Epoch: 1   Global Step: 14180   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:01:53,719-Speed 5972.93 samples/sec   Loss 13.3569   LearningRate 0.2737   Epoch: 1   Global Step: 14190   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:02:00,574-Speed 5977.67 samples/sec   Loss 13.3420   LearningRate 0.2739   Epoch: 1   Global Step: 14200   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:02:07,447-Speed 5960.96 samples/sec   Loss 13.3168   LearningRate 0.2741   Epoch: 1   Global Step: 14210   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:02:14,328-Speed 5954.29 samples/sec   Loss 13.2723   LearningRate 0.2743   Epoch: 1   Global Step: 14220   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:02:21,188-Speed 5971.33 samples/sec   Loss 13.2500   LearningRate 0.2745   Epoch: 1   Global Step: 14230   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:02:28,075-Speed 5948.88 samples/sec   Loss 13.3454   LearningRate 0.2747   Epoch: 1   Global Step: 14240   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:02:34,927-Speed 5980.99 samples/sec   Loss 13.3337   LearningRate 0.2749   Epoch: 1   Global Step: 14250   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:02:41,824-Speed 5940.56 samples/sec   Loss 13.4041   LearningRate 0.2751   Epoch: 1   Global Step: 14260   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:02:48,681-Speed 5973.95 samples/sec   Loss 13.3859   LearningRate 0.2752   Epoch: 1   Global Step: 14270   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:02:55,550-Speed 5965.70 samples/sec   Loss 13.2703   LearningRate 0.2754   Epoch: 1   Global Step: 14280   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:03:02,404-Speed 5976.98 samples/sec   Loss 13.3658   LearningRate 0.2756   Epoch: 1   Global Step: 14290   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:03:09,252-Speed 5982.64 samples/sec   Loss 13.3290   LearningRate 0.2758   Epoch: 1   Global Step: 14300   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:03:16,109-Speed 5973.53 samples/sec   Loss 13.3825   LearningRate 0.2760   Epoch: 1   Global Step: 14310   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:03:22,975-Speed 5967.11 samples/sec   Loss 13.3303   LearningRate 0.2762   Epoch: 1   Global Step: 14320   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:03:29,829-Speed 5977.51 samples/sec   Loss 13.4136   LearningRate 0.2764   Epoch: 1   Global Step: 14330   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:03:36,684-Speed 5976.37 samples/sec   Loss 13.3776   LearningRate 0.2766   Epoch: 1   Global Step: 14340   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:03:43,534-Speed 5980.95 samples/sec   Loss 13.4049   LearningRate 0.2768   Epoch: 1   Global Step: 14350   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:03:50,394-Speed 5971.83 samples/sec   Loss 13.3844   LearningRate 0.2770   Epoch: 1   Global Step: 14360   Fp16 Grad Scale: 524288   Required: 38 hours
Training: 2022-01-07 23:03:57,264-Speed 5963.03 samples/sec   Loss 13.3966   LearningRate 0.2772   Epoch: 1   Global Step: 14370   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:04:04,126-Speed 5970.21 samples/sec   Loss 13.3063   LearningRate 0.2774   Epoch: 1   Global Step: 14380   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:04:10,987-Speed 5971.88 samples/sec   Loss 13.2916   LearningRate 0.2776   Epoch: 1   Global Step: 14390   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:04:17,874-Speed 5948.83 samples/sec   Loss 13.3528   LearningRate 0.2778   Epoch: 1   Global Step: 14400   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:04:24,724-Speed 5980.23 samples/sec   Loss 13.3707   LearningRate 0.2779   Epoch: 1   Global Step: 14410   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:04:31,587-Speed 5969.25 samples/sec   Loss 13.3708   LearningRate 0.2781   Epoch: 1   Global Step: 14420   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:04:38,438-Speed 5979.61 samples/sec   Loss 13.3413   LearningRate 0.2783   Epoch: 1   Global Step: 14430   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:04:45,289-Speed 5980.49 samples/sec   Loss 13.4444   LearningRate 0.2785   Epoch: 1   Global Step: 14440   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:04:52,139-Speed 5981.53 samples/sec   Loss 13.3996   LearningRate 0.2787   Epoch: 1   Global Step: 14450   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:04:59,001-Speed 5970.35 samples/sec   Loss 13.4518   LearningRate 0.2789   Epoch: 1   Global Step: 14460   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:05:05,868-Speed 5966.08 samples/sec   Loss 13.3741   LearningRate 0.2791   Epoch: 1   Global Step: 14470   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:05:12,712-Speed 5985.21 samples/sec   Loss 13.3311   LearningRate 0.2793   Epoch: 1   Global Step: 14480   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:05:19,567-Speed 5976.82 samples/sec   Loss 13.2830   LearningRate 0.2795   Epoch: 1   Global Step: 14490   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:05:26,424-Speed 5975.02 samples/sec   Loss 13.4134   LearningRate 0.2797   Epoch: 1   Global Step: 14500   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:05:33,271-Speed 5983.38 samples/sec   Loss 13.4203   LearningRate 0.2799   Epoch: 1   Global Step: 14510   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:05:40,142-Speed 5962.42 samples/sec   Loss 13.4172   LearningRate 0.2801   Epoch: 1   Global Step: 14520   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:05:47,014-Speed 5964.25 samples/sec   Loss 13.4270   LearningRate 0.2803   Epoch: 1   Global Step: 14530   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:05:53,859-Speed 5985.22 samples/sec   Loss 13.3049   LearningRate 0.2805   Epoch: 1   Global Step: 14540   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:06:00,743-Speed 5950.34 samples/sec   Loss 13.3042   LearningRate 0.2806   Epoch: 1   Global Step: 14550   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:06:07,602-Speed 5973.45 samples/sec   Loss 13.2945   LearningRate 0.2808   Epoch: 1   Global Step: 14560   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:06:14,455-Speed 5978.40 samples/sec   Loss 13.3187   LearningRate 0.2810   Epoch: 1   Global Step: 14570   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:06:21,305-Speed 5980.94 samples/sec   Loss 13.4000   LearningRate 0.2812   Epoch: 1   Global Step: 14580   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:06:28,163-Speed 5974.67 samples/sec   Loss 13.3307   LearningRate 0.2814   Epoch: 1   Global Step: 14590   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:06:35,021-Speed 5973.21 samples/sec   Loss 13.2845   LearningRate 0.2816   Epoch: 1   Global Step: 14600   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:06:41,898-Speed 5957.56 samples/sec   Loss 13.3296   LearningRate 0.2818   Epoch: 1   Global Step: 14610   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:06:48,757-Speed 5973.07 samples/sec   Loss 13.3222   LearningRate 0.2820   Epoch: 1   Global Step: 14620   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:06:55,637-Speed 5955.27 samples/sec   Loss 13.3750   LearningRate 0.2822   Epoch: 1   Global Step: 14630   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:07:02,486-Speed 5981.38 samples/sec   Loss 13.3350   LearningRate 0.2824   Epoch: 1   Global Step: 14640   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:07:09,360-Speed 5959.96 samples/sec   Loss 13.3443   LearningRate 0.2826   Epoch: 1   Global Step: 14650   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:07:16,214-Speed 5977.38 samples/sec   Loss 13.3923   LearningRate 0.2828   Epoch: 1   Global Step: 14660   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:07:23,080-Speed 5967.25 samples/sec   Loss 13.3356   LearningRate 0.2830   Epoch: 1   Global Step: 14670   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:07:29,933-Speed 5978.03 samples/sec   Loss 13.3880   LearningRate 0.2832   Epoch: 1   Global Step: 14680   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:07:36,795-Speed 5970.68 samples/sec   Loss 13.4061   LearningRate 0.2833   Epoch: 1   Global Step: 14690   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:07:43,654-Speed 5973.10 samples/sec   Loss 13.4369   LearningRate 0.2835   Epoch: 1   Global Step: 14700   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:07:50,561-Speed 5931.12 samples/sec   Loss 13.3604   LearningRate 0.2837   Epoch: 1   Global Step: 14710   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:07:57,447-Speed 5953.43 samples/sec   Loss 13.4217   LearningRate 0.2839   Epoch: 1   Global Step: 14720   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:08:04,313-Speed 5966.96 samples/sec   Loss 13.3687   LearningRate 0.2841   Epoch: 1   Global Step: 14730   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:08:11,182-Speed 5964.29 samples/sec   Loss 13.4091   LearningRate 0.2843   Epoch: 1   Global Step: 14740   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:08:18,054-Speed 5961.96 samples/sec   Loss 13.3638   LearningRate 0.2845   Epoch: 1   Global Step: 14750   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:08:24,915-Speed 5971.29 samples/sec   Loss 13.3735   LearningRate 0.2847   Epoch: 1   Global Step: 14760   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:08:31,789-Speed 5959.53 samples/sec   Loss 13.3975   LearningRate 0.2849   Epoch: 1   Global Step: 14770   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:08:38,639-Speed 5982.71 samples/sec   Loss 13.4342   LearningRate 0.2851   Epoch: 1   Global Step: 14780   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:08:45,501-Speed 5969.96 samples/sec   Loss 13.4340   LearningRate 0.2853   Epoch: 1   Global Step: 14790   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:08:52,356-Speed 5977.72 samples/sec   Loss 13.3240   LearningRate 0.2855   Epoch: 1   Global Step: 14800   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:08:59,218-Speed 5970.49 samples/sec   Loss 13.3655   LearningRate 0.2857   Epoch: 1   Global Step: 14810   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:09:06,069-Speed 5979.70 samples/sec   Loss 13.3573   LearningRate 0.2859   Epoch: 1   Global Step: 14820   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:09:12,937-Speed 5965.23 samples/sec   Loss 13.4520   LearningRate 0.2860   Epoch: 1   Global Step: 14830   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:09:19,804-Speed 5966.25 samples/sec   Loss 13.4964   LearningRate 0.2862   Epoch: 1   Global Step: 14840   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:09:26,657-Speed 5979.21 samples/sec   Loss 13.4579   LearningRate 0.2864   Epoch: 1   Global Step: 14850   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:09:33,502-Speed 5985.06 samples/sec   Loss 13.3500   LearningRate 0.2866   Epoch: 1   Global Step: 14860   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:09:40,367-Speed 5967.27 samples/sec   Loss 13.3725   LearningRate 0.2868   Epoch: 1   Global Step: 14870   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:09:47,228-Speed 5973.96 samples/sec   Loss 13.3645   LearningRate 0.2870   Epoch: 1   Global Step: 14880   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:09:54,089-Speed 5971.72 samples/sec   Loss 13.4489   LearningRate 0.2872   Epoch: 1   Global Step: 14890   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:10:00,947-Speed 5975.93 samples/sec   Loss 13.3522   LearningRate 0.2874   Epoch: 1   Global Step: 14900   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:10:07,796-Speed 5981.50 samples/sec   Loss 13.3079   LearningRate 0.2876   Epoch: 1   Global Step: 14910   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:10:14,667-Speed 5962.57 samples/sec   Loss 13.4566   LearningRate 0.2878   Epoch: 1   Global Step: 14920   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:10:21,530-Speed 5969.77 samples/sec   Loss 13.4482   LearningRate 0.2880   Epoch: 1   Global Step: 14930   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:10:28,380-Speed 5980.73 samples/sec   Loss 13.3768   LearningRate 0.2882   Epoch: 1   Global Step: 14940   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:10:35,285-Speed 5932.44 samples/sec   Loss 13.4382   LearningRate 0.2884   Epoch: 1   Global Step: 14950   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:10:42,141-Speed 5976.12 samples/sec   Loss 13.3566   LearningRate 0.2886   Epoch: 1   Global Step: 14960   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:10:48,991-Speed 5980.74 samples/sec   Loss 13.4080   LearningRate 0.2887   Epoch: 1   Global Step: 14970   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:10:55,844-Speed 5977.83 samples/sec   Loss 13.4090   LearningRate 0.2889   Epoch: 1   Global Step: 14980   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:11:02,689-Speed 5985.54 samples/sec   Loss 13.3826   LearningRate 0.2891   Epoch: 1   Global Step: 14990   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:11:09,554-Speed 5970.02 samples/sec   Loss 13.4175   LearningRate 0.2893   Epoch: 1   Global Step: 15000   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:11:36,147-[lfw][15000]XNorm: 21.573219
Training: 2022-01-07 23:11:36,148-[lfw][15000]Accuracy-Flip: 0.99550+-0.00317
Training: 2022-01-07 23:11:36,148-[lfw][15000]Accuracy-Highest: 0.99600
Training: 2022-01-07 23:12:06,961-[cfp_fp][15000]XNorm: 19.733546
Training: 2022-01-07 23:12:06,962-[cfp_fp][15000]Accuracy-Flip: 0.96500+-0.00806
Training: 2022-01-07 23:12:06,963-[cfp_fp][15000]Accuracy-Highest: 0.96643
Training: 2022-01-07 23:12:33,658-[agedb_30][15000]XNorm: 21.743631
Training: 2022-01-07 23:12:33,659-[agedb_30][15000]Accuracy-Flip: 0.94533+-0.01211
Training: 2022-01-07 23:12:33,659-[agedb_30][15000]Accuracy-Highest: 0.94533
Training: 2022-01-07 23:12:40,515-Speed 450.31 samples/sec   Loss 13.4131   LearningRate 0.2895   Epoch: 1   Global Step: 15010   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:12:47,347-Speed 5997.50 samples/sec   Loss 13.4261   LearningRate 0.2897   Epoch: 1   Global Step: 15020   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:12:54,183-Speed 5992.56 samples/sec   Loss 13.4464   LearningRate 0.2899   Epoch: 1   Global Step: 15030   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:13:01,037-Speed 5976.75 samples/sec   Loss 13.4657   LearningRate 0.2901   Epoch: 1   Global Step: 15040   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:13:07,889-Speed 5979.19 samples/sec   Loss 13.4950   LearningRate 0.2903   Epoch: 1   Global Step: 15050   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:13:14,762-Speed 5959.80 samples/sec   Loss 13.4056   LearningRate 0.2905   Epoch: 1   Global Step: 15060   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:13:21,634-Speed 5961.86 samples/sec   Loss 13.3735   LearningRate 0.2907   Epoch: 1   Global Step: 15070   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:13:28,505-Speed 5962.57 samples/sec   Loss 13.3847   LearningRate 0.2909   Epoch: 1   Global Step: 15080   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:13:35,358-Speed 5977.74 samples/sec   Loss 13.4357   LearningRate 0.2911   Epoch: 1   Global Step: 15090   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:13:42,224-Speed 5969.71 samples/sec   Loss 13.3643   LearningRate 0.2913   Epoch: 1   Global Step: 15100   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:13:49,095-Speed 5962.51 samples/sec   Loss 13.3815   LearningRate 0.2914   Epoch: 1   Global Step: 15110   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:13:55,978-Speed 5952.51 samples/sec   Loss 13.5038   LearningRate 0.2916   Epoch: 1   Global Step: 15120   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:14:02,874-Speed 5940.68 samples/sec   Loss 13.3523   LearningRate 0.2918   Epoch: 1   Global Step: 15130   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:14:09,713-Speed 5990.20 samples/sec   Loss 13.3673   LearningRate 0.2920   Epoch: 1   Global Step: 15140   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:14:16,567-Speed 5977.31 samples/sec   Loss 13.3770   LearningRate 0.2922   Epoch: 1   Global Step: 15150   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:14:23,467-Speed 5937.10 samples/sec   Loss 13.3916   LearningRate 0.2924   Epoch: 1   Global Step: 15160   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:14:30,321-Speed 5978.08 samples/sec   Loss 13.3713   LearningRate 0.2926   Epoch: 1   Global Step: 15170   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:14:37,178-Speed 5974.28 samples/sec   Loss 13.3934   LearningRate 0.2928   Epoch: 1   Global Step: 15180   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:14:44,036-Speed 5974.36 samples/sec   Loss 13.3804   LearningRate 0.2930   Epoch: 1   Global Step: 15190   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:14:50,897-Speed 5979.88 samples/sec   Loss 13.4129   LearningRate 0.2932   Epoch: 1   Global Step: 15200   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:14:57,759-Speed 5974.89 samples/sec   Loss 13.3840   LearningRate 0.2934   Epoch: 1   Global Step: 15210   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:15:04,632-Speed 5961.07 samples/sec   Loss 13.3880   LearningRate 0.2936   Epoch: 1   Global Step: 15220   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:15:11,506-Speed 5960.41 samples/sec   Loss 13.4143   LearningRate 0.2938   Epoch: 1   Global Step: 15230   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:15:18,378-Speed 5965.24 samples/sec   Loss 13.4197   LearningRate 0.2940   Epoch: 1   Global Step: 15240   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:15:25,239-Speed 5971.13 samples/sec   Loss 13.4558   LearningRate 0.2941   Epoch: 1   Global Step: 15250   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:15:32,098-Speed 5973.24 samples/sec   Loss 13.4598   LearningRate 0.2943   Epoch: 1   Global Step: 15260   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:15:38,969-Speed 5965.00 samples/sec   Loss 13.3531   LearningRate 0.2945   Epoch: 1   Global Step: 15270   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:15:45,829-Speed 5971.41 samples/sec   Loss 13.4199   LearningRate 0.2947   Epoch: 1   Global Step: 15280   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:15:52,708-Speed 5957.22 samples/sec   Loss 13.4127   LearningRate 0.2949   Epoch: 1   Global Step: 15290   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:15:59,559-Speed 5981.56 samples/sec   Loss 13.4942   LearningRate 0.2951   Epoch: 1   Global Step: 15300   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:16:06,455-Speed 5942.51 samples/sec   Loss 13.4453   LearningRate 0.2953   Epoch: 1   Global Step: 15310   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:16:13,318-Speed 5971.63 samples/sec   Loss 13.4102   LearningRate 0.2955   Epoch: 1   Global Step: 15320   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:16:20,186-Speed 5964.73 samples/sec   Loss 13.4260   LearningRate 0.2957   Epoch: 1   Global Step: 15330   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:16:27,050-Speed 5967.87 samples/sec   Loss 13.5112   LearningRate 0.2959   Epoch: 1   Global Step: 15340   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:16:33,918-Speed 5965.06 samples/sec   Loss 13.4856   LearningRate 0.2961   Epoch: 1   Global Step: 15350   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:16:40,766-Speed 5982.06 samples/sec   Loss 13.4819   LearningRate 0.2963   Epoch: 1   Global Step: 15360   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:16:47,638-Speed 5961.57 samples/sec   Loss 13.4270   LearningRate 0.2965   Epoch: 1   Global Step: 15370   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:16:54,506-Speed 5964.99 samples/sec   Loss 13.4680   LearningRate 0.2967   Epoch: 1   Global Step: 15380   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:17:01,371-Speed 5967.44 samples/sec   Loss 13.4062   LearningRate 0.2968   Epoch: 1   Global Step: 15390   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:17:08,249-Speed 5957.15 samples/sec   Loss 13.6104   LearningRate 0.2970   Epoch: 1   Global Step: 15400   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:17:15,110-Speed 5970.69 samples/sec   Loss 13.4143   LearningRate 0.2972   Epoch: 1   Global Step: 15410   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:17:21,957-Speed 5983.14 samples/sec   Loss 13.4278   LearningRate 0.2974   Epoch: 1   Global Step: 15420   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:17:28,810-Speed 5980.43 samples/sec   Loss 13.5008   LearningRate 0.2976   Epoch: 1   Global Step: 15430   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:17:35,679-Speed 5964.67 samples/sec   Loss 13.4946   LearningRate 0.2978   Epoch: 1   Global Step: 15440   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:17:42,558-Speed 5955.70 samples/sec   Loss 13.4833   LearningRate 0.2980   Epoch: 1   Global Step: 15450   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:17:49,422-Speed 5969.08 samples/sec   Loss 13.4625   LearningRate 0.2982   Epoch: 1   Global Step: 15460   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:17:56,300-Speed 5955.36 samples/sec   Loss 13.4593   LearningRate 0.2984   Epoch: 1   Global Step: 15470   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:18:03,165-Speed 5973.39 samples/sec   Loss 13.5186   LearningRate 0.2986   Epoch: 1   Global Step: 15480   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:18:10,024-Speed 5973.52 samples/sec   Loss 13.4385   LearningRate 0.2988   Epoch: 1   Global Step: 15490   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:18:16,863-Speed 5989.69 samples/sec   Loss 14.1746   LearningRate 0.2990   Epoch: 1   Global Step: 15500   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-07 23:18:23,723-Speed 5972.11 samples/sec   Loss 14.1723   LearningRate 0.2992   Epoch: 1   Global Step: 15510   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-07 23:18:30,568-Speed 5985.79 samples/sec   Loss 13.8978   LearningRate 0.2994   Epoch: 1   Global Step: 15520   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-07 23:18:37,440-Speed 5961.89 samples/sec   Loss 13.6282   LearningRate 0.2995   Epoch: 1   Global Step: 15530   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-07 23:18:44,288-Speed 5981.99 samples/sec   Loss 13.6761   LearningRate 0.2997   Epoch: 1   Global Step: 15540   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-07 23:18:51,145-Speed 5976.44 samples/sec   Loss 13.6871   LearningRate 0.2999   Epoch: 1   Global Step: 15550   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-07 23:18:57,998-Speed 5981.19 samples/sec   Loss 13.6745   LearningRate 0.3001   Epoch: 1   Global Step: 15560   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-07 23:19:04,852-Speed 5978.38 samples/sec   Loss 13.5055   LearningRate 0.3003   Epoch: 1   Global Step: 15570   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-07 23:19:11,748-Speed 5941.38 samples/sec   Loss 13.5512   LearningRate 0.3005   Epoch: 1   Global Step: 15580   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-07 23:19:18,608-Speed 5971.74 samples/sec   Loss 13.3739   LearningRate 0.3007   Epoch: 1   Global Step: 15590   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-01-07 23:19:25,477-Speed 5964.39 samples/sec   Loss 13.5494   LearningRate 0.3009   Epoch: 1   Global Step: 15600   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 23:19:32,344-Speed 5967.47 samples/sec   Loss 13.6070   LearningRate 0.3011   Epoch: 1   Global Step: 15610   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 23:19:39,198-Speed 5977.78 samples/sec   Loss 13.4097   LearningRate 0.3013   Epoch: 1   Global Step: 15620   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 23:19:46,049-Speed 5979.22 samples/sec   Loss 13.5055   LearningRate 0.3015   Epoch: 1   Global Step: 15630   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 23:19:52,940-Speed 5945.49 samples/sec   Loss 13.4548   LearningRate 0.3017   Epoch: 1   Global Step: 15640   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 23:19:59,795-Speed 5976.42 samples/sec   Loss 13.5019   LearningRate 0.3019   Epoch: 1   Global Step: 15650   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 23:20:06,665-Speed 5963.69 samples/sec   Loss 13.5871   LearningRate 0.3021   Epoch: 1   Global Step: 15660   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 23:20:13,515-Speed 5980.35 samples/sec   Loss 13.4935   LearningRate 0.3022   Epoch: 1   Global Step: 15670   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 23:20:20,394-Speed 5955.97 samples/sec   Loss 13.5190   LearningRate 0.3024   Epoch: 1   Global Step: 15680   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 23:20:27,241-Speed 5983.10 samples/sec   Loss 13.4510   LearningRate 0.3026   Epoch: 1   Global Step: 15690   Fp16 Grad Scale: 65536   Required: 38 hours
Training: 2022-01-07 23:20:34,127-Speed 5951.19 samples/sec   Loss 13.5674   LearningRate 0.3028   Epoch: 1   Global Step: 15700   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:20:41,083-Speed 5891.30 samples/sec   Loss 13.5876   LearningRate 0.3030   Epoch: 1   Global Step: 15710   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:20:47,930-Speed 5982.97 samples/sec   Loss 13.4881   LearningRate 0.3032   Epoch: 1   Global Step: 15720   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:20:54,786-Speed 5975.25 samples/sec   Loss 13.5346   LearningRate 0.3034   Epoch: 1   Global Step: 15730   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:21:01,657-Speed 5963.54 samples/sec   Loss 13.4661   LearningRate 0.3036   Epoch: 1   Global Step: 15740   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:21:08,546-Speed 5946.69 samples/sec   Loss 13.5104   LearningRate 0.3038   Epoch: 1   Global Step: 15750   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:21:15,470-Speed 5916.65 samples/sec   Loss 13.5576   LearningRate 0.3040   Epoch: 1   Global Step: 15760   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:21:22,399-Speed 5912.49 samples/sec   Loss 13.4516   LearningRate 0.3042   Epoch: 1   Global Step: 15770   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:21:29,255-Speed 5974.85 samples/sec   Loss 13.4786   LearningRate 0.3044   Epoch: 1   Global Step: 15780   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:21:36,162-Speed 5931.53 samples/sec   Loss 13.4427   LearningRate 0.3046   Epoch: 1   Global Step: 15790   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:21:43,028-Speed 5967.27 samples/sec   Loss 13.5567   LearningRate 0.3048   Epoch: 1   Global Step: 15800   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:21:49,879-Speed 5979.34 samples/sec   Loss 13.5347   LearningRate 0.3049   Epoch: 1   Global Step: 15810   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:21:56,731-Speed 5979.05 samples/sec   Loss 13.5721   LearningRate 0.3051   Epoch: 1   Global Step: 15820   Fp16 Grad Scale: 262144   Required: 38 hours
Training: 2022-01-07 23:22:03,574-Speed 5989.17 samples/sec   Loss 13.5376   LearningRate 0.3053   Epoch: 1   Global Step: 15830   Fp16 Grad Scale: 131072   Required: 38 hours
Training: 2022-01-07 23:22:10,428-Speed 5977.51 samples/sec   Loss 13.5272   LearningRate 0.3055   Epoch: 1   Global Step: 15840   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:22:17,284-Speed 5975.45 samples/sec   Loss 13.5826   LearningRate 0.3057   Epoch: 1   Global Step: 15850   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:22:24,138-Speed 5977.43 samples/sec   Loss 13.5269   LearningRate 0.3059   Epoch: 1   Global Step: 15860   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:22:30,994-Speed 5975.48 samples/sec   Loss 13.6547   LearningRate 0.3061   Epoch: 1   Global Step: 15870   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:22:37,865-Speed 5962.32 samples/sec   Loss 13.5035   LearningRate 0.3063   Epoch: 1   Global Step: 15880   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:22:44,743-Speed 5956.23 samples/sec   Loss 13.5744   LearningRate 0.3065   Epoch: 1   Global Step: 15890   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:22:51,621-Speed 5957.09 samples/sec   Loss 13.4497   LearningRate 0.3067   Epoch: 1   Global Step: 15900   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:22:58,482-Speed 5972.61 samples/sec   Loss 13.5382   LearningRate 0.3069   Epoch: 1   Global Step: 15910   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:23:05,341-Speed 5973.30 samples/sec   Loss 13.4659   LearningRate 0.3071   Epoch: 1   Global Step: 15920   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:23:12,204-Speed 5969.72 samples/sec   Loss 13.5750   LearningRate 0.3073   Epoch: 1   Global Step: 15930   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:23:19,098-Speed 5942.82 samples/sec   Loss 13.5698   LearningRate 0.3075   Epoch: 1   Global Step: 15940   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:23:25,966-Speed 5964.64 samples/sec   Loss 13.5265   LearningRate 0.3076   Epoch: 1   Global Step: 15950   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:23:32,847-Speed 5955.68 samples/sec   Loss 13.5113   LearningRate 0.3078   Epoch: 1   Global Step: 15960   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:23:39,707-Speed 5971.85 samples/sec   Loss 13.5685   LearningRate 0.3080   Epoch: 1   Global Step: 15970   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:23:46,607-Speed 5937.67 samples/sec   Loss 13.5519   LearningRate 0.3082   Epoch: 1   Global Step: 15980   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:23:53,468-Speed 5971.01 samples/sec   Loss 13.5371   LearningRate 0.3084   Epoch: 1   Global Step: 15990   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:24:00,328-Speed 5974.47 samples/sec   Loss 13.4763   LearningRate 0.3086   Epoch: 1   Global Step: 16000   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:24:07,183-Speed 5976.30 samples/sec   Loss 13.6106   LearningRate 0.3088   Epoch: 1   Global Step: 16010   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:24:14,031-Speed 5982.22 samples/sec   Loss 13.6386   LearningRate 0.3090   Epoch: 1   Global Step: 16020   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:24:20,875-Speed 5985.85 samples/sec   Loss 13.6359   LearningRate 0.3092   Epoch: 1   Global Step: 16030   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:24:27,728-Speed 5978.32 samples/sec   Loss 13.5770   LearningRate 0.3094   Epoch: 1   Global Step: 16040   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:24:34,701-Speed 5875.46 samples/sec   Loss 13.5464   LearningRate 0.3096   Epoch: 1   Global Step: 16050   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:24:41,636-Speed 5907.45 samples/sec   Loss 13.5736   LearningRate 0.3098   Epoch: 1   Global Step: 16060   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:24:48,541-Speed 5933.08 samples/sec   Loss 13.4884   LearningRate 0.3100   Epoch: 1   Global Step: 16070   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:24:55,458-Speed 5922.63 samples/sec   Loss 13.4975   LearningRate 0.3102   Epoch: 1   Global Step: 16080   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:25:02,335-Speed 5957.24 samples/sec   Loss 13.6263   LearningRate 0.3103   Epoch: 1   Global Step: 16090   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:25:09,204-Speed 5963.77 samples/sec   Loss 13.5937   LearningRate 0.3105   Epoch: 1   Global Step: 16100   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:25:16,070-Speed 5967.29 samples/sec   Loss 13.4952   LearningRate 0.3107   Epoch: 1   Global Step: 16110   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:25:22,947-Speed 5957.46 samples/sec   Loss 13.6262   LearningRate 0.3109   Epoch: 1   Global Step: 16120   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:25:29,799-Speed 5978.94 samples/sec   Loss 13.4778   LearningRate 0.3111   Epoch: 1   Global Step: 16130   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:25:36,655-Speed 5975.74 samples/sec   Loss 13.5095   LearningRate 0.3113   Epoch: 1   Global Step: 16140   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:25:43,543-Speed 5948.87 samples/sec   Loss 13.5937   LearningRate 0.3115   Epoch: 1   Global Step: 16150   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:25:50,403-Speed 5971.76 samples/sec   Loss 13.5965   LearningRate 0.3117   Epoch: 1   Global Step: 16160   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:25:57,253-Speed 5980.60 samples/sec   Loss 13.5081   LearningRate 0.3119   Epoch: 1   Global Step: 16170   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:26:04,149-Speed 5941.57 samples/sec   Loss 13.4948   LearningRate 0.3121   Epoch: 1   Global Step: 16180   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:26:11,043-Speed 5942.82 samples/sec   Loss 13.5404   LearningRate 0.3123   Epoch: 1   Global Step: 16190   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:26:17,928-Speed 5950.59 samples/sec   Loss 13.5181   LearningRate 0.3125   Epoch: 1   Global Step: 16200   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:26:24,779-Speed 5979.14 samples/sec   Loss 13.5478   LearningRate 0.3127   Epoch: 1   Global Step: 16210   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:26:31,630-Speed 5980.48 samples/sec   Loss 13.6522   LearningRate 0.3129   Epoch: 1   Global Step: 16220   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:26:38,477-Speed 5982.90 samples/sec   Loss 13.5650   LearningRate 0.3130   Epoch: 1   Global Step: 16230   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:26:45,339-Speed 5970.72 samples/sec   Loss 13.6006   LearningRate 0.3132   Epoch: 1   Global Step: 16240   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:26:52,189-Speed 5982.50 samples/sec   Loss 13.4791   LearningRate 0.3134   Epoch: 1   Global Step: 16250   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:26:59,035-Speed 5983.46 samples/sec   Loss 13.6188   LearningRate 0.3136   Epoch: 1   Global Step: 16260   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:27:05,884-Speed 5981.57 samples/sec   Loss 13.5324   LearningRate 0.3138   Epoch: 1   Global Step: 16270   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:27:12,731-Speed 5983.51 samples/sec   Loss 13.5976   LearningRate 0.3140   Epoch: 1   Global Step: 16280   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:27:19,590-Speed 5972.82 samples/sec   Loss 13.5397   LearningRate 0.3142   Epoch: 1   Global Step: 16290   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:27:26,448-Speed 5973.64 samples/sec   Loss 13.6068   LearningRate 0.3144   Epoch: 1   Global Step: 16300   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:27:33,306-Speed 5973.04 samples/sec   Loss 13.5860   LearningRate 0.3146   Epoch: 1   Global Step: 16310   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:27:40,168-Speed 5971.17 samples/sec   Loss 13.5445   LearningRate 0.3148   Epoch: 1   Global Step: 16320   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:27:47,022-Speed 5977.94 samples/sec   Loss 13.5715   LearningRate 0.3150   Epoch: 1   Global Step: 16330   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:27:53,866-Speed 5985.52 samples/sec   Loss 13.4907   LearningRate 0.3152   Epoch: 1   Global Step: 16340   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:28:00,713-Speed 5983.52 samples/sec   Loss 13.6210   LearningRate 0.3154   Epoch: 1   Global Step: 16350   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:28:07,568-Speed 5978.21 samples/sec   Loss 13.6631   LearningRate 0.3156   Epoch: 1   Global Step: 16360   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:28:14,412-Speed 5986.32 samples/sec   Loss 13.5691   LearningRate 0.3157   Epoch: 1   Global Step: 16370   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:28:21,265-Speed 5977.75 samples/sec   Loss 13.5907   LearningRate 0.3159   Epoch: 1   Global Step: 16380   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:28:28,134-Speed 5963.72 samples/sec   Loss 13.6887   LearningRate 0.3161   Epoch: 1   Global Step: 16390   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:28:34,976-Speed 5987.99 samples/sec   Loss 13.6896   LearningRate 0.3163   Epoch: 1   Global Step: 16400   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:28:41,866-Speed 5946.28 samples/sec   Loss 13.5651   LearningRate 0.3165   Epoch: 1   Global Step: 16410   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:28:48,710-Speed 5985.83 samples/sec   Loss 13.4916   LearningRate 0.3167   Epoch: 1   Global Step: 16420   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:28:55,557-Speed 5982.92 samples/sec   Loss 13.5781   LearningRate 0.3169   Epoch: 1   Global Step: 16430   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:29:02,408-Speed 5982.72 samples/sec   Loss 13.6492   LearningRate 0.3171   Epoch: 1   Global Step: 16440   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:29:09,362-Speed 5890.89 samples/sec   Loss 13.6221   LearningRate 0.3173   Epoch: 1   Global Step: 16450   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:29:16,315-Speed 5892.81 samples/sec   Loss 13.5626   LearningRate 0.3175   Epoch: 1   Global Step: 16460   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:29:23,169-Speed 5977.36 samples/sec   Loss 13.5413   LearningRate 0.3177   Epoch: 1   Global Step: 16470   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:29:30,035-Speed 5966.59 samples/sec   Loss 13.5598   LearningRate 0.3179   Epoch: 1   Global Step: 16480   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:29:36,899-Speed 5967.91 samples/sec   Loss 13.6273   LearningRate 0.3181   Epoch: 1   Global Step: 16490   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:29:43,765-Speed 5967.29 samples/sec   Loss 13.6014   LearningRate 0.3183   Epoch: 1   Global Step: 16500   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:29:50,615-Speed 5982.18 samples/sec   Loss 13.5867   LearningRate 0.3184   Epoch: 1   Global Step: 16510   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:29:57,513-Speed 5939.29 samples/sec   Loss 13.5899   LearningRate 0.3186   Epoch: 1   Global Step: 16520   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:30:04,372-Speed 5972.14 samples/sec   Loss 13.6879   LearningRate 0.3188   Epoch: 1   Global Step: 16530   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:30:11,234-Speed 5970.92 samples/sec   Loss 13.5037   LearningRate 0.3190   Epoch: 1   Global Step: 16540   Fp16 Grad Scale: 524288   Required: 37 hours
Training: 2022-01-07 23:30:18,136-Speed 5935.88 samples/sec   Loss 13.5907   LearningRate 0.3192   Epoch: 1   Global Step: 16550   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:30:25,007-Speed 5962.66 samples/sec   Loss 13.5221   LearningRate 0.3194   Epoch: 1   Global Step: 16560   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:30:31,867-Speed 5972.02 samples/sec   Loss 13.5928   LearningRate 0.3196   Epoch: 1   Global Step: 16570   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:30:38,751-Speed 5953.91 samples/sec   Loss 13.5408   LearningRate 0.3198   Epoch: 1   Global Step: 16580   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:30:45,612-Speed 5970.61 samples/sec   Loss 13.6924   LearningRate 0.3200   Epoch: 1   Global Step: 16590   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:30:52,494-Speed 5955.72 samples/sec   Loss 13.5503   LearningRate 0.3202   Epoch: 1   Global Step: 16600   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:30:59,434-Speed 5903.44 samples/sec   Loss 13.5652   LearningRate 0.3204   Epoch: 1   Global Step: 16610   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:31:06,395-Speed 5885.85 samples/sec   Loss 13.5341   LearningRate 0.3206   Epoch: 1   Global Step: 16620   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:31:13,256-Speed 5971.29 samples/sec   Loss 13.5280   LearningRate 0.3208   Epoch: 1   Global Step: 16630   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:31:20,250-Speed 5858.19 samples/sec   Loss 13.5779   LearningRate 0.3210   Epoch: 1   Global Step: 16640   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:31:27,219-Speed 5878.22 samples/sec   Loss 13.6139   LearningRate 0.3211   Epoch: 1   Global Step: 16650   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:31:34,094-Speed 5959.13 samples/sec   Loss 13.6436   LearningRate 0.3213   Epoch: 1   Global Step: 16660   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:31:40,962-Speed 5965.85 samples/sec   Loss 13.6670   LearningRate 0.3215   Epoch: 1   Global Step: 16670   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:31:47,824-Speed 5970.04 samples/sec   Loss 13.6402   LearningRate 0.3217   Epoch: 1   Global Step: 16680   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:31:54,670-Speed 5984.20 samples/sec   Loss 13.6953   LearningRate 0.3219   Epoch: 1   Global Step: 16690   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:32:01,520-Speed 5980.34 samples/sec   Loss 13.6261   LearningRate 0.3221   Epoch: 1   Global Step: 16700   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:32:08,394-Speed 5959.23 samples/sec   Loss 13.7632   LearningRate 0.3223   Epoch: 1   Global Step: 16710   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:32:15,249-Speed 5976.26 samples/sec   Loss 13.7188   LearningRate 0.3225   Epoch: 1   Global Step: 16720   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:32:22,098-Speed 5980.89 samples/sec   Loss 13.6734   LearningRate 0.3227   Epoch: 1   Global Step: 16730   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:32:28,959-Speed 5971.58 samples/sec   Loss 13.7128   LearningRate 0.3229   Epoch: 1   Global Step: 16740   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:32:35,824-Speed 5968.05 samples/sec   Loss 13.6551   LearningRate 0.3231   Epoch: 1   Global Step: 16750   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:32:42,669-Speed 5984.83 samples/sec   Loss 13.7071   LearningRate 0.3233   Epoch: 1   Global Step: 16760   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:32:49,532-Speed 5969.32 samples/sec   Loss 13.6234   LearningRate 0.3235   Epoch: 1   Global Step: 16770   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:32:56,398-Speed 5967.86 samples/sec   Loss 13.6636   LearningRate 0.3237   Epoch: 1   Global Step: 16780   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:33:03,248-Speed 5980.37 samples/sec   Loss 13.6892   LearningRate 0.3238   Epoch: 1   Global Step: 16790   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:33:10,110-Speed 5969.85 samples/sec   Loss 13.6051   LearningRate 0.3240   Epoch: 1   Global Step: 16800   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:33:16,975-Speed 5967.64 samples/sec   Loss 13.6413   LearningRate 0.3242   Epoch: 1   Global Step: 16810   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:33:23,826-Speed 5982.16 samples/sec   Loss 13.6612   LearningRate 0.3244   Epoch: 1   Global Step: 16820   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:33:30,679-Speed 5979.02 samples/sec   Loss 13.6704   LearningRate 0.3246   Epoch: 1   Global Step: 16830   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:33:37,553-Speed 5959.50 samples/sec   Loss 13.6010   LearningRate 0.3248   Epoch: 1   Global Step: 16840   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:33:44,409-Speed 5975.89 samples/sec   Loss 13.7359   LearningRate 0.3250   Epoch: 1   Global Step: 16850   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:33:51,275-Speed 5965.87 samples/sec   Loss 13.6511   LearningRate 0.3252   Epoch: 1   Global Step: 16860   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:33:58,152-Speed 5959.36 samples/sec   Loss 13.6613   LearningRate 0.3254   Epoch: 1   Global Step: 16870   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:34:05,036-Speed 5951.93 samples/sec   Loss 13.7053   LearningRate 0.3256   Epoch: 1   Global Step: 16880   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:34:11,918-Speed 5953.00 samples/sec   Loss 13.6777   LearningRate 0.3258   Epoch: 1   Global Step: 16890   Fp16 Grad Scale: 524288   Required: 37 hours
Training: 2022-01-07 23:34:18,780-Speed 5970.41 samples/sec   Loss 13.6536   LearningRate 0.3260   Epoch: 1   Global Step: 16900   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:34:25,655-Speed 5958.91 samples/sec   Loss 13.6583   LearningRate 0.3262   Epoch: 1   Global Step: 16910   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:34:32,529-Speed 5959.50 samples/sec   Loss 13.6388   LearningRate 0.3264   Epoch: 1   Global Step: 16920   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:34:39,376-Speed 5983.12 samples/sec   Loss 13.6330   LearningRate 0.3266   Epoch: 1   Global Step: 16930   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:34:46,228-Speed 5979.42 samples/sec   Loss 13.7709   LearningRate 0.3267   Epoch: 1   Global Step: 16940   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:34:53,078-Speed 5980.01 samples/sec   Loss 13.7244   LearningRate 0.3269   Epoch: 1   Global Step: 16950   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:34:59,931-Speed 5978.79 samples/sec   Loss 13.7119   LearningRate 0.3271   Epoch: 1   Global Step: 16960   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:35:06,799-Speed 5964.78 samples/sec   Loss 13.6501   LearningRate 0.3273   Epoch: 1   Global Step: 16970   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:35:13,670-Speed 5962.30 samples/sec   Loss 13.6230   LearningRate 0.3275   Epoch: 1   Global Step: 16980   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:35:20,522-Speed 5978.94 samples/sec   Loss 13.6883   LearningRate 0.3277   Epoch: 1   Global Step: 16990   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:35:27,364-Speed 5987.40 samples/sec   Loss 13.6854   LearningRate 0.3279   Epoch: 1   Global Step: 17000   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:35:34,253-Speed 5947.16 samples/sec   Loss 13.5612   LearningRate 0.3281   Epoch: 1   Global Step: 17010   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:35:41,148-Speed 5941.54 samples/sec   Loss 13.7173   LearningRate 0.3283   Epoch: 1   Global Step: 17020   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:35:48,011-Speed 5969.21 samples/sec   Loss 13.7088   LearningRate 0.3285   Epoch: 1   Global Step: 17030   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:35:54,863-Speed 5978.54 samples/sec   Loss 13.7332   LearningRate 0.3287   Epoch: 1   Global Step: 17040   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:36:01,733-Speed 5964.14 samples/sec   Loss 13.7426   LearningRate 0.3289   Epoch: 1   Global Step: 17050   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:36:08,600-Speed 5965.76 samples/sec   Loss 13.7095   LearningRate 0.3291   Epoch: 1   Global Step: 17060   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:36:15,483-Speed 5951.29 samples/sec   Loss 13.7001   LearningRate 0.3293   Epoch: 1   Global Step: 17070   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:36:22,334-Speed 5980.60 samples/sec   Loss 13.6581   LearningRate 0.3294   Epoch: 1   Global Step: 17080   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:36:29,225-Speed 5945.37 samples/sec   Loss 13.6808   LearningRate 0.3296   Epoch: 1   Global Step: 17090   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:36:36,079-Speed 5979.21 samples/sec   Loss 13.6763   LearningRate 0.3298   Epoch: 1   Global Step: 17100   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:36:42,938-Speed 5973.11 samples/sec   Loss 13.7769   LearningRate 0.3300   Epoch: 1   Global Step: 17110   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:36:49,808-Speed 5963.48 samples/sec   Loss 13.8058   LearningRate 0.3302   Epoch: 1   Global Step: 17120   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:36:56,664-Speed 5975.38 samples/sec   Loss 13.7655   LearningRate 0.3304   Epoch: 1   Global Step: 17130   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:37:03,565-Speed 5937.03 samples/sec   Loss 13.6901   LearningRate 0.3306   Epoch: 1   Global Step: 17140   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:37:10,417-Speed 5978.89 samples/sec   Loss 13.7478   LearningRate 0.3308   Epoch: 1   Global Step: 17150   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:37:17,289-Speed 5960.83 samples/sec   Loss 13.6935   LearningRate 0.3310   Epoch: 1   Global Step: 17160   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:37:24,155-Speed 5967.05 samples/sec   Loss 13.6315   LearningRate 0.3312   Epoch: 1   Global Step: 17170   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:37:31,018-Speed 5969.04 samples/sec   Loss 13.7971   LearningRate 0.3314   Epoch: 1   Global Step: 17180   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:37:37,878-Speed 5972.36 samples/sec   Loss 13.7591   LearningRate 0.3316   Epoch: 1   Global Step: 17190   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:37:49,509-Speed 3522.15 samples/sec   Loss 13.7422   LearningRate 0.3318   Epoch: 1   Global Step: 17200   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:37:56,333-Speed 6002.60 samples/sec   Loss 13.7860   LearningRate 0.3320   Epoch: 1   Global Step: 17210   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-07 23:38:03,195-Speed 5970.62 samples/sec   Loss 13.7102   LearningRate 0.3321   Epoch: 1   Global Step: 17220   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-07 23:38:10,071-Speed 5957.99 samples/sec   Loss 13.7592   LearningRate 0.3323   Epoch: 1   Global Step: 17230   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-07 23:38:16,912-Speed 5987.95 samples/sec   Loss 13.8820   LearningRate 0.3325   Epoch: 1   Global Step: 17240   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-07 23:38:23,767-Speed 5976.19 samples/sec   Loss 13.7958   LearningRate 0.3327   Epoch: 1   Global Step: 17250   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-07 23:38:30,635-Speed 5965.13 samples/sec   Loss 13.6738   LearningRate 0.3329   Epoch: 1   Global Step: 17260   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-07 23:38:37,486-Speed 5979.40 samples/sec   Loss 13.7432   LearningRate 0.3331   Epoch: 1   Global Step: 17270   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-07 23:38:44,335-Speed 5982.12 samples/sec   Loss 13.6708   LearningRate 0.3333   Epoch: 1   Global Step: 17280   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-07 23:38:51,193-Speed 5973.48 samples/sec   Loss 13.6714   LearningRate 0.3335   Epoch: 1   Global Step: 17290   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-07 23:38:58,054-Speed 5970.47 samples/sec   Loss 13.6809   LearningRate 0.3337   Epoch: 1   Global Step: 17300   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-07 23:39:04,908-Speed 5977.58 samples/sec   Loss 13.8118   LearningRate 0.3339   Epoch: 1   Global Step: 17310   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:39:11,770-Speed 5970.27 samples/sec   Loss 13.6433   LearningRate 0.3341   Epoch: 1   Global Step: 17320   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:39:18,690-Speed 5919.77 samples/sec   Loss 13.7042   LearningRate 0.3343   Epoch: 1   Global Step: 17330   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:39:25,552-Speed 5970.60 samples/sec   Loss 13.6362   LearningRate 0.3345   Epoch: 1   Global Step: 17340   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:39:32,422-Speed 5963.34 samples/sec   Loss 13.7085   LearningRate 0.3347   Epoch: 1   Global Step: 17350   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:39:39,273-Speed 5979.46 samples/sec   Loss 13.7308   LearningRate 0.3348   Epoch: 1   Global Step: 17360   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:39:46,120-Speed 5983.34 samples/sec   Loss 13.6726   LearningRate 0.3350   Epoch: 1   Global Step: 17370   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:39:52,966-Speed 5984.43 samples/sec   Loss 13.7581   LearningRate 0.3352   Epoch: 1   Global Step: 17380   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:39:59,815-Speed 5980.97 samples/sec   Loss 13.7323   LearningRate 0.3354   Epoch: 1   Global Step: 17390   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:40:06,660-Speed 5984.81 samples/sec   Loss 13.8280   LearningRate 0.3356   Epoch: 1   Global Step: 17400   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:40:13,521-Speed 5971.98 samples/sec   Loss 13.8037   LearningRate 0.3358   Epoch: 1   Global Step: 17410   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:40:20,379-Speed 5973.87 samples/sec   Loss 13.7655   LearningRate 0.3360   Epoch: 1   Global Step: 17420   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:40:27,240-Speed 5970.79 samples/sec   Loss 13.7297   LearningRate 0.3362   Epoch: 1   Global Step: 17430   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:40:34,097-Speed 5974.34 samples/sec   Loss 13.8081   LearningRate 0.3364   Epoch: 1   Global Step: 17440   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:40:40,972-Speed 5960.09 samples/sec   Loss 13.7597   LearningRate 0.3366   Epoch: 1   Global Step: 17450   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:40:47,835-Speed 5969.00 samples/sec   Loss 13.8178   LearningRate 0.3368   Epoch: 1   Global Step: 17460   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:40:54,697-Speed 5970.44 samples/sec   Loss 13.7594   LearningRate 0.3370   Epoch: 1   Global Step: 17470   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:41:01,559-Speed 5969.94 samples/sec   Loss 13.6939   LearningRate 0.3372   Epoch: 1   Global Step: 17480   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:41:08,420-Speed 5970.90 samples/sec   Loss 13.7459   LearningRate 0.3374   Epoch: 1   Global Step: 17490   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:41:15,358-Speed 5905.18 samples/sec   Loss 13.7359   LearningRate 0.3375   Epoch: 1   Global Step: 17500   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:41:22,281-Speed 5917.25 samples/sec   Loss 13.6005   LearningRate 0.3377   Epoch: 1   Global Step: 17510   Fp16 Grad Scale: 524288   Required: 37 hours
Training: 2022-01-07 23:41:29,152-Speed 5962.71 samples/sec   Loss 13.7836   LearningRate 0.3379   Epoch: 1   Global Step: 17520   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:41:36,017-Speed 5967.90 samples/sec   Loss 13.7130   LearningRate 0.3381   Epoch: 1   Global Step: 17530   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:41:42,890-Speed 5960.70 samples/sec   Loss 13.7379   LearningRate 0.3383   Epoch: 1   Global Step: 17540   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:41:49,816-Speed 5915.08 samples/sec   Loss 13.7777   LearningRate 0.3385   Epoch: 1   Global Step: 17550   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:41:56,690-Speed 5958.99 samples/sec   Loss 13.7386   LearningRate 0.3387   Epoch: 1   Global Step: 17560   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:42:03,584-Speed 5943.07 samples/sec   Loss 13.7138   LearningRate 0.3389   Epoch: 1   Global Step: 17570   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:42:10,469-Speed 5949.64 samples/sec   Loss 13.8273   LearningRate 0.3391   Epoch: 1   Global Step: 17580   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:42:17,316-Speed 5982.81 samples/sec   Loss 13.8150   LearningRate 0.3393   Epoch: 1   Global Step: 17590   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:42:24,319-Speed 5853.28 samples/sec   Loss 13.8399   LearningRate 0.3395   Epoch: 1   Global Step: 17600   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:42:31,200-Speed 5953.86 samples/sec   Loss 13.8364   LearningRate 0.3397   Epoch: 1   Global Step: 17610   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:42:38,067-Speed 5965.65 samples/sec   Loss 13.6920   LearningRate 0.3399   Epoch: 1   Global Step: 17620   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:42:44,919-Speed 5979.68 samples/sec   Loss 13.8374   LearningRate 0.3401   Epoch: 1   Global Step: 17630   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:42:51,771-Speed 5978.72 samples/sec   Loss 13.7834   LearningRate 0.3402   Epoch: 1   Global Step: 17640   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:42:58,667-Speed 5941.13 samples/sec   Loss 13.7196   LearningRate 0.3404   Epoch: 1   Global Step: 17650   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:43:05,520-Speed 5978.04 samples/sec   Loss 13.7237   LearningRate 0.3406   Epoch: 1   Global Step: 17660   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:43:12,401-Speed 5953.57 samples/sec   Loss 13.8272   LearningRate 0.3408   Epoch: 1   Global Step: 17670   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:43:19,276-Speed 5958.75 samples/sec   Loss 13.8081   LearningRate 0.3410   Epoch: 1   Global Step: 17680   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:43:26,133-Speed 5975.56 samples/sec   Loss 13.7781   LearningRate 0.3412   Epoch: 1   Global Step: 17690   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:43:32,995-Speed 5970.07 samples/sec   Loss 13.7597   LearningRate 0.3414   Epoch: 1   Global Step: 17700   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:43:39,859-Speed 5972.91 samples/sec   Loss 13.8658   LearningRate 0.3416   Epoch: 1   Global Step: 17710   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:43:46,712-Speed 5978.50 samples/sec   Loss 13.8530   LearningRate 0.3418   Epoch: 1   Global Step: 17720   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:43:53,588-Speed 5958.72 samples/sec   Loss 13.7775   LearningRate 0.3420   Epoch: 1   Global Step: 17730   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:44:00,443-Speed 5975.74 samples/sec   Loss 13.7305   LearningRate 0.3422   Epoch: 1   Global Step: 17740   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:44:07,301-Speed 5973.74 samples/sec   Loss 13.7868   LearningRate 0.3424   Epoch: 1   Global Step: 17750   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:44:14,254-Speed 5892.26 samples/sec   Loss 13.7215   LearningRate 0.3426   Epoch: 1   Global Step: 17760   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:44:21,118-Speed 5967.81 samples/sec   Loss 13.8323   LearningRate 0.3428   Epoch: 1   Global Step: 17770   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:44:27,996-Speed 5956.54 samples/sec   Loss 13.7486   LearningRate 0.3429   Epoch: 1   Global Step: 17780   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:44:34,843-Speed 5983.02 samples/sec   Loss 13.8135   LearningRate 0.3431   Epoch: 1   Global Step: 17790   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:44:41,694-Speed 5979.53 samples/sec   Loss 13.7908   LearningRate 0.3433   Epoch: 1   Global Step: 17800   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:44:48,538-Speed 5986.11 samples/sec   Loss 13.9607   LearningRate 0.3435   Epoch: 1   Global Step: 17810   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:44:55,408-Speed 5963.36 samples/sec   Loss 13.8850   LearningRate 0.3437   Epoch: 1   Global Step: 17820   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:45:02,262-Speed 5978.46 samples/sec   Loss 13.8692   LearningRate 0.3439   Epoch: 1   Global Step: 17830   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:45:09,118-Speed 5976.29 samples/sec   Loss 13.8387   LearningRate 0.3441   Epoch: 1   Global Step: 17840   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:45:15,993-Speed 5958.56 samples/sec   Loss 13.8669   LearningRate 0.3443   Epoch: 1   Global Step: 17850   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:45:22,862-Speed 5964.89 samples/sec   Loss 13.8780   LearningRate 0.3445   Epoch: 1   Global Step: 17860   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:45:29,716-Speed 5976.53 samples/sec   Loss 13.9181   LearningRate 0.3447   Epoch: 1   Global Step: 17870   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:45:36,568-Speed 5979.55 samples/sec   Loss 13.7479   LearningRate 0.3449   Epoch: 1   Global Step: 17880   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:45:43,439-Speed 5961.86 samples/sec   Loss 13.8496   LearningRate 0.3451   Epoch: 1   Global Step: 17890   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:45:50,330-Speed 5945.11 samples/sec   Loss 13.7674   LearningRate 0.3453   Epoch: 1   Global Step: 17900   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:45:57,196-Speed 5966.98 samples/sec   Loss 13.8524   LearningRate 0.3455   Epoch: 1   Global Step: 17910   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:46:04,045-Speed 5981.74 samples/sec   Loss 13.8419   LearningRate 0.3456   Epoch: 1   Global Step: 17920   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:46:10,912-Speed 5965.06 samples/sec   Loss 13.8152   LearningRate 0.3458   Epoch: 1   Global Step: 17930   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:46:17,779-Speed 5966.25 samples/sec   Loss 13.9321   LearningRate 0.3460   Epoch: 1   Global Step: 17940   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:46:24,671-Speed 5944.83 samples/sec   Loss 13.7755   LearningRate 0.3462   Epoch: 1   Global Step: 17950   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:46:31,546-Speed 5959.08 samples/sec   Loss 13.8450   LearningRate 0.3464   Epoch: 1   Global Step: 17960   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:46:38,404-Speed 5973.48 samples/sec   Loss 13.8089   LearningRate 0.3466   Epoch: 1   Global Step: 17970   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:46:45,256-Speed 5978.57 samples/sec   Loss 13.7460   LearningRate 0.3468   Epoch: 1   Global Step: 17980   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:46:52,120-Speed 5968.76 samples/sec   Loss 13.8310   LearningRate 0.3470   Epoch: 1   Global Step: 17990   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:46:58,995-Speed 5959.08 samples/sec   Loss 13.9294   LearningRate 0.3472   Epoch: 1   Global Step: 18000   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:47:05,856-Speed 5970.88 samples/sec   Loss 13.8204   LearningRate 0.3474   Epoch: 1   Global Step: 18010   Fp16 Grad Scale: 524288   Required: 37 hours
Training: 2022-01-07 23:47:12,716-Speed 5972.35 samples/sec   Loss 13.9474   LearningRate 0.3476   Epoch: 1   Global Step: 18020   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:47:19,579-Speed 5968.83 samples/sec   Loss 13.8836   LearningRate 0.3478   Epoch: 1   Global Step: 18030   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:47:26,468-Speed 5946.46 samples/sec   Loss 13.7972   LearningRate 0.3480   Epoch: 1   Global Step: 18040   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:47:33,340-Speed 5972.26 samples/sec   Loss 13.8102   LearningRate 0.3482   Epoch: 1   Global Step: 18050   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:47:40,206-Speed 5966.52 samples/sec   Loss 13.7879   LearningRate 0.3483   Epoch: 1   Global Step: 18060   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:47:47,067-Speed 5971.21 samples/sec   Loss 13.8131   LearningRate 0.3485   Epoch: 1   Global Step: 18070   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:47:53,916-Speed 5981.66 samples/sec   Loss 13.7733   LearningRate 0.3487   Epoch: 1   Global Step: 18080   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:48:00,771-Speed 5976.25 samples/sec   Loss 13.8660   LearningRate 0.3489   Epoch: 1   Global Step: 18090   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:48:07,617-Speed 5983.99 samples/sec   Loss 13.8389   LearningRate 0.3491   Epoch: 1   Global Step: 18100   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:48:14,485-Speed 5965.40 samples/sec   Loss 13.9846   LearningRate 0.3493   Epoch: 1   Global Step: 18110   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:48:21,334-Speed 5981.02 samples/sec   Loss 13.8929   LearningRate 0.3495   Epoch: 1   Global Step: 18120   Fp16 Grad Scale: 524288   Required: 37 hours
Training: 2022-01-07 23:48:28,190-Speed 5975.56 samples/sec   Loss 13.8300   LearningRate 0.3497   Epoch: 1   Global Step: 18130   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:48:35,068-Speed 5956.46 samples/sec   Loss 13.8567   LearningRate 0.3499   Epoch: 1   Global Step: 18140   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:48:41,941-Speed 5960.45 samples/sec   Loss 13.8945   LearningRate 0.3501   Epoch: 1   Global Step: 18150   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:48:48,789-Speed 5982.47 samples/sec   Loss 13.8212   LearningRate 0.3503   Epoch: 1   Global Step: 18160   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:48:55,639-Speed 5980.18 samples/sec   Loss 13.9100   LearningRate 0.3505   Epoch: 1   Global Step: 18170   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:49:02,494-Speed 5976.43 samples/sec   Loss 13.8306   LearningRate 0.3507   Epoch: 1   Global Step: 18180   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:49:09,346-Speed 5979.66 samples/sec   Loss 13.8985   LearningRate 0.3509   Epoch: 1   Global Step: 18190   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:49:16,214-Speed 5964.98 samples/sec   Loss 13.9400   LearningRate 0.3510   Epoch: 1   Global Step: 18200   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:49:23,058-Speed 5985.98 samples/sec   Loss 13.9134   LearningRate 0.3512   Epoch: 1   Global Step: 18210   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:49:29,918-Speed 5972.19 samples/sec   Loss 13.8623   LearningRate 0.3514   Epoch: 1   Global Step: 18220   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:49:36,769-Speed 5979.30 samples/sec   Loss 13.9981   LearningRate 0.3516   Epoch: 1   Global Step: 18230   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:49:43,642-Speed 5960.22 samples/sec   Loss 13.9922   LearningRate 0.3518   Epoch: 1   Global Step: 18240   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:49:50,486-Speed 5986.27 samples/sec   Loss 13.8512   LearningRate 0.3520   Epoch: 1   Global Step: 18250   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:49:57,345-Speed 5972.55 samples/sec   Loss 13.9770   LearningRate 0.3522   Epoch: 1   Global Step: 18260   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:50:04,192-Speed 5983.85 samples/sec   Loss 13.8668   LearningRate 0.3524   Epoch: 1   Global Step: 18270   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:50:11,034-Speed 5987.25 samples/sec   Loss 13.9187   LearningRate 0.3526   Epoch: 1   Global Step: 18280   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:50:17,897-Speed 5970.08 samples/sec   Loss 13.9291   LearningRate 0.3528   Epoch: 1   Global Step: 18290   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:50:24,797-Speed 5937.71 samples/sec   Loss 13.9126   LearningRate 0.3530   Epoch: 1   Global Step: 18300   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:50:31,653-Speed 5974.83 samples/sec   Loss 13.8904   LearningRate 0.3532   Epoch: 1   Global Step: 18310   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:50:38,491-Speed 5991.02 samples/sec   Loss 13.8233   LearningRate 0.3534   Epoch: 1   Global Step: 18320   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:50:47,422-Speed 4587.24 samples/sec   Loss 13.9690   LearningRate 0.3536   Epoch: 1   Global Step: 18330   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:50:54,298-Speed 5957.88 samples/sec   Loss 13.8792   LearningRate 0.3537   Epoch: 1   Global Step: 18340   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:51:01,135-Speed 5991.77 samples/sec   Loss 13.7549   LearningRate 0.3539   Epoch: 1   Global Step: 18350   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:51:07,980-Speed 5985.00 samples/sec   Loss 13.8333   LearningRate 0.3541   Epoch: 1   Global Step: 18360   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:51:14,837-Speed 5974.57 samples/sec   Loss 13.9286   LearningRate 0.3543   Epoch: 1   Global Step: 18370   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:51:21,684-Speed 5983.41 samples/sec   Loss 13.9130   LearningRate 0.3545   Epoch: 1   Global Step: 18380   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:51:28,545-Speed 5970.91 samples/sec   Loss 13.9471   LearningRate 0.3547   Epoch: 1   Global Step: 18390   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:51:35,402-Speed 5975.24 samples/sec   Loss 13.8493   LearningRate 0.3549   Epoch: 1   Global Step: 18400   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:51:42,267-Speed 5967.40 samples/sec   Loss 13.8915   LearningRate 0.3551   Epoch: 1   Global Step: 18410   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:51:49,139-Speed 5961.38 samples/sec   Loss 13.9241   LearningRate 0.3553   Epoch: 1   Global Step: 18420   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:51:55,994-Speed 5976.24 samples/sec   Loss 13.9414   LearningRate 0.3555   Epoch: 1   Global Step: 18430   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:52:02,837-Speed 5986.78 samples/sec   Loss 14.0998   LearningRate 0.3557   Epoch: 1   Global Step: 18440   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:52:09,700-Speed 5969.79 samples/sec   Loss 13.8638   LearningRate 0.3559   Epoch: 1   Global Step: 18450   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:52:16,565-Speed 5967.84 samples/sec   Loss 13.9586   LearningRate 0.3561   Epoch: 1   Global Step: 18460   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:52:23,433-Speed 5965.13 samples/sec   Loss 13.9028   LearningRate 0.3563   Epoch: 1   Global Step: 18470   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:52:30,307-Speed 5959.97 samples/sec   Loss 13.9832   LearningRate 0.3564   Epoch: 1   Global Step: 18480   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:52:37,180-Speed 5960.96 samples/sec   Loss 13.9715   LearningRate 0.3566   Epoch: 1   Global Step: 18490   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:52:44,076-Speed 5940.48 samples/sec   Loss 14.0058   LearningRate 0.3568   Epoch: 1   Global Step: 18500   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:52:50,973-Speed 5939.40 samples/sec   Loss 13.8935   LearningRate 0.3570   Epoch: 1   Global Step: 18510   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:52:57,840-Speed 5966.70 samples/sec   Loss 13.9564   LearningRate 0.3572   Epoch: 1   Global Step: 18520   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:53:04,696-Speed 5977.66 samples/sec   Loss 13.9436   LearningRate 0.3574   Epoch: 1   Global Step: 18530   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:53:11,537-Speed 5987.77 samples/sec   Loss 13.9862   LearningRate 0.3576   Epoch: 1   Global Step: 18540   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:53:18,382-Speed 5985.37 samples/sec   Loss 14.0745   LearningRate 0.3578   Epoch: 1   Global Step: 18550   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:53:25,232-Speed 5981.09 samples/sec   Loss 13.9317   LearningRate 0.3580   Epoch: 1   Global Step: 18560   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:53:32,078-Speed 5983.78 samples/sec   Loss 14.1095   LearningRate 0.3582   Epoch: 1   Global Step: 18570   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:53:38,931-Speed 5978.28 samples/sec   Loss 14.0087   LearningRate 0.3584   Epoch: 1   Global Step: 18580   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:53:45,793-Speed 5970.17 samples/sec   Loss 13.9104   LearningRate 0.3586   Epoch: 1   Global Step: 18590   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:53:52,650-Speed 5974.73 samples/sec   Loss 13.8459   LearningRate 0.3588   Epoch: 1   Global Step: 18600   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:53:59,502-Speed 5979.49 samples/sec   Loss 13.8579   LearningRate 0.3590   Epoch: 1   Global Step: 18610   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:54:06,356-Speed 5976.64 samples/sec   Loss 13.9240   LearningRate 0.3591   Epoch: 1   Global Step: 18620   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:54:13,212-Speed 5974.82 samples/sec   Loss 13.9447   LearningRate 0.3593   Epoch: 1   Global Step: 18630   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:54:20,086-Speed 5960.51 samples/sec   Loss 13.9407   LearningRate 0.3595   Epoch: 1   Global Step: 18640   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:54:26,942-Speed 5975.60 samples/sec   Loss 13.9972   LearningRate 0.3597   Epoch: 1   Global Step: 18650   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:54:33,797-Speed 5976.23 samples/sec   Loss 13.9797   LearningRate 0.3599   Epoch: 1   Global Step: 18660   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:54:40,661-Speed 5970.44 samples/sec   Loss 14.0575   LearningRate 0.3601   Epoch: 1   Global Step: 18670   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:54:47,527-Speed 5966.52 samples/sec   Loss 14.0184   LearningRate 0.3603   Epoch: 1   Global Step: 18680   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:54:54,422-Speed 5941.54 samples/sec   Loss 14.0497   LearningRate 0.3605   Epoch: 1   Global Step: 18690   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:55:01,290-Speed 5965.57 samples/sec   Loss 13.9497   LearningRate 0.3607   Epoch: 1   Global Step: 18700   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:55:08,150-Speed 5972.58 samples/sec   Loss 13.9016   LearningRate 0.3609   Epoch: 1   Global Step: 18710   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:55:15,022-Speed 5964.93 samples/sec   Loss 13.9410   LearningRate 0.3611   Epoch: 1   Global Step: 18720   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:55:21,883-Speed 5970.98 samples/sec   Loss 13.9908   LearningRate 0.3613   Epoch: 1   Global Step: 18730   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:55:28,747-Speed 5968.33 samples/sec   Loss 13.9626   LearningRate 0.3615   Epoch: 1   Global Step: 18740   Fp16 Grad Scale: 524288   Required: 37 hours
Training: 2022-01-07 23:55:35,603-Speed 5975.66 samples/sec   Loss 13.9755   LearningRate 0.3617   Epoch: 1   Global Step: 18750   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:55:42,472-Speed 5969.56 samples/sec   Loss 13.9868   LearningRate 0.3618   Epoch: 1   Global Step: 18760   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:55:49,325-Speed 5977.91 samples/sec   Loss 14.0110   LearningRate 0.3620   Epoch: 1   Global Step: 18770   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:55:56,186-Speed 5970.91 samples/sec   Loss 14.0194   LearningRate 0.3622   Epoch: 1   Global Step: 18780   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:56:03,074-Speed 5947.93 samples/sec   Loss 14.0090   LearningRate 0.3624   Epoch: 1   Global Step: 18790   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:56:09,933-Speed 5973.54 samples/sec   Loss 14.0189   LearningRate 0.3626   Epoch: 1   Global Step: 18800   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:56:16,804-Speed 5963.02 samples/sec   Loss 13.9372   LearningRate 0.3628   Epoch: 1   Global Step: 18810   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:56:23,666-Speed 5969.15 samples/sec   Loss 13.9534   LearningRate 0.3630   Epoch: 1   Global Step: 18820   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:56:30,530-Speed 5969.20 samples/sec   Loss 13.9706   LearningRate 0.3632   Epoch: 1   Global Step: 18830   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:56:37,379-Speed 5980.67 samples/sec   Loss 13.9627   LearningRate 0.3634   Epoch: 1   Global Step: 18840   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:56:44,247-Speed 5965.86 samples/sec   Loss 13.9976   LearningRate 0.3636   Epoch: 1   Global Step: 18850   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:56:51,128-Speed 5953.71 samples/sec   Loss 13.9944   LearningRate 0.3638   Epoch: 1   Global Step: 18860   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:56:58,019-Speed 5944.46 samples/sec   Loss 13.9540   LearningRate 0.3640   Epoch: 1   Global Step: 18870   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:57:04,892-Speed 5961.31 samples/sec   Loss 14.0197   LearningRate 0.3642   Epoch: 1   Global Step: 18880   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:57:11,775-Speed 5959.49 samples/sec   Loss 13.9573   LearningRate 0.3644   Epoch: 1   Global Step: 18890   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:57:18,610-Speed 5993.64 samples/sec   Loss 14.2067   LearningRate 0.3645   Epoch: 1   Global Step: 18900   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:57:25,472-Speed 5970.48 samples/sec   Loss 14.0772   LearningRate 0.3647   Epoch: 1   Global Step: 18910   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:57:32,321-Speed 5981.54 samples/sec   Loss 14.0042   LearningRate 0.3649   Epoch: 1   Global Step: 18920   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:57:39,198-Speed 5956.67 samples/sec   Loss 14.1247   LearningRate 0.3651   Epoch: 1   Global Step: 18930   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:57:46,051-Speed 5978.57 samples/sec   Loss 14.0736   LearningRate 0.3653   Epoch: 1   Global Step: 18940   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:57:52,915-Speed 5968.60 samples/sec   Loss 14.1132   LearningRate 0.3655   Epoch: 1   Global Step: 18950   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:57:59,767-Speed 5978.34 samples/sec   Loss 14.0008   LearningRate 0.3657   Epoch: 1   Global Step: 18960   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:58:06,615-Speed 5982.68 samples/sec   Loss 14.0409   LearningRate 0.3659   Epoch: 1   Global Step: 18970   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:58:13,474-Speed 5972.92 samples/sec   Loss 14.0374   LearningRate 0.3661   Epoch: 1   Global Step: 18980   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:58:20,435-Speed 5884.83 samples/sec   Loss 13.9483   LearningRate 0.3663   Epoch: 1   Global Step: 18990   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:58:27,277-Speed 5987.54 samples/sec   Loss 13.9557   LearningRate 0.3665   Epoch: 1   Global Step: 19000   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:58:34,166-Speed 5947.89 samples/sec   Loss 13.9590   LearningRate 0.3667   Epoch: 1   Global Step: 19010   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:58:41,030-Speed 5969.18 samples/sec   Loss 13.9651   LearningRate 0.3669   Epoch: 1   Global Step: 19020   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:58:47,890-Speed 5971.96 samples/sec   Loss 14.0354   LearningRate 0.3671   Epoch: 1   Global Step: 19030   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:58:54,741-Speed 5978.96 samples/sec   Loss 14.0360   LearningRate 0.3672   Epoch: 1   Global Step: 19040   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:59:01,598-Speed 5975.57 samples/sec   Loss 14.0053   LearningRate 0.3674   Epoch: 1   Global Step: 19050   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:59:08,458-Speed 5971.97 samples/sec   Loss 13.9949   LearningRate 0.3676   Epoch: 1   Global Step: 19060   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-07 23:59:15,321-Speed 5969.59 samples/sec   Loss 13.9995   LearningRate 0.3678   Epoch: 1   Global Step: 19070   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:59:22,166-Speed 5984.75 samples/sec   Loss 14.0297   LearningRate 0.3680   Epoch: 1   Global Step: 19080   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:59:29,043-Speed 5957.49 samples/sec   Loss 14.0583   LearningRate 0.3682   Epoch: 1   Global Step: 19090   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:59:35,896-Speed 5978.24 samples/sec   Loss 14.0644   LearningRate 0.3684   Epoch: 1   Global Step: 19100   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:59:42,760-Speed 5968.24 samples/sec   Loss 14.0370   LearningRate 0.3686   Epoch: 1   Global Step: 19110   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:59:49,613-Speed 5978.80 samples/sec   Loss 13.9884   LearningRate 0.3688   Epoch: 1   Global Step: 19120   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-07 23:59:56,471-Speed 5973.34 samples/sec   Loss 14.0149   LearningRate 0.3690   Epoch: 1   Global Step: 19130   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:00:03,324-Speed 5977.84 samples/sec   Loss 14.0034   LearningRate 0.3692   Epoch: 1   Global Step: 19140   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:00:10,192-Speed 5965.29 samples/sec   Loss 14.0988   LearningRate 0.3694   Epoch: 1   Global Step: 19150   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:00:17,042-Speed 5980.68 samples/sec   Loss 14.0103   LearningRate 0.3696   Epoch: 1   Global Step: 19160   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:00:23,888-Speed 5983.81 samples/sec   Loss 14.0522   LearningRate 0.3698   Epoch: 1   Global Step: 19170   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:00:30,745-Speed 5974.62 samples/sec   Loss 14.0892   LearningRate 0.3699   Epoch: 1   Global Step: 19180   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:00:37,612-Speed 5965.80 samples/sec   Loss 13.9976   LearningRate 0.3701   Epoch: 1   Global Step: 19190   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:00:44,477-Speed 5967.79 samples/sec   Loss 14.0093   LearningRate 0.3703   Epoch: 1   Global Step: 19200   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:00:51,377-Speed 5937.21 samples/sec   Loss 13.9825   LearningRate 0.3705   Epoch: 1   Global Step: 19210   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:00:58,252-Speed 5958.77 samples/sec   Loss 14.0158   LearningRate 0.3707   Epoch: 1   Global Step: 19220   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:01:05,105-Speed 5978.00 samples/sec   Loss 14.0896   LearningRate 0.3709   Epoch: 1   Global Step: 19230   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:01:11,979-Speed 5960.43 samples/sec   Loss 13.9963   LearningRate 0.3711   Epoch: 1   Global Step: 19240   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:01:18,834-Speed 5976.00 samples/sec   Loss 14.1589   LearningRate 0.3713   Epoch: 1   Global Step: 19250   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:01:25,692-Speed 5974.09 samples/sec   Loss 14.0625   LearningRate 0.3715   Epoch: 1   Global Step: 19260   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:01:32,555-Speed 5969.35 samples/sec   Loss 14.1026   LearningRate 0.3717   Epoch: 1   Global Step: 19270   Fp16 Grad Scale: 524288   Required: 37 hours
Training: 2022-01-08 00:01:39,420-Speed 5968.12 samples/sec   Loss 14.0799   LearningRate 0.3719   Epoch: 1   Global Step: 19280   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:01:46,295-Speed 5958.56 samples/sec   Loss 14.0030   LearningRate 0.3721   Epoch: 1   Global Step: 19290   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:01:53,185-Speed 5945.64 samples/sec   Loss 14.1314   LearningRate 0.3723   Epoch: 1   Global Step: 19300   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:02:00,053-Speed 5966.15 samples/sec   Loss 13.9960   LearningRate 0.3725   Epoch: 1   Global Step: 19310   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:02:06,908-Speed 5975.72 samples/sec   Loss 14.1186   LearningRate 0.3726   Epoch: 1   Global Step: 19320   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:02:13,811-Speed 5935.46 samples/sec   Loss 14.0985   LearningRate 0.3728   Epoch: 1   Global Step: 19330   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:02:20,670-Speed 5973.36 samples/sec   Loss 14.1169   LearningRate 0.3730   Epoch: 1   Global Step: 19340   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:02:27,554-Speed 5950.65 samples/sec   Loss 14.0917   LearningRate 0.3732   Epoch: 1   Global Step: 19350   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:02:34,428-Speed 5962.65 samples/sec   Loss 14.0252   LearningRate 0.3734   Epoch: 1   Global Step: 19360   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:02:41,323-Speed 5942.14 samples/sec   Loss 14.1333   LearningRate 0.3736   Epoch: 1   Global Step: 19370   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:02:48,176-Speed 5978.13 samples/sec   Loss 14.0050   LearningRate 0.3738   Epoch: 1   Global Step: 19380   Fp16 Grad Scale: 524288   Required: 37 hours
Training: 2022-01-08 00:02:55,039-Speed 5969.13 samples/sec   Loss 14.0591   LearningRate 0.3740   Epoch: 1   Global Step: 19390   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:03:01,910-Speed 5962.22 samples/sec   Loss 14.1228   LearningRate 0.3742   Epoch: 1   Global Step: 19400   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:03:08,778-Speed 5964.55 samples/sec   Loss 14.1607   LearningRate 0.3744   Epoch: 1   Global Step: 19410   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:03:15,654-Speed 5958.01 samples/sec   Loss 14.1138   LearningRate 0.3746   Epoch: 1   Global Step: 19420   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:03:22,505-Speed 5979.83 samples/sec   Loss 14.1975   LearningRate 0.3748   Epoch: 1   Global Step: 19430   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:03:29,367-Speed 5970.75 samples/sec   Loss 14.1082   LearningRate 0.3750   Epoch: 1   Global Step: 19440   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:03:36,234-Speed 5965.57 samples/sec   Loss 14.1248   LearningRate 0.3752   Epoch: 1   Global Step: 19450   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:03:43,107-Speed 5961.03 samples/sec   Loss 14.0105   LearningRate 0.3753   Epoch: 1   Global Step: 19460   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:03:50,001-Speed 5943.11 samples/sec   Loss 14.1139   LearningRate 0.3755   Epoch: 1   Global Step: 19470   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:03:56,870-Speed 5963.87 samples/sec   Loss 14.0842   LearningRate 0.3757   Epoch: 1   Global Step: 19480   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:04:03,729-Speed 5975.08 samples/sec   Loss 14.1358   LearningRate 0.3759   Epoch: 1   Global Step: 19490   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:04:10,591-Speed 5970.62 samples/sec   Loss 14.0407   LearningRate 0.3761   Epoch: 1   Global Step: 19500   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:04:17,454-Speed 5968.90 samples/sec   Loss 14.0594   LearningRate 0.3763   Epoch: 1   Global Step: 19510   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:04:24,365-Speed 5927.78 samples/sec   Loss 14.1957   LearningRate 0.3765   Epoch: 1   Global Step: 19520   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:04:31,308-Speed 5901.12 samples/sec   Loss 14.1407   LearningRate 0.3767   Epoch: 1   Global Step: 19530   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:04:38,237-Speed 5912.09 samples/sec   Loss 14.1224   LearningRate 0.3769   Epoch: 1   Global Step: 19540   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:04:45,177-Speed 5904.00 samples/sec   Loss 14.2013   LearningRate 0.3771   Epoch: 1   Global Step: 19550   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:04:52,115-Speed 5905.74 samples/sec   Loss 14.0192   LearningRate 0.3773   Epoch: 1   Global Step: 19560   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:04:59,052-Speed 5905.49 samples/sec   Loss 14.1646   LearningRate 0.3775   Epoch: 1   Global Step: 19570   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:05:05,990-Speed 5904.55 samples/sec   Loss 14.1854   LearningRate 0.3777   Epoch: 1   Global Step: 19580   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:05:12,922-Speed 5910.49 samples/sec   Loss 14.1399   LearningRate 0.3779   Epoch: 1   Global Step: 19590   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:05:19,865-Speed 5900.82 samples/sec   Loss 14.0450   LearningRate 0.3780   Epoch: 1   Global Step: 19600   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:05:26,801-Speed 5906.41 samples/sec   Loss 14.0800   LearningRate 0.3782   Epoch: 1   Global Step: 19610   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:05:33,668-Speed 5965.07 samples/sec   Loss 14.1184   LearningRate 0.3784   Epoch: 1   Global Step: 19620   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 00:05:40,539-Speed 5963.12 samples/sec   Loss 14.1055   LearningRate 0.3786   Epoch: 1   Global Step: 19630   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 00:05:47,473-Speed 5908.67 samples/sec   Loss 14.1396   LearningRate 0.3788   Epoch: 1   Global Step: 19640   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 00:05:54,332-Speed 5972.65 samples/sec   Loss 14.1499   LearningRate 0.3790   Epoch: 1   Global Step: 19650   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 00:06:01,186-Speed 5978.00 samples/sec   Loss 14.2281   LearningRate 0.3792   Epoch: 1   Global Step: 19660   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 00:06:08,035-Speed 5981.44 samples/sec   Loss 14.1477   LearningRate 0.3794   Epoch: 1   Global Step: 19670   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 00:06:14,883-Speed 5984.26 samples/sec   Loss 14.1274   LearningRate 0.3796   Epoch: 1   Global Step: 19680   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 00:06:21,751-Speed 5964.96 samples/sec   Loss 14.1036   LearningRate 0.3798   Epoch: 1   Global Step: 19690   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 00:06:28,612-Speed 5970.91 samples/sec   Loss 14.2154   LearningRate 0.3800   Epoch: 1   Global Step: 19700   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 00:06:35,554-Speed 5901.30 samples/sec   Loss 14.2355   LearningRate 0.3802   Epoch: 1   Global Step: 19710   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-01-08 00:06:42,398-Speed 5985.80 samples/sec   Loss 14.1710   LearningRate 0.3804   Epoch: 1   Global Step: 19720   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:06:49,247-Speed 5981.46 samples/sec   Loss 14.1057   LearningRate 0.3806   Epoch: 1   Global Step: 19730   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:06:56,108-Speed 5970.48 samples/sec   Loss 14.1512   LearningRate 0.3808   Epoch: 1   Global Step: 19740   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:07:02,986-Speed 5956.59 samples/sec   Loss 14.0652   LearningRate 0.3809   Epoch: 1   Global Step: 19750   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:07:09,837-Speed 5979.65 samples/sec   Loss 14.1996   LearningRate 0.3811   Epoch: 1   Global Step: 19760   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:07:16,700-Speed 5969.55 samples/sec   Loss 14.1248   LearningRate 0.3813   Epoch: 1   Global Step: 19770   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:07:23,563-Speed 5969.06 samples/sec   Loss 14.2057   LearningRate 0.3815   Epoch: 1   Global Step: 19780   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:07:30,425-Speed 5970.34 samples/sec   Loss 14.0777   LearningRate 0.3817   Epoch: 1   Global Step: 19790   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:07:37,285-Speed 5971.72 samples/sec   Loss 14.1842   LearningRate 0.3819   Epoch: 1   Global Step: 19800   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:07:44,143-Speed 5973.66 samples/sec   Loss 14.2392   LearningRate 0.3821   Epoch: 1   Global Step: 19810   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:07:51,027-Speed 5951.76 samples/sec   Loss 14.1756   LearningRate 0.3823   Epoch: 1   Global Step: 19820   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:07:57,981-Speed 5890.61 samples/sec   Loss 14.1716   LearningRate 0.3825   Epoch: 1   Global Step: 19830   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:08:04,843-Speed 5970.23 samples/sec   Loss 14.1575   LearningRate 0.3827   Epoch: 1   Global Step: 19840   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:08:11,718-Speed 5959.20 samples/sec   Loss 14.1660   LearningRate 0.3829   Epoch: 1   Global Step: 19850   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:08:18,576-Speed 5973.61 samples/sec   Loss 14.2101   LearningRate 0.3831   Epoch: 1   Global Step: 19860   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:08:25,424-Speed 5983.13 samples/sec   Loss 14.2317   LearningRate 0.3833   Epoch: 1   Global Step: 19870   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:08:32,279-Speed 5976.21 samples/sec   Loss 14.1799   LearningRate 0.3835   Epoch: 1   Global Step: 19880   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:08:39,142-Speed 5968.74 samples/sec   Loss 14.1250   LearningRate 0.3836   Epoch: 1   Global Step: 19890   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:08:46,010-Speed 5964.68 samples/sec   Loss 14.1447   LearningRate 0.3838   Epoch: 1   Global Step: 19900   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:08:52,868-Speed 5973.96 samples/sec   Loss 14.1496   LearningRate 0.3840   Epoch: 1   Global Step: 19910   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:08:59,715-Speed 5982.94 samples/sec   Loss 14.1836   LearningRate 0.3842   Epoch: 1   Global Step: 19920   Fp16 Grad Scale: 524288   Required: 37 hours
Training: 2022-01-08 00:09:06,567-Speed 5978.62 samples/sec   Loss 14.1244   LearningRate 0.3844   Epoch: 1   Global Step: 19930   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:09:13,495-Speed 5913.72 samples/sec   Loss 14.2306   LearningRate 0.3846   Epoch: 1   Global Step: 19940   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:09:20,396-Speed 5936.58 samples/sec   Loss 14.1884   LearningRate 0.3848   Epoch: 1   Global Step: 19950   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:09:27,252-Speed 5975.62 samples/sec   Loss 14.2145   LearningRate 0.3850   Epoch: 1   Global Step: 19960   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:09:34,108-Speed 5975.69 samples/sec   Loss 14.2657   LearningRate 0.3852   Epoch: 1   Global Step: 19970   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:09:40,959-Speed 5979.21 samples/sec   Loss 14.1910   LearningRate 0.3854   Epoch: 1   Global Step: 19980   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:09:47,838-Speed 5957.88 samples/sec   Loss 14.2744   LearningRate 0.3856   Epoch: 1   Global Step: 19990   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:09:54,699-Speed 5971.21 samples/sec   Loss 14.2818   LearningRate 0.3858   Epoch: 1   Global Step: 20000   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:10:21,473-[lfw][20000]XNorm: 23.188454
Training: 2022-01-08 00:10:21,474-[lfw][20000]Accuracy-Flip: 0.99617+-0.00308
Training: 2022-01-08 00:10:21,475-[lfw][20000]Accuracy-Highest: 0.99617
Training: 2022-01-08 00:10:52,567-[cfp_fp][20000]XNorm: 20.132425
Training: 2022-01-08 00:10:52,568-[cfp_fp][20000]Accuracy-Flip: 0.95814+-0.00899
Training: 2022-01-08 00:10:52,569-[cfp_fp][20000]Accuracy-Highest: 0.96643
Training: 2022-01-08 00:11:19,198-[agedb_30][20000]XNorm: 22.495338
Training: 2022-01-08 00:11:19,199-[agedb_30][20000]Accuracy-Flip: 0.94933+-0.01106
Training: 2022-01-08 00:11:19,200-[agedb_30][20000]Accuracy-Highest: 0.94933
Training: 2022-01-08 00:11:26,054-Speed 448.37 samples/sec   Loss 14.1682   LearningRate 0.3860   Epoch: 1   Global Step: 20010   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:11:32,891-Speed 5992.84 samples/sec   Loss 14.3106   LearningRate 0.3862   Epoch: 1   Global Step: 20020   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:11:39,759-Speed 5964.20 samples/sec   Loss 14.2503   LearningRate 0.3863   Epoch: 1   Global Step: 20030   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:11:46,617-Speed 5974.12 samples/sec   Loss 14.1657   LearningRate 0.3865   Epoch: 1   Global Step: 20040   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:11:53,473-Speed 5978.13 samples/sec   Loss 14.1472   LearningRate 0.3867   Epoch: 1   Global Step: 20050   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:12:00,345-Speed 5961.51 samples/sec   Loss 14.1893   LearningRate 0.3869   Epoch: 1   Global Step: 20060   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:12:07,208-Speed 5969.39 samples/sec   Loss 14.2230   LearningRate 0.3871   Epoch: 1   Global Step: 20070   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:12:14,077-Speed 5963.67 samples/sec   Loss 14.2626   LearningRate 0.3873   Epoch: 1   Global Step: 20080   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:12:20,950-Speed 5960.19 samples/sec   Loss 14.2551   LearningRate 0.3875   Epoch: 1   Global Step: 20090   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:12:27,845-Speed 5941.60 samples/sec   Loss 14.2190   LearningRate 0.3877   Epoch: 1   Global Step: 20100   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:12:34,736-Speed 5946.01 samples/sec   Loss 14.1969   LearningRate 0.3879   Epoch: 1   Global Step: 20110   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:12:41,602-Speed 5965.94 samples/sec   Loss 14.1787   LearningRate 0.3881   Epoch: 1   Global Step: 20120   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:12:48,495-Speed 5943.37 samples/sec   Loss 14.2623   LearningRate 0.3883   Epoch: 1   Global Step: 20130   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:12:55,366-Speed 5962.69 samples/sec   Loss 14.2177   LearningRate 0.3885   Epoch: 1   Global Step: 20140   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:13:02,256-Speed 5946.28 samples/sec   Loss 14.2581   LearningRate 0.3887   Epoch: 1   Global Step: 20150   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:13:09,161-Speed 5932.48 samples/sec   Loss 14.1890   LearningRate 0.3889   Epoch: 1   Global Step: 20160   Fp16 Grad Scale: 524288   Required: 37 hours
Training: 2022-01-08 00:13:16,017-Speed 5976.26 samples/sec   Loss 14.1762   LearningRate 0.3890   Epoch: 1   Global Step: 20170   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:13:22,873-Speed 5974.61 samples/sec   Loss 14.0890   LearningRate 0.3892   Epoch: 1   Global Step: 20180   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:13:29,750-Speed 5957.47 samples/sec   Loss 14.1919   LearningRate 0.3894   Epoch: 1   Global Step: 20190   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:13:36,747-Speed 5855.61 samples/sec   Loss 14.2904   LearningRate 0.3896   Epoch: 1   Global Step: 20200   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:13:43,725-Speed 5871.17 samples/sec   Loss 14.2980   LearningRate 0.3898   Epoch: 1   Global Step: 20210   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:13:50,692-Speed 5880.38 samples/sec   Loss 14.2522   LearningRate 0.3900   Epoch: 1   Global Step: 20220   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:13:57,539-Speed 5982.75 samples/sec   Loss 14.2189   LearningRate 0.3902   Epoch: 1   Global Step: 20230   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:14:04,397-Speed 5974.00 samples/sec   Loss 14.3442   LearningRate 0.3904   Epoch: 1   Global Step: 20240   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:14:11,279-Speed 5955.03 samples/sec   Loss 14.2672   LearningRate 0.3906   Epoch: 1   Global Step: 20250   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:14:18,145-Speed 5966.22 samples/sec   Loss 14.3534   LearningRate 0.3908   Epoch: 1   Global Step: 20260   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:14:25,005-Speed 5972.35 samples/sec   Loss 14.1867   LearningRate 0.3910   Epoch: 1   Global Step: 20270   Fp16 Grad Scale: 524288   Required: 37 hours
Training: 2022-01-08 00:14:31,853-Speed 5982.07 samples/sec   Loss 14.2767   LearningRate 0.3912   Epoch: 1   Global Step: 20280   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:14:38,725-Speed 5962.58 samples/sec   Loss 14.2535   LearningRate 0.3914   Epoch: 1   Global Step: 20290   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:14:45,577-Speed 5978.72 samples/sec   Loss 14.1849   LearningRate 0.3916   Epoch: 1   Global Step: 20300   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:14:52,435-Speed 5974.43 samples/sec   Loss 14.2256   LearningRate 0.3917   Epoch: 1   Global Step: 20310   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:14:59,283-Speed 5981.65 samples/sec   Loss 14.3316   LearningRate 0.3919   Epoch: 1   Global Step: 20320   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:15:06,134-Speed 5980.23 samples/sec   Loss 14.2770   LearningRate 0.3921   Epoch: 1   Global Step: 20330   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:15:13,062-Speed 5913.08 samples/sec   Loss 14.3326   LearningRate 0.3923   Epoch: 1   Global Step: 20340   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:15:19,943-Speed 5953.68 samples/sec   Loss 14.2301   LearningRate 0.3925   Epoch: 1   Global Step: 20350   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:15:26,805-Speed 5970.26 samples/sec   Loss 14.2283   LearningRate 0.3927   Epoch: 1   Global Step: 20360   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:15:33,668-Speed 5969.55 samples/sec   Loss 14.3166   LearningRate 0.3929   Epoch: 1   Global Step: 20370   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:15:40,528-Speed 5972.44 samples/sec   Loss 14.1838   LearningRate 0.3931   Epoch: 1   Global Step: 20380   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:15:47,382-Speed 5976.51 samples/sec   Loss 14.2109   LearningRate 0.3933   Epoch: 1   Global Step: 20390   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:15:54,259-Speed 5957.40 samples/sec   Loss 14.2276   LearningRate 0.3935   Epoch: 1   Global Step: 20400   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:16:01,112-Speed 5977.70 samples/sec   Loss 14.3041   LearningRate 0.3937   Epoch: 1   Global Step: 20410   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:16:07,963-Speed 5980.51 samples/sec   Loss 14.2910   LearningRate 0.3939   Epoch: 1   Global Step: 20420   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:16:14,825-Speed 5969.23 samples/sec   Loss 14.2784   LearningRate 0.3941   Epoch: 1   Global Step: 20430   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:16:21,686-Speed 5971.34 samples/sec   Loss 14.2588   LearningRate 0.3943   Epoch: 1   Global Step: 20440   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:16:28,544-Speed 5972.99 samples/sec   Loss 14.3891   LearningRate 0.3944   Epoch: 1   Global Step: 20450   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:16:35,408-Speed 5968.79 samples/sec   Loss 14.2704   LearningRate 0.3946   Epoch: 1   Global Step: 20460   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:16:42,254-Speed 5983.98 samples/sec   Loss 14.2026   LearningRate 0.3948   Epoch: 1   Global Step: 20470   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:16:49,116-Speed 5970.21 samples/sec   Loss 14.3087   LearningRate 0.3950   Epoch: 1   Global Step: 20480   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:16:55,978-Speed 5970.01 samples/sec   Loss 14.2646   LearningRate 0.3952   Epoch: 1   Global Step: 20490   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:17:02,830-Speed 5978.81 samples/sec   Loss 14.2050   LearningRate 0.3954   Epoch: 1   Global Step: 20500   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:17:09,697-Speed 5966.27 samples/sec   Loss 14.1739   LearningRate 0.3956   Epoch: 1   Global Step: 20510   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:17:16,566-Speed 5963.80 samples/sec   Loss 14.2827   LearningRate 0.3958   Epoch: 1   Global Step: 20520   Fp16 Grad Scale: 524288   Required: 37 hours
Training: 2022-01-08 00:17:23,412-Speed 5984.52 samples/sec   Loss 14.3574   LearningRate 0.3960   Epoch: 1   Global Step: 20530   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:17:30,284-Speed 5961.48 samples/sec   Loss 14.2339   LearningRate 0.3962   Epoch: 1   Global Step: 20540   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:17:37,150-Speed 5966.63 samples/sec   Loss 14.3780   LearningRate 0.3964   Epoch: 1   Global Step: 20550   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:17:44,007-Speed 5974.78 samples/sec   Loss 14.3412   LearningRate 0.3966   Epoch: 1   Global Step: 20560   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:17:50,859-Speed 5979.34 samples/sec   Loss 14.3192   LearningRate 0.3968   Epoch: 1   Global Step: 20570   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:17:57,713-Speed 5976.12 samples/sec   Loss 14.3630   LearningRate 0.3970   Epoch: 1   Global Step: 20580   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:18:04,580-Speed 5966.49 samples/sec   Loss 14.3197   LearningRate 0.3971   Epoch: 1   Global Step: 20590   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:18:11,444-Speed 5968.89 samples/sec   Loss 14.2878   LearningRate 0.3973   Epoch: 1   Global Step: 20600   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:18:18,321-Speed 5956.96 samples/sec   Loss 14.4368   LearningRate 0.3975   Epoch: 1   Global Step: 20610   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:18:25,185-Speed 5968.12 samples/sec   Loss 14.3663   LearningRate 0.3977   Epoch: 1   Global Step: 20620   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:18:32,043-Speed 5974.63 samples/sec   Loss 14.3877   LearningRate 0.3979   Epoch: 1   Global Step: 20630   Fp16 Grad Scale: 524288   Required: 37 hours
Training: 2022-01-08 00:18:38,892-Speed 5981.42 samples/sec   Loss 14.2651   LearningRate 0.3981   Epoch: 1   Global Step: 20640   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:18:45,739-Speed 5982.56 samples/sec   Loss 14.4182   LearningRate 0.3983   Epoch: 1   Global Step: 20650   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:18:52,582-Speed 5987.36 samples/sec   Loss 14.3083   LearningRate 0.3985   Epoch: 1   Global Step: 20660   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:18:59,432-Speed 5980.58 samples/sec   Loss 14.3564   LearningRate 0.3987   Epoch: 1   Global Step: 20670   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:19:06,291-Speed 5972.25 samples/sec   Loss 14.3171   LearningRate 0.3989   Epoch: 1   Global Step: 20680   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:19:13,139-Speed 5982.88 samples/sec   Loss 14.3761   LearningRate 0.3991   Epoch: 1   Global Step: 20690   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:19:19,986-Speed 5982.54 samples/sec   Loss 14.3618   LearningRate 0.3993   Epoch: 1   Global Step: 20700   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:19:26,846-Speed 5971.59 samples/sec   Loss 14.3783   LearningRate 0.3995   Epoch: 1   Global Step: 20710   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:19:33,693-Speed 5983.49 samples/sec   Loss 14.3348   LearningRate 0.3997   Epoch: 1   Global Step: 20720   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:19:40,545-Speed 5978.61 samples/sec   Loss 14.2900   LearningRate 0.3998   Epoch: 1   Global Step: 20730   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:19:47,414-Speed 5965.87 samples/sec   Loss 14.2773   LearningRate 0.4000   Epoch: 1   Global Step: 20740   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:20:10,652-Speed 1762.68 samples/sec   Loss 14.3140   LearningRate 0.3999   Epoch: 2   Global Step: 20750   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:20:17,496-Speed 5987.93 samples/sec   Loss 14.2361   LearningRate 0.3999   Epoch: 2   Global Step: 20760   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:20:26,146-Speed 4736.05 samples/sec   Loss 14.3834   LearningRate 0.3999   Epoch: 2   Global Step: 20770   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:20:33,011-Speed 5967.52 samples/sec   Loss 14.4183   LearningRate 0.3998   Epoch: 2   Global Step: 20780   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:20:39,856-Speed 5985.66 samples/sec   Loss 14.3247   LearningRate 0.3998   Epoch: 2   Global Step: 20790   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:20:46,706-Speed 5991.00 samples/sec   Loss 14.3725   LearningRate 0.3997   Epoch: 2   Global Step: 20800   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:20:53,551-Speed 5984.46 samples/sec   Loss 14.3150   LearningRate 0.3997   Epoch: 2   Global Step: 20810   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:21:00,393-Speed 5988.46 samples/sec   Loss 14.3427   LearningRate 0.3996   Epoch: 2   Global Step: 20820   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:21:07,245-Speed 5981.40 samples/sec   Loss 14.2727   LearningRate 0.3996   Epoch: 2   Global Step: 20830   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:21:14,100-Speed 5976.79 samples/sec   Loss 14.2951   LearningRate 0.3996   Epoch: 2   Global Step: 20840   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:21:20,947-Speed 5982.92 samples/sec   Loss 14.1956   LearningRate 0.3995   Epoch: 2   Global Step: 20850   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:21:27,793-Speed 5984.77 samples/sec   Loss 14.2624   LearningRate 0.3995   Epoch: 2   Global Step: 20860   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:21:34,664-Speed 5962.61 samples/sec   Loss 14.4431   LearningRate 0.3994   Epoch: 2   Global Step: 20870   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:21:41,514-Speed 5981.33 samples/sec   Loss 14.3564   LearningRate 0.3994   Epoch: 2   Global Step: 20880   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:21:48,411-Speed 5939.53 samples/sec   Loss 14.3146   LearningRate 0.3993   Epoch: 2   Global Step: 20890   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:21:55,280-Speed 5963.96 samples/sec   Loss 14.4168   LearningRate 0.3993   Epoch: 2   Global Step: 20900   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:22:02,164-Speed 5950.74 samples/sec   Loss 14.2678   LearningRate 0.3993   Epoch: 2   Global Step: 20910   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:22:09,023-Speed 5973.73 samples/sec   Loss 14.2952   LearningRate 0.3992   Epoch: 2   Global Step: 20920   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:22:15,927-Speed 5934.25 samples/sec   Loss 14.4262   LearningRate 0.3992   Epoch: 2   Global Step: 20930   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:22:22,832-Speed 5933.34 samples/sec   Loss 14.2494   LearningRate 0.3991   Epoch: 2   Global Step: 20940   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:22:29,767-Speed 5908.15 samples/sec   Loss 14.3205   LearningRate 0.3991   Epoch: 2   Global Step: 20950   Fp16 Grad Scale: 524288   Required: 37 hours
Training: 2022-01-08 00:22:36,647-Speed 5953.98 samples/sec   Loss 14.2857   LearningRate 0.3990   Epoch: 2   Global Step: 20960   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:22:43,514-Speed 5966.82 samples/sec   Loss 14.4554   LearningRate 0.3990   Epoch: 2   Global Step: 20970   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:22:50,378-Speed 5968.14 samples/sec   Loss 14.2867   LearningRate 0.3990   Epoch: 2   Global Step: 20980   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:22:57,255-Speed 5957.35 samples/sec   Loss 14.3021   LearningRate 0.3989   Epoch: 2   Global Step: 20990   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:23:04,116-Speed 5971.51 samples/sec   Loss 14.3730   LearningRate 0.3989   Epoch: 2   Global Step: 21000   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:23:10,988-Speed 5961.26 samples/sec   Loss 14.3949   LearningRate 0.3988   Epoch: 2   Global Step: 21010   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:23:17,836-Speed 5983.06 samples/sec   Loss 14.2998   LearningRate 0.3988   Epoch: 2   Global Step: 21020   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:23:24,688-Speed 5978.75 samples/sec   Loss 14.4026   LearningRate 0.3987   Epoch: 2   Global Step: 21030   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:23:31,536-Speed 5982.33 samples/sec   Loss 14.3942   LearningRate 0.3987   Epoch: 2   Global Step: 21040   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:23:38,389-Speed 5978.24 samples/sec   Loss 14.2842   LearningRate 0.3987   Epoch: 2   Global Step: 21050   Fp16 Grad Scale: 131072   Required: 37 hours
Training: 2022-01-08 00:23:45,256-Speed 5971.72 samples/sec   Loss 14.3547   LearningRate 0.3986   Epoch: 2   Global Step: 21060   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:23:52,130-Speed 5959.57 samples/sec   Loss 14.3218   LearningRate 0.3986   Epoch: 2   Global Step: 21070   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:23:58,981-Speed 5980.33 samples/sec   Loss 14.3023   LearningRate 0.3985   Epoch: 2   Global Step: 21080   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:24:05,850-Speed 5968.97 samples/sec   Loss 14.3818   LearningRate 0.3985   Epoch: 2   Global Step: 21090   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:24:12,730-Speed 5956.19 samples/sec   Loss 14.3352   LearningRate 0.3984   Epoch: 2   Global Step: 21100   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:24:19,592-Speed 5969.66 samples/sec   Loss 14.4112   LearningRate 0.3984   Epoch: 2   Global Step: 21110   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:24:26,487-Speed 5941.95 samples/sec   Loss 14.2535   LearningRate 0.3984   Epoch: 2   Global Step: 21120   Fp16 Grad Scale: 262144   Required: 37 hours
Training: 2022-01-08 00:24:33,353-Speed 5966.57 samples/sec   Loss 14.2664   LearningRate 0.3983   Epoch: 2   Global Step: 21130   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:24:40,252-Speed 5939.29 samples/sec   Loss 14.2748   LearningRate 0.3983   Epoch: 2   Global Step: 21140   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:24:47,139-Speed 5950.32 samples/sec   Loss 14.3054   LearningRate 0.3982   Epoch: 2   Global Step: 21150   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:24:54,009-Speed 5963.61 samples/sec   Loss 14.3873   LearningRate 0.3982   Epoch: 2   Global Step: 21160   Fp16 Grad Scale: 524288   Required: 36 hours
Training: 2022-01-08 00:25:00,863-Speed 5976.78 samples/sec   Loss 14.3431   LearningRate 0.3982   Epoch: 2   Global Step: 21170   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:25:07,729-Speed 5967.24 samples/sec   Loss 14.3505   LearningRate 0.3981   Epoch: 2   Global Step: 21180   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:25:14,580-Speed 5979.58 samples/sec   Loss 14.3024   LearningRate 0.3981   Epoch: 2   Global Step: 21190   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:25:21,444-Speed 5968.73 samples/sec   Loss 14.4004   LearningRate 0.3980   Epoch: 2   Global Step: 21200   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:25:28,304-Speed 5971.68 samples/sec   Loss 14.3849   LearningRate 0.3980   Epoch: 2   Global Step: 21210   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:25:35,171-Speed 5965.49 samples/sec   Loss 14.3819   LearningRate 0.3979   Epoch: 2   Global Step: 21220   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:25:42,032-Speed 5972.78 samples/sec   Loss 14.3496   LearningRate 0.3979   Epoch: 2   Global Step: 21230   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:25:48,865-Speed 5995.87 samples/sec   Loss 14.3283   LearningRate 0.3979   Epoch: 2   Global Step: 21240   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:25:55,716-Speed 5979.50 samples/sec   Loss 14.2704   LearningRate 0.3978   Epoch: 2   Global Step: 21250   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:26:02,560-Speed 5985.05 samples/sec   Loss 14.4301   LearningRate 0.3978   Epoch: 2   Global Step: 21260   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:26:09,413-Speed 5978.33 samples/sec   Loss 14.3630   LearningRate 0.3977   Epoch: 2   Global Step: 21270   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:26:16,277-Speed 5968.61 samples/sec   Loss 14.3071   LearningRate 0.3977   Epoch: 2   Global Step: 21280   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:26:23,132-Speed 5975.76 samples/sec   Loss 14.3212   LearningRate 0.3976   Epoch: 2   Global Step: 21290   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:26:30,021-Speed 5946.70 samples/sec   Loss 14.3807   LearningRate 0.3976   Epoch: 2   Global Step: 21300   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:26:36,875-Speed 5976.84 samples/sec   Loss 14.3744   LearningRate 0.3976   Epoch: 2   Global Step: 21310   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:26:43,744-Speed 5963.98 samples/sec   Loss 14.3029   LearningRate 0.3975   Epoch: 2   Global Step: 21320   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:26:50,605-Speed 5971.33 samples/sec   Loss 14.3643   LearningRate 0.3975   Epoch: 2   Global Step: 21330   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:26:57,460-Speed 5976.37 samples/sec   Loss 14.2226   LearningRate 0.3974   Epoch: 2   Global Step: 21340   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:27:04,326-Speed 5966.81 samples/sec   Loss 14.2794   LearningRate 0.3974   Epoch: 2   Global Step: 21350   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:27:11,246-Speed 5920.27 samples/sec   Loss 14.3303   LearningRate 0.3973   Epoch: 2   Global Step: 21360   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:27:18,098-Speed 5978.98 samples/sec   Loss 14.4116   LearningRate 0.3973   Epoch: 2   Global Step: 21370   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:27:24,943-Speed 5984.05 samples/sec   Loss 14.3438   LearningRate 0.3973   Epoch: 2   Global Step: 21380   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:27:31,811-Speed 5965.02 samples/sec   Loss 14.2876   LearningRate 0.3972   Epoch: 2   Global Step: 21390   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:27:38,672-Speed 5971.18 samples/sec   Loss 14.2442   LearningRate 0.3972   Epoch: 2   Global Step: 21400   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:27:45,536-Speed 5968.69 samples/sec   Loss 14.2198   LearningRate 0.3971   Epoch: 2   Global Step: 21410   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:27:52,400-Speed 5968.30 samples/sec   Loss 14.2628   LearningRate 0.3971   Epoch: 2   Global Step: 21420   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:27:59,246-Speed 5984.08 samples/sec   Loss 14.3163   LearningRate 0.3970   Epoch: 2   Global Step: 21430   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:28:06,093-Speed 5983.45 samples/sec   Loss 14.3459   LearningRate 0.3970   Epoch: 2   Global Step: 21440   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:28:12,959-Speed 5966.29 samples/sec   Loss 14.2767   LearningRate 0.3970   Epoch: 2   Global Step: 21450   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:28:19,823-Speed 5968.62 samples/sec   Loss 14.3242   LearningRate 0.3969   Epoch: 2   Global Step: 21460   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:28:26,690-Speed 5965.50 samples/sec   Loss 14.1972   LearningRate 0.3969   Epoch: 2   Global Step: 21470   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:28:33,547-Speed 5974.81 samples/sec   Loss 14.2503   LearningRate 0.3968   Epoch: 2   Global Step: 21480   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:28:40,423-Speed 5957.81 samples/sec   Loss 14.2704   LearningRate 0.3968   Epoch: 2   Global Step: 21490   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:28:47,289-Speed 5966.51 samples/sec   Loss 14.3235   LearningRate 0.3967   Epoch: 2   Global Step: 21500   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:28:54,162-Speed 5968.89 samples/sec   Loss 14.3293   LearningRate 0.3967   Epoch: 2   Global Step: 21510   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:29:01,016-Speed 5977.20 samples/sec   Loss 14.2658   LearningRate 0.3967   Epoch: 2   Global Step: 21520   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:29:07,881-Speed 5967.92 samples/sec   Loss 14.2109   LearningRate 0.3966   Epoch: 2   Global Step: 21530   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:29:14,727-Speed 5983.01 samples/sec   Loss 14.2416   LearningRate 0.3966   Epoch: 2   Global Step: 21540   Fp16 Grad Scale: 524288   Required: 36 hours
Training: 2022-01-08 00:29:21,588-Speed 5971.41 samples/sec   Loss 14.2938   LearningRate 0.3965   Epoch: 2   Global Step: 21550   Fp16 Grad Scale: 524288   Required: 36 hours
Training: 2022-01-08 00:29:28,426-Speed 5991.26 samples/sec   Loss 14.2347   LearningRate 0.3965   Epoch: 2   Global Step: 21560   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:29:35,285-Speed 5972.51 samples/sec   Loss 14.2654   LearningRate 0.3964   Epoch: 2   Global Step: 21570   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:29:42,150-Speed 5967.95 samples/sec   Loss 14.3196   LearningRate 0.3964   Epoch: 2   Global Step: 21580   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:29:49,010-Speed 5971.22 samples/sec   Loss 14.3266   LearningRate 0.3964   Epoch: 2   Global Step: 21590   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:29:55,862-Speed 5978.82 samples/sec   Loss 14.3442   LearningRate 0.3963   Epoch: 2   Global Step: 21600   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:30:02,718-Speed 5975.64 samples/sec   Loss 14.2674   LearningRate 0.3963   Epoch: 2   Global Step: 21610   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:30:09,567-Speed 5982.04 samples/sec   Loss 14.2480   LearningRate 0.3962   Epoch: 2   Global Step: 21620   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:30:16,428-Speed 5970.69 samples/sec   Loss 14.3095   LearningRate 0.3962   Epoch: 2   Global Step: 21630   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:30:23,285-Speed 5974.31 samples/sec   Loss 14.3140   LearningRate 0.3961   Epoch: 2   Global Step: 21640   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:30:30,145-Speed 5971.52 samples/sec   Loss 14.1711   LearningRate 0.3961   Epoch: 2   Global Step: 21650   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:30:37,052-Speed 5931.43 samples/sec   Loss 14.3286   LearningRate 0.3961   Epoch: 2   Global Step: 21660   Fp16 Grad Scale: 524288   Required: 36 hours
Training: 2022-01-08 00:30:43,918-Speed 5966.98 samples/sec   Loss 14.2176   LearningRate 0.3960   Epoch: 2   Global Step: 21670   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:30:50,766-Speed 5981.82 samples/sec   Loss 14.2466   LearningRate 0.3960   Epoch: 2   Global Step: 21680   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:30:57,626-Speed 5971.99 samples/sec   Loss 14.2014   LearningRate 0.3959   Epoch: 2   Global Step: 21690   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:31:04,505-Speed 5955.76 samples/sec   Loss 14.3234   LearningRate 0.3959   Epoch: 2   Global Step: 21700   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:31:11,353-Speed 5982.04 samples/sec   Loss 14.2770   LearningRate 0.3958   Epoch: 2   Global Step: 21710   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:31:18,214-Speed 5972.87 samples/sec   Loss 14.2389   LearningRate 0.3958   Epoch: 2   Global Step: 21720   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:31:25,063-Speed 5981.47 samples/sec   Loss 14.2748   LearningRate 0.3958   Epoch: 2   Global Step: 21730   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:31:31,908-Speed 5985.56 samples/sec   Loss 14.2566   LearningRate 0.3957   Epoch: 2   Global Step: 21740   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:31:38,763-Speed 5975.52 samples/sec   Loss 14.3305   LearningRate 0.3957   Epoch: 2   Global Step: 21750   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:31:45,626-Speed 5969.30 samples/sec   Loss 14.2937   LearningRate 0.3956   Epoch: 2   Global Step: 21760   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:31:52,478-Speed 5978.76 samples/sec   Loss 14.1934   LearningRate 0.3956   Epoch: 2   Global Step: 21770   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:31:59,364-Speed 5952.25 samples/sec   Loss 14.1999   LearningRate 0.3955   Epoch: 2   Global Step: 21780   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:32:06,247-Speed 5953.59 samples/sec   Loss 14.2191   LearningRate 0.3955   Epoch: 2   Global Step: 21790   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:32:13,103-Speed 5975.67 samples/sec   Loss 14.2084   LearningRate 0.3955   Epoch: 2   Global Step: 21800   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:32:19,966-Speed 5969.11 samples/sec   Loss 14.2641   LearningRate 0.3954   Epoch: 2   Global Step: 21810   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:32:26,824-Speed 5975.78 samples/sec   Loss 14.2374   LearningRate 0.3954   Epoch: 2   Global Step: 21820   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:32:33,685-Speed 5970.52 samples/sec   Loss 14.1974   LearningRate 0.3953   Epoch: 2   Global Step: 21830   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:32:40,556-Speed 5962.30 samples/sec   Loss 14.1758   LearningRate 0.3953   Epoch: 2   Global Step: 21840   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:32:47,427-Speed 5962.76 samples/sec   Loss 14.1688   LearningRate 0.3952   Epoch: 2   Global Step: 21850   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:32:54,279-Speed 5977.91 samples/sec   Loss 14.3149   LearningRate 0.3952   Epoch: 2   Global Step: 21860   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:33:01,150-Speed 5965.24 samples/sec   Loss 14.2255   LearningRate 0.3952   Epoch: 2   Global Step: 21870   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:33:08,020-Speed 5964.13 samples/sec   Loss 14.2528   LearningRate 0.3951   Epoch: 2   Global Step: 21880   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:33:14,876-Speed 5975.31 samples/sec   Loss 14.2058   LearningRate 0.3951   Epoch: 2   Global Step: 21890   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:33:21,729-Speed 5977.61 samples/sec   Loss 14.1990   LearningRate 0.3950   Epoch: 2   Global Step: 21900   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:33:28,599-Speed 5963.43 samples/sec   Loss 14.2609   LearningRate 0.3950   Epoch: 2   Global Step: 21910   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:33:35,449-Speed 5980.88 samples/sec   Loss 14.2688   LearningRate 0.3949   Epoch: 2   Global Step: 21920   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:33:42,303-Speed 5977.01 samples/sec   Loss 14.2504   LearningRate 0.3949   Epoch: 2   Global Step: 21930   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:33:49,164-Speed 5971.81 samples/sec   Loss 14.2212   LearningRate 0.3949   Epoch: 2   Global Step: 21940   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:33:56,019-Speed 5976.33 samples/sec   Loss 14.2014   LearningRate 0.3948   Epoch: 2   Global Step: 21950   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:34:05,057-Speed 4534.06 samples/sec   Loss 14.2327   LearningRate 0.3948   Epoch: 2   Global Step: 21960   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:34:11,906-Speed 5982.65 samples/sec   Loss 14.2860   LearningRate 0.3947   Epoch: 2   Global Step: 21970   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:34:18,775-Speed 5963.55 samples/sec   Loss 14.2784   LearningRate 0.3947   Epoch: 2   Global Step: 21980   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:34:25,639-Speed 5968.74 samples/sec   Loss 14.2768   LearningRate 0.3947   Epoch: 2   Global Step: 21990   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:34:32,495-Speed 5975.60 samples/sec   Loss 14.1829   LearningRate 0.3946   Epoch: 2   Global Step: 22000   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:34:39,454-Speed 5887.81 samples/sec   Loss 14.2072   LearningRate 0.3946   Epoch: 2   Global Step: 22010   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:34:46,319-Speed 5967.57 samples/sec   Loss 14.2233   LearningRate 0.3945   Epoch: 2   Global Step: 22020   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:34:53,186-Speed 5965.87 samples/sec   Loss 14.2054   LearningRate 0.3945   Epoch: 2   Global Step: 22030   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:35:00,047-Speed 5970.93 samples/sec   Loss 14.2182   LearningRate 0.3944   Epoch: 2   Global Step: 22040   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:35:06,918-Speed 5961.85 samples/sec   Loss 14.2360   LearningRate 0.3944   Epoch: 2   Global Step: 22050   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:35:13,797-Speed 5956.03 samples/sec   Loss 14.1742   LearningRate 0.3944   Epoch: 2   Global Step: 22060   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:35:20,686-Speed 5946.81 samples/sec   Loss 14.2176   LearningRate 0.3943   Epoch: 2   Global Step: 22070   Fp16 Grad Scale: 524288   Required: 36 hours
Training: 2022-01-08 00:35:27,565-Speed 5955.43 samples/sec   Loss 14.2145   LearningRate 0.3943   Epoch: 2   Global Step: 22080   Fp16 Grad Scale: 524288   Required: 36 hours
Training: 2022-01-08 00:35:34,435-Speed 5963.13 samples/sec   Loss 14.1871   LearningRate 0.3942   Epoch: 2   Global Step: 22090   Fp16 Grad Scale: 524288   Required: 36 hours
Training: 2022-01-08 00:35:41,291-Speed 5976.18 samples/sec   Loss 14.3394   LearningRate 0.3942   Epoch: 2   Global Step: 22100   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:35:48,157-Speed 5966.42 samples/sec   Loss 14.1843   LearningRate 0.3941   Epoch: 2   Global Step: 22110   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:35:55,023-Speed 5966.77 samples/sec   Loss 14.3061   LearningRate 0.3941   Epoch: 2   Global Step: 22120   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:36:01,913-Speed 5946.19 samples/sec   Loss 14.2714   LearningRate 0.3941   Epoch: 2   Global Step: 22130   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:36:08,791-Speed 5957.01 samples/sec   Loss 14.1357   LearningRate 0.3940   Epoch: 2   Global Step: 22140   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:36:15,645-Speed 5977.86 samples/sec   Loss 14.2081   LearningRate 0.3940   Epoch: 2   Global Step: 22150   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:36:22,510-Speed 5967.34 samples/sec   Loss 14.2459   LearningRate 0.3939   Epoch: 2   Global Step: 22160   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:36:29,381-Speed 5962.71 samples/sec   Loss 14.1652   LearningRate 0.3939   Epoch: 2   Global Step: 22170   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:36:36,236-Speed 5976.52 samples/sec   Loss 14.2050   LearningRate 0.3938   Epoch: 2   Global Step: 22180   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:36:43,095-Speed 5973.04 samples/sec   Loss 14.1626   LearningRate 0.3938   Epoch: 2   Global Step: 22190   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:36:49,943-Speed 5982.28 samples/sec   Loss 14.1306   LearningRate 0.3938   Epoch: 2   Global Step: 22200   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:36:56,833-Speed 5948.17 samples/sec   Loss 14.1156   LearningRate 0.3937   Epoch: 2   Global Step: 22210   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:37:03,702-Speed 5963.87 samples/sec   Loss 14.1536   LearningRate 0.3937   Epoch: 2   Global Step: 22220   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:37:10,562-Speed 5972.30 samples/sec   Loss 14.1750   LearningRate 0.3936   Epoch: 2   Global Step: 22230   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:37:17,416-Speed 5977.44 samples/sec   Loss 14.2190   LearningRate 0.3936   Epoch: 2   Global Step: 22240   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:37:24,282-Speed 5967.03 samples/sec   Loss 14.1557   LearningRate 0.3935   Epoch: 2   Global Step: 22250   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:37:31,140-Speed 5973.38 samples/sec   Loss 14.1897   LearningRate 0.3935   Epoch: 2   Global Step: 22260   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:37:38,006-Speed 5967.43 samples/sec   Loss 14.1485   LearningRate 0.3935   Epoch: 2   Global Step: 22270   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:37:44,894-Speed 5947.88 samples/sec   Loss 14.1886   LearningRate 0.3934   Epoch: 2   Global Step: 22280   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:37:51,756-Speed 5970.44 samples/sec   Loss 14.2661   LearningRate 0.3934   Epoch: 2   Global Step: 22290   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:37:58,605-Speed 5980.99 samples/sec   Loss 14.1290   LearningRate 0.3933   Epoch: 2   Global Step: 22300   Fp16 Grad Scale: 524288   Required: 36 hours
Training: 2022-01-08 00:38:05,461-Speed 5975.31 samples/sec   Loss 14.2003   LearningRate 0.3933   Epoch: 2   Global Step: 22310   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:38:12,331-Speed 5963.97 samples/sec   Loss 14.1672   LearningRate 0.3932   Epoch: 2   Global Step: 22320   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:38:19,228-Speed 5939.31 samples/sec   Loss 14.1905   LearningRate 0.3932   Epoch: 2   Global Step: 22330   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:38:26,085-Speed 5975.24 samples/sec   Loss 14.1574   LearningRate 0.3932   Epoch: 2   Global Step: 22340   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:38:32,938-Speed 5977.71 samples/sec   Loss 14.1776   LearningRate 0.3931   Epoch: 2   Global Step: 22350   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:38:39,783-Speed 5984.75 samples/sec   Loss 14.1735   LearningRate 0.3931   Epoch: 2   Global Step: 22360   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:38:46,647-Speed 5969.27 samples/sec   Loss 14.1370   LearningRate 0.3930   Epoch: 2   Global Step: 22370   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:38:53,487-Speed 5989.01 samples/sec   Loss 14.1593   LearningRate 0.3930   Epoch: 2   Global Step: 22380   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:39:00,336-Speed 5981.53 samples/sec   Loss 14.1947   LearningRate 0.3930   Epoch: 2   Global Step: 22390   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:39:07,193-Speed 5977.65 samples/sec   Loss 14.1829   LearningRate 0.3929   Epoch: 2   Global Step: 22400   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:39:14,044-Speed 5982.30 samples/sec   Loss 14.1760   LearningRate 0.3929   Epoch: 2   Global Step: 22410   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:39:20,927-Speed 5952.53 samples/sec   Loss 14.1884   LearningRate 0.3928   Epoch: 2   Global Step: 22420   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:39:27,783-Speed 5975.07 samples/sec   Loss 14.1778   LearningRate 0.3928   Epoch: 2   Global Step: 22430   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:39:34,652-Speed 5965.03 samples/sec   Loss 14.1320   LearningRate 0.3927   Epoch: 2   Global Step: 22440   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:39:41,522-Speed 5962.65 samples/sec   Loss 14.1271   LearningRate 0.3927   Epoch: 2   Global Step: 22450   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:39:48,397-Speed 5959.45 samples/sec   Loss 14.1441   LearningRate 0.3927   Epoch: 2   Global Step: 22460   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:39:55,262-Speed 5968.19 samples/sec   Loss 14.0453   LearningRate 0.3926   Epoch: 2   Global Step: 22470   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:40:02,126-Speed 5968.41 samples/sec   Loss 14.0934   LearningRate 0.3926   Epoch: 2   Global Step: 22480   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:40:09,003-Speed 5957.15 samples/sec   Loss 14.1610   LearningRate 0.3925   Epoch: 2   Global Step: 22490   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:40:15,873-Speed 5964.11 samples/sec   Loss 14.1890   LearningRate 0.3925   Epoch: 2   Global Step: 22500   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:40:22,747-Speed 5959.60 samples/sec   Loss 14.1203   LearningRate 0.3924   Epoch: 2   Global Step: 22510   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:40:29,609-Speed 5969.52 samples/sec   Loss 14.1778   LearningRate 0.3924   Epoch: 2   Global Step: 22520   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:40:36,467-Speed 5973.85 samples/sec   Loss 14.1334   LearningRate 0.3924   Epoch: 2   Global Step: 22530   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:40:43,331-Speed 5968.31 samples/sec   Loss 14.1505   LearningRate 0.3923   Epoch: 2   Global Step: 22540   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:40:50,173-Speed 5987.25 samples/sec   Loss 14.1639   LearningRate 0.3923   Epoch: 2   Global Step: 22550   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:40:57,053-Speed 5954.83 samples/sec   Loss 14.1391   LearningRate 0.3922   Epoch: 2   Global Step: 22560   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:41:03,912-Speed 5973.32 samples/sec   Loss 14.0848   LearningRate 0.3922   Epoch: 2   Global Step: 22570   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:41:10,785-Speed 5960.57 samples/sec   Loss 14.1164   LearningRate 0.3921   Epoch: 2   Global Step: 22580   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:41:17,632-Speed 5983.26 samples/sec   Loss 14.0412   LearningRate 0.3921   Epoch: 2   Global Step: 22590   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:41:24,483-Speed 5980.46 samples/sec   Loss 14.1738   LearningRate 0.3921   Epoch: 2   Global Step: 22600   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:41:31,356-Speed 5959.93 samples/sec   Loss 14.0443   LearningRate 0.3920   Epoch: 2   Global Step: 22610   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:41:38,207-Speed 5980.28 samples/sec   Loss 14.0466   LearningRate 0.3920   Epoch: 2   Global Step: 22620   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:41:45,071-Speed 5968.40 samples/sec   Loss 14.0778   LearningRate 0.3919   Epoch: 2   Global Step: 22630   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:41:51,974-Speed 5933.83 samples/sec   Loss 14.0713   LearningRate 0.3919   Epoch: 2   Global Step: 22640   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:41:58,839-Speed 5967.68 samples/sec   Loss 14.1006   LearningRate 0.3918   Epoch: 2   Global Step: 22650   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:42:05,722-Speed 5952.72 samples/sec   Loss 14.0768   LearningRate 0.3918   Epoch: 2   Global Step: 22660   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:42:12,573-Speed 5979.28 samples/sec   Loss 14.1881   LearningRate 0.3918   Epoch: 2   Global Step: 22670   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:42:19,434-Speed 5972.90 samples/sec   Loss 14.1648   LearningRate 0.3917   Epoch: 2   Global Step: 22680   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:42:26,302-Speed 5967.16 samples/sec   Loss 14.0715   LearningRate 0.3917   Epoch: 2   Global Step: 22690   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:42:33,170-Speed 5964.91 samples/sec   Loss 14.1922   LearningRate 0.3916   Epoch: 2   Global Step: 22700   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:42:40,040-Speed 5963.46 samples/sec   Loss 14.1711   LearningRate 0.3916   Epoch: 2   Global Step: 22710   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:42:46,894-Speed 5977.34 samples/sec   Loss 14.1372   LearningRate 0.3915   Epoch: 2   Global Step: 22720   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:42:53,738-Speed 5985.10 samples/sec   Loss 14.1419   LearningRate 0.3915   Epoch: 2   Global Step: 22730   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:43:00,604-Speed 5968.40 samples/sec   Loss 14.1381   LearningRate 0.3915   Epoch: 2   Global Step: 22740   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:43:07,459-Speed 5977.34 samples/sec   Loss 14.2863   LearningRate 0.3914   Epoch: 2   Global Step: 22750   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:43:14,319-Speed 5971.68 samples/sec   Loss 14.1946   LearningRate 0.3914   Epoch: 2   Global Step: 22760   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:43:21,171-Speed 5978.51 samples/sec   Loss 14.0913   LearningRate 0.3913   Epoch: 2   Global Step: 22770   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:43:28,020-Speed 5981.96 samples/sec   Loss 14.0679   LearningRate 0.3913   Epoch: 2   Global Step: 22780   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:43:34,864-Speed 5985.94 samples/sec   Loss 14.2658   LearningRate 0.3913   Epoch: 2   Global Step: 22790   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:43:41,708-Speed 5985.52 samples/sec   Loss 14.0832   LearningRate 0.3912   Epoch: 2   Global Step: 22800   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:43:48,575-Speed 5968.19 samples/sec   Loss 14.0574   LearningRate 0.3912   Epoch: 2   Global Step: 22810   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:43:55,437-Speed 5970.58 samples/sec   Loss 14.1184   LearningRate 0.3911   Epoch: 2   Global Step: 22820   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:44:02,303-Speed 5966.75 samples/sec   Loss 14.0254   LearningRate 0.3911   Epoch: 2   Global Step: 22830   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:44:09,163-Speed 5972.81 samples/sec   Loss 14.1197   LearningRate 0.3910   Epoch: 2   Global Step: 22840   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:44:16,031-Speed 5964.24 samples/sec   Loss 14.0599   LearningRate 0.3910   Epoch: 2   Global Step: 22850   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:44:22,903-Speed 5962.40 samples/sec   Loss 14.0116   LearningRate 0.3910   Epoch: 2   Global Step: 22860   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:44:29,770-Speed 5965.75 samples/sec   Loss 14.0778   LearningRate 0.3909   Epoch: 2   Global Step: 22870   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:44:36,634-Speed 5970.74 samples/sec   Loss 14.0903   LearningRate 0.3909   Epoch: 2   Global Step: 22880   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:44:43,486-Speed 5979.08 samples/sec   Loss 14.0374   LearningRate 0.3908   Epoch: 2   Global Step: 22890   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:44:50,372-Speed 5949.84 samples/sec   Loss 14.1685   LearningRate 0.3908   Epoch: 2   Global Step: 22900   Fp16 Grad Scale: 524288   Required: 36 hours
Training: 2022-01-08 00:44:57,226-Speed 5976.59 samples/sec   Loss 14.0940   LearningRate 0.3907   Epoch: 2   Global Step: 22910   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:45:04,083-Speed 5975.40 samples/sec   Loss 14.1220   LearningRate 0.3907   Epoch: 2   Global Step: 22920   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:45:10,947-Speed 5968.79 samples/sec   Loss 14.0751   LearningRate 0.3907   Epoch: 2   Global Step: 22930   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:45:17,795-Speed 5982.94 samples/sec   Loss 14.0276   LearningRate 0.3906   Epoch: 2   Global Step: 22940   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:45:24,661-Speed 5967.35 samples/sec   Loss 14.0342   LearningRate 0.3906   Epoch: 2   Global Step: 22950   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:45:31,533-Speed 5960.90 samples/sec   Loss 14.1023   LearningRate 0.3905   Epoch: 2   Global Step: 22960   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:45:38,379-Speed 5984.63 samples/sec   Loss 14.0893   LearningRate 0.3905   Epoch: 2   Global Step: 22970   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:45:45,227-Speed 5981.96 samples/sec   Loss 14.0414   LearningRate 0.3904   Epoch: 2   Global Step: 22980   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:45:52,078-Speed 5979.96 samples/sec   Loss 14.1573   LearningRate 0.3904   Epoch: 2   Global Step: 22990   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:45:58,923-Speed 5984.20 samples/sec   Loss 14.0091   LearningRate 0.3904   Epoch: 2   Global Step: 23000   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:46:05,781-Speed 5974.42 samples/sec   Loss 13.9939   LearningRate 0.3903   Epoch: 2   Global Step: 23010   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:46:12,630-Speed 5980.52 samples/sec   Loss 14.0397   LearningRate 0.3903   Epoch: 2   Global Step: 23020   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:46:19,494-Speed 5968.83 samples/sec   Loss 14.0265   LearningRate 0.3902   Epoch: 2   Global Step: 23030   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:46:26,356-Speed 5969.77 samples/sec   Loss 14.1000   LearningRate 0.3902   Epoch: 2   Global Step: 23040   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:46:33,236-Speed 5955.40 samples/sec   Loss 14.0987   LearningRate 0.3902   Epoch: 2   Global Step: 23050   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:46:40,092-Speed 5975.96 samples/sec   Loss 14.0380   LearningRate 0.3901   Epoch: 2   Global Step: 23060   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:46:46,948-Speed 5975.25 samples/sec   Loss 14.0538   LearningRate 0.3901   Epoch: 2   Global Step: 23070   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:46:53,803-Speed 5977.77 samples/sec   Loss 13.9945   LearningRate 0.3900   Epoch: 2   Global Step: 23080   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:47:00,663-Speed 5971.34 samples/sec   Loss 14.1052   LearningRate 0.3900   Epoch: 2   Global Step: 23090   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:47:07,532-Speed 5964.64 samples/sec   Loss 14.1976   LearningRate 0.3899   Epoch: 2   Global Step: 23100   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:47:14,398-Speed 5967.12 samples/sec   Loss 14.0025   LearningRate 0.3899   Epoch: 2   Global Step: 23110   Fp16 Grad Scale: 524288   Required: 36 hours
Training: 2022-01-08 00:47:21,245-Speed 5983.49 samples/sec   Loss 14.0998   LearningRate 0.3899   Epoch: 2   Global Step: 23120   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:47:28,093-Speed 5982.65 samples/sec   Loss 14.0715   LearningRate 0.3898   Epoch: 2   Global Step: 23130   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:47:34,999-Speed 5932.08 samples/sec   Loss 14.1028   LearningRate 0.3898   Epoch: 2   Global Step: 23140   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:47:41,873-Speed 5960.01 samples/sec   Loss 14.0421   LearningRate 0.3897   Epoch: 2   Global Step: 23150   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:47:48,795-Speed 5920.37 samples/sec   Loss 14.0352   LearningRate 0.3897   Epoch: 2   Global Step: 23160   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:47:55,660-Speed 5967.35 samples/sec   Loss 14.0011   LearningRate 0.3896   Epoch: 2   Global Step: 23170   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:48:02,562-Speed 5936.29 samples/sec   Loss 14.0740   LearningRate 0.3896   Epoch: 2   Global Step: 23180   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:48:09,406-Speed 5986.04 samples/sec   Loss 14.0124   LearningRate 0.3896   Epoch: 2   Global Step: 23190   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:48:16,249-Speed 5986.75 samples/sec   Loss 14.0447   LearningRate 0.3895   Epoch: 2   Global Step: 23200   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:48:23,104-Speed 5975.86 samples/sec   Loss 14.1016   LearningRate 0.3895   Epoch: 2   Global Step: 23210   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:48:29,947-Speed 5986.96 samples/sec   Loss 14.0516   LearningRate 0.3894   Epoch: 2   Global Step: 23220   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:48:36,807-Speed 5974.85 samples/sec   Loss 14.0936   LearningRate 0.3894   Epoch: 2   Global Step: 23230   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:48:43,680-Speed 5960.21 samples/sec   Loss 14.0500   LearningRate 0.3893   Epoch: 2   Global Step: 23240   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:48:50,549-Speed 5967.56 samples/sec   Loss 14.1053   LearningRate 0.3893   Epoch: 2   Global Step: 23250   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:48:57,409-Speed 5971.86 samples/sec   Loss 14.0067   LearningRate 0.3893   Epoch: 2   Global Step: 23260   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:49:04,267-Speed 5973.87 samples/sec   Loss 14.0054   LearningRate 0.3892   Epoch: 2   Global Step: 23270   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:49:11,144-Speed 5958.66 samples/sec   Loss 14.1754   LearningRate 0.3892   Epoch: 2   Global Step: 23280   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:49:18,004-Speed 5972.35 samples/sec   Loss 14.0135   LearningRate 0.3891   Epoch: 2   Global Step: 23290   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:49:24,841-Speed 5992.02 samples/sec   Loss 14.1041   LearningRate 0.3891   Epoch: 2   Global Step: 23300   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:49:31,689-Speed 5982.59 samples/sec   Loss 14.0937   LearningRate 0.3891   Epoch: 2   Global Step: 23310   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:49:38,582-Speed 5944.07 samples/sec   Loss 14.1093   LearningRate 0.3890   Epoch: 2   Global Step: 23320   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:49:45,542-Speed 5886.17 samples/sec   Loss 14.0455   LearningRate 0.3890   Epoch: 2   Global Step: 23330   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:49:52,491-Speed 5895.40 samples/sec   Loss 13.9352   LearningRate 0.3889   Epoch: 2   Global Step: 23340   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:49:59,352-Speed 5971.51 samples/sec   Loss 13.9859   LearningRate 0.3889   Epoch: 2   Global Step: 23350   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:50:06,242-Speed 5945.85 samples/sec   Loss 13.9780   LearningRate 0.3888   Epoch: 2   Global Step: 23360   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:50:13,097-Speed 5975.79 samples/sec   Loss 14.0392   LearningRate 0.3888   Epoch: 2   Global Step: 23370   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:50:19,972-Speed 5959.10 samples/sec   Loss 13.9398   LearningRate 0.3888   Epoch: 2   Global Step: 23380   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:50:26,846-Speed 5959.98 samples/sec   Loss 14.0097   LearningRate 0.3887   Epoch: 2   Global Step: 23390   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 00:50:33,701-Speed 5976.78 samples/sec   Loss 13.9574   LearningRate 0.3887   Epoch: 2   Global Step: 23400   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:50:40,566-Speed 5967.42 samples/sec   Loss 14.0373   LearningRate 0.3886   Epoch: 2   Global Step: 23410   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:50:47,411-Speed 5984.44 samples/sec   Loss 14.0196   LearningRate 0.3886   Epoch: 2   Global Step: 23420   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:50:54,257-Speed 5986.51 samples/sec   Loss 14.0445   LearningRate 0.3885   Epoch: 2   Global Step: 23430   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:51:01,152-Speed 5941.75 samples/sec   Loss 14.0080   LearningRate 0.3885   Epoch: 2   Global Step: 23440   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:51:08,043-Speed 5945.47 samples/sec   Loss 14.0169   LearningRate 0.3885   Epoch: 2   Global Step: 23450   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:51:14,890-Speed 5983.46 samples/sec   Loss 14.0015   LearningRate 0.3884   Epoch: 2   Global Step: 23460   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:51:21,744-Speed 5977.18 samples/sec   Loss 13.9358   LearningRate 0.3884   Epoch: 2   Global Step: 23470   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:51:28,621-Speed 5956.98 samples/sec   Loss 14.0190   LearningRate 0.3883   Epoch: 2   Global Step: 23480   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:51:35,474-Speed 5978.39 samples/sec   Loss 14.0386   LearningRate 0.3883   Epoch: 2   Global Step: 23490   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:51:42,323-Speed 5981.70 samples/sec   Loss 14.0802   LearningRate 0.3882   Epoch: 2   Global Step: 23500   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:51:49,173-Speed 5980.81 samples/sec   Loss 14.1407   LearningRate 0.3882   Epoch: 2   Global Step: 23510   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:51:56,019-Speed 5983.97 samples/sec   Loss 14.0265   LearningRate 0.3882   Epoch: 2   Global Step: 23520   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:52:02,860-Speed 5989.68 samples/sec   Loss 14.0864   LearningRate 0.3881   Epoch: 2   Global Step: 23530   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:52:09,728-Speed 5965.25 samples/sec   Loss 14.1064   LearningRate 0.3881   Epoch: 2   Global Step: 23540   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:52:16,585-Speed 5974.98 samples/sec   Loss 14.0331   LearningRate 0.3880   Epoch: 2   Global Step: 23550   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:52:23,435-Speed 5981.19 samples/sec   Loss 13.9939   LearningRate 0.3880   Epoch: 2   Global Step: 23560   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:52:30,290-Speed 5976.56 samples/sec   Loss 14.0511   LearningRate 0.3880   Epoch: 2   Global Step: 23570   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:52:37,178-Speed 5947.05 samples/sec   Loss 13.9675   LearningRate 0.3879   Epoch: 2   Global Step: 23580   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:52:44,039-Speed 5970.94 samples/sec   Loss 13.9894   LearningRate 0.3879   Epoch: 2   Global Step: 23590   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:52:50,886-Speed 5983.41 samples/sec   Loss 13.9550   LearningRate 0.3878   Epoch: 2   Global Step: 23600   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:52:57,735-Speed 5981.26 samples/sec   Loss 14.0503   LearningRate 0.3878   Epoch: 2   Global Step: 23610   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:53:04,592-Speed 5975.27 samples/sec   Loss 13.8863   LearningRate 0.3877   Epoch: 2   Global Step: 23620   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:53:11,461-Speed 5964.15 samples/sec   Loss 13.9802   LearningRate 0.3877   Epoch: 2   Global Step: 23630   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:53:18,332-Speed 5961.87 samples/sec   Loss 14.0260   LearningRate 0.3877   Epoch: 2   Global Step: 23640   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:53:25,187-Speed 5976.10 samples/sec   Loss 14.0060   LearningRate 0.3876   Epoch: 2   Global Step: 23650   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:53:32,042-Speed 5976.11 samples/sec   Loss 13.9623   LearningRate 0.3876   Epoch: 2   Global Step: 23660   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:53:38,892-Speed 5981.02 samples/sec   Loss 14.0539   LearningRate 0.3875   Epoch: 2   Global Step: 23670   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:53:45,740-Speed 5983.11 samples/sec   Loss 14.0299   LearningRate 0.3875   Epoch: 2   Global Step: 23680   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:53:52,589-Speed 5981.34 samples/sec   Loss 14.0155   LearningRate 0.3874   Epoch: 2   Global Step: 23690   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:53:59,434-Speed 5984.31 samples/sec   Loss 14.0276   LearningRate 0.3874   Epoch: 2   Global Step: 23700   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:54:06,285-Speed 5979.81 samples/sec   Loss 13.9930   LearningRate 0.3874   Epoch: 2   Global Step: 23710   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:54:13,153-Speed 5965.14 samples/sec   Loss 13.9748   LearningRate 0.3873   Epoch: 2   Global Step: 23720   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:54:20,011-Speed 5973.56 samples/sec   Loss 14.0608   LearningRate 0.3873   Epoch: 2   Global Step: 23730   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:54:26,875-Speed 5968.72 samples/sec   Loss 13.9937   LearningRate 0.3872   Epoch: 2   Global Step: 23740   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:54:33,734-Speed 5972.50 samples/sec   Loss 14.0031   LearningRate 0.3872   Epoch: 2   Global Step: 23750   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:54:40,591-Speed 5974.55 samples/sec   Loss 13.9673   LearningRate 0.3872   Epoch: 2   Global Step: 23760   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:54:47,475-Speed 5950.61 samples/sec   Loss 13.9923   LearningRate 0.3871   Epoch: 2   Global Step: 23770   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:54:54,319-Speed 5985.89 samples/sec   Loss 13.9806   LearningRate 0.3871   Epoch: 2   Global Step: 23780   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:55:01,165-Speed 5984.85 samples/sec   Loss 14.0062   LearningRate 0.3870   Epoch: 2   Global Step: 23790   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:55:08,018-Speed 5977.92 samples/sec   Loss 13.9517   LearningRate 0.3870   Epoch: 2   Global Step: 23800   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:55:14,924-Speed 5932.48 samples/sec   Loss 14.0137   LearningRate 0.3869   Epoch: 2   Global Step: 23810   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:55:21,790-Speed 5966.88 samples/sec   Loss 13.9781   LearningRate 0.3869   Epoch: 2   Global Step: 23820   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:55:28,651-Speed 5971.06 samples/sec   Loss 14.0616   LearningRate 0.3869   Epoch: 2   Global Step: 23830   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:55:35,509-Speed 5973.98 samples/sec   Loss 13.9684   LearningRate 0.3868   Epoch: 2   Global Step: 23840   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:55:42,369-Speed 5972.28 samples/sec   Loss 13.9625   LearningRate 0.3868   Epoch: 2   Global Step: 23850   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:55:49,225-Speed 5975.17 samples/sec   Loss 13.9224   LearningRate 0.3867   Epoch: 2   Global Step: 23860   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:55:56,079-Speed 5979.45 samples/sec   Loss 14.0215   LearningRate 0.3867   Epoch: 2   Global Step: 23870   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:56:02,943-Speed 5968.85 samples/sec   Loss 14.0371   LearningRate 0.3866   Epoch: 2   Global Step: 23880   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:56:09,821-Speed 5956.71 samples/sec   Loss 13.9542   LearningRate 0.3866   Epoch: 2   Global Step: 23890   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:56:16,692-Speed 5962.35 samples/sec   Loss 13.9151   LearningRate 0.3866   Epoch: 2   Global Step: 23900   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:56:23,543-Speed 5980.28 samples/sec   Loss 14.0051   LearningRate 0.3865   Epoch: 2   Global Step: 23910   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:56:30,393-Speed 5980.49 samples/sec   Loss 13.8735   LearningRate 0.3865   Epoch: 2   Global Step: 23920   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:56:37,279-Speed 5949.36 samples/sec   Loss 13.9737   LearningRate 0.3864   Epoch: 2   Global Step: 23930   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:56:44,134-Speed 5980.99 samples/sec   Loss 13.9383   LearningRate 0.3864   Epoch: 2   Global Step: 23940   Fp16 Grad Scale: 524288   Required: 36 hours
Training: 2022-01-08 00:56:50,983-Speed 5981.75 samples/sec   Loss 13.8777   LearningRate 0.3864   Epoch: 2   Global Step: 23950   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:56:57,852-Speed 5963.66 samples/sec   Loss 13.9657   LearningRate 0.3863   Epoch: 2   Global Step: 23960   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:57:04,716-Speed 5971.23 samples/sec   Loss 13.8737   LearningRate 0.3863   Epoch: 2   Global Step: 23970   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:57:11,567-Speed 5979.74 samples/sec   Loss 13.9538   LearningRate 0.3862   Epoch: 2   Global Step: 23980   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:57:18,442-Speed 5965.75 samples/sec   Loss 14.0127   LearningRate 0.3862   Epoch: 2   Global Step: 23990   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:57:25,297-Speed 5976.43 samples/sec   Loss 13.9917   LearningRate 0.3861   Epoch: 2   Global Step: 24000   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:57:32,148-Speed 5980.37 samples/sec   Loss 13.8718   LearningRate 0.3861   Epoch: 2   Global Step: 24010   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:57:38,998-Speed 5980.38 samples/sec   Loss 13.9299   LearningRate 0.3861   Epoch: 2   Global Step: 24020   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:57:45,883-Speed 5950.24 samples/sec   Loss 13.9139   LearningRate 0.3860   Epoch: 2   Global Step: 24030   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:57:52,758-Speed 5959.20 samples/sec   Loss 13.9320   LearningRate 0.3860   Epoch: 2   Global Step: 24040   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:57:59,624-Speed 5966.47 samples/sec   Loss 13.9672   LearningRate 0.3859   Epoch: 2   Global Step: 24050   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:58:06,476-Speed 5979.49 samples/sec   Loss 14.0092   LearningRate 0.3859   Epoch: 2   Global Step: 24060   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:58:13,357-Speed 5952.70 samples/sec   Loss 13.9779   LearningRate 0.3858   Epoch: 2   Global Step: 24070   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:58:20,245-Speed 5948.94 samples/sec   Loss 13.9416   LearningRate 0.3858   Epoch: 2   Global Step: 24080   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:58:27,095-Speed 5981.23 samples/sec   Loss 14.0203   LearningRate 0.3858   Epoch: 2   Global Step: 24090   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:58:33,962-Speed 5965.72 samples/sec   Loss 13.9513   LearningRate 0.3857   Epoch: 2   Global Step: 24100   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:58:40,824-Speed 5970.45 samples/sec   Loss 13.9184   LearningRate 0.3857   Epoch: 2   Global Step: 24110   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:58:47,690-Speed 5966.56 samples/sec   Loss 13.9361   LearningRate 0.3856   Epoch: 2   Global Step: 24120   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:58:54,558-Speed 5965.25 samples/sec   Loss 14.0023   LearningRate 0.3856   Epoch: 2   Global Step: 24130   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:59:01,407-Speed 5981.08 samples/sec   Loss 13.9258   LearningRate 0.3856   Epoch: 2   Global Step: 24140   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:59:08,249-Speed 5990.47 samples/sec   Loss 13.9046   LearningRate 0.3855   Epoch: 2   Global Step: 24150   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:59:15,136-Speed 5955.84 samples/sec   Loss 13.8546   LearningRate 0.3855   Epoch: 2   Global Step: 24160   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:59:22,022-Speed 5984.48 samples/sec   Loss 13.9088   LearningRate 0.3854   Epoch: 2   Global Step: 24170   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:59:28,871-Speed 5980.90 samples/sec   Loss 13.8380   LearningRate 0.3854   Epoch: 2   Global Step: 24180   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:59:35,719-Speed 5982.54 samples/sec   Loss 13.8560   LearningRate 0.3853   Epoch: 2   Global Step: 24190   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 00:59:42,563-Speed 5985.03 samples/sec   Loss 14.0121   LearningRate 0.3853   Epoch: 2   Global Step: 24200   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:59:49,427-Speed 5968.31 samples/sec   Loss 14.0285   LearningRate 0.3853   Epoch: 2   Global Step: 24210   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 00:59:56,282-Speed 5977.04 samples/sec   Loss 14.0161   LearningRate 0.3852   Epoch: 2   Global Step: 24220   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:00:03,141-Speed 5973.09 samples/sec   Loss 13.9133   LearningRate 0.3852   Epoch: 2   Global Step: 24230   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:00:09,984-Speed 5986.40 samples/sec   Loss 13.9593   LearningRate 0.3851   Epoch: 2   Global Step: 24240   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:00:16,839-Speed 5976.65 samples/sec   Loss 13.8857   LearningRate 0.3851   Epoch: 2   Global Step: 24250   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:00:23,682-Speed 5986.44 samples/sec   Loss 13.9293   LearningRate 0.3850   Epoch: 2   Global Step: 24260   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:00:30,557-Speed 5958.84 samples/sec   Loss 13.9162   LearningRate 0.3850   Epoch: 2   Global Step: 24270   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:00:37,412-Speed 5976.90 samples/sec   Loss 13.9691   LearningRate 0.3850   Epoch: 2   Global Step: 24280   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:00:44,259-Speed 5982.60 samples/sec   Loss 13.9539   LearningRate 0.3849   Epoch: 2   Global Step: 24290   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:00:51,112-Speed 5978.27 samples/sec   Loss 13.9704   LearningRate 0.3849   Epoch: 2   Global Step: 24300   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:00:57,969-Speed 5974.29 samples/sec   Loss 13.9181   LearningRate 0.3848   Epoch: 2   Global Step: 24310   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:01:04,858-Speed 5946.87 samples/sec   Loss 13.9254   LearningRate 0.3848   Epoch: 2   Global Step: 24320   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:01:11,711-Speed 5977.26 samples/sec   Loss 13.9765   LearningRate 0.3848   Epoch: 2   Global Step: 24330   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:01:18,569-Speed 5973.75 samples/sec   Loss 13.9073   LearningRate 0.3847   Epoch: 2   Global Step: 24340   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:01:25,425-Speed 5975.37 samples/sec   Loss 13.8822   LearningRate 0.3847   Epoch: 2   Global Step: 24350   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:01:32,276-Speed 5979.72 samples/sec   Loss 13.9290   LearningRate 0.3846   Epoch: 2   Global Step: 24360   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:01:39,222-Speed 5900.35 samples/sec   Loss 13.9210   LearningRate 0.3846   Epoch: 2   Global Step: 24370   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:01:46,078-Speed 5975.75 samples/sec   Loss 13.9001   LearningRate 0.3845   Epoch: 2   Global Step: 24380   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:01:52,931-Speed 5977.34 samples/sec   Loss 13.9704   LearningRate 0.3845   Epoch: 2   Global Step: 24390   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:01:59,772-Speed 5991.38 samples/sec   Loss 13.9966   LearningRate 0.3845   Epoch: 2   Global Step: 24400   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:02:06,633-Speed 5970.79 samples/sec   Loss 13.9597   LearningRate 0.3844   Epoch: 2   Global Step: 24410   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:02:13,506-Speed 5961.09 samples/sec   Loss 13.8545   LearningRate 0.3844   Epoch: 2   Global Step: 24420   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:02:20,377-Speed 5962.87 samples/sec   Loss 13.9128   LearningRate 0.3843   Epoch: 2   Global Step: 24430   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:02:27,232-Speed 5976.19 samples/sec   Loss 13.8978   LearningRate 0.3843   Epoch: 2   Global Step: 24440   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:02:34,066-Speed 5995.48 samples/sec   Loss 13.8966   LearningRate 0.3842   Epoch: 2   Global Step: 24450   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 01:02:40,915-Speed 5981.18 samples/sec   Loss 13.9321   LearningRate 0.3842   Epoch: 2   Global Step: 24460   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 01:02:47,771-Speed 5976.19 samples/sec   Loss 13.9218   LearningRate 0.3842   Epoch: 2   Global Step: 24470   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 01:02:54,630-Speed 5972.20 samples/sec   Loss 13.9966   LearningRate 0.3841   Epoch: 2   Global Step: 24480   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 01:03:01,496-Speed 5967.14 samples/sec   Loss 13.9001   LearningRate 0.3841   Epoch: 2   Global Step: 24490   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 01:03:08,346-Speed 5980.66 samples/sec   Loss 13.8621   LearningRate 0.3840   Epoch: 2   Global Step: 24500   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 01:03:15,195-Speed 5980.96 samples/sec   Loss 13.8579   LearningRate 0.3840   Epoch: 2   Global Step: 24510   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 01:03:22,046-Speed 5980.42 samples/sec   Loss 13.9412   LearningRate 0.3840   Epoch: 2   Global Step: 24520   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 01:03:28,901-Speed 5975.52 samples/sec   Loss 13.8765   LearningRate 0.3839   Epoch: 2   Global Step: 24530   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 01:03:35,751-Speed 5981.52 samples/sec   Loss 13.9083   LearningRate 0.3839   Epoch: 2   Global Step: 24540   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-01-08 01:03:42,605-Speed 5977.71 samples/sec   Loss 13.8435   LearningRate 0.3838   Epoch: 2   Global Step: 24550   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:03:49,459-Speed 5976.28 samples/sec   Loss 13.8170   LearningRate 0.3838   Epoch: 2   Global Step: 24560   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:03:56,314-Speed 5976.59 samples/sec   Loss 13.8885   LearningRate 0.3837   Epoch: 2   Global Step: 24570   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:04:03,163-Speed 5982.08 samples/sec   Loss 13.8344   LearningRate 0.3837   Epoch: 2   Global Step: 24580   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:04:10,015-Speed 5979.18 samples/sec   Loss 13.8733   LearningRate 0.3837   Epoch: 2   Global Step: 24590   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:04:16,896-Speed 5953.80 samples/sec   Loss 13.8753   LearningRate 0.3836   Epoch: 2   Global Step: 24600   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:04:23,759-Speed 5969.16 samples/sec   Loss 13.9488   LearningRate 0.3836   Epoch: 2   Global Step: 24610   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:04:30,629-Speed 5963.47 samples/sec   Loss 13.8709   LearningRate 0.3835   Epoch: 2   Global Step: 24620   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:04:37,482-Speed 5978.37 samples/sec   Loss 13.7750   LearningRate 0.3835   Epoch: 2   Global Step: 24630   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:04:44,332-Speed 5980.86 samples/sec   Loss 13.8968   LearningRate 0.3834   Epoch: 2   Global Step: 24640   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:04:51,178-Speed 5985.16 samples/sec   Loss 13.8664   LearningRate 0.3834   Epoch: 2   Global Step: 24650   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:04:58,045-Speed 5965.69 samples/sec   Loss 13.8703   LearningRate 0.3834   Epoch: 2   Global Step: 24660   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:05:04,922-Speed 5957.75 samples/sec   Loss 13.8455   LearningRate 0.3833   Epoch: 2   Global Step: 24670   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:05:11,780-Speed 5974.20 samples/sec   Loss 13.8968   LearningRate 0.3833   Epoch: 2   Global Step: 24680   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:05:18,636-Speed 5975.25 samples/sec   Loss 13.8740   LearningRate 0.3832   Epoch: 2   Global Step: 24690   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:05:25,502-Speed 5967.41 samples/sec   Loss 13.9227   LearningRate 0.3832   Epoch: 2   Global Step: 24700   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:05:32,369-Speed 5966.22 samples/sec   Loss 13.8361   LearningRate 0.3832   Epoch: 2   Global Step: 24710   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:05:39,246-Speed 5957.20 samples/sec   Loss 13.8741   LearningRate 0.3831   Epoch: 2   Global Step: 24720   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:05:46,107-Speed 5970.49 samples/sec   Loss 13.7809   LearningRate 0.3831   Epoch: 2   Global Step: 24730   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:05:52,976-Speed 5964.33 samples/sec   Loss 13.9455   LearningRate 0.3830   Epoch: 2   Global Step: 24740   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:05:59,841-Speed 5968.08 samples/sec   Loss 13.8800   LearningRate 0.3830   Epoch: 2   Global Step: 24750   Fp16 Grad Scale: 524288   Required: 36 hours
Training: 2022-01-08 01:06:06,708-Speed 5966.21 samples/sec   Loss 13.9415   LearningRate 0.3829   Epoch: 2   Global Step: 24760   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:06:13,564-Speed 5975.64 samples/sec   Loss 13.8512   LearningRate 0.3829   Epoch: 2   Global Step: 24770   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:06:20,422-Speed 5973.17 samples/sec   Loss 13.8832   LearningRate 0.3829   Epoch: 2   Global Step: 24780   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:06:27,334-Speed 5927.67 samples/sec   Loss 13.9719   LearningRate 0.3828   Epoch: 2   Global Step: 24790   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:06:34,189-Speed 5975.90 samples/sec   Loss 13.8622   LearningRate 0.3828   Epoch: 2   Global Step: 24800   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:06:41,056-Speed 5966.02 samples/sec   Loss 13.7881   LearningRate 0.3827   Epoch: 2   Global Step: 24810   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:06:47,911-Speed 5976.49 samples/sec   Loss 13.8267   LearningRate 0.3827   Epoch: 2   Global Step: 24820   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:06:54,753-Speed 5987.47 samples/sec   Loss 13.7402   LearningRate 0.3827   Epoch: 2   Global Step: 24830   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:07:01,709-Speed 5891.89 samples/sec   Loss 13.9169   LearningRate 0.3826   Epoch: 2   Global Step: 24840   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:07:08,577-Speed 5965.63 samples/sec   Loss 13.8370   LearningRate 0.3826   Epoch: 2   Global Step: 24850   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:07:15,714-Speed 5739.55 samples/sec   Loss 13.8606   LearningRate 0.3825   Epoch: 2   Global Step: 24860   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:07:22,557-Speed 5986.87 samples/sec   Loss 13.7941   LearningRate 0.3825   Epoch: 2   Global Step: 24870   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:07:29,410-Speed 5978.69 samples/sec   Loss 13.9272   LearningRate 0.3824   Epoch: 2   Global Step: 24880   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:07:36,277-Speed 5965.91 samples/sec   Loss 13.8284   LearningRate 0.3824   Epoch: 2   Global Step: 24890   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:07:43,130-Speed 5977.91 samples/sec   Loss 13.8517   LearningRate 0.3824   Epoch: 2   Global Step: 24900   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:07:49,999-Speed 5963.63 samples/sec   Loss 13.8566   LearningRate 0.3823   Epoch: 2   Global Step: 24910   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:07:56,850-Speed 5980.25 samples/sec   Loss 13.7817   LearningRate 0.3823   Epoch: 2   Global Step: 24920   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:08:03,730-Speed 5953.60 samples/sec   Loss 13.7324   LearningRate 0.3822   Epoch: 2   Global Step: 24930   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:08:10,590-Speed 5972.35 samples/sec   Loss 13.8816   LearningRate 0.3822   Epoch: 2   Global Step: 24940   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:08:17,448-Speed 5973.37 samples/sec   Loss 13.8050   LearningRate 0.3821   Epoch: 2   Global Step: 24950   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:08:24,323-Speed 5958.55 samples/sec   Loss 13.8838   LearningRate 0.3821   Epoch: 2   Global Step: 24960   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:08:31,193-Speed 5963.42 samples/sec   Loss 13.8795   LearningRate 0.3821   Epoch: 2   Global Step: 24970   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:08:38,058-Speed 5967.32 samples/sec   Loss 13.7811   LearningRate 0.3820   Epoch: 2   Global Step: 24980   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:08:44,923-Speed 5970.40 samples/sec   Loss 13.8022   LearningRate 0.3820   Epoch: 2   Global Step: 24990   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:08:51,809-Speed 5949.24 samples/sec   Loss 13.7938   LearningRate 0.3819   Epoch: 2   Global Step: 25000   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:09:19,057-[lfw][25000]XNorm: 22.584027
Training: 2022-01-08 01:09:19,058-[lfw][25000]Accuracy-Flip: 0.99650+-0.00252
Training: 2022-01-08 01:09:19,058-[lfw][25000]Accuracy-Highest: 0.99650
Training: 2022-01-08 01:09:50,553-[cfp_fp][25000]XNorm: 19.887459
Training: 2022-01-08 01:09:50,554-[cfp_fp][25000]Accuracy-Flip: 0.96957+-0.00823
Training: 2022-01-08 01:09:50,555-[cfp_fp][25000]Accuracy-Highest: 0.96957
Training: 2022-01-08 01:10:17,800-[agedb_30][25000]XNorm: 21.802798
Training: 2022-01-08 01:10:17,801-[agedb_30][25000]Accuracy-Flip: 0.95400+-0.01070
Training: 2022-01-08 01:10:17,802-[agedb_30][25000]Accuracy-Highest: 0.95400
Training: 2022-01-08 01:10:24,676-Speed 441.07 samples/sec   Loss 13.7654   LearningRate 0.3819   Epoch: 2   Global Step: 25010   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:10:31,513-Speed 5992.25 samples/sec   Loss 13.7905   LearningRate 0.3819   Epoch: 2   Global Step: 25020   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:10:38,348-Speed 5994.22 samples/sec   Loss 13.8612   LearningRate 0.3818   Epoch: 2   Global Step: 25030   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:10:45,224-Speed 5958.01 samples/sec   Loss 13.9032   LearningRate 0.3818   Epoch: 2   Global Step: 25040   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:10:52,082-Speed 5976.48 samples/sec   Loss 13.8590   LearningRate 0.3817   Epoch: 2   Global Step: 25050   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:10:58,932-Speed 5980.32 samples/sec   Loss 13.8154   LearningRate 0.3817   Epoch: 2   Global Step: 25060   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:11:05,799-Speed 5965.59 samples/sec   Loss 13.9317   LearningRate 0.3816   Epoch: 2   Global Step: 25070   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:11:12,679-Speed 5955.00 samples/sec   Loss 13.8109   LearningRate 0.3816   Epoch: 2   Global Step: 25080   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:11:26,647-Speed 2932.76 samples/sec   Loss 13.8697   LearningRate 0.3816   Epoch: 2   Global Step: 25090   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:11:33,480-Speed 5995.28 samples/sec   Loss 13.8030   LearningRate 0.3815   Epoch: 2   Global Step: 25100   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:11:40,325-Speed 5985.17 samples/sec   Loss 13.7888   LearningRate 0.3815   Epoch: 2   Global Step: 25110   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:11:47,174-Speed 5981.50 samples/sec   Loss 13.7858   LearningRate 0.3814   Epoch: 2   Global Step: 25120   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:11:54,021-Speed 5983.62 samples/sec   Loss 13.8480   LearningRate 0.3814   Epoch: 2   Global Step: 25130   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:12:00,892-Speed 5962.76 samples/sec   Loss 13.8866   LearningRate 0.3814   Epoch: 2   Global Step: 25140   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:12:07,754-Speed 5969.50 samples/sec   Loss 13.7054   LearningRate 0.3813   Epoch: 2   Global Step: 25150   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:12:14,603-Speed 5983.54 samples/sec   Loss 13.8872   LearningRate 0.3813   Epoch: 2   Global Step: 25160   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:12:21,459-Speed 5976.18 samples/sec   Loss 13.7366   LearningRate 0.3812   Epoch: 2   Global Step: 25170   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:12:28,315-Speed 5975.28 samples/sec   Loss 13.7538   LearningRate 0.3812   Epoch: 2   Global Step: 25180   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:12:35,170-Speed 5976.50 samples/sec   Loss 13.7342   LearningRate 0.3811   Epoch: 2   Global Step: 25190   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:12:42,021-Speed 5979.92 samples/sec   Loss 13.8342   LearningRate 0.3811   Epoch: 2   Global Step: 25200   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:12:48,879-Speed 5975.05 samples/sec   Loss 13.7796   LearningRate 0.3811   Epoch: 2   Global Step: 25210   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:12:55,731-Speed 5978.64 samples/sec   Loss 13.8360   LearningRate 0.3810   Epoch: 2   Global Step: 25220   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:13:02,579-Speed 5982.15 samples/sec   Loss 13.8828   LearningRate 0.3810   Epoch: 2   Global Step: 25230   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:13:09,423-Speed 5986.42 samples/sec   Loss 13.7699   LearningRate 0.3809   Epoch: 2   Global Step: 25240   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:13:16,274-Speed 5980.09 samples/sec   Loss 13.7515   LearningRate 0.3809   Epoch: 2   Global Step: 25250   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:13:23,150-Speed 5958.22 samples/sec   Loss 13.8051   LearningRate 0.3809   Epoch: 2   Global Step: 25260   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:13:30,001-Speed 5980.62 samples/sec   Loss 13.8497   LearningRate 0.3808   Epoch: 2   Global Step: 25270   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:13:36,844-Speed 5986.95 samples/sec   Loss 13.8222   LearningRate 0.3808   Epoch: 2   Global Step: 25280   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:13:43,689-Speed 5984.25 samples/sec   Loss 13.8502   LearningRate 0.3807   Epoch: 2   Global Step: 25290   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:13:50,561-Speed 5961.99 samples/sec   Loss 13.8057   LearningRate 0.3807   Epoch: 2   Global Step: 25300   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:13:57,412-Speed 5980.11 samples/sec   Loss 13.7627   LearningRate 0.3806   Epoch: 2   Global Step: 25310   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:14:04,266-Speed 5976.66 samples/sec   Loss 13.8250   LearningRate 0.3806   Epoch: 2   Global Step: 25320   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:14:11,117-Speed 5980.29 samples/sec   Loss 13.7984   LearningRate 0.3806   Epoch: 2   Global Step: 25330   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:14:17,992-Speed 5958.33 samples/sec   Loss 13.6955   LearningRate 0.3805   Epoch: 2   Global Step: 25340   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:14:24,835-Speed 5988.68 samples/sec   Loss 13.7030   LearningRate 0.3805   Epoch: 2   Global Step: 25350   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:14:31,699-Speed 5968.53 samples/sec   Loss 13.8835   LearningRate 0.3804   Epoch: 2   Global Step: 25360   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:14:38,538-Speed 5990.11 samples/sec   Loss 13.8349   LearningRate 0.3804   Epoch: 2   Global Step: 25370   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:14:45,379-Speed 5988.08 samples/sec   Loss 13.8715   LearningRate 0.3804   Epoch: 2   Global Step: 25380   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:14:52,235-Speed 5975.56 samples/sec   Loss 13.7877   LearningRate 0.3803   Epoch: 2   Global Step: 25390   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:14:59,080-Speed 5985.32 samples/sec   Loss 13.7411   LearningRate 0.3803   Epoch: 2   Global Step: 25400   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:15:05,923-Speed 5986.23 samples/sec   Loss 13.8143   LearningRate 0.3802   Epoch: 2   Global Step: 25410   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:15:12,774-Speed 5980.07 samples/sec   Loss 13.7165   LearningRate 0.3802   Epoch: 2   Global Step: 25420   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:15:19,627-Speed 5978.21 samples/sec   Loss 13.8177   LearningRate 0.3801   Epoch: 2   Global Step: 25430   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:15:26,479-Speed 5978.62 samples/sec   Loss 13.7081   LearningRate 0.3801   Epoch: 2   Global Step: 25440   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:15:33,327-Speed 5982.53 samples/sec   Loss 13.8284   LearningRate 0.3801   Epoch: 2   Global Step: 25450   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:15:40,186-Speed 5973.38 samples/sec   Loss 13.7661   LearningRate 0.3800   Epoch: 2   Global Step: 25460   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:15:47,045-Speed 5972.81 samples/sec   Loss 13.8034   LearningRate 0.3800   Epoch: 2   Global Step: 25470   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:15:53,896-Speed 5979.93 samples/sec   Loss 13.7365   LearningRate 0.3799   Epoch: 2   Global Step: 25480   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:16:00,751-Speed 5976.13 samples/sec   Loss 13.7591   LearningRate 0.3799   Epoch: 2   Global Step: 25490   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:16:07,607-Speed 5975.53 samples/sec   Loss 13.7670   LearningRate 0.3798   Epoch: 2   Global Step: 25500   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:16:14,443-Speed 5992.56 samples/sec   Loss 13.7296   LearningRate 0.3798   Epoch: 2   Global Step: 25510   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:16:21,297-Speed 5976.81 samples/sec   Loss 13.7742   LearningRate 0.3798   Epoch: 2   Global Step: 25520   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:16:28,145-Speed 5982.03 samples/sec   Loss 13.8177   LearningRate 0.3797   Epoch: 2   Global Step: 25530   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:16:34,995-Speed 5980.95 samples/sec   Loss 13.7881   LearningRate 0.3797   Epoch: 2   Global Step: 25540   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:16:41,847-Speed 5978.88 samples/sec   Loss 13.7583   LearningRate 0.3796   Epoch: 2   Global Step: 25550   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:16:48,725-Speed 5956.70 samples/sec   Loss 13.6832   LearningRate 0.3796   Epoch: 2   Global Step: 25560   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:16:55,571-Speed 5984.01 samples/sec   Loss 13.7041   LearningRate 0.3796   Epoch: 2   Global Step: 25570   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:17:02,412-Speed 5988.06 samples/sec   Loss 13.8197   LearningRate 0.3795   Epoch: 2   Global Step: 25580   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:17:09,258-Speed 5985.06 samples/sec   Loss 13.7730   LearningRate 0.3795   Epoch: 2   Global Step: 25590   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:17:16,116-Speed 5972.95 samples/sec   Loss 13.8609   LearningRate 0.3794   Epoch: 2   Global Step: 25600   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:17:22,965-Speed 5980.94 samples/sec   Loss 13.7668   LearningRate 0.3794   Epoch: 2   Global Step: 25610   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:17:29,812-Speed 5983.69 samples/sec   Loss 13.8410   LearningRate 0.3793   Epoch: 2   Global Step: 25620   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:17:36,656-Speed 5985.83 samples/sec   Loss 13.7376   LearningRate 0.3793   Epoch: 2   Global Step: 25630   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:17:43,556-Speed 5938.06 samples/sec   Loss 13.7392   LearningRate 0.3793   Epoch: 2   Global Step: 25640   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:17:50,415-Speed 5975.32 samples/sec   Loss 13.8135   LearningRate 0.3792   Epoch: 2   Global Step: 25650   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:17:57,285-Speed 5964.65 samples/sec   Loss 13.8126   LearningRate 0.3792   Epoch: 2   Global Step: 25660   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:18:04,158-Speed 5960.07 samples/sec   Loss 13.7715   LearningRate 0.3791   Epoch: 2   Global Step: 25670   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:18:11,045-Speed 5948.65 samples/sec   Loss 13.7694   LearningRate 0.3791   Epoch: 2   Global Step: 25680   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:18:17,903-Speed 5974.29 samples/sec   Loss 13.7822   LearningRate 0.3791   Epoch: 2   Global Step: 25690   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:18:24,753-Speed 5980.95 samples/sec   Loss 13.7723   LearningRate 0.3790   Epoch: 2   Global Step: 25700   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:18:31,608-Speed 5976.12 samples/sec   Loss 13.7219   LearningRate 0.3790   Epoch: 2   Global Step: 25710   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:18:38,471-Speed 5970.13 samples/sec   Loss 13.7251   LearningRate 0.3789   Epoch: 2   Global Step: 25720   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:18:45,325-Speed 5977.11 samples/sec   Loss 13.7227   LearningRate 0.3789   Epoch: 2   Global Step: 25730   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:18:52,175-Speed 5980.07 samples/sec   Loss 13.7346   LearningRate 0.3788   Epoch: 2   Global Step: 25740   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:18:59,065-Speed 5946.25 samples/sec   Loss 13.7497   LearningRate 0.3788   Epoch: 2   Global Step: 25750   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:19:05,924-Speed 5973.64 samples/sec   Loss 13.7231   LearningRate 0.3788   Epoch: 2   Global Step: 25760   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:19:12,799-Speed 5959.14 samples/sec   Loss 13.8171   LearningRate 0.3787   Epoch: 2   Global Step: 25770   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:19:19,666-Speed 5965.53 samples/sec   Loss 13.8529   LearningRate 0.3787   Epoch: 2   Global Step: 25780   Fp16 Grad Scale: 524288   Required: 36 hours
Training: 2022-01-08 01:19:26,562-Speed 5940.57 samples/sec   Loss 13.6829   LearningRate 0.3786   Epoch: 2   Global Step: 25790   Fp16 Grad Scale: 524288   Required: 36 hours
Training: 2022-01-08 01:19:33,404-Speed 5988.48 samples/sec   Loss 13.7335   LearningRate 0.3786   Epoch: 2   Global Step: 25800   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:19:40,290-Speed 5948.90 samples/sec   Loss 13.6739   LearningRate 0.3786   Epoch: 2   Global Step: 25810   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:19:47,149-Speed 5975.46 samples/sec   Loss 13.6900   LearningRate 0.3785   Epoch: 2   Global Step: 25820   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:19:54,012-Speed 5968.85 samples/sec   Loss 13.6428   LearningRate 0.3785   Epoch: 2   Global Step: 25830   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:20:00,880-Speed 5964.67 samples/sec   Loss 13.7507   LearningRate 0.3784   Epoch: 2   Global Step: 25840   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:20:07,772-Speed 5944.87 samples/sec   Loss 13.8341   LearningRate 0.3784   Epoch: 2   Global Step: 25850   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:20:14,638-Speed 5966.70 samples/sec   Loss 13.7054   LearningRate 0.3783   Epoch: 2   Global Step: 25860   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:20:22,094-Speed 5494.77 samples/sec   Loss 13.7126   LearningRate 0.3783   Epoch: 2   Global Step: 25870   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:20:28,950-Speed 5975.82 samples/sec   Loss 13.7429   LearningRate 0.3783   Epoch: 2   Global Step: 25880   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:20:35,823-Speed 5960.46 samples/sec   Loss 13.7527   LearningRate 0.3782   Epoch: 2   Global Step: 25890   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:20:42,716-Speed 5943.83 samples/sec   Loss 13.6022   LearningRate 0.3782   Epoch: 2   Global Step: 25900   Fp16 Grad Scale: 524288   Required: 36 hours
Training: 2022-01-08 01:20:49,571-Speed 5975.95 samples/sec   Loss 13.7720   LearningRate 0.3781   Epoch: 2   Global Step: 25910   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:20:56,424-Speed 5978.63 samples/sec   Loss 13.7792   LearningRate 0.3781   Epoch: 2   Global Step: 25920   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:21:03,278-Speed 5976.28 samples/sec   Loss 13.6985   LearningRate 0.3781   Epoch: 2   Global Step: 25930   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:21:10,154-Speed 5957.51 samples/sec   Loss 13.7622   LearningRate 0.3780   Epoch: 2   Global Step: 25940   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:21:17,011-Speed 5975.01 samples/sec   Loss 13.6394   LearningRate 0.3780   Epoch: 2   Global Step: 25950   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:21:23,866-Speed 5976.44 samples/sec   Loss 13.7463   LearningRate 0.3779   Epoch: 2   Global Step: 25960   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:21:30,737-Speed 5962.09 samples/sec   Loss 13.6804   LearningRate 0.3779   Epoch: 2   Global Step: 25970   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:21:37,620-Speed 5952.50 samples/sec   Loss 13.6947   LearningRate 0.3778   Epoch: 2   Global Step: 25980   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:21:44,480-Speed 5971.66 samples/sec   Loss 13.7531   LearningRate 0.3778   Epoch: 2   Global Step: 25990   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:21:51,331-Speed 5980.53 samples/sec   Loss 13.6580   LearningRate 0.3778   Epoch: 2   Global Step: 26000   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:21:58,188-Speed 5974.12 samples/sec   Loss 13.7681   LearningRate 0.3777   Epoch: 2   Global Step: 26010   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:22:05,037-Speed 5981.08 samples/sec   Loss 13.5717   LearningRate 0.3777   Epoch: 2   Global Step: 26020   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:22:11,889-Speed 5978.92 samples/sec   Loss 13.7537   LearningRate 0.3776   Epoch: 2   Global Step: 26030   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:22:18,750-Speed 5970.90 samples/sec   Loss 13.6724   LearningRate 0.3776   Epoch: 2   Global Step: 26040   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-01-08 01:22:25,651-Speed 5937.17 samples/sec   Loss 13.7839   LearningRate 0.3776   Epoch: 2   Global Step: 26050   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:22:32,511-Speed 5972.09 samples/sec   Loss 13.7636   LearningRate 0.3775   Epoch: 2   Global Step: 26060   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:22:39,381-Speed 5965.15 samples/sec   Loss 13.7235   LearningRate 0.3775   Epoch: 2   Global Step: 26070   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:22:46,237-Speed 5975.06 samples/sec   Loss 13.6640   LearningRate 0.3774   Epoch: 2   Global Step: 26080   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:22:53,091-Speed 5978.16 samples/sec   Loss 13.7696   LearningRate 0.3774   Epoch: 2   Global Step: 26090   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:22:59,973-Speed 5952.19 samples/sec   Loss 13.7369   LearningRate 0.3773   Epoch: 2   Global Step: 26100   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:23:06,866-Speed 5943.79 samples/sec   Loss 13.7038   LearningRate 0.3773   Epoch: 2   Global Step: 26110   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:23:13,720-Speed 5977.39 samples/sec   Loss 13.7467   LearningRate 0.3773   Epoch: 2   Global Step: 26120   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:23:20,576-Speed 5974.89 samples/sec   Loss 13.6879   LearningRate 0.3772   Epoch: 2   Global Step: 26130   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:23:27,472-Speed 5944.33 samples/sec   Loss 13.6172   LearningRate 0.3772   Epoch: 2   Global Step: 26140   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:23:34,312-Speed 5989.65 samples/sec   Loss 13.6925   LearningRate 0.3771   Epoch: 2   Global Step: 26150   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:23:41,185-Speed 5960.88 samples/sec   Loss 13.7113   LearningRate 0.3771   Epoch: 2   Global Step: 26160   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:23:48,041-Speed 5975.93 samples/sec   Loss 13.7046   LearningRate 0.3771   Epoch: 2   Global Step: 26170   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:23:54,896-Speed 5976.87 samples/sec   Loss 13.7839   LearningRate 0.3770   Epoch: 2   Global Step: 26180   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:24:01,768-Speed 5961.31 samples/sec   Loss 13.6862   LearningRate 0.3770   Epoch: 2   Global Step: 26190   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:24:08,632-Speed 5968.79 samples/sec   Loss 13.6857   LearningRate 0.3769   Epoch: 2   Global Step: 26200   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:24:15,553-Speed 5919.05 samples/sec   Loss 13.6483   LearningRate 0.3769   Epoch: 2   Global Step: 26210   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-01-08 01:24:22,463-Speed 5930.09 samples/sec   Loss 13.7143   LearningRate 0.3768   Epoch: 2   Global Step: 26220   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:24:29,380-Speed 5923.16 samples/sec   Loss 13.7059   LearningRate 0.3768   Epoch: 2   Global Step: 26230   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:24:36,283-Speed 5934.18 samples/sec   Loss 13.6342   LearningRate 0.3768   Epoch: 2   Global Step: 26240   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:24:43,168-Speed 5950.38 samples/sec   Loss 13.7483   LearningRate 0.3767   Epoch: 2   Global Step: 26250   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:24:50,014-Speed 5984.41 samples/sec   Loss 13.7296   LearningRate 0.3767   Epoch: 2   Global Step: 26260   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:24:56,862-Speed 5982.28 samples/sec   Loss 13.7665   LearningRate 0.3766   Epoch: 2   Global Step: 26270   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:25:03,707-Speed 5984.81 samples/sec   Loss 13.7506   LearningRate 0.3766   Epoch: 2   Global Step: 26280   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:25:10,571-Speed 5968.08 samples/sec   Loss 13.7810   LearningRate 0.3766   Epoch: 2   Global Step: 26290   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:25:17,447-Speed 5958.12 samples/sec   Loss 13.6429   LearningRate 0.3765   Epoch: 2   Global Step: 26300   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:25:24,300-Speed 5979.18 samples/sec   Loss 13.6361   LearningRate 0.3765   Epoch: 2   Global Step: 26310   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:25:33,187-Speed 5978.81 samples/sec   Loss 13.6743   LearningRate 0.3764   Epoch: 2   Global Step: 26320   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:25:40,040-Speed 5978.05 samples/sec   Loss 13.7593   LearningRate 0.3764   Epoch: 2   Global Step: 26330   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:25:46,908-Speed 5964.23 samples/sec   Loss 13.7227   LearningRate 0.3763   Epoch: 2   Global Step: 26340   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:25:53,774-Speed 5967.58 samples/sec   Loss 13.7956   LearningRate 0.3763   Epoch: 2   Global Step: 26350   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:26:00,647-Speed 5960.48 samples/sec   Loss 13.6381   LearningRate 0.3763   Epoch: 2   Global Step: 26360   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:26:07,515-Speed 5965.31 samples/sec   Loss 13.6688   LearningRate 0.3762   Epoch: 2   Global Step: 26370   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:26:14,365-Speed 5980.57 samples/sec   Loss 13.7191   LearningRate 0.3762   Epoch: 2   Global Step: 26380   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:26:21,218-Speed 5977.88 samples/sec   Loss 13.7467   LearningRate 0.3761   Epoch: 2   Global Step: 26390   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:26:28,104-Speed 5949.69 samples/sec   Loss 13.8379   LearningRate 0.3761   Epoch: 2   Global Step: 26400   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:26:34,953-Speed 5981.45 samples/sec   Loss 13.6658   LearningRate 0.3761   Epoch: 2   Global Step: 26410   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:26:41,825-Speed 5961.48 samples/sec   Loss 13.6465   LearningRate 0.3760   Epoch: 2   Global Step: 26420   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:26:48,693-Speed 5965.68 samples/sec   Loss 13.6978   LearningRate 0.3760   Epoch: 2   Global Step: 26430   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:26:55,542-Speed 5980.67 samples/sec   Loss 13.7322   LearningRate 0.3759   Epoch: 2   Global Step: 26440   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:27:02,394-Speed 5978.87 samples/sec   Loss 13.6750   LearningRate 0.3759   Epoch: 2   Global Step: 26450   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:27:09,255-Speed 5973.63 samples/sec   Loss 13.7441   LearningRate 0.3758   Epoch: 2   Global Step: 26460   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:27:16,117-Speed 5970.80 samples/sec   Loss 13.7223   LearningRate 0.3758   Epoch: 2   Global Step: 26470   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:27:22,988-Speed 5961.85 samples/sec   Loss 13.6644   LearningRate 0.3758   Epoch: 2   Global Step: 26480   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:27:29,844-Speed 5975.65 samples/sec   Loss 13.5712   LearningRate 0.3757   Epoch: 2   Global Step: 26490   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:27:36,701-Speed 5975.17 samples/sec   Loss 13.6912   LearningRate 0.3757   Epoch: 2   Global Step: 26500   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:27:43,563-Speed 5970.91 samples/sec   Loss 13.6566   LearningRate 0.3756   Epoch: 2   Global Step: 26510   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:27:50,417-Speed 5976.70 samples/sec   Loss 13.6912   LearningRate 0.3756   Epoch: 2   Global Step: 26520   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:27:57,301-Speed 5951.15 samples/sec   Loss 13.6387   LearningRate 0.3756   Epoch: 2   Global Step: 26530   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:28:04,173-Speed 5961.36 samples/sec   Loss 13.5748   LearningRate 0.3755   Epoch: 2   Global Step: 26540   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:28:11,040-Speed 5966.28 samples/sec   Loss 13.6015   LearningRate 0.3755   Epoch: 2   Global Step: 26550   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:28:17,890-Speed 5980.76 samples/sec   Loss 13.5567   LearningRate 0.3754   Epoch: 2   Global Step: 26560   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:28:24,750-Speed 5971.89 samples/sec   Loss 13.6832   LearningRate 0.3754   Epoch: 2   Global Step: 26570   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:28:31,612-Speed 5970.01 samples/sec   Loss 13.6644   LearningRate 0.3754   Epoch: 2   Global Step: 26580   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:28:38,471-Speed 5973.62 samples/sec   Loss 13.5560   LearningRate 0.3753   Epoch: 2   Global Step: 26590   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:28:45,338-Speed 5966.22 samples/sec   Loss 13.7581   LearningRate 0.3753   Epoch: 2   Global Step: 26600   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:28:52,201-Speed 5968.49 samples/sec   Loss 13.7183   LearningRate 0.3752   Epoch: 2   Global Step: 26610   Fp16 Grad Scale: 524288   Required: 35 hours
Training: 2022-01-08 01:28:59,079-Speed 5958.26 samples/sec   Loss 13.5831   LearningRate 0.3752   Epoch: 2   Global Step: 26620   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:29:05,942-Speed 5969.69 samples/sec   Loss 13.7186   LearningRate 0.3751   Epoch: 2   Global Step: 26630   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:29:12,800-Speed 5973.56 samples/sec   Loss 13.6775   LearningRate 0.3751   Epoch: 2   Global Step: 26640   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:29:19,654-Speed 5977.25 samples/sec   Loss 13.6092   LearningRate 0.3751   Epoch: 2   Global Step: 26650   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:29:26,516-Speed 5970.31 samples/sec   Loss 13.6159   LearningRate 0.3750   Epoch: 2   Global Step: 26660   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:29:33,375-Speed 5973.45 samples/sec   Loss 13.7130   LearningRate 0.3750   Epoch: 2   Global Step: 26670   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:29:40,253-Speed 5960.20 samples/sec   Loss 13.6725   LearningRate 0.3749   Epoch: 2   Global Step: 26680   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:29:47,108-Speed 5976.07 samples/sec   Loss 13.6422   LearningRate 0.3749   Epoch: 2   Global Step: 26690   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:29:53,964-Speed 5975.15 samples/sec   Loss 13.5937   LearningRate 0.3749   Epoch: 2   Global Step: 26700   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:30:00,825-Speed 5971.18 samples/sec   Loss 13.6038   LearningRate 0.3748   Epoch: 2   Global Step: 26710   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:30:07,705-Speed 5954.59 samples/sec   Loss 13.6693   LearningRate 0.3748   Epoch: 2   Global Step: 26720   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:30:14,584-Speed 5956.51 samples/sec   Loss 13.6837   LearningRate 0.3747   Epoch: 2   Global Step: 26730   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:30:21,552-Speed 5879.76 samples/sec   Loss 13.5883   LearningRate 0.3747   Epoch: 2   Global Step: 26740   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:30:28,442-Speed 5946.26 samples/sec   Loss 13.6603   LearningRate 0.3746   Epoch: 2   Global Step: 26750   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:30:35,310-Speed 5964.64 samples/sec   Loss 13.6101   LearningRate 0.3746   Epoch: 2   Global Step: 26760   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:30:42,162-Speed 5979.19 samples/sec   Loss 13.6753   LearningRate 0.3746   Epoch: 2   Global Step: 26770   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:30:49,033-Speed 5962.79 samples/sec   Loss 13.6268   LearningRate 0.3745   Epoch: 2   Global Step: 26780   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:30:55,920-Speed 5948.76 samples/sec   Loss 13.6313   LearningRate 0.3745   Epoch: 2   Global Step: 26790   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:31:02,778-Speed 5973.43 samples/sec   Loss 13.6017   LearningRate 0.3744   Epoch: 2   Global Step: 26800   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:31:09,632-Speed 5977.23 samples/sec   Loss 13.6329   LearningRate 0.3744   Epoch: 2   Global Step: 26810   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:31:16,486-Speed 5977.81 samples/sec   Loss 13.6434   LearningRate 0.3744   Epoch: 2   Global Step: 26820   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:31:23,338-Speed 5979.68 samples/sec   Loss 13.5879   LearningRate 0.3743   Epoch: 2   Global Step: 26830   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:31:30,199-Speed 5970.64 samples/sec   Loss 13.6624   LearningRate 0.3743   Epoch: 2   Global Step: 26840   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:31:37,065-Speed 5968.58 samples/sec   Loss 13.6062   LearningRate 0.3742   Epoch: 2   Global Step: 26850   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:31:43,919-Speed 5977.09 samples/sec   Loss 13.6581   LearningRate 0.3742   Epoch: 2   Global Step: 26860   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:31:50,784-Speed 5968.21 samples/sec   Loss 13.6614   LearningRate 0.3741   Epoch: 2   Global Step: 26870   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:31:57,659-Speed 5958.51 samples/sec   Loss 13.6304   LearningRate 0.3741   Epoch: 2   Global Step: 26880   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:32:04,542-Speed 5951.06 samples/sec   Loss 13.6447   LearningRate 0.3741   Epoch: 2   Global Step: 26890   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:32:11,401-Speed 5972.51 samples/sec   Loss 13.6507   LearningRate 0.3740   Epoch: 2   Global Step: 26900   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:32:18,265-Speed 5970.33 samples/sec   Loss 13.5275   LearningRate 0.3740   Epoch: 2   Global Step: 26910   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:32:25,124-Speed 5972.68 samples/sec   Loss 13.5761   LearningRate 0.3739   Epoch: 2   Global Step: 26920   Fp16 Grad Scale: 524288   Required: 35 hours
Training: 2022-01-08 01:32:31,992-Speed 5964.72 samples/sec   Loss 13.5697   LearningRate 0.3739   Epoch: 2   Global Step: 26930   Fp16 Grad Scale: 524288   Required: 35 hours
Training: 2022-01-08 01:32:38,844-Speed 5979.18 samples/sec   Loss 13.6483   LearningRate 0.3739   Epoch: 2   Global Step: 26940   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:32:45,695-Speed 5979.02 samples/sec   Loss 13.4729   LearningRate 0.3738   Epoch: 2   Global Step: 26950   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:32:52,546-Speed 5980.06 samples/sec   Loss 13.5383   LearningRate 0.3738   Epoch: 2   Global Step: 26960   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:32:59,400-Speed 5976.75 samples/sec   Loss 13.6330   LearningRate 0.3737   Epoch: 2   Global Step: 26970   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:33:06,255-Speed 5977.19 samples/sec   Loss 13.6260   LearningRate 0.3737   Epoch: 2   Global Step: 26980   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:33:13,084-Speed 5998.28 samples/sec   Loss 13.5772   LearningRate 0.3737   Epoch: 2   Global Step: 26990   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:33:19,935-Speed 5980.41 samples/sec   Loss 13.6540   LearningRate 0.3736   Epoch: 2   Global Step: 27000   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:33:26,782-Speed 5982.67 samples/sec   Loss 13.6177   LearningRate 0.3736   Epoch: 2   Global Step: 27010   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:33:33,656-Speed 5960.48 samples/sec   Loss 13.6007   LearningRate 0.3735   Epoch: 2   Global Step: 27020   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:33:40,511-Speed 5975.91 samples/sec   Loss 13.5825   LearningRate 0.3735   Epoch: 2   Global Step: 27030   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:33:47,358-Speed 5983.35 samples/sec   Loss 13.5947   LearningRate 0.3734   Epoch: 2   Global Step: 27040   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:33:54,214-Speed 5975.79 samples/sec   Loss 13.6225   LearningRate 0.3734   Epoch: 2   Global Step: 27050   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:34:01,072-Speed 5973.66 samples/sec   Loss 13.5882   LearningRate 0.3734   Epoch: 2   Global Step: 27060   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:34:07,921-Speed 5982.41 samples/sec   Loss 13.7022   LearningRate 0.3733   Epoch: 2   Global Step: 27070   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:34:14,779-Speed 5974.15 samples/sec   Loss 13.6660   LearningRate 0.3733   Epoch: 2   Global Step: 27080   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:34:21,626-Speed 5982.79 samples/sec   Loss 13.5882   LearningRate 0.3732   Epoch: 2   Global Step: 27090   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:34:28,493-Speed 5966.53 samples/sec   Loss 13.5939   LearningRate 0.3732   Epoch: 2   Global Step: 27100   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:34:35,353-Speed 5971.99 samples/sec   Loss 13.6947   LearningRate 0.3732   Epoch: 2   Global Step: 27110   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:34:42,204-Speed 5979.63 samples/sec   Loss 13.5980   LearningRate 0.3731   Epoch: 2   Global Step: 27120   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:34:49,063-Speed 5972.76 samples/sec   Loss 13.5703   LearningRate 0.3731   Epoch: 2   Global Step: 27130   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:34:55,921-Speed 5973.80 samples/sec   Loss 13.5803   LearningRate 0.3730   Epoch: 2   Global Step: 27140   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:35:02,785-Speed 5968.57 samples/sec   Loss 13.6238   LearningRate 0.3730   Epoch: 2   Global Step: 27150   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:35:09,639-Speed 5977.68 samples/sec   Loss 13.6382   LearningRate 0.3729   Epoch: 2   Global Step: 27160   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:35:16,484-Speed 5983.93 samples/sec   Loss 13.6828   LearningRate 0.3729   Epoch: 2   Global Step: 27170   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:35:23,362-Speed 5957.03 samples/sec   Loss 13.5840   LearningRate 0.3729   Epoch: 2   Global Step: 27180   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:35:30,218-Speed 5975.53 samples/sec   Loss 13.5437   LearningRate 0.3728   Epoch: 2   Global Step: 27190   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:35:37,089-Speed 5961.42 samples/sec   Loss 13.6370   LearningRate 0.3728   Epoch: 2   Global Step: 27200   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:35:43,955-Speed 5967.05 samples/sec   Loss 13.6725   LearningRate 0.3727   Epoch: 2   Global Step: 27210   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:35:50,790-Speed 5993.95 samples/sec   Loss 13.6890   LearningRate 0.3727   Epoch: 2   Global Step: 27220   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:35:57,657-Speed 5965.49 samples/sec   Loss 13.5851   LearningRate 0.3727   Epoch: 2   Global Step: 27230   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:36:04,513-Speed 5976.12 samples/sec   Loss 13.5831   LearningRate 0.3726   Epoch: 2   Global Step: 27240   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:36:11,678-Speed 5717.63 samples/sec   Loss 13.6335   LearningRate 0.3726   Epoch: 2   Global Step: 27250   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:36:18,568-Speed 5945.63 samples/sec   Loss 13.5923   LearningRate 0.3725   Epoch: 2   Global Step: 27260   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:36:25,439-Speed 5962.97 samples/sec   Loss 13.6661   LearningRate 0.3725   Epoch: 2   Global Step: 27270   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:36:32,304-Speed 5968.07 samples/sec   Loss 13.5430   LearningRate 0.3725   Epoch: 2   Global Step: 27280   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:36:39,183-Speed 5954.90 samples/sec   Loss 13.5820   LearningRate 0.3724   Epoch: 2   Global Step: 27290   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:36:46,037-Speed 5977.97 samples/sec   Loss 13.5875   LearningRate 0.3724   Epoch: 2   Global Step: 27300   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:36:52,920-Speed 5966.02 samples/sec   Loss 13.5783   LearningRate 0.3723   Epoch: 2   Global Step: 27310   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:36:59,793-Speed 5961.45 samples/sec   Loss 13.5867   LearningRate 0.3723   Epoch: 2   Global Step: 27320   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:37:06,652-Speed 5972.71 samples/sec   Loss 13.5410   LearningRate 0.3722   Epoch: 2   Global Step: 27330   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:37:13,527-Speed 5959.99 samples/sec   Loss 13.6728   LearningRate 0.3722   Epoch: 2   Global Step: 27340   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:37:20,399-Speed 5961.71 samples/sec   Loss 13.6069   LearningRate 0.3722   Epoch: 2   Global Step: 27350   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:37:27,288-Speed 5947.47 samples/sec   Loss 13.5980   LearningRate 0.3721   Epoch: 2   Global Step: 27360   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:37:34,140-Speed 5978.86 samples/sec   Loss 13.5655   LearningRate 0.3721   Epoch: 2   Global Step: 27370   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:37:41,006-Speed 5966.86 samples/sec   Loss 13.5608   LearningRate 0.3720   Epoch: 2   Global Step: 27380   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:37:47,864-Speed 5973.83 samples/sec   Loss 13.5247   LearningRate 0.3720   Epoch: 2   Global Step: 27390   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:37:54,760-Speed 5944.63 samples/sec   Loss 13.5848   LearningRate 0.3720   Epoch: 2   Global Step: 27400   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:38:01,627-Speed 5965.74 samples/sec   Loss 13.5632   LearningRate 0.3719   Epoch: 2   Global Step: 27410   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:38:08,488-Speed 5970.46 samples/sec   Loss 13.5995   LearningRate 0.3719   Epoch: 2   Global Step: 27420   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:38:15,387-Speed 5938.92 samples/sec   Loss 13.5097   LearningRate 0.3718   Epoch: 2   Global Step: 27430   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:38:22,235-Speed 5982.16 samples/sec   Loss 13.5352   LearningRate 0.3718   Epoch: 2   Global Step: 27440   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:38:29,098-Speed 5971.46 samples/sec   Loss 13.4810   LearningRate 0.3717   Epoch: 2   Global Step: 27450   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:38:35,959-Speed 5974.27 samples/sec   Loss 13.5736   LearningRate 0.3717   Epoch: 2   Global Step: 27460   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:38:42,829-Speed 5963.17 samples/sec   Loss 13.5299   LearningRate 0.3717   Epoch: 2   Global Step: 27470   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:38:49,699-Speed 5962.83 samples/sec   Loss 13.5963   LearningRate 0.3716   Epoch: 2   Global Step: 27480   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:38:56,571-Speed 5961.42 samples/sec   Loss 13.5331   LearningRate 0.3716   Epoch: 2   Global Step: 27490   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:39:03,425-Speed 5977.94 samples/sec   Loss 13.5437   LearningRate 0.3715   Epoch: 2   Global Step: 27500   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:39:10,399-Speed 5874.40 samples/sec   Loss 13.6184   LearningRate 0.3715   Epoch: 2   Global Step: 27510   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:39:17,278-Speed 5958.76 samples/sec   Loss 13.6004   LearningRate 0.3715   Epoch: 2   Global Step: 27520   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:39:24,129-Speed 5981.35 samples/sec   Loss 13.5138   LearningRate 0.3714   Epoch: 2   Global Step: 27530   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:39:30,974-Speed 5986.81 samples/sec   Loss 13.5524   LearningRate 0.3714   Epoch: 2   Global Step: 27540   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:39:37,845-Speed 5962.41 samples/sec   Loss 13.6089   LearningRate 0.3713   Epoch: 2   Global Step: 27550   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:39:44,703-Speed 5973.31 samples/sec   Loss 13.5777   LearningRate 0.3713   Epoch: 2   Global Step: 27560   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:39:51,559-Speed 5976.06 samples/sec   Loss 13.5756   LearningRate 0.3713   Epoch: 2   Global Step: 27570   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:39:58,417-Speed 5973.01 samples/sec   Loss 13.5295   LearningRate 0.3712   Epoch: 2   Global Step: 27580   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:40:05,299-Speed 5953.57 samples/sec   Loss 13.5208   LearningRate 0.3712   Epoch: 2   Global Step: 27590   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:40:12,149-Speed 5980.63 samples/sec   Loss 13.5552   LearningRate 0.3711   Epoch: 2   Global Step: 27600   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:40:19,004-Speed 5976.21 samples/sec   Loss 13.5900   LearningRate 0.3711   Epoch: 2   Global Step: 27610   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:40:25,857-Speed 5978.28 samples/sec   Loss 13.5339   LearningRate 0.3710   Epoch: 2   Global Step: 27620   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:40:32,707-Speed 5980.40 samples/sec   Loss 13.5503   LearningRate 0.3710   Epoch: 2   Global Step: 27630   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:40:39,560-Speed 5977.67 samples/sec   Loss 13.5785   LearningRate 0.3710   Epoch: 2   Global Step: 27640   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:40:46,396-Speed 5993.44 samples/sec   Loss 13.5139   LearningRate 0.3709   Epoch: 2   Global Step: 27650   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:40:53,234-Speed 5990.79 samples/sec   Loss 13.5656   LearningRate 0.3709   Epoch: 2   Global Step: 27660   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:41:00,075-Speed 5989.99 samples/sec   Loss 13.5723   LearningRate 0.3708   Epoch: 2   Global Step: 27670   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:41:06,942-Speed 5966.00 samples/sec   Loss 13.4718   LearningRate 0.3708   Epoch: 2   Global Step: 27680   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:41:13,794-Speed 5979.06 samples/sec   Loss 13.5156   LearningRate 0.3708   Epoch: 2   Global Step: 27690   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:41:20,655-Speed 5971.54 samples/sec   Loss 13.6010   LearningRate 0.3707   Epoch: 2   Global Step: 27700   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:41:27,512-Speed 5976.60 samples/sec   Loss 13.6054   LearningRate 0.3707   Epoch: 2   Global Step: 27710   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:41:34,360-Speed 5982.77 samples/sec   Loss 13.6347   LearningRate 0.3706   Epoch: 2   Global Step: 27720   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:41:41,234-Speed 5961.84 samples/sec   Loss 13.6214   LearningRate 0.3706   Epoch: 2   Global Step: 27730   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:41:48,087-Speed 5977.99 samples/sec   Loss 13.5388   LearningRate 0.3706   Epoch: 2   Global Step: 27740   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:41:54,950-Speed 5969.09 samples/sec   Loss 13.5262   LearningRate 0.3705   Epoch: 2   Global Step: 27750   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:42:01,822-Speed 5961.24 samples/sec   Loss 13.5633   LearningRate 0.3705   Epoch: 2   Global Step: 27760   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:42:08,675-Speed 5978.28 samples/sec   Loss 13.5856   LearningRate 0.3704   Epoch: 2   Global Step: 27770   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:42:15,521-Speed 5984.48 samples/sec   Loss 13.5831   LearningRate 0.3704   Epoch: 2   Global Step: 27780   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:42:22,393-Speed 5961.29 samples/sec   Loss 13.5050   LearningRate 0.3703   Epoch: 2   Global Step: 27790   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:42:29,257-Speed 5968.79 samples/sec   Loss 13.4935   LearningRate 0.3703   Epoch: 2   Global Step: 27800   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:42:36,110-Speed 5978.33 samples/sec   Loss 13.6546   LearningRate 0.3703   Epoch: 2   Global Step: 27810   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:42:42,959-Speed 5981.96 samples/sec   Loss 13.5166   LearningRate 0.3702   Epoch: 2   Global Step: 27820   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:42:49,798-Speed 5989.35 samples/sec   Loss 13.5519   LearningRate 0.3702   Epoch: 2   Global Step: 27830   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:42:56,644-Speed 5984.77 samples/sec   Loss 13.4997   LearningRate 0.3701   Epoch: 2   Global Step: 27840   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:43:03,521-Speed 5956.30 samples/sec   Loss 13.4868   LearningRate 0.3701   Epoch: 2   Global Step: 27850   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:43:10,371-Speed 5980.80 samples/sec   Loss 13.4742   LearningRate 0.3701   Epoch: 2   Global Step: 27860   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:43:17,218-Speed 5983.41 samples/sec   Loss 13.5114   LearningRate 0.3700   Epoch: 2   Global Step: 27870   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:43:24,080-Speed 5970.18 samples/sec   Loss 13.4435   LearningRate 0.3700   Epoch: 2   Global Step: 27880   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:43:30,949-Speed 5964.55 samples/sec   Loss 13.5445   LearningRate 0.3699   Epoch: 2   Global Step: 27890   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:43:37,808-Speed 5972.50 samples/sec   Loss 13.5522   LearningRate 0.3699   Epoch: 2   Global Step: 27900   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:43:44,649-Speed 5988.49 samples/sec   Loss 13.4796   LearningRate 0.3698   Epoch: 2   Global Step: 27910   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:43:51,512-Speed 5970.00 samples/sec   Loss 13.4651   LearningRate 0.3698   Epoch: 2   Global Step: 27920   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:43:58,362-Speed 5979.93 samples/sec   Loss 13.4314   LearningRate 0.3698   Epoch: 2   Global Step: 27930   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:44:05,216-Speed 5977.08 samples/sec   Loss 13.5116   LearningRate 0.3697   Epoch: 2   Global Step: 27940   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:44:12,070-Speed 5977.39 samples/sec   Loss 13.5071   LearningRate 0.3697   Epoch: 2   Global Step: 27950   Fp16 Grad Scale: 524288   Required: 35 hours
Training: 2022-01-08 01:44:18,918-Speed 5982.26 samples/sec   Loss 13.4833   LearningRate 0.3696   Epoch: 2   Global Step: 27960   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:44:25,765-Speed 5982.83 samples/sec   Loss 13.6038   LearningRate 0.3696   Epoch: 2   Global Step: 27970   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:44:32,620-Speed 5976.27 samples/sec   Loss 13.5886   LearningRate 0.3696   Epoch: 2   Global Step: 27980   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:44:39,507-Speed 5949.49 samples/sec   Loss 13.5425   LearningRate 0.3695   Epoch: 2   Global Step: 27990   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:44:46,361-Speed 5976.60 samples/sec   Loss 13.4736   LearningRate 0.3695   Epoch: 2   Global Step: 28000   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:44:53,289-Speed 5913.29 samples/sec   Loss 13.4979   LearningRate 0.3694   Epoch: 2   Global Step: 28010   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:45:00,154-Speed 5967.38 samples/sec   Loss 13.4849   LearningRate 0.3694   Epoch: 2   Global Step: 28020   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:45:07,023-Speed 5964.70 samples/sec   Loss 13.4291   LearningRate 0.3694   Epoch: 2   Global Step: 28030   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:45:13,877-Speed 5976.49 samples/sec   Loss 13.4423   LearningRate 0.3693   Epoch: 2   Global Step: 28040   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:45:20,733-Speed 5975.60 samples/sec   Loss 13.5695   LearningRate 0.3693   Epoch: 2   Global Step: 28050   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:45:27,587-Speed 5977.67 samples/sec   Loss 13.4650   LearningRate 0.3692   Epoch: 2   Global Step: 28060   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:45:34,429-Speed 5986.78 samples/sec   Loss 13.4988   LearningRate 0.3692   Epoch: 2   Global Step: 28070   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:45:41,282-Speed 5978.53 samples/sec   Loss 13.5946   LearningRate 0.3691   Epoch: 2   Global Step: 28080   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:45:48,150-Speed 5965.50 samples/sec   Loss 13.4639   LearningRate 0.3691   Epoch: 2   Global Step: 28090   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:45:55,010-Speed 5971.61 samples/sec   Loss 13.5149   LearningRate 0.3691   Epoch: 2   Global Step: 28100   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:46:01,864-Speed 5977.11 samples/sec   Loss 13.7073   LearningRate 0.3690   Epoch: 2   Global Step: 28110   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:46:08,741-Speed 5956.99 samples/sec   Loss 13.5509   LearningRate 0.3690   Epoch: 2   Global Step: 28120   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:46:15,604-Speed 5969.81 samples/sec   Loss 13.4748   LearningRate 0.3689   Epoch: 2   Global Step: 28130   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:46:22,468-Speed 5969.00 samples/sec   Loss 13.4289   LearningRate 0.3689   Epoch: 2   Global Step: 28140   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:46:29,323-Speed 5976.13 samples/sec   Loss 13.5339   LearningRate 0.3689   Epoch: 2   Global Step: 28150   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:46:36,276-Speed 5892.93 samples/sec   Loss 13.4037   LearningRate 0.3688   Epoch: 2   Global Step: 28160   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:46:43,211-Speed 5908.04 samples/sec   Loss 13.4124   LearningRate 0.3688   Epoch: 2   Global Step: 28170   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:46:50,078-Speed 5969.27 samples/sec   Loss 13.4899   LearningRate 0.3687   Epoch: 2   Global Step: 28180   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:46:56,944-Speed 5966.82 samples/sec   Loss 13.5522   LearningRate 0.3687   Epoch: 2   Global Step: 28190   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:47:03,813-Speed 5964.38 samples/sec   Loss 13.5106   LearningRate 0.3687   Epoch: 2   Global Step: 28200   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:47:10,690-Speed 5956.88 samples/sec   Loss 13.5581   LearningRate 0.3686   Epoch: 2   Global Step: 28210   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:47:17,550-Speed 5972.29 samples/sec   Loss 13.4211   LearningRate 0.3686   Epoch: 2   Global Step: 28220   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:47:24,410-Speed 5972.13 samples/sec   Loss 13.4514   LearningRate 0.3685   Epoch: 2   Global Step: 28230   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:47:31,273-Speed 5969.49 samples/sec   Loss 13.4742   LearningRate 0.3685   Epoch: 2   Global Step: 28240   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:47:38,150-Speed 5959.84 samples/sec   Loss 13.5331   LearningRate 0.3684   Epoch: 2   Global Step: 28250   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:47:45,010-Speed 5971.41 samples/sec   Loss 13.5776   LearningRate 0.3684   Epoch: 2   Global Step: 28260   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:47:51,872-Speed 5972.05 samples/sec   Loss 13.4974   LearningRate 0.3684   Epoch: 2   Global Step: 28270   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:47:58,730-Speed 5973.92 samples/sec   Loss 13.5702   LearningRate 0.3683   Epoch: 2   Global Step: 28280   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:48:05,594-Speed 5967.98 samples/sec   Loss 13.4359   LearningRate 0.3683   Epoch: 2   Global Step: 28290   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:48:12,454-Speed 5972.29 samples/sec   Loss 13.4110   LearningRate 0.3682   Epoch: 2   Global Step: 28300   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:48:19,317-Speed 5969.29 samples/sec   Loss 13.4307   LearningRate 0.3682   Epoch: 2   Global Step: 28310   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:48:26,170-Speed 5978.04 samples/sec   Loss 13.4532   LearningRate 0.3682   Epoch: 2   Global Step: 28320   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:48:33,022-Speed 5979.32 samples/sec   Loss 13.4922   LearningRate 0.3681   Epoch: 2   Global Step: 28330   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:48:39,870-Speed 5982.36 samples/sec   Loss 13.5094   LearningRate 0.3681   Epoch: 2   Global Step: 28340   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:48:46,731-Speed 5970.93 samples/sec   Loss 13.5004   LearningRate 0.3680   Epoch: 2   Global Step: 28350   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:48:53,586-Speed 5977.37 samples/sec   Loss 13.4173   LearningRate 0.3680   Epoch: 2   Global Step: 28360   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:49:00,438-Speed 5978.84 samples/sec   Loss 13.5056   LearningRate 0.3680   Epoch: 2   Global Step: 28370   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:49:07,297-Speed 5973.27 samples/sec   Loss 13.4855   LearningRate 0.3679   Epoch: 2   Global Step: 28380   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:49:14,281-Speed 5866.21 samples/sec   Loss 13.4152   LearningRate 0.3679   Epoch: 2   Global Step: 28390   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:49:21,274-Speed 5858.54 samples/sec   Loss 13.4446   LearningRate 0.3678   Epoch: 2   Global Step: 28400   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:49:28,124-Speed 5981.52 samples/sec   Loss 13.5148   LearningRate 0.3678   Epoch: 2   Global Step: 28410   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:49:34,983-Speed 5972.35 samples/sec   Loss 13.3985   LearningRate 0.3678   Epoch: 2   Global Step: 28420   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:49:41,840-Speed 5975.20 samples/sec   Loss 13.4651   LearningRate 0.3677   Epoch: 2   Global Step: 28430   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:49:48,712-Speed 5960.98 samples/sec   Loss 13.5179   LearningRate 0.3677   Epoch: 2   Global Step: 28440   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:49:55,579-Speed 5965.85 samples/sec   Loss 13.4557   LearningRate 0.3676   Epoch: 2   Global Step: 28450   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:50:02,424-Speed 5985.17 samples/sec   Loss 13.3985   LearningRate 0.3676   Epoch: 2   Global Step: 28460   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:50:09,281-Speed 5974.47 samples/sec   Loss 13.5578   LearningRate 0.3675   Epoch: 2   Global Step: 28470   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:50:16,154-Speed 5960.80 samples/sec   Loss 13.5788   LearningRate 0.3675   Epoch: 2   Global Step: 28480   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:50:23,003-Speed 5981.86 samples/sec   Loss 13.3677   LearningRate 0.3675   Epoch: 2   Global Step: 28490   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:50:29,852-Speed 5981.57 samples/sec   Loss 13.4260   LearningRate 0.3674   Epoch: 2   Global Step: 28500   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:50:36,813-Speed 5885.26 samples/sec   Loss 13.4868   LearningRate 0.3674   Epoch: 2   Global Step: 28510   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:50:43,734-Speed 5919.34 samples/sec   Loss 13.4074   LearningRate 0.3673   Epoch: 2   Global Step: 28520   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:50:50,642-Speed 5930.35 samples/sec   Loss 13.5420   LearningRate 0.3673   Epoch: 2   Global Step: 28530   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:50:57,492-Speed 5981.01 samples/sec   Loss 13.5446   LearningRate 0.3673   Epoch: 2   Global Step: 28540   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:51:04,350-Speed 5973.22 samples/sec   Loss 13.5001   LearningRate 0.3672   Epoch: 2   Global Step: 28550   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 01:51:11,206-Speed 5975.35 samples/sec   Loss 13.4289   LearningRate 0.3672   Epoch: 2   Global Step: 28560   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:51:18,050-Speed 5986.73 samples/sec   Loss 13.5487   LearningRate 0.3671   Epoch: 2   Global Step: 28570   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:51:24,930-Speed 5954.26 samples/sec   Loss 13.4304   LearningRate 0.3671   Epoch: 2   Global Step: 28580   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:51:31,800-Speed 5963.71 samples/sec   Loss 13.3935   LearningRate 0.3671   Epoch: 2   Global Step: 28590   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:51:38,642-Speed 5987.33 samples/sec   Loss 13.3776   LearningRate 0.3670   Epoch: 2   Global Step: 28600   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:51:45,506-Speed 5970.77 samples/sec   Loss 13.4224   LearningRate 0.3670   Epoch: 2   Global Step: 28610   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:51:52,353-Speed 5984.15 samples/sec   Loss 13.5348   LearningRate 0.3669   Epoch: 2   Global Step: 28620   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:51:59,206-Speed 5977.82 samples/sec   Loss 13.4169   LearningRate 0.3669   Epoch: 2   Global Step: 28630   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:52:06,046-Speed 5989.68 samples/sec   Loss 13.4382   LearningRate 0.3668   Epoch: 2   Global Step: 28640   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:52:12,898-Speed 5978.88 samples/sec   Loss 13.3681   LearningRate 0.3668   Epoch: 2   Global Step: 28650   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:52:19,762-Speed 5968.96 samples/sec   Loss 13.4814   LearningRate 0.3668   Epoch: 2   Global Step: 28660   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:52:26,620-Speed 5973.19 samples/sec   Loss 13.5864   LearningRate 0.3667   Epoch: 2   Global Step: 28670   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:52:33,506-Speed 5949.93 samples/sec   Loss 13.4118   LearningRate 0.3667   Epoch: 2   Global Step: 28680   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:52:40,349-Speed 5986.31 samples/sec   Loss 13.3822   LearningRate 0.3666   Epoch: 2   Global Step: 28690   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:52:47,194-Speed 5985.55 samples/sec   Loss 13.4331   LearningRate 0.3666   Epoch: 2   Global Step: 28700   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:52:54,057-Speed 5969.12 samples/sec   Loss 13.3578   LearningRate 0.3666   Epoch: 2   Global Step: 28710   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:53:00,907-Speed 5980.45 samples/sec   Loss 13.4771   LearningRate 0.3665   Epoch: 2   Global Step: 28720   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:53:07,760-Speed 5978.43 samples/sec   Loss 13.4670   LearningRate 0.3665   Epoch: 2   Global Step: 28730   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:53:14,610-Speed 5980.57 samples/sec   Loss 13.3973   LearningRate 0.3664   Epoch: 2   Global Step: 28740   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:53:21,461-Speed 5981.84 samples/sec   Loss 13.4385   LearningRate 0.3664   Epoch: 2   Global Step: 28750   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:53:28,312-Speed 5980.28 samples/sec   Loss 13.3755   LearningRate 0.3664   Epoch: 2   Global Step: 28760   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:53:35,162-Speed 5980.10 samples/sec   Loss 13.4565   LearningRate 0.3663   Epoch: 2   Global Step: 28770   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:53:42,007-Speed 5984.82 samples/sec   Loss 13.4293   LearningRate 0.3663   Epoch: 2   Global Step: 28780   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:53:48,868-Speed 5970.89 samples/sec   Loss 13.3710   LearningRate 0.3662   Epoch: 2   Global Step: 28790   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:53:55,742-Speed 5960.15 samples/sec   Loss 13.5205   LearningRate 0.3662   Epoch: 2   Global Step: 28800   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:54:02,602-Speed 5972.06 samples/sec   Loss 13.4772   LearningRate 0.3661   Epoch: 2   Global Step: 28810   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:54:09,458-Speed 5975.18 samples/sec   Loss 13.4950   LearningRate 0.3661   Epoch: 2   Global Step: 28820   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:54:16,303-Speed 5984.92 samples/sec   Loss 13.4323   LearningRate 0.3661   Epoch: 2   Global Step: 28830   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:54:23,167-Speed 5967.92 samples/sec   Loss 13.3541   LearningRate 0.3660   Epoch: 2   Global Step: 28840   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:54:30,023-Speed 5975.94 samples/sec   Loss 13.4473   LearningRate 0.3660   Epoch: 2   Global Step: 28850   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:54:36,879-Speed 5975.52 samples/sec   Loss 13.5789   LearningRate 0.3659   Epoch: 2   Global Step: 28860   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:54:43,749-Speed 5962.72 samples/sec   Loss 13.3996   LearningRate 0.3659   Epoch: 2   Global Step: 28870   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:54:50,616-Speed 5965.72 samples/sec   Loss 13.3779   LearningRate 0.3659   Epoch: 2   Global Step: 28880   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:54:57,481-Speed 5967.36 samples/sec   Loss 13.4121   LearningRate 0.3658   Epoch: 2   Global Step: 28890   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:55:04,342-Speed 5971.63 samples/sec   Loss 13.4367   LearningRate 0.3658   Epoch: 2   Global Step: 28900   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:55:11,190-Speed 5982.14 samples/sec   Loss 13.4233   LearningRate 0.3657   Epoch: 2   Global Step: 28910   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:55:18,058-Speed 5965.54 samples/sec   Loss 13.4327   LearningRate 0.3657   Epoch: 2   Global Step: 28920   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:55:24,915-Speed 5974.66 samples/sec   Loss 13.4354   LearningRate 0.3657   Epoch: 2   Global Step: 28930   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:55:31,885-Speed 5877.60 samples/sec   Loss 13.3666   LearningRate 0.3656   Epoch: 2   Global Step: 28940   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:55:38,835-Speed 5897.04 samples/sec   Loss 13.3794   LearningRate 0.3656   Epoch: 2   Global Step: 28950   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:55:45,779-Speed 5900.24 samples/sec   Loss 13.4921   LearningRate 0.3655   Epoch: 2   Global Step: 28960   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:55:52,726-Speed 5896.17 samples/sec   Loss 13.4108   LearningRate 0.3655   Epoch: 2   Global Step: 28970   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:55:59,671-Speed 5899.63 samples/sec   Loss 13.4267   LearningRate 0.3655   Epoch: 2   Global Step: 28980   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:56:06,567-Speed 5940.49 samples/sec   Loss 13.3403   LearningRate 0.3654   Epoch: 2   Global Step: 28990   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:56:13,428-Speed 5971.25 samples/sec   Loss 13.3999   LearningRate 0.3654   Epoch: 2   Global Step: 29000   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:56:20,310-Speed 5952.13 samples/sec   Loss 13.4245   LearningRate 0.3653   Epoch: 2   Global Step: 29010   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:56:27,166-Speed 5975.17 samples/sec   Loss 13.4292   LearningRate 0.3653   Epoch: 2   Global Step: 29020   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:56:34,040-Speed 5960.58 samples/sec   Loss 13.4773   LearningRate 0.3652   Epoch: 2   Global Step: 29030   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:56:40,909-Speed 5963.55 samples/sec   Loss 13.3921   LearningRate 0.3652   Epoch: 2   Global Step: 29040   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:56:47,765-Speed 5977.81 samples/sec   Loss 13.3645   LearningRate 0.3652   Epoch: 2   Global Step: 29050   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:56:54,617-Speed 5982.07 samples/sec   Loss 13.4147   LearningRate 0.3651   Epoch: 2   Global Step: 29060   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:57:01,503-Speed 5949.62 samples/sec   Loss 13.4559   LearningRate 0.3651   Epoch: 2   Global Step: 29070   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:57:08,384-Speed 5953.97 samples/sec   Loss 13.4147   LearningRate 0.3650   Epoch: 2   Global Step: 29080   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:57:15,310-Speed 5914.90 samples/sec   Loss 13.3781   LearningRate 0.3650   Epoch: 2   Global Step: 29090   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:57:22,212-Speed 5936.00 samples/sec   Loss 13.3696   LearningRate 0.3650   Epoch: 2   Global Step: 29100   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:57:29,195-Speed 5867.05 samples/sec   Loss 13.4039   LearningRate 0.3649   Epoch: 2   Global Step: 29110   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:57:36,048-Speed 5978.20 samples/sec   Loss 13.3618   LearningRate 0.3649   Epoch: 2   Global Step: 29120   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:57:42,895-Speed 5983.65 samples/sec   Loss 13.4578   LearningRate 0.3648   Epoch: 2   Global Step: 29130   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:57:49,743-Speed 5983.09 samples/sec   Loss 13.4374   LearningRate 0.3648   Epoch: 2   Global Step: 29140   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:57:56,610-Speed 5965.15 samples/sec   Loss 13.4423   LearningRate 0.3648   Epoch: 2   Global Step: 29150   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:58:03,477-Speed 5967.96 samples/sec   Loss 13.4426   LearningRate 0.3647   Epoch: 2   Global Step: 29160   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:58:10,340-Speed 5969.29 samples/sec   Loss 13.3604   LearningRate 0.3647   Epoch: 2   Global Step: 29170   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:58:17,195-Speed 5976.97 samples/sec   Loss 13.3898   LearningRate 0.3646   Epoch: 2   Global Step: 29180   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:58:24,050-Speed 5976.40 samples/sec   Loss 13.4535   LearningRate 0.3646   Epoch: 2   Global Step: 29190   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:58:30,899-Speed 5980.72 samples/sec   Loss 13.4689   LearningRate 0.3646   Epoch: 2   Global Step: 29200   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:58:37,777-Speed 5956.61 samples/sec   Loss 13.2880   LearningRate 0.3645   Epoch: 2   Global Step: 29210   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:58:44,644-Speed 5966.38 samples/sec   Loss 13.3238   LearningRate 0.3645   Epoch: 2   Global Step: 29220   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:58:51,507-Speed 5969.77 samples/sec   Loss 13.4772   LearningRate 0.3644   Epoch: 2   Global Step: 29230   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:58:58,374-Speed 5966.03 samples/sec   Loss 13.3057   LearningRate 0.3644   Epoch: 2   Global Step: 29240   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:59:05,251-Speed 5957.61 samples/sec   Loss 13.3511   LearningRate 0.3643   Epoch: 2   Global Step: 29250   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:59:12,127-Speed 5958.08 samples/sec   Loss 13.3074   LearningRate 0.3643   Epoch: 2   Global Step: 29260   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:59:18,997-Speed 5963.13 samples/sec   Loss 13.4360   LearningRate 0.3643   Epoch: 2   Global Step: 29270   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 01:59:25,858-Speed 5972.60 samples/sec   Loss 13.3611   LearningRate 0.3642   Epoch: 2   Global Step: 29280   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:59:32,718-Speed 5973.05 samples/sec   Loss 13.3667   LearningRate 0.3642   Epoch: 2   Global Step: 29290   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:59:39,574-Speed 5975.94 samples/sec   Loss 13.3967   LearningRate 0.3641   Epoch: 2   Global Step: 29300   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:59:46,435-Speed 5970.86 samples/sec   Loss 13.4047   LearningRate 0.3641   Epoch: 2   Global Step: 29310   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 01:59:53,299-Speed 5968.47 samples/sec   Loss 13.3572   LearningRate 0.3641   Epoch: 2   Global Step: 29320   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:00:00,166-Speed 5966.17 samples/sec   Loss 13.4233   LearningRate 0.3640   Epoch: 2   Global Step: 29330   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:00:07,024-Speed 5973.45 samples/sec   Loss 13.3465   LearningRate 0.3640   Epoch: 2   Global Step: 29340   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:00:13,869-Speed 5985.44 samples/sec   Loss 13.3618   LearningRate 0.3639   Epoch: 2   Global Step: 29350   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:00:20,745-Speed 5958.29 samples/sec   Loss 13.4446   LearningRate 0.3639   Epoch: 2   Global Step: 29360   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:00:27,629-Speed 5951.11 samples/sec   Loss 13.2636   LearningRate 0.3639   Epoch: 2   Global Step: 29370   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:00:34,494-Speed 5967.33 samples/sec   Loss 13.3571   LearningRate 0.3638   Epoch: 2   Global Step: 29380   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:00:41,345-Speed 5980.34 samples/sec   Loss 13.3836   LearningRate 0.3638   Epoch: 2   Global Step: 29390   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:00:48,205-Speed 5971.65 samples/sec   Loss 13.3246   LearningRate 0.3637   Epoch: 2   Global Step: 29400   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:00:55,269-Speed 5803.76 samples/sec   Loss 13.3951   LearningRate 0.3637   Epoch: 2   Global Step: 29410   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:01:02,160-Speed 5947.60 samples/sec   Loss 13.3017   LearningRate 0.3637   Epoch: 2   Global Step: 29420   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:01:09,064-Speed 5933.99 samples/sec   Loss 13.3491   LearningRate 0.3636   Epoch: 2   Global Step: 29430   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:01:15,919-Speed 5976.35 samples/sec   Loss 13.3212   LearningRate 0.3636   Epoch: 2   Global Step: 29440   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:01:22,770-Speed 5979.88 samples/sec   Loss 13.3149   LearningRate 0.3635   Epoch: 2   Global Step: 29450   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:01:29,634-Speed 5969.05 samples/sec   Loss 13.3395   LearningRate 0.3635   Epoch: 2   Global Step: 29460   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:01:36,492-Speed 5974.02 samples/sec   Loss 13.3895   LearningRate 0.3634   Epoch: 2   Global Step: 29470   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:01:43,352-Speed 5971.61 samples/sec   Loss 13.2721   LearningRate 0.3634   Epoch: 2   Global Step: 29480   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:01:50,196-Speed 5986.04 samples/sec   Loss 13.4008   LearningRate 0.3634   Epoch: 2   Global Step: 29490   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:01:57,071-Speed 5962.05 samples/sec   Loss 13.4222   LearningRate 0.3633   Epoch: 2   Global Step: 29500   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:02:03,929-Speed 5973.81 samples/sec   Loss 13.3382   LearningRate 0.3633   Epoch: 2   Global Step: 29510   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:02:10,781-Speed 5978.74 samples/sec   Loss 13.3453   LearningRate 0.3632   Epoch: 2   Global Step: 29520   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:02:17,658-Speed 5957.44 samples/sec   Loss 13.3539   LearningRate 0.3632   Epoch: 2   Global Step: 29530   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:02:24,500-Speed 5987.59 samples/sec   Loss 13.3311   LearningRate 0.3632   Epoch: 2   Global Step: 29540   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:02:31,362-Speed 5972.66 samples/sec   Loss 13.3991   LearningRate 0.3631   Epoch: 2   Global Step: 29550   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:02:38,215-Speed 5977.83 samples/sec   Loss 13.3184   LearningRate 0.3631   Epoch: 2   Global Step: 29560   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:02:45,063-Speed 5982.63 samples/sec   Loss 13.3197   LearningRate 0.3630   Epoch: 2   Global Step: 29570   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:02:51,944-Speed 5953.19 samples/sec   Loss 13.3317   LearningRate 0.3630   Epoch: 2   Global Step: 29580   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:02:58,814-Speed 5963.54 samples/sec   Loss 13.3183   LearningRate 0.3630   Epoch: 2   Global Step: 29590   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:03:05,684-Speed 5963.48 samples/sec   Loss 13.2906   LearningRate 0.3629   Epoch: 2   Global Step: 29600   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:03:12,547-Speed 5969.04 samples/sec   Loss 13.3050   LearningRate 0.3629   Epoch: 2   Global Step: 29610   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:03:19,417-Speed 5964.32 samples/sec   Loss 13.3860   LearningRate 0.3628   Epoch: 2   Global Step: 29620   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:03:26,302-Speed 5952.11 samples/sec   Loss 13.3373   LearningRate 0.3628   Epoch: 2   Global Step: 29630   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:03:33,163-Speed 5973.44 samples/sec   Loss 13.4345   LearningRate 0.3628   Epoch: 2   Global Step: 29640   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:03:40,031-Speed 5964.90 samples/sec   Loss 13.3262   LearningRate 0.3627   Epoch: 2   Global Step: 29650   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:03:46,899-Speed 5965.35 samples/sec   Loss 13.3563   LearningRate 0.3627   Epoch: 2   Global Step: 29660   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:03:53,744-Speed 5984.72 samples/sec   Loss 13.3796   LearningRate 0.3626   Epoch: 2   Global Step: 29670   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:04:00,615-Speed 5963.07 samples/sec   Loss 13.3162   LearningRate 0.3626   Epoch: 2   Global Step: 29680   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:04:07,497-Speed 5953.31 samples/sec   Loss 13.2945   LearningRate 0.3625   Epoch: 2   Global Step: 29690   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:04:14,333-Speed 5992.77 samples/sec   Loss 13.3812   LearningRate 0.3625   Epoch: 2   Global Step: 29700   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:04:21,174-Speed 5988.01 samples/sec   Loss 13.3277   LearningRate 0.3625   Epoch: 2   Global Step: 29710   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:04:28,042-Speed 5965.20 samples/sec   Loss 13.3712   LearningRate 0.3624   Epoch: 2   Global Step: 29720   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:04:34,889-Speed 5982.79 samples/sec   Loss 13.2976   LearningRate 0.3624   Epoch: 2   Global Step: 29730   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:04:41,744-Speed 5976.21 samples/sec   Loss 13.3113   LearningRate 0.3623   Epoch: 2   Global Step: 29740   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:04:48,626-Speed 5952.80 samples/sec   Loss 13.4310   LearningRate 0.3623   Epoch: 2   Global Step: 29750   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:04:55,470-Speed 5986.29 samples/sec   Loss 13.2656   LearningRate 0.3623   Epoch: 2   Global Step: 29760   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:05:02,309-Speed 5990.54 samples/sec   Loss 13.4156   LearningRate 0.3622   Epoch: 2   Global Step: 29770   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:05:09,181-Speed 5961.51 samples/sec   Loss 13.3028   LearningRate 0.3622   Epoch: 2   Global Step: 29780   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:05:16,019-Speed 5990.80 samples/sec   Loss 13.3865   LearningRate 0.3621   Epoch: 2   Global Step: 29790   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:05:22,892-Speed 5961.25 samples/sec   Loss 13.3084   LearningRate 0.3621   Epoch: 2   Global Step: 29800   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:05:29,758-Speed 5966.92 samples/sec   Loss 13.2157   LearningRate 0.3621   Epoch: 2   Global Step: 29810   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:05:36,621-Speed 5969.86 samples/sec   Loss 13.2445   LearningRate 0.3620   Epoch: 2   Global Step: 29820   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:05:43,504-Speed 5951.96 samples/sec   Loss 13.2813   LearningRate 0.3620   Epoch: 2   Global Step: 29830   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:05:50,397-Speed 5942.72 samples/sec   Loss 13.2227   LearningRate 0.3619   Epoch: 2   Global Step: 29840   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:05:57,262-Speed 5967.88 samples/sec   Loss 13.3302   LearningRate 0.3619   Epoch: 2   Global Step: 29850   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:06:04,134-Speed 5960.91 samples/sec   Loss 13.3055   LearningRate 0.3619   Epoch: 2   Global Step: 29860   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:06:10,995-Speed 5975.83 samples/sec   Loss 13.3024   LearningRate 0.3618   Epoch: 2   Global Step: 29870   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:06:17,842-Speed 5982.72 samples/sec   Loss 13.2827   LearningRate 0.3618   Epoch: 2   Global Step: 29880   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:06:24,738-Speed 5941.03 samples/sec   Loss 13.2801   LearningRate 0.3617   Epoch: 2   Global Step: 29890   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:06:31,578-Speed 5989.34 samples/sec   Loss 13.2288   LearningRate 0.3617   Epoch: 2   Global Step: 29900   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:06:38,452-Speed 5960.38 samples/sec   Loss 13.4332   LearningRate 0.3617   Epoch: 2   Global Step: 29910   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:06:45,299-Speed 5982.33 samples/sec   Loss 13.3131   LearningRate 0.3616   Epoch: 2   Global Step: 29920   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:06:52,200-Speed 5937.21 samples/sec   Loss 13.3312   LearningRate 0.3616   Epoch: 2   Global Step: 29930   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:06:59,073-Speed 5960.02 samples/sec   Loss 13.3242   LearningRate 0.3615   Epoch: 2   Global Step: 29940   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:07:05,934-Speed 5973.60 samples/sec   Loss 13.3416   LearningRate 0.3615   Epoch: 2   Global Step: 29950   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:07:12,793-Speed 5974.99 samples/sec   Loss 13.3385   LearningRate 0.3614   Epoch: 2   Global Step: 29960   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:07:19,657-Speed 5968.22 samples/sec   Loss 13.2514   LearningRate 0.3614   Epoch: 2   Global Step: 29970   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:07:26,506-Speed 5980.21 samples/sec   Loss 13.2349   LearningRate 0.3614   Epoch: 2   Global Step: 29980   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:07:33,350-Speed 5986.22 samples/sec   Loss 13.3578   LearningRate 0.3613   Epoch: 2   Global Step: 29990   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:07:40,194-Speed 5986.21 samples/sec   Loss 13.3206   LearningRate 0.3613   Epoch: 2   Global Step: 30000   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:08:07,036-[lfw][30000]XNorm: 22.330070
Training: 2022-01-08 02:08:07,037-[lfw][30000]Accuracy-Flip: 0.99583+-0.00281
Training: 2022-01-08 02:08:07,038-[lfw][30000]Accuracy-Highest: 0.99650
Training: 2022-01-08 02:08:38,320-[cfp_fp][30000]XNorm: 19.638639
Training: 2022-01-08 02:08:38,320-[cfp_fp][30000]Accuracy-Flip: 0.96086+-0.00649
Training: 2022-01-08 02:08:38,320-[cfp_fp][30000]Accuracy-Highest: 0.96957
Training: 2022-01-08 02:09:04,946-[agedb_30][30000]XNorm: 21.622061
Training: 2022-01-08 02:09:04,947-[agedb_30][30000]Accuracy-Flip: 0.95983+-0.00841
Training: 2022-01-08 02:09:04,948-[agedb_30][30000]Accuracy-Highest: 0.95983
Training: 2022-01-08 02:09:11,786-Speed 447.20 samples/sec   Loss 13.2658   LearningRate 0.3612   Epoch: 2   Global Step: 30010   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:09:18,626-Speed 5989.72 samples/sec   Loss 13.2728   LearningRate 0.3612   Epoch: 2   Global Step: 30020   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:09:25,488-Speed 5971.34 samples/sec   Loss 13.2585   LearningRate 0.3612   Epoch: 2   Global Step: 30030   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:09:32,348-Speed 5971.70 samples/sec   Loss 13.3334   LearningRate 0.3611   Epoch: 2   Global Step: 30040   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 02:09:39,215-Speed 5966.22 samples/sec   Loss 13.2744   LearningRate 0.3611   Epoch: 2   Global Step: 30050   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 02:09:46,088-Speed 5960.17 samples/sec   Loss 13.3032   LearningRate 0.3610   Epoch: 2   Global Step: 30060   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 02:09:52,988-Speed 5959.77 samples/sec   Loss 13.3318   LearningRate 0.3610   Epoch: 2   Global Step: 30070   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 02:09:59,859-Speed 5963.13 samples/sec   Loss 13.3951   LearningRate 0.3610   Epoch: 2   Global Step: 30080   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 02:10:06,740-Speed 5954.02 samples/sec   Loss 13.3032   LearningRate 0.3609   Epoch: 2   Global Step: 30090   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 02:10:13,615-Speed 5959.87 samples/sec   Loss 13.2385   LearningRate 0.3609   Epoch: 2   Global Step: 30100   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 02:10:20,477-Speed 5970.77 samples/sec   Loss 13.3664   LearningRate 0.3608   Epoch: 2   Global Step: 30110   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 02:10:27,352-Speed 5959.07 samples/sec   Loss 13.3306   LearningRate 0.3608   Epoch: 2   Global Step: 30120   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 02:10:34,199-Speed 5982.97 samples/sec   Loss 13.2423   LearningRate 0.3608   Epoch: 2   Global Step: 30130   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 02:10:41,063-Speed 5970.58 samples/sec   Loss 13.2100   LearningRate 0.3607   Epoch: 2   Global Step: 30140   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:10:47,925-Speed 5970.84 samples/sec   Loss 13.2730   LearningRate 0.3607   Epoch: 2   Global Step: 30150   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:10:54,777-Speed 5979.75 samples/sec   Loss 13.2130   LearningRate 0.3606   Epoch: 2   Global Step: 30160   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:11:01,630-Speed 5978.12 samples/sec   Loss 13.3048   LearningRate 0.3606   Epoch: 2   Global Step: 30170   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:11:08,473-Speed 5987.13 samples/sec   Loss 13.3317   LearningRate 0.3606   Epoch: 2   Global Step: 30180   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:11:15,335-Speed 5969.94 samples/sec   Loss 13.3031   LearningRate 0.3605   Epoch: 2   Global Step: 30190   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:11:22,171-Speed 5993.39 samples/sec   Loss 13.3082   LearningRate 0.3605   Epoch: 2   Global Step: 30200   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:11:29,023-Speed 5979.01 samples/sec   Loss 13.3794   LearningRate 0.3604   Epoch: 2   Global Step: 30210   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:11:35,865-Speed 5987.16 samples/sec   Loss 13.3049   LearningRate 0.3604   Epoch: 2   Global Step: 30220   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:11:42,713-Speed 5982.69 samples/sec   Loss 13.3306   LearningRate 0.3603   Epoch: 2   Global Step: 30230   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:11:49,573-Speed 5972.23 samples/sec   Loss 13.2375   LearningRate 0.3603   Epoch: 2   Global Step: 30240   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:11:56,426-Speed 5977.66 samples/sec   Loss 13.2275   LearningRate 0.3603   Epoch: 2   Global Step: 30250   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:12:03,279-Speed 5978.40 samples/sec   Loss 13.2110   LearningRate 0.3602   Epoch: 2   Global Step: 30260   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:12:10,132-Speed 5978.24 samples/sec   Loss 13.2323   LearningRate 0.3602   Epoch: 2   Global Step: 30270   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:12:16,983-Speed 5979.47 samples/sec   Loss 13.2713   LearningRate 0.3601   Epoch: 2   Global Step: 30280   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:12:23,838-Speed 5976.82 samples/sec   Loss 13.1733   LearningRate 0.3601   Epoch: 2   Global Step: 30290   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:12:30,688-Speed 5980.19 samples/sec   Loss 13.3073   LearningRate 0.3601   Epoch: 2   Global Step: 30300   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:12:37,552-Speed 5969.15 samples/sec   Loss 13.1847   LearningRate 0.3600   Epoch: 2   Global Step: 30310   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:12:44,405-Speed 5977.92 samples/sec   Loss 13.2441   LearningRate 0.3600   Epoch: 2   Global Step: 30320   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:12:51,270-Speed 5969.22 samples/sec   Loss 13.2913   LearningRate 0.3599   Epoch: 2   Global Step: 30330   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:12:58,131-Speed 5970.25 samples/sec   Loss 13.2750   LearningRate 0.3599   Epoch: 2   Global Step: 30340   Fp16 Grad Scale: 524288   Required: 35 hours
Training: 2022-01-08 02:13:04,979-Speed 5982.45 samples/sec   Loss 13.2419   LearningRate 0.3599   Epoch: 2   Global Step: 30350   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:13:11,818-Speed 5990.66 samples/sec   Loss 13.2907   LearningRate 0.3598   Epoch: 2   Global Step: 30360   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:13:18,676-Speed 5973.89 samples/sec   Loss 13.2013   LearningRate 0.3598   Epoch: 2   Global Step: 30370   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:13:25,555-Speed 5955.81 samples/sec   Loss 13.2151   LearningRate 0.3597   Epoch: 2   Global Step: 30380   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:13:32,398-Speed 5987.37 samples/sec   Loss 13.1755   LearningRate 0.3597   Epoch: 2   Global Step: 30390   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:13:39,249-Speed 5982.12 samples/sec   Loss 13.2208   LearningRate 0.3597   Epoch: 2   Global Step: 30400   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:13:46,141-Speed 5944.63 samples/sec   Loss 13.3622   LearningRate 0.3596   Epoch: 2   Global Step: 30410   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:13:52,997-Speed 5975.55 samples/sec   Loss 13.3133   LearningRate 0.3596   Epoch: 2   Global Step: 30420   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:13:59,845-Speed 5982.34 samples/sec   Loss 13.2794   LearningRate 0.3595   Epoch: 2   Global Step: 30430   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:14:06,691-Speed 5984.70 samples/sec   Loss 13.3071   LearningRate 0.3595   Epoch: 2   Global Step: 30440   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:14:13,630-Speed 5904.33 samples/sec   Loss 13.2490   LearningRate 0.3595   Epoch: 2   Global Step: 30450   Fp16 Grad Scale: 524288   Required: 35 hours
Training: 2022-01-08 02:14:20,479-Speed 5980.99 samples/sec   Loss 13.2651   LearningRate 0.3594   Epoch: 2   Global Step: 30460   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:14:27,335-Speed 5975.81 samples/sec   Loss 13.2399   LearningRate 0.3594   Epoch: 2   Global Step: 30470   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:14:34,186-Speed 5979.45 samples/sec   Loss 13.3617   LearningRate 0.3593   Epoch: 2   Global Step: 30480   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:14:41,042-Speed 5977.55 samples/sec   Loss 13.2360   LearningRate 0.3593   Epoch: 2   Global Step: 30490   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:14:47,885-Speed 5986.20 samples/sec   Loss 13.3443   LearningRate 0.3593   Epoch: 2   Global Step: 30500   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:14:54,756-Speed 5962.15 samples/sec   Loss 13.2258   LearningRate 0.3592   Epoch: 2   Global Step: 30510   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:15:01,632-Speed 5958.38 samples/sec   Loss 13.1823   LearningRate 0.3592   Epoch: 2   Global Step: 30520   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:15:08,477-Speed 5984.70 samples/sec   Loss 13.1504   LearningRate 0.3591   Epoch: 2   Global Step: 30530   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:15:15,333-Speed 5977.80 samples/sec   Loss 13.1801   LearningRate 0.3591   Epoch: 2   Global Step: 30540   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:15:22,178-Speed 5984.50 samples/sec   Loss 13.2376   LearningRate 0.3590   Epoch: 2   Global Step: 30550   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:15:29,023-Speed 5985.65 samples/sec   Loss 13.2365   LearningRate 0.3590   Epoch: 2   Global Step: 30560   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:15:35,867-Speed 5985.47 samples/sec   Loss 13.2725   LearningRate 0.3590   Epoch: 2   Global Step: 30570   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:15:42,732-Speed 5970.15 samples/sec   Loss 13.3035   LearningRate 0.3589   Epoch: 2   Global Step: 30580   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:15:49,579-Speed 5983.50 samples/sec   Loss 13.2350   LearningRate 0.3589   Epoch: 2   Global Step: 30590   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:15:56,445-Speed 5966.43 samples/sec   Loss 13.3613   LearningRate 0.3588   Epoch: 2   Global Step: 30600   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:16:03,315-Speed 5963.34 samples/sec   Loss 13.2468   LearningRate 0.3588   Epoch: 2   Global Step: 30610   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:16:10,160-Speed 5985.03 samples/sec   Loss 13.2642   LearningRate 0.3588   Epoch: 2   Global Step: 30620   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:16:17,042-Speed 5953.53 samples/sec   Loss 13.2157   LearningRate 0.3587   Epoch: 2   Global Step: 30630   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:16:23,912-Speed 5962.99 samples/sec   Loss 13.1742   LearningRate 0.3587   Epoch: 2   Global Step: 30640   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:16:30,761-Speed 5982.15 samples/sec   Loss 13.2937   LearningRate 0.3586   Epoch: 2   Global Step: 30650   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:16:37,622-Speed 5974.18 samples/sec   Loss 13.2627   LearningRate 0.3586   Epoch: 2   Global Step: 30660   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:16:44,472-Speed 5980.70 samples/sec   Loss 13.2243   LearningRate 0.3586   Epoch: 2   Global Step: 30670   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:16:51,320-Speed 5982.20 samples/sec   Loss 13.2668   LearningRate 0.3585   Epoch: 2   Global Step: 30680   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:16:58,182-Speed 5970.87 samples/sec   Loss 13.1957   LearningRate 0.3585   Epoch: 2   Global Step: 30690   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:17:05,067-Speed 5952.44 samples/sec   Loss 13.1023   LearningRate 0.3584   Epoch: 2   Global Step: 30700   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:17:11,945-Speed 5956.86 samples/sec   Loss 13.2731   LearningRate 0.3584   Epoch: 2   Global Step: 30710   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:17:18,815-Speed 5962.74 samples/sec   Loss 13.2253   LearningRate 0.3584   Epoch: 2   Global Step: 30720   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:17:25,694-Speed 5955.99 samples/sec   Loss 13.2824   LearningRate 0.3583   Epoch: 2   Global Step: 30730   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:17:32,912-Speed 5675.98 samples/sec   Loss 13.1842   LearningRate 0.3583   Epoch: 2   Global Step: 30740   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:17:39,785-Speed 5961.23 samples/sec   Loss 13.2910   LearningRate 0.3582   Epoch: 2   Global Step: 30750   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:17:46,646-Speed 5972.02 samples/sec   Loss 13.1742   LearningRate 0.3582   Epoch: 2   Global Step: 30760   Fp16 Grad Scale: 524288   Required: 35 hours
Training: 2022-01-08 02:17:53,486-Speed 5989.29 samples/sec   Loss 13.1909   LearningRate 0.3582   Epoch: 2   Global Step: 30770   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:18:00,339-Speed 5978.24 samples/sec   Loss 13.2577   LearningRate 0.3581   Epoch: 2   Global Step: 30780   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:18:07,210-Speed 5962.14 samples/sec   Loss 13.2270   LearningRate 0.3581   Epoch: 2   Global Step: 30790   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:18:14,086-Speed 5958.04 samples/sec   Loss 13.2123   LearningRate 0.3580   Epoch: 2   Global Step: 30800   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:18:20,942-Speed 5975.91 samples/sec   Loss 13.1570   LearningRate 0.3580   Epoch: 2   Global Step: 30810   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:18:27,794-Speed 5978.80 samples/sec   Loss 13.2143   LearningRate 0.3580   Epoch: 2   Global Step: 30820   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:18:34,645-Speed 5979.84 samples/sec   Loss 13.1995   LearningRate 0.3579   Epoch: 2   Global Step: 30830   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:18:41,502-Speed 5974.71 samples/sec   Loss 13.2158   LearningRate 0.3579   Epoch: 2   Global Step: 30840   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:18:48,364-Speed 5969.16 samples/sec   Loss 13.2442   LearningRate 0.3578   Epoch: 2   Global Step: 30850   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:18:55,219-Speed 5977.35 samples/sec   Loss 13.1834   LearningRate 0.3578   Epoch: 2   Global Step: 30860   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:19:02,093-Speed 5959.31 samples/sec   Loss 13.2330   LearningRate 0.3578   Epoch: 2   Global Step: 30870   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:19:08,942-Speed 5980.57 samples/sec   Loss 13.2312   LearningRate 0.3577   Epoch: 2   Global Step: 30880   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:19:15,817-Speed 5959.54 samples/sec   Loss 13.2415   LearningRate 0.3577   Epoch: 2   Global Step: 30890   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:19:22,667-Speed 5980.58 samples/sec   Loss 13.3492   LearningRate 0.3576   Epoch: 2   Global Step: 30900   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:19:29,514-Speed 5982.98 samples/sec   Loss 13.2467   LearningRate 0.3576   Epoch: 2   Global Step: 30910   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:19:36,362-Speed 5982.54 samples/sec   Loss 13.1219   LearningRate 0.3575   Epoch: 2   Global Step: 30920   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:19:43,207-Speed 5984.80 samples/sec   Loss 13.2883   LearningRate 0.3575   Epoch: 2   Global Step: 30930   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:19:50,057-Speed 5980.56 samples/sec   Loss 13.2211   LearningRate 0.3575   Epoch: 2   Global Step: 30940   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:19:56,925-Speed 5964.59 samples/sec   Loss 13.2182   LearningRate 0.3574   Epoch: 2   Global Step: 30950   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:20:03,772-Speed 5983.68 samples/sec   Loss 13.2059   LearningRate 0.3574   Epoch: 2   Global Step: 30960   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:20:10,625-Speed 5978.46 samples/sec   Loss 13.1648   LearningRate 0.3573   Epoch: 2   Global Step: 30970   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:20:17,498-Speed 5960.70 samples/sec   Loss 13.2665   LearningRate 0.3573   Epoch: 2   Global Step: 30980   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:20:24,369-Speed 5962.56 samples/sec   Loss 13.1784   LearningRate 0.3573   Epoch: 2   Global Step: 30990   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:20:31,244-Speed 5959.06 samples/sec   Loss 13.1884   LearningRate 0.3572   Epoch: 2   Global Step: 31000   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:20:38,129-Speed 5950.15 samples/sec   Loss 13.1692   LearningRate 0.3572   Epoch: 2   Global Step: 31010   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:20:44,984-Speed 5976.21 samples/sec   Loss 13.1929   LearningRate 0.3571   Epoch: 2   Global Step: 31020   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:20:51,836-Speed 5978.76 samples/sec   Loss 13.2295   LearningRate 0.3571   Epoch: 2   Global Step: 31030   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:20:58,700-Speed 5968.86 samples/sec   Loss 13.2546   LearningRate 0.3571   Epoch: 2   Global Step: 31040   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:21:05,558-Speed 5973.24 samples/sec   Loss 13.2547   LearningRate 0.3570   Epoch: 2   Global Step: 31050   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:21:12,426-Speed 5965.76 samples/sec   Loss 13.2273   LearningRate 0.3570   Epoch: 2   Global Step: 31060   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:21:19,305-Speed 5955.87 samples/sec   Loss 13.2423   LearningRate 0.3569   Epoch: 2   Global Step: 31070   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:21:26,139-Speed 5995.14 samples/sec   Loss 13.2507   LearningRate 0.3569   Epoch: 2   Global Step: 31080   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 02:21:33,020-Speed 5953.14 samples/sec   Loss 13.2396   LearningRate 0.3569   Epoch: 2   Global Step: 31090   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 02:21:39,886-Speed 5966.85 samples/sec   Loss 13.2020   LearningRate 0.3568   Epoch: 2   Global Step: 31100   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 02:21:46,782-Speed 5941.31 samples/sec   Loss 13.1816   LearningRate 0.3568   Epoch: 2   Global Step: 31110   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 02:22:10,847-Speed 1702.17 samples/sec   Loss 13.1906   LearningRate 0.3567   Epoch: 3   Global Step: 31120   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 02:22:17,689-Speed 5987.54 samples/sec   Loss 13.1722   LearningRate 0.3567   Epoch: 3   Global Step: 31130   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 02:22:24,572-Speed 5952.97 samples/sec   Loss 13.2291   LearningRate 0.3567   Epoch: 3   Global Step: 31140   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 02:22:31,419-Speed 5982.99 samples/sec   Loss 13.1715   LearningRate 0.3566   Epoch: 3   Global Step: 31150   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 02:22:38,334-Speed 5924.72 samples/sec   Loss 13.1557   LearningRate 0.3566   Epoch: 3   Global Step: 31160   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 02:22:45,196-Speed 5970.03 samples/sec   Loss 13.1924   LearningRate 0.3565   Epoch: 3   Global Step: 31170   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-01-08 02:22:52,086-Speed 5947.96 samples/sec   Loss 13.1306   LearningRate 0.3565   Epoch: 3   Global Step: 31180   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:22:58,976-Speed 5946.42 samples/sec   Loss 13.1457   LearningRate 0.3565   Epoch: 3   Global Step: 31190   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:23:05,859-Speed 5951.20 samples/sec   Loss 13.1622   LearningRate 0.3564   Epoch: 3   Global Step: 31200   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:23:12,786-Speed 5914.44 samples/sec   Loss 13.2364   LearningRate 0.3564   Epoch: 3   Global Step: 31210   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:23:19,669-Speed 5952.31 samples/sec   Loss 13.1812   LearningRate 0.3563   Epoch: 3   Global Step: 31220   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:23:26,548-Speed 5955.33 samples/sec   Loss 13.1692   LearningRate 0.3563   Epoch: 3   Global Step: 31230   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:23:33,418-Speed 5963.89 samples/sec   Loss 13.1883   LearningRate 0.3563   Epoch: 3   Global Step: 31240   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:23:40,287-Speed 5963.43 samples/sec   Loss 13.2539   LearningRate 0.3562   Epoch: 3   Global Step: 31250   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:23:47,168-Speed 5955.83 samples/sec   Loss 13.1067   LearningRate 0.3562   Epoch: 3   Global Step: 31260   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:23:54,026-Speed 5973.25 samples/sec   Loss 13.1268   LearningRate 0.3561   Epoch: 3   Global Step: 31270   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-01-08 02:24:00,882-Speed 5974.86 samples/sec   Loss 13.0878   LearningRate 0.3561   Epoch: 3   Global Step: 31280   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:24:07,742-Speed 5971.80 samples/sec   Loss 13.1320   LearningRate 0.3560   Epoch: 3   Global Step: 31290   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:24:14,601-Speed 5972.43 samples/sec   Loss 13.2254   LearningRate 0.3560   Epoch: 3   Global Step: 31300   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:24:21,473-Speed 5962.58 samples/sec   Loss 13.1778   LearningRate 0.3560   Epoch: 3   Global Step: 31310   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:24:28,373-Speed 5937.48 samples/sec   Loss 13.2429   LearningRate 0.3559   Epoch: 3   Global Step: 31320   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:24:35,231-Speed 5973.11 samples/sec   Loss 13.2023   LearningRate 0.3559   Epoch: 3   Global Step: 31330   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:24:42,107-Speed 5957.98 samples/sec   Loss 13.2089   LearningRate 0.3558   Epoch: 3   Global Step: 31340   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:24:48,961-Speed 5977.62 samples/sec   Loss 13.1801   LearningRate 0.3558   Epoch: 3   Global Step: 31350   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:24:55,835-Speed 5959.58 samples/sec   Loss 13.0692   LearningRate 0.3558   Epoch: 3   Global Step: 31360   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:25:02,687-Speed 5978.48 samples/sec   Loss 13.1391   LearningRate 0.3557   Epoch: 3   Global Step: 31370   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-01-08 02:25:09,538-Speed 5981.85 samples/sec   Loss 13.1648   LearningRate 0.3557   Epoch: 3   Global Step: 31380   Fp16 Grad Scale: 524288   Required: 34 hours
Training: 2022-01-08 02:25:16,393-Speed 5976.34 samples/sec   Loss 13.1655   LearningRate 0.3556   Epoch: 3   Global Step: 31390   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:25:23,271-Speed 5956.92 samples/sec   Loss 13.1405   LearningRate 0.3556   Epoch: 3   Global Step: 31400   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:25:30,127-Speed 5975.91 samples/sec   Loss 13.0991   LearningRate 0.3556   Epoch: 3   Global Step: 31410   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:25:37,025-Speed 5938.87 samples/sec   Loss 13.1228   LearningRate 0.3555   Epoch: 3   Global Step: 31420   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:25:43,902-Speed 5956.93 samples/sec   Loss 13.1744   LearningRate 0.3555   Epoch: 3   Global Step: 31430   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:25:50,780-Speed 5956.50 samples/sec   Loss 13.1852   LearningRate 0.3554   Epoch: 3   Global Step: 31440   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:25:57,643-Speed 5970.03 samples/sec   Loss 13.0951   LearningRate 0.3554   Epoch: 3   Global Step: 31450   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:26:06,610-Speed 4568.18 samples/sec   Loss 13.1435   LearningRate 0.3554   Epoch: 3   Global Step: 31460   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:26:13,482-Speed 5961.88 samples/sec   Loss 13.2187   LearningRate 0.3553   Epoch: 3   Global Step: 31470   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:26:20,336-Speed 5977.62 samples/sec   Loss 13.1472   LearningRate 0.3553   Epoch: 3   Global Step: 31480   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:26:27,201-Speed 5967.34 samples/sec   Loss 13.2424   LearningRate 0.3552   Epoch: 3   Global Step: 31490   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:26:34,087-Speed 5949.13 samples/sec   Loss 13.1701   LearningRate 0.3552   Epoch: 3   Global Step: 31500   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:26:41,004-Speed 5922.83 samples/sec   Loss 13.1614   LearningRate 0.3552   Epoch: 3   Global Step: 31510   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:26:47,892-Speed 5947.80 samples/sec   Loss 13.1492   LearningRate 0.3551   Epoch: 3   Global Step: 31520   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:26:54,828-Speed 5907.47 samples/sec   Loss 13.1364   LearningRate 0.3551   Epoch: 3   Global Step: 31530   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:27:01,714-Speed 5949.82 samples/sec   Loss 13.1822   LearningRate 0.3550   Epoch: 3   Global Step: 31540   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:27:08,599-Speed 5952.27 samples/sec   Loss 13.1372   LearningRate 0.3550   Epoch: 3   Global Step: 31550   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:27:15,493-Speed 5942.49 samples/sec   Loss 13.1532   LearningRate 0.3550   Epoch: 3   Global Step: 31560   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:27:22,365-Speed 5961.45 samples/sec   Loss 13.2012   LearningRate 0.3549   Epoch: 3   Global Step: 31570   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:27:29,237-Speed 5962.43 samples/sec   Loss 13.1910   LearningRate 0.3549   Epoch: 3   Global Step: 31580   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:27:36,101-Speed 5968.47 samples/sec   Loss 13.1007   LearningRate 0.3548   Epoch: 3   Global Step: 31590   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:27:42,962-Speed 5971.06 samples/sec   Loss 13.1567   LearningRate 0.3548   Epoch: 3   Global Step: 31600   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:27:49,817-Speed 5976.18 samples/sec   Loss 13.1272   LearningRate 0.3548   Epoch: 3   Global Step: 31610   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:27:56,675-Speed 5973.92 samples/sec   Loss 13.1684   LearningRate 0.3547   Epoch: 3   Global Step: 31620   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:28:03,533-Speed 5973.54 samples/sec   Loss 13.1587   LearningRate 0.3547   Epoch: 3   Global Step: 31630   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:28:10,395-Speed 5971.38 samples/sec   Loss 13.1547   LearningRate 0.3546   Epoch: 3   Global Step: 31640   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:28:17,289-Speed 5944.02 samples/sec   Loss 13.1450   LearningRate 0.3546   Epoch: 3   Global Step: 31650   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:28:24,156-Speed 5966.16 samples/sec   Loss 13.0625   LearningRate 0.3546   Epoch: 3   Global Step: 31660   Fp16 Grad Scale: 524288   Required: 34 hours
Training: 2022-01-08 02:28:31,013-Speed 5974.53 samples/sec   Loss 13.0686   LearningRate 0.3545   Epoch: 3   Global Step: 31670   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:28:37,876-Speed 5969.19 samples/sec   Loss 13.1644   LearningRate 0.3545   Epoch: 3   Global Step: 31680   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:28:44,739-Speed 5969.49 samples/sec   Loss 13.0883   LearningRate 0.3544   Epoch: 3   Global Step: 31690   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:28:51,623-Speed 5950.85 samples/sec   Loss 13.0983   LearningRate 0.3544   Epoch: 3   Global Step: 31700   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:28:58,485-Speed 5970.46 samples/sec   Loss 13.0795   LearningRate 0.3544   Epoch: 3   Global Step: 31710   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:29:05,358-Speed 5961.09 samples/sec   Loss 13.1935   LearningRate 0.3543   Epoch: 3   Global Step: 31720   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:29:12,217-Speed 5972.52 samples/sec   Loss 13.1114   LearningRate 0.3543   Epoch: 3   Global Step: 31730   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:29:19,083-Speed 5967.21 samples/sec   Loss 13.0445   LearningRate 0.3542   Epoch: 3   Global Step: 31740   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:29:25,967-Speed 5952.18 samples/sec   Loss 13.1918   LearningRate 0.3542   Epoch: 3   Global Step: 31750   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:29:32,826-Speed 5973.08 samples/sec   Loss 13.1060   LearningRate 0.3542   Epoch: 3   Global Step: 31760   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:29:39,666-Speed 5988.54 samples/sec   Loss 13.1283   LearningRate 0.3541   Epoch: 3   Global Step: 31770   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:29:46,617-Speed 5894.30 samples/sec   Loss 13.0785   LearningRate 0.3541   Epoch: 3   Global Step: 31780   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:29:53,476-Speed 5972.54 samples/sec   Loss 13.1312   LearningRate 0.3540   Epoch: 3   Global Step: 31790   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:30:00,323-Speed 5983.17 samples/sec   Loss 13.0771   LearningRate 0.3540   Epoch: 3   Global Step: 31800   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:30:07,174-Speed 5980.22 samples/sec   Loss 13.1729   LearningRate 0.3539   Epoch: 3   Global Step: 31810   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:30:14,023-Speed 5980.59 samples/sec   Loss 13.0318   LearningRate 0.3539   Epoch: 3   Global Step: 31820   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:30:20,891-Speed 5965.13 samples/sec   Loss 13.1083   LearningRate 0.3539   Epoch: 3   Global Step: 31830   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:30:27,808-Speed 5922.78 samples/sec   Loss 13.2295   LearningRate 0.3538   Epoch: 3   Global Step: 31840   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:30:34,667-Speed 5972.93 samples/sec   Loss 13.1540   LearningRate 0.3538   Epoch: 3   Global Step: 31850   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:30:41,525-Speed 5973.71 samples/sec   Loss 13.0825   LearningRate 0.3537   Epoch: 3   Global Step: 31860   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:30:48,373-Speed 5981.82 samples/sec   Loss 13.1229   LearningRate 0.3537   Epoch: 3   Global Step: 31870   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:30:55,242-Speed 5966.02 samples/sec   Loss 13.1220   LearningRate 0.3537   Epoch: 3   Global Step: 31880   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:31:02,104-Speed 5971.54 samples/sec   Loss 13.0070   LearningRate 0.3536   Epoch: 3   Global Step: 31890   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:31:08,982-Speed 5957.00 samples/sec   Loss 13.1347   LearningRate 0.3536   Epoch: 3   Global Step: 31900   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:31:15,890-Speed 5930.83 samples/sec   Loss 13.1472   LearningRate 0.3535   Epoch: 3   Global Step: 31910   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:31:23,034-Speed 5735.81 samples/sec   Loss 13.1535   LearningRate 0.3535   Epoch: 3   Global Step: 31920   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:31:29,911-Speed 5957.35 samples/sec   Loss 13.1362   LearningRate 0.3535   Epoch: 3   Global Step: 31930   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:31:36,787-Speed 5958.60 samples/sec   Loss 13.1487   LearningRate 0.3534   Epoch: 3   Global Step: 31940   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:31:44,166-Speed 5551.69 samples/sec   Loss 13.1131   LearningRate 0.3534   Epoch: 3   Global Step: 31950   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:31:51,043-Speed 5957.54 samples/sec   Loss 13.1657   LearningRate 0.3533   Epoch: 3   Global Step: 31960   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:31:57,901-Speed 5973.87 samples/sec   Loss 13.1368   LearningRate 0.3533   Epoch: 3   Global Step: 31970   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:32:04,762-Speed 5972.19 samples/sec   Loss 13.1563   LearningRate 0.3533   Epoch: 3   Global Step: 31980   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:32:11,641-Speed 5955.80 samples/sec   Loss 13.0722   LearningRate 0.3532   Epoch: 3   Global Step: 31990   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:32:18,500-Speed 5973.09 samples/sec   Loss 13.0240   LearningRate 0.3532   Epoch: 3   Global Step: 32000   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:32:25,366-Speed 5966.39 samples/sec   Loss 13.0601   LearningRate 0.3531   Epoch: 3   Global Step: 32010   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:32:32,236-Speed 5963.75 samples/sec   Loss 13.1149   LearningRate 0.3531   Epoch: 3   Global Step: 32020   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:32:39,102-Speed 5966.90 samples/sec   Loss 13.0328   LearningRate 0.3531   Epoch: 3   Global Step: 32030   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:32:46,001-Speed 5938.02 samples/sec   Loss 13.1347   LearningRate 0.3530   Epoch: 3   Global Step: 32040   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:32:52,867-Speed 5967.09 samples/sec   Loss 13.1187   LearningRate 0.3530   Epoch: 3   Global Step: 32050   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:32:59,741-Speed 5960.55 samples/sec   Loss 13.0928   LearningRate 0.3529   Epoch: 3   Global Step: 32060   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:33:06,613-Speed 5961.41 samples/sec   Loss 13.1734   LearningRate 0.3529   Epoch: 3   Global Step: 32070   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:33:13,468-Speed 5976.49 samples/sec   Loss 13.0913   LearningRate 0.3529   Epoch: 3   Global Step: 32080   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:33:20,311-Speed 5986.55 samples/sec   Loss 13.1969   LearningRate 0.3528   Epoch: 3   Global Step: 32090   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 02:33:27,166-Speed 5976.35 samples/sec   Loss 13.1042   LearningRate 0.3528   Epoch: 3   Global Step: 32100   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 02:33:34,037-Speed 5961.47 samples/sec   Loss 13.1284   LearningRate 0.3527   Epoch: 3   Global Step: 32110   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 02:33:40,897-Speed 5972.37 samples/sec   Loss 13.1048   LearningRate 0.3527   Epoch: 3   Global Step: 32120   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 02:33:47,753-Speed 5975.36 samples/sec   Loss 13.1091   LearningRate 0.3527   Epoch: 3   Global Step: 32130   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 02:33:54,626-Speed 5960.71 samples/sec   Loss 13.0894   LearningRate 0.3526   Epoch: 3   Global Step: 32140   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 02:34:01,510-Speed 5950.89 samples/sec   Loss 13.1014   LearningRate 0.3526   Epoch: 3   Global Step: 32150   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 02:34:08,388-Speed 5956.45 samples/sec   Loss 13.1436   LearningRate 0.3525   Epoch: 3   Global Step: 32160   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 02:34:15,268-Speed 5955.81 samples/sec   Loss 13.2205   LearningRate 0.3525   Epoch: 3   Global Step: 32170   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 02:34:22,138-Speed 5965.86 samples/sec   Loss 13.0562   LearningRate 0.3525   Epoch: 3   Global Step: 32180   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-01-08 02:34:28,988-Speed 5980.55 samples/sec   Loss 13.0958   LearningRate 0.3524   Epoch: 3   Global Step: 32190   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:34:35,887-Speed 5938.64 samples/sec   Loss 13.0562   LearningRate 0.3524   Epoch: 3   Global Step: 32200   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:34:42,747-Speed 5973.30 samples/sec   Loss 13.0482   LearningRate 0.3523   Epoch: 3   Global Step: 32210   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:34:49,635-Speed 5947.92 samples/sec   Loss 12.9981   LearningRate 0.3523   Epoch: 3   Global Step: 32220   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:34:56,513-Speed 5955.88 samples/sec   Loss 12.9976   LearningRate 0.3523   Epoch: 3   Global Step: 32230   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:35:03,395-Speed 5953.32 samples/sec   Loss 13.0517   LearningRate 0.3522   Epoch: 3   Global Step: 32240   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:35:10,244-Speed 5981.25 samples/sec   Loss 13.0055   LearningRate 0.3522   Epoch: 3   Global Step: 32250   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:35:17,102-Speed 5974.22 samples/sec   Loss 13.1200   LearningRate 0.3521   Epoch: 3   Global Step: 32260   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:35:23,957-Speed 5976.69 samples/sec   Loss 13.0624   LearningRate 0.3521   Epoch: 3   Global Step: 32270   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:35:30,810-Speed 5977.49 samples/sec   Loss 13.1798   LearningRate 0.3521   Epoch: 3   Global Step: 32280   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:35:37,667-Speed 5974.63 samples/sec   Loss 13.0115   LearningRate 0.3520   Epoch: 3   Global Step: 32290   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:35:44,526-Speed 5972.55 samples/sec   Loss 13.0931   LearningRate 0.3520   Epoch: 3   Global Step: 32300   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:35:51,378-Speed 5979.45 samples/sec   Loss 13.0160   LearningRate 0.3519   Epoch: 3   Global Step: 32310   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:35:58,242-Speed 5967.99 samples/sec   Loss 13.1804   LearningRate 0.3519   Epoch: 3   Global Step: 32320   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:36:05,117-Speed 5958.80 samples/sec   Loss 13.0165   LearningRate 0.3519   Epoch: 3   Global Step: 32330   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:36:11,994-Speed 5957.53 samples/sec   Loss 13.1345   LearningRate 0.3518   Epoch: 3   Global Step: 32340   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:36:18,862-Speed 5968.30 samples/sec   Loss 13.0609   LearningRate 0.3518   Epoch: 3   Global Step: 32350   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:36:25,722-Speed 5972.05 samples/sec   Loss 13.1251   LearningRate 0.3517   Epoch: 3   Global Step: 32360   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:36:32,573-Speed 5979.51 samples/sec   Loss 13.0517   LearningRate 0.3517   Epoch: 3   Global Step: 32370   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:36:39,439-Speed 5966.65 samples/sec   Loss 13.1045   LearningRate 0.3517   Epoch: 3   Global Step: 32380   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:36:46,302-Speed 5969.66 samples/sec   Loss 13.1043   LearningRate 0.3516   Epoch: 3   Global Step: 32390   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:36:53,178-Speed 5958.65 samples/sec   Loss 12.9836   LearningRate 0.3516   Epoch: 3   Global Step: 32400   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:37:00,046-Speed 5964.52 samples/sec   Loss 13.1311   LearningRate 0.3515   Epoch: 3   Global Step: 32410   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:37:06,912-Speed 5967.24 samples/sec   Loss 13.1076   LearningRate 0.3515   Epoch: 3   Global Step: 32420   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:37:13,796-Speed 5951.50 samples/sec   Loss 13.0169   LearningRate 0.3515   Epoch: 3   Global Step: 32430   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:37:20,650-Speed 5976.49 samples/sec   Loss 13.0365   LearningRate 0.3514   Epoch: 3   Global Step: 32440   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:37:27,527-Speed 5957.55 samples/sec   Loss 12.9755   LearningRate 0.3514   Epoch: 3   Global Step: 32450   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:37:34,367-Speed 5989.24 samples/sec   Loss 13.1631   LearningRate 0.3513   Epoch: 3   Global Step: 32460   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:37:41,240-Speed 5961.14 samples/sec   Loss 13.0192   LearningRate 0.3513   Epoch: 3   Global Step: 32470   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:37:48,118-Speed 5957.54 samples/sec   Loss 12.9854   LearningRate 0.3513   Epoch: 3   Global Step: 32480   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:37:54,991-Speed 5960.47 samples/sec   Loss 13.0677   LearningRate 0.3512   Epoch: 3   Global Step: 32490   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:38:01,872-Speed 5953.88 samples/sec   Loss 13.0943   LearningRate 0.3512   Epoch: 3   Global Step: 32500   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:38:08,765-Speed 5944.17 samples/sec   Loss 13.1247   LearningRate 0.3511   Epoch: 3   Global Step: 32510   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:38:15,641-Speed 5957.56 samples/sec   Loss 13.1465   LearningRate 0.3511   Epoch: 3   Global Step: 32520   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:38:22,516-Speed 5959.56 samples/sec   Loss 13.0679   LearningRate 0.3511   Epoch: 3   Global Step: 32530   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:38:29,372-Speed 5975.55 samples/sec   Loss 13.1082   LearningRate 0.3510   Epoch: 3   Global Step: 32540   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:38:36,226-Speed 5976.74 samples/sec   Loss 13.1528   LearningRate 0.3510   Epoch: 3   Global Step: 32550   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:38:43,120-Speed 5941.90 samples/sec   Loss 12.9783   LearningRate 0.3509   Epoch: 3   Global Step: 32560   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:38:50,063-Speed 5901.28 samples/sec   Loss 13.0680   LearningRate 0.3509   Epoch: 3   Global Step: 32570   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:38:56,917-Speed 5977.00 samples/sec   Loss 13.0893   LearningRate 0.3509   Epoch: 3   Global Step: 32580   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:39:03,771-Speed 5976.96 samples/sec   Loss 13.0698   LearningRate 0.3508   Epoch: 3   Global Step: 32590   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:39:10,647-Speed 5957.47 samples/sec   Loss 13.1024   LearningRate 0.3508   Epoch: 3   Global Step: 32600   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:39:17,521-Speed 5960.62 samples/sec   Loss 13.0799   LearningRate 0.3507   Epoch: 3   Global Step: 32610   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:39:24,395-Speed 5959.86 samples/sec   Loss 13.0139   LearningRate 0.3507   Epoch: 3   Global Step: 32620   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:39:31,258-Speed 5969.98 samples/sec   Loss 13.0407   LearningRate 0.3507   Epoch: 3   Global Step: 32630   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:39:38,115-Speed 5973.75 samples/sec   Loss 13.0512   LearningRate 0.3506   Epoch: 3   Global Step: 32640   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:39:44,968-Speed 5978.23 samples/sec   Loss 13.0665   LearningRate 0.3506   Epoch: 3   Global Step: 32650   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:39:51,805-Speed 5991.96 samples/sec   Loss 13.0726   LearningRate 0.3505   Epoch: 3   Global Step: 32660   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:39:58,671-Speed 5966.47 samples/sec   Loss 13.0974   LearningRate 0.3505   Epoch: 3   Global Step: 32670   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:40:05,563-Speed 5944.49 samples/sec   Loss 13.0805   LearningRate 0.3505   Epoch: 3   Global Step: 32680   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:40:12,425-Speed 5972.10 samples/sec   Loss 12.9869   LearningRate 0.3504   Epoch: 3   Global Step: 32690   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:40:19,299-Speed 5959.40 samples/sec   Loss 13.0295   LearningRate 0.3504   Epoch: 3   Global Step: 32700   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:40:26,193-Speed 5942.78 samples/sec   Loss 13.0521   LearningRate 0.3503   Epoch: 3   Global Step: 32710   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:40:33,059-Speed 5967.14 samples/sec   Loss 13.0089   LearningRate 0.3503   Epoch: 3   Global Step: 32720   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:40:39,928-Speed 5963.73 samples/sec   Loss 12.9362   LearningRate 0.3503   Epoch: 3   Global Step: 32730   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:40:46,790-Speed 5970.15 samples/sec   Loss 13.0172   LearningRate 0.3502   Epoch: 3   Global Step: 32740   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:40:53,642-Speed 5981.28 samples/sec   Loss 12.9950   LearningRate 0.3502   Epoch: 3   Global Step: 32750   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:41:00,513-Speed 5962.80 samples/sec   Loss 13.0027   LearningRate 0.3501   Epoch: 3   Global Step: 32760   Fp16 Grad Scale: 524288   Required: 34 hours
Training: 2022-01-08 02:41:07,374-Speed 5971.32 samples/sec   Loss 12.9652   LearningRate 0.3501   Epoch: 3   Global Step: 32770   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:41:14,268-Speed 5942.44 samples/sec   Loss 12.9824   LearningRate 0.3500   Epoch: 3   Global Step: 32780   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:41:21,124-Speed 5975.30 samples/sec   Loss 12.9819   LearningRate 0.3500   Epoch: 3   Global Step: 32790   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:41:27,985-Speed 5970.82 samples/sec   Loss 12.9509   LearningRate 0.3500   Epoch: 3   Global Step: 32800   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:41:34,874-Speed 5950.06 samples/sec   Loss 13.0460   LearningRate 0.3499   Epoch: 3   Global Step: 32810   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:41:41,755-Speed 5954.12 samples/sec   Loss 12.9924   LearningRate 0.3499   Epoch: 3   Global Step: 32820   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:41:48,610-Speed 5975.90 samples/sec   Loss 12.9362   LearningRate 0.3498   Epoch: 3   Global Step: 32830   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:41:55,477-Speed 5965.37 samples/sec   Loss 13.0206   LearningRate 0.3498   Epoch: 3   Global Step: 32840   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:42:02,355-Speed 5956.74 samples/sec   Loss 13.0096   LearningRate 0.3498   Epoch: 3   Global Step: 32850   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:42:09,221-Speed 5966.92 samples/sec   Loss 12.9891   LearningRate 0.3497   Epoch: 3   Global Step: 32860   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:42:16,084-Speed 5969.61 samples/sec   Loss 13.0319   LearningRate 0.3497   Epoch: 3   Global Step: 32870   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:42:22,947-Speed 5969.85 samples/sec   Loss 13.0817   LearningRate 0.3496   Epoch: 3   Global Step: 32880   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:42:29,820-Speed 5960.37 samples/sec   Loss 13.0109   LearningRate 0.3496   Epoch: 3   Global Step: 32890   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:42:36,684-Speed 5969.51 samples/sec   Loss 12.9629   LearningRate 0.3496   Epoch: 3   Global Step: 32900   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:42:43,559-Speed 5958.76 samples/sec   Loss 13.0479   LearningRate 0.3495   Epoch: 3   Global Step: 32910   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:42:50,477-Speed 5922.37 samples/sec   Loss 13.0116   LearningRate 0.3495   Epoch: 3   Global Step: 32920   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:42:57,365-Speed 5947.71 samples/sec   Loss 12.9860   LearningRate 0.3494   Epoch: 3   Global Step: 32930   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:43:04,325-Speed 5886.60 samples/sec   Loss 13.0188   LearningRate 0.3494   Epoch: 3   Global Step: 32940   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:43:11,309-Speed 5869.03 samples/sec   Loss 13.1129   LearningRate 0.3494   Epoch: 3   Global Step: 32950   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:43:18,252-Speed 5900.29 samples/sec   Loss 13.0538   LearningRate 0.3493   Epoch: 3   Global Step: 32960   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:43:25,106-Speed 5977.93 samples/sec   Loss 13.0555   LearningRate 0.3493   Epoch: 3   Global Step: 32970   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:43:31,962-Speed 5975.34 samples/sec   Loss 12.9507   LearningRate 0.3492   Epoch: 3   Global Step: 32980   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:43:38,937-Speed 5874.18 samples/sec   Loss 13.0298   LearningRate 0.3492   Epoch: 3   Global Step: 32990   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:43:45,911-Speed 5874.46 samples/sec   Loss 13.0150   LearningRate 0.3492   Epoch: 3   Global Step: 33000   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:43:52,810-Speed 5938.11 samples/sec   Loss 13.0228   LearningRate 0.3491   Epoch: 3   Global Step: 33010   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:43:59,685-Speed 5958.36 samples/sec   Loss 12.9767   LearningRate 0.3491   Epoch: 3   Global Step: 33020   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:44:06,560-Speed 5962.24 samples/sec   Loss 13.0147   LearningRate 0.3490   Epoch: 3   Global Step: 33030   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:44:13,409-Speed 5981.88 samples/sec   Loss 13.0104   LearningRate 0.3490   Epoch: 3   Global Step: 33040   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:44:20,274-Speed 5967.43 samples/sec   Loss 13.0239   LearningRate 0.3490   Epoch: 3   Global Step: 33050   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:44:27,142-Speed 5965.55 samples/sec   Loss 12.9516   LearningRate 0.3489   Epoch: 3   Global Step: 33060   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:44:33,992-Speed 5981.50 samples/sec   Loss 13.0494   LearningRate 0.3489   Epoch: 3   Global Step: 33070   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:44:40,865-Speed 5960.76 samples/sec   Loss 13.0617   LearningRate 0.3488   Epoch: 3   Global Step: 33080   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:44:47,726-Speed 5970.90 samples/sec   Loss 13.0140   LearningRate 0.3488   Epoch: 3   Global Step: 33090   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:44:54,581-Speed 5976.60 samples/sec   Loss 13.0356   LearningRate 0.3488   Epoch: 3   Global Step: 33100   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:45:01,447-Speed 5966.33 samples/sec   Loss 13.0459   LearningRate 0.3487   Epoch: 3   Global Step: 33110   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:45:08,324-Speed 5956.59 samples/sec   Loss 12.9940   LearningRate 0.3487   Epoch: 3   Global Step: 33120   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:45:15,184-Speed 5972.58 samples/sec   Loss 13.0542   LearningRate 0.3486   Epoch: 3   Global Step: 33130   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:45:22,066-Speed 5952.46 samples/sec   Loss 13.0009   LearningRate 0.3486   Epoch: 3   Global Step: 33140   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:45:28,914-Speed 5983.67 samples/sec   Loss 13.0359   LearningRate 0.3486   Epoch: 3   Global Step: 33150   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:45:35,766-Speed 5978.96 samples/sec   Loss 13.0516   LearningRate 0.3485   Epoch: 3   Global Step: 33160   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:45:42,636-Speed 5964.00 samples/sec   Loss 13.0571   LearningRate 0.3485   Epoch: 3   Global Step: 33170   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:45:49,494-Speed 5972.85 samples/sec   Loss 12.9615   LearningRate 0.3484   Epoch: 3   Global Step: 33180   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:45:56,349-Speed 5976.82 samples/sec   Loss 13.0164   LearningRate 0.3484   Epoch: 3   Global Step: 33190   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:46:03,226-Speed 5956.94 samples/sec   Loss 13.0071   LearningRate 0.3484   Epoch: 3   Global Step: 33200   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:46:10,104-Speed 5956.86 samples/sec   Loss 13.0085   LearningRate 0.3483   Epoch: 3   Global Step: 33210   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:46:16,971-Speed 5965.62 samples/sec   Loss 12.9677   LearningRate 0.3483   Epoch: 3   Global Step: 33220   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:46:23,827-Speed 5975.64 samples/sec   Loss 12.9082   LearningRate 0.3482   Epoch: 3   Global Step: 33230   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:46:30,686-Speed 5973.17 samples/sec   Loss 12.9607   LearningRate 0.3482   Epoch: 3   Global Step: 33240   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:46:37,561-Speed 5958.82 samples/sec   Loss 13.0553   LearningRate 0.3482   Epoch: 3   Global Step: 33250   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:46:44,428-Speed 5965.52 samples/sec   Loss 12.9846   LearningRate 0.3481   Epoch: 3   Global Step: 33260   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:46:51,299-Speed 5962.33 samples/sec   Loss 12.9508   LearningRate 0.3481   Epoch: 3   Global Step: 33270   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:46:58,159-Speed 5971.62 samples/sec   Loss 12.9889   LearningRate 0.3480   Epoch: 3   Global Step: 33280   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:47:05,006-Speed 5983.24 samples/sec   Loss 12.9454   LearningRate 0.3480   Epoch: 3   Global Step: 33290   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:47:11,872-Speed 5967.11 samples/sec   Loss 12.9624   LearningRate 0.3480   Epoch: 3   Global Step: 33300   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:47:18,734-Speed 5972.04 samples/sec   Loss 13.0104   LearningRate 0.3479   Epoch: 3   Global Step: 33310   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:47:25,580-Speed 5984.33 samples/sec   Loss 12.8991   LearningRate 0.3479   Epoch: 3   Global Step: 33320   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:47:32,428-Speed 5982.26 samples/sec   Loss 13.0301   LearningRate 0.3478   Epoch: 3   Global Step: 33330   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:47:39,280-Speed 5978.86 samples/sec   Loss 12.9676   LearningRate 0.3478   Epoch: 3   Global Step: 33340   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:47:46,150-Speed 5964.24 samples/sec   Loss 12.8425   LearningRate 0.3478   Epoch: 3   Global Step: 33350   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:47:53,009-Speed 5972.25 samples/sec   Loss 13.0152   LearningRate 0.3477   Epoch: 3   Global Step: 33360   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:47:59,871-Speed 5970.66 samples/sec   Loss 13.0069   LearningRate 0.3477   Epoch: 3   Global Step: 33370   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:48:06,843-Speed 5876.02 samples/sec   Loss 12.8953   LearningRate 0.3476   Epoch: 3   Global Step: 33380   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:48:13,707-Speed 5970.05 samples/sec   Loss 12.9681   LearningRate 0.3476   Epoch: 3   Global Step: 33390   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:48:20,581-Speed 5962.29 samples/sec   Loss 12.9848   LearningRate 0.3476   Epoch: 3   Global Step: 33400   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:48:27,452-Speed 5962.29 samples/sec   Loss 12.9588   LearningRate 0.3475   Epoch: 3   Global Step: 33410   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:48:34,315-Speed 5969.15 samples/sec   Loss 12.9844   LearningRate 0.3475   Epoch: 3   Global Step: 33420   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:48:41,166-Speed 5980.25 samples/sec   Loss 13.0504   LearningRate 0.3474   Epoch: 3   Global Step: 33430   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:48:48,031-Speed 5967.13 samples/sec   Loss 12.9709   LearningRate 0.3474   Epoch: 3   Global Step: 33440   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:48:54,917-Speed 5950.37 samples/sec   Loss 12.9475   LearningRate 0.3474   Epoch: 3   Global Step: 33450   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:49:01,787-Speed 5962.97 samples/sec   Loss 12.9374   LearningRate 0.3473   Epoch: 3   Global Step: 33460   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:49:08,653-Speed 5966.67 samples/sec   Loss 12.9578   LearningRate 0.3473   Epoch: 3   Global Step: 33470   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:49:15,526-Speed 5960.51 samples/sec   Loss 12.8909   LearningRate 0.3472   Epoch: 3   Global Step: 33480   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:49:22,396-Speed 5963.69 samples/sec   Loss 13.1079   LearningRate 0.3472   Epoch: 3   Global Step: 33490   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:49:29,266-Speed 5963.33 samples/sec   Loss 12.9771   LearningRate 0.3472   Epoch: 3   Global Step: 33500   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:49:36,150-Speed 5950.40 samples/sec   Loss 13.0024   LearningRate 0.3471   Epoch: 3   Global Step: 33510   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:49:43,018-Speed 5965.44 samples/sec   Loss 12.9676   LearningRate 0.3471   Epoch: 3   Global Step: 33520   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:49:49,887-Speed 5964.04 samples/sec   Loss 12.9617   LearningRate 0.3470   Epoch: 3   Global Step: 33530   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:49:56,781-Speed 5944.61 samples/sec   Loss 12.9747   LearningRate 0.3470   Epoch: 3   Global Step: 33540   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:50:03,641-Speed 5971.86 samples/sec   Loss 12.9897   LearningRate 0.3470   Epoch: 3   Global Step: 33550   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:50:10,513-Speed 5961.66 samples/sec   Loss 13.0635   LearningRate 0.3469   Epoch: 3   Global Step: 33560   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:50:17,426-Speed 5926.43 samples/sec   Loss 13.0727   LearningRate 0.3469   Epoch: 3   Global Step: 33570   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:50:24,286-Speed 5972.51 samples/sec   Loss 12.9991   LearningRate 0.3468   Epoch: 3   Global Step: 33580   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:50:31,175-Speed 5946.30 samples/sec   Loss 12.9635   LearningRate 0.3468   Epoch: 3   Global Step: 33590   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:50:38,049-Speed 5959.80 samples/sec   Loss 12.9372   LearningRate 0.3468   Epoch: 3   Global Step: 33600   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:50:44,904-Speed 5976.45 samples/sec   Loss 13.0339   LearningRate 0.3467   Epoch: 3   Global Step: 33610   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:50:51,765-Speed 5970.71 samples/sec   Loss 12.9286   LearningRate 0.3467   Epoch: 3   Global Step: 33620   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:50:58,653-Speed 5948.05 samples/sec   Loss 12.8788   LearningRate 0.3466   Epoch: 3   Global Step: 33630   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:51:05,508-Speed 5976.31 samples/sec   Loss 12.9646   LearningRate 0.3466   Epoch: 3   Global Step: 33640   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:51:12,382-Speed 5960.16 samples/sec   Loss 12.9956   LearningRate 0.3466   Epoch: 3   Global Step: 33650   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:51:19,254-Speed 5961.17 samples/sec   Loss 12.9818   LearningRate 0.3465   Epoch: 3   Global Step: 33660   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:51:26,113-Speed 5973.01 samples/sec   Loss 12.9588   LearningRate 0.3465   Epoch: 3   Global Step: 33670   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:51:32,999-Speed 5951.36 samples/sec   Loss 12.8610   LearningRate 0.3465   Epoch: 3   Global Step: 33680   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:51:39,875-Speed 5957.67 samples/sec   Loss 13.0184   LearningRate 0.3464   Epoch: 3   Global Step: 33690   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:51:46,749-Speed 5960.07 samples/sec   Loss 12.9366   LearningRate 0.3464   Epoch: 3   Global Step: 33700   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:51:53,599-Speed 5980.38 samples/sec   Loss 13.0326   LearningRate 0.3463   Epoch: 3   Global Step: 33710   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:52:00,455-Speed 5975.42 samples/sec   Loss 12.9514   LearningRate 0.3463   Epoch: 3   Global Step: 33720   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:52:07,344-Speed 5947.11 samples/sec   Loss 12.9239   LearningRate 0.3463   Epoch: 3   Global Step: 33730   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:52:14,202-Speed 5973.63 samples/sec   Loss 12.9256   LearningRate 0.3462   Epoch: 3   Global Step: 33740   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:52:21,066-Speed 5968.45 samples/sec   Loss 12.8862   LearningRate 0.3462   Epoch: 3   Global Step: 33750   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:52:27,932-Speed 5967.44 samples/sec   Loss 12.9521   LearningRate 0.3461   Epoch: 3   Global Step: 33760   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:52:34,806-Speed 5959.57 samples/sec   Loss 12.9266   LearningRate 0.3461   Epoch: 3   Global Step: 33770   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:52:41,693-Speed 5948.91 samples/sec   Loss 12.9735   LearningRate 0.3461   Epoch: 3   Global Step: 33780   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:52:48,685-Speed 5859.48 samples/sec   Loss 12.9519   LearningRate 0.3460   Epoch: 3   Global Step: 33790   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:52:55,655-Speed 5877.58 samples/sec   Loss 12.9090   LearningRate 0.3460   Epoch: 3   Global Step: 33800   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:53:02,592-Speed 5906.19 samples/sec   Loss 13.0005   LearningRate 0.3459   Epoch: 3   Global Step: 33810   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:53:09,452-Speed 5971.28 samples/sec   Loss 12.8626   LearningRate 0.3459   Epoch: 3   Global Step: 33820   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:53:16,310-Speed 5974.34 samples/sec   Loss 12.9246   LearningRate 0.3459   Epoch: 3   Global Step: 33830   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:53:23,166-Speed 5976.57 samples/sec   Loss 12.9989   LearningRate 0.3458   Epoch: 3   Global Step: 33840   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:53:30,029-Speed 5969.64 samples/sec   Loss 12.9496   LearningRate 0.3458   Epoch: 3   Global Step: 33850   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:53:36,896-Speed 5965.88 samples/sec   Loss 12.9742   LearningRate 0.3457   Epoch: 3   Global Step: 33860   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:53:43,780-Speed 5951.39 samples/sec   Loss 13.0075   LearningRate 0.3457   Epoch: 3   Global Step: 33870   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:53:50,642-Speed 5969.51 samples/sec   Loss 12.9606   LearningRate 0.3457   Epoch: 3   Global Step: 33880   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:53:57,521-Speed 5955.99 samples/sec   Loss 12.9344   LearningRate 0.3456   Epoch: 3   Global Step: 33890   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:54:04,387-Speed 5966.23 samples/sec   Loss 12.9661   LearningRate 0.3456   Epoch: 3   Global Step: 33900   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:54:11,243-Speed 5975.21 samples/sec   Loss 12.9285   LearningRate 0.3455   Epoch: 3   Global Step: 33910   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:54:18,107-Speed 5968.37 samples/sec   Loss 12.8981   LearningRate 0.3455   Epoch: 3   Global Step: 33920   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:54:24,961-Speed 5976.90 samples/sec   Loss 12.9564   LearningRate 0.3455   Epoch: 3   Global Step: 33930   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:54:31,844-Speed 5952.40 samples/sec   Loss 13.0498   LearningRate 0.3454   Epoch: 3   Global Step: 33940   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:54:38,694-Speed 5980.71 samples/sec   Loss 12.9252   LearningRate 0.3454   Epoch: 3   Global Step: 33950   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:54:45,576-Speed 5955.31 samples/sec   Loss 12.8676   LearningRate 0.3453   Epoch: 3   Global Step: 33960   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:54:52,472-Speed 5941.24 samples/sec   Loss 12.8670   LearningRate 0.3453   Epoch: 3   Global Step: 33970   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:54:59,332-Speed 5971.96 samples/sec   Loss 12.9310   LearningRate 0.3453   Epoch: 3   Global Step: 33980   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:55:06,187-Speed 5975.77 samples/sec   Loss 12.8811   LearningRate 0.3452   Epoch: 3   Global Step: 33990   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:55:13,145-Speed 5889.03 samples/sec   Loss 12.9152   LearningRate 0.3452   Epoch: 3   Global Step: 34000   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:55:20,099-Speed 5890.93 samples/sec   Loss 12.9747   LearningRate 0.3451   Epoch: 3   Global Step: 34010   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:55:26,996-Speed 5939.88 samples/sec   Loss 12.9659   LearningRate 0.3451   Epoch: 3   Global Step: 34020   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:55:33,878-Speed 5953.20 samples/sec   Loss 12.9396   LearningRate 0.3451   Epoch: 3   Global Step: 34030   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:55:40,730-Speed 5978.86 samples/sec   Loss 12.8983   LearningRate 0.3450   Epoch: 3   Global Step: 34040   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:55:47,598-Speed 5965.39 samples/sec   Loss 12.9311   LearningRate 0.3450   Epoch: 3   Global Step: 34050   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:55:54,470-Speed 5961.56 samples/sec   Loss 12.9677   LearningRate 0.3449   Epoch: 3   Global Step: 34060   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:56:01,340-Speed 5963.37 samples/sec   Loss 12.9206   LearningRate 0.3449   Epoch: 3   Global Step: 34070   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:56:08,203-Speed 5968.69 samples/sec   Loss 12.9075   LearningRate 0.3449   Epoch: 3   Global Step: 34080   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 02:56:15,076-Speed 5961.26 samples/sec   Loss 12.9878   LearningRate 0.3448   Epoch: 3   Global Step: 34090   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:56:21,948-Speed 5961.09 samples/sec   Loss 12.9206   LearningRate 0.3448   Epoch: 3   Global Step: 34100   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:56:28,823-Speed 5958.97 samples/sec   Loss 12.8239   LearningRate 0.3447   Epoch: 3   Global Step: 34110   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:56:35,690-Speed 5965.89 samples/sec   Loss 12.9741   LearningRate 0.3447   Epoch: 3   Global Step: 34120   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:56:42,551-Speed 5971.24 samples/sec   Loss 12.8839   LearningRate 0.3447   Epoch: 3   Global Step: 34130   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:56:49,412-Speed 5970.56 samples/sec   Loss 12.9548   LearningRate 0.3446   Epoch: 3   Global Step: 34140   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:56:56,281-Speed 5964.94 samples/sec   Loss 12.8104   LearningRate 0.3446   Epoch: 3   Global Step: 34150   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:57:03,150-Speed 5964.04 samples/sec   Loss 12.9538   LearningRate 0.3445   Epoch: 3   Global Step: 34160   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:57:10,068-Speed 5921.89 samples/sec   Loss 12.9426   LearningRate 0.3445   Epoch: 3   Global Step: 34170   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:57:17,057-Speed 5861.92 samples/sec   Loss 12.9151   LearningRate 0.3445   Epoch: 3   Global Step: 34180   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:57:24,010-Speed 5891.94 samples/sec   Loss 12.8918   LearningRate 0.3444   Epoch: 3   Global Step: 34190   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:57:30,971-Speed 5885.54 samples/sec   Loss 12.9776   LearningRate 0.3444   Epoch: 3   Global Step: 34200   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:57:37,850-Speed 5956.06 samples/sec   Loss 12.8922   LearningRate 0.3443   Epoch: 3   Global Step: 34210   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:57:44,723-Speed 5960.76 samples/sec   Loss 12.9198   LearningRate 0.3443   Epoch: 3   Global Step: 34220   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:57:51,594-Speed 5962.70 samples/sec   Loss 12.8769   LearningRate 0.3443   Epoch: 3   Global Step: 34230   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:57:58,467-Speed 5960.14 samples/sec   Loss 12.9206   LearningRate 0.3442   Epoch: 3   Global Step: 34240   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:58:05,344-Speed 5957.28 samples/sec   Loss 12.8988   LearningRate 0.3442   Epoch: 3   Global Step: 34250   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:58:12,208-Speed 5968.24 samples/sec   Loss 12.8704   LearningRate 0.3441   Epoch: 3   Global Step: 34260   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:58:19,088-Speed 5954.45 samples/sec   Loss 12.8952   LearningRate 0.3441   Epoch: 3   Global Step: 34270   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:58:25,958-Speed 5962.85 samples/sec   Loss 12.8123   LearningRate 0.3441   Epoch: 3   Global Step: 34280   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:58:32,831-Speed 5961.07 samples/sec   Loss 12.8565   LearningRate 0.3440   Epoch: 3   Global Step: 34290   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:58:39,704-Speed 5961.40 samples/sec   Loss 12.9327   LearningRate 0.3440   Epoch: 3   Global Step: 34300   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:58:46,570-Speed 5966.74 samples/sec   Loss 12.8449   LearningRate 0.3439   Epoch: 3   Global Step: 34310   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:58:53,430-Speed 5971.74 samples/sec   Loss 12.8190   LearningRate 0.3439   Epoch: 3   Global Step: 34320   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:59:00,301-Speed 5962.42 samples/sec   Loss 12.9359   LearningRate 0.3439   Epoch: 3   Global Step: 34330   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:59:07,173-Speed 5961.65 samples/sec   Loss 12.8642   LearningRate 0.3438   Epoch: 3   Global Step: 34340   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:59:14,028-Speed 5976.15 samples/sec   Loss 12.8692   LearningRate 0.3438   Epoch: 3   Global Step: 34350   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:59:20,914-Speed 5949.89 samples/sec   Loss 12.9574   LearningRate 0.3437   Epoch: 3   Global Step: 34360   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:59:27,778-Speed 5968.56 samples/sec   Loss 12.9374   LearningRate 0.3437   Epoch: 3   Global Step: 34370   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:59:34,708-Speed 5914.25 samples/sec   Loss 12.9699   LearningRate 0.3437   Epoch: 3   Global Step: 34380   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:59:41,561-Speed 5978.40 samples/sec   Loss 12.9106   LearningRate 0.3436   Epoch: 3   Global Step: 34390   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:59:48,439-Speed 5956.95 samples/sec   Loss 12.8611   LearningRate 0.3436   Epoch: 3   Global Step: 34400   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 02:59:55,308-Speed 5964.75 samples/sec   Loss 12.9241   LearningRate 0.3435   Epoch: 3   Global Step: 34410   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:00:02,170-Speed 5970.00 samples/sec   Loss 12.9254   LearningRate 0.3435   Epoch: 3   Global Step: 34420   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:00:09,038-Speed 5965.39 samples/sec   Loss 12.8321   LearningRate 0.3435   Epoch: 3   Global Step: 34430   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:00:15,895-Speed 5976.79 samples/sec   Loss 12.7926   LearningRate 0.3434   Epoch: 3   Global Step: 34440   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:00:22,746-Speed 5980.07 samples/sec   Loss 12.8846   LearningRate 0.3434   Epoch: 3   Global Step: 34450   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:00:29,612-Speed 5966.65 samples/sec   Loss 12.8582   LearningRate 0.3433   Epoch: 3   Global Step: 34460   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:00:36,474-Speed 5970.01 samples/sec   Loss 12.9062   LearningRate 0.3433   Epoch: 3   Global Step: 34470   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:00:43,337-Speed 5969.14 samples/sec   Loss 12.8937   LearningRate 0.3433   Epoch: 3   Global Step: 34480   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:00:50,196-Speed 5972.09 samples/sec   Loss 12.9692   LearningRate 0.3432   Epoch: 3   Global Step: 34490   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:00:57,065-Speed 5964.42 samples/sec   Loss 12.8395   LearningRate 0.3432   Epoch: 3   Global Step: 34500   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:01:03,949-Speed 5951.59 samples/sec   Loss 12.9037   LearningRate 0.3431   Epoch: 3   Global Step: 34510   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:01:10,840-Speed 5945.07 samples/sec   Loss 12.8309   LearningRate 0.3431   Epoch: 3   Global Step: 34520   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:01:17,703-Speed 5969.36 samples/sec   Loss 12.8004   LearningRate 0.3431   Epoch: 3   Global Step: 34530   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:01:24,565-Speed 5973.28 samples/sec   Loss 12.8615   LearningRate 0.3430   Epoch: 3   Global Step: 34540   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:01:31,469-Speed 5933.71 samples/sec   Loss 12.8086   LearningRate 0.3430   Epoch: 3   Global Step: 34550   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:01:38,330-Speed 5970.93 samples/sec   Loss 12.8029   LearningRate 0.3429   Epoch: 3   Global Step: 34560   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:01:45,196-Speed 5966.62 samples/sec   Loss 12.7801   LearningRate 0.3429   Epoch: 3   Global Step: 34570   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:01:52,086-Speed 5946.98 samples/sec   Loss 12.9255   LearningRate 0.3429   Epoch: 3   Global Step: 34580   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:01:58,950-Speed 5968.65 samples/sec   Loss 12.9080   LearningRate 0.3428   Epoch: 3   Global Step: 34590   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:02:05,824-Speed 5959.48 samples/sec   Loss 12.8843   LearningRate 0.3428   Epoch: 3   Global Step: 34600   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:02:12,700-Speed 5958.16 samples/sec   Loss 12.9027   LearningRate 0.3428   Epoch: 3   Global Step: 34610   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:02:19,561-Speed 5975.34 samples/sec   Loss 12.8984   LearningRate 0.3427   Epoch: 3   Global Step: 34620   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:02:26,413-Speed 5979.20 samples/sec   Loss 12.9026   LearningRate 0.3427   Epoch: 3   Global Step: 34630   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:02:33,279-Speed 5966.15 samples/sec   Loss 12.8617   LearningRate 0.3426   Epoch: 3   Global Step: 34640   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:02:40,158-Speed 5955.65 samples/sec   Loss 12.9171   LearningRate 0.3426   Epoch: 3   Global Step: 34650   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:02:47,024-Speed 5969.41 samples/sec   Loss 12.9317   LearningRate 0.3426   Epoch: 3   Global Step: 34660   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:02:53,913-Speed 5946.85 samples/sec   Loss 12.8840   LearningRate 0.3425   Epoch: 3   Global Step: 34670   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:03:00,771-Speed 5973.61 samples/sec   Loss 12.8959   LearningRate 0.3425   Epoch: 3   Global Step: 34680   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:03:07,649-Speed 5956.64 samples/sec   Loss 12.8747   LearningRate 0.3424   Epoch: 3   Global Step: 34690   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:03:14,507-Speed 5974.06 samples/sec   Loss 12.8555   LearningRate 0.3424   Epoch: 3   Global Step: 34700   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:03:21,400-Speed 5943.30 samples/sec   Loss 12.8518   LearningRate 0.3424   Epoch: 3   Global Step: 34710   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:03:28,303-Speed 5934.15 samples/sec   Loss 12.8006   LearningRate 0.3423   Epoch: 3   Global Step: 34720   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:03:35,170-Speed 5966.54 samples/sec   Loss 12.8625   LearningRate 0.3423   Epoch: 3   Global Step: 34730   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:03:42,039-Speed 5963.40 samples/sec   Loss 12.8481   LearningRate 0.3422   Epoch: 3   Global Step: 34740   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:03:48,890-Speed 5980.00 samples/sec   Loss 12.8778   LearningRate 0.3422   Epoch: 3   Global Step: 34750   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:03:55,771-Speed 5953.82 samples/sec   Loss 12.9244   LearningRate 0.3422   Epoch: 3   Global Step: 34760   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:04:02,653-Speed 5953.13 samples/sec   Loss 12.8778   LearningRate 0.3421   Epoch: 3   Global Step: 34770   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:04:09,535-Speed 5952.88 samples/sec   Loss 12.7923   LearningRate 0.3421   Epoch: 3   Global Step: 34780   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:04:16,412-Speed 5959.25 samples/sec   Loss 12.7474   LearningRate 0.3420   Epoch: 3   Global Step: 34790   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:04:23,271-Speed 5972.51 samples/sec   Loss 12.8250   LearningRate 0.3420   Epoch: 3   Global Step: 34800   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:04:30,151-Speed 5954.73 samples/sec   Loss 12.9870   LearningRate 0.3420   Epoch: 3   Global Step: 34810   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:04:36,997-Speed 5984.22 samples/sec   Loss 12.8540   LearningRate 0.3419   Epoch: 3   Global Step: 34820   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:04:43,877-Speed 5954.24 samples/sec   Loss 12.8846   LearningRate 0.3419   Epoch: 3   Global Step: 34830   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:04:50,736-Speed 5973.37 samples/sec   Loss 12.7806   LearningRate 0.3418   Epoch: 3   Global Step: 34840   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:04:57,635-Speed 5938.04 samples/sec   Loss 12.8654   LearningRate 0.3418   Epoch: 3   Global Step: 34850   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:05:04,544-Speed 5930.39 samples/sec   Loss 12.8341   LearningRate 0.3418   Epoch: 3   Global Step: 34860   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:05:11,398-Speed 5976.79 samples/sec   Loss 12.7746   LearningRate 0.3417   Epoch: 3   Global Step: 34870   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:05:18,281-Speed 5951.94 samples/sec   Loss 12.8412   LearningRate 0.3417   Epoch: 3   Global Step: 34880   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:05:25,139-Speed 5978.76 samples/sec   Loss 12.8030   LearningRate 0.3416   Epoch: 3   Global Step: 34890   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:05:32,011-Speed 5961.46 samples/sec   Loss 12.8893   LearningRate 0.3416   Epoch: 3   Global Step: 34900   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:05:38,905-Speed 5942.61 samples/sec   Loss 12.8925   LearningRate 0.3416   Epoch: 3   Global Step: 34910   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:05:45,772-Speed 5966.23 samples/sec   Loss 12.8383   LearningRate 0.3415   Epoch: 3   Global Step: 34920   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:05:52,638-Speed 5966.21 samples/sec   Loss 12.8303   LearningRate 0.3415   Epoch: 3   Global Step: 34930   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:05:59,500-Speed 5970.47 samples/sec   Loss 12.8605   LearningRate 0.3414   Epoch: 3   Global Step: 34940   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:06:06,379-Speed 5956.00 samples/sec   Loss 12.7994   LearningRate 0.3414   Epoch: 3   Global Step: 34950   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:06:13,250-Speed 5962.21 samples/sec   Loss 12.8447   LearningRate 0.3414   Epoch: 3   Global Step: 34960   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:06:20,137-Speed 5951.80 samples/sec   Loss 12.8177   LearningRate 0.3413   Epoch: 3   Global Step: 34970   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:06:27,082-Speed 5898.95 samples/sec   Loss 12.8266   LearningRate 0.3413   Epoch: 3   Global Step: 34980   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:06:33,963-Speed 5954.17 samples/sec   Loss 12.8863   LearningRate 0.3412   Epoch: 3   Global Step: 34990   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:06:40,841-Speed 5956.14 samples/sec   Loss 12.8217   LearningRate 0.3412   Epoch: 3   Global Step: 35000   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:07:07,647-[lfw][35000]XNorm: 22.868388
Training: 2022-01-08 03:07:07,647-[lfw][35000]Accuracy-Flip: 0.99617+-0.00248
Training: 2022-01-08 03:07:07,648-[lfw][35000]Accuracy-Highest: 0.99650
Training: 2022-01-08 03:07:38,565-[cfp_fp][35000]XNorm: 20.275993
Training: 2022-01-08 03:07:38,566-[cfp_fp][35000]Accuracy-Flip: 0.97057+-0.00661
Training: 2022-01-08 03:07:38,567-[cfp_fp][35000]Accuracy-Highest: 0.97057
Training: 2022-01-08 03:08:05,315-[agedb_30][35000]XNorm: 22.245622
Training: 2022-01-08 03:08:05,316-[agedb_30][35000]Accuracy-Flip: 0.96200+-0.00823
Training: 2022-01-08 03:08:05,316-[agedb_30][35000]Accuracy-Highest: 0.96200
Training: 2022-01-08 03:08:12,167-Speed 448.51 samples/sec   Loss 12.8203   LearningRate 0.3412   Epoch: 3   Global Step: 35010   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:08:19,021-Speed 5977.63 samples/sec   Loss 12.8510   LearningRate 0.3411   Epoch: 3   Global Step: 35020   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:08:25,884-Speed 5969.47 samples/sec   Loss 12.7904   LearningRate 0.3411   Epoch: 3   Global Step: 35030   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:08:32,763-Speed 5955.90 samples/sec   Loss 12.7809   LearningRate 0.3410   Epoch: 3   Global Step: 35040   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:08:39,640-Speed 5956.81 samples/sec   Loss 12.8707   LearningRate 0.3410   Epoch: 3   Global Step: 35050   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:08:46,532-Speed 5947.09 samples/sec   Loss 12.8722   LearningRate 0.3410   Epoch: 3   Global Step: 35060   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:08:53,412-Speed 5953.93 samples/sec   Loss 12.7846   LearningRate 0.3409   Epoch: 3   Global Step: 35070   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:09:00,296-Speed 5951.36 samples/sec   Loss 12.8447   LearningRate 0.3409   Epoch: 3   Global Step: 35080   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:09:07,177-Speed 5953.29 samples/sec   Loss 12.8343   LearningRate 0.3408   Epoch: 3   Global Step: 35090   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:09:14,074-Speed 5939.26 samples/sec   Loss 12.8410   LearningRate 0.3408   Epoch: 3   Global Step: 35100   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:09:20,957-Speed 5952.89 samples/sec   Loss 12.9310   LearningRate 0.3408   Epoch: 3   Global Step: 35110   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:09:27,832-Speed 5958.70 samples/sec   Loss 12.8291   LearningRate 0.3407   Epoch: 3   Global Step: 35120   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:09:34,734-Speed 5935.20 samples/sec   Loss 12.8046   LearningRate 0.3407   Epoch: 3   Global Step: 35130   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:09:41,771-Speed 5822.02 samples/sec   Loss 12.7869   LearningRate 0.3407   Epoch: 3   Global Step: 35140   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:09:48,713-Speed 5902.26 samples/sec   Loss 12.7894   LearningRate 0.3406   Epoch: 3   Global Step: 35150   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:09:55,568-Speed 5976.64 samples/sec   Loss 12.8847   LearningRate 0.3406   Epoch: 3   Global Step: 35160   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:10:02,424-Speed 5974.60 samples/sec   Loss 12.8157   LearningRate 0.3405   Epoch: 3   Global Step: 35170   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:10:09,273-Speed 5982.48 samples/sec   Loss 12.8728   LearningRate 0.3405   Epoch: 3   Global Step: 35180   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:10:16,143-Speed 5963.69 samples/sec   Loss 12.8404   LearningRate 0.3405   Epoch: 3   Global Step: 35190   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:10:23,010-Speed 5966.16 samples/sec   Loss 12.7803   LearningRate 0.3404   Epoch: 3   Global Step: 35200   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:10:29,906-Speed 5940.83 samples/sec   Loss 12.8216   LearningRate 0.3404   Epoch: 3   Global Step: 35210   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:10:36,793-Speed 5949.79 samples/sec   Loss 12.8713   LearningRate 0.3403   Epoch: 3   Global Step: 35220   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:10:43,742-Speed 5895.05 samples/sec   Loss 12.7762   LearningRate 0.3403   Epoch: 3   Global Step: 35230   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:10:50,594-Speed 5979.02 samples/sec   Loss 12.8535   LearningRate 0.3403   Epoch: 3   Global Step: 35240   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:10:57,459-Speed 5968.36 samples/sec   Loss 12.8611   LearningRate 0.3402   Epoch: 3   Global Step: 35250   Fp16 Grad Scale: 524288   Required: 34 hours
Training: 2022-01-08 03:11:04,307-Speed 5982.29 samples/sec   Loss 12.8067   LearningRate 0.3402   Epoch: 3   Global Step: 35260   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:11:11,199-Speed 5944.31 samples/sec   Loss 12.7844   LearningRate 0.3401   Epoch: 3   Global Step: 35270   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:11:18,069-Speed 5963.87 samples/sec   Loss 12.7851   LearningRate 0.3401   Epoch: 3   Global Step: 35280   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:11:24,921-Speed 5979.19 samples/sec   Loss 12.8212   LearningRate 0.3401   Epoch: 3   Global Step: 35290   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:11:31,794-Speed 5960.81 samples/sec   Loss 12.7987   LearningRate 0.3400   Epoch: 3   Global Step: 35300   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:11:38,778-Speed 5869.44 samples/sec   Loss 12.8035   LearningRate 0.3400   Epoch: 3   Global Step: 35310   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:11:45,686-Speed 5931.33 samples/sec   Loss 12.7588   LearningRate 0.3399   Epoch: 3   Global Step: 35320   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:11:52,546-Speed 5972.12 samples/sec   Loss 12.8414   LearningRate 0.3399   Epoch: 3   Global Step: 35330   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:11:59,392-Speed 5982.96 samples/sec   Loss 12.7769   LearningRate 0.3399   Epoch: 3   Global Step: 35340   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:12:06,267-Speed 5959.21 samples/sec   Loss 12.7827   LearningRate 0.3398   Epoch: 3   Global Step: 35350   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:12:13,114-Speed 5983.36 samples/sec   Loss 12.7988   LearningRate 0.3398   Epoch: 3   Global Step: 35360   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:12:19,983-Speed 5964.11 samples/sec   Loss 12.8141   LearningRate 0.3397   Epoch: 3   Global Step: 35370   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:12:26,831-Speed 5982.86 samples/sec   Loss 12.9291   LearningRate 0.3397   Epoch: 3   Global Step: 35380   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:12:33,704-Speed 5960.47 samples/sec   Loss 12.7336   LearningRate 0.3397   Epoch: 3   Global Step: 35390   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:12:40,548-Speed 5985.29 samples/sec   Loss 12.7857   LearningRate 0.3396   Epoch: 3   Global Step: 35400   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:12:47,404-Speed 5977.37 samples/sec   Loss 12.7702   LearningRate 0.3396   Epoch: 3   Global Step: 35410   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:12:54,265-Speed 5971.03 samples/sec   Loss 12.8544   LearningRate 0.3395   Epoch: 3   Global Step: 35420   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:13:01,122-Speed 5974.05 samples/sec   Loss 12.8004   LearningRate 0.3395   Epoch: 3   Global Step: 35430   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:13:07,981-Speed 5972.61 samples/sec   Loss 12.6801   LearningRate 0.3395   Epoch: 3   Global Step: 35440   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:13:14,836-Speed 5976.48 samples/sec   Loss 12.7863   LearningRate 0.3394   Epoch: 3   Global Step: 35450   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:13:21,705-Speed 5963.98 samples/sec   Loss 12.7665   LearningRate 0.3394   Epoch: 3   Global Step: 35460   Fp16 Grad Scale: 524288   Required: 34 hours
Training: 2022-01-08 03:13:28,561-Speed 5976.24 samples/sec   Loss 12.7936   LearningRate 0.3393   Epoch: 3   Global Step: 35470   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:13:35,427-Speed 5966.67 samples/sec   Loss 12.8745   LearningRate 0.3393   Epoch: 3   Global Step: 35480   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:13:42,311-Speed 5952.83 samples/sec   Loss 12.8626   LearningRate 0.3393   Epoch: 3   Global Step: 35490   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:13:49,167-Speed 5975.55 samples/sec   Loss 12.7848   LearningRate 0.3392   Epoch: 3   Global Step: 35500   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:13:56,031-Speed 5968.29 samples/sec   Loss 12.7828   LearningRate 0.3392   Epoch: 3   Global Step: 35510   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:14:02,898-Speed 5966.49 samples/sec   Loss 12.6658   LearningRate 0.3391   Epoch: 3   Global Step: 35520   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:14:09,781-Speed 5951.97 samples/sec   Loss 12.7361   LearningRate 0.3391   Epoch: 3   Global Step: 35530   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:14:16,699-Speed 5922.30 samples/sec   Loss 12.7698   LearningRate 0.3391   Epoch: 3   Global Step: 35540   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:14:23,559-Speed 5973.44 samples/sec   Loss 12.7793   LearningRate 0.3390   Epoch: 3   Global Step: 35550   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:14:30,405-Speed 5984.42 samples/sec   Loss 12.8783   LearningRate 0.3390   Epoch: 3   Global Step: 35560   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:14:37,274-Speed 5964.55 samples/sec   Loss 12.7579   LearningRate 0.3390   Epoch: 3   Global Step: 35570   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:14:44,134-Speed 5971.82 samples/sec   Loss 12.8019   LearningRate 0.3389   Epoch: 3   Global Step: 35580   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:14:50,990-Speed 5976.06 samples/sec   Loss 12.8036   LearningRate 0.3389   Epoch: 3   Global Step: 35590   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:14:57,869-Speed 5957.62 samples/sec   Loss 12.7034   LearningRate 0.3388   Epoch: 3   Global Step: 35600   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:15:04,720-Speed 5979.46 samples/sec   Loss 12.8082   LearningRate 0.3388   Epoch: 3   Global Step: 35610   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:15:11,596-Speed 5957.40 samples/sec   Loss 12.7496   LearningRate 0.3388   Epoch: 3   Global Step: 35620   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:15:18,468-Speed 5961.31 samples/sec   Loss 12.7596   LearningRate 0.3387   Epoch: 3   Global Step: 35630   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:15:25,344-Speed 5958.45 samples/sec   Loss 12.7673   LearningRate 0.3387   Epoch: 3   Global Step: 35640   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:15:32,231-Speed 5949.14 samples/sec   Loss 12.8837   LearningRate 0.3386   Epoch: 3   Global Step: 35650   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:15:39,142-Speed 5927.86 samples/sec   Loss 12.7112   LearningRate 0.3386   Epoch: 3   Global Step: 35660   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:15:46,014-Speed 5961.65 samples/sec   Loss 12.7600   LearningRate 0.3386   Epoch: 3   Global Step: 35670   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:15:52,876-Speed 5970.46 samples/sec   Loss 12.8109   LearningRate 0.3385   Epoch: 3   Global Step: 35680   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:15:59,733-Speed 5975.39 samples/sec   Loss 12.8131   LearningRate 0.3385   Epoch: 3   Global Step: 35690   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:16:06,594-Speed 5971.08 samples/sec   Loss 12.6762   LearningRate 0.3384   Epoch: 3   Global Step: 35700   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:16:13,451-Speed 5974.49 samples/sec   Loss 12.8999   LearningRate 0.3384   Epoch: 3   Global Step: 35710   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:16:20,321-Speed 5963.20 samples/sec   Loss 12.7516   LearningRate 0.3384   Epoch: 3   Global Step: 35720   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:16:27,186-Speed 5968.15 samples/sec   Loss 12.7926   LearningRate 0.3383   Epoch: 3   Global Step: 35730   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:16:34,072-Speed 5949.04 samples/sec   Loss 12.7247   LearningRate 0.3383   Epoch: 3   Global Step: 35740   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:16:40,949-Speed 5958.04 samples/sec   Loss 12.7451   LearningRate 0.3382   Epoch: 3   Global Step: 35750   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:16:47,815-Speed 5966.10 samples/sec   Loss 12.7257   LearningRate 0.3382   Epoch: 3   Global Step: 35760   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:16:54,669-Speed 5977.20 samples/sec   Loss 12.8309   LearningRate 0.3382   Epoch: 3   Global Step: 35770   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:17:01,540-Speed 5963.35 samples/sec   Loss 12.8084   LearningRate 0.3381   Epoch: 3   Global Step: 35780   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:17:08,408-Speed 5965.15 samples/sec   Loss 12.7074   LearningRate 0.3381   Epoch: 3   Global Step: 35790   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:17:15,284-Speed 5957.67 samples/sec   Loss 12.8252   LearningRate 0.3380   Epoch: 3   Global Step: 35800   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:17:22,171-Speed 5951.88 samples/sec   Loss 12.6984   LearningRate 0.3380   Epoch: 3   Global Step: 35810   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:17:29,040-Speed 5964.44 samples/sec   Loss 12.8010   LearningRate 0.3380   Epoch: 3   Global Step: 35820   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:17:35,934-Speed 5942.90 samples/sec   Loss 12.7681   LearningRate 0.3379   Epoch: 3   Global Step: 35830   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:17:42,795-Speed 5973.33 samples/sec   Loss 12.7770   LearningRate 0.3379   Epoch: 3   Global Step: 35840   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:17:49,682-Speed 5949.68 samples/sec   Loss 12.7095   LearningRate 0.3378   Epoch: 3   Global Step: 35850   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:17:56,559-Speed 5957.22 samples/sec   Loss 12.7904   LearningRate 0.3378   Epoch: 3   Global Step: 35860   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:18:03,415-Speed 5976.02 samples/sec   Loss 12.7532   LearningRate 0.3378   Epoch: 3   Global Step: 35870   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:18:10,384-Speed 5878.68 samples/sec   Loss 12.7651   LearningRate 0.3377   Epoch: 3   Global Step: 35880   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:18:17,241-Speed 5975.04 samples/sec   Loss 12.7378   LearningRate 0.3377   Epoch: 3   Global Step: 35890   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:18:24,111-Speed 5962.86 samples/sec   Loss 12.7082   LearningRate 0.3377   Epoch: 3   Global Step: 35900   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:18:30,981-Speed 5963.71 samples/sec   Loss 12.8126   LearningRate 0.3376   Epoch: 3   Global Step: 35910   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:18:37,858-Speed 5957.54 samples/sec   Loss 12.8129   LearningRate 0.3376   Epoch: 3   Global Step: 35920   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:18:44,718-Speed 5971.07 samples/sec   Loss 12.6949   LearningRate 0.3375   Epoch: 3   Global Step: 35930   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:18:51,593-Speed 5961.05 samples/sec   Loss 12.7852   LearningRate 0.3375   Epoch: 3   Global Step: 35940   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:18:58,458-Speed 5968.24 samples/sec   Loss 12.7763   LearningRate 0.3375   Epoch: 3   Global Step: 35950   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:19:05,322-Speed 5969.64 samples/sec   Loss 12.7034   LearningRate 0.3374   Epoch: 3   Global Step: 35960   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:19:12,171-Speed 5980.79 samples/sec   Loss 12.7383   LearningRate 0.3374   Epoch: 3   Global Step: 35970   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:19:19,078-Speed 5931.86 samples/sec   Loss 12.7678   LearningRate 0.3373   Epoch: 3   Global Step: 35980   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:19:25,937-Speed 5972.61 samples/sec   Loss 12.7070   LearningRate 0.3373   Epoch: 3   Global Step: 35990   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:19:32,843-Speed 5931.95 samples/sec   Loss 12.7858   LearningRate 0.3373   Epoch: 3   Global Step: 36000   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:19:39,708-Speed 5967.76 samples/sec   Loss 12.7878   LearningRate 0.3372   Epoch: 3   Global Step: 36010   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:19:46,563-Speed 5976.28 samples/sec   Loss 12.7912   LearningRate 0.3372   Epoch: 3   Global Step: 36020   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:19:53,424-Speed 5971.08 samples/sec   Loss 12.7139   LearningRate 0.3371   Epoch: 3   Global Step: 36030   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:20:00,285-Speed 5971.09 samples/sec   Loss 12.6620   LearningRate 0.3371   Epoch: 3   Global Step: 36040   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:20:07,134-Speed 5981.19 samples/sec   Loss 12.6628   LearningRate 0.3371   Epoch: 3   Global Step: 36050   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:20:13,985-Speed 5979.86 samples/sec   Loss 12.8105   LearningRate 0.3370   Epoch: 3   Global Step: 36060   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:20:20,899-Speed 5927.22 samples/sec   Loss 12.7956   LearningRate 0.3370   Epoch: 3   Global Step: 36070   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:20:27,778-Speed 5954.95 samples/sec   Loss 12.7076   LearningRate 0.3369   Epoch: 3   Global Step: 36080   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:20:34,633-Speed 5976.67 samples/sec   Loss 12.7001   LearningRate 0.3369   Epoch: 3   Global Step: 36090   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:20:41,488-Speed 5976.33 samples/sec   Loss 12.7498   LearningRate 0.3369   Epoch: 3   Global Step: 36100   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:20:48,350-Speed 5970.11 samples/sec   Loss 12.6509   LearningRate 0.3368   Epoch: 3   Global Step: 36110   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:20:55,198-Speed 5982.04 samples/sec   Loss 12.7733   LearningRate 0.3368   Epoch: 3   Global Step: 36120   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:21:02,044-Speed 5983.77 samples/sec   Loss 12.7583   LearningRate 0.3367   Epoch: 3   Global Step: 36130   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:21:08,888-Speed 5986.20 samples/sec   Loss 12.7351   LearningRate 0.3367   Epoch: 3   Global Step: 36140   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:21:15,741-Speed 5978.55 samples/sec   Loss 12.7883   LearningRate 0.3367   Epoch: 3   Global Step: 36150   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:21:22,590-Speed 5981.43 samples/sec   Loss 12.7253   LearningRate 0.3366   Epoch: 3   Global Step: 36160   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:21:29,441-Speed 5979.07 samples/sec   Loss 12.7207   LearningRate 0.3366   Epoch: 3   Global Step: 36170   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:21:36,321-Speed 5971.95 samples/sec   Loss 12.7699   LearningRate 0.3365   Epoch: 3   Global Step: 36180   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:21:43,220-Speed 5938.35 samples/sec   Loss 12.6634   LearningRate 0.3365   Epoch: 3   Global Step: 36190   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:21:50,130-Speed 5927.87 samples/sec   Loss 12.7498   LearningRate 0.3365   Epoch: 3   Global Step: 36200   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:21:57,069-Speed 5904.44 samples/sec   Loss 12.7494   LearningRate 0.3364   Epoch: 3   Global Step: 36210   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:22:03,977-Speed 5930.63 samples/sec   Loss 12.7281   LearningRate 0.3364   Epoch: 3   Global Step: 36220   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:22:10,884-Speed 5931.27 samples/sec   Loss 12.6809   LearningRate 0.3364   Epoch: 3   Global Step: 36230   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:22:17,793-Speed 5929.96 samples/sec   Loss 12.7909   LearningRate 0.3363   Epoch: 3   Global Step: 36240   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:22:24,689-Speed 5940.35 samples/sec   Loss 12.6952   LearningRate 0.3363   Epoch: 3   Global Step: 36250   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:22:31,596-Speed 5931.85 samples/sec   Loss 12.6594   LearningRate 0.3362   Epoch: 3   Global Step: 36260   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:22:38,452-Speed 5975.24 samples/sec   Loss 12.6951   LearningRate 0.3362   Epoch: 3   Global Step: 36270   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:22:45,309-Speed 5974.81 samples/sec   Loss 12.7464   LearningRate 0.3362   Epoch: 3   Global Step: 36280   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:22:52,178-Speed 5964.13 samples/sec   Loss 12.7730   LearningRate 0.3361   Epoch: 3   Global Step: 36290   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:22:59,038-Speed 5972.15 samples/sec   Loss 12.7160   LearningRate 0.3361   Epoch: 3   Global Step: 36300   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:23:05,902-Speed 5968.26 samples/sec   Loss 12.7355   LearningRate 0.3360   Epoch: 3   Global Step: 36310   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:23:12,748-Speed 5984.56 samples/sec   Loss 12.7999   LearningRate 0.3360   Epoch: 3   Global Step: 36320   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:23:19,595-Speed 5983.61 samples/sec   Loss 12.6677   LearningRate 0.3360   Epoch: 3   Global Step: 36330   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:23:26,467-Speed 5960.77 samples/sec   Loss 12.7652   LearningRate 0.3359   Epoch: 3   Global Step: 36340   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-01-08 03:23:33,336-Speed 5965.26 samples/sec   Loss 12.6245   LearningRate 0.3359   Epoch: 3   Global Step: 36350   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:23:40,190-Speed 5977.37 samples/sec   Loss 12.7708   LearningRate 0.3358   Epoch: 3   Global Step: 36360   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:23:47,040-Speed 5981.18 samples/sec   Loss 12.7157   LearningRate 0.3358   Epoch: 3   Global Step: 36370   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:23:53,896-Speed 5975.13 samples/sec   Loss 12.7464   LearningRate 0.3358   Epoch: 3   Global Step: 36380   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:24:00,762-Speed 5968.98 samples/sec   Loss 12.7494   LearningRate 0.3357   Epoch: 3   Global Step: 36390   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:24:07,641-Speed 5955.30 samples/sec   Loss 12.6455   LearningRate 0.3357   Epoch: 3   Global Step: 36400   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:24:14,513-Speed 5961.91 samples/sec   Loss 12.6999   LearningRate 0.3356   Epoch: 3   Global Step: 36410   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:24:21,399-Speed 5951.81 samples/sec   Loss 12.6190   LearningRate 0.3356   Epoch: 3   Global Step: 36420   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:24:28,256-Speed 5973.82 samples/sec   Loss 12.7502   LearningRate 0.3356   Epoch: 3   Global Step: 36430   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-01-08 03:24:35,131-Speed 5959.65 samples/sec   Loss 12.7067   LearningRate 0.3355   Epoch: 3   Global Step: 36440   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:24:41,991-Speed 5972.18 samples/sec   Loss 12.7294   LearningRate 0.3355   Epoch: 3   Global Step: 36450   Fp16 Grad Scale: 524288   Required: 33 hours
Training: 2022-01-08 03:24:48,846-Speed 5976.41 samples/sec   Loss 12.6933   LearningRate 0.3354   Epoch: 3   Global Step: 36460   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:24:55,712-Speed 5966.70 samples/sec   Loss 12.6976   LearningRate 0.3354   Epoch: 3   Global Step: 36470   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:25:02,575-Speed 5969.21 samples/sec   Loss 12.7342   LearningRate 0.3354   Epoch: 3   Global Step: 36480   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:25:09,433-Speed 5973.71 samples/sec   Loss 12.7039   LearningRate 0.3353   Epoch: 3   Global Step: 36490   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:25:16,335-Speed 5935.49 samples/sec   Loss 12.6476   LearningRate 0.3353   Epoch: 3   Global Step: 36500   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:25:23,204-Speed 5964.19 samples/sec   Loss 12.6428   LearningRate 0.3353   Epoch: 3   Global Step: 36510   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:25:30,065-Speed 5970.99 samples/sec   Loss 12.6771   LearningRate 0.3352   Epoch: 3   Global Step: 36520   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:25:36,926-Speed 5970.49 samples/sec   Loss 12.7464   LearningRate 0.3352   Epoch: 3   Global Step: 36530   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:25:43,795-Speed 5964.58 samples/sec   Loss 12.7487   LearningRate 0.3351   Epoch: 3   Global Step: 36540   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:25:50,658-Speed 5969.28 samples/sec   Loss 12.7200   LearningRate 0.3351   Epoch: 3   Global Step: 36550   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:25:57,523-Speed 5968.10 samples/sec   Loss 12.7274   LearningRate 0.3351   Epoch: 3   Global Step: 36560   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:26:04,393-Speed 5963.24 samples/sec   Loss 12.7040   LearningRate 0.3350   Epoch: 3   Global Step: 36570   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:26:11,241-Speed 5982.05 samples/sec   Loss 12.7686   LearningRate 0.3350   Epoch: 3   Global Step: 36580   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:26:18,119-Speed 5956.85 samples/sec   Loss 12.6817   LearningRate 0.3349   Epoch: 3   Global Step: 36590   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:26:25,013-Speed 5942.66 samples/sec   Loss 12.6538   LearningRate 0.3349   Epoch: 3   Global Step: 36600   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:26:31,872-Speed 5973.56 samples/sec   Loss 12.6835   LearningRate 0.3349   Epoch: 3   Global Step: 36610   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:26:38,739-Speed 5965.40 samples/sec   Loss 12.7363   LearningRate 0.3348   Epoch: 3   Global Step: 36620   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:26:45,614-Speed 5958.85 samples/sec   Loss 12.7086   LearningRate 0.3348   Epoch: 3   Global Step: 36630   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:26:52,522-Speed 5932.44 samples/sec   Loss 12.7005   LearningRate 0.3347   Epoch: 3   Global Step: 36640   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:26:59,394-Speed 5961.86 samples/sec   Loss 12.8042   LearningRate 0.3347   Epoch: 3   Global Step: 36650   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:27:06,267-Speed 5960.03 samples/sec   Loss 12.7443   LearningRate 0.3347   Epoch: 3   Global Step: 36660   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:27:13,120-Speed 5979.60 samples/sec   Loss 12.7888   LearningRate 0.3346   Epoch: 3   Global Step: 36670   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:27:19,986-Speed 5966.82 samples/sec   Loss 12.7056   LearningRate 0.3346   Epoch: 3   Global Step: 36680   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:27:26,869-Speed 5951.91 samples/sec   Loss 12.7464   LearningRate 0.3345   Epoch: 3   Global Step: 36690   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:27:33,738-Speed 5963.99 samples/sec   Loss 12.6800   LearningRate 0.3345   Epoch: 3   Global Step: 36700   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:27:40,595-Speed 5974.15 samples/sec   Loss 12.7388   LearningRate 0.3345   Epoch: 3   Global Step: 36710   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:27:47,479-Speed 5951.85 samples/sec   Loss 12.6554   LearningRate 0.3344   Epoch: 3   Global Step: 36720   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:27:54,373-Speed 5942.56 samples/sec   Loss 12.5863   LearningRate 0.3344   Epoch: 3   Global Step: 36730   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:28:01,236-Speed 5969.83 samples/sec   Loss 12.6945   LearningRate 0.3344   Epoch: 3   Global Step: 36740   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:28:08,084-Speed 5982.55 samples/sec   Loss 12.6624   LearningRate 0.3343   Epoch: 3   Global Step: 36750   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:28:14,940-Speed 5976.33 samples/sec   Loss 12.6923   LearningRate 0.3343   Epoch: 3   Global Step: 36760   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:28:21,826-Speed 5949.68 samples/sec   Loss 12.6267   LearningRate 0.3342   Epoch: 3   Global Step: 36770   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:28:28,672-Speed 5984.63 samples/sec   Loss 12.6975   LearningRate 0.3342   Epoch: 3   Global Step: 36780   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:28:35,543-Speed 5962.07 samples/sec   Loss 12.7067   LearningRate 0.3342   Epoch: 3   Global Step: 36790   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:28:42,412-Speed 5963.86 samples/sec   Loss 12.5945   LearningRate 0.3341   Epoch: 3   Global Step: 36800   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:28:49,274-Speed 5970.37 samples/sec   Loss 12.7477   LearningRate 0.3341   Epoch: 3   Global Step: 36810   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:28:56,143-Speed 5964.04 samples/sec   Loss 12.6694   LearningRate 0.3340   Epoch: 3   Global Step: 36820   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:29:03,009-Speed 5966.77 samples/sec   Loss 12.7394   LearningRate 0.3340   Epoch: 3   Global Step: 36830   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:29:09,860-Speed 5980.30 samples/sec   Loss 12.7103   LearningRate 0.3340   Epoch: 3   Global Step: 36840   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:29:16,724-Speed 5968.43 samples/sec   Loss 12.6837   LearningRate 0.3339   Epoch: 3   Global Step: 36850   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:29:23,628-Speed 5934.52 samples/sec   Loss 12.7270   LearningRate 0.3339   Epoch: 3   Global Step: 36860   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:29:30,483-Speed 5975.91 samples/sec   Loss 12.7602   LearningRate 0.3338   Epoch: 3   Global Step: 36870   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:29:37,373-Speed 5945.34 samples/sec   Loss 12.6789   LearningRate 0.3338   Epoch: 3   Global Step: 36880   Fp16 Grad Scale: 524288   Required: 33 hours
Training: 2022-01-08 03:29:44,313-Speed 5905.65 samples/sec   Loss 12.6439   LearningRate 0.3338   Epoch: 3   Global Step: 36890   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:29:51,267-Speed 5891.63 samples/sec   Loss 12.6255   LearningRate 0.3337   Epoch: 3   Global Step: 36900   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:29:58,183-Speed 5923.98 samples/sec   Loss 12.6802   LearningRate 0.3337   Epoch: 3   Global Step: 36910   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:30:05,035-Speed 5978.98 samples/sec   Loss 12.6254   LearningRate 0.3336   Epoch: 3   Global Step: 36920   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:30:11,888-Speed 5977.61 samples/sec   Loss 12.6151   LearningRate 0.3336   Epoch: 3   Global Step: 36930   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:30:18,738-Speed 5980.07 samples/sec   Loss 12.7890   LearningRate 0.3336   Epoch: 3   Global Step: 36940   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:30:25,599-Speed 5971.58 samples/sec   Loss 12.6873   LearningRate 0.3335   Epoch: 3   Global Step: 36950   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:30:32,453-Speed 5977.31 samples/sec   Loss 12.5946   LearningRate 0.3335   Epoch: 3   Global Step: 36960   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:30:39,300-Speed 5982.48 samples/sec   Loss 12.7157   LearningRate 0.3335   Epoch: 3   Global Step: 36970   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:30:46,173-Speed 5961.06 samples/sec   Loss 12.6670   LearningRate 0.3334   Epoch: 3   Global Step: 36980   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:30:53,041-Speed 5965.96 samples/sec   Loss 12.7137   LearningRate 0.3334   Epoch: 3   Global Step: 36990   Fp16 Grad Scale: 524288   Required: 33 hours
Training: 2022-01-08 03:30:59,879-Speed 5991.05 samples/sec   Loss 12.6532   LearningRate 0.3333   Epoch: 3   Global Step: 37000   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:31:06,726-Speed 5983.72 samples/sec   Loss 12.6859   LearningRate 0.3333   Epoch: 3   Global Step: 37010   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:31:13,573-Speed 5985.72 samples/sec   Loss 12.6274   LearningRate 0.3333   Epoch: 3   Global Step: 37020   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:31:20,425-Speed 5977.66 samples/sec   Loss 12.5645   LearningRate 0.3332   Epoch: 3   Global Step: 37030   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:31:27,339-Speed 5926.07 samples/sec   Loss 12.5940   LearningRate 0.3332   Epoch: 3   Global Step: 37040   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:31:34,288-Speed 5895.91 samples/sec   Loss 12.6335   LearningRate 0.3331   Epoch: 3   Global Step: 37050   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:31:41,221-Speed 5908.51 samples/sec   Loss 12.6465   LearningRate 0.3331   Epoch: 3   Global Step: 37060   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:31:48,151-Speed 5911.82 samples/sec   Loss 12.6853   LearningRate 0.3331   Epoch: 3   Global Step: 37070   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:31:55,020-Speed 5963.75 samples/sec   Loss 12.7395   LearningRate 0.3330   Epoch: 3   Global Step: 37080   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:32:01,876-Speed 5975.90 samples/sec   Loss 12.6807   LearningRate 0.3330   Epoch: 3   Global Step: 37090   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:32:08,747-Speed 5962.49 samples/sec   Loss 12.6110   LearningRate 0.3329   Epoch: 3   Global Step: 37100   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:32:15,598-Speed 5979.99 samples/sec   Loss 12.6406   LearningRate 0.3329   Epoch: 3   Global Step: 37110   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:32:22,455-Speed 5974.85 samples/sec   Loss 12.6175   LearningRate 0.3329   Epoch: 3   Global Step: 37120   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:32:29,359-Speed 5934.17 samples/sec   Loss 12.6407   LearningRate 0.3328   Epoch: 3   Global Step: 37130   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:32:36,228-Speed 5964.54 samples/sec   Loss 12.6230   LearningRate 0.3328   Epoch: 3   Global Step: 37140   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:32:43,083-Speed 5976.69 samples/sec   Loss 12.6270   LearningRate 0.3327   Epoch: 3   Global Step: 37150   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:32:49,936-Speed 5977.39 samples/sec   Loss 12.7598   LearningRate 0.3327   Epoch: 3   Global Step: 37160   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:32:56,801-Speed 5967.87 samples/sec   Loss 12.7111   LearningRate 0.3327   Epoch: 3   Global Step: 37170   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:33:03,657-Speed 5974.70 samples/sec   Loss 12.6190   LearningRate 0.3326   Epoch: 3   Global Step: 37180   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:33:10,523-Speed 5966.91 samples/sec   Loss 12.6459   LearningRate 0.3326   Epoch: 3   Global Step: 37190   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:33:17,397-Speed 5960.26 samples/sec   Loss 12.6075   LearningRate 0.3326   Epoch: 3   Global Step: 37200   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:33:24,253-Speed 5975.26 samples/sec   Loss 12.6625   LearningRate 0.3325   Epoch: 3   Global Step: 37210   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:33:31,105-Speed 5978.34 samples/sec   Loss 12.6557   LearningRate 0.3325   Epoch: 3   Global Step: 37220   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:33:37,971-Speed 5967.12 samples/sec   Loss 12.6076   LearningRate 0.3324   Epoch: 3   Global Step: 37230   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:33:44,828-Speed 5974.84 samples/sec   Loss 12.6893   LearningRate 0.3324   Epoch: 3   Global Step: 37240   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:33:51,700-Speed 5960.29 samples/sec   Loss 12.6359   LearningRate 0.3324   Epoch: 3   Global Step: 37250   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:33:58,549-Speed 5981.70 samples/sec   Loss 12.5731   LearningRate 0.3323   Epoch: 3   Global Step: 37260   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:34:05,402-Speed 5978.31 samples/sec   Loss 12.6456   LearningRate 0.3323   Epoch: 3   Global Step: 37270   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:34:12,265-Speed 5969.97 samples/sec   Loss 12.5871   LearningRate 0.3322   Epoch: 3   Global Step: 37280   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:34:19,123-Speed 5972.98 samples/sec   Loss 12.6819   LearningRate 0.3322   Epoch: 3   Global Step: 37290   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:34:25,970-Speed 5983.82 samples/sec   Loss 12.5792   LearningRate 0.3322   Epoch: 3   Global Step: 37300   Fp16 Grad Scale: 524288   Required: 33 hours
Training: 2022-01-08 03:34:32,806-Speed 5992.37 samples/sec   Loss 12.5957   LearningRate 0.3321   Epoch: 3   Global Step: 37310   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:34:39,653-Speed 5982.91 samples/sec   Loss 12.5799   LearningRate 0.3321   Epoch: 3   Global Step: 37320   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:34:46,502-Speed 5982.19 samples/sec   Loss 12.6768   LearningRate 0.3320   Epoch: 3   Global Step: 37330   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:34:53,359-Speed 5973.78 samples/sec   Loss 12.6477   LearningRate 0.3320   Epoch: 3   Global Step: 37340   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:35:00,220-Speed 5970.98 samples/sec   Loss 12.5960   LearningRate 0.3320   Epoch: 3   Global Step: 37350   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:35:07,097-Speed 5957.50 samples/sec   Loss 12.5747   LearningRate 0.3319   Epoch: 3   Global Step: 37360   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:35:13,967-Speed 5965.79 samples/sec   Loss 12.6657   LearningRate 0.3319   Epoch: 3   Global Step: 37370   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:35:20,832-Speed 5967.44 samples/sec   Loss 12.6502   LearningRate 0.3318   Epoch: 3   Global Step: 37380   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:35:27,680-Speed 5981.94 samples/sec   Loss 12.5721   LearningRate 0.3318   Epoch: 3   Global Step: 37390   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:35:34,562-Speed 5954.38 samples/sec   Loss 12.6792   LearningRate 0.3318   Epoch: 3   Global Step: 37400   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:35:41,451-Speed 5946.61 samples/sec   Loss 12.6572   LearningRate 0.3317   Epoch: 3   Global Step: 37410   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:35:48,334-Speed 5952.51 samples/sec   Loss 12.6424   LearningRate 0.3317   Epoch: 3   Global Step: 37420   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:35:55,646-Speed 5603.19 samples/sec   Loss 12.7355   LearningRate 0.3317   Epoch: 3   Global Step: 37430   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:36:02,512-Speed 5966.32 samples/sec   Loss 12.6558   LearningRate 0.3316   Epoch: 3   Global Step: 37440   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:36:09,371-Speed 5973.43 samples/sec   Loss 12.6723   LearningRate 0.3316   Epoch: 3   Global Step: 37450   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:36:16,229-Speed 5977.94 samples/sec   Loss 12.6590   LearningRate 0.3315   Epoch: 3   Global Step: 37460   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:36:23,100-Speed 5962.55 samples/sec   Loss 12.6872   LearningRate 0.3315   Epoch: 3   Global Step: 37470   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:36:29,955-Speed 5976.37 samples/sec   Loss 12.6424   LearningRate 0.3315   Epoch: 3   Global Step: 37480   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:36:36,805-Speed 5979.84 samples/sec   Loss 12.6345   LearningRate 0.3314   Epoch: 3   Global Step: 37490   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:36:43,665-Speed 5972.05 samples/sec   Loss 12.5064   LearningRate 0.3314   Epoch: 3   Global Step: 37500   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:36:50,519-Speed 5977.13 samples/sec   Loss 12.6438   LearningRate 0.3313   Epoch: 3   Global Step: 37510   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:36:57,363-Speed 5985.89 samples/sec   Loss 12.5684   LearningRate 0.3313   Epoch: 3   Global Step: 37520   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:37:04,212-Speed 5980.82 samples/sec   Loss 12.6136   LearningRate 0.3313   Epoch: 3   Global Step: 37530   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:37:11,062-Speed 5981.12 samples/sec   Loss 12.6442   LearningRate 0.3312   Epoch: 3   Global Step: 37540   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:37:17,938-Speed 5957.90 samples/sec   Loss 12.6409   LearningRate 0.3312   Epoch: 3   Global Step: 37550   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:37:24,796-Speed 5974.19 samples/sec   Loss 12.6409   LearningRate 0.3311   Epoch: 3   Global Step: 37560   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:37:31,672-Speed 5959.39 samples/sec   Loss 12.6750   LearningRate 0.3311   Epoch: 3   Global Step: 37570   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:37:38,540-Speed 5965.17 samples/sec   Loss 12.6516   LearningRate 0.3311   Epoch: 3   Global Step: 37580   Fp16 Grad Scale: 524288   Required: 33 hours
Training: 2022-01-08 03:37:45,381-Speed 5989.79 samples/sec   Loss 12.6622   LearningRate 0.3310   Epoch: 3   Global Step: 37590   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:37:52,265-Speed 5951.57 samples/sec   Loss 12.6350   LearningRate 0.3310   Epoch: 3   Global Step: 37600   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:37:59,119-Speed 5976.28 samples/sec   Loss 12.6009   LearningRate 0.3310   Epoch: 3   Global Step: 37610   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:38:06,012-Speed 5943.16 samples/sec   Loss 12.5878   LearningRate 0.3309   Epoch: 3   Global Step: 37620   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:38:12,907-Speed 5942.36 samples/sec   Loss 12.5414   LearningRate 0.3309   Epoch: 3   Global Step: 37630   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:38:19,791-Speed 5951.71 samples/sec   Loss 12.6552   LearningRate 0.3308   Epoch: 3   Global Step: 37640   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:38:26,664-Speed 5960.16 samples/sec   Loss 12.5761   LearningRate 0.3308   Epoch: 3   Global Step: 37650   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:38:33,599-Speed 5908.11 samples/sec   Loss 12.6725   LearningRate 0.3308   Epoch: 3   Global Step: 37660   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:38:40,585-Speed 5864.66 samples/sec   Loss 12.5626   LearningRate 0.3307   Epoch: 3   Global Step: 37670   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:38:47,502-Speed 5922.83 samples/sec   Loss 12.6381   LearningRate 0.3307   Epoch: 3   Global Step: 37680   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:38:54,354-Speed 5978.97 samples/sec   Loss 12.6035   LearningRate 0.3306   Epoch: 3   Global Step: 37690   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:39:01,203-Speed 5982.03 samples/sec   Loss 12.6377   LearningRate 0.3306   Epoch: 3   Global Step: 37700   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:39:08,064-Speed 5970.59 samples/sec   Loss 12.5364   LearningRate 0.3306   Epoch: 3   Global Step: 37710   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:39:14,935-Speed 5962.38 samples/sec   Loss 12.6032   LearningRate 0.3305   Epoch: 3   Global Step: 37720   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:39:21,788-Speed 5980.62 samples/sec   Loss 12.6803   LearningRate 0.3305   Epoch: 3   Global Step: 37730   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:39:28,638-Speed 5980.89 samples/sec   Loss 12.5847   LearningRate 0.3304   Epoch: 3   Global Step: 37740   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:39:35,520-Speed 5954.72 samples/sec   Loss 12.6186   LearningRate 0.3304   Epoch: 3   Global Step: 37750   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:39:42,392-Speed 5961.97 samples/sec   Loss 12.6310   LearningRate 0.3304   Epoch: 3   Global Step: 37760   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:39:49,258-Speed 5966.53 samples/sec   Loss 12.5693   LearningRate 0.3303   Epoch: 3   Global Step: 37770   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:39:56,133-Speed 5958.91 samples/sec   Loss 12.6398   LearningRate 0.3303   Epoch: 3   Global Step: 37780   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:40:03,007-Speed 5962.40 samples/sec   Loss 12.6090   LearningRate 0.3302   Epoch: 3   Global Step: 37790   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:40:09,862-Speed 5976.46 samples/sec   Loss 12.6038   LearningRate 0.3302   Epoch: 3   Global Step: 37800   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:40:16,745-Speed 5953.96 samples/sec   Loss 12.6388   LearningRate 0.3302   Epoch: 3   Global Step: 37810   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:40:23,611-Speed 5966.48 samples/sec   Loss 12.5973   LearningRate 0.3301   Epoch: 3   Global Step: 37820   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:40:30,490-Speed 5955.70 samples/sec   Loss 12.6026   LearningRate 0.3301   Epoch: 3   Global Step: 37830   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:40:37,350-Speed 5970.99 samples/sec   Loss 12.5977   LearningRate 0.3301   Epoch: 3   Global Step: 37840   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:40:44,208-Speed 5974.01 samples/sec   Loss 12.5787   LearningRate 0.3300   Epoch: 3   Global Step: 37850   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:40:51,087-Speed 5955.91 samples/sec   Loss 12.5363   LearningRate 0.3300   Epoch: 3   Global Step: 37860   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:40:57,937-Speed 5979.98 samples/sec   Loss 12.6492   LearningRate 0.3299   Epoch: 3   Global Step: 37870   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:41:04,817-Speed 5954.40 samples/sec   Loss 12.5687   LearningRate 0.3299   Epoch: 3   Global Step: 37880   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:41:11,706-Speed 5946.98 samples/sec   Loss 12.5984   LearningRate 0.3299   Epoch: 3   Global Step: 37890   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:41:18,568-Speed 5970.39 samples/sec   Loss 12.6733   LearningRate 0.3298   Epoch: 3   Global Step: 37900   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:41:25,425-Speed 5974.08 samples/sec   Loss 12.5902   LearningRate 0.3298   Epoch: 3   Global Step: 37910   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:41:32,288-Speed 5969.52 samples/sec   Loss 12.4894   LearningRate 0.3297   Epoch: 3   Global Step: 37920   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:41:39,128-Speed 5989.32 samples/sec   Loss 12.6839   LearningRate 0.3297   Epoch: 3   Global Step: 37930   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:41:45,989-Speed 5970.87 samples/sec   Loss 12.6462   LearningRate 0.3297   Epoch: 3   Global Step: 37940   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:41:52,863-Speed 5962.96 samples/sec   Loss 12.6668   LearningRate 0.3296   Epoch: 3   Global Step: 37950   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:41:59,761-Speed 5939.89 samples/sec   Loss 12.6203   LearningRate 0.3296   Epoch: 3   Global Step: 37960   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:42:06,641-Speed 5954.32 samples/sec   Loss 12.6075   LearningRate 0.3295   Epoch: 3   Global Step: 37970   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:42:13,540-Speed 5938.45 samples/sec   Loss 12.6129   LearningRate 0.3295   Epoch: 3   Global Step: 37980   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:42:20,392-Speed 5978.38 samples/sec   Loss 12.5694   LearningRate 0.3295   Epoch: 3   Global Step: 37990   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:42:27,264-Speed 5962.17 samples/sec   Loss 12.6853   LearningRate 0.3294   Epoch: 3   Global Step: 38000   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:42:34,117-Speed 5978.27 samples/sec   Loss 12.5961   LearningRate 0.3294   Epoch: 3   Global Step: 38010   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:42:40,974-Speed 5974.31 samples/sec   Loss 12.5567   LearningRate 0.3294   Epoch: 3   Global Step: 38020   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:42:47,840-Speed 5966.97 samples/sec   Loss 12.6078   LearningRate 0.3293   Epoch: 3   Global Step: 38030   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:42:54,692-Speed 5979.66 samples/sec   Loss 12.6037   LearningRate 0.3293   Epoch: 3   Global Step: 38040   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:43:01,549-Speed 5974.28 samples/sec   Loss 12.6170   LearningRate 0.3292   Epoch: 3   Global Step: 38050   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:43:08,436-Speed 5949.73 samples/sec   Loss 12.5800   LearningRate 0.3292   Epoch: 3   Global Step: 38060   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:43:15,318-Speed 5952.48 samples/sec   Loss 12.6216   LearningRate 0.3292   Epoch: 3   Global Step: 38070   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:43:22,185-Speed 5965.93 samples/sec   Loss 12.4427   LearningRate 0.3291   Epoch: 3   Global Step: 38080   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:43:29,044-Speed 5973.13 samples/sec   Loss 12.6336   LearningRate 0.3291   Epoch: 3   Global Step: 38090   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:43:35,890-Speed 5984.01 samples/sec   Loss 12.5907   LearningRate 0.3290   Epoch: 3   Global Step: 38100   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:43:42,766-Speed 5958.53 samples/sec   Loss 12.6163   LearningRate 0.3290   Epoch: 3   Global Step: 38110   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:43:49,617-Speed 5979.97 samples/sec   Loss 12.5781   LearningRate 0.3290   Epoch: 3   Global Step: 38120   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:43:56,464-Speed 5983.42 samples/sec   Loss 12.5057   LearningRate 0.3289   Epoch: 3   Global Step: 38130   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:44:03,337-Speed 5963.79 samples/sec   Loss 12.5571   LearningRate 0.3289   Epoch: 3   Global Step: 38140   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:44:10,193-Speed 5975.23 samples/sec   Loss 12.6063   LearningRate 0.3288   Epoch: 3   Global Step: 38150   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:44:17,042-Speed 5981.25 samples/sec   Loss 12.5887   LearningRate 0.3288   Epoch: 3   Global Step: 38160   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:44:23,902-Speed 5971.97 samples/sec   Loss 12.6268   LearningRate 0.3288   Epoch: 3   Global Step: 38170   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:44:30,766-Speed 5970.82 samples/sec   Loss 12.4536   LearningRate 0.3287   Epoch: 3   Global Step: 38180   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:44:37,635-Speed 5963.95 samples/sec   Loss 12.5921   LearningRate 0.3287   Epoch: 3   Global Step: 38190   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:44:44,489-Speed 5977.00 samples/sec   Loss 12.5266   LearningRate 0.3287   Epoch: 3   Global Step: 38200   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:44:51,370-Speed 5954.05 samples/sec   Loss 12.6290   LearningRate 0.3286   Epoch: 3   Global Step: 38210   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:44:58,229-Speed 5977.19 samples/sec   Loss 12.6298   LearningRate 0.3286   Epoch: 3   Global Step: 38220   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:45:05,112-Speed 5953.05 samples/sec   Loss 12.5481   LearningRate 0.3285   Epoch: 3   Global Step: 38230   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:45:11,967-Speed 5976.40 samples/sec   Loss 12.5227   LearningRate 0.3285   Epoch: 3   Global Step: 38240   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:45:18,828-Speed 5971.31 samples/sec   Loss 12.5287   LearningRate 0.3285   Epoch: 3   Global Step: 38250   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:45:25,677-Speed 5980.59 samples/sec   Loss 12.5417   LearningRate 0.3284   Epoch: 3   Global Step: 38260   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:45:32,528-Speed 5980.08 samples/sec   Loss 12.4943   LearningRate 0.3284   Epoch: 3   Global Step: 38270   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:45:39,384-Speed 5975.68 samples/sec   Loss 12.5232   LearningRate 0.3283   Epoch: 3   Global Step: 38280   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:45:46,257-Speed 5960.90 samples/sec   Loss 12.5806   LearningRate 0.3283   Epoch: 3   Global Step: 38290   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:45:53,114-Speed 5974.17 samples/sec   Loss 12.5383   LearningRate 0.3283   Epoch: 3   Global Step: 38300   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:45:59,985-Speed 5962.81 samples/sec   Loss 12.5038   LearningRate 0.3282   Epoch: 3   Global Step: 38310   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:46:06,856-Speed 5962.75 samples/sec   Loss 12.4937   LearningRate 0.3282   Epoch: 3   Global Step: 38320   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:46:13,711-Speed 5976.43 samples/sec   Loss 12.6306   LearningRate 0.3281   Epoch: 3   Global Step: 38330   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:46:20,583-Speed 5961.84 samples/sec   Loss 12.6478   LearningRate 0.3281   Epoch: 3   Global Step: 38340   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:46:27,457-Speed 5959.96 samples/sec   Loss 12.5746   LearningRate 0.3281   Epoch: 3   Global Step: 38350   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:46:34,315-Speed 5973.54 samples/sec   Loss 12.5405   LearningRate 0.3280   Epoch: 3   Global Step: 38360   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:46:41,161-Speed 5984.57 samples/sec   Loss 12.5488   LearningRate 0.3280   Epoch: 3   Global Step: 38370   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:46:48,009-Speed 5983.00 samples/sec   Loss 12.5242   LearningRate 0.3280   Epoch: 3   Global Step: 38380   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:46:54,888-Speed 5954.81 samples/sec   Loss 12.5973   LearningRate 0.3279   Epoch: 3   Global Step: 38390   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:47:01,746-Speed 5973.49 samples/sec   Loss 12.6002   LearningRate 0.3279   Epoch: 3   Global Step: 38400   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:47:08,620-Speed 5959.80 samples/sec   Loss 12.5369   LearningRate 0.3278   Epoch: 3   Global Step: 38410   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:47:15,479-Speed 5973.16 samples/sec   Loss 12.5320   LearningRate 0.3278   Epoch: 3   Global Step: 38420   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:47:22,345-Speed 5967.09 samples/sec   Loss 12.5620   LearningRate 0.3278   Epoch: 3   Global Step: 38430   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:47:29,204-Speed 5973.18 samples/sec   Loss 12.5285   LearningRate 0.3277   Epoch: 3   Global Step: 38440   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:47:36,177-Speed 5876.64 samples/sec   Loss 12.5314   LearningRate 0.3277   Epoch: 3   Global Step: 38450   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:47:43,138-Speed 5885.80 samples/sec   Loss 12.5798   LearningRate 0.3276   Epoch: 3   Global Step: 38460   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:47:50,072-Speed 5908.36 samples/sec   Loss 12.6282   LearningRate 0.3276   Epoch: 3   Global Step: 38470   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:47:56,928-Speed 5975.20 samples/sec   Loss 12.5460   LearningRate 0.3276   Epoch: 3   Global Step: 38480   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:48:03,767-Speed 5990.26 samples/sec   Loss 12.5094   LearningRate 0.3275   Epoch: 3   Global Step: 38490   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:48:10,710-Speed 5900.39 samples/sec   Loss 12.4741   LearningRate 0.3275   Epoch: 3   Global Step: 38500   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:48:17,586-Speed 5957.91 samples/sec   Loss 12.5675   LearningRate 0.3275   Epoch: 3   Global Step: 38510   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:48:24,449-Speed 5969.73 samples/sec   Loss 12.5532   LearningRate 0.3274   Epoch: 3   Global Step: 38520   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:48:31,291-Speed 5987.61 samples/sec   Loss 12.6054   LearningRate 0.3274   Epoch: 3   Global Step: 38530   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:48:38,143-Speed 5978.87 samples/sec   Loss 12.5412   LearningRate 0.3273   Epoch: 3   Global Step: 38540   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:48:45,013-Speed 5963.33 samples/sec   Loss 12.5211   LearningRate 0.3273   Epoch: 3   Global Step: 38550   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:48:51,870-Speed 5974.57 samples/sec   Loss 12.5255   LearningRate 0.3273   Epoch: 3   Global Step: 38560   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:48:58,773-Speed 5934.44 samples/sec   Loss 12.5938   LearningRate 0.3272   Epoch: 3   Global Step: 38570   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:49:05,673-Speed 5937.38 samples/sec   Loss 12.5220   LearningRate 0.3272   Epoch: 3   Global Step: 38580   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:49:12,524-Speed 5982.36 samples/sec   Loss 12.5647   LearningRate 0.3271   Epoch: 3   Global Step: 38590   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:49:19,367-Speed 5986.66 samples/sec   Loss 12.5847   LearningRate 0.3271   Epoch: 3   Global Step: 38600   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:49:26,208-Speed 5988.48 samples/sec   Loss 12.4932   LearningRate 0.3271   Epoch: 3   Global Step: 38610   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:49:33,063-Speed 5976.02 samples/sec   Loss 12.5259   LearningRate 0.3270   Epoch: 3   Global Step: 38620   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:49:39,921-Speed 5973.75 samples/sec   Loss 12.4760   LearningRate 0.3270   Epoch: 3   Global Step: 38630   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:49:46,789-Speed 5965.34 samples/sec   Loss 12.5461   LearningRate 0.3269   Epoch: 3   Global Step: 38640   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:49:53,662-Speed 5960.29 samples/sec   Loss 12.4876   LearningRate 0.3269   Epoch: 3   Global Step: 38650   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:50:00,509-Speed 5983.70 samples/sec   Loss 12.5286   LearningRate 0.3269   Epoch: 3   Global Step: 38660   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:50:07,373-Speed 5968.88 samples/sec   Loss 12.5160   LearningRate 0.3268   Epoch: 3   Global Step: 38670   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:50:14,227-Speed 5977.80 samples/sec   Loss 12.4860   LearningRate 0.3268   Epoch: 3   Global Step: 38680   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:50:21,204-Speed 5872.35 samples/sec   Loss 12.5231   LearningRate 0.3268   Epoch: 3   Global Step: 38690   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:50:28,148-Speed 5899.95 samples/sec   Loss 12.5507   LearningRate 0.3267   Epoch: 3   Global Step: 38700   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:50:35,035-Speed 5949.25 samples/sec   Loss 12.5859   LearningRate 0.3267   Epoch: 3   Global Step: 38710   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:50:41,889-Speed 5976.80 samples/sec   Loss 12.6093   LearningRate 0.3266   Epoch: 3   Global Step: 38720   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:50:48,737-Speed 5983.09 samples/sec   Loss 12.5779   LearningRate 0.3266   Epoch: 3   Global Step: 38730   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:50:55,595-Speed 5975.98 samples/sec   Loss 12.6251   LearningRate 0.3266   Epoch: 3   Global Step: 38740   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:51:02,443-Speed 5981.88 samples/sec   Loss 12.6098   LearningRate 0.3265   Epoch: 3   Global Step: 38750   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:51:09,286-Speed 5987.14 samples/sec   Loss 12.4917   LearningRate 0.3265   Epoch: 3   Global Step: 38760   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:51:16,150-Speed 5971.27 samples/sec   Loss 12.5139   LearningRate 0.3264   Epoch: 3   Global Step: 38770   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:51:23,114-Speed 5884.08 samples/sec   Loss 12.5633   LearningRate 0.3264   Epoch: 3   Global Step: 38780   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:51:30,060-Speed 5898.33 samples/sec   Loss 12.5438   LearningRate 0.3264   Epoch: 3   Global Step: 38790   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:51:36,929-Speed 5964.81 samples/sec   Loss 12.5595   LearningRate 0.3263   Epoch: 3   Global Step: 38800   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:51:43,791-Speed 5970.62 samples/sec   Loss 12.4988   LearningRate 0.3263   Epoch: 3   Global Step: 38810   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:51:50,645-Speed 5977.10 samples/sec   Loss 12.4955   LearningRate 0.3262   Epoch: 3   Global Step: 38820   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:51:57,504-Speed 5972.95 samples/sec   Loss 12.6119   LearningRate 0.3262   Epoch: 3   Global Step: 38830   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:52:04,375-Speed 5963.15 samples/sec   Loss 12.5728   LearningRate 0.3262   Epoch: 3   Global Step: 38840   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:52:11,237-Speed 5969.43 samples/sec   Loss 12.5621   LearningRate 0.3261   Epoch: 3   Global Step: 38850   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:52:18,097-Speed 5972.27 samples/sec   Loss 12.5424   LearningRate 0.3261   Epoch: 3   Global Step: 38860   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:52:24,951-Speed 5976.99 samples/sec   Loss 12.5450   LearningRate 0.3261   Epoch: 3   Global Step: 38870   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:52:31,802-Speed 5979.79 samples/sec   Loss 12.5559   LearningRate 0.3260   Epoch: 3   Global Step: 38880   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:52:38,650-Speed 5982.32 samples/sec   Loss 12.4631   LearningRate 0.3260   Epoch: 3   Global Step: 38890   Fp16 Grad Scale: 524288   Required: 33 hours
Training: 2022-01-08 03:52:45,502-Speed 5981.10 samples/sec   Loss 12.5027   LearningRate 0.3259   Epoch: 3   Global Step: 38900   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:52:52,352-Speed 5981.02 samples/sec   Loss 12.4684   LearningRate 0.3259   Epoch: 3   Global Step: 38910   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:52:59,200-Speed 5982.16 samples/sec   Loss 12.5136   LearningRate 0.3259   Epoch: 3   Global Step: 38920   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:53:06,071-Speed 5962.74 samples/sec   Loss 12.5269   LearningRate 0.3258   Epoch: 3   Global Step: 38930   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:53:12,933-Speed 5969.84 samples/sec   Loss 12.5325   LearningRate 0.3258   Epoch: 3   Global Step: 38940   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:53:19,776-Speed 5987.32 samples/sec   Loss 12.4682   LearningRate 0.3257   Epoch: 3   Global Step: 38950   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:53:26,627-Speed 5979.33 samples/sec   Loss 12.4844   LearningRate 0.3257   Epoch: 3   Global Step: 38960   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:53:33,573-Speed 5897.82 samples/sec   Loss 12.4941   LearningRate 0.3257   Epoch: 3   Global Step: 38970   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:53:40,417-Speed 5985.46 samples/sec   Loss 12.5639   LearningRate 0.3256   Epoch: 3   Global Step: 38980   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:53:47,269-Speed 5978.48 samples/sec   Loss 12.4777   LearningRate 0.3256   Epoch: 3   Global Step: 38990   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:53:54,113-Speed 5986.08 samples/sec   Loss 12.6288   LearningRate 0.3256   Epoch: 3   Global Step: 39000   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:54:00,956-Speed 5986.72 samples/sec   Loss 12.4854   LearningRate 0.3255   Epoch: 3   Global Step: 39010   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:54:07,801-Speed 5985.13 samples/sec   Loss 12.5601   LearningRate 0.3255   Epoch: 3   Global Step: 39020   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:54:14,659-Speed 5974.20 samples/sec   Loss 12.4375   LearningRate 0.3254   Epoch: 3   Global Step: 39030   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:54:21,536-Speed 5957.27 samples/sec   Loss 12.5606   LearningRate 0.3254   Epoch: 3   Global Step: 39040   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:54:28,387-Speed 5979.66 samples/sec   Loss 12.5041   LearningRate 0.3254   Epoch: 3   Global Step: 39050   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:54:35,228-Speed 5988.75 samples/sec   Loss 12.5378   LearningRate 0.3253   Epoch: 3   Global Step: 39060   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:54:42,073-Speed 5984.31 samples/sec   Loss 12.5141   LearningRate 0.3253   Epoch: 3   Global Step: 39070   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:54:48,919-Speed 5984.84 samples/sec   Loss 12.4989   LearningRate 0.3252   Epoch: 3   Global Step: 39080   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:54:55,769-Speed 5979.99 samples/sec   Loss 12.5728   LearningRate 0.3252   Epoch: 3   Global Step: 39090   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:55:02,640-Speed 5962.29 samples/sec   Loss 12.4821   LearningRate 0.3252   Epoch: 3   Global Step: 39100   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:55:09,485-Speed 5985.34 samples/sec   Loss 12.5292   LearningRate 0.3251   Epoch: 3   Global Step: 39110   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:55:16,331-Speed 5984.50 samples/sec   Loss 12.4884   LearningRate 0.3251   Epoch: 3   Global Step: 39120   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:55:23,180-Speed 5982.13 samples/sec   Loss 12.6597   LearningRate 0.3251   Epoch: 3   Global Step: 39130   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:55:30,034-Speed 5976.83 samples/sec   Loss 12.5087   LearningRate 0.3250   Epoch: 3   Global Step: 39140   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:55:36,895-Speed 5971.76 samples/sec   Loss 12.4141   LearningRate 0.3250   Epoch: 3   Global Step: 39150   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:55:43,787-Speed 5944.18 samples/sec   Loss 12.5464   LearningRate 0.3249   Epoch: 3   Global Step: 39160   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:55:50,636-Speed 5984.98 samples/sec   Loss 12.4593   LearningRate 0.3249   Epoch: 3   Global Step: 39170   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:55:57,489-Speed 5977.35 samples/sec   Loss 12.5347   LearningRate 0.3249   Epoch: 3   Global Step: 39180   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:56:04,328-Speed 5990.73 samples/sec   Loss 12.6511   LearningRate 0.3248   Epoch: 3   Global Step: 39190   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:56:11,175-Speed 5982.72 samples/sec   Loss 12.5709   LearningRate 0.3248   Epoch: 3   Global Step: 39200   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:56:18,025-Speed 5980.78 samples/sec   Loss 12.4510   LearningRate 0.3247   Epoch: 3   Global Step: 39210   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:56:24,880-Speed 5976.15 samples/sec   Loss 12.4816   LearningRate 0.3247   Epoch: 3   Global Step: 39220   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:56:31,757-Speed 5957.81 samples/sec   Loss 12.4445   LearningRate 0.3247   Epoch: 3   Global Step: 39230   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:56:38,609-Speed 5978.29 samples/sec   Loss 12.5044   LearningRate 0.3246   Epoch: 3   Global Step: 39240   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:56:45,478-Speed 5964.34 samples/sec   Loss 12.4646   LearningRate 0.3246   Epoch: 3   Global Step: 39250   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:56:52,352-Speed 5959.38 samples/sec   Loss 12.5062   LearningRate 0.3245   Epoch: 3   Global Step: 39260   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:56:59,213-Speed 5972.10 samples/sec   Loss 12.5395   LearningRate 0.3245   Epoch: 3   Global Step: 39270   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:57:06,064-Speed 5980.07 samples/sec   Loss 12.6131   LearningRate 0.3245   Epoch: 3   Global Step: 39280   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:57:12,898-Speed 5995.52 samples/sec   Loss 12.5600   LearningRate 0.3244   Epoch: 3   Global Step: 39290   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 03:57:19,765-Speed 5965.11 samples/sec   Loss 12.5007   LearningRate 0.3244   Epoch: 3   Global Step: 39300   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 03:57:26,690-Speed 5918.81 samples/sec   Loss 12.4904   LearningRate 0.3244   Epoch: 3   Global Step: 39310   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 03:57:33,538-Speed 5984.89 samples/sec   Loss 12.4152   LearningRate 0.3243   Epoch: 3   Global Step: 39320   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 03:57:40,393-Speed 5975.94 samples/sec   Loss 12.4821   LearningRate 0.3243   Epoch: 3   Global Step: 39330   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 03:57:47,247-Speed 5981.84 samples/sec   Loss 12.4861   LearningRate 0.3242   Epoch: 3   Global Step: 39340   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 03:57:54,088-Speed 5988.33 samples/sec   Loss 12.4583   LearningRate 0.3242   Epoch: 3   Global Step: 39350   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 03:58:00,968-Speed 5954.57 samples/sec   Loss 12.4637   LearningRate 0.3242   Epoch: 3   Global Step: 39360   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 03:58:07,837-Speed 5964.49 samples/sec   Loss 12.5031   LearningRate 0.3241   Epoch: 3   Global Step: 39370   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 03:58:14,678-Speed 5988.10 samples/sec   Loss 12.4801   LearningRate 0.3241   Epoch: 3   Global Step: 39380   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 03:58:21,525-Speed 5984.12 samples/sec   Loss 12.5322   LearningRate 0.3240   Epoch: 3   Global Step: 39390   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:58:28,371-Speed 5983.87 samples/sec   Loss 12.4382   LearningRate 0.3240   Epoch: 3   Global Step: 39400   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:58:35,242-Speed 5962.73 samples/sec   Loss 12.4096   LearningRate 0.3240   Epoch: 3   Global Step: 39410   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:58:42,105-Speed 5969.74 samples/sec   Loss 12.5102   LearningRate 0.3239   Epoch: 3   Global Step: 39420   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:58:48,961-Speed 5974.76 samples/sec   Loss 12.4587   LearningRate 0.3239   Epoch: 3   Global Step: 39430   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:58:55,811-Speed 5981.27 samples/sec   Loss 12.5289   LearningRate 0.3239   Epoch: 3   Global Step: 39440   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:59:02,667-Speed 5983.11 samples/sec   Loss 12.5244   LearningRate 0.3238   Epoch: 3   Global Step: 39450   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:59:09,530-Speed 5969.01 samples/sec   Loss 12.5088   LearningRate 0.3238   Epoch: 3   Global Step: 39460   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:59:16,377-Speed 5983.20 samples/sec   Loss 12.4876   LearningRate 0.3237   Epoch: 3   Global Step: 39470   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:59:23,224-Speed 5982.92 samples/sec   Loss 12.5398   LearningRate 0.3237   Epoch: 3   Global Step: 39480   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 03:59:30,072-Speed 5984.42 samples/sec   Loss 12.4371   LearningRate 0.3237   Epoch: 3   Global Step: 39490   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:59:36,934-Speed 5969.90 samples/sec   Loss 12.4951   LearningRate 0.3236   Epoch: 3   Global Step: 39500   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:59:43,795-Speed 5971.49 samples/sec   Loss 12.5456   LearningRate 0.3236   Epoch: 3   Global Step: 39510   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:59:50,678-Speed 5951.94 samples/sec   Loss 12.4411   LearningRate 0.3235   Epoch: 3   Global Step: 39520   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 03:59:57,562-Speed 5952.66 samples/sec   Loss 12.4096   LearningRate 0.3235   Epoch: 3   Global Step: 39530   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:00:04,414-Speed 5979.35 samples/sec   Loss 12.4285   LearningRate 0.3235   Epoch: 3   Global Step: 39540   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:00:11,291-Speed 5956.13 samples/sec   Loss 12.5028   LearningRate 0.3234   Epoch: 3   Global Step: 39550   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:00:18,155-Speed 5969.20 samples/sec   Loss 12.5324   LearningRate 0.3234   Epoch: 3   Global Step: 39560   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:00:25,035-Speed 5954.12 samples/sec   Loss 12.4265   LearningRate 0.3234   Epoch: 3   Global Step: 39570   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:00:31,899-Speed 5969.05 samples/sec   Loss 12.4760   LearningRate 0.3233   Epoch: 3   Global Step: 39580   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:00:38,754-Speed 5976.48 samples/sec   Loss 12.4021   LearningRate 0.3233   Epoch: 3   Global Step: 39590   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:00:45,606-Speed 5978.40 samples/sec   Loss 12.4768   LearningRate 0.3232   Epoch: 3   Global Step: 39600   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:00:52,485-Speed 5955.76 samples/sec   Loss 12.4418   LearningRate 0.3232   Epoch: 3   Global Step: 39610   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:00:59,356-Speed 5962.69 samples/sec   Loss 12.5525   LearningRate 0.3232   Epoch: 3   Global Step: 39620   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:01:06,209-Speed 5978.29 samples/sec   Loss 12.3968   LearningRate 0.3231   Epoch: 3   Global Step: 39630   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:01:13,062-Speed 5978.37 samples/sec   Loss 12.4151   LearningRate 0.3231   Epoch: 3   Global Step: 39640   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:01:19,983-Speed 5919.96 samples/sec   Loss 12.4872   LearningRate 0.3230   Epoch: 3   Global Step: 39650   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:01:26,825-Speed 5987.47 samples/sec   Loss 12.4391   LearningRate 0.3230   Epoch: 3   Global Step: 39660   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:01:33,675-Speed 5980.80 samples/sec   Loss 12.5339   LearningRate 0.3230   Epoch: 3   Global Step: 39670   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:01:40,522-Speed 5983.08 samples/sec   Loss 12.4369   LearningRate 0.3229   Epoch: 3   Global Step: 39680   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:01:47,376-Speed 5976.60 samples/sec   Loss 12.5525   LearningRate 0.3229   Epoch: 3   Global Step: 39690   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:01:54,225-Speed 5982.10 samples/sec   Loss 12.4377   LearningRate 0.3229   Epoch: 3   Global Step: 39700   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:02:01,079-Speed 5976.61 samples/sec   Loss 12.4451   LearningRate 0.3228   Epoch: 3   Global Step: 39710   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:02:07,957-Speed 5957.27 samples/sec   Loss 12.4982   LearningRate 0.3228   Epoch: 3   Global Step: 39720   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:02:14,815-Speed 5973.74 samples/sec   Loss 12.4570   LearningRate 0.3227   Epoch: 3   Global Step: 39730   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:02:21,696-Speed 5953.52 samples/sec   Loss 12.5241   LearningRate 0.3227   Epoch: 3   Global Step: 39740   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:02:28,570-Speed 5959.79 samples/sec   Loss 12.4518   LearningRate 0.3227   Epoch: 3   Global Step: 39750   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:02:35,429-Speed 5972.98 samples/sec   Loss 12.4099   LearningRate 0.3226   Epoch: 3   Global Step: 39760   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:02:42,288-Speed 5973.07 samples/sec   Loss 12.4194   LearningRate 0.3226   Epoch: 3   Global Step: 39770   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:02:49,194-Speed 5932.84 samples/sec   Loss 12.5078   LearningRate 0.3225   Epoch: 3   Global Step: 39780   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:02:56,092-Speed 5940.51 samples/sec   Loss 12.4708   LearningRate 0.3225   Epoch: 3   Global Step: 39790   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:03:02,963-Speed 5961.94 samples/sec   Loss 12.5106   LearningRate 0.3225   Epoch: 3   Global Step: 39800   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:03:09,816-Speed 5979.18 samples/sec   Loss 12.4027   LearningRate 0.3224   Epoch: 3   Global Step: 39810   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:03:16,681-Speed 5966.90 samples/sec   Loss 12.4111   LearningRate 0.3224   Epoch: 3   Global Step: 39820   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:03:23,526-Speed 5986.02 samples/sec   Loss 12.3879   LearningRate 0.3224   Epoch: 3   Global Step: 39830   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:03:30,373-Speed 5982.72 samples/sec   Loss 12.4791   LearningRate 0.3223   Epoch: 3   Global Step: 39840   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:03:37,226-Speed 5977.92 samples/sec   Loss 12.5393   LearningRate 0.3223   Epoch: 3   Global Step: 39850   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:03:44,072-Speed 5984.84 samples/sec   Loss 12.4822   LearningRate 0.3222   Epoch: 3   Global Step: 39860   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:03:50,941-Speed 5964.27 samples/sec   Loss 12.3749   LearningRate 0.3222   Epoch: 3   Global Step: 39870   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:03:57,779-Speed 5991.92 samples/sec   Loss 12.3757   LearningRate 0.3222   Epoch: 3   Global Step: 39880   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:04:04,622-Speed 5986.00 samples/sec   Loss 12.4762   LearningRate 0.3221   Epoch: 3   Global Step: 39890   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:04:11,483-Speed 5973.22 samples/sec   Loss 12.5047   LearningRate 0.3221   Epoch: 3   Global Step: 39900   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:04:18,336-Speed 5977.86 samples/sec   Loss 12.3577   LearningRate 0.3220   Epoch: 3   Global Step: 39910   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:04:25,205-Speed 5963.73 samples/sec   Loss 12.3603   LearningRate 0.3220   Epoch: 3   Global Step: 39920   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:04:32,048-Speed 5987.37 samples/sec   Loss 12.3436   LearningRate 0.3220   Epoch: 3   Global Step: 39930   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:04:38,892-Speed 5985.15 samples/sec   Loss 12.4700   LearningRate 0.3219   Epoch: 3   Global Step: 39940   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:04:45,747-Speed 5976.16 samples/sec   Loss 12.3845   LearningRate 0.3219   Epoch: 3   Global Step: 39950   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:04:52,697-Speed 5895.80 samples/sec   Loss 12.4294   LearningRate 0.3219   Epoch: 3   Global Step: 39960   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:04:59,563-Speed 5967.31 samples/sec   Loss 12.4987   LearningRate 0.3218   Epoch: 3   Global Step: 39970   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:05:06,482-Speed 5923.47 samples/sec   Loss 12.4867   LearningRate 0.3218   Epoch: 3   Global Step: 39980   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 04:05:13,334-Speed 5979.61 samples/sec   Loss 12.4053   LearningRate 0.3217   Epoch: 3   Global Step: 39990   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 04:05:20,182-Speed 5982.56 samples/sec   Loss 12.4488   LearningRate 0.3217   Epoch: 3   Global Step: 40000   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 04:05:47,026-[lfw][40000]XNorm: 23.529627
Training: 2022-01-08 04:05:47,027-[lfw][40000]Accuracy-Flip: 0.99583+-0.00300
Training: 2022-01-08 04:05:47,027-[lfw][40000]Accuracy-Highest: 0.99650
Training: 2022-01-08 04:06:17,815-[cfp_fp][40000]XNorm: 21.023732
Training: 2022-01-08 04:06:17,816-[cfp_fp][40000]Accuracy-Flip: 0.96929+-0.00881
Training: 2022-01-08 04:06:17,817-[cfp_fp][40000]Accuracy-Highest: 0.97057
Training: 2022-01-08 04:06:44,493-[agedb_30][40000]XNorm: 22.993922
Training: 2022-01-08 04:06:44,494-[agedb_30][40000]Accuracy-Flip: 0.95900+-0.00764
Training: 2022-01-08 04:06:44,494-[agedb_30][40000]Accuracy-Highest: 0.96200
Training: 2022-01-08 04:06:51,319-Speed 449.44 samples/sec   Loss 12.3644   LearningRate 0.3217   Epoch: 3   Global Step: 40010   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 04:06:58,153-Speed 5998.31 samples/sec   Loss 12.3871   LearningRate 0.3216   Epoch: 3   Global Step: 40020   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 04:07:04,996-Speed 5987.69 samples/sec   Loss 12.5671   LearningRate 0.3216   Epoch: 3   Global Step: 40030   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 04:07:11,856-Speed 5972.22 samples/sec   Loss 12.3908   LearningRate 0.3215   Epoch: 3   Global Step: 40040   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 04:07:18,737-Speed 5953.05 samples/sec   Loss 12.4383   LearningRate 0.3215   Epoch: 3   Global Step: 40050   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 04:07:25,583-Speed 5985.47 samples/sec   Loss 12.4455   LearningRate 0.3215   Epoch: 3   Global Step: 40060   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 04:07:32,443-Speed 5971.74 samples/sec   Loss 12.5821   LearningRate 0.3214   Epoch: 3   Global Step: 40070   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-01-08 04:07:39,320-Speed 5957.65 samples/sec   Loss 12.4478   LearningRate 0.3214   Epoch: 3   Global Step: 40080   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:07:46,184-Speed 5968.58 samples/sec   Loss 12.4571   LearningRate 0.3214   Epoch: 3   Global Step: 40090   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:07:53,028-Speed 5985.26 samples/sec   Loss 12.3787   LearningRate 0.3213   Epoch: 3   Global Step: 40100   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:07:59,870-Speed 5988.27 samples/sec   Loss 12.4116   LearningRate 0.3213   Epoch: 3   Global Step: 40110   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:08:06,718-Speed 5982.66 samples/sec   Loss 12.3759   LearningRate 0.3212   Epoch: 3   Global Step: 40120   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:08:13,576-Speed 5973.34 samples/sec   Loss 12.4409   LearningRate 0.3212   Epoch: 3   Global Step: 40130   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:08:20,409-Speed 5995.52 samples/sec   Loss 12.4336   LearningRate 0.3212   Epoch: 3   Global Step: 40140   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:08:27,252-Speed 5988.11 samples/sec   Loss 12.4288   LearningRate 0.3211   Epoch: 3   Global Step: 40150   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:08:34,088-Speed 5992.34 samples/sec   Loss 12.4750   LearningRate 0.3211   Epoch: 3   Global Step: 40160   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:08:40,933-Speed 5985.17 samples/sec   Loss 12.4629   LearningRate 0.3210   Epoch: 3   Global Step: 40170   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:08:47,856-Speed 5917.70 samples/sec   Loss 12.4461   LearningRate 0.3210   Epoch: 3   Global Step: 40180   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:08:54,795-Speed 5904.81 samples/sec   Loss 12.4516   LearningRate 0.3210   Epoch: 3   Global Step: 40190   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:09:01,762-Speed 5879.72 samples/sec   Loss 12.4486   LearningRate 0.3209   Epoch: 3   Global Step: 40200   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:09:08,611-Speed 5981.99 samples/sec   Loss 12.4087   LearningRate 0.3209   Epoch: 3   Global Step: 40210   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:09:15,577-Speed 5881.90 samples/sec   Loss 12.4293   LearningRate 0.3209   Epoch: 3   Global Step: 40220   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:09:22,424-Speed 5983.24 samples/sec   Loss 12.3407   LearningRate 0.3208   Epoch: 3   Global Step: 40230   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:09:29,265-Speed 5988.00 samples/sec   Loss 12.4213   LearningRate 0.3208   Epoch: 3   Global Step: 40240   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:09:36,133-Speed 5965.27 samples/sec   Loss 12.3502   LearningRate 0.3207   Epoch: 3   Global Step: 40250   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:09:42,993-Speed 5972.37 samples/sec   Loss 12.4218   LearningRate 0.3207   Epoch: 3   Global Step: 40260   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:09:49,879-Speed 5949.22 samples/sec   Loss 12.4817   LearningRate 0.3207   Epoch: 3   Global Step: 40270   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:09:56,735-Speed 5974.98 samples/sec   Loss 12.5177   LearningRate 0.3206   Epoch: 3   Global Step: 40280   Fp16 Grad Scale: 524288   Required: 33 hours
Training: 2022-01-08 04:10:03,568-Speed 5996.09 samples/sec   Loss 12.4101   LearningRate 0.3206   Epoch: 3   Global Step: 40290   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:10:10,414-Speed 5984.21 samples/sec   Loss 12.4619   LearningRate 0.3205   Epoch: 3   Global Step: 40300   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:10:17,274-Speed 5971.70 samples/sec   Loss 12.5110   LearningRate 0.3205   Epoch: 3   Global Step: 40310   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:10:24,151-Speed 5956.71 samples/sec   Loss 12.4289   LearningRate 0.3205   Epoch: 3   Global Step: 40320   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:10:31,001-Speed 5981.20 samples/sec   Loss 12.4474   LearningRate 0.3204   Epoch: 3   Global Step: 40330   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:10:37,845-Speed 5985.62 samples/sec   Loss 12.4549   LearningRate 0.3204   Epoch: 3   Global Step: 40340   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:10:44,696-Speed 5979.72 samples/sec   Loss 12.3718   LearningRate 0.3204   Epoch: 3   Global Step: 40350   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:10:51,554-Speed 5973.91 samples/sec   Loss 12.4172   LearningRate 0.3203   Epoch: 3   Global Step: 40360   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:10:58,397-Speed 5986.65 samples/sec   Loss 12.3810   LearningRate 0.3203   Epoch: 3   Global Step: 40370   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:11:05,267-Speed 5965.46 samples/sec   Loss 12.4643   LearningRate 0.3202   Epoch: 3   Global Step: 40380   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:11:12,114-Speed 5983.70 samples/sec   Loss 12.4554   LearningRate 0.3202   Epoch: 3   Global Step: 40390   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:11:18,966-Speed 5979.37 samples/sec   Loss 12.3225   LearningRate 0.3202   Epoch: 3   Global Step: 40400   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:11:25,843-Speed 5956.87 samples/sec   Loss 12.3173   LearningRate 0.3201   Epoch: 3   Global Step: 40410   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:11:32,691-Speed 5982.09 samples/sec   Loss 12.3897   LearningRate 0.3201   Epoch: 3   Global Step: 40420   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:11:39,544-Speed 5978.10 samples/sec   Loss 12.3913   LearningRate 0.3200   Epoch: 3   Global Step: 40430   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:11:46,391-Speed 5983.45 samples/sec   Loss 12.3659   LearningRate 0.3200   Epoch: 3   Global Step: 40440   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:11:53,237-Speed 5984.47 samples/sec   Loss 12.4603   LearningRate 0.3200   Epoch: 3   Global Step: 40450   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:12:00,089-Speed 5978.31 samples/sec   Loss 12.3493   LearningRate 0.3199   Epoch: 3   Global Step: 40460   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:12:06,945-Speed 5975.63 samples/sec   Loss 12.4186   LearningRate 0.3199   Epoch: 3   Global Step: 40470   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:12:13,794-Speed 5982.02 samples/sec   Loss 12.4262   LearningRate 0.3199   Epoch: 3   Global Step: 40480   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:12:20,644-Speed 5981.03 samples/sec   Loss 12.3881   LearningRate 0.3198   Epoch: 3   Global Step: 40490   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:12:27,602-Speed 5888.14 samples/sec   Loss 12.4378   LearningRate 0.3198   Epoch: 3   Global Step: 40500   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:12:34,556-Speed 5891.31 samples/sec   Loss 12.4114   LearningRate 0.3197   Epoch: 3   Global Step: 40510   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:12:41,403-Speed 5983.37 samples/sec   Loss 12.4213   LearningRate 0.3197   Epoch: 3   Global Step: 40520   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:12:48,254-Speed 5981.48 samples/sec   Loss 12.3774   LearningRate 0.3197   Epoch: 3   Global Step: 40530   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:12:55,101-Speed 5986.59 samples/sec   Loss 12.4056   LearningRate 0.3196   Epoch: 3   Global Step: 40540   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:13:01,940-Speed 5989.79 samples/sec   Loss 12.3303   LearningRate 0.3196   Epoch: 3   Global Step: 40550   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:13:08,785-Speed 5985.45 samples/sec   Loss 12.3894   LearningRate 0.3195   Epoch: 3   Global Step: 40560   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:13:15,641-Speed 5974.99 samples/sec   Loss 12.3624   LearningRate 0.3195   Epoch: 3   Global Step: 40570   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:13:22,500-Speed 5973.55 samples/sec   Loss 12.3347   LearningRate 0.3195   Epoch: 3   Global Step: 40580   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:13:29,367-Speed 5966.32 samples/sec   Loss 12.4675   LearningRate 0.3194   Epoch: 3   Global Step: 40590   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:13:36,207-Speed 5988.74 samples/sec   Loss 12.3966   LearningRate 0.3194   Epoch: 3   Global Step: 40600   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:13:43,050-Speed 5987.46 samples/sec   Loss 12.4034   LearningRate 0.3194   Epoch: 3   Global Step: 40610   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:13:49,942-Speed 5943.89 samples/sec   Loss 12.3913   LearningRate 0.3193   Epoch: 3   Global Step: 40620   Fp16 Grad Scale: 524288   Required: 33 hours
Training: 2022-01-08 04:13:56,782-Speed 5991.18 samples/sec   Loss 12.3835   LearningRate 0.3193   Epoch: 3   Global Step: 40630   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:14:03,630-Speed 5982.20 samples/sec   Loss 12.5009   LearningRate 0.3192   Epoch: 3   Global Step: 40640   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:14:10,477-Speed 5983.84 samples/sec   Loss 12.3626   LearningRate 0.3192   Epoch: 3   Global Step: 40650   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:14:17,323-Speed 5983.86 samples/sec   Loss 12.3596   LearningRate 0.3192   Epoch: 3   Global Step: 40660   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:14:24,181-Speed 5974.49 samples/sec   Loss 12.3262   LearningRate 0.3191   Epoch: 3   Global Step: 40670   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:14:31,144-Speed 5883.28 samples/sec   Loss 12.3703   LearningRate 0.3191   Epoch: 3   Global Step: 40680   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:14:38,122-Speed 5871.28 samples/sec   Loss 12.4618   LearningRate 0.3191   Epoch: 3   Global Step: 40690   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:14:44,966-Speed 5986.10 samples/sec   Loss 12.5028   LearningRate 0.3190   Epoch: 3   Global Step: 40700   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:14:51,835-Speed 5964.64 samples/sec   Loss 12.4609   LearningRate 0.3190   Epoch: 3   Global Step: 40710   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:14:58,682-Speed 5982.91 samples/sec   Loss 12.3838   LearningRate 0.3189   Epoch: 3   Global Step: 40720   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:15:05,549-Speed 5966.22 samples/sec   Loss 12.4011   LearningRate 0.3189   Epoch: 3   Global Step: 40730   Fp16 Grad Scale: 524288   Required: 33 hours
Training: 2022-01-08 04:15:12,408-Speed 5973.07 samples/sec   Loss 12.3133   LearningRate 0.3189   Epoch: 3   Global Step: 40740   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:15:19,274-Speed 5967.13 samples/sec   Loss 12.4409   LearningRate 0.3188   Epoch: 3   Global Step: 40750   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:15:26,130-Speed 5974.87 samples/sec   Loss 12.3614   LearningRate 0.3188   Epoch: 3   Global Step: 40760   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:15:32,966-Speed 5993.16 samples/sec   Loss 12.3763   LearningRate 0.3187   Epoch: 3   Global Step: 40770   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:15:39,851-Speed 5952.06 samples/sec   Loss 12.2901   LearningRate 0.3187   Epoch: 3   Global Step: 40780   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:15:46,717-Speed 5966.93 samples/sec   Loss 12.3059   LearningRate 0.3187   Epoch: 3   Global Step: 40790   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:15:53,567-Speed 5980.37 samples/sec   Loss 12.3710   LearningRate 0.3186   Epoch: 3   Global Step: 40800   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:16:00,436-Speed 5965.40 samples/sec   Loss 12.4447   LearningRate 0.3186   Epoch: 3   Global Step: 40810   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:16:07,302-Speed 5966.24 samples/sec   Loss 12.3554   LearningRate 0.3186   Epoch: 3   Global Step: 40820   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:16:14,158-Speed 5978.76 samples/sec   Loss 12.3522   LearningRate 0.3185   Epoch: 3   Global Step: 40830   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:16:21,012-Speed 5977.35 samples/sec   Loss 12.4330   LearningRate 0.3185   Epoch: 3   Global Step: 40840   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:16:27,887-Speed 5960.32 samples/sec   Loss 12.3366   LearningRate 0.3184   Epoch: 3   Global Step: 40850   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:16:34,751-Speed 5968.38 samples/sec   Loss 12.3996   LearningRate 0.3184   Epoch: 3   Global Step: 40860   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:16:41,610-Speed 5975.10 samples/sec   Loss 12.3049   LearningRate 0.3184   Epoch: 3   Global Step: 40870   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:16:48,457-Speed 5983.95 samples/sec   Loss 12.4050   LearningRate 0.3183   Epoch: 3   Global Step: 40880   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:16:55,306-Speed 5981.20 samples/sec   Loss 12.3337   LearningRate 0.3183   Epoch: 3   Global Step: 40890   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:17:02,157-Speed 5979.63 samples/sec   Loss 12.3417   LearningRate 0.3182   Epoch: 3   Global Step: 40900   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:17:08,999-Speed 5987.79 samples/sec   Loss 12.3386   LearningRate 0.3182   Epoch: 3   Global Step: 40910   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:17:15,847-Speed 5982.18 samples/sec   Loss 12.4258   LearningRate 0.3182   Epoch: 3   Global Step: 40920   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:17:22,740-Speed 5943.53 samples/sec   Loss 12.3609   LearningRate 0.3181   Epoch: 3   Global Step: 40930   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:17:29,579-Speed 5989.92 samples/sec   Loss 12.4308   LearningRate 0.3181   Epoch: 3   Global Step: 40940   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:17:36,457-Speed 5956.35 samples/sec   Loss 12.4741   LearningRate 0.3181   Epoch: 3   Global Step: 40950   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:17:43,318-Speed 5972.69 samples/sec   Loss 12.3985   LearningRate 0.3180   Epoch: 3   Global Step: 40960   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:17:50,272-Speed 5891.21 samples/sec   Loss 12.3591   LearningRate 0.3180   Epoch: 3   Global Step: 40970   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:17:57,147-Speed 5959.58 samples/sec   Loss 12.4255   LearningRate 0.3179   Epoch: 3   Global Step: 40980   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:18:03,996-Speed 5980.86 samples/sec   Loss 12.4043   LearningRate 0.3179   Epoch: 3   Global Step: 40990   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:18:10,882-Speed 5950.35 samples/sec   Loss 12.4578   LearningRate 0.3179   Epoch: 3   Global Step: 41000   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:18:17,736-Speed 5976.90 samples/sec   Loss 12.4539   LearningRate 0.3178   Epoch: 3   Global Step: 41010   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:18:24,585-Speed 5981.61 samples/sec   Loss 12.3998   LearningRate 0.3178   Epoch: 3   Global Step: 41020   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:18:31,433-Speed 5982.41 samples/sec   Loss 12.3784   LearningRate 0.3178   Epoch: 3   Global Step: 41030   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:18:38,304-Speed 5962.43 samples/sec   Loss 12.3736   LearningRate 0.3177   Epoch: 3   Global Step: 41040   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:18:45,166-Speed 5970.30 samples/sec   Loss 12.3685   LearningRate 0.3177   Epoch: 3   Global Step: 41050   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:18:52,026-Speed 5971.23 samples/sec   Loss 12.3407   LearningRate 0.3176   Epoch: 3   Global Step: 41060   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:18:58,906-Speed 5954.62 samples/sec   Loss 12.3541   LearningRate 0.3176   Epoch: 3   Global Step: 41070   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:19:05,769-Speed 5969.19 samples/sec   Loss 12.4037   LearningRate 0.3176   Epoch: 3   Global Step: 41080   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:19:12,624-Speed 5976.56 samples/sec   Loss 12.3901   LearningRate 0.3175   Epoch: 3   Global Step: 41090   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:19:19,483-Speed 5972.45 samples/sec   Loss 12.3223   LearningRate 0.3175   Epoch: 3   Global Step: 41100   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:19:26,359-Speed 5958.58 samples/sec   Loss 12.3169   LearningRate 0.3174   Epoch: 3   Global Step: 41110   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:19:33,221-Speed 5970.39 samples/sec   Loss 12.3092   LearningRate 0.3174   Epoch: 3   Global Step: 41120   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:19:40,092-Speed 5961.97 samples/sec   Loss 12.3509   LearningRate 0.3174   Epoch: 3   Global Step: 41130   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:19:46,982-Speed 5946.50 samples/sec   Loss 12.3943   LearningRate 0.3173   Epoch: 3   Global Step: 41140   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:19:53,843-Speed 5971.02 samples/sec   Loss 12.3886   LearningRate 0.3173   Epoch: 3   Global Step: 41150   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:20:00,713-Speed 5963.69 samples/sec   Loss 12.2934   LearningRate 0.3173   Epoch: 3   Global Step: 41160   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:20:07,571-Speed 5974.35 samples/sec   Loss 12.2695   LearningRate 0.3172   Epoch: 3   Global Step: 41170   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:20:14,432-Speed 5970.92 samples/sec   Loss 12.4190   LearningRate 0.3172   Epoch: 3   Global Step: 41180   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:20:21,282-Speed 5980.16 samples/sec   Loss 12.3641   LearningRate 0.3171   Epoch: 3   Global Step: 41190   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:20:28,173-Speed 5945.74 samples/sec   Loss 12.2419   LearningRate 0.3171   Epoch: 3   Global Step: 41200   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:20:35,025-Speed 5979.41 samples/sec   Loss 12.2793   LearningRate 0.3171   Epoch: 3   Global Step: 41210   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:20:41,875-Speed 5979.99 samples/sec   Loss 12.3443   LearningRate 0.3170   Epoch: 3   Global Step: 41220   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:20:48,756-Speed 5953.63 samples/sec   Loss 12.2889   LearningRate 0.3170   Epoch: 3   Global Step: 41230   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:20:55,615-Speed 5972.82 samples/sec   Loss 12.2802   LearningRate 0.3169   Epoch: 3   Global Step: 41240   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:21:02,491-Speed 5958.18 samples/sec   Loss 12.3231   LearningRate 0.3169   Epoch: 3   Global Step: 41250   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:21:09,365-Speed 5960.67 samples/sec   Loss 12.3039   LearningRate 0.3169   Epoch: 3   Global Step: 41260   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:21:16,224-Speed 5972.29 samples/sec   Loss 12.4219   LearningRate 0.3168   Epoch: 3   Global Step: 41270   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:21:23,079-Speed 5976.73 samples/sec   Loss 12.4391   LearningRate 0.3168   Epoch: 3   Global Step: 41280   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:21:29,935-Speed 5974.86 samples/sec   Loss 12.3445   LearningRate 0.3168   Epoch: 3   Global Step: 41290   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:21:36,792-Speed 5974.95 samples/sec   Loss 12.2655   LearningRate 0.3167   Epoch: 3   Global Step: 41300   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:21:43,668-Speed 5957.60 samples/sec   Loss 12.4385   LearningRate 0.3167   Epoch: 3   Global Step: 41310   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:21:50,536-Speed 5966.25 samples/sec   Loss 12.3664   LearningRate 0.3166   Epoch: 3   Global Step: 41320   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:21:57,408-Speed 5960.86 samples/sec   Loss 12.3308   LearningRate 0.3166   Epoch: 3   Global Step: 41330   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:22:04,281-Speed 5961.05 samples/sec   Loss 12.3905   LearningRate 0.3166   Epoch: 3   Global Step: 41340   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:22:11,147-Speed 5967.46 samples/sec   Loss 12.3933   LearningRate 0.3165   Epoch: 3   Global Step: 41350   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:22:18,041-Speed 5942.59 samples/sec   Loss 12.3957   LearningRate 0.3165   Epoch: 3   Global Step: 41360   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:22:24,896-Speed 5975.95 samples/sec   Loss 12.3263   LearningRate 0.3165   Epoch: 3   Global Step: 41370   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:22:31,762-Speed 5967.96 samples/sec   Loss 12.3227   LearningRate 0.3164   Epoch: 3   Global Step: 41380   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:22:38,630-Speed 5965.96 samples/sec   Loss 12.3047   LearningRate 0.3164   Epoch: 3   Global Step: 41390   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:22:45,528-Speed 5938.79 samples/sec   Loss 12.3027   LearningRate 0.3163   Epoch: 3   Global Step: 41400   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:22:52,387-Speed 5973.24 samples/sec   Loss 12.3586   LearningRate 0.3163   Epoch: 3   Global Step: 41410   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:22:59,383-Speed 5855.91 samples/sec   Loss 12.3700   LearningRate 0.3163   Epoch: 3   Global Step: 41420   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:23:06,274-Speed 5945.34 samples/sec   Loss 12.4116   LearningRate 0.3162   Epoch: 3   Global Step: 41430   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:23:13,173-Speed 5938.42 samples/sec   Loss 12.3417   LearningRate 0.3162   Epoch: 3   Global Step: 41440   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:23:20,029-Speed 5976.63 samples/sec   Loss 12.4061   LearningRate 0.3161   Epoch: 3   Global Step: 41450   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:23:26,894-Speed 5967.53 samples/sec   Loss 12.3552   LearningRate 0.3161   Epoch: 3   Global Step: 41460   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:23:33,753-Speed 5973.02 samples/sec   Loss 12.3734   LearningRate 0.3161   Epoch: 3   Global Step: 41470   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:23:40,628-Speed 5959.05 samples/sec   Loss 12.3554   LearningRate 0.3160   Epoch: 3   Global Step: 41480   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:24:05,461-Speed 1649.47 samples/sec   Loss 12.3452   LearningRate 0.3160   Epoch: 4   Global Step: 41490   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:24:12,295-Speed 5995.64 samples/sec   Loss 12.3252   LearningRate 0.3160   Epoch: 4   Global Step: 41500   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:24:19,136-Speed 5988.48 samples/sec   Loss 12.2741   LearningRate 0.3159   Epoch: 4   Global Step: 41510   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:24:25,984-Speed 5985.62 samples/sec   Loss 12.3232   LearningRate 0.3159   Epoch: 4   Global Step: 41520   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:24:32,825-Speed 5988.42 samples/sec   Loss 12.3896   LearningRate 0.3158   Epoch: 4   Global Step: 41530   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:24:39,668-Speed 5987.30 samples/sec   Loss 12.3262   LearningRate 0.3158   Epoch: 4   Global Step: 41540   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-01-08 04:24:46,505-Speed 5991.45 samples/sec   Loss 12.3333   LearningRate 0.3158   Epoch: 4   Global Step: 41550   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:24:53,359-Speed 5979.61 samples/sec   Loss 12.3392   LearningRate 0.3157   Epoch: 4   Global Step: 41560   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:25:00,204-Speed 5986.48 samples/sec   Loss 12.3238   LearningRate 0.3157   Epoch: 4   Global Step: 41570   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-01-08 04:25:07,051-Speed 5983.49 samples/sec   Loss 12.3442   LearningRate 0.3157   Epoch: 4   Global Step: 41580   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:25:13,900-Speed 5980.92 samples/sec   Loss 12.3251   LearningRate 0.3156   Epoch: 4   Global Step: 41590   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:25:20,757-Speed 5975.21 samples/sec   Loss 12.3409   LearningRate 0.3156   Epoch: 4   Global Step: 41600   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:25:27,623-Speed 5966.08 samples/sec   Loss 12.3308   LearningRate 0.3155   Epoch: 4   Global Step: 41610   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:25:34,492-Speed 5964.49 samples/sec   Loss 12.2824   LearningRate 0.3155   Epoch: 4   Global Step: 41620   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:25:41,354-Speed 5969.51 samples/sec   Loss 12.2670   LearningRate 0.3155   Epoch: 4   Global Step: 41630   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:25:48,222-Speed 5964.93 samples/sec   Loss 12.2845   LearningRate 0.3154   Epoch: 4   Global Step: 41640   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:25:55,066-Speed 5985.97 samples/sec   Loss 12.3321   LearningRate 0.3154   Epoch: 4   Global Step: 41650   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:26:01,938-Speed 5961.89 samples/sec   Loss 12.3106   LearningRate 0.3153   Epoch: 4   Global Step: 41660   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:26:08,794-Speed 5975.07 samples/sec   Loss 12.3262   LearningRate 0.3153   Epoch: 4   Global Step: 41670   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:26:15,665-Speed 5962.55 samples/sec   Loss 12.3313   LearningRate 0.3153   Epoch: 4   Global Step: 41680   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:26:22,521-Speed 5975.76 samples/sec   Loss 12.3210   LearningRate 0.3152   Epoch: 4   Global Step: 41690   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:26:29,477-Speed 5889.04 samples/sec   Loss 12.3218   LearningRate 0.3152   Epoch: 4   Global Step: 41700   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:26:36,343-Speed 5966.67 samples/sec   Loss 12.2837   LearningRate 0.3152   Epoch: 4   Global Step: 41710   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:26:43,291-Speed 5896.20 samples/sec   Loss 12.3690   LearningRate 0.3151   Epoch: 4   Global Step: 41720   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:26:50,146-Speed 5975.95 samples/sec   Loss 12.3170   LearningRate 0.3151   Epoch: 4   Global Step: 41730   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:26:57,038-Speed 5944.56 samples/sec   Loss 12.2192   LearningRate 0.3150   Epoch: 4   Global Step: 41740   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:27:03,930-Speed 5943.70 samples/sec   Loss 12.2869   LearningRate 0.3150   Epoch: 4   Global Step: 41750   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:27:10,809-Speed 5955.99 samples/sec   Loss 12.3663   LearningRate 0.3150   Epoch: 4   Global Step: 41760   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:27:17,676-Speed 5965.57 samples/sec   Loss 12.3158   LearningRate 0.3149   Epoch: 4   Global Step: 41770   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:27:24,543-Speed 5966.36 samples/sec   Loss 12.3297   LearningRate 0.3149   Epoch: 4   Global Step: 41780   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:27:31,407-Speed 5967.94 samples/sec   Loss 12.2634   LearningRate 0.3149   Epoch: 4   Global Step: 41790   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:27:38,253-Speed 5984.40 samples/sec   Loss 12.3641   LearningRate 0.3148   Epoch: 4   Global Step: 41800   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:27:45,114-Speed 5970.85 samples/sec   Loss 12.2816   LearningRate 0.3148   Epoch: 4   Global Step: 41810   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:27:52,008-Speed 5942.41 samples/sec   Loss 12.2855   LearningRate 0.3147   Epoch: 4   Global Step: 41820   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:27:58,873-Speed 5968.34 samples/sec   Loss 12.2519   LearningRate 0.3147   Epoch: 4   Global Step: 41830   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:28:05,736-Speed 5969.18 samples/sec   Loss 12.2621   LearningRate 0.3147   Epoch: 4   Global Step: 41840   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:28:12,607-Speed 5962.05 samples/sec   Loss 12.3465   LearningRate 0.3146   Epoch: 4   Global Step: 41850   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:28:19,472-Speed 5967.36 samples/sec   Loss 12.3384   LearningRate 0.3146   Epoch: 4   Global Step: 41860   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:28:26,344-Speed 5961.31 samples/sec   Loss 12.3581   LearningRate 0.3146   Epoch: 4   Global Step: 41870   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:28:33,208-Speed 5968.18 samples/sec   Loss 12.3510   LearningRate 0.3145   Epoch: 4   Global Step: 41880   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:28:40,064-Speed 5976.13 samples/sec   Loss 12.2684   LearningRate 0.3145   Epoch: 4   Global Step: 41890   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:28:46,930-Speed 5966.92 samples/sec   Loss 12.3072   LearningRate 0.3144   Epoch: 4   Global Step: 41900   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:28:53,796-Speed 5966.33 samples/sec   Loss 12.3215   LearningRate 0.3144   Epoch: 4   Global Step: 41910   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:29:00,656-Speed 5971.79 samples/sec   Loss 12.3124   LearningRate 0.3144   Epoch: 4   Global Step: 41920   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:29:08,548-Speed 5190.44 samples/sec   Loss 12.3229   LearningRate 0.3143   Epoch: 4   Global Step: 41930   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:29:15,395-Speed 5983.89 samples/sec   Loss 12.3410   LearningRate 0.3143   Epoch: 4   Global Step: 41940   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:29:22,269-Speed 5959.83 samples/sec   Loss 12.2776   LearningRate 0.3142   Epoch: 4   Global Step: 41950   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:29:29,121-Speed 5978.45 samples/sec   Loss 12.3865   LearningRate 0.3142   Epoch: 4   Global Step: 41960   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:29:35,972-Speed 5979.22 samples/sec   Loss 12.3127   LearningRate 0.3142   Epoch: 4   Global Step: 41970   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:29:42,829-Speed 5975.20 samples/sec   Loss 12.2860   LearningRate 0.3141   Epoch: 4   Global Step: 41980   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:29:49,683-Speed 5976.83 samples/sec   Loss 12.3531   LearningRate 0.3141   Epoch: 4   Global Step: 41990   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:29:56,561-Speed 5956.24 samples/sec   Loss 12.2843   LearningRate 0.3141   Epoch: 4   Global Step: 42000   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:30:03,424-Speed 5969.28 samples/sec   Loss 12.3143   LearningRate 0.3140   Epoch: 4   Global Step: 42010   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:30:10,285-Speed 5971.45 samples/sec   Loss 12.2536   LearningRate 0.3140   Epoch: 4   Global Step: 42020   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:30:17,145-Speed 5971.96 samples/sec   Loss 12.2750   LearningRate 0.3139   Epoch: 4   Global Step: 42030   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:30:23,995-Speed 5980.09 samples/sec   Loss 12.2829   LearningRate 0.3139   Epoch: 4   Global Step: 42040   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:30:30,863-Speed 5964.86 samples/sec   Loss 12.2603   LearningRate 0.3139   Epoch: 4   Global Step: 42050   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:30:37,720-Speed 5974.67 samples/sec   Loss 12.3200   LearningRate 0.3138   Epoch: 4   Global Step: 42060   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:30:44,580-Speed 5974.44 samples/sec   Loss 12.2004   LearningRate 0.3138   Epoch: 4   Global Step: 42070   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:30:51,443-Speed 5972.17 samples/sec   Loss 12.2838   LearningRate 0.3138   Epoch: 4   Global Step: 42080   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:30:58,313-Speed 5963.19 samples/sec   Loss 12.3671   LearningRate 0.3137   Epoch: 4   Global Step: 42090   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:31:05,189-Speed 5957.77 samples/sec   Loss 12.3061   LearningRate 0.3137   Epoch: 4   Global Step: 42100   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:31:12,126-Speed 5905.77 samples/sec   Loss 12.3305   LearningRate 0.3136   Epoch: 4   Global Step: 42110   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:31:18,983-Speed 5976.69 samples/sec   Loss 12.2935   LearningRate 0.3136   Epoch: 4   Global Step: 42120   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:31:25,838-Speed 5975.60 samples/sec   Loss 12.3050   LearningRate 0.3136   Epoch: 4   Global Step: 42130   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:31:32,684-Speed 5984.75 samples/sec   Loss 12.2720   LearningRate 0.3135   Epoch: 4   Global Step: 42140   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:31:39,557-Speed 5960.73 samples/sec   Loss 12.2934   LearningRate 0.3135   Epoch: 4   Global Step: 42150   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:31:46,416-Speed 5973.47 samples/sec   Loss 12.2224   LearningRate 0.3134   Epoch: 4   Global Step: 42160   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:31:53,266-Speed 5979.89 samples/sec   Loss 12.2938   LearningRate 0.3134   Epoch: 4   Global Step: 42170   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:32:00,211-Speed 5898.99 samples/sec   Loss 12.3552   LearningRate 0.3134   Epoch: 4   Global Step: 42180   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:32:07,094-Speed 5952.00 samples/sec   Loss 12.2664   LearningRate 0.3133   Epoch: 4   Global Step: 42190   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:32:13,982-Speed 5947.81 samples/sec   Loss 12.2254   LearningRate 0.3133   Epoch: 4   Global Step: 42200   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:32:20,850-Speed 5965.69 samples/sec   Loss 12.2791   LearningRate 0.3133   Epoch: 4   Global Step: 42210   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:32:27,698-Speed 5982.11 samples/sec   Loss 12.2612   LearningRate 0.3132   Epoch: 4   Global Step: 42220   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:32:34,547-Speed 5981.74 samples/sec   Loss 12.2589   LearningRate 0.3132   Epoch: 4   Global Step: 42230   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:32:41,390-Speed 5985.82 samples/sec   Loss 12.3221   LearningRate 0.3131   Epoch: 4   Global Step: 42240   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:32:48,276-Speed 5949.17 samples/sec   Loss 12.3270   LearningRate 0.3131   Epoch: 4   Global Step: 42250   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:32:55,127-Speed 5980.47 samples/sec   Loss 12.2934   LearningRate 0.3131   Epoch: 4   Global Step: 42260   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:33:01,983-Speed 5975.04 samples/sec   Loss 12.3324   LearningRate 0.3130   Epoch: 4   Global Step: 42270   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:33:08,886-Speed 5934.43 samples/sec   Loss 12.2447   LearningRate 0.3130   Epoch: 4   Global Step: 42280   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:33:15,847-Speed 5886.17 samples/sec   Loss 12.3387   LearningRate 0.3130   Epoch: 4   Global Step: 42290   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:33:22,860-Speed 5861.12 samples/sec   Loss 12.2401   LearningRate 0.3129   Epoch: 4   Global Step: 42300   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:33:29,866-Speed 5972.36 samples/sec   Loss 12.2937   LearningRate 0.3129   Epoch: 4   Global Step: 42310   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:33:36,717-Speed 5979.91 samples/sec   Loss 12.2831   LearningRate 0.3128   Epoch: 4   Global Step: 42320   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:33:43,564-Speed 5983.09 samples/sec   Loss 12.3447   LearningRate 0.3128   Epoch: 4   Global Step: 42330   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:33:50,441-Speed 5957.55 samples/sec   Loss 12.1937   LearningRate 0.3128   Epoch: 4   Global Step: 42340   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:33:57,318-Speed 5958.38 samples/sec   Loss 12.2299   LearningRate 0.3127   Epoch: 4   Global Step: 42350   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:34:04,155-Speed 5992.18 samples/sec   Loss 12.2730   LearningRate 0.3127   Epoch: 4   Global Step: 42360   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:34:11,003-Speed 5982.48 samples/sec   Loss 12.2795   LearningRate 0.3127   Epoch: 4   Global Step: 42370   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:34:17,856-Speed 5978.49 samples/sec   Loss 12.2786   LearningRate 0.3126   Epoch: 4   Global Step: 42380   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:34:24,739-Speed 5952.15 samples/sec   Loss 12.2285   LearningRate 0.3126   Epoch: 4   Global Step: 42390   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:34:31,643-Speed 5933.64 samples/sec   Loss 12.2954   LearningRate 0.3125   Epoch: 4   Global Step: 42400   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:34:38,502-Speed 5973.03 samples/sec   Loss 12.2785   LearningRate 0.3125   Epoch: 4   Global Step: 42410   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:34:45,387-Speed 5950.56 samples/sec   Loss 12.2556   LearningRate 0.3125   Epoch: 4   Global Step: 42420   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:34:52,258-Speed 5962.13 samples/sec   Loss 12.2340   LearningRate 0.3124   Epoch: 4   Global Step: 42430   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:34:59,118-Speed 5972.08 samples/sec   Loss 12.2550   LearningRate 0.3124   Epoch: 4   Global Step: 42440   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:35:05,973-Speed 5978.29 samples/sec   Loss 12.2504   LearningRate 0.3123   Epoch: 4   Global Step: 42450   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:35:12,842-Speed 5963.62 samples/sec   Loss 12.3497   LearningRate 0.3123   Epoch: 4   Global Step: 42460   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:35:19,716-Speed 5960.78 samples/sec   Loss 12.3179   LearningRate 0.3123   Epoch: 4   Global Step: 42470   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:35:26,615-Speed 5939.76 samples/sec   Loss 12.2261   LearningRate 0.3122   Epoch: 4   Global Step: 42480   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:35:33,485-Speed 5962.99 samples/sec   Loss 12.2666   LearningRate 0.3122   Epoch: 4   Global Step: 42490   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:35:40,356-Speed 5962.63 samples/sec   Loss 12.2607   LearningRate 0.3122   Epoch: 4   Global Step: 42500   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:35:47,229-Speed 5961.04 samples/sec   Loss 12.3064   LearningRate 0.3121   Epoch: 4   Global Step: 42510   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:35:54,101-Speed 5960.97 samples/sec   Loss 12.2570   LearningRate 0.3121   Epoch: 4   Global Step: 42520   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:36:00,990-Speed 5946.39 samples/sec   Loss 12.1705   LearningRate 0.3120   Epoch: 4   Global Step: 42530   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:36:07,851-Speed 5972.18 samples/sec   Loss 12.2525   LearningRate 0.3120   Epoch: 4   Global Step: 42540   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:36:14,700-Speed 5981.22 samples/sec   Loss 12.2786   LearningRate 0.3120   Epoch: 4   Global Step: 42550   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:36:21,575-Speed 5958.65 samples/sec   Loss 12.2113   LearningRate 0.3119   Epoch: 4   Global Step: 42560   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:36:28,464-Speed 5947.13 samples/sec   Loss 12.2689   LearningRate 0.3119   Epoch: 4   Global Step: 42570   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:36:35,332-Speed 5964.95 samples/sec   Loss 12.2543   LearningRate 0.3119   Epoch: 4   Global Step: 42580   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:36:42,193-Speed 5971.28 samples/sec   Loss 12.2510   LearningRate 0.3118   Epoch: 4   Global Step: 42590   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:36:49,052-Speed 5973.15 samples/sec   Loss 12.2249   LearningRate 0.3118   Epoch: 4   Global Step: 42600   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:36:55,917-Speed 5967.36 samples/sec   Loss 12.1716   LearningRate 0.3117   Epoch: 4   Global Step: 42610   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:37:02,765-Speed 5982.70 samples/sec   Loss 12.3283   LearningRate 0.3117   Epoch: 4   Global Step: 42620   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:37:09,624-Speed 5972.97 samples/sec   Loss 12.2414   LearningRate 0.3117   Epoch: 4   Global Step: 42630   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:37:16,483-Speed 5974.66 samples/sec   Loss 12.2059   LearningRate 0.3116   Epoch: 4   Global Step: 42640   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:37:23,330-Speed 5983.18 samples/sec   Loss 12.2874   LearningRate 0.3116   Epoch: 4   Global Step: 42650   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:37:30,224-Speed 5942.98 samples/sec   Loss 12.2791   LearningRate 0.3116   Epoch: 4   Global Step: 42660   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:37:37,080-Speed 5975.26 samples/sec   Loss 12.2503   LearningRate 0.3115   Epoch: 4   Global Step: 42670   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:37:43,936-Speed 5975.11 samples/sec   Loss 12.3297   LearningRate 0.3115   Epoch: 4   Global Step: 42680   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:37:50,808-Speed 5962.83 samples/sec   Loss 12.2216   LearningRate 0.3114   Epoch: 4   Global Step: 42690   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:37:57,677-Speed 5964.13 samples/sec   Loss 12.2489   LearningRate 0.3114   Epoch: 4   Global Step: 42700   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:38:04,540-Speed 5969.20 samples/sec   Loss 12.3458   LearningRate 0.3114   Epoch: 4   Global Step: 42710   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:38:11,404-Speed 5968.05 samples/sec   Loss 12.2509   LearningRate 0.3113   Epoch: 4   Global Step: 42720   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:38:18,262-Speed 5974.67 samples/sec   Loss 12.2372   LearningRate 0.3113   Epoch: 4   Global Step: 42730   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:38:25,125-Speed 5969.22 samples/sec   Loss 12.1879   LearningRate 0.3113   Epoch: 4   Global Step: 42740   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:38:31,991-Speed 5966.60 samples/sec   Loss 12.2875   LearningRate 0.3112   Epoch: 4   Global Step: 42750   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:38:38,858-Speed 5966.09 samples/sec   Loss 12.1849   LearningRate 0.3112   Epoch: 4   Global Step: 42760   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:38:45,714-Speed 5975.47 samples/sec   Loss 12.2079   LearningRate 0.3111   Epoch: 4   Global Step: 42770   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:38:52,589-Speed 5960.63 samples/sec   Loss 12.1590   LearningRate 0.3111   Epoch: 4   Global Step: 42780   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:38:59,492-Speed 5934.62 samples/sec   Loss 12.1217   LearningRate 0.3111   Epoch: 4   Global Step: 42790   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:39:06,355-Speed 5968.83 samples/sec   Loss 12.2374   LearningRate 0.3110   Epoch: 4   Global Step: 42800   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:39:13,217-Speed 5971.12 samples/sec   Loss 12.2571   LearningRate 0.3110   Epoch: 4   Global Step: 42810   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:39:20,084-Speed 5968.97 samples/sec   Loss 12.2931   LearningRate 0.3109   Epoch: 4   Global Step: 42820   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:39:26,949-Speed 5966.76 samples/sec   Loss 12.2628   LearningRate 0.3109   Epoch: 4   Global Step: 42830   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:39:33,812-Speed 5969.07 samples/sec   Loss 12.2178   LearningRate 0.3109   Epoch: 4   Global Step: 42840   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:39:40,691-Speed 5955.67 samples/sec   Loss 12.2049   LearningRate 0.3108   Epoch: 4   Global Step: 42850   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:39:47,551-Speed 5973.88 samples/sec   Loss 12.2370   LearningRate 0.3108   Epoch: 4   Global Step: 42860   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:39:54,413-Speed 5970.41 samples/sec   Loss 12.2385   LearningRate 0.3108   Epoch: 4   Global Step: 42870   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:40:01,279-Speed 5969.25 samples/sec   Loss 12.2428   LearningRate 0.3107   Epoch: 4   Global Step: 42880   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:40:08,138-Speed 5972.29 samples/sec   Loss 12.2250   LearningRate 0.3107   Epoch: 4   Global Step: 42890   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:40:15,000-Speed 5970.85 samples/sec   Loss 12.2959   LearningRate 0.3106   Epoch: 4   Global Step: 42900   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:40:21,851-Speed 5980.41 samples/sec   Loss 12.1980   LearningRate 0.3106   Epoch: 4   Global Step: 42910   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:40:28,745-Speed 5942.46 samples/sec   Loss 12.2654   LearningRate 0.3106   Epoch: 4   Global Step: 42920   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:40:35,598-Speed 5978.17 samples/sec   Loss 12.2313   LearningRate 0.3105   Epoch: 4   Global Step: 42930   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:40:42,466-Speed 5964.80 samples/sec   Loss 12.2207   LearningRate 0.3105   Epoch: 4   Global Step: 42940   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:40:49,334-Speed 5965.26 samples/sec   Loss 12.2958   LearningRate 0.3105   Epoch: 4   Global Step: 42950   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:40:56,210-Speed 5957.90 samples/sec   Loss 12.2269   LearningRate 0.3104   Epoch: 4   Global Step: 42960   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:41:03,075-Speed 5967.96 samples/sec   Loss 12.2758   LearningRate 0.3104   Epoch: 4   Global Step: 42970   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:41:10,042-Speed 5880.26 samples/sec   Loss 12.1963   LearningRate 0.3103   Epoch: 4   Global Step: 42980   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:41:16,918-Speed 5958.57 samples/sec   Loss 12.3013   LearningRate 0.3103   Epoch: 4   Global Step: 42990   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:41:23,771-Speed 5978.16 samples/sec   Loss 12.2319   LearningRate 0.3103   Epoch: 4   Global Step: 43000   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:41:30,617-Speed 5984.04 samples/sec   Loss 12.2641   LearningRate 0.3102   Epoch: 4   Global Step: 43010   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:41:37,462-Speed 5984.72 samples/sec   Loss 12.2127   LearningRate 0.3102   Epoch: 4   Global Step: 43020   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:41:44,333-Speed 5961.85 samples/sec   Loss 12.2773   LearningRate 0.3102   Epoch: 4   Global Step: 43030   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:41:51,192-Speed 5973.26 samples/sec   Loss 12.3037   LearningRate 0.3101   Epoch: 4   Global Step: 43040   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:41:58,078-Speed 5949.51 samples/sec   Loss 12.2201   LearningRate 0.3101   Epoch: 4   Global Step: 43050   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:42:04,913-Speed 5993.71 samples/sec   Loss 12.2070   LearningRate 0.3100   Epoch: 4   Global Step: 43060   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:42:11,767-Speed 5979.29 samples/sec   Loss 12.1550   LearningRate 0.3100   Epoch: 4   Global Step: 43070   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:42:18,626-Speed 5973.09 samples/sec   Loss 12.2192   LearningRate 0.3100   Epoch: 4   Global Step: 43080   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:42:25,481-Speed 5976.07 samples/sec   Loss 12.1859   LearningRate 0.3099   Epoch: 4   Global Step: 43090   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:42:32,361-Speed 5956.98 samples/sec   Loss 12.1811   LearningRate 0.3099   Epoch: 4   Global Step: 43100   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:42:39,213-Speed 5978.49 samples/sec   Loss 12.2055   LearningRate 0.3099   Epoch: 4   Global Step: 43110   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:42:46,075-Speed 5970.19 samples/sec   Loss 12.1858   LearningRate 0.3098   Epoch: 4   Global Step: 43120   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:42:52,921-Speed 5984.29 samples/sec   Loss 12.1804   LearningRate 0.3098   Epoch: 4   Global Step: 43130   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:42:59,787-Speed 5967.51 samples/sec   Loss 12.2169   LearningRate 0.3097   Epoch: 4   Global Step: 43140   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:43:06,667-Speed 5954.77 samples/sec   Loss 12.2450   LearningRate 0.3097   Epoch: 4   Global Step: 43150   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:43:13,528-Speed 5971.04 samples/sec   Loss 12.2020   LearningRate 0.3097   Epoch: 4   Global Step: 43160   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:43:20,404-Speed 5957.77 samples/sec   Loss 12.2074   LearningRate 0.3096   Epoch: 4   Global Step: 43170   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:43:27,282-Speed 5956.39 samples/sec   Loss 12.2336   LearningRate 0.3096   Epoch: 4   Global Step: 43180   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:43:34,160-Speed 5956.35 samples/sec   Loss 12.1448   LearningRate 0.3096   Epoch: 4   Global Step: 43190   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:43:41,015-Speed 5976.56 samples/sec   Loss 12.2047   LearningRate 0.3095   Epoch: 4   Global Step: 43200   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:43:47,901-Speed 5949.66 samples/sec   Loss 12.1946   LearningRate 0.3095   Epoch: 4   Global Step: 43210   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:43:54,750-Speed 5983.05 samples/sec   Loss 12.1906   LearningRate 0.3094   Epoch: 4   Global Step: 43220   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:44:01,616-Speed 5966.72 samples/sec   Loss 12.2063   LearningRate 0.3094   Epoch: 4   Global Step: 43230   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:44:08,464-Speed 5982.19 samples/sec   Loss 12.2265   LearningRate 0.3094   Epoch: 4   Global Step: 43240   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:44:15,331-Speed 5966.50 samples/sec   Loss 12.2573   LearningRate 0.3093   Epoch: 4   Global Step: 43250   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:44:22,181-Speed 5981.20 samples/sec   Loss 12.1885   LearningRate 0.3093   Epoch: 4   Global Step: 43260   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:44:29,051-Speed 5962.85 samples/sec   Loss 12.1911   LearningRate 0.3093   Epoch: 4   Global Step: 43270   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:44:35,910-Speed 5973.16 samples/sec   Loss 12.1864   LearningRate 0.3092   Epoch: 4   Global Step: 43280   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:44:42,771-Speed 5971.34 samples/sec   Loss 12.2014   LearningRate 0.3092   Epoch: 4   Global Step: 43290   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:44:49,647-Speed 5957.11 samples/sec   Loss 12.1139   LearningRate 0.3091   Epoch: 4   Global Step: 43300   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:45:02,358-Speed 3222.78 samples/sec   Loss 12.1929   LearningRate 0.3091   Epoch: 4   Global Step: 43310   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:45:09,226-Speed 5964.86 samples/sec   Loss 12.1833   LearningRate 0.3091   Epoch: 4   Global Step: 43320   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:45:16,076-Speed 5980.74 samples/sec   Loss 12.2575   LearningRate 0.3090   Epoch: 4   Global Step: 43330   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:45:22,952-Speed 5957.92 samples/sec   Loss 12.1551   LearningRate 0.3090   Epoch: 4   Global Step: 43340   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:45:29,812-Speed 5972.81 samples/sec   Loss 12.1622   LearningRate 0.3089   Epoch: 4   Global Step: 43350   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:45:36,668-Speed 5974.91 samples/sec   Loss 12.2281   LearningRate 0.3089   Epoch: 4   Global Step: 43360   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:45:43,536-Speed 5964.98 samples/sec   Loss 12.1932   LearningRate 0.3089   Epoch: 4   Global Step: 43370   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:45:50,406-Speed 5963.20 samples/sec   Loss 12.1923   LearningRate 0.3088   Epoch: 4   Global Step: 43380   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:45:57,265-Speed 5973.12 samples/sec   Loss 12.1413   LearningRate 0.3088   Epoch: 4   Global Step: 43390   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:46:04,138-Speed 5960.79 samples/sec   Loss 12.2150   LearningRate 0.3088   Epoch: 4   Global Step: 43400   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:46:10,995-Speed 5974.24 samples/sec   Loss 12.2418   LearningRate 0.3087   Epoch: 4   Global Step: 43410   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:46:17,861-Speed 5967.21 samples/sec   Loss 12.1994   LearningRate 0.3087   Epoch: 4   Global Step: 43420   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:46:24,713-Speed 5978.99 samples/sec   Loss 12.1776   LearningRate 0.3086   Epoch: 4   Global Step: 43430   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:46:31,569-Speed 5974.87 samples/sec   Loss 12.1951   LearningRate 0.3086   Epoch: 4   Global Step: 43440   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:46:38,437-Speed 5965.38 samples/sec   Loss 12.1873   LearningRate 0.3086   Epoch: 4   Global Step: 43450   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:46:45,296-Speed 5972.61 samples/sec   Loss 12.1203   LearningRate 0.3085   Epoch: 4   Global Step: 43460   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:46:52,188-Speed 5944.47 samples/sec   Loss 12.2076   LearningRate 0.3085   Epoch: 4   Global Step: 43470   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:46:59,043-Speed 5976.51 samples/sec   Loss 12.1772   LearningRate 0.3085   Epoch: 4   Global Step: 43480   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:47:05,873-Speed 5997.53 samples/sec   Loss 12.2039   LearningRate 0.3084   Epoch: 4   Global Step: 43490   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:47:12,719-Speed 5984.31 samples/sec   Loss 12.1745   LearningRate 0.3084   Epoch: 4   Global Step: 43500   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:47:19,562-Speed 5986.35 samples/sec   Loss 12.1746   LearningRate 0.3083   Epoch: 4   Global Step: 43510   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:47:26,415-Speed 5978.26 samples/sec   Loss 12.2296   LearningRate 0.3083   Epoch: 4   Global Step: 43520   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:47:33,263-Speed 5981.83 samples/sec   Loss 12.1122   LearningRate 0.3083   Epoch: 4   Global Step: 43530   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:47:40,123-Speed 5971.51 samples/sec   Loss 12.1680   LearningRate 0.3082   Epoch: 4   Global Step: 43540   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:47:46,996-Speed 5963.85 samples/sec   Loss 12.1942   LearningRate 0.3082   Epoch: 4   Global Step: 43550   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:47:53,892-Speed 5940.66 samples/sec   Loss 12.1087   LearningRate 0.3082   Epoch: 4   Global Step: 43560   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:48:00,770-Speed 5955.74 samples/sec   Loss 12.1028   LearningRate 0.3081   Epoch: 4   Global Step: 43570   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:48:07,618-Speed 5983.24 samples/sec   Loss 12.1808   LearningRate 0.3081   Epoch: 4   Global Step: 43580   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:48:14,483-Speed 5967.39 samples/sec   Loss 12.1671   LearningRate 0.3080   Epoch: 4   Global Step: 43590   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:48:21,339-Speed 5975.69 samples/sec   Loss 12.1214   LearningRate 0.3080   Epoch: 4   Global Step: 43600   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:48:29,277-Speed 5160.86 samples/sec   Loss 12.1920   LearningRate 0.3080   Epoch: 4   Global Step: 43610   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:48:36,179-Speed 5937.54 samples/sec   Loss 12.1446   LearningRate 0.3079   Epoch: 4   Global Step: 43620   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:48:43,025-Speed 5984.12 samples/sec   Loss 12.2184   LearningRate 0.3079   Epoch: 4   Global Step: 43630   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:48:49,883-Speed 5973.45 samples/sec   Loss 12.2308   LearningRate 0.3079   Epoch: 4   Global Step: 43640   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:48:56,757-Speed 5960.18 samples/sec   Loss 12.2223   LearningRate 0.3078   Epoch: 4   Global Step: 43650   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:49:03,610-Speed 5978.47 samples/sec   Loss 12.1085   LearningRate 0.3078   Epoch: 4   Global Step: 43660   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:49:10,592-Speed 5867.20 samples/sec   Loss 12.2206   LearningRate 0.3077   Epoch: 4   Global Step: 43670   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:49:17,537-Speed 5899.60 samples/sec   Loss 12.1812   LearningRate 0.3077   Epoch: 4   Global Step: 43680   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:49:24,484-Speed 5896.94 samples/sec   Loss 12.2238   LearningRate 0.3077   Epoch: 4   Global Step: 43690   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:49:31,429-Speed 5899.57 samples/sec   Loss 12.1191   LearningRate 0.3076   Epoch: 4   Global Step: 43700   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:49:38,273-Speed 5985.65 samples/sec   Loss 12.1709   LearningRate 0.3076   Epoch: 4   Global Step: 43710   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:49:45,111-Speed 5990.54 samples/sec   Loss 12.1871   LearningRate 0.3076   Epoch: 4   Global Step: 43720   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:49:51,978-Speed 5966.49 samples/sec   Loss 12.2428   LearningRate 0.3075   Epoch: 4   Global Step: 43730   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:49:58,831-Speed 5977.54 samples/sec   Loss 12.2341   LearningRate 0.3075   Epoch: 4   Global Step: 43740   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:50:05,678-Speed 5983.39 samples/sec   Loss 12.0970   LearningRate 0.3074   Epoch: 4   Global Step: 43750   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:50:12,536-Speed 5973.56 samples/sec   Loss 12.1587   LearningRate 0.3074   Epoch: 4   Global Step: 43760   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:50:19,390-Speed 5977.02 samples/sec   Loss 12.1732   LearningRate 0.3074   Epoch: 4   Global Step: 43770   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:50:26,267-Speed 5957.41 samples/sec   Loss 12.1594   LearningRate 0.3073   Epoch: 4   Global Step: 43780   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:50:33,124-Speed 5974.63 samples/sec   Loss 12.1858   LearningRate 0.3073   Epoch: 4   Global Step: 43790   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:50:39,980-Speed 5975.95 samples/sec   Loss 12.2229   LearningRate 0.3073   Epoch: 4   Global Step: 43800   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:50:46,836-Speed 5976.55 samples/sec   Loss 12.2114   LearningRate 0.3072   Epoch: 4   Global Step: 43810   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:50:53,684-Speed 5982.19 samples/sec   Loss 12.1807   LearningRate 0.3072   Epoch: 4   Global Step: 43820   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:51:00,519-Speed 5993.19 samples/sec   Loss 12.1491   LearningRate 0.3071   Epoch: 4   Global Step: 43830   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:51:07,365-Speed 5984.26 samples/sec   Loss 12.1366   LearningRate 0.3071   Epoch: 4   Global Step: 43840   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:51:14,228-Speed 5972.28 samples/sec   Loss 12.1732   LearningRate 0.3071   Epoch: 4   Global Step: 43850   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:51:21,078-Speed 5980.23 samples/sec   Loss 12.2070   LearningRate 0.3070   Epoch: 4   Global Step: 43860   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:51:27,924-Speed 5984.15 samples/sec   Loss 12.2030   LearningRate 0.3070   Epoch: 4   Global Step: 43870   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:51:34,779-Speed 5976.45 samples/sec   Loss 12.2068   LearningRate 0.3070   Epoch: 4   Global Step: 43880   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:51:41,623-Speed 5985.53 samples/sec   Loss 12.2202   LearningRate 0.3069   Epoch: 4   Global Step: 43890   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:51:48,473-Speed 5980.37 samples/sec   Loss 12.0997   LearningRate 0.3069   Epoch: 4   Global Step: 43900   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:51:55,345-Speed 5962.68 samples/sec   Loss 12.2495   LearningRate 0.3068   Epoch: 4   Global Step: 43910   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:52:02,207-Speed 5970.10 samples/sec   Loss 12.2534   LearningRate 0.3068   Epoch: 4   Global Step: 43920   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:52:09,074-Speed 5965.26 samples/sec   Loss 12.1438   LearningRate 0.3068   Epoch: 4   Global Step: 43930   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:52:15,940-Speed 5967.13 samples/sec   Loss 12.2088   LearningRate 0.3067   Epoch: 4   Global Step: 43940   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:52:22,799-Speed 5973.17 samples/sec   Loss 12.1747   LearningRate 0.3067   Epoch: 4   Global Step: 43950   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:52:29,639-Speed 5989.24 samples/sec   Loss 12.2419   LearningRate 0.3067   Epoch: 4   Global Step: 43960   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:52:36,511-Speed 5961.64 samples/sec   Loss 12.1382   LearningRate 0.3066   Epoch: 4   Global Step: 43970   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:52:43,364-Speed 5978.55 samples/sec   Loss 12.2317   LearningRate 0.3066   Epoch: 4   Global Step: 43980   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:52:50,224-Speed 5971.42 samples/sec   Loss 12.0493   LearningRate 0.3065   Epoch: 4   Global Step: 43990   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:52:57,072-Speed 5982.70 samples/sec   Loss 12.2457   LearningRate 0.3065   Epoch: 4   Global Step: 44000   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:53:03,914-Speed 5987.44 samples/sec   Loss 12.1192   LearningRate 0.3065   Epoch: 4   Global Step: 44010   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:53:10,776-Speed 5970.56 samples/sec   Loss 12.1081   LearningRate 0.3064   Epoch: 4   Global Step: 44020   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:53:17,624-Speed 5982.51 samples/sec   Loss 12.1248   LearningRate 0.3064   Epoch: 4   Global Step: 44030   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:53:24,477-Speed 5977.97 samples/sec   Loss 12.1380   LearningRate 0.3064   Epoch: 4   Global Step: 44040   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:53:31,336-Speed 5972.90 samples/sec   Loss 12.1640   LearningRate 0.3063   Epoch: 4   Global Step: 44050   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:53:38,195-Speed 5972.55 samples/sec   Loss 12.1988   LearningRate 0.3063   Epoch: 4   Global Step: 44060   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:53:45,043-Speed 5982.95 samples/sec   Loss 12.2378   LearningRate 0.3062   Epoch: 4   Global Step: 44070   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:53:51,898-Speed 5978.07 samples/sec   Loss 12.1729   LearningRate 0.3062   Epoch: 4   Global Step: 44080   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:53:58,736-Speed 5991.06 samples/sec   Loss 12.1582   LearningRate 0.3062   Epoch: 4   Global Step: 44090   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:54:05,602-Speed 5967.28 samples/sec   Loss 12.1565   LearningRate 0.3061   Epoch: 4   Global Step: 44100   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:54:12,449-Speed 5982.78 samples/sec   Loss 12.0446   LearningRate 0.3061   Epoch: 4   Global Step: 44110   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:54:19,307-Speed 5973.35 samples/sec   Loss 12.2222   LearningRate 0.3061   Epoch: 4   Global Step: 44120   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:54:26,151-Speed 5986.69 samples/sec   Loss 12.1137   LearningRate 0.3060   Epoch: 4   Global Step: 44130   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:54:33,015-Speed 5967.60 samples/sec   Loss 12.1430   LearningRate 0.3060   Epoch: 4   Global Step: 44140   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:54:39,873-Speed 5974.86 samples/sec   Loss 12.2716   LearningRate 0.3059   Epoch: 4   Global Step: 44150   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:54:46,727-Speed 5976.94 samples/sec   Loss 12.1307   LearningRate 0.3059   Epoch: 4   Global Step: 44160   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:54:53,675-Speed 5896.75 samples/sec   Loss 12.2150   LearningRate 0.3059   Epoch: 4   Global Step: 44170   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:55:00,537-Speed 5969.69 samples/sec   Loss 12.1348   LearningRate 0.3058   Epoch: 4   Global Step: 44180   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:55:07,391-Speed 5977.44 samples/sec   Loss 12.2064   LearningRate 0.3058   Epoch: 4   Global Step: 44190   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:55:14,246-Speed 5976.12 samples/sec   Loss 12.1077   LearningRate 0.3058   Epoch: 4   Global Step: 44200   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:55:21,107-Speed 5971.41 samples/sec   Loss 12.1410   LearningRate 0.3057   Epoch: 4   Global Step: 44210   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:55:27,971-Speed 5968.98 samples/sec   Loss 12.1263   LearningRate 0.3057   Epoch: 4   Global Step: 44220   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:55:34,834-Speed 5969.00 samples/sec   Loss 12.1565   LearningRate 0.3056   Epoch: 4   Global Step: 44230   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:55:41,701-Speed 5965.98 samples/sec   Loss 12.1472   LearningRate 0.3056   Epoch: 4   Global Step: 44240   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:55:48,585-Speed 5950.77 samples/sec   Loss 12.0869   LearningRate 0.3056   Epoch: 4   Global Step: 44250   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:55:55,444-Speed 5973.70 samples/sec   Loss 12.1449   LearningRate 0.3055   Epoch: 4   Global Step: 44260   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:56:02,312-Speed 5965.05 samples/sec   Loss 12.1568   LearningRate 0.3055   Epoch: 4   Global Step: 44270   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:56:09,144-Speed 5995.57 samples/sec   Loss 12.0719   LearningRate 0.3055   Epoch: 4   Global Step: 44280   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:56:15,988-Speed 5986.18 samples/sec   Loss 12.2084   LearningRate 0.3054   Epoch: 4   Global Step: 44290   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:56:22,843-Speed 5976.73 samples/sec   Loss 12.1155   LearningRate 0.3054   Epoch: 4   Global Step: 44300   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:56:29,718-Speed 5959.02 samples/sec   Loss 12.1567   LearningRate 0.3053   Epoch: 4   Global Step: 44310   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:56:36,602-Speed 5951.20 samples/sec   Loss 12.1285   LearningRate 0.3053   Epoch: 4   Global Step: 44320   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:56:43,452-Speed 5980.87 samples/sec   Loss 12.1381   LearningRate 0.3053   Epoch: 4   Global Step: 44330   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:56:50,352-Speed 5937.34 samples/sec   Loss 11.9966   LearningRate 0.3052   Epoch: 4   Global Step: 44340   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:56:57,209-Speed 5975.00 samples/sec   Loss 12.1323   LearningRate 0.3052   Epoch: 4   Global Step: 44350   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:57:04,082-Speed 5959.73 samples/sec   Loss 12.1221   LearningRate 0.3052   Epoch: 4   Global Step: 44360   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:57:11,012-Speed 5911.60 samples/sec   Loss 12.1237   LearningRate 0.3051   Epoch: 4   Global Step: 44370   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:57:17,883-Speed 5962.72 samples/sec   Loss 12.1292   LearningRate 0.3051   Epoch: 4   Global Step: 44380   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:57:24,754-Speed 5962.40 samples/sec   Loss 12.0903   LearningRate 0.3050   Epoch: 4   Global Step: 44390   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 04:57:31,593-Speed 5990.13 samples/sec   Loss 12.0929   LearningRate 0.3050   Epoch: 4   Global Step: 44400   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:57:38,454-Speed 5971.38 samples/sec   Loss 12.1770   LearningRate 0.3050   Epoch: 4   Global Step: 44410   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:57:45,308-Speed 5977.53 samples/sec   Loss 12.2501   LearningRate 0.3049   Epoch: 4   Global Step: 44420   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:57:52,185-Speed 5957.62 samples/sec   Loss 12.1672   LearningRate 0.3049   Epoch: 4   Global Step: 44430   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:57:59,051-Speed 5966.88 samples/sec   Loss 12.1057   LearningRate 0.3049   Epoch: 4   Global Step: 44440   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:58:05,895-Speed 5985.56 samples/sec   Loss 12.0729   LearningRate 0.3048   Epoch: 4   Global Step: 44450   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:58:12,743-Speed 5982.97 samples/sec   Loss 12.1908   LearningRate 0.3048   Epoch: 4   Global Step: 44460   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:58:19,590-Speed 5982.61 samples/sec   Loss 12.1686   LearningRate 0.3047   Epoch: 4   Global Step: 44470   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:58:26,437-Speed 5986.40 samples/sec   Loss 12.0700   LearningRate 0.3047   Epoch: 4   Global Step: 44480   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:58:33,315-Speed 5955.78 samples/sec   Loss 12.1766   LearningRate 0.3047   Epoch: 4   Global Step: 44490   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:58:40,159-Speed 5986.61 samples/sec   Loss 12.2113   LearningRate 0.3046   Epoch: 4   Global Step: 44500   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:58:47,022-Speed 5969.04 samples/sec   Loss 12.1432   LearningRate 0.3046   Epoch: 4   Global Step: 44510   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:58:53,901-Speed 5956.40 samples/sec   Loss 12.1080   LearningRate 0.3046   Epoch: 4   Global Step: 44520   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:59:00,746-Speed 5985.06 samples/sec   Loss 12.1292   LearningRate 0.3045   Epoch: 4   Global Step: 44530   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:59:07,620-Speed 5959.29 samples/sec   Loss 12.1320   LearningRate 0.3045   Epoch: 4   Global Step: 44540   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:59:14,467-Speed 5983.38 samples/sec   Loss 12.1756   LearningRate 0.3044   Epoch: 4   Global Step: 44550   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 04:59:21,334-Speed 5965.29 samples/sec   Loss 12.1187   LearningRate 0.3044   Epoch: 4   Global Step: 44560   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:59:28,191-Speed 5974.99 samples/sec   Loss 12.0831   LearningRate 0.3044   Epoch: 4   Global Step: 44570   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:59:35,043-Speed 5979.24 samples/sec   Loss 12.0574   LearningRate 0.3043   Epoch: 4   Global Step: 44580   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:59:41,943-Speed 5936.67 samples/sec   Loss 12.1378   LearningRate 0.3043   Epoch: 4   Global Step: 44590   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:59:48,863-Speed 5920.20 samples/sec   Loss 12.1392   LearningRate 0.3043   Epoch: 4   Global Step: 44600   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 04:59:55,791-Speed 5912.78 samples/sec   Loss 12.1854   LearningRate 0.3042   Epoch: 4   Global Step: 44610   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:00:02,687-Speed 5941.37 samples/sec   Loss 12.0778   LearningRate 0.3042   Epoch: 4   Global Step: 44620   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:00:09,563-Speed 5958.55 samples/sec   Loss 12.0837   LearningRate 0.3041   Epoch: 4   Global Step: 44630   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:00:16,447-Speed 5950.44 samples/sec   Loss 12.2109   LearningRate 0.3041   Epoch: 4   Global Step: 44640   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:00:23,295-Speed 5982.66 samples/sec   Loss 12.2524   LearningRate 0.3041   Epoch: 4   Global Step: 44650   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:00:30,161-Speed 5966.85 samples/sec   Loss 12.0454   LearningRate 0.3040   Epoch: 4   Global Step: 44660   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:00:37,010-Speed 5982.07 samples/sec   Loss 12.0847   LearningRate 0.3040   Epoch: 4   Global Step: 44670   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:00:43,866-Speed 5975.56 samples/sec   Loss 12.0877   LearningRate 0.3040   Epoch: 4   Global Step: 44680   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:00:50,724-Speed 5973.65 samples/sec   Loss 12.0698   LearningRate 0.3039   Epoch: 4   Global Step: 44690   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:00:57,576-Speed 5980.96 samples/sec   Loss 12.0686   LearningRate 0.3039   Epoch: 4   Global Step: 44700   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:01:04,430-Speed 5977.52 samples/sec   Loss 12.1537   LearningRate 0.3038   Epoch: 4   Global Step: 44710   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:01:11,288-Speed 5972.43 samples/sec   Loss 12.1676   LearningRate 0.3038   Epoch: 4   Global Step: 44720   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:01:18,276-Speed 5862.93 samples/sec   Loss 12.1503   LearningRate 0.3038   Epoch: 4   Global Step: 44730   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:01:25,208-Speed 5910.01 samples/sec   Loss 12.0866   LearningRate 0.3037   Epoch: 4   Global Step: 44740   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:01:32,101-Speed 5944.19 samples/sec   Loss 12.0694   LearningRate 0.3037   Epoch: 4   Global Step: 44750   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:01:38,952-Speed 5980.03 samples/sec   Loss 12.1022   LearningRate 0.3037   Epoch: 4   Global Step: 44760   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:01:45,800-Speed 5982.61 samples/sec   Loss 12.1619   LearningRate 0.3036   Epoch: 4   Global Step: 44770   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:01:52,666-Speed 5967.87 samples/sec   Loss 12.1424   LearningRate 0.3036   Epoch: 4   Global Step: 44780   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:01:59,519-Speed 5978.74 samples/sec   Loss 12.0799   LearningRate 0.3035   Epoch: 4   Global Step: 44790   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:02:06,393-Speed 5959.73 samples/sec   Loss 12.0972   LearningRate 0.3035   Epoch: 4   Global Step: 44800   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:02:13,259-Speed 5966.77 samples/sec   Loss 12.1115   LearningRate 0.3035   Epoch: 4   Global Step: 44810   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:02:20,115-Speed 5975.65 samples/sec   Loss 12.0619   LearningRate 0.3034   Epoch: 4   Global Step: 44820   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:02:26,993-Speed 5956.10 samples/sec   Loss 12.0588   LearningRate 0.3034   Epoch: 4   Global Step: 44830   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:02:33,856-Speed 5969.72 samples/sec   Loss 12.0532   LearningRate 0.3034   Epoch: 4   Global Step: 44840   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:02:40,703-Speed 5982.76 samples/sec   Loss 12.0458   LearningRate 0.3033   Epoch: 4   Global Step: 44850   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:02:47,570-Speed 5965.52 samples/sec   Loss 12.0171   LearningRate 0.3033   Epoch: 4   Global Step: 44860   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:02:54,420-Speed 5980.01 samples/sec   Loss 12.0469   LearningRate 0.3033   Epoch: 4   Global Step: 44870   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:03:01,290-Speed 5963.49 samples/sec   Loss 12.0426   LearningRate 0.3032   Epoch: 4   Global Step: 44880   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:03:08,139-Speed 5981.34 samples/sec   Loss 12.0940   LearningRate 0.3032   Epoch: 4   Global Step: 44890   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:03:15,000-Speed 5972.01 samples/sec   Loss 12.1427   LearningRate 0.3031   Epoch: 4   Global Step: 44900   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:03:21,858-Speed 5972.80 samples/sec   Loss 12.0669   LearningRate 0.3031   Epoch: 4   Global Step: 44910   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:03:28,843-Speed 5865.86 samples/sec   Loss 12.1135   LearningRate 0.3031   Epoch: 4   Global Step: 44920   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:03:35,712-Speed 5964.86 samples/sec   Loss 12.0717   LearningRate 0.3030   Epoch: 4   Global Step: 44930   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:03:42,566-Speed 5977.24 samples/sec   Loss 12.0778   LearningRate 0.3030   Epoch: 4   Global Step: 44940   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:03:49,539-Speed 5875.60 samples/sec   Loss 12.0979   LearningRate 0.3030   Epoch: 4   Global Step: 44950   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:03:56,387-Speed 5982.22 samples/sec   Loss 12.0961   LearningRate 0.3029   Epoch: 4   Global Step: 44960   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:04:03,257-Speed 5963.15 samples/sec   Loss 12.0631   LearningRate 0.3029   Epoch: 4   Global Step: 44970   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:04:10,119-Speed 5973.76 samples/sec   Loss 12.0895   LearningRate 0.3028   Epoch: 4   Global Step: 44980   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:04:17,004-Speed 5950.51 samples/sec   Loss 12.1194   LearningRate 0.3028   Epoch: 4   Global Step: 44990   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:04:23,891-Speed 5948.90 samples/sec   Loss 12.0505   LearningRate 0.3028   Epoch: 4   Global Step: 45000   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:04:50,929-[lfw][45000]XNorm: 22.296307
Training: 2022-01-08 05:04:50,930-[lfw][45000]Accuracy-Flip: 0.99700+-0.00245
Training: 2022-01-08 05:04:50,930-[lfw][45000]Accuracy-Highest: 0.99700
Training: 2022-01-08 05:05:22,252-[cfp_fp][45000]XNorm: 19.527793
Training: 2022-01-08 05:05:22,253-[cfp_fp][45000]Accuracy-Flip: 0.97043+-0.00897
Training: 2022-01-08 05:05:22,254-[cfp_fp][45000]Accuracy-Highest: 0.97057
Training: 2022-01-08 05:05:49,095-[agedb_30][45000]XNorm: 21.845312
Training: 2022-01-08 05:05:49,096-[agedb_30][45000]Accuracy-Flip: 0.96100+-0.00892
Training: 2022-01-08 05:05:49,097-[agedb_30][45000]Accuracy-Highest: 0.96200
Training: 2022-01-08 05:05:55,946-Speed 444.96 samples/sec   Loss 12.0866   LearningRate 0.3027   Epoch: 4   Global Step: 45010   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:06:02,792-Speed 5984.97 samples/sec   Loss 12.1171   LearningRate 0.3027   Epoch: 4   Global Step: 45020   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:06:09,625-Speed 5995.57 samples/sec   Loss 12.1182   LearningRate 0.3027   Epoch: 4   Global Step: 45030   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:06:16,493-Speed 5965.27 samples/sec   Loss 12.1106   LearningRate 0.3026   Epoch: 4   Global Step: 45040   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:06:23,434-Speed 5901.98 samples/sec   Loss 12.1221   LearningRate 0.3026   Epoch: 4   Global Step: 45050   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:06:30,295-Speed 5972.55 samples/sec   Loss 12.0302   LearningRate 0.3025   Epoch: 4   Global Step: 45060   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:06:37,185-Speed 5945.73 samples/sec   Loss 12.1428   LearningRate 0.3025   Epoch: 4   Global Step: 45070   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:06:44,044-Speed 5973.54 samples/sec   Loss 12.1223   LearningRate 0.3025   Epoch: 4   Global Step: 45080   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:06:50,905-Speed 5971.20 samples/sec   Loss 12.0435   LearningRate 0.3024   Epoch: 4   Global Step: 45090   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:06:57,763-Speed 5972.97 samples/sec   Loss 11.9932   LearningRate 0.3024   Epoch: 4   Global Step: 45100   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:07:04,613-Speed 5980.96 samples/sec   Loss 12.1019   LearningRate 0.3024   Epoch: 4   Global Step: 45110   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:07:11,483-Speed 5965.68 samples/sec   Loss 12.1385   LearningRate 0.3023   Epoch: 4   Global Step: 45120   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:07:18,349-Speed 5966.20 samples/sec   Loss 12.1165   LearningRate 0.3023   Epoch: 4   Global Step: 45130   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:07:25,203-Speed 5977.90 samples/sec   Loss 12.0534   LearningRate 0.3022   Epoch: 4   Global Step: 45140   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:07:32,057-Speed 5977.35 samples/sec   Loss 12.0771   LearningRate 0.3022   Epoch: 4   Global Step: 45150   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:07:38,910-Speed 5978.14 samples/sec   Loss 12.1388   LearningRate 0.3022   Epoch: 4   Global Step: 45160   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:07:45,769-Speed 5972.64 samples/sec   Loss 12.0402   LearningRate 0.3021   Epoch: 4   Global Step: 45170   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:07:52,619-Speed 5981.07 samples/sec   Loss 11.9956   LearningRate 0.3021   Epoch: 4   Global Step: 45180   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:07:59,499-Speed 5954.57 samples/sec   Loss 12.1222   LearningRate 0.3021   Epoch: 4   Global Step: 45190   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:08:06,360-Speed 5970.96 samples/sec   Loss 11.9941   LearningRate 0.3020   Epoch: 4   Global Step: 45200   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:08:13,198-Speed 5990.73 samples/sec   Loss 12.0690   LearningRate 0.3020   Epoch: 4   Global Step: 45210   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:08:20,036-Speed 5991.28 samples/sec   Loss 12.0149   LearningRate 0.3019   Epoch: 4   Global Step: 45220   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:08:26,888-Speed 5978.31 samples/sec   Loss 12.0716   LearningRate 0.3019   Epoch: 4   Global Step: 45230   Fp16 Grad Scale: 524288   Required: 32 hours
Training: 2022-01-08 05:08:33,715-Speed 6001.27 samples/sec   Loss 12.0650   LearningRate 0.3019   Epoch: 4   Global Step: 45240   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:08:40,556-Speed 5988.42 samples/sec   Loss 11.9405   LearningRate 0.3018   Epoch: 4   Global Step: 45250   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:08:47,418-Speed 5970.06 samples/sec   Loss 12.0555   LearningRate 0.3018   Epoch: 4   Global Step: 45260   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:08:54,269-Speed 5980.06 samples/sec   Loss 12.0748   LearningRate 0.3018   Epoch: 4   Global Step: 45270   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:09:01,121-Speed 5979.06 samples/sec   Loss 12.1156   LearningRate 0.3017   Epoch: 4   Global Step: 45280   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:09:07,991-Speed 5963.92 samples/sec   Loss 12.0931   LearningRate 0.3017   Epoch: 4   Global Step: 45290   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:09:14,841-Speed 5980.07 samples/sec   Loss 11.9772   LearningRate 0.3016   Epoch: 4   Global Step: 45300   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:09:21,687-Speed 5984.27 samples/sec   Loss 12.0955   LearningRate 0.3016   Epoch: 4   Global Step: 45310   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:09:28,556-Speed 5964.37 samples/sec   Loss 12.0381   LearningRate 0.3016   Epoch: 4   Global Step: 45320   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:09:35,405-Speed 5983.06 samples/sec   Loss 12.0070   LearningRate 0.3015   Epoch: 4   Global Step: 45330   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:09:42,260-Speed 5975.76 samples/sec   Loss 12.0732   LearningRate 0.3015   Epoch: 4   Global Step: 45340   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:09:49,106-Speed 5984.47 samples/sec   Loss 12.0269   LearningRate 0.3015   Epoch: 4   Global Step: 45350   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:09:55,954-Speed 5982.64 samples/sec   Loss 12.0868   LearningRate 0.3014   Epoch: 4   Global Step: 45360   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:10:02,799-Speed 5984.73 samples/sec   Loss 11.9722   LearningRate 0.3014   Epoch: 4   Global Step: 45370   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:10:09,652-Speed 5980.58 samples/sec   Loss 12.1008   LearningRate 0.3014   Epoch: 4   Global Step: 45380   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:10:16,508-Speed 5975.52 samples/sec   Loss 12.0024   LearningRate 0.3013   Epoch: 4   Global Step: 45390   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:10:23,362-Speed 5976.79 samples/sec   Loss 12.0717   LearningRate 0.3013   Epoch: 4   Global Step: 45400   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:10:30,206-Speed 5986.36 samples/sec   Loss 12.0679   LearningRate 0.3012   Epoch: 4   Global Step: 45410   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:10:37,071-Speed 5969.40 samples/sec   Loss 12.0565   LearningRate 0.3012   Epoch: 4   Global Step: 45420   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:10:43,917-Speed 5984.07 samples/sec   Loss 12.0181   LearningRate 0.3012   Epoch: 4   Global Step: 45430   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:10:50,765-Speed 5983.20 samples/sec   Loss 12.0195   LearningRate 0.3011   Epoch: 4   Global Step: 45440   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:10:57,602-Speed 5991.77 samples/sec   Loss 12.0555   LearningRate 0.3011   Epoch: 4   Global Step: 45450   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:11:04,464-Speed 5969.95 samples/sec   Loss 12.0896   LearningRate 0.3011   Epoch: 4   Global Step: 45460   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:11:11,332-Speed 5965.03 samples/sec   Loss 12.0051   LearningRate 0.3010   Epoch: 4   Global Step: 45470   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:11:18,197-Speed 5967.70 samples/sec   Loss 12.1215   LearningRate 0.3010   Epoch: 4   Global Step: 45480   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:11:25,054-Speed 5974.15 samples/sec   Loss 12.1314   LearningRate 0.3009   Epoch: 4   Global Step: 45490   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:11:31,920-Speed 5966.38 samples/sec   Loss 12.0320   LearningRate 0.3009   Epoch: 4   Global Step: 45500   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:11:38,782-Speed 5972.29 samples/sec   Loss 12.0021   LearningRate 0.3009   Epoch: 4   Global Step: 45510   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:11:45,641-Speed 5973.03 samples/sec   Loss 12.0751   LearningRate 0.3008   Epoch: 4   Global Step: 45520   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:11:52,491-Speed 5980.64 samples/sec   Loss 12.1489   LearningRate 0.3008   Epoch: 4   Global Step: 45530   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:11:59,333-Speed 5987.92 samples/sec   Loss 11.9336   LearningRate 0.3008   Epoch: 4   Global Step: 45540   Fp16 Grad Scale: 524288   Required: 32 hours
Training: 2022-01-08 05:12:06,240-Speed 5931.41 samples/sec   Loss 12.0507   LearningRate 0.3007   Epoch: 4   Global Step: 45550   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:12:13,089-Speed 5981.97 samples/sec   Loss 12.1502   LearningRate 0.3007   Epoch: 4   Global Step: 45560   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:12:19,931-Speed 5987.79 samples/sec   Loss 12.0394   LearningRate 0.3006   Epoch: 4   Global Step: 45570   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:12:26,796-Speed 5967.39 samples/sec   Loss 12.1092   LearningRate 0.3006   Epoch: 4   Global Step: 45580   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:12:33,683-Speed 5948.52 samples/sec   Loss 12.0568   LearningRate 0.3006   Epoch: 4   Global Step: 45590   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:12:40,527-Speed 5985.88 samples/sec   Loss 11.9863   LearningRate 0.3005   Epoch: 4   Global Step: 45600   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:12:47,377-Speed 5980.75 samples/sec   Loss 12.0718   LearningRate 0.3005   Epoch: 4   Global Step: 45610   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:12:54,219-Speed 5987.61 samples/sec   Loss 12.0343   LearningRate 0.3005   Epoch: 4   Global Step: 45620   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:13:01,068-Speed 5981.46 samples/sec   Loss 12.1668   LearningRate 0.3004   Epoch: 4   Global Step: 45630   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:13:07,953-Speed 5950.38 samples/sec   Loss 12.0488   LearningRate 0.3004   Epoch: 4   Global Step: 45640   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:13:14,809-Speed 5976.09 samples/sec   Loss 11.9714   LearningRate 0.3003   Epoch: 4   Global Step: 45650   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:13:21,661-Speed 5978.43 samples/sec   Loss 12.0063   LearningRate 0.3003   Epoch: 4   Global Step: 45660   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:13:28,521-Speed 5971.82 samples/sec   Loss 11.9407   LearningRate 0.3003   Epoch: 4   Global Step: 45670   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:13:35,387-Speed 5967.18 samples/sec   Loss 12.0118   LearningRate 0.3002   Epoch: 4   Global Step: 45680   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:13:42,243-Speed 5975.44 samples/sec   Loss 11.9279   LearningRate 0.3002   Epoch: 4   Global Step: 45690   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:13:49,090-Speed 5982.97 samples/sec   Loss 12.0269   LearningRate 0.3002   Epoch: 4   Global Step: 45700   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:13:55,957-Speed 5968.45 samples/sec   Loss 12.0304   LearningRate 0.3001   Epoch: 4   Global Step: 45710   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:14:02,817-Speed 5972.66 samples/sec   Loss 11.9342   LearningRate 0.3001   Epoch: 4   Global Step: 45720   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:14:09,678-Speed 5970.63 samples/sec   Loss 12.1035   LearningRate 0.3000   Epoch: 4   Global Step: 45730   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:14:16,540-Speed 5970.56 samples/sec   Loss 12.0061   LearningRate 0.3000   Epoch: 4   Global Step: 45740   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:14:23,389-Speed 5981.08 samples/sec   Loss 12.0162   LearningRate 0.3000   Epoch: 4   Global Step: 45750   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:14:30,250-Speed 5974.59 samples/sec   Loss 11.9622   LearningRate 0.2999   Epoch: 4   Global Step: 45760   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:14:37,111-Speed 5970.39 samples/sec   Loss 12.0696   LearningRate 0.2999   Epoch: 4   Global Step: 45770   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:14:43,988-Speed 5957.53 samples/sec   Loss 12.0618   LearningRate 0.2999   Epoch: 4   Global Step: 45780   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:14:50,842-Speed 5977.73 samples/sec   Loss 12.0100   LearningRate 0.2998   Epoch: 4   Global Step: 45790   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:14:57,720-Speed 5956.81 samples/sec   Loss 12.0511   LearningRate 0.2998   Epoch: 4   Global Step: 45800   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:15:04,596-Speed 5957.42 samples/sec   Loss 12.0665   LearningRate 0.2998   Epoch: 4   Global Step: 45810   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:15:11,470-Speed 5960.16 samples/sec   Loss 11.9410   LearningRate 0.2997   Epoch: 4   Global Step: 45820   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:15:18,334-Speed 5968.72 samples/sec   Loss 12.0108   LearningRate 0.2997   Epoch: 4   Global Step: 45830   Fp16 Grad Scale: 524288   Required: 32 hours
Training: 2022-01-08 05:15:25,199-Speed 5967.15 samples/sec   Loss 11.9657   LearningRate 0.2996   Epoch: 4   Global Step: 45840   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:15:32,076-Speed 5957.88 samples/sec   Loss 11.9340   LearningRate 0.2996   Epoch: 4   Global Step: 45850   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:15:38,952-Speed 5958.34 samples/sec   Loss 11.9951   LearningRate 0.2996   Epoch: 4   Global Step: 45860   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:15:45,849-Speed 5942.66 samples/sec   Loss 11.9995   LearningRate 0.2995   Epoch: 4   Global Step: 45870   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:15:52,737-Speed 5948.95 samples/sec   Loss 11.9779   LearningRate 0.2995   Epoch: 4   Global Step: 45880   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:15:59,700-Speed 5883.72 samples/sec   Loss 11.9359   LearningRate 0.2995   Epoch: 4   Global Step: 45890   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:16:06,632-Speed 5910.54 samples/sec   Loss 12.0799   LearningRate 0.2994   Epoch: 4   Global Step: 45900   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:16:13,499-Speed 5967.22 samples/sec   Loss 12.0188   LearningRate 0.2994   Epoch: 4   Global Step: 45910   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:16:20,382-Speed 5953.00 samples/sec   Loss 12.0129   LearningRate 0.2993   Epoch: 4   Global Step: 45920   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:16:27,266-Speed 5950.67 samples/sec   Loss 11.9647   LearningRate 0.2993   Epoch: 4   Global Step: 45930   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:16:34,105-Speed 5992.93 samples/sec   Loss 12.0560   LearningRate 0.2993   Epoch: 4   Global Step: 45940   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:16:40,969-Speed 5967.71 samples/sec   Loss 11.9973   LearningRate 0.2992   Epoch: 4   Global Step: 45950   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:16:47,823-Speed 5977.23 samples/sec   Loss 11.9549   LearningRate 0.2992   Epoch: 4   Global Step: 45960   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:16:54,693-Speed 5963.33 samples/sec   Loss 11.9974   LearningRate 0.2992   Epoch: 4   Global Step: 45970   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:17:01,550-Speed 5973.85 samples/sec   Loss 12.0019   LearningRate 0.2991   Epoch: 4   Global Step: 45980   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:17:08,416-Speed 5966.37 samples/sec   Loss 11.9536   LearningRate 0.2991   Epoch: 4   Global Step: 45990   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:17:15,264-Speed 5982.98 samples/sec   Loss 11.8825   LearningRate 0.2990   Epoch: 4   Global Step: 46000   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:17:22,113-Speed 5981.86 samples/sec   Loss 12.0649   LearningRate 0.2990   Epoch: 4   Global Step: 46010   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:17:28,961-Speed 5981.67 samples/sec   Loss 12.1270   LearningRate 0.2990   Epoch: 4   Global Step: 46020   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:17:35,821-Speed 5971.55 samples/sec   Loss 12.0349   LearningRate 0.2989   Epoch: 4   Global Step: 46030   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:17:42,678-Speed 5974.82 samples/sec   Loss 12.0451   LearningRate 0.2989   Epoch: 4   Global Step: 46040   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:17:49,533-Speed 5976.58 samples/sec   Loss 11.9268   LearningRate 0.2989   Epoch: 4   Global Step: 46050   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:17:56,385-Speed 5979.04 samples/sec   Loss 11.9679   LearningRate 0.2988   Epoch: 4   Global Step: 46060   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:18:03,244-Speed 5972.76 samples/sec   Loss 11.9854   LearningRate 0.2988   Epoch: 4   Global Step: 46070   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:18:10,109-Speed 5966.87 samples/sec   Loss 11.9992   LearningRate 0.2988   Epoch: 4   Global Step: 46080   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:18:16,954-Speed 5985.07 samples/sec   Loss 11.9995   LearningRate 0.2987   Epoch: 4   Global Step: 46090   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:18:23,818-Speed 5967.90 samples/sec   Loss 11.9737   LearningRate 0.2987   Epoch: 4   Global Step: 46100   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:18:30,689-Speed 5963.01 samples/sec   Loss 11.9810   LearningRate 0.2986   Epoch: 4   Global Step: 46110   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:18:37,544-Speed 5975.71 samples/sec   Loss 12.0030   LearningRate 0.2986   Epoch: 4   Global Step: 46120   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:18:44,442-Speed 5939.22 samples/sec   Loss 11.9872   LearningRate 0.2986   Epoch: 4   Global Step: 46130   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:18:51,334-Speed 5945.16 samples/sec   Loss 12.0754   LearningRate 0.2985   Epoch: 4   Global Step: 46140   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:18:58,188-Speed 5977.00 samples/sec   Loss 11.9799   LearningRate 0.2985   Epoch: 4   Global Step: 46150   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:19:05,043-Speed 5976.74 samples/sec   Loss 12.0722   LearningRate 0.2985   Epoch: 4   Global Step: 46160   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:19:11,939-Speed 5946.64 samples/sec   Loss 12.0808   LearningRate 0.2984   Epoch: 4   Global Step: 46170   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:19:18,810-Speed 5962.71 samples/sec   Loss 11.9924   LearningRate 0.2984   Epoch: 4   Global Step: 46180   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:19:25,684-Speed 5959.55 samples/sec   Loss 12.0017   LearningRate 0.2983   Epoch: 4   Global Step: 46190   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:19:32,540-Speed 5975.00 samples/sec   Loss 12.0851   LearningRate 0.2983   Epoch: 4   Global Step: 46200   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:19:39,420-Speed 5954.69 samples/sec   Loss 11.9812   LearningRate 0.2983   Epoch: 4   Global Step: 46210   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:19:46,274-Speed 5976.89 samples/sec   Loss 12.0410   LearningRate 0.2982   Epoch: 4   Global Step: 46220   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:19:53,146-Speed 5962.09 samples/sec   Loss 12.0323   LearningRate 0.2982   Epoch: 4   Global Step: 46230   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:20:00,054-Speed 5930.79 samples/sec   Loss 11.9513   LearningRate 0.2982   Epoch: 4   Global Step: 46240   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:20:06,919-Speed 5967.25 samples/sec   Loss 11.9814   LearningRate 0.2981   Epoch: 4   Global Step: 46250   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:20:13,781-Speed 5970.15 samples/sec   Loss 12.0927   LearningRate 0.2981   Epoch: 4   Global Step: 46260   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:20:20,642-Speed 5971.10 samples/sec   Loss 11.9882   LearningRate 0.2980   Epoch: 4   Global Step: 46270   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:20:27,524-Speed 5953.08 samples/sec   Loss 11.9390   LearningRate 0.2980   Epoch: 4   Global Step: 46280   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:20:34,377-Speed 5978.09 samples/sec   Loss 11.9715   LearningRate 0.2980   Epoch: 4   Global Step: 46290   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:20:41,251-Speed 5960.24 samples/sec   Loss 11.9281   LearningRate 0.2979   Epoch: 4   Global Step: 46300   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:20:48,109-Speed 5973.30 samples/sec   Loss 11.9799   LearningRate 0.2979   Epoch: 4   Global Step: 46310   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 05:20:55,190-Speed 5785.98 samples/sec   Loss 11.9659   LearningRate 0.2979   Epoch: 4   Global Step: 46320   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 05:21:02,105-Speed 5924.22 samples/sec   Loss 12.0047   LearningRate 0.2978   Epoch: 4   Global Step: 46330   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 05:21:08,956-Speed 5980.01 samples/sec   Loss 11.9718   LearningRate 0.2978   Epoch: 4   Global Step: 46340   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 05:21:15,831-Speed 5958.70 samples/sec   Loss 11.9137   LearningRate 0.2978   Epoch: 4   Global Step: 46350   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 05:21:22,713-Speed 5954.31 samples/sec   Loss 11.9921   LearningRate 0.2977   Epoch: 4   Global Step: 46360   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 05:21:29,570-Speed 5974.07 samples/sec   Loss 12.0369   LearningRate 0.2977   Epoch: 4   Global Step: 46370   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 05:21:36,433-Speed 5969.13 samples/sec   Loss 12.1142   LearningRate 0.2976   Epoch: 4   Global Step: 46380   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 05:21:43,281-Speed 5982.37 samples/sec   Loss 11.9264   LearningRate 0.2976   Epoch: 4   Global Step: 46390   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 05:21:50,132-Speed 5980.03 samples/sec   Loss 12.0131   LearningRate 0.2976   Epoch: 4   Global Step: 46400   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-01-08 05:21:56,981-Speed 5981.67 samples/sec   Loss 11.9747   LearningRate 0.2975   Epoch: 4   Global Step: 46410   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:22:03,835-Speed 5977.24 samples/sec   Loss 11.8980   LearningRate 0.2975   Epoch: 4   Global Step: 46420   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:22:10,708-Speed 5962.55 samples/sec   Loss 11.9685   LearningRate 0.2975   Epoch: 4   Global Step: 46430   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:22:17,565-Speed 5976.66 samples/sec   Loss 11.9601   LearningRate 0.2974   Epoch: 4   Global Step: 46440   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:22:24,414-Speed 5981.61 samples/sec   Loss 11.8974   LearningRate 0.2974   Epoch: 4   Global Step: 46450   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:22:31,276-Speed 5970.63 samples/sec   Loss 11.9089   LearningRate 0.2973   Epoch: 4   Global Step: 46460   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:22:38,143-Speed 5965.94 samples/sec   Loss 11.9713   LearningRate 0.2973   Epoch: 4   Global Step: 46470   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:22:45,114-Speed 5877.36 samples/sec   Loss 11.9370   LearningRate 0.2973   Epoch: 4   Global Step: 46480   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:22:51,958-Speed 5985.74 samples/sec   Loss 12.0115   LearningRate 0.2972   Epoch: 4   Global Step: 46490   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:22:58,842-Speed 5952.10 samples/sec   Loss 11.9533   LearningRate 0.2972   Epoch: 4   Global Step: 46500   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:23:05,703-Speed 5971.74 samples/sec   Loss 11.9516   LearningRate 0.2972   Epoch: 4   Global Step: 46510   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:23:12,552-Speed 5982.15 samples/sec   Loss 11.9117   LearningRate 0.2971   Epoch: 4   Global Step: 46520   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:23:19,418-Speed 5966.41 samples/sec   Loss 11.9951   LearningRate 0.2971   Epoch: 4   Global Step: 46530   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:23:26,304-Speed 5949.71 samples/sec   Loss 11.9672   LearningRate 0.2970   Epoch: 4   Global Step: 46540   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:23:33,185-Speed 5953.56 samples/sec   Loss 11.9318   LearningRate 0.2970   Epoch: 4   Global Step: 46550   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:23:40,079-Speed 5942.52 samples/sec   Loss 11.9982   LearningRate 0.2970   Epoch: 4   Global Step: 46560   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:23:46,949-Speed 5963.42 samples/sec   Loss 11.9530   LearningRate 0.2969   Epoch: 4   Global Step: 46570   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:23:53,826-Speed 5957.28 samples/sec   Loss 11.9830   LearningRate 0.2969   Epoch: 4   Global Step: 46580   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:24:00,691-Speed 5968.35 samples/sec   Loss 11.9012   LearningRate 0.2969   Epoch: 4   Global Step: 46590   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:24:07,560-Speed 5965.58 samples/sec   Loss 11.9745   LearningRate 0.2968   Epoch: 4   Global Step: 46600   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:24:14,418-Speed 5974.05 samples/sec   Loss 12.0257   LearningRate 0.2968   Epoch: 4   Global Step: 46610   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:24:21,267-Speed 5981.24 samples/sec   Loss 11.9837   LearningRate 0.2968   Epoch: 4   Global Step: 46620   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:24:28,115-Speed 5982.68 samples/sec   Loss 12.0012   LearningRate 0.2967   Epoch: 4   Global Step: 46630   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:24:34,968-Speed 5977.66 samples/sec   Loss 12.0141   LearningRate 0.2967   Epoch: 4   Global Step: 46640   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-01-08 05:24:41,824-Speed 5974.70 samples/sec   Loss 11.9226   LearningRate 0.2966   Epoch: 4   Global Step: 46650   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:24:48,688-Speed 5969.35 samples/sec   Loss 12.0312   LearningRate 0.2966   Epoch: 4   Global Step: 46660   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-01-08 05:24:55,561-Speed 5960.55 samples/sec   Loss 11.9086   LearningRate 0.2966   Epoch: 4   Global Step: 46670   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:25:02,420-Speed 5973.63 samples/sec   Loss 11.9614   LearningRate 0.2965   Epoch: 4   Global Step: 46680   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:25:09,284-Speed 5967.80 samples/sec   Loss 11.9398   LearningRate 0.2965   Epoch: 4   Global Step: 46690   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:25:16,139-Speed 5976.34 samples/sec   Loss 11.9151   LearningRate 0.2965   Epoch: 4   Global Step: 46700   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:25:22,998-Speed 5972.52 samples/sec   Loss 11.9776   LearningRate 0.2964   Epoch: 4   Global Step: 46710   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:25:29,863-Speed 5968.38 samples/sec   Loss 11.9435   LearningRate 0.2964   Epoch: 4   Global Step: 46720   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:25:36,722-Speed 5973.28 samples/sec   Loss 11.9173   LearningRate 0.2963   Epoch: 4   Global Step: 46730   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:25:43,640-Speed 5921.06 samples/sec   Loss 11.9429   LearningRate 0.2963   Epoch: 4   Global Step: 46740   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:25:50,517-Speed 5957.54 samples/sec   Loss 11.8924   LearningRate 0.2963   Epoch: 4   Global Step: 46750   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:25:57,384-Speed 5966.31 samples/sec   Loss 11.9745   LearningRate 0.2962   Epoch: 4   Global Step: 46760   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:26:04,228-Speed 5985.75 samples/sec   Loss 11.9283   LearningRate 0.2962   Epoch: 4   Global Step: 46770   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:26:11,143-Speed 5923.64 samples/sec   Loss 11.9558   LearningRate 0.2962   Epoch: 4   Global Step: 46780   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:26:18,067-Speed 5917.21 samples/sec   Loss 11.9633   LearningRate 0.2961   Epoch: 4   Global Step: 46790   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:26:24,916-Speed 5981.30 samples/sec   Loss 11.9223   LearningRate 0.2961   Epoch: 4   Global Step: 46800   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:26:31,767-Speed 5980.62 samples/sec   Loss 11.9406   LearningRate 0.2961   Epoch: 4   Global Step: 46810   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:26:38,651-Speed 5950.84 samples/sec   Loss 11.8801   LearningRate 0.2960   Epoch: 4   Global Step: 46820   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:26:45,505-Speed 5977.06 samples/sec   Loss 11.9106   LearningRate 0.2960   Epoch: 4   Global Step: 46830   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:26:52,378-Speed 5960.67 samples/sec   Loss 11.9697   LearningRate 0.2959   Epoch: 4   Global Step: 46840   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:26:59,254-Speed 5958.36 samples/sec   Loss 11.9182   LearningRate 0.2959   Epoch: 4   Global Step: 46850   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:27:06,121-Speed 5968.43 samples/sec   Loss 11.9224   LearningRate 0.2959   Epoch: 4   Global Step: 46860   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:27:12,972-Speed 5979.66 samples/sec   Loss 11.9170   LearningRate 0.2958   Epoch: 4   Global Step: 46870   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:27:19,817-Speed 5984.88 samples/sec   Loss 11.9815   LearningRate 0.2958   Epoch: 4   Global Step: 46880   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:27:26,685-Speed 5966.57 samples/sec   Loss 11.9322   LearningRate 0.2958   Epoch: 4   Global Step: 46890   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:27:33,540-Speed 5977.08 samples/sec   Loss 11.9091   LearningRate 0.2957   Epoch: 4   Global Step: 46900   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:27:40,426-Speed 5949.01 samples/sec   Loss 11.9728   LearningRate 0.2957   Epoch: 4   Global Step: 46910   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:27:47,294-Speed 5965.43 samples/sec   Loss 11.9686   LearningRate 0.2956   Epoch: 4   Global Step: 46920   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:27:54,141-Speed 5982.39 samples/sec   Loss 11.9698   LearningRate 0.2956   Epoch: 4   Global Step: 46930   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:28:00,992-Speed 5980.70 samples/sec   Loss 11.9346   LearningRate 0.2956   Epoch: 4   Global Step: 46940   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:28:07,868-Speed 5957.93 samples/sec   Loss 11.9415   LearningRate 0.2955   Epoch: 4   Global Step: 46950   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:28:14,739-Speed 5962.96 samples/sec   Loss 11.8808   LearningRate 0.2955   Epoch: 4   Global Step: 46960   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:28:21,609-Speed 5963.36 samples/sec   Loss 11.9148   LearningRate 0.2955   Epoch: 4   Global Step: 46970   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:28:28,471-Speed 5970.28 samples/sec   Loss 11.9626   LearningRate 0.2954   Epoch: 4   Global Step: 46980   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:28:35,332-Speed 5970.84 samples/sec   Loss 12.0095   LearningRate 0.2954   Epoch: 4   Global Step: 46990   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:28:42,204-Speed 5962.12 samples/sec   Loss 12.0001   LearningRate 0.2954   Epoch: 4   Global Step: 47000   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:28:49,078-Speed 5961.32 samples/sec   Loss 11.9776   LearningRate 0.2953   Epoch: 4   Global Step: 47010   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:28:55,928-Speed 5981.04 samples/sec   Loss 11.9913   LearningRate 0.2953   Epoch: 4   Global Step: 47020   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:29:02,795-Speed 5965.96 samples/sec   Loss 11.9932   LearningRate 0.2952   Epoch: 4   Global Step: 47030   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:29:09,653-Speed 5973.64 samples/sec   Loss 11.8534   LearningRate 0.2952   Epoch: 4   Global Step: 47040   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:29:16,641-Speed 5863.11 samples/sec   Loss 11.9442   LearningRate 0.2952   Epoch: 4   Global Step: 47050   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:29:23,492-Speed 5979.00 samples/sec   Loss 11.9744   LearningRate 0.2951   Epoch: 4   Global Step: 47060   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:29:30,322-Speed 5998.38 samples/sec   Loss 12.0028   LearningRate 0.2951   Epoch: 4   Global Step: 47070   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:29:37,172-Speed 5980.71 samples/sec   Loss 11.9676   LearningRate 0.2951   Epoch: 4   Global Step: 47080   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:29:44,033-Speed 5971.29 samples/sec   Loss 11.8781   LearningRate 0.2950   Epoch: 4   Global Step: 47090   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:29:50,884-Speed 5979.91 samples/sec   Loss 11.9775   LearningRate 0.2950   Epoch: 4   Global Step: 47100   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:29:57,767-Speed 5952.23 samples/sec   Loss 11.9545   LearningRate 0.2949   Epoch: 4   Global Step: 47110   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:30:04,650-Speed 5952.36 samples/sec   Loss 11.9224   LearningRate 0.2949   Epoch: 4   Global Step: 47120   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:30:11,491-Speed 5988.11 samples/sec   Loss 11.9480   LearningRate 0.2949   Epoch: 4   Global Step: 47130   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:30:18,347-Speed 5975.80 samples/sec   Loss 11.9867   LearningRate 0.2948   Epoch: 4   Global Step: 47140   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:30:25,205-Speed 5973.21 samples/sec   Loss 11.9892   LearningRate 0.2948   Epoch: 4   Global Step: 47150   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:30:32,067-Speed 5971.58 samples/sec   Loss 11.9234   LearningRate 0.2948   Epoch: 4   Global Step: 47160   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:30:38,925-Speed 5973.78 samples/sec   Loss 11.9733   LearningRate 0.2947   Epoch: 4   Global Step: 47170   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:30:45,784-Speed 5973.22 samples/sec   Loss 11.9874   LearningRate 0.2947   Epoch: 4   Global Step: 47180   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:30:52,647-Speed 5969.62 samples/sec   Loss 11.8801   LearningRate 0.2947   Epoch: 4   Global Step: 47190   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:30:59,516-Speed 5964.46 samples/sec   Loss 11.9486   LearningRate 0.2946   Epoch: 4   Global Step: 47200   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:31:06,366-Speed 5980.65 samples/sec   Loss 11.9473   LearningRate 0.2946   Epoch: 4   Global Step: 47210   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:31:13,208-Speed 5987.72 samples/sec   Loss 11.8790   LearningRate 0.2945   Epoch: 4   Global Step: 47220   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:31:20,059-Speed 5980.04 samples/sec   Loss 11.9180   LearningRate 0.2945   Epoch: 4   Global Step: 47230   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:31:26,908-Speed 5981.87 samples/sec   Loss 11.8767   LearningRate 0.2945   Epoch: 4   Global Step: 47240   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:31:33,768-Speed 5971.98 samples/sec   Loss 11.8571   LearningRate 0.2944   Epoch: 4   Global Step: 47250   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:31:40,618-Speed 5980.42 samples/sec   Loss 11.9489   LearningRate 0.2944   Epoch: 4   Global Step: 47260   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:31:47,468-Speed 5980.49 samples/sec   Loss 11.8926   LearningRate 0.2944   Epoch: 4   Global Step: 47270   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:31:54,318-Speed 5981.13 samples/sec   Loss 11.9810   LearningRate 0.2943   Epoch: 4   Global Step: 47280   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:32:01,178-Speed 5971.62 samples/sec   Loss 11.9262   LearningRate 0.2943   Epoch: 4   Global Step: 47290   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:32:08,045-Speed 5966.66 samples/sec   Loss 11.8172   LearningRate 0.2942   Epoch: 4   Global Step: 47300   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:32:14,896-Speed 5980.78 samples/sec   Loss 11.8549   LearningRate 0.2942   Epoch: 4   Global Step: 47310   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:32:21,745-Speed 5982.19 samples/sec   Loss 11.8458   LearningRate 0.2942   Epoch: 4   Global Step: 47320   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:32:28,606-Speed 5970.98 samples/sec   Loss 11.8190   LearningRate 0.2941   Epoch: 4   Global Step: 47330   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:32:35,487-Speed 5953.81 samples/sec   Loss 12.0037   LearningRate 0.2941   Epoch: 4   Global Step: 47340   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:32:42,335-Speed 5982.64 samples/sec   Loss 11.8924   LearningRate 0.2941   Epoch: 4   Global Step: 47350   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:32:49,198-Speed 5969.04 samples/sec   Loss 11.9269   LearningRate 0.2940   Epoch: 4   Global Step: 47360   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:32:56,048-Speed 5982.45 samples/sec   Loss 11.8469   LearningRate 0.2940   Epoch: 4   Global Step: 47370   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:33:02,895-Speed 5983.21 samples/sec   Loss 11.9346   LearningRate 0.2940   Epoch: 4   Global Step: 47380   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:33:09,754-Speed 5973.56 samples/sec   Loss 11.8703   LearningRate 0.2939   Epoch: 4   Global Step: 47390   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:33:16,628-Speed 5959.88 samples/sec   Loss 11.9506   LearningRate 0.2939   Epoch: 4   Global Step: 47400   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:33:23,485-Speed 5976.38 samples/sec   Loss 11.9232   LearningRate 0.2938   Epoch: 4   Global Step: 47410   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:33:30,363-Speed 5956.61 samples/sec   Loss 11.9633   LearningRate 0.2938   Epoch: 4   Global Step: 47420   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:33:37,225-Speed 5970.35 samples/sec   Loss 11.9239   LearningRate 0.2938   Epoch: 4   Global Step: 47430   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:33:44,084-Speed 5972.92 samples/sec   Loss 11.9150   LearningRate 0.2937   Epoch: 4   Global Step: 47440   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:33:50,919-Speed 5993.29 samples/sec   Loss 11.8275   LearningRate 0.2937   Epoch: 4   Global Step: 47450   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:33:57,770-Speed 5981.62 samples/sec   Loss 11.8667   LearningRate 0.2937   Epoch: 4   Global Step: 47460   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:34:04,629-Speed 5972.87 samples/sec   Loss 11.8952   LearningRate 0.2936   Epoch: 4   Global Step: 47470   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:34:11,479-Speed 5981.41 samples/sec   Loss 11.8958   LearningRate 0.2936   Epoch: 4   Global Step: 47480   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:34:18,326-Speed 5983.34 samples/sec   Loss 11.8655   LearningRate 0.2936   Epoch: 4   Global Step: 47490   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:34:25,203-Speed 5957.08 samples/sec   Loss 11.8312   LearningRate 0.2935   Epoch: 4   Global Step: 47500   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:34:32,068-Speed 5968.04 samples/sec   Loss 11.7991   LearningRate 0.2935   Epoch: 4   Global Step: 47510   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:34:38,918-Speed 5982.02 samples/sec   Loss 11.9937   LearningRate 0.2934   Epoch: 4   Global Step: 47520   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:34:45,784-Speed 5966.63 samples/sec   Loss 11.8806   LearningRate 0.2934   Epoch: 4   Global Step: 47530   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:34:52,645-Speed 5971.22 samples/sec   Loss 11.8926   LearningRate 0.2934   Epoch: 4   Global Step: 47540   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:34:59,513-Speed 5965.51 samples/sec   Loss 11.8894   LearningRate 0.2933   Epoch: 4   Global Step: 47550   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:35:06,383-Speed 5963.43 samples/sec   Loss 11.9087   LearningRate 0.2933   Epoch: 4   Global Step: 47560   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:35:13,239-Speed 5977.33 samples/sec   Loss 11.9149   LearningRate 0.2933   Epoch: 4   Global Step: 47570   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:35:20,119-Speed 5953.86 samples/sec   Loss 11.8901   LearningRate 0.2932   Epoch: 4   Global Step: 47580   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:35:26,978-Speed 5973.47 samples/sec   Loss 11.8846   LearningRate 0.2932   Epoch: 4   Global Step: 47590   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:35:33,901-Speed 5917.56 samples/sec   Loss 11.8413   LearningRate 0.2931   Epoch: 4   Global Step: 47600   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:35:40,751-Speed 5980.52 samples/sec   Loss 11.8666   LearningRate 0.2931   Epoch: 4   Global Step: 47610   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:35:47,617-Speed 5966.53 samples/sec   Loss 11.9508   LearningRate 0.2931   Epoch: 4   Global Step: 47620   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:35:54,497-Speed 5955.07 samples/sec   Loss 11.8117   LearningRate 0.2930   Epoch: 4   Global Step: 47630   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:36:01,358-Speed 5970.79 samples/sec   Loss 11.9102   LearningRate 0.2930   Epoch: 4   Global Step: 47640   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:36:08,789-Speed 5513.53 samples/sec   Loss 11.9181   LearningRate 0.2930   Epoch: 4   Global Step: 47650   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:36:15,622-Speed 5998.59 samples/sec   Loss 11.9303   LearningRate 0.2929   Epoch: 4   Global Step: 47660   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:36:22,479-Speed 5974.72 samples/sec   Loss 11.9364   LearningRate 0.2929   Epoch: 4   Global Step: 47670   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:36:29,408-Speed 5912.53 samples/sec   Loss 11.8348   LearningRate 0.2929   Epoch: 4   Global Step: 47680   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:36:36,274-Speed 5967.24 samples/sec   Loss 11.7914   LearningRate 0.2928   Epoch: 4   Global Step: 47690   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:36:43,108-Speed 5994.74 samples/sec   Loss 11.8182   LearningRate 0.2928   Epoch: 4   Global Step: 47700   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 05:36:49,946-Speed 5989.99 samples/sec   Loss 11.8064   LearningRate 0.2927   Epoch: 4   Global Step: 47710   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 05:36:56,827-Speed 5954.65 samples/sec   Loss 11.8188   LearningRate 0.2927   Epoch: 4   Global Step: 47720   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 05:37:03,711-Speed 5954.22 samples/sec   Loss 11.8527   LearningRate 0.2927   Epoch: 4   Global Step: 47730   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 05:37:10,565-Speed 5977.19 samples/sec   Loss 12.0054   LearningRate 0.2926   Epoch: 4   Global Step: 47740   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 05:37:17,414-Speed 5980.70 samples/sec   Loss 11.8342   LearningRate 0.2926   Epoch: 4   Global Step: 47750   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 05:37:24,266-Speed 5979.73 samples/sec   Loss 11.9463   LearningRate 0.2926   Epoch: 4   Global Step: 47760   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 05:37:31,129-Speed 5968.60 samples/sec   Loss 11.8320   LearningRate 0.2925   Epoch: 4   Global Step: 47770   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 05:37:37,984-Speed 5976.29 samples/sec   Loss 11.9463   LearningRate 0.2925   Epoch: 4   Global Step: 47780   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 05:37:44,831-Speed 5983.36 samples/sec   Loss 11.8634   LearningRate 0.2925   Epoch: 4   Global Step: 47790   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-01-08 05:37:51,678-Speed 5983.35 samples/sec   Loss 11.8768   LearningRate 0.2924   Epoch: 4   Global Step: 47800   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:37:58,524-Speed 5983.59 samples/sec   Loss 11.9282   LearningRate 0.2924   Epoch: 4   Global Step: 47810   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:38:05,383-Speed 5972.48 samples/sec   Loss 11.9281   LearningRate 0.2923   Epoch: 4   Global Step: 47820   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:38:12,246-Speed 5969.56 samples/sec   Loss 11.8994   LearningRate 0.2923   Epoch: 4   Global Step: 47830   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:38:19,092-Speed 5983.98 samples/sec   Loss 11.8064   LearningRate 0.2923   Epoch: 4   Global Step: 47840   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:38:25,957-Speed 5967.77 samples/sec   Loss 11.8614   LearningRate 0.2922   Epoch: 4   Global Step: 47850   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:38:32,819-Speed 5970.17 samples/sec   Loss 12.0116   LearningRate 0.2922   Epoch: 4   Global Step: 47860   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:38:39,665-Speed 5984.65 samples/sec   Loss 11.9275   LearningRate 0.2922   Epoch: 4   Global Step: 47870   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:38:46,528-Speed 5969.33 samples/sec   Loss 11.8257   LearningRate 0.2921   Epoch: 4   Global Step: 47880   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:38:53,417-Speed 5947.29 samples/sec   Loss 11.8580   LearningRate 0.2921   Epoch: 4   Global Step: 47890   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:39:00,267-Speed 5980.53 samples/sec   Loss 11.8479   LearningRate 0.2920   Epoch: 4   Global Step: 47900   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:39:07,122-Speed 5979.15 samples/sec   Loss 11.8221   LearningRate 0.2920   Epoch: 4   Global Step: 47910   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:39:13,973-Speed 5979.67 samples/sec   Loss 11.8558   LearningRate 0.2920   Epoch: 4   Global Step: 47920   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:39:20,842-Speed 5965.55 samples/sec   Loss 11.9559   LearningRate 0.2919   Epoch: 4   Global Step: 47930   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:39:27,692-Speed 5983.46 samples/sec   Loss 11.8649   LearningRate 0.2919   Epoch: 4   Global Step: 47940   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:39:34,538-Speed 5983.60 samples/sec   Loss 11.9079   LearningRate 0.2919   Epoch: 4   Global Step: 47950   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:39:41,382-Speed 5986.30 samples/sec   Loss 11.8812   LearningRate 0.2918   Epoch: 4   Global Step: 47960   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:39:48,244-Speed 5969.50 samples/sec   Loss 11.8869   LearningRate 0.2918   Epoch: 4   Global Step: 47970   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:39:55,093-Speed 5986.95 samples/sec   Loss 12.0213   LearningRate 0.2918   Epoch: 4   Global Step: 47980   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:40:01,957-Speed 5968.78 samples/sec   Loss 11.8447   LearningRate 0.2917   Epoch: 4   Global Step: 47990   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:40:08,810-Speed 5977.51 samples/sec   Loss 11.7711   LearningRate 0.2917   Epoch: 4   Global Step: 48000   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:40:15,667-Speed 5975.00 samples/sec   Loss 11.8609   LearningRate 0.2916   Epoch: 4   Global Step: 48010   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:40:22,517-Speed 5980.35 samples/sec   Loss 11.7989   LearningRate 0.2916   Epoch: 4   Global Step: 48020   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:40:29,366-Speed 5981.38 samples/sec   Loss 11.9153   LearningRate 0.2916   Epoch: 4   Global Step: 48030   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:40:36,216-Speed 5980.66 samples/sec   Loss 11.8481   LearningRate 0.2915   Epoch: 4   Global Step: 48040   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:40:43,068-Speed 5978.98 samples/sec   Loss 11.8884   LearningRate 0.2915   Epoch: 4   Global Step: 48050   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:40:49,909-Speed 5990.08 samples/sec   Loss 11.9009   LearningRate 0.2915   Epoch: 4   Global Step: 48060   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:40:56,756-Speed 5983.80 samples/sec   Loss 11.9378   LearningRate 0.2914   Epoch: 4   Global Step: 48070   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:41:03,619-Speed 5968.41 samples/sec   Loss 11.8791   LearningRate 0.2914   Epoch: 4   Global Step: 48080   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:41:10,470-Speed 5981.43 samples/sec   Loss 11.7580   LearningRate 0.2914   Epoch: 4   Global Step: 48090   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:41:17,329-Speed 5971.99 samples/sec   Loss 11.8153   LearningRate 0.2913   Epoch: 4   Global Step: 48100   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:41:24,175-Speed 5984.51 samples/sec   Loss 11.9159   LearningRate 0.2913   Epoch: 4   Global Step: 48110   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:41:31,051-Speed 5958.70 samples/sec   Loss 11.8870   LearningRate 0.2912   Epoch: 4   Global Step: 48120   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:41:37,918-Speed 5965.89 samples/sec   Loss 11.8030   LearningRate 0.2912   Epoch: 4   Global Step: 48130   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:41:44,773-Speed 5976.00 samples/sec   Loss 11.8611   LearningRate 0.2912   Epoch: 4   Global Step: 48140   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:41:51,623-Speed 5982.94 samples/sec   Loss 11.7795   LearningRate 0.2911   Epoch: 4   Global Step: 48150   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:41:58,480-Speed 5974.20 samples/sec   Loss 11.7381   LearningRate 0.2911   Epoch: 4   Global Step: 48160   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:42:05,322-Speed 5987.79 samples/sec   Loss 11.9339   LearningRate 0.2911   Epoch: 4   Global Step: 48170   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:42:12,210-Speed 5948.49 samples/sec   Loss 11.8585   LearningRate 0.2910   Epoch: 4   Global Step: 48180   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:42:19,202-Speed 5858.85 samples/sec   Loss 11.8434   LearningRate 0.2910   Epoch: 4   Global Step: 48190   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:42:26,043-Speed 5988.77 samples/sec   Loss 11.8119   LearningRate 0.2909   Epoch: 4   Global Step: 48200   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:42:32,974-Speed 5911.11 samples/sec   Loss 11.7890   LearningRate 0.2909   Epoch: 4   Global Step: 48210   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:42:39,885-Speed 5927.53 samples/sec   Loss 11.8918   LearningRate 0.2909   Epoch: 4   Global Step: 48220   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:42:46,742-Speed 5975.42 samples/sec   Loss 11.8064   LearningRate 0.2908   Epoch: 4   Global Step: 48230   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:42:53,597-Speed 5976.24 samples/sec   Loss 11.9707   LearningRate 0.2908   Epoch: 4   Global Step: 48240   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:43:00,463-Speed 5966.55 samples/sec   Loss 11.8331   LearningRate 0.2908   Epoch: 4   Global Step: 48250   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:43:07,319-Speed 5975.21 samples/sec   Loss 11.8599   LearningRate 0.2907   Epoch: 4   Global Step: 48260   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:43:14,169-Speed 5981.15 samples/sec   Loss 11.8648   LearningRate 0.2907   Epoch: 4   Global Step: 48270   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:43:21,053-Speed 5951.02 samples/sec   Loss 11.8534   LearningRate 0.2907   Epoch: 4   Global Step: 48280   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:43:27,908-Speed 5979.52 samples/sec   Loss 11.9176   LearningRate 0.2906   Epoch: 4   Global Step: 48290   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:43:34,773-Speed 5967.23 samples/sec   Loss 11.8725   LearningRate 0.2906   Epoch: 4   Global Step: 48300   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:43:41,665-Speed 5945.22 samples/sec   Loss 11.7985   LearningRate 0.2905   Epoch: 4   Global Step: 48310   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:43:48,515-Speed 5980.94 samples/sec   Loss 11.9267   LearningRate 0.2905   Epoch: 4   Global Step: 48320   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:43:55,365-Speed 5983.66 samples/sec   Loss 11.8333   LearningRate 0.2905   Epoch: 4   Global Step: 48330   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:44:02,227-Speed 5971.07 samples/sec   Loss 11.7696   LearningRate 0.2904   Epoch: 4   Global Step: 48340   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:44:09,064-Speed 5991.51 samples/sec   Loss 11.8106   LearningRate 0.2904   Epoch: 4   Global Step: 48350   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:44:15,910-Speed 5983.62 samples/sec   Loss 11.7725   LearningRate 0.2904   Epoch: 4   Global Step: 48360   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:44:22,768-Speed 5975.91 samples/sec   Loss 11.8207   LearningRate 0.2903   Epoch: 4   Global Step: 48370   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:44:29,623-Speed 5975.94 samples/sec   Loss 11.7944   LearningRate 0.2903   Epoch: 4   Global Step: 48380   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:44:36,475-Speed 5978.77 samples/sec   Loss 11.8014   LearningRate 0.2903   Epoch: 4   Global Step: 48390   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:44:43,314-Speed 5990.71 samples/sec   Loss 11.7879   LearningRate 0.2902   Epoch: 4   Global Step: 48400   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:44:50,164-Speed 5983.13 samples/sec   Loss 11.8264   LearningRate 0.2902   Epoch: 4   Global Step: 48410   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:44:57,016-Speed 5979.16 samples/sec   Loss 11.7772   LearningRate 0.2901   Epoch: 4   Global Step: 48420   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:45:03,856-Speed 5988.64 samples/sec   Loss 11.8209   LearningRate 0.2901   Epoch: 4   Global Step: 48430   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:45:10,725-Speed 5964.06 samples/sec   Loss 11.8087   LearningRate 0.2901   Epoch: 4   Global Step: 48440   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:45:17,571-Speed 5984.07 samples/sec   Loss 11.8160   LearningRate 0.2900   Epoch: 4   Global Step: 48450   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:45:24,417-Speed 5983.15 samples/sec   Loss 11.8893   LearningRate 0.2900   Epoch: 4   Global Step: 48460   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:45:31,267-Speed 5981.89 samples/sec   Loss 11.8577   LearningRate 0.2900   Epoch: 4   Global Step: 48470   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:45:38,137-Speed 5963.21 samples/sec   Loss 11.9045   LearningRate 0.2899   Epoch: 4   Global Step: 48480   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:45:44,984-Speed 5982.53 samples/sec   Loss 11.8395   LearningRate 0.2899   Epoch: 4   Global Step: 48490   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:45:51,832-Speed 5982.95 samples/sec   Loss 11.7861   LearningRate 0.2899   Epoch: 4   Global Step: 48500   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:45:58,688-Speed 5975.89 samples/sec   Loss 11.8561   LearningRate 0.2898   Epoch: 4   Global Step: 48510   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:46:05,558-Speed 5963.86 samples/sec   Loss 11.8878   LearningRate 0.2898   Epoch: 4   Global Step: 48520   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:46:12,401-Speed 5986.38 samples/sec   Loss 11.8523   LearningRate 0.2897   Epoch: 4   Global Step: 48530   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:46:19,256-Speed 5976.70 samples/sec   Loss 11.8747   LearningRate 0.2897   Epoch: 4   Global Step: 48540   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:46:26,119-Speed 5968.89 samples/sec   Loss 11.7862   LearningRate 0.2897   Epoch: 4   Global Step: 48550   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:46:32,976-Speed 5974.83 samples/sec   Loss 11.8494   LearningRate 0.2896   Epoch: 4   Global Step: 48560   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:46:39,821-Speed 5985.78 samples/sec   Loss 11.8170   LearningRate 0.2896   Epoch: 4   Global Step: 48570   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:46:46,705-Speed 5950.73 samples/sec   Loss 11.8292   LearningRate 0.2896   Epoch: 4   Global Step: 48580   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:46:53,569-Speed 5971.79 samples/sec   Loss 11.8007   LearningRate 0.2895   Epoch: 4   Global Step: 48590   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:47:00,412-Speed 5986.30 samples/sec   Loss 11.7601   LearningRate 0.2895   Epoch: 4   Global Step: 48600   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:47:07,271-Speed 5973.10 samples/sec   Loss 11.8216   LearningRate 0.2895   Epoch: 4   Global Step: 48610   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:47:14,109-Speed 5990.74 samples/sec   Loss 11.8325   LearningRate 0.2894   Epoch: 4   Global Step: 48620   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:47:20,946-Speed 5992.32 samples/sec   Loss 11.7756   LearningRate 0.2894   Epoch: 4   Global Step: 48630   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:47:27,800-Speed 5977.17 samples/sec   Loss 11.7695   LearningRate 0.2893   Epoch: 4   Global Step: 48640   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:47:34,648-Speed 5981.90 samples/sec   Loss 11.7289   LearningRate 0.2893   Epoch: 4   Global Step: 48650   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:47:41,487-Speed 5990.83 samples/sec   Loss 11.7853   LearningRate 0.2893   Epoch: 4   Global Step: 48660   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:47:48,333-Speed 5983.18 samples/sec   Loss 11.7336   LearningRate 0.2892   Epoch: 4   Global Step: 48670   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:47:55,189-Speed 5976.41 samples/sec   Loss 11.8089   LearningRate 0.2892   Epoch: 4   Global Step: 48680   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:48:02,035-Speed 5983.96 samples/sec   Loss 11.8176   LearningRate 0.2892   Epoch: 4   Global Step: 48690   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:48:08,884-Speed 5981.42 samples/sec   Loss 11.7924   LearningRate 0.2891   Epoch: 4   Global Step: 48700   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:48:15,740-Speed 5975.91 samples/sec   Loss 11.8726   LearningRate 0.2891   Epoch: 4   Global Step: 48710   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:48:22,601-Speed 5970.95 samples/sec   Loss 11.7158   LearningRate 0.2891   Epoch: 4   Global Step: 48720   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:48:29,459-Speed 5972.67 samples/sec   Loss 11.7515   LearningRate 0.2890   Epoch: 4   Global Step: 48730   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:48:36,309-Speed 5980.93 samples/sec   Loss 11.8089   LearningRate 0.2890   Epoch: 4   Global Step: 48740   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:48:43,162-Speed 5978.54 samples/sec   Loss 11.8580   LearningRate 0.2889   Epoch: 4   Global Step: 48750   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:48:50,028-Speed 5966.01 samples/sec   Loss 11.7545   LearningRate 0.2889   Epoch: 4   Global Step: 48760   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:48:56,920-Speed 5943.82 samples/sec   Loss 11.7794   LearningRate 0.2889   Epoch: 4   Global Step: 48770   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:49:03,787-Speed 5966.19 samples/sec   Loss 11.7654   LearningRate 0.2888   Epoch: 4   Global Step: 48780   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:49:10,651-Speed 5969.24 samples/sec   Loss 11.7434   LearningRate 0.2888   Epoch: 4   Global Step: 48790   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:49:17,521-Speed 5962.82 samples/sec   Loss 11.7571   LearningRate 0.2888   Epoch: 4   Global Step: 48800   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:49:24,383-Speed 5970.35 samples/sec   Loss 11.7964   LearningRate 0.2887   Epoch: 4   Global Step: 48810   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:49:31,227-Speed 5985.55 samples/sec   Loss 11.8455   LearningRate 0.2887   Epoch: 4   Global Step: 48820   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:49:38,086-Speed 5972.86 samples/sec   Loss 11.8047   LearningRate 0.2887   Epoch: 4   Global Step: 48830   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:49:44,941-Speed 5976.45 samples/sec   Loss 11.9025   LearningRate 0.2886   Epoch: 4   Global Step: 48840   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:49:51,794-Speed 5977.71 samples/sec   Loss 11.7506   LearningRate 0.2886   Epoch: 4   Global Step: 48850   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:49:58,643-Speed 5981.16 samples/sec   Loss 11.8150   LearningRate 0.2885   Epoch: 4   Global Step: 48860   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:50:05,513-Speed 5964.02 samples/sec   Loss 11.8171   LearningRate 0.2885   Epoch: 4   Global Step: 48870   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:50:12,380-Speed 5966.50 samples/sec   Loss 11.7150   LearningRate 0.2885   Epoch: 4   Global Step: 48880   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:50:19,227-Speed 5983.11 samples/sec   Loss 11.7040   LearningRate 0.2884   Epoch: 4   Global Step: 48890   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:50:26,093-Speed 5967.20 samples/sec   Loss 11.7772   LearningRate 0.2884   Epoch: 4   Global Step: 48900   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:50:32,943-Speed 5979.74 samples/sec   Loss 11.7527   LearningRate 0.2884   Epoch: 4   Global Step: 48910   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:50:39,863-Speed 5920.14 samples/sec   Loss 11.7573   LearningRate 0.2883   Epoch: 4   Global Step: 48920   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:50:46,747-Speed 5951.63 samples/sec   Loss 11.8163   LearningRate 0.2883   Epoch: 4   Global Step: 48930   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:50:53,604-Speed 5974.48 samples/sec   Loss 11.8066   LearningRate 0.2883   Epoch: 4   Global Step: 48940   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:51:00,451-Speed 5983.49 samples/sec   Loss 11.8065   LearningRate 0.2882   Epoch: 4   Global Step: 48950   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:51:07,287-Speed 5992.23 samples/sec   Loss 11.7952   LearningRate 0.2882   Epoch: 4   Global Step: 48960   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:51:14,155-Speed 5965.39 samples/sec   Loss 11.7857   LearningRate 0.2881   Epoch: 4   Global Step: 48970   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:51:21,018-Speed 5969.80 samples/sec   Loss 11.8897   LearningRate 0.2881   Epoch: 4   Global Step: 48980   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:51:27,884-Speed 5965.83 samples/sec   Loss 11.8739   LearningRate 0.2881   Epoch: 4   Global Step: 48990   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:51:34,746-Speed 5970.59 samples/sec   Loss 11.7533   LearningRate 0.2880   Epoch: 4   Global Step: 49000   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:51:41,615-Speed 5967.33 samples/sec   Loss 11.8089   LearningRate 0.2880   Epoch: 4   Global Step: 49010   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:51:48,484-Speed 5966.72 samples/sec   Loss 11.7473   LearningRate 0.2880   Epoch: 4   Global Step: 49020   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:51:55,341-Speed 5974.95 samples/sec   Loss 11.7726   LearningRate 0.2879   Epoch: 4   Global Step: 49030   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:52:02,200-Speed 5972.42 samples/sec   Loss 11.8099   LearningRate 0.2879   Epoch: 4   Global Step: 49040   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:52:09,051-Speed 5979.92 samples/sec   Loss 11.7458   LearningRate 0.2879   Epoch: 4   Global Step: 49050   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:52:15,909-Speed 5973.50 samples/sec   Loss 11.7888   LearningRate 0.2878   Epoch: 4   Global Step: 49060   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:52:22,767-Speed 5973.43 samples/sec   Loss 11.7689   LearningRate 0.2878   Epoch: 4   Global Step: 49070   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:52:29,614-Speed 5985.94 samples/sec   Loss 11.7281   LearningRate 0.2877   Epoch: 4   Global Step: 49080   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:52:36,493-Speed 5954.95 samples/sec   Loss 11.7695   LearningRate 0.2877   Epoch: 4   Global Step: 49090   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:52:43,341-Speed 5982.16 samples/sec   Loss 11.7900   LearningRate 0.2877   Epoch: 4   Global Step: 49100   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:52:50,185-Speed 5986.74 samples/sec   Loss 11.7166   LearningRate 0.2876   Epoch: 4   Global Step: 49110   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:52:57,027-Speed 5986.54 samples/sec   Loss 11.8064   LearningRate 0.2876   Epoch: 4   Global Step: 49120   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:53:03,872-Speed 5985.54 samples/sec   Loss 11.7855   LearningRate 0.2876   Epoch: 4   Global Step: 49130   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:53:10,723-Speed 5979.49 samples/sec   Loss 11.7845   LearningRate 0.2875   Epoch: 4   Global Step: 49140   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:53:17,571-Speed 5982.63 samples/sec   Loss 11.8657   LearningRate 0.2875   Epoch: 4   Global Step: 49150   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:53:24,408-Speed 5991.38 samples/sec   Loss 11.7470   LearningRate 0.2875   Epoch: 4   Global Step: 49160   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:53:31,280-Speed 5960.88 samples/sec   Loss 11.7441   LearningRate 0.2874   Epoch: 4   Global Step: 49170   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:53:38,137-Speed 5974.38 samples/sec   Loss 11.7915   LearningRate 0.2874   Epoch: 4   Global Step: 49180   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:53:44,996-Speed 5972.71 samples/sec   Loss 11.7496   LearningRate 0.2873   Epoch: 4   Global Step: 49190   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:53:51,853-Speed 5974.67 samples/sec   Loss 11.7563   LearningRate 0.2873   Epoch: 4   Global Step: 49200   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:53:58,696-Speed 5986.78 samples/sec   Loss 11.7721   LearningRate 0.2873   Epoch: 4   Global Step: 49210   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:54:05,556-Speed 5971.25 samples/sec   Loss 11.7168   LearningRate 0.2872   Epoch: 4   Global Step: 49220   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:54:12,403-Speed 5983.27 samples/sec   Loss 11.7712   LearningRate 0.2872   Epoch: 4   Global Step: 49230   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:54:19,267-Speed 5968.86 samples/sec   Loss 11.7840   LearningRate 0.2872   Epoch: 4   Global Step: 49240   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:54:26,159-Speed 5944.19 samples/sec   Loss 11.7996   LearningRate 0.2871   Epoch: 4   Global Step: 49250   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:54:33,053-Speed 5943.34 samples/sec   Loss 11.7058   LearningRate 0.2871   Epoch: 4   Global Step: 49260   Fp16 Grad Scale: 524288   Required: 31 hours
Training: 2022-01-08 05:54:39,896-Speed 5986.31 samples/sec   Loss 11.7397   LearningRate 0.2871   Epoch: 4   Global Step: 49270   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:54:46,787-Speed 5945.15 samples/sec   Loss 11.7396   LearningRate 0.2870   Epoch: 4   Global Step: 49280   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:54:53,662-Speed 5959.19 samples/sec   Loss 11.6403   LearningRate 0.2870   Epoch: 4   Global Step: 49290   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:55:00,561-Speed 5938.43 samples/sec   Loss 11.7431   LearningRate 0.2869   Epoch: 4   Global Step: 49300   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:55:07,417-Speed 5975.81 samples/sec   Loss 11.7867   LearningRate 0.2869   Epoch: 4   Global Step: 49310   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:55:14,274-Speed 5973.56 samples/sec   Loss 11.7818   LearningRate 0.2869   Epoch: 4   Global Step: 49320   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:55:21,114-Speed 5990.23 samples/sec   Loss 11.7575   LearningRate 0.2868   Epoch: 4   Global Step: 49330   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:55:27,963-Speed 5983.33 samples/sec   Loss 11.8020   LearningRate 0.2868   Epoch: 4   Global Step: 49340   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:55:34,806-Speed 5986.14 samples/sec   Loss 11.7121   LearningRate 0.2868   Epoch: 4   Global Step: 49350   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:55:41,674-Speed 5965.28 samples/sec   Loss 11.7497   LearningRate 0.2867   Epoch: 4   Global Step: 49360   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:55:48,521-Speed 5983.54 samples/sec   Loss 11.7363   LearningRate 0.2867   Epoch: 4   Global Step: 49370   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:55:55,387-Speed 5968.57 samples/sec   Loss 11.7267   LearningRate 0.2867   Epoch: 4   Global Step: 49380   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:56:02,248-Speed 5971.09 samples/sec   Loss 11.6784   LearningRate 0.2866   Epoch: 4   Global Step: 49390   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:56:09,127-Speed 5955.71 samples/sec   Loss 11.8149   LearningRate 0.2866   Epoch: 4   Global Step: 49400   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:56:15,980-Speed 5978.63 samples/sec   Loss 11.7763   LearningRate 0.2865   Epoch: 4   Global Step: 49410   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:56:22,822-Speed 5987.28 samples/sec   Loss 11.7223   LearningRate 0.2865   Epoch: 4   Global Step: 49420   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:56:29,689-Speed 5966.26 samples/sec   Loss 11.7677   LearningRate 0.2865   Epoch: 4   Global Step: 49430   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:56:36,552-Speed 5969.31 samples/sec   Loss 11.7614   LearningRate 0.2864   Epoch: 4   Global Step: 49440   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:56:43,421-Speed 5964.58 samples/sec   Loss 11.7794   LearningRate 0.2864   Epoch: 4   Global Step: 49450   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:56:50,265-Speed 5985.40 samples/sec   Loss 11.7558   LearningRate 0.2864   Epoch: 4   Global Step: 49460   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:56:57,144-Speed 5957.07 samples/sec   Loss 11.7432   LearningRate 0.2863   Epoch: 4   Global Step: 49470   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:57:03,994-Speed 5979.91 samples/sec   Loss 11.7208   LearningRate 0.2863   Epoch: 4   Global Step: 49480   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-01-08 05:57:10,872-Speed 5957.29 samples/sec   Loss 11.7311   LearningRate 0.2863   Epoch: 4   Global Step: 49490   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:57:17,781-Speed 5931.10 samples/sec   Loss 11.7243   LearningRate 0.2862   Epoch: 4   Global Step: 49500   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:57:24,647-Speed 5968.23 samples/sec   Loss 11.7675   LearningRate 0.2862   Epoch: 4   Global Step: 49510   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:57:31,511-Speed 5968.82 samples/sec   Loss 11.7548   LearningRate 0.2861   Epoch: 4   Global Step: 49520   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:57:38,378-Speed 5968.24 samples/sec   Loss 11.7132   LearningRate 0.2861   Epoch: 4   Global Step: 49530   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:57:45,256-Speed 5956.54 samples/sec   Loss 11.7472   LearningRate 0.2861   Epoch: 4   Global Step: 49540   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:57:52,122-Speed 5967.30 samples/sec   Loss 11.7423   LearningRate 0.2860   Epoch: 4   Global Step: 49550   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:57:58,996-Speed 5959.79 samples/sec   Loss 11.8101   LearningRate 0.2860   Epoch: 4   Global Step: 49560   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:58:05,904-Speed 5930.36 samples/sec   Loss 11.7850   LearningRate 0.2860   Epoch: 4   Global Step: 49570   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:58:12,801-Speed 5940.14 samples/sec   Loss 11.8351   LearningRate 0.2859   Epoch: 4   Global Step: 49580   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:58:19,659-Speed 5973.88 samples/sec   Loss 11.7413   LearningRate 0.2859   Epoch: 4   Global Step: 49590   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:58:26,495-Speed 5992.52 samples/sec   Loss 11.7289   LearningRate 0.2859   Epoch: 4   Global Step: 49600   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:58:33,343-Speed 5982.72 samples/sec   Loss 11.6758   LearningRate 0.2858   Epoch: 4   Global Step: 49610   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:58:40,188-Speed 5984.96 samples/sec   Loss 11.7349   LearningRate 0.2858   Epoch: 4   Global Step: 49620   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:58:47,039-Speed 5979.18 samples/sec   Loss 11.7297   LearningRate 0.2857   Epoch: 4   Global Step: 49630   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:58:53,892-Speed 5977.82 samples/sec   Loss 11.7119   LearningRate 0.2857   Epoch: 4   Global Step: 49640   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:59:00,741-Speed 5981.90 samples/sec   Loss 11.6782   LearningRate 0.2857   Epoch: 4   Global Step: 49650   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:59:07,592-Speed 5979.78 samples/sec   Loss 11.7326   LearningRate 0.2856   Epoch: 4   Global Step: 49660   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:59:14,462-Speed 5963.19 samples/sec   Loss 11.7006   LearningRate 0.2856   Epoch: 4   Global Step: 49670   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:59:21,317-Speed 5976.39 samples/sec   Loss 11.8179   LearningRate 0.2856   Epoch: 4   Global Step: 49680   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:59:28,184-Speed 5965.91 samples/sec   Loss 11.7388   LearningRate 0.2855   Epoch: 4   Global Step: 49690   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 05:59:35,042-Speed 5973.35 samples/sec   Loss 11.8249   LearningRate 0.2855   Epoch: 4   Global Step: 49700   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:59:41,902-Speed 5971.96 samples/sec   Loss 11.6844   LearningRate 0.2855   Epoch: 4   Global Step: 49710   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:59:48,778-Speed 5957.54 samples/sec   Loss 11.7831   LearningRate 0.2854   Epoch: 4   Global Step: 49720   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 05:59:55,623-Speed 5985.84 samples/sec   Loss 11.7152   LearningRate 0.2854   Epoch: 4   Global Step: 49730   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:00:02,474-Speed 5980.05 samples/sec   Loss 11.7290   LearningRate 0.2853   Epoch: 4   Global Step: 49740   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:00:09,326-Speed 5978.20 samples/sec   Loss 11.7683   LearningRate 0.2853   Epoch: 4   Global Step: 49750   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:00:16,186-Speed 5972.30 samples/sec   Loss 11.8469   LearningRate 0.2853   Epoch: 4   Global Step: 49760   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:00:23,030-Speed 5986.03 samples/sec   Loss 11.6356   LearningRate 0.2852   Epoch: 4   Global Step: 49770   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:00:29,911-Speed 5954.05 samples/sec   Loss 11.7880   LearningRate 0.2852   Epoch: 4   Global Step: 49780   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:00:36,762-Speed 5979.80 samples/sec   Loss 11.6625   LearningRate 0.2852   Epoch: 4   Global Step: 49790   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:00:43,607-Speed 5985.30 samples/sec   Loss 11.7747   LearningRate 0.2851   Epoch: 4   Global Step: 49800   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:00:50,490-Speed 5951.88 samples/sec   Loss 11.7287   LearningRate 0.2851   Epoch: 4   Global Step: 49810   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:00:57,348-Speed 5973.59 samples/sec   Loss 11.7368   LearningRate 0.2851   Epoch: 4   Global Step: 49820   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:01:04,189-Speed 5988.95 samples/sec   Loss 11.6608   LearningRate 0.2850   Epoch: 4   Global Step: 49830   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:01:11,034-Speed 5984.72 samples/sec   Loss 11.7312   LearningRate 0.2850   Epoch: 4   Global Step: 49840   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:01:17,882-Speed 5984.06 samples/sec   Loss 11.6512   LearningRate 0.2849   Epoch: 4   Global Step: 49850   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:01:24,727-Speed 5984.57 samples/sec   Loss 11.7671   LearningRate 0.2849   Epoch: 4   Global Step: 49860   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:01:31,571-Speed 5986.15 samples/sec   Loss 11.7298   LearningRate 0.2849   Epoch: 4   Global Step: 49870   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:01:38,449-Speed 5956.53 samples/sec   Loss 11.7842   LearningRate 0.2848   Epoch: 4   Global Step: 49880   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:01:45,304-Speed 5976.14 samples/sec   Loss 11.7281   LearningRate 0.2848   Epoch: 4   Global Step: 49890   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:01:52,153-Speed 5980.74 samples/sec   Loss 11.7280   LearningRate 0.2848   Epoch: 4   Global Step: 49900   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:01:59,013-Speed 5974.95 samples/sec   Loss 11.6962   LearningRate 0.2847   Epoch: 4   Global Step: 49910   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:02:05,860-Speed 5982.52 samples/sec   Loss 11.6701   LearningRate 0.2847   Epoch: 4   Global Step: 49920   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:02:12,713-Speed 5977.82 samples/sec   Loss 11.7491   LearningRate 0.2847   Epoch: 4   Global Step: 49930   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:02:19,564-Speed 5983.65 samples/sec   Loss 11.6977   LearningRate 0.2846   Epoch: 4   Global Step: 49940   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:02:26,432-Speed 5965.79 samples/sec   Loss 11.7694   LearningRate 0.2846   Epoch: 4   Global Step: 49950   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:02:33,302-Speed 5963.63 samples/sec   Loss 11.7414   LearningRate 0.2846   Epoch: 4   Global Step: 49960   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:02:40,190-Speed 5947.58 samples/sec   Loss 11.5907   LearningRate 0.2845   Epoch: 4   Global Step: 49970   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:02:47,049-Speed 5972.41 samples/sec   Loss 11.6604   LearningRate 0.2845   Epoch: 4   Global Step: 49980   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:02:53,908-Speed 5972.99 samples/sec   Loss 11.6917   LearningRate 0.2844   Epoch: 4   Global Step: 49990   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:03:00,766-Speed 5974.05 samples/sec   Loss 11.6831   LearningRate 0.2844   Epoch: 4   Global Step: 50000   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:03:27,631-[lfw][50000]XNorm: 22.400308
Training: 2022-01-08 06:03:27,632-[lfw][50000]Accuracy-Flip: 0.99550+-0.00299
Training: 2022-01-08 06:03:27,633-[lfw][50000]Accuracy-Highest: 0.99700
Training: 2022-01-08 06:03:58,769-[cfp_fp][50000]XNorm: 19.297992
Training: 2022-01-08 06:03:58,770-[cfp_fp][50000]Accuracy-Flip: 0.96743+-0.00956
Training: 2022-01-08 06:03:58,771-[cfp_fp][50000]Accuracy-Highest: 0.97057
Training: 2022-01-08 06:04:25,654-[agedb_30][50000]XNorm: 21.656549
Training: 2022-01-08 06:04:25,655-[agedb_30][50000]Accuracy-Flip: 0.96283+-0.00975
Training: 2022-01-08 06:04:25,656-[agedb_30][50000]Accuracy-Highest: 0.96283
Training: 2022-01-08 06:04:32,494-Speed 446.55 samples/sec   Loss 11.6215   LearningRate 0.2844   Epoch: 4   Global Step: 50010   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:04:39,322-Speed 6000.00 samples/sec   Loss 11.6512   LearningRate 0.2843   Epoch: 4   Global Step: 50020   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:04:46,155-Speed 5994.37 samples/sec   Loss 11.7144   LearningRate 0.2843   Epoch: 4   Global Step: 50030   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:04:53,018-Speed 5969.71 samples/sec   Loss 11.6973   LearningRate 0.2843   Epoch: 4   Global Step: 50040   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:04:59,880-Speed 5970.67 samples/sec   Loss 11.7399   LearningRate 0.2842   Epoch: 4   Global Step: 50050   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:05:06,736-Speed 5975.73 samples/sec   Loss 11.6811   LearningRate 0.2842   Epoch: 4   Global Step: 50060   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:05:13,592-Speed 5978.24 samples/sec   Loss 11.7304   LearningRate 0.2842   Epoch: 4   Global Step: 50070   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:05:20,447-Speed 5976.25 samples/sec   Loss 11.6734   LearningRate 0.2841   Epoch: 4   Global Step: 50080   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:05:27,304-Speed 5974.75 samples/sec   Loss 11.7731   LearningRate 0.2841   Epoch: 4   Global Step: 50090   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:05:34,343-Speed 5819.77 samples/sec   Loss 11.7663   LearningRate 0.2840   Epoch: 4   Global Step: 50100   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:05:41,204-Speed 5971.80 samples/sec   Loss 11.6941   LearningRate 0.2840   Epoch: 4   Global Step: 50110   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:05:48,092-Speed 5948.14 samples/sec   Loss 11.6544   LearningRate 0.2840   Epoch: 4   Global Step: 50120   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:05:54,952-Speed 5971.60 samples/sec   Loss 11.6655   LearningRate 0.2839   Epoch: 4   Global Step: 50130   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:06:01,808-Speed 5975.56 samples/sec   Loss 11.6294   LearningRate 0.2839   Epoch: 4   Global Step: 50140   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:06:08,685-Speed 5957.34 samples/sec   Loss 11.6576   LearningRate 0.2839   Epoch: 4   Global Step: 50150   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:06:15,582-Speed 5941.19 samples/sec   Loss 11.6081   LearningRate 0.2838   Epoch: 4   Global Step: 50160   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:06:22,431-Speed 5981.50 samples/sec   Loss 11.6890   LearningRate 0.2838   Epoch: 4   Global Step: 50170   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:06:29,291-Speed 5978.69 samples/sec   Loss 11.6619   LearningRate 0.2838   Epoch: 4   Global Step: 50180   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:06:36,163-Speed 5962.08 samples/sec   Loss 11.6201   LearningRate 0.2837   Epoch: 4   Global Step: 50190   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:06:43,042-Speed 5955.67 samples/sec   Loss 11.8166   LearningRate 0.2837   Epoch: 4   Global Step: 50200   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:06:49,869-Speed 6000.87 samples/sec   Loss 11.6811   LearningRate 0.2836   Epoch: 4   Global Step: 50210   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:06:56,740-Speed 5962.15 samples/sec   Loss 11.6050   LearningRate 0.2836   Epoch: 4   Global Step: 50220   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:07:03,603-Speed 5969.63 samples/sec   Loss 11.6712   LearningRate 0.2836   Epoch: 4   Global Step: 50230   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:07:10,469-Speed 5967.22 samples/sec   Loss 11.7312   LearningRate 0.2835   Epoch: 4   Global Step: 50240   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:07:17,349-Speed 5955.02 samples/sec   Loss 11.6728   LearningRate 0.2835   Epoch: 4   Global Step: 50250   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:07:24,203-Speed 5976.98 samples/sec   Loss 11.6731   LearningRate 0.2835   Epoch: 4   Global Step: 50260   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:07:31,068-Speed 5968.43 samples/sec   Loss 11.6261   LearningRate 0.2834   Epoch: 4   Global Step: 50270   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:07:37,940-Speed 5961.38 samples/sec   Loss 11.6697   LearningRate 0.2834   Epoch: 4   Global Step: 50280   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:07:44,797-Speed 5978.23 samples/sec   Loss 11.6440   LearningRate 0.2834   Epoch: 4   Global Step: 50290   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:07:51,648-Speed 5979.76 samples/sec   Loss 11.7059   LearningRate 0.2833   Epoch: 4   Global Step: 50300   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:07:58,537-Speed 5946.64 samples/sec   Loss 11.6222   LearningRate 0.2833   Epoch: 4   Global Step: 50310   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:08:05,415-Speed 5956.45 samples/sec   Loss 11.6912   LearningRate 0.2833   Epoch: 4   Global Step: 50320   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:08:12,286-Speed 5962.39 samples/sec   Loss 11.6433   LearningRate 0.2832   Epoch: 4   Global Step: 50330   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:08:19,162-Speed 5957.93 samples/sec   Loss 11.6370   LearningRate 0.2832   Epoch: 4   Global Step: 50340   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:08:26,042-Speed 5954.23 samples/sec   Loss 11.6520   LearningRate 0.2831   Epoch: 4   Global Step: 50350   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:08:32,881-Speed 5989.66 samples/sec   Loss 11.7255   LearningRate 0.2831   Epoch: 4   Global Step: 50360   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:08:39,758-Speed 5957.26 samples/sec   Loss 11.6629   LearningRate 0.2831   Epoch: 4   Global Step: 50370   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:08:46,731-Speed 5875.45 samples/sec   Loss 11.7207   LearningRate 0.2830   Epoch: 4   Global Step: 50380   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:08:53,684-Speed 5892.75 samples/sec   Loss 11.6789   LearningRate 0.2830   Epoch: 4   Global Step: 50390   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:09:00,555-Speed 5962.83 samples/sec   Loss 11.6608   LearningRate 0.2830   Epoch: 4   Global Step: 50400   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:09:07,504-Speed 5898.53 samples/sec   Loss 11.6870   LearningRate 0.2829   Epoch: 4   Global Step: 50410   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:09:14,454-Speed 5894.80 samples/sec   Loss 11.6671   LearningRate 0.2829   Epoch: 4   Global Step: 50420   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:09:21,309-Speed 5976.96 samples/sec   Loss 11.6752   LearningRate 0.2829   Epoch: 4   Global Step: 50430   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:09:28,176-Speed 5965.41 samples/sec   Loss 11.5497   LearningRate 0.2828   Epoch: 4   Global Step: 50440   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:09:35,047-Speed 5962.47 samples/sec   Loss 11.6815   LearningRate 0.2828   Epoch: 4   Global Step: 50450   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:09:41,903-Speed 5975.45 samples/sec   Loss 11.6085   LearningRate 0.2827   Epoch: 4   Global Step: 50460   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:09:48,775-Speed 5961.50 samples/sec   Loss 11.6727   LearningRate 0.2827   Epoch: 4   Global Step: 50470   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:09:55,630-Speed 5977.01 samples/sec   Loss 11.7422   LearningRate 0.2827   Epoch: 4   Global Step: 50480   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:10:02,476-Speed 5984.62 samples/sec   Loss 11.6345   LearningRate 0.2826   Epoch: 4   Global Step: 50490   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:10:09,336-Speed 5972.50 samples/sec   Loss 11.6466   LearningRate 0.2826   Epoch: 4   Global Step: 50500   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:10:16,185-Speed 5981.76 samples/sec   Loss 11.6778   LearningRate 0.2826   Epoch: 4   Global Step: 50510   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:10:23,034-Speed 5981.46 samples/sec   Loss 11.6576   LearningRate 0.2825   Epoch: 4   Global Step: 50520   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:10:29,887-Speed 5977.24 samples/sec   Loss 11.6173   LearningRate 0.2825   Epoch: 4   Global Step: 50530   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:10:36,749-Speed 5970.34 samples/sec   Loss 11.6587   LearningRate 0.2825   Epoch: 4   Global Step: 50540   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:10:43,607-Speed 5974.04 samples/sec   Loss 11.6375   LearningRate 0.2824   Epoch: 4   Global Step: 50550   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:10:50,469-Speed 5970.83 samples/sec   Loss 11.6951   LearningRate 0.2824   Epoch: 4   Global Step: 50560   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:10:57,324-Speed 5976.46 samples/sec   Loss 11.6010   LearningRate 0.2824   Epoch: 4   Global Step: 50570   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:11:04,194-Speed 5962.86 samples/sec   Loss 11.6408   LearningRate 0.2823   Epoch: 4   Global Step: 50580   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:11:11,051-Speed 5975.30 samples/sec   Loss 11.6082   LearningRate 0.2823   Epoch: 4   Global Step: 50590   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:11:17,926-Speed 5958.75 samples/sec   Loss 11.6580   LearningRate 0.2822   Epoch: 4   Global Step: 50600   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:11:24,792-Speed 5966.84 samples/sec   Loss 11.6808   LearningRate 0.2822   Epoch: 4   Global Step: 50610   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:11:31,651-Speed 5973.24 samples/sec   Loss 11.6836   LearningRate 0.2822   Epoch: 4   Global Step: 50620   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:11:38,542-Speed 5944.41 samples/sec   Loss 11.6677   LearningRate 0.2821   Epoch: 4   Global Step: 50630   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:11:45,403-Speed 5971.34 samples/sec   Loss 11.6864   LearningRate 0.2821   Epoch: 4   Global Step: 50640   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:11:52,278-Speed 5959.01 samples/sec   Loss 11.6701   LearningRate 0.2821   Epoch: 4   Global Step: 50650   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:11:59,173-Speed 5942.41 samples/sec   Loss 11.6610   LearningRate 0.2820   Epoch: 4   Global Step: 50660   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:12:06,057-Speed 5950.52 samples/sec   Loss 11.6518   LearningRate 0.2820   Epoch: 4   Global Step: 50670   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:12:12,910-Speed 5980.07 samples/sec   Loss 11.6421   LearningRate 0.2820   Epoch: 4   Global Step: 50680   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:12:19,760-Speed 5980.92 samples/sec   Loss 11.6501   LearningRate 0.2819   Epoch: 4   Global Step: 50690   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:12:26,623-Speed 5969.65 samples/sec   Loss 11.6258   LearningRate 0.2819   Epoch: 4   Global Step: 50700   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:12:33,484-Speed 5970.39 samples/sec   Loss 11.7094   LearningRate 0.2818   Epoch: 4   Global Step: 50710   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:12:40,336-Speed 5979.96 samples/sec   Loss 11.6670   LearningRate 0.2818   Epoch: 4   Global Step: 50720   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:12:47,238-Speed 5935.64 samples/sec   Loss 11.5532   LearningRate 0.2818   Epoch: 4   Global Step: 50730   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:12:54,103-Speed 5967.45 samples/sec   Loss 11.6430   LearningRate 0.2817   Epoch: 4   Global Step: 50740   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:13:00,954-Speed 5980.26 samples/sec   Loss 11.6118   LearningRate 0.2817   Epoch: 4   Global Step: 50750   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:13:07,809-Speed 5975.70 samples/sec   Loss 11.5784   LearningRate 0.2817   Epoch: 4   Global Step: 50760   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:13:14,675-Speed 5966.76 samples/sec   Loss 11.6434   LearningRate 0.2816   Epoch: 4   Global Step: 50770   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:13:21,529-Speed 5979.77 samples/sec   Loss 11.6201   LearningRate 0.2816   Epoch: 4   Global Step: 50780   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:13:28,396-Speed 5965.99 samples/sec   Loss 11.6243   LearningRate 0.2816   Epoch: 4   Global Step: 50790   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:13:35,263-Speed 5965.89 samples/sec   Loss 11.7484   LearningRate 0.2815   Epoch: 4   Global Step: 50800   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:13:42,114-Speed 5980.22 samples/sec   Loss 11.6290   LearningRate 0.2815   Epoch: 4   Global Step: 50810   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:13:48,970-Speed 5975.31 samples/sec   Loss 11.7262   LearningRate 0.2815   Epoch: 4   Global Step: 50820   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:13:55,835-Speed 5966.77 samples/sec   Loss 11.7088   LearningRate 0.2814   Epoch: 4   Global Step: 50830   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:14:02,708-Speed 5960.91 samples/sec   Loss 11.5175   LearningRate 0.2814   Epoch: 4   Global Step: 50840   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:14:09,556-Speed 5982.42 samples/sec   Loss 11.6267   LearningRate 0.2813   Epoch: 4   Global Step: 50850   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:14:16,404-Speed 5982.32 samples/sec   Loss 11.5705   LearningRate 0.2813   Epoch: 4   Global Step: 50860   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:14:23,262-Speed 5973.85 samples/sec   Loss 11.7076   LearningRate 0.2813   Epoch: 4   Global Step: 50870   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:14:30,127-Speed 5970.85 samples/sec   Loss 11.6465   LearningRate 0.2812   Epoch: 4   Global Step: 50880   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:14:36,988-Speed 5970.79 samples/sec   Loss 11.5755   LearningRate 0.2812   Epoch: 4   Global Step: 50890   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:14:43,864-Speed 5957.49 samples/sec   Loss 11.6207   LearningRate 0.2812   Epoch: 4   Global Step: 50900   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:14:50,718-Speed 5976.88 samples/sec   Loss 11.6135   LearningRate 0.2811   Epoch: 4   Global Step: 50910   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:14:57,569-Speed 5980.25 samples/sec   Loss 11.5793   LearningRate 0.2811   Epoch: 4   Global Step: 50920   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:15:04,449-Speed 5954.07 samples/sec   Loss 11.5969   LearningRate 0.2811   Epoch: 4   Global Step: 50930   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:15:11,315-Speed 5967.56 samples/sec   Loss 11.5839   LearningRate 0.2810   Epoch: 4   Global Step: 50940   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:15:18,178-Speed 5969.41 samples/sec   Loss 11.5623   LearningRate 0.2810   Epoch: 4   Global Step: 50950   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:15:25,060-Speed 5952.51 samples/sec   Loss 11.5315   LearningRate 0.2809   Epoch: 4   Global Step: 50960   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:15:31,918-Speed 5973.56 samples/sec   Loss 11.6062   LearningRate 0.2809   Epoch: 4   Global Step: 50970   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:15:38,766-Speed 5982.50 samples/sec   Loss 11.6748   LearningRate 0.2809   Epoch: 4   Global Step: 50980   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:15:45,670-Speed 5934.32 samples/sec   Loss 11.6256   LearningRate 0.2808   Epoch: 4   Global Step: 50990   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:15:52,539-Speed 5964.24 samples/sec   Loss 11.6233   LearningRate 0.2808   Epoch: 4   Global Step: 51000   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:15:59,393-Speed 5977.63 samples/sec   Loss 11.6501   LearningRate 0.2808   Epoch: 4   Global Step: 51010   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:16:06,252-Speed 5972.90 samples/sec   Loss 11.6361   LearningRate 0.2807   Epoch: 4   Global Step: 51020   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:16:13,130-Speed 5957.58 samples/sec   Loss 11.6192   LearningRate 0.2807   Epoch: 4   Global Step: 51030   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:16:19,979-Speed 5981.27 samples/sec   Loss 11.6501   LearningRate 0.2807   Epoch: 4   Global Step: 51040   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:16:26,859-Speed 5955.86 samples/sec   Loss 11.6041   LearningRate 0.2806   Epoch: 4   Global Step: 51050   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:16:33,719-Speed 5972.64 samples/sec   Loss 11.6651   LearningRate 0.2806   Epoch: 4   Global Step: 51060   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:16:40,575-Speed 5975.35 samples/sec   Loss 11.6829   LearningRate 0.2806   Epoch: 4   Global Step: 51070   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:16:47,424-Speed 5981.68 samples/sec   Loss 11.6193   LearningRate 0.2805   Epoch: 4   Global Step: 51080   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:16:54,397-Speed 5875.41 samples/sec   Loss 11.5882   LearningRate 0.2805   Epoch: 4   Global Step: 51090   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:17:01,352-Speed 5890.17 samples/sec   Loss 11.6463   LearningRate 0.2804   Epoch: 4   Global Step: 51100   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:17:08,209-Speed 5975.39 samples/sec   Loss 11.6126   LearningRate 0.2804   Epoch: 4   Global Step: 51110   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:17:15,757-Speed 5427.59 samples/sec   Loss 11.5599   LearningRate 0.2804   Epoch: 4   Global Step: 51120   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:17:22,627-Speed 5963.36 samples/sec   Loss 11.5309   LearningRate 0.2803   Epoch: 4   Global Step: 51130   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:17:29,552-Speed 5915.94 samples/sec   Loss 11.6319   LearningRate 0.2803   Epoch: 4   Global Step: 51140   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:17:36,396-Speed 5985.90 samples/sec   Loss 11.6606   LearningRate 0.2803   Epoch: 4   Global Step: 51150   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:17:43,243-Speed 5983.43 samples/sec   Loss 11.6067   LearningRate 0.2802   Epoch: 4   Global Step: 51160   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:17:50,093-Speed 5980.30 samples/sec   Loss 11.6177   LearningRate 0.2802   Epoch: 4   Global Step: 51170   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:17:56,947-Speed 5977.61 samples/sec   Loss 11.6112   LearningRate 0.2802   Epoch: 4   Global Step: 51180   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:18:03,791-Speed 5985.40 samples/sec   Loss 11.6756   LearningRate 0.2801   Epoch: 4   Global Step: 51190   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:18:10,635-Speed 5986.31 samples/sec   Loss 11.6832   LearningRate 0.2801   Epoch: 4   Global Step: 51200   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:18:17,491-Speed 5975.40 samples/sec   Loss 11.5238   LearningRate 0.2801   Epoch: 4   Global Step: 51210   Fp16 Grad Scale: 524288   Required: 31 hours
Training: 2022-01-08 06:18:24,346-Speed 5976.21 samples/sec   Loss 11.5281   LearningRate 0.2800   Epoch: 4   Global Step: 51220   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:18:31,197-Speed 5979.40 samples/sec   Loss 11.6352   LearningRate 0.2800   Epoch: 4   Global Step: 51230   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:18:38,075-Speed 5957.20 samples/sec   Loss 11.6084   LearningRate 0.2799   Epoch: 4   Global Step: 51240   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:18:44,930-Speed 5976.67 samples/sec   Loss 11.6124   LearningRate 0.2799   Epoch: 4   Global Step: 51250   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:18:51,795-Speed 5967.13 samples/sec   Loss 11.6158   LearningRate 0.2799   Epoch: 4   Global Step: 51260   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:18:58,656-Speed 5972.09 samples/sec   Loss 11.6191   LearningRate 0.2798   Epoch: 4   Global Step: 51270   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:19:05,523-Speed 5966.63 samples/sec   Loss 11.6552   LearningRate 0.2798   Epoch: 4   Global Step: 51280   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:19:12,410-Speed 5949.08 samples/sec   Loss 11.6958   LearningRate 0.2798   Epoch: 4   Global Step: 51290   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:19:19,292-Speed 5952.19 samples/sec   Loss 11.5714   LearningRate 0.2797   Epoch: 4   Global Step: 51300   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:19:26,147-Speed 5976.35 samples/sec   Loss 11.6487   LearningRate 0.2797   Epoch: 4   Global Step: 51310   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:19:33,006-Speed 5972.93 samples/sec   Loss 11.5633   LearningRate 0.2797   Epoch: 4   Global Step: 51320   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:19:39,855-Speed 5981.11 samples/sec   Loss 11.5483   LearningRate 0.2796   Epoch: 4   Global Step: 51330   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:19:46,706-Speed 5980.45 samples/sec   Loss 11.5792   LearningRate 0.2796   Epoch: 4   Global Step: 51340   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:19:53,588-Speed 5952.99 samples/sec   Loss 11.5935   LearningRate 0.2795   Epoch: 4   Global Step: 51350   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:20:00,432-Speed 5986.22 samples/sec   Loss 11.5646   LearningRate 0.2795   Epoch: 4   Global Step: 51360   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:20:07,279-Speed 5983.12 samples/sec   Loss 11.5770   LearningRate 0.2795   Epoch: 4   Global Step: 51370   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:20:14,129-Speed 5981.01 samples/sec   Loss 11.5677   LearningRate 0.2794   Epoch: 4   Global Step: 51380   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:20:20,976-Speed 5983.70 samples/sec   Loss 11.6051   LearningRate 0.2794   Epoch: 4   Global Step: 51390   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:20:27,841-Speed 5970.08 samples/sec   Loss 11.5785   LearningRate 0.2794   Epoch: 4   Global Step: 51400   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:20:34,724-Speed 5952.00 samples/sec   Loss 11.6214   LearningRate 0.2793   Epoch: 4   Global Step: 51410   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:20:41,586-Speed 5970.55 samples/sec   Loss 11.5417   LearningRate 0.2793   Epoch: 4   Global Step: 51420   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:20:48,440-Speed 5976.51 samples/sec   Loss 11.5471   LearningRate 0.2793   Epoch: 4   Global Step: 51430   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:20:55,303-Speed 5968.88 samples/sec   Loss 11.5746   LearningRate 0.2792   Epoch: 4   Global Step: 51440   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:21:02,169-Speed 5967.20 samples/sec   Loss 11.6266   LearningRate 0.2792   Epoch: 4   Global Step: 51450   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:21:09,034-Speed 5967.51 samples/sec   Loss 11.5673   LearningRate 0.2792   Epoch: 4   Global Step: 51460   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:21:15,875-Speed 5988.81 samples/sec   Loss 11.6125   LearningRate 0.2791   Epoch: 4   Global Step: 51470   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:21:22,731-Speed 5975.15 samples/sec   Loss 11.5958   LearningRate 0.2791   Epoch: 4   Global Step: 51480   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:21:29,576-Speed 5984.29 samples/sec   Loss 11.5832   LearningRate 0.2790   Epoch: 4   Global Step: 51490   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:21:36,454-Speed 5956.27 samples/sec   Loss 11.5235   LearningRate 0.2790   Epoch: 4   Global Step: 51500   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:21:43,313-Speed 5972.17 samples/sec   Loss 11.5069   LearningRate 0.2790   Epoch: 4   Global Step: 51510   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:21:51,151-Speed 5950.05 samples/sec   Loss 11.5738   LearningRate 0.2789   Epoch: 4   Global Step: 51520   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:21:57,990-Speed 5990.24 samples/sec   Loss 11.5394   LearningRate 0.2789   Epoch: 4   Global Step: 51530   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:22:04,846-Speed 5975.14 samples/sec   Loss 11.5878   LearningRate 0.2789   Epoch: 4   Global Step: 51540   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:22:11,705-Speed 5972.92 samples/sec   Loss 11.5694   LearningRate 0.2788   Epoch: 4   Global Step: 51550   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:22:18,559-Speed 5976.87 samples/sec   Loss 11.5812   LearningRate 0.2788   Epoch: 4   Global Step: 51560   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:22:25,432-Speed 5960.98 samples/sec   Loss 11.6510   LearningRate 0.2788   Epoch: 4   Global Step: 51570   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:22:32,281-Speed 5981.37 samples/sec   Loss 11.6324   LearningRate 0.2787   Epoch: 4   Global Step: 51580   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:22:39,137-Speed 5975.34 samples/sec   Loss 11.5357   LearningRate 0.2787   Epoch: 4   Global Step: 51590   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:22:46,005-Speed 5964.37 samples/sec   Loss 11.5194   LearningRate 0.2787   Epoch: 4   Global Step: 51600   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:22:52,882-Speed 5956.99 samples/sec   Loss 11.5589   LearningRate 0.2786   Epoch: 4   Global Step: 51610   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:22:59,739-Speed 5974.54 samples/sec   Loss 11.5337   LearningRate 0.2786   Epoch: 4   Global Step: 51620   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-01-08 06:23:06,695-Speed 5890.06 samples/sec   Loss 11.5402   LearningRate 0.2785   Epoch: 4   Global Step: 51630   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:23:13,662-Speed 5880.26 samples/sec   Loss 11.5192   LearningRate 0.2785   Epoch: 4   Global Step: 51640   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:23:20,661-Speed 5853.36 samples/sec   Loss 11.6227   LearningRate 0.2785   Epoch: 4   Global Step: 51650   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:23:27,548-Speed 5948.65 samples/sec   Loss 11.5725   LearningRate 0.2784   Epoch: 4   Global Step: 51660   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:23:34,392-Speed 5985.58 samples/sec   Loss 11.6697   LearningRate 0.2784   Epoch: 4   Global Step: 51670   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:23:41,238-Speed 5983.60 samples/sec   Loss 11.6424   LearningRate 0.2784   Epoch: 4   Global Step: 51680   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:23:48,085-Speed 5983.74 samples/sec   Loss 11.5381   LearningRate 0.2783   Epoch: 4   Global Step: 51690   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:23:54,934-Speed 5980.81 samples/sec   Loss 11.6315   LearningRate 0.2783   Epoch: 4   Global Step: 51700   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:24:01,802-Speed 5967.59 samples/sec   Loss 11.5682   LearningRate 0.2783   Epoch: 4   Global Step: 51710   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-01-08 06:24:08,650-Speed 5984.63 samples/sec   Loss 11.5315   LearningRate 0.2782   Epoch: 4   Global Step: 51720   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:24:15,522-Speed 5961.23 samples/sec   Loss 11.5493   LearningRate 0.2782   Epoch: 4   Global Step: 51730   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:24:22,368-Speed 5984.71 samples/sec   Loss 11.4920   LearningRate 0.2782   Epoch: 4   Global Step: 51740   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:24:29,220-Speed 5979.14 samples/sec   Loss 11.5446   LearningRate 0.2781   Epoch: 4   Global Step: 51750   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:24:36,068-Speed 5984.12 samples/sec   Loss 11.6249   LearningRate 0.2781   Epoch: 4   Global Step: 51760   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:24:42,954-Speed 5949.13 samples/sec   Loss 11.5598   LearningRate 0.2780   Epoch: 4   Global Step: 51770   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:24:49,863-Speed 5929.54 samples/sec   Loss 11.5466   LearningRate 0.2780   Epoch: 4   Global Step: 51780   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:24:56,763-Speed 5937.10 samples/sec   Loss 11.5861   LearningRate 0.2780   Epoch: 4   Global Step: 51790   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:25:03,672-Speed 5929.71 samples/sec   Loss 11.5186   LearningRate 0.2779   Epoch: 4   Global Step: 51800   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:25:10,586-Speed 5925.74 samples/sec   Loss 11.5940   LearningRate 0.2779   Epoch: 4   Global Step: 51810   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:25:17,498-Speed 5927.24 samples/sec   Loss 11.6814   LearningRate 0.2779   Epoch: 4   Global Step: 51820   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:25:24,346-Speed 5982.63 samples/sec   Loss 11.5181   LearningRate 0.2778   Epoch: 4   Global Step: 51830   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:25:31,189-Speed 5986.60 samples/sec   Loss 11.5783   LearningRate 0.2778   Epoch: 4   Global Step: 51840   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:25:54,640-Speed 1746.87 samples/sec   Loss 11.6261   LearningRate 0.2778   Epoch: 5   Global Step: 51850   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:26:01,466-Speed 6001.82 samples/sec   Loss 11.5731   LearningRate 0.2777   Epoch: 5   Global Step: 51860   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:26:08,307-Speed 5989.31 samples/sec   Loss 11.4783   LearningRate 0.2777   Epoch: 5   Global Step: 51870   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:26:15,174-Speed 5965.38 samples/sec   Loss 11.5942   LearningRate 0.2777   Epoch: 5   Global Step: 51880   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:26:22,015-Speed 5988.34 samples/sec   Loss 11.5850   LearningRate 0.2776   Epoch: 5   Global Step: 51890   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:26:28,855-Speed 5989.13 samples/sec   Loss 11.5959   LearningRate 0.2776   Epoch: 5   Global Step: 51900   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:26:35,708-Speed 5977.91 samples/sec   Loss 11.5034   LearningRate 0.2775   Epoch: 5   Global Step: 51910   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:26:42,576-Speed 5965.16 samples/sec   Loss 11.4726   LearningRate 0.2775   Epoch: 5   Global Step: 51920   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:26:49,430-Speed 5977.71 samples/sec   Loss 11.5159   LearningRate 0.2775   Epoch: 5   Global Step: 51930   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:26:56,325-Speed 5941.76 samples/sec   Loss 11.4724   LearningRate 0.2774   Epoch: 5   Global Step: 51940   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:27:03,206-Speed 5953.85 samples/sec   Loss 11.5536   LearningRate 0.2774   Epoch: 5   Global Step: 51950   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:27:10,086-Speed 5956.27 samples/sec   Loss 11.5634   LearningRate 0.2774   Epoch: 5   Global Step: 51960   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:27:16,975-Speed 5946.46 samples/sec   Loss 11.5570   LearningRate 0.2773   Epoch: 5   Global Step: 51970   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:27:23,848-Speed 5960.69 samples/sec   Loss 11.6333   LearningRate 0.2773   Epoch: 5   Global Step: 51980   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:27:30,722-Speed 5960.47 samples/sec   Loss 11.5067   LearningRate 0.2773   Epoch: 5   Global Step: 51990   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:27:37,592-Speed 5962.42 samples/sec   Loss 11.6254   LearningRate 0.2772   Epoch: 5   Global Step: 52000   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:27:44,444-Speed 5979.46 samples/sec   Loss 11.5024   LearningRate 0.2772   Epoch: 5   Global Step: 52010   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:27:51,308-Speed 5968.77 samples/sec   Loss 11.5203   LearningRate 0.2772   Epoch: 5   Global Step: 52020   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:27:58,185-Speed 5956.81 samples/sec   Loss 11.4787   LearningRate 0.2771   Epoch: 5   Global Step: 52030   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:28:05,040-Speed 5977.44 samples/sec   Loss 11.4611   LearningRate 0.2771   Epoch: 5   Global Step: 52040   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:28:11,889-Speed 5981.86 samples/sec   Loss 11.5165   LearningRate 0.2770   Epoch: 5   Global Step: 52050   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:28:18,747-Speed 5973.14 samples/sec   Loss 11.4929   LearningRate 0.2770   Epoch: 5   Global Step: 52060   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:28:25,608-Speed 5971.36 samples/sec   Loss 11.5661   LearningRate 0.2770   Epoch: 5   Global Step: 52070   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:28:32,465-Speed 5974.72 samples/sec   Loss 11.5610   LearningRate 0.2769   Epoch: 5   Global Step: 52080   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:28:39,311-Speed 5983.43 samples/sec   Loss 11.5007   LearningRate 0.2769   Epoch: 5   Global Step: 52090   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:28:46,160-Speed 5981.24 samples/sec   Loss 11.4992   LearningRate 0.2769   Epoch: 5   Global Step: 52100   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:28:53,030-Speed 5963.97 samples/sec   Loss 11.4888   LearningRate 0.2768   Epoch: 5   Global Step: 52110   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:28:59,891-Speed 5970.59 samples/sec   Loss 11.4524   LearningRate 0.2768   Epoch: 5   Global Step: 52120   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:29:06,766-Speed 5959.22 samples/sec   Loss 11.4897   LearningRate 0.2768   Epoch: 5   Global Step: 52130   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:29:13,621-Speed 5976.00 samples/sec   Loss 11.5300   LearningRate 0.2767   Epoch: 5   Global Step: 52140   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:29:20,497-Speed 5958.30 samples/sec   Loss 11.5684   LearningRate 0.2767   Epoch: 5   Global Step: 52150   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:29:27,357-Speed 5972.35 samples/sec   Loss 11.4863   LearningRate 0.2767   Epoch: 5   Global Step: 52160   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:29:34,228-Speed 5962.58 samples/sec   Loss 11.5483   LearningRate 0.2766   Epoch: 5   Global Step: 52170   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:29:41,083-Speed 5975.89 samples/sec   Loss 11.6141   LearningRate 0.2766   Epoch: 5   Global Step: 52180   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:29:47,932-Speed 5981.88 samples/sec   Loss 11.5448   LearningRate 0.2765   Epoch: 5   Global Step: 52190   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:29:54,799-Speed 5965.62 samples/sec   Loss 11.5414   LearningRate 0.2765   Epoch: 5   Global Step: 52200   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:30:01,661-Speed 5970.59 samples/sec   Loss 11.3880   LearningRate 0.2765   Epoch: 5   Global Step: 52210   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:30:08,516-Speed 5975.60 samples/sec   Loss 11.4484   LearningRate 0.2764   Epoch: 5   Global Step: 52220   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:30:15,372-Speed 5976.37 samples/sec   Loss 11.4607   LearningRate 0.2764   Epoch: 5   Global Step: 52230   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:30:22,227-Speed 5975.82 samples/sec   Loss 11.5810   LearningRate 0.2764   Epoch: 5   Global Step: 52240   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:30:29,089-Speed 5969.68 samples/sec   Loss 11.5709   LearningRate 0.2763   Epoch: 5   Global Step: 52250   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:30:35,950-Speed 5971.30 samples/sec   Loss 11.5026   LearningRate 0.2763   Epoch: 5   Global Step: 52260   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:30:42,820-Speed 5962.87 samples/sec   Loss 11.5492   LearningRate 0.2763   Epoch: 5   Global Step: 52270   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:30:49,701-Speed 5954.95 samples/sec   Loss 11.5272   LearningRate 0.2762   Epoch: 5   Global Step: 52280   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:30:57,403-Speed 5319.20 samples/sec   Loss 11.5450   LearningRate 0.2762   Epoch: 5   Global Step: 52290   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:31:04,295-Speed 5943.60 samples/sec   Loss 11.4874   LearningRate 0.2762   Epoch: 5   Global Step: 52300   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:31:11,148-Speed 5978.16 samples/sec   Loss 11.5034   LearningRate 0.2761   Epoch: 5   Global Step: 52310   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:31:18,024-Speed 5958.53 samples/sec   Loss 11.5110   LearningRate 0.2761   Epoch: 5   Global Step: 52320   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:31:24,886-Speed 5969.95 samples/sec   Loss 11.6302   LearningRate 0.2760   Epoch: 5   Global Step: 52330   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:31:31,741-Speed 5976.44 samples/sec   Loss 11.4809   LearningRate 0.2760   Epoch: 5   Global Step: 52340   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:31:38,604-Speed 5971.53 samples/sec   Loss 11.5359   LearningRate 0.2760   Epoch: 5   Global Step: 52350   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:31:45,464-Speed 5971.90 samples/sec   Loss 11.5317   LearningRate 0.2759   Epoch: 5   Global Step: 52360   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:31:52,335-Speed 5962.51 samples/sec   Loss 11.4804   LearningRate 0.2759   Epoch: 5   Global Step: 52370   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:31:59,198-Speed 5969.59 samples/sec   Loss 11.4947   LearningRate 0.2759   Epoch: 5   Global Step: 52380   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:32:06,059-Speed 5971.56 samples/sec   Loss 11.4918   LearningRate 0.2758   Epoch: 5   Global Step: 52390   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:32:12,917-Speed 5974.51 samples/sec   Loss 11.5375   LearningRate 0.2758   Epoch: 5   Global Step: 52400   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:32:19,822-Speed 5932.96 samples/sec   Loss 11.4206   LearningRate 0.2758   Epoch: 5   Global Step: 52410   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:32:26,671-Speed 5981.31 samples/sec   Loss 11.4582   LearningRate 0.2757   Epoch: 5   Global Step: 52420   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:32:33,527-Speed 5976.26 samples/sec   Loss 11.5115   LearningRate 0.2757   Epoch: 5   Global Step: 52430   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:32:40,383-Speed 5976.03 samples/sec   Loss 11.5404   LearningRate 0.2757   Epoch: 5   Global Step: 52440   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:32:47,232-Speed 5981.00 samples/sec   Loss 11.5326   LearningRate 0.2756   Epoch: 5   Global Step: 52450   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:32:54,100-Speed 5968.16 samples/sec   Loss 11.5891   LearningRate 0.2756   Epoch: 5   Global Step: 52460   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:33:00,958-Speed 5973.32 samples/sec   Loss 11.5254   LearningRate 0.2755   Epoch: 5   Global Step: 52470   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:33:07,811-Speed 5978.89 samples/sec   Loss 11.5162   LearningRate 0.2755   Epoch: 5   Global Step: 52480   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:33:14,686-Speed 5959.37 samples/sec   Loss 11.5526   LearningRate 0.2755   Epoch: 5   Global Step: 52490   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:33:21,540-Speed 5977.06 samples/sec   Loss 11.4682   LearningRate 0.2754   Epoch: 5   Global Step: 52500   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:33:28,397-Speed 5974.37 samples/sec   Loss 11.4961   LearningRate 0.2754   Epoch: 5   Global Step: 52510   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:33:35,255-Speed 5974.66 samples/sec   Loss 11.3888   LearningRate 0.2754   Epoch: 5   Global Step: 52520   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:33:42,159-Speed 5933.36 samples/sec   Loss 11.4128   LearningRate 0.2753   Epoch: 5   Global Step: 52530   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:33:49,005-Speed 5984.48 samples/sec   Loss 11.4623   LearningRate 0.2753   Epoch: 5   Global Step: 52540   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:33:55,862-Speed 5974.65 samples/sec   Loss 11.5511   LearningRate 0.2753   Epoch: 5   Global Step: 52550   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:34:02,719-Speed 5975.46 samples/sec   Loss 11.5156   LearningRate 0.2752   Epoch: 5   Global Step: 52560   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:34:09,570-Speed 5979.90 samples/sec   Loss 11.4849   LearningRate 0.2752   Epoch: 5   Global Step: 52570   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:34:16,415-Speed 5984.91 samples/sec   Loss 11.4991   LearningRate 0.2752   Epoch: 5   Global Step: 52580   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:34:23,343-Speed 5915.58 samples/sec   Loss 11.4314   LearningRate 0.2751   Epoch: 5   Global Step: 52590   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:34:30,194-Speed 5979.45 samples/sec   Loss 11.4795   LearningRate 0.2751   Epoch: 5   Global Step: 52600   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:34:37,065-Speed 5962.69 samples/sec   Loss 11.4230   LearningRate 0.2751   Epoch: 5   Global Step: 52610   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:34:43,919-Speed 5977.64 samples/sec   Loss 11.5089   LearningRate 0.2750   Epoch: 5   Global Step: 52620   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:34:50,765-Speed 5983.37 samples/sec   Loss 11.5446   LearningRate 0.2750   Epoch: 5   Global Step: 52630   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:34:57,634-Speed 5967.24 samples/sec   Loss 11.4660   LearningRate 0.2749   Epoch: 5   Global Step: 52640   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:35:04,494-Speed 5972.71 samples/sec   Loss 11.4211   LearningRate 0.2749   Epoch: 5   Global Step: 52650   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:35:11,371-Speed 5956.81 samples/sec   Loss 11.5726   LearningRate 0.2749   Epoch: 5   Global Step: 52660   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:35:18,223-Speed 5978.70 samples/sec   Loss 11.5268   LearningRate 0.2748   Epoch: 5   Global Step: 52670   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:35:25,101-Speed 5958.48 samples/sec   Loss 11.5139   LearningRate 0.2748   Epoch: 5   Global Step: 52680   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:35:31,970-Speed 5964.14 samples/sec   Loss 11.4999   LearningRate 0.2748   Epoch: 5   Global Step: 52690   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:35:38,816-Speed 5984.36 samples/sec   Loss 11.4837   LearningRate 0.2747   Epoch: 5   Global Step: 52700   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:35:45,668-Speed 5978.79 samples/sec   Loss 11.5423   LearningRate 0.2747   Epoch: 5   Global Step: 52710   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:35:52,527-Speed 5973.14 samples/sec   Loss 11.4830   LearningRate 0.2747   Epoch: 5   Global Step: 52720   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:35:59,374-Speed 5984.33 samples/sec   Loss 11.4948   LearningRate 0.2746   Epoch: 5   Global Step: 52730   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:36:06,248-Speed 5959.53 samples/sec   Loss 11.3984   LearningRate 0.2746   Epoch: 5   Global Step: 52740   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:36:13,109-Speed 5971.15 samples/sec   Loss 11.4677   LearningRate 0.2746   Epoch: 5   Global Step: 52750   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:36:19,957-Speed 5982.25 samples/sec   Loss 11.4515   LearningRate 0.2745   Epoch: 5   Global Step: 52760   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:36:26,820-Speed 5971.77 samples/sec   Loss 11.4833   LearningRate 0.2745   Epoch: 5   Global Step: 52770   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:36:33,669-Speed 5980.94 samples/sec   Loss 11.4541   LearningRate 0.2744   Epoch: 5   Global Step: 52780   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:36:40,514-Speed 5985.21 samples/sec   Loss 11.3943   LearningRate 0.2744   Epoch: 5   Global Step: 52790   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:36:47,362-Speed 5984.49 samples/sec   Loss 11.4266   LearningRate 0.2744   Epoch: 5   Global Step: 52800   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:36:54,212-Speed 5983.80 samples/sec   Loss 11.4504   LearningRate 0.2743   Epoch: 5   Global Step: 52810   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:37:01,062-Speed 5980.42 samples/sec   Loss 11.4446   LearningRate 0.2743   Epoch: 5   Global Step: 52820   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:37:07,926-Speed 5969.89 samples/sec   Loss 11.4092   LearningRate 0.2743   Epoch: 5   Global Step: 52830   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:37:14,791-Speed 5967.41 samples/sec   Loss 11.4371   LearningRate 0.2742   Epoch: 5   Global Step: 52840   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:37:21,678-Speed 5948.77 samples/sec   Loss 11.4943   LearningRate 0.2742   Epoch: 5   Global Step: 52850   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:37:28,568-Speed 5946.43 samples/sec   Loss 11.4860   LearningRate 0.2742   Epoch: 5   Global Step: 52860   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:37:35,431-Speed 5969.47 samples/sec   Loss 11.5099   LearningRate 0.2741   Epoch: 5   Global Step: 52870   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:37:42,289-Speed 5973.80 samples/sec   Loss 11.5244   LearningRate 0.2741   Epoch: 5   Global Step: 52880   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:37:49,163-Speed 5959.41 samples/sec   Loss 11.4577   LearningRate 0.2741   Epoch: 5   Global Step: 52890   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:37:56,024-Speed 5971.43 samples/sec   Loss 11.4452   LearningRate 0.2740   Epoch: 5   Global Step: 52900   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:38:02,877-Speed 5978.37 samples/sec   Loss 11.4821   LearningRate 0.2740   Epoch: 5   Global Step: 52910   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:38:09,735-Speed 5974.07 samples/sec   Loss 11.5000   LearningRate 0.2740   Epoch: 5   Global Step: 52920   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:38:16,578-Speed 5986.46 samples/sec   Loss 11.4490   LearningRate 0.2739   Epoch: 5   Global Step: 52930   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:38:23,468-Speed 5946.09 samples/sec   Loss 11.4794   LearningRate 0.2739   Epoch: 5   Global Step: 52940   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:38:30,337-Speed 5964.55 samples/sec   Loss 11.4349   LearningRate 0.2738   Epoch: 5   Global Step: 52950   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:38:37,193-Speed 5975.30 samples/sec   Loss 11.4043   LearningRate 0.2738   Epoch: 5   Global Step: 52960   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:38:44,063-Speed 5963.62 samples/sec   Loss 11.4629   LearningRate 0.2738   Epoch: 5   Global Step: 52970   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:38:50,933-Speed 5962.93 samples/sec   Loss 11.4274   LearningRate 0.2737   Epoch: 5   Global Step: 52980   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:39:01,375-Speed 3922.98 samples/sec   Loss 11.4236   LearningRate 0.2737   Epoch: 5   Global Step: 52990   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:39:08,249-Speed 5961.39 samples/sec   Loss 11.5724   LearningRate 0.2737   Epoch: 5   Global Step: 53000   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:39:15,087-Speed 5991.16 samples/sec   Loss 11.4559   LearningRate 0.2736   Epoch: 5   Global Step: 53010   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:39:21,948-Speed 5973.68 samples/sec   Loss 11.4539   LearningRate 0.2736   Epoch: 5   Global Step: 53020   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:39:28,805-Speed 5974.99 samples/sec   Loss 11.4847   LearningRate 0.2736   Epoch: 5   Global Step: 53030   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:39:35,659-Speed 5976.83 samples/sec   Loss 11.4393   LearningRate 0.2735   Epoch: 5   Global Step: 53040   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:39:42,527-Speed 5967.33 samples/sec   Loss 11.4943   LearningRate 0.2735   Epoch: 5   Global Step: 53050   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:39:49,389-Speed 5970.42 samples/sec   Loss 11.5117   LearningRate 0.2735   Epoch: 5   Global Step: 53060   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:39:56,248-Speed 5972.71 samples/sec   Loss 11.4543   LearningRate 0.2734   Epoch: 5   Global Step: 53070   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:40:03,116-Speed 5965.12 samples/sec   Loss 11.4938   LearningRate 0.2734   Epoch: 5   Global Step: 53080   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:40:09,958-Speed 5987.26 samples/sec   Loss 11.4405   LearningRate 0.2733   Epoch: 5   Global Step: 53090   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:40:16,807-Speed 5981.37 samples/sec   Loss 11.4074   LearningRate 0.2733   Epoch: 5   Global Step: 53100   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:40:23,656-Speed 5981.98 samples/sec   Loss 11.4537   LearningRate 0.2733   Epoch: 5   Global Step: 53110   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:40:30,504-Speed 5982.78 samples/sec   Loss 11.4024   LearningRate 0.2732   Epoch: 5   Global Step: 53120   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:40:37,449-Speed 5898.38 samples/sec   Loss 11.4746   LearningRate 0.2732   Epoch: 5   Global Step: 53130   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:40:44,311-Speed 5971.76 samples/sec   Loss 11.4222   LearningRate 0.2732   Epoch: 5   Global Step: 53140   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:40:51,269-Speed 5889.15 samples/sec   Loss 11.3964   LearningRate 0.2731   Epoch: 5   Global Step: 53150   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:40:58,223-Speed 5891.04 samples/sec   Loss 11.4217   LearningRate 0.2731   Epoch: 5   Global Step: 53160   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:41:05,176-Speed 5892.43 samples/sec   Loss 11.4027   LearningRate 0.2731   Epoch: 5   Global Step: 53170   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:41:12,026-Speed 5980.60 samples/sec   Loss 11.5166   LearningRate 0.2730   Epoch: 5   Global Step: 53180   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:41:18,970-Speed 5899.12 samples/sec   Loss 11.4285   LearningRate 0.2730   Epoch: 5   Global Step: 53190   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:41:25,848-Speed 5956.92 samples/sec   Loss 11.4626   LearningRate 0.2730   Epoch: 5   Global Step: 53200   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:41:32,701-Speed 5977.95 samples/sec   Loss 11.4985   LearningRate 0.2729   Epoch: 5   Global Step: 53210   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:41:39,556-Speed 5975.96 samples/sec   Loss 11.4204   LearningRate 0.2729   Epoch: 5   Global Step: 53220   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:41:46,386-Speed 5999.34 samples/sec   Loss 11.4263   LearningRate 0.2729   Epoch: 5   Global Step: 53230   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 06:41:53,258-Speed 5963.56 samples/sec   Loss 11.4999   LearningRate 0.2728   Epoch: 5   Global Step: 53240   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 06:42:00,106-Speed 5981.39 samples/sec   Loss 11.4257   LearningRate 0.2728   Epoch: 5   Global Step: 53250   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 06:42:06,946-Speed 5990.27 samples/sec   Loss 11.4249   LearningRate 0.2727   Epoch: 5   Global Step: 53260   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 06:42:13,828-Speed 5953.03 samples/sec   Loss 11.4548   LearningRate 0.2727   Epoch: 5   Global Step: 53270   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 06:42:20,688-Speed 5971.19 samples/sec   Loss 11.4405   LearningRate 0.2727   Epoch: 5   Global Step: 53280   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 06:42:27,542-Speed 5977.55 samples/sec   Loss 11.4692   LearningRate 0.2726   Epoch: 5   Global Step: 53290   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 06:42:34,981-Speed 5509.63 samples/sec   Loss 11.4277   LearningRate 0.2726   Epoch: 5   Global Step: 53300   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 06:42:41,824-Speed 5986.40 samples/sec   Loss 11.3748   LearningRate 0.2726   Epoch: 5   Global Step: 53310   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 06:42:48,681-Speed 5974.15 samples/sec   Loss 11.4258   LearningRate 0.2725   Epoch: 5   Global Step: 53320   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-01-08 06:42:55,546-Speed 5967.79 samples/sec   Loss 11.4444   LearningRate 0.2725   Epoch: 5   Global Step: 53330   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:43:02,406-Speed 5972.32 samples/sec   Loss 11.5383   LearningRate 0.2725   Epoch: 5   Global Step: 53340   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:43:09,274-Speed 5964.77 samples/sec   Loss 11.4528   LearningRate 0.2724   Epoch: 5   Global Step: 53350   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:43:16,118-Speed 5986.23 samples/sec   Loss 11.4702   LearningRate 0.2724   Epoch: 5   Global Step: 53360   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:43:22,968-Speed 5980.12 samples/sec   Loss 11.3987   LearningRate 0.2724   Epoch: 5   Global Step: 53370   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:43:29,862-Speed 5942.71 samples/sec   Loss 11.3670   LearningRate 0.2723   Epoch: 5   Global Step: 53380   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:43:36,764-Speed 5935.59 samples/sec   Loss 11.4128   LearningRate 0.2723   Epoch: 5   Global Step: 53390   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:43:43,632-Speed 5964.82 samples/sec   Loss 11.4444   LearningRate 0.2723   Epoch: 5   Global Step: 53400   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:43:50,493-Speed 5972.74 samples/sec   Loss 11.4268   LearningRate 0.2722   Epoch: 5   Global Step: 53410   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:43:57,394-Speed 5937.26 samples/sec   Loss 11.4234   LearningRate 0.2722   Epoch: 5   Global Step: 53420   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:44:04,270-Speed 5957.89 samples/sec   Loss 11.4684   LearningRate 0.2721   Epoch: 5   Global Step: 53430   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:44:11,119-Speed 5981.42 samples/sec   Loss 11.5047   LearningRate 0.2721   Epoch: 5   Global Step: 53440   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:44:18,068-Speed 5896.07 samples/sec   Loss 11.4768   LearningRate 0.2721   Epoch: 5   Global Step: 53450   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:44:25,012-Speed 5899.47 samples/sec   Loss 11.3417   LearningRate 0.2720   Epoch: 5   Global Step: 53460   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:44:31,858-Speed 5985.91 samples/sec   Loss 11.4627   LearningRate 0.2720   Epoch: 5   Global Step: 53470   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:44:38,717-Speed 5973.08 samples/sec   Loss 11.4122   LearningRate 0.2720   Epoch: 5   Global Step: 53480   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:44:45,570-Speed 5977.76 samples/sec   Loss 11.4744   LearningRate 0.2719   Epoch: 5   Global Step: 53490   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:44:52,434-Speed 5968.42 samples/sec   Loss 11.3469   LearningRate 0.2719   Epoch: 5   Global Step: 53500   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:44:59,286-Speed 5980.10 samples/sec   Loss 11.4121   LearningRate 0.2719   Epoch: 5   Global Step: 53510   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:45:06,285-Speed 5853.04 samples/sec   Loss 11.3730   LearningRate 0.2718   Epoch: 5   Global Step: 53520   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:45:13,147-Speed 5970.30 samples/sec   Loss 11.4255   LearningRate 0.2718   Epoch: 5   Global Step: 53530   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:45:20,010-Speed 5969.52 samples/sec   Loss 11.3954   LearningRate 0.2718   Epoch: 5   Global Step: 53540   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:45:26,870-Speed 5972.06 samples/sec   Loss 11.3858   LearningRate 0.2717   Epoch: 5   Global Step: 53550   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:45:33,727-Speed 5975.07 samples/sec   Loss 11.4615   LearningRate 0.2717   Epoch: 5   Global Step: 53560   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:45:40,580-Speed 5978.05 samples/sec   Loss 11.4036   LearningRate 0.2717   Epoch: 5   Global Step: 53570   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:45:47,435-Speed 5975.90 samples/sec   Loss 11.3989   LearningRate 0.2716   Epoch: 5   Global Step: 53580   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:45:54,285-Speed 5980.48 samples/sec   Loss 11.3275   LearningRate 0.2716   Epoch: 5   Global Step: 53590   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:46:01,136-Speed 5980.12 samples/sec   Loss 11.4364   LearningRate 0.2715   Epoch: 5   Global Step: 53600   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:46:07,994-Speed 5973.24 samples/sec   Loss 11.3624   LearningRate 0.2715   Epoch: 5   Global Step: 53610   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:46:14,836-Speed 5987.97 samples/sec   Loss 11.3900   LearningRate 0.2715   Epoch: 5   Global Step: 53620   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:46:21,696-Speed 5971.41 samples/sec   Loss 11.3929   LearningRate 0.2714   Epoch: 5   Global Step: 53630   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:46:28,540-Speed 5986.01 samples/sec   Loss 11.4063   LearningRate 0.2714   Epoch: 5   Global Step: 53640   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:46:35,405-Speed 5967.69 samples/sec   Loss 11.4424   LearningRate 0.2714   Epoch: 5   Global Step: 53650   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:46:42,290-Speed 5950.09 samples/sec   Loss 11.4733   LearningRate 0.2713   Epoch: 5   Global Step: 53660   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:46:49,150-Speed 5972.10 samples/sec   Loss 11.4396   LearningRate 0.2713   Epoch: 5   Global Step: 53670   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:46:56,007-Speed 5974.66 samples/sec   Loss 11.4256   LearningRate 0.2713   Epoch: 5   Global Step: 53680   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:47:02,874-Speed 5966.40 samples/sec   Loss 11.3658   LearningRate 0.2712   Epoch: 5   Global Step: 53690   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:47:09,721-Speed 5982.53 samples/sec   Loss 11.3995   LearningRate 0.2712   Epoch: 5   Global Step: 53700   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:47:16,581-Speed 5972.54 samples/sec   Loss 11.4159   LearningRate 0.2712   Epoch: 5   Global Step: 53710   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:47:23,448-Speed 5966.16 samples/sec   Loss 11.3732   LearningRate 0.2711   Epoch: 5   Global Step: 53720   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:47:30,306-Speed 5973.20 samples/sec   Loss 11.4088   LearningRate 0.2711   Epoch: 5   Global Step: 53730   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:47:37,204-Speed 5939.51 samples/sec   Loss 11.3715   LearningRate 0.2711   Epoch: 5   Global Step: 53740   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:47:44,075-Speed 5962.05 samples/sec   Loss 11.4142   LearningRate 0.2710   Epoch: 5   Global Step: 53750   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:47:50,936-Speed 5971.36 samples/sec   Loss 11.3491   LearningRate 0.2710   Epoch: 5   Global Step: 53760   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:47:57,783-Speed 5983.27 samples/sec   Loss 11.3701   LearningRate 0.2709   Epoch: 5   Global Step: 53770   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:48:04,659-Speed 5958.67 samples/sec   Loss 11.4149   LearningRate 0.2709   Epoch: 5   Global Step: 53780   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:48:11,510-Speed 5979.67 samples/sec   Loss 11.4449   LearningRate 0.2709   Epoch: 5   Global Step: 53790   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:48:18,359-Speed 5981.25 samples/sec   Loss 11.4184   LearningRate 0.2708   Epoch: 5   Global Step: 53800   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:48:25,219-Speed 5972.37 samples/sec   Loss 11.4111   LearningRate 0.2708   Epoch: 5   Global Step: 53810   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:48:32,072-Speed 5977.64 samples/sec   Loss 11.3983   LearningRate 0.2708   Epoch: 5   Global Step: 53820   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:48:38,922-Speed 5980.52 samples/sec   Loss 11.3664   LearningRate 0.2707   Epoch: 5   Global Step: 53830   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:48:45,780-Speed 5973.83 samples/sec   Loss 11.4068   LearningRate 0.2707   Epoch: 5   Global Step: 53840   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:48:52,630-Speed 5980.14 samples/sec   Loss 11.4449   LearningRate 0.2707   Epoch: 5   Global Step: 53850   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:48:59,469-Speed 5990.50 samples/sec   Loss 11.4383   LearningRate 0.2706   Epoch: 5   Global Step: 53860   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:49:06,315-Speed 5984.27 samples/sec   Loss 11.3599   LearningRate 0.2706   Epoch: 5   Global Step: 53870   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:49:13,181-Speed 5966.27 samples/sec   Loss 11.3964   LearningRate 0.2706   Epoch: 5   Global Step: 53880   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:49:20,037-Speed 5975.77 samples/sec   Loss 11.3974   LearningRate 0.2705   Epoch: 5   Global Step: 53890   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:49:26,895-Speed 5973.80 samples/sec   Loss 11.3924   LearningRate 0.2705   Epoch: 5   Global Step: 53900   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:49:33,762-Speed 5965.88 samples/sec   Loss 11.3776   LearningRate 0.2705   Epoch: 5   Global Step: 53910   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:49:40,636-Speed 5960.02 samples/sec   Loss 11.3024   LearningRate 0.2704   Epoch: 5   Global Step: 53920   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:49:47,506-Speed 5963.27 samples/sec   Loss 11.4017   LearningRate 0.2704   Epoch: 5   Global Step: 53930   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:49:54,371-Speed 5967.58 samples/sec   Loss 11.4781   LearningRate 0.2703   Epoch: 5   Global Step: 53940   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:50:01,261-Speed 5946.49 samples/sec   Loss 11.3166   LearningRate 0.2703   Epoch: 5   Global Step: 53950   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:50:08,236-Speed 5873.88 samples/sec   Loss 11.3461   LearningRate 0.2703   Epoch: 5   Global Step: 53960   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:50:15,091-Speed 5976.29 samples/sec   Loss 11.4040   LearningRate 0.2702   Epoch: 5   Global Step: 53970   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:50:21,936-Speed 5984.84 samples/sec   Loss 11.3080   LearningRate 0.2702   Epoch: 5   Global Step: 53980   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:50:28,830-Speed 5953.15 samples/sec   Loss 11.3828   LearningRate 0.2702   Epoch: 5   Global Step: 53990   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:50:35,781-Speed 5894.04 samples/sec   Loss 11.2722   LearningRate 0.2701   Epoch: 5   Global Step: 54000   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:50:42,650-Speed 5963.89 samples/sec   Loss 11.4052   LearningRate 0.2701   Epoch: 5   Global Step: 54010   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:50:49,543-Speed 5944.18 samples/sec   Loss 11.4180   LearningRate 0.2701   Epoch: 5   Global Step: 54020   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:50:56,506-Speed 5883.29 samples/sec   Loss 11.3255   LearningRate 0.2700   Epoch: 5   Global Step: 54030   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:51:03,353-Speed 5983.79 samples/sec   Loss 11.3233   LearningRate 0.2700   Epoch: 5   Global Step: 54040   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:51:10,227-Speed 5960.59 samples/sec   Loss 11.3665   LearningRate 0.2700   Epoch: 5   Global Step: 54050   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:51:17,096-Speed 5963.40 samples/sec   Loss 11.5146   LearningRate 0.2699   Epoch: 5   Global Step: 54060   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:51:23,958-Speed 5970.76 samples/sec   Loss 11.4004   LearningRate 0.2699   Epoch: 5   Global Step: 54070   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:51:30,823-Speed 5967.69 samples/sec   Loss 11.2959   LearningRate 0.2699   Epoch: 5   Global Step: 54080   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:51:37,695-Speed 5961.67 samples/sec   Loss 11.3539   LearningRate 0.2698   Epoch: 5   Global Step: 54090   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:51:44,560-Speed 5971.53 samples/sec   Loss 11.4595   LearningRate 0.2698   Epoch: 5   Global Step: 54100   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:51:51,407-Speed 5983.01 samples/sec   Loss 11.3425   LearningRate 0.2697   Epoch: 5   Global Step: 54110   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:51:58,288-Speed 5953.16 samples/sec   Loss 11.4040   LearningRate 0.2697   Epoch: 5   Global Step: 54120   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:52:05,134-Speed 5985.29 samples/sec   Loss 11.3596   LearningRate 0.2697   Epoch: 5   Global Step: 54130   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:52:11,986-Speed 5980.48 samples/sec   Loss 11.3537   LearningRate 0.2696   Epoch: 5   Global Step: 54140   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:52:18,825-Speed 5989.80 samples/sec   Loss 11.4251   LearningRate 0.2696   Epoch: 5   Global Step: 54150   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:52:25,675-Speed 5981.08 samples/sec   Loss 11.3821   LearningRate 0.2696   Epoch: 5   Global Step: 54160   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:52:32,561-Speed 5949.21 samples/sec   Loss 11.3105   LearningRate 0.2695   Epoch: 5   Global Step: 54170   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:52:39,415-Speed 5976.58 samples/sec   Loss 11.3226   LearningRate 0.2695   Epoch: 5   Global Step: 54180   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:52:46,266-Speed 5979.74 samples/sec   Loss 11.3690   LearningRate 0.2695   Epoch: 5   Global Step: 54190   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:52:53,116-Speed 5981.14 samples/sec   Loss 11.3985   LearningRate 0.2694   Epoch: 5   Global Step: 54200   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:52:59,983-Speed 5965.98 samples/sec   Loss 11.2754   LearningRate 0.2694   Epoch: 5   Global Step: 54210   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:53:06,851-Speed 5964.57 samples/sec   Loss 11.4193   LearningRate 0.2694   Epoch: 5   Global Step: 54220   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:53:13,705-Speed 5977.81 samples/sec   Loss 11.3658   LearningRate 0.2693   Epoch: 5   Global Step: 54230   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:53:20,617-Speed 5926.34 samples/sec   Loss 11.3486   LearningRate 0.2693   Epoch: 5   Global Step: 54240   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:53:27,473-Speed 5975.99 samples/sec   Loss 11.4357   LearningRate 0.2693   Epoch: 5   Global Step: 54250   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:53:34,324-Speed 5980.77 samples/sec   Loss 11.3464   LearningRate 0.2692   Epoch: 5   Global Step: 54260   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:53:41,222-Speed 5939.14 samples/sec   Loss 11.4246   LearningRate 0.2692   Epoch: 5   Global Step: 54270   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:53:48,068-Speed 5984.64 samples/sec   Loss 11.3295   LearningRate 0.2691   Epoch: 5   Global Step: 54280   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:53:54,919-Speed 5979.83 samples/sec   Loss 11.3563   LearningRate 0.2691   Epoch: 5   Global Step: 54290   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:54:01,794-Speed 5959.03 samples/sec   Loss 11.3402   LearningRate 0.2691   Epoch: 5   Global Step: 54300   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:54:08,663-Speed 5963.95 samples/sec   Loss 11.3827   LearningRate 0.2690   Epoch: 5   Global Step: 54310   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:54:15,542-Speed 5956.16 samples/sec   Loss 11.3982   LearningRate 0.2690   Epoch: 5   Global Step: 54320   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:54:22,384-Speed 5987.38 samples/sec   Loss 11.3158   LearningRate 0.2690   Epoch: 5   Global Step: 54330   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:54:29,244-Speed 5972.18 samples/sec   Loss 11.2903   LearningRate 0.2689   Epoch: 5   Global Step: 54340   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:54:36,115-Speed 5963.28 samples/sec   Loss 11.4444   LearningRate 0.2689   Epoch: 5   Global Step: 54350   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:54:42,970-Speed 5976.70 samples/sec   Loss 11.3694   LearningRate 0.2689   Epoch: 5   Global Step: 54360   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:54:49,855-Speed 5951.75 samples/sec   Loss 11.3395   LearningRate 0.2688   Epoch: 5   Global Step: 54370   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:54:56,728-Speed 5960.10 samples/sec   Loss 11.2919   LearningRate 0.2688   Epoch: 5   Global Step: 54380   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:55:03,573-Speed 5985.00 samples/sec   Loss 11.3932   LearningRate 0.2688   Epoch: 5   Global Step: 54390   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:55:10,427-Speed 5977.59 samples/sec   Loss 11.2594   LearningRate 0.2687   Epoch: 5   Global Step: 54400   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:55:17,285-Speed 5973.86 samples/sec   Loss 11.4542   LearningRate 0.2687   Epoch: 5   Global Step: 54410   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:55:24,140-Speed 5975.82 samples/sec   Loss 11.3647   LearningRate 0.2687   Epoch: 5   Global Step: 54420   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:55:31,070-Speed 5971.97 samples/sec   Loss 11.3228   LearningRate 0.2686   Epoch: 5   Global Step: 54430   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:55:37,967-Speed 5940.85 samples/sec   Loss 11.4798   LearningRate 0.2686   Epoch: 5   Global Step: 54440   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:55:44,823-Speed 5975.10 samples/sec   Loss 11.3424   LearningRate 0.2686   Epoch: 5   Global Step: 54450   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:55:51,678-Speed 5976.14 samples/sec   Loss 11.3954   LearningRate 0.2685   Epoch: 5   Global Step: 54460   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:55:58,524-Speed 5985.86 samples/sec   Loss 11.3171   LearningRate 0.2685   Epoch: 5   Global Step: 54470   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:56:05,403-Speed 5954.29 samples/sec   Loss 11.3385   LearningRate 0.2684   Epoch: 5   Global Step: 54480   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:56:12,254-Speed 5980.02 samples/sec   Loss 11.3265   LearningRate 0.2684   Epoch: 5   Global Step: 54490   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:56:19,097-Speed 5987.33 samples/sec   Loss 11.2806   LearningRate 0.2684   Epoch: 5   Global Step: 54500   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:56:25,959-Speed 5969.43 samples/sec   Loss 11.2783   LearningRate 0.2683   Epoch: 5   Global Step: 54510   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:56:32,818-Speed 5973.29 samples/sec   Loss 11.3985   LearningRate 0.2683   Epoch: 5   Global Step: 54520   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:56:39,698-Speed 5955.07 samples/sec   Loss 11.3476   LearningRate 0.2683   Epoch: 5   Global Step: 54530   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:56:46,551-Speed 5978.08 samples/sec   Loss 11.3186   LearningRate 0.2682   Epoch: 5   Global Step: 54540   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:56:53,415-Speed 5967.98 samples/sec   Loss 11.3572   LearningRate 0.2682   Epoch: 5   Global Step: 54550   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:57:00,258-Speed 5987.03 samples/sec   Loss 11.4009   LearningRate 0.2682   Epoch: 5   Global Step: 54560   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:57:07,123-Speed 5967.87 samples/sec   Loss 11.2490   LearningRate 0.2681   Epoch: 5   Global Step: 54570   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:57:13,995-Speed 5961.65 samples/sec   Loss 11.3333   LearningRate 0.2681   Epoch: 5   Global Step: 54580   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:57:20,878-Speed 5953.73 samples/sec   Loss 11.2777   LearningRate 0.2681   Epoch: 5   Global Step: 54590   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:57:27,762-Speed 5951.07 samples/sec   Loss 11.3559   LearningRate 0.2680   Epoch: 5   Global Step: 54600   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:57:34,614-Speed 5978.98 samples/sec   Loss 11.4183   LearningRate 0.2680   Epoch: 5   Global Step: 54610   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:57:41,472-Speed 5973.73 samples/sec   Loss 11.2914   LearningRate 0.2680   Epoch: 5   Global Step: 54620   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 06:57:48,318-Speed 5983.95 samples/sec   Loss 11.3474   LearningRate 0.2679   Epoch: 5   Global Step: 54630   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:57:55,176-Speed 5973.57 samples/sec   Loss 11.4114   LearningRate 0.2679   Epoch: 5   Global Step: 54640   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:58:02,040-Speed 5969.28 samples/sec   Loss 11.3737   LearningRate 0.2678   Epoch: 5   Global Step: 54650   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:58:08,915-Speed 5958.87 samples/sec   Loss 11.3172   LearningRate 0.2678   Epoch: 5   Global Step: 54660   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:58:15,772-Speed 5974.46 samples/sec   Loss 11.3154   LearningRate 0.2678   Epoch: 5   Global Step: 54670   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:58:22,631-Speed 5973.27 samples/sec   Loss 11.3862   LearningRate 0.2677   Epoch: 5   Global Step: 54680   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:58:29,482-Speed 5979.06 samples/sec   Loss 11.3681   LearningRate 0.2677   Epoch: 5   Global Step: 54690   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:58:36,346-Speed 5968.63 samples/sec   Loss 11.3612   LearningRate 0.2677   Epoch: 5   Global Step: 54700   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:58:43,227-Speed 5953.98 samples/sec   Loss 11.3320   LearningRate 0.2676   Epoch: 5   Global Step: 54710   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:58:50,071-Speed 5984.94 samples/sec   Loss 11.2942   LearningRate 0.2676   Epoch: 5   Global Step: 54720   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 06:58:56,920-Speed 5981.85 samples/sec   Loss 11.2745   LearningRate 0.2676   Epoch: 5   Global Step: 54730   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:59:03,776-Speed 5975.87 samples/sec   Loss 11.2779   LearningRate 0.2675   Epoch: 5   Global Step: 54740   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:59:10,639-Speed 5968.60 samples/sec   Loss 11.3431   LearningRate 0.2675   Epoch: 5   Global Step: 54750   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:59:17,487-Speed 5982.60 samples/sec   Loss 11.2955   LearningRate 0.2675   Epoch: 5   Global Step: 54760   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:59:24,337-Speed 5981.05 samples/sec   Loss 11.2964   LearningRate 0.2674   Epoch: 5   Global Step: 54770   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:59:31,196-Speed 5973.23 samples/sec   Loss 11.2732   LearningRate 0.2674   Epoch: 5   Global Step: 54780   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:59:38,070-Speed 5960.25 samples/sec   Loss 11.3836   LearningRate 0.2674   Epoch: 5   Global Step: 54790   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:59:44,932-Speed 5969.86 samples/sec   Loss 11.2928   LearningRate 0.2673   Epoch: 5   Global Step: 54800   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:59:51,794-Speed 5969.90 samples/sec   Loss 11.3828   LearningRate 0.2673   Epoch: 5   Global Step: 54810   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 06:59:58,643-Speed 5982.13 samples/sec   Loss 11.3380   LearningRate 0.2673   Epoch: 5   Global Step: 54820   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:00:05,511-Speed 5965.13 samples/sec   Loss 11.3133   LearningRate 0.2672   Epoch: 5   Global Step: 54830   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:00:12,369-Speed 5973.91 samples/sec   Loss 11.3018   LearningRate 0.2672   Epoch: 5   Global Step: 54840   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:00:19,238-Speed 5964.10 samples/sec   Loss 11.3175   LearningRate 0.2671   Epoch: 5   Global Step: 54850   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:00:26,094-Speed 5980.12 samples/sec   Loss 11.3927   LearningRate 0.2671   Epoch: 5   Global Step: 54860   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:00:32,952-Speed 5973.40 samples/sec   Loss 11.2832   LearningRate 0.2671   Epoch: 5   Global Step: 54870   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:00:39,814-Speed 5972.54 samples/sec   Loss 11.3478   LearningRate 0.2670   Epoch: 5   Global Step: 54880   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:00:46,684-Speed 5963.28 samples/sec   Loss 11.2281   LearningRate 0.2670   Epoch: 5   Global Step: 54890   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:00:53,540-Speed 5975.01 samples/sec   Loss 11.3526   LearningRate 0.2670   Epoch: 5   Global Step: 54900   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:01:00,387-Speed 5983.41 samples/sec   Loss 11.3697   LearningRate 0.2669   Epoch: 5   Global Step: 54910   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:01:07,247-Speed 5974.15 samples/sec   Loss 11.3304   LearningRate 0.2669   Epoch: 5   Global Step: 54920   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:01:14,118-Speed 5962.68 samples/sec   Loss 11.2850   LearningRate 0.2669   Epoch: 5   Global Step: 54930   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:01:20,992-Speed 5959.72 samples/sec   Loss 11.3023   LearningRate 0.2668   Epoch: 5   Global Step: 54940   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:01:27,840-Speed 5984.66 samples/sec   Loss 11.3579   LearningRate 0.2668   Epoch: 5   Global Step: 54950   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:01:34,725-Speed 5949.96 samples/sec   Loss 11.4219   LearningRate 0.2668   Epoch: 5   Global Step: 54960   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:01:41,600-Speed 5959.85 samples/sec   Loss 11.3894   LearningRate 0.2667   Epoch: 5   Global Step: 54970   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:01:48,489-Speed 5946.40 samples/sec   Loss 11.2812   LearningRate 0.2667   Epoch: 5   Global Step: 54980   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:01:55,359-Speed 5963.68 samples/sec   Loss 11.2453   LearningRate 0.2667   Epoch: 5   Global Step: 54990   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:02:02,212-Speed 5978.43 samples/sec   Loss 11.2946   LearningRate 0.2666   Epoch: 5   Global Step: 55000   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:02:32,176-[lfw][55000]XNorm: 23.234773
Training: 2022-01-08 07:02:32,177-[lfw][55000]Accuracy-Flip: 0.99667+-0.00289
Training: 2022-01-08 07:02:32,178-[lfw][55000]Accuracy-Highest: 0.99700
Training: 2022-01-08 07:03:02,970-[cfp_fp][55000]XNorm: 20.125612
Training: 2022-01-08 07:03:02,971-[cfp_fp][55000]Accuracy-Flip: 0.97686+-0.00676
Training: 2022-01-08 07:03:02,972-[cfp_fp][55000]Accuracy-Highest: 0.97686
Training: 2022-01-08 07:03:34,405-[agedb_30][55000]XNorm: 22.376593
Training: 2022-01-08 07:03:34,406-[agedb_30][55000]Accuracy-Flip: 0.96150+-0.01050
Training: 2022-01-08 07:03:34,406-[agedb_30][55000]Accuracy-Highest: 0.96283
Training: 2022-01-08 07:03:41,281-Speed 413.45 samples/sec   Loss 11.3682   LearningRate 0.2666   Epoch: 5   Global Step: 55010   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:03:48,130-Speed 5981.99 samples/sec   Loss 11.3465   LearningRate 0.2666   Epoch: 5   Global Step: 55020   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:03:54,961-Speed 5997.58 samples/sec   Loss 11.3370   LearningRate 0.2665   Epoch: 5   Global Step: 55030   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:04:01,812-Speed 5980.32 samples/sec   Loss 11.2941   LearningRate 0.2665   Epoch: 5   Global Step: 55040   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:04:08,664-Speed 5979.25 samples/sec   Loss 11.2405   LearningRate 0.2664   Epoch: 5   Global Step: 55050   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:04:15,531-Speed 5965.39 samples/sec   Loss 11.3544   LearningRate 0.2664   Epoch: 5   Global Step: 55060   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:04:22,389-Speed 5974.19 samples/sec   Loss 11.1859   LearningRate 0.2664   Epoch: 5   Global Step: 55070   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:04:29,255-Speed 5966.66 samples/sec   Loss 11.3670   LearningRate 0.2663   Epoch: 5   Global Step: 55080   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:04:36,123-Speed 5965.50 samples/sec   Loss 11.1874   LearningRate 0.2663   Epoch: 5   Global Step: 55090   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:04:43,013-Speed 5946.14 samples/sec   Loss 11.3795   LearningRate 0.2663   Epoch: 5   Global Step: 55100   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:04:49,881-Speed 5965.34 samples/sec   Loss 11.3481   LearningRate 0.2662   Epoch: 5   Global Step: 55110   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:04:56,767-Speed 5948.92 samples/sec   Loss 11.3349   LearningRate 0.2662   Epoch: 5   Global Step: 55120   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:05:03,625-Speed 5976.88 samples/sec   Loss 11.3137   LearningRate 0.2662   Epoch: 5   Global Step: 55130   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:05:10,489-Speed 5968.54 samples/sec   Loss 11.2717   LearningRate 0.2661   Epoch: 5   Global Step: 55140   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:05:17,361-Speed 5961.89 samples/sec   Loss 11.4158   LearningRate 0.2661   Epoch: 5   Global Step: 55150   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:05:24,212-Speed 5980.41 samples/sec   Loss 11.3715   LearningRate 0.2661   Epoch: 5   Global Step: 55160   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:05:31,072-Speed 5971.92 samples/sec   Loss 11.2714   LearningRate 0.2660   Epoch: 5   Global Step: 55170   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:05:37,920-Speed 5982.46 samples/sec   Loss 11.3868   LearningRate 0.2660   Epoch: 5   Global Step: 55180   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:05:44,825-Speed 5932.90 samples/sec   Loss 11.2595   LearningRate 0.2660   Epoch: 5   Global Step: 55190   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:05:51,697-Speed 5961.19 samples/sec   Loss 11.2131   LearningRate 0.2659   Epoch: 5   Global Step: 55200   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:05:58,562-Speed 5968.00 samples/sec   Loss 11.2974   LearningRate 0.2659   Epoch: 5   Global Step: 55210   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:06:05,436-Speed 5960.31 samples/sec   Loss 11.2015   LearningRate 0.2659   Epoch: 5   Global Step: 55220   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:06:12,295-Speed 5972.60 samples/sec   Loss 11.3090   LearningRate 0.2658   Epoch: 5   Global Step: 55230   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:06:19,155-Speed 5972.44 samples/sec   Loss 11.2947   LearningRate 0.2658   Epoch: 5   Global Step: 55240   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:06:26,042-Speed 5948.10 samples/sec   Loss 11.2690   LearningRate 0.2657   Epoch: 5   Global Step: 55250   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:06:32,900-Speed 5974.03 samples/sec   Loss 11.3149   LearningRate 0.2657   Epoch: 5   Global Step: 55260   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:06:39,760-Speed 5972.34 samples/sec   Loss 11.3140   LearningRate 0.2657   Epoch: 5   Global Step: 55270   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:06:46,608-Speed 5981.96 samples/sec   Loss 11.2784   LearningRate 0.2656   Epoch: 5   Global Step: 55280   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:06:53,455-Speed 5982.84 samples/sec   Loss 11.2905   LearningRate 0.2656   Epoch: 5   Global Step: 55290   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:07:00,315-Speed 5972.29 samples/sec   Loss 11.3780   LearningRate 0.2656   Epoch: 5   Global Step: 55300   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:07:07,188-Speed 5961.21 samples/sec   Loss 11.2025   LearningRate 0.2655   Epoch: 5   Global Step: 55310   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:07:14,042-Speed 5976.74 samples/sec   Loss 11.2624   LearningRate 0.2655   Epoch: 5   Global Step: 55320   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:07:20,910-Speed 5967.40 samples/sec   Loss 11.2660   LearningRate 0.2655   Epoch: 5   Global Step: 55330   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:07:27,785-Speed 5959.03 samples/sec   Loss 11.2534   LearningRate 0.2654   Epoch: 5   Global Step: 55340   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:07:34,689-Speed 5933.62 samples/sec   Loss 11.2732   LearningRate 0.2654   Epoch: 5   Global Step: 55350   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:07:41,536-Speed 5983.39 samples/sec   Loss 11.3006   LearningRate 0.2654   Epoch: 5   Global Step: 55360   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:07:48,410-Speed 5962.15 samples/sec   Loss 11.2798   LearningRate 0.2653   Epoch: 5   Global Step: 55370   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:07:55,265-Speed 5976.58 samples/sec   Loss 11.2999   LearningRate 0.2653   Epoch: 5   Global Step: 55380   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:08:02,210-Speed 5901.31 samples/sec   Loss 11.2755   LearningRate 0.2653   Epoch: 5   Global Step: 55390   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:08:09,064-Speed 5977.41 samples/sec   Loss 11.2481   LearningRate 0.2652   Epoch: 5   Global Step: 55400   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:08:15,956-Speed 5944.10 samples/sec   Loss 11.3533   LearningRate 0.2652   Epoch: 5   Global Step: 55410   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:08:22,835-Speed 5956.06 samples/sec   Loss 11.2378   LearningRate 0.2652   Epoch: 5   Global Step: 55420   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:08:29,702-Speed 5965.44 samples/sec   Loss 11.2548   LearningRate 0.2651   Epoch: 5   Global Step: 55430   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:08:36,595-Speed 5943.61 samples/sec   Loss 11.2941   LearningRate 0.2651   Epoch: 5   Global Step: 55440   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:08:43,469-Speed 5959.94 samples/sec   Loss 11.2498   LearningRate 0.2651   Epoch: 5   Global Step: 55450   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:08:50,339-Speed 5963.80 samples/sec   Loss 11.2776   LearningRate 0.2650   Epoch: 5   Global Step: 55460   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:08:57,206-Speed 5966.27 samples/sec   Loss 11.3533   LearningRate 0.2650   Epoch: 5   Global Step: 55470   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:09:04,068-Speed 5971.91 samples/sec   Loss 11.2544   LearningRate 0.2649   Epoch: 5   Global Step: 55480   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:09:10,957-Speed 5947.88 samples/sec   Loss 11.2965   LearningRate 0.2649   Epoch: 5   Global Step: 55490   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:09:17,823-Speed 5966.67 samples/sec   Loss 11.2688   LearningRate 0.2649   Epoch: 5   Global Step: 55500   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:09:24,689-Speed 5966.34 samples/sec   Loss 11.2649   LearningRate 0.2648   Epoch: 5   Global Step: 55510   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:09:31,568-Speed 5955.59 samples/sec   Loss 11.2140   LearningRate 0.2648   Epoch: 5   Global Step: 55520   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:09:38,430-Speed 5970.19 samples/sec   Loss 11.2686   LearningRate 0.2648   Epoch: 5   Global Step: 55530   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:09:45,323-Speed 5943.17 samples/sec   Loss 11.3260   LearningRate 0.2647   Epoch: 5   Global Step: 55540   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:09:52,194-Speed 5962.38 samples/sec   Loss 11.3566   LearningRate 0.2647   Epoch: 5   Global Step: 55550   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:09:59,065-Speed 5963.17 samples/sec   Loss 11.3220   LearningRate 0.2647   Epoch: 5   Global Step: 55560   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:10:05,935-Speed 5962.97 samples/sec   Loss 11.1931   LearningRate 0.2646   Epoch: 5   Global Step: 55570   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:10:12,805-Speed 5963.85 samples/sec   Loss 11.2436   LearningRate 0.2646   Epoch: 5   Global Step: 55580   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:10:19,707-Speed 5935.54 samples/sec   Loss 11.2994   LearningRate 0.2646   Epoch: 5   Global Step: 55590   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:10:26,718-Speed 5843.92 samples/sec   Loss 11.2869   LearningRate 0.2645   Epoch: 5   Global Step: 55600   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:10:33,572-Speed 5978.53 samples/sec   Loss 11.2377   LearningRate 0.2645   Epoch: 5   Global Step: 55610   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:10:40,461-Speed 5946.74 samples/sec   Loss 11.3155   LearningRate 0.2645   Epoch: 5   Global Step: 55620   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:10:47,338-Speed 5957.32 samples/sec   Loss 11.2371   LearningRate 0.2644   Epoch: 5   Global Step: 55630   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:10:54,194-Speed 5975.82 samples/sec   Loss 11.2732   LearningRate 0.2644   Epoch: 5   Global Step: 55640   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:11:01,065-Speed 5962.57 samples/sec   Loss 11.2463   LearningRate 0.2644   Epoch: 5   Global Step: 55650   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:11:07,935-Speed 5962.66 samples/sec   Loss 11.2890   LearningRate 0.2643   Epoch: 5   Global Step: 55660   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:11:14,792-Speed 5975.27 samples/sec   Loss 11.1755   LearningRate 0.2643   Epoch: 5   Global Step: 55670   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:11:21,649-Speed 5974.03 samples/sec   Loss 11.2854   LearningRate 0.2642   Epoch: 5   Global Step: 55680   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:11:28,503-Speed 5978.06 samples/sec   Loss 11.3145   LearningRate 0.2642   Epoch: 5   Global Step: 55690   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:11:35,375-Speed 5961.16 samples/sec   Loss 11.2413   LearningRate 0.2642   Epoch: 5   Global Step: 55700   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:11:42,268-Speed 5943.34 samples/sec   Loss 11.1962   LearningRate 0.2641   Epoch: 5   Global Step: 55710   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:11:49,144-Speed 5958.98 samples/sec   Loss 11.2400   LearningRate 0.2641   Epoch: 5   Global Step: 55720   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:11:56,006-Speed 5970.33 samples/sec   Loss 11.2156   LearningRate 0.2641   Epoch: 5   Global Step: 55730   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:12:02,976-Speed 5877.63 samples/sec   Loss 11.3133   LearningRate 0.2640   Epoch: 5   Global Step: 55740   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:12:09,844-Speed 5964.93 samples/sec   Loss 11.2011   LearningRate 0.2640   Epoch: 5   Global Step: 55750   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:12:16,695-Speed 5979.67 samples/sec   Loss 11.2513   LearningRate 0.2640   Epoch: 5   Global Step: 55760   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:12:23,574-Speed 5956.12 samples/sec   Loss 11.1813   LearningRate 0.2639   Epoch: 5   Global Step: 55770   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:12:30,428-Speed 5977.08 samples/sec   Loss 11.2444   LearningRate 0.2639   Epoch: 5   Global Step: 55780   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:12:37,282-Speed 5977.25 samples/sec   Loss 11.2316   LearningRate 0.2639   Epoch: 5   Global Step: 55790   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:12:44,156-Speed 5959.49 samples/sec   Loss 11.2493   LearningRate 0.2638   Epoch: 5   Global Step: 55800   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:12:51,012-Speed 5975.21 samples/sec   Loss 11.1891   LearningRate 0.2638   Epoch: 5   Global Step: 55810   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:12:57,885-Speed 5961.72 samples/sec   Loss 11.3172   LearningRate 0.2638   Epoch: 5   Global Step: 55820   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:13:04,746-Speed 5971.18 samples/sec   Loss 11.2683   LearningRate 0.2637   Epoch: 5   Global Step: 55830   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:13:11,597-Speed 5980.17 samples/sec   Loss 11.2734   LearningRate 0.2637   Epoch: 5   Global Step: 55840   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:13:18,454-Speed 5974.15 samples/sec   Loss 11.3040   LearningRate 0.2637   Epoch: 5   Global Step: 55850   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:13:25,312-Speed 5973.98 samples/sec   Loss 11.2346   LearningRate 0.2636   Epoch: 5   Global Step: 55860   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:13:32,173-Speed 5970.53 samples/sec   Loss 11.2489   LearningRate 0.2636   Epoch: 5   Global Step: 55870   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:13:39,054-Speed 5954.17 samples/sec   Loss 11.2300   LearningRate 0.2636   Epoch: 5   Global Step: 55880   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:13:45,935-Speed 5953.85 samples/sec   Loss 11.2527   LearningRate 0.2635   Epoch: 5   Global Step: 55890   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:13:52,788-Speed 5978.73 samples/sec   Loss 11.3270   LearningRate 0.2635   Epoch: 5   Global Step: 55900   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:13:59,641-Speed 5978.43 samples/sec   Loss 11.2888   LearningRate 0.2634   Epoch: 5   Global Step: 55910   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:14:06,494-Speed 5978.24 samples/sec   Loss 11.3083   LearningRate 0.2634   Epoch: 5   Global Step: 55920   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:14:13,380-Speed 5952.74 samples/sec   Loss 11.2782   LearningRate 0.2634   Epoch: 5   Global Step: 55930   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:14:20,271-Speed 5947.16 samples/sec   Loss 11.2032   LearningRate 0.2633   Epoch: 5   Global Step: 55940   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:14:27,140-Speed 5964.34 samples/sec   Loss 11.2626   LearningRate 0.2633   Epoch: 5   Global Step: 55950   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:14:33,987-Speed 5983.13 samples/sec   Loss 11.2855   LearningRate 0.2633   Epoch: 5   Global Step: 55960   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:14:40,846-Speed 5972.81 samples/sec   Loss 11.2210   LearningRate 0.2632   Epoch: 5   Global Step: 55970   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:14:47,720-Speed 5960.15 samples/sec   Loss 11.1144   LearningRate 0.2632   Epoch: 5   Global Step: 55980   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:14:54,584-Speed 5970.07 samples/sec   Loss 11.2511   LearningRate 0.2632   Epoch: 5   Global Step: 55990   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:15:01,431-Speed 5983.56 samples/sec   Loss 11.2357   LearningRate 0.2631   Epoch: 5   Global Step: 56000   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:15:08,314-Speed 5952.60 samples/sec   Loss 11.2155   LearningRate 0.2631   Epoch: 5   Global Step: 56010   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:15:15,170-Speed 5975.42 samples/sec   Loss 11.2207   LearningRate 0.2631   Epoch: 5   Global Step: 56020   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:15:22,049-Speed 5955.96 samples/sec   Loss 11.2095   LearningRate 0.2630   Epoch: 5   Global Step: 56030   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:15:28,900-Speed 5978.96 samples/sec   Loss 11.1279   LearningRate 0.2630   Epoch: 5   Global Step: 56040   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:15:35,745-Speed 5985.21 samples/sec   Loss 11.1886   LearningRate 0.2630   Epoch: 5   Global Step: 56050   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:15:42,592-Speed 5983.65 samples/sec   Loss 11.2398   LearningRate 0.2629   Epoch: 5   Global Step: 56060   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:15:49,465-Speed 5959.95 samples/sec   Loss 11.2409   LearningRate 0.2629   Epoch: 5   Global Step: 56070   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:15:56,399-Speed 5909.10 samples/sec   Loss 11.2019   LearningRate 0.2629   Epoch: 5   Global Step: 56080   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:16:03,334-Speed 5906.86 samples/sec   Loss 11.2150   LearningRate 0.2628   Epoch: 5   Global Step: 56090   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:16:10,285-Speed 5893.50 samples/sec   Loss 11.1812   LearningRate 0.2628   Epoch: 5   Global Step: 56100   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:16:17,184-Speed 5938.28 samples/sec   Loss 11.2175   LearningRate 0.2628   Epoch: 5   Global Step: 56110   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:16:24,121-Speed 5905.79 samples/sec   Loss 11.1985   LearningRate 0.2627   Epoch: 5   Global Step: 56120   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:16:30,991-Speed 5963.35 samples/sec   Loss 11.2451   LearningRate 0.2627   Epoch: 5   Global Step: 56130   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:16:37,839-Speed 5981.65 samples/sec   Loss 11.2638   LearningRate 0.2626   Epoch: 5   Global Step: 56140   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:16:44,698-Speed 5975.43 samples/sec   Loss 11.1873   LearningRate 0.2626   Epoch: 5   Global Step: 56150   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:16:51,546-Speed 5982.08 samples/sec   Loss 11.2722   LearningRate 0.2626   Epoch: 5   Global Step: 56160   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:16:58,417-Speed 5961.83 samples/sec   Loss 11.2465   LearningRate 0.2625   Epoch: 5   Global Step: 56170   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:17:05,271-Speed 5977.09 samples/sec   Loss 11.1509   LearningRate 0.2625   Epoch: 5   Global Step: 56180   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:17:12,127-Speed 5975.80 samples/sec   Loss 11.2253   LearningRate 0.2625   Epoch: 5   Global Step: 56190   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:17:19,000-Speed 5960.23 samples/sec   Loss 11.1712   LearningRate 0.2624   Epoch: 5   Global Step: 56200   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:17:25,849-Speed 5982.22 samples/sec   Loss 11.2300   LearningRate 0.2624   Epoch: 5   Global Step: 56210   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:17:32,707-Speed 5973.40 samples/sec   Loss 11.1865   LearningRate 0.2624   Epoch: 5   Global Step: 56220   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:17:39,564-Speed 5975.78 samples/sec   Loss 11.2988   LearningRate 0.2623   Epoch: 5   Global Step: 56230   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:17:46,414-Speed 5980.73 samples/sec   Loss 11.2335   LearningRate 0.2623   Epoch: 5   Global Step: 56240   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:17:53,290-Speed 5957.87 samples/sec   Loss 11.2314   LearningRate 0.2623   Epoch: 5   Global Step: 56250   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:18:00,137-Speed 5983.42 samples/sec   Loss 11.1939   LearningRate 0.2622   Epoch: 5   Global Step: 56260   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:18:06,985-Speed 5982.16 samples/sec   Loss 11.1826   LearningRate 0.2622   Epoch: 5   Global Step: 56270   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:18:13,833-Speed 5982.02 samples/sec   Loss 11.2296   LearningRate 0.2622   Epoch: 5   Global Step: 56280   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:18:20,728-Speed 5941.85 samples/sec   Loss 11.1156   LearningRate 0.2621   Epoch: 5   Global Step: 56290   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:18:27,589-Speed 5971.21 samples/sec   Loss 11.2010   LearningRate 0.2621   Epoch: 5   Global Step: 56300   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:18:34,452-Speed 5968.83 samples/sec   Loss 11.1554   LearningRate 0.2621   Epoch: 5   Global Step: 56310   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:18:41,308-Speed 5975.89 samples/sec   Loss 11.2020   LearningRate 0.2620   Epoch: 5   Global Step: 56320   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:18:48,176-Speed 5965.08 samples/sec   Loss 11.2208   LearningRate 0.2620   Epoch: 5   Global Step: 56330   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:18:55,072-Speed 5939.95 samples/sec   Loss 11.1764   LearningRate 0.2620   Epoch: 5   Global Step: 56340   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:19:01,940-Speed 5965.01 samples/sec   Loss 11.1523   LearningRate 0.2619   Epoch: 5   Global Step: 56350   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:19:08,819-Speed 5955.73 samples/sec   Loss 11.2187   LearningRate 0.2619   Epoch: 5   Global Step: 56360   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:19:15,672-Speed 5977.91 samples/sec   Loss 11.2089   LearningRate 0.2619   Epoch: 5   Global Step: 56370   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:19:22,552-Speed 5962.33 samples/sec   Loss 11.1389   LearningRate 0.2618   Epoch: 5   Global Step: 56380   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:19:29,421-Speed 5964.39 samples/sec   Loss 11.1998   LearningRate 0.2618   Epoch: 5   Global Step: 56390   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:19:36,320-Speed 5938.21 samples/sec   Loss 11.1104   LearningRate 0.2617   Epoch: 5   Global Step: 56400   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:19:43,194-Speed 5959.71 samples/sec   Loss 11.2275   LearningRate 0.2617   Epoch: 5   Global Step: 56410   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:19:50,054-Speed 5972.11 samples/sec   Loss 11.2906   LearningRate 0.2617   Epoch: 5   Global Step: 56420   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:19:56,923-Speed 5964.43 samples/sec   Loss 11.1708   LearningRate 0.2616   Epoch: 5   Global Step: 56430   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:20:03,774-Speed 5981.04 samples/sec   Loss 11.2191   LearningRate 0.2616   Epoch: 5   Global Step: 56440   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:20:10,640-Speed 5966.60 samples/sec   Loss 11.2279   LearningRate 0.2616   Epoch: 5   Global Step: 56450   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:20:17,516-Speed 5958.03 samples/sec   Loss 11.1249   LearningRate 0.2615   Epoch: 5   Global Step: 56460   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:20:24,426-Speed 5928.93 samples/sec   Loss 11.1703   LearningRate 0.2615   Epoch: 5   Global Step: 56470   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:20:31,332-Speed 5932.32 samples/sec   Loss 11.2674   LearningRate 0.2615   Epoch: 5   Global Step: 56480   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:20:38,211-Speed 5955.40 samples/sec   Loss 11.2668   LearningRate 0.2614   Epoch: 5   Global Step: 56490   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:20:45,187-Speed 5875.15 samples/sec   Loss 11.2227   LearningRate 0.2614   Epoch: 5   Global Step: 56500   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:20:52,049-Speed 5971.14 samples/sec   Loss 11.2073   LearningRate 0.2614   Epoch: 5   Global Step: 56510   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:20:58,915-Speed 5965.88 samples/sec   Loss 11.1899   LearningRate 0.2613   Epoch: 5   Global Step: 56520   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:21:05,784-Speed 5966.88 samples/sec   Loss 11.2031   LearningRate 0.2613   Epoch: 5   Global Step: 56530   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:21:12,637-Speed 5978.84 samples/sec   Loss 11.2031   LearningRate 0.2613   Epoch: 5   Global Step: 56540   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:21:19,485-Speed 5981.79 samples/sec   Loss 11.2058   LearningRate 0.2612   Epoch: 5   Global Step: 56550   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:21:26,327-Speed 5987.62 samples/sec   Loss 11.1969   LearningRate 0.2612   Epoch: 5   Global Step: 56560   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:21:33,176-Speed 5981.49 samples/sec   Loss 11.2458   LearningRate 0.2612   Epoch: 5   Global Step: 56570   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:21:40,025-Speed 5981.20 samples/sec   Loss 11.2224   LearningRate 0.2611   Epoch: 5   Global Step: 56580   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:21:46,895-Speed 5963.28 samples/sec   Loss 11.1606   LearningRate 0.2611   Epoch: 5   Global Step: 56590   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:21:53,746-Speed 5980.41 samples/sec   Loss 11.1983   LearningRate 0.2611   Epoch: 5   Global Step: 56600   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:22:00,619-Speed 5960.59 samples/sec   Loss 11.1825   LearningRate 0.2610   Epoch: 5   Global Step: 56610   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:22:07,479-Speed 5972.07 samples/sec   Loss 11.1591   LearningRate 0.2610   Epoch: 5   Global Step: 56620   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:22:14,328-Speed 5981.36 samples/sec   Loss 11.2154   LearningRate 0.2609   Epoch: 5   Global Step: 56630   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:22:21,189-Speed 5970.78 samples/sec   Loss 11.2107   LearningRate 0.2609   Epoch: 5   Global Step: 56640   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:22:28,038-Speed 5981.50 samples/sec   Loss 11.0809   LearningRate 0.2609   Epoch: 5   Global Step: 56650   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:22:34,895-Speed 5975.02 samples/sec   Loss 11.1928   LearningRate 0.2608   Epoch: 5   Global Step: 56660   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:22:41,739-Speed 5985.70 samples/sec   Loss 11.2299   LearningRate 0.2608   Epoch: 5   Global Step: 56670   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:22:48,591-Speed 5979.46 samples/sec   Loss 11.1396   LearningRate 0.2608   Epoch: 5   Global Step: 56680   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:22:55,447-Speed 5977.86 samples/sec   Loss 11.0586   LearningRate 0.2607   Epoch: 5   Global Step: 56690   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:23:02,310-Speed 5969.07 samples/sec   Loss 11.2740   LearningRate 0.2607   Epoch: 5   Global Step: 56700   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:23:09,164-Speed 5977.03 samples/sec   Loss 11.1954   LearningRate 0.2607   Epoch: 5   Global Step: 56710   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:23:16,032-Speed 5969.37 samples/sec   Loss 11.2045   LearningRate 0.2606   Epoch: 5   Global Step: 56720   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:23:22,893-Speed 5970.52 samples/sec   Loss 11.2336   LearningRate 0.2606   Epoch: 5   Global Step: 56730   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:23:29,733-Speed 5989.80 samples/sec   Loss 11.0642   LearningRate 0.2606   Epoch: 5   Global Step: 56740   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-01-08 07:23:36,599-Speed 5967.33 samples/sec   Loss 11.0882   LearningRate 0.2605   Epoch: 5   Global Step: 56750   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:23:43,445-Speed 5983.57 samples/sec   Loss 11.1613   LearningRate 0.2605   Epoch: 5   Global Step: 56760   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:23:50,299-Speed 5977.16 samples/sec   Loss 11.0792   LearningRate 0.2605   Epoch: 5   Global Step: 56770   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:23:57,174-Speed 5959.66 samples/sec   Loss 11.2475   LearningRate 0.2604   Epoch: 5   Global Step: 56780   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:24:04,014-Speed 5989.47 samples/sec   Loss 11.2009   LearningRate 0.2604   Epoch: 5   Global Step: 56790   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:24:10,899-Speed 5950.98 samples/sec   Loss 11.1939   LearningRate 0.2604   Epoch: 5   Global Step: 56800   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:24:17,788-Speed 5951.35 samples/sec   Loss 11.2414   LearningRate 0.2603   Epoch: 5   Global Step: 56810   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:24:24,639-Speed 5979.71 samples/sec   Loss 11.1258   LearningRate 0.2603   Epoch: 5   Global Step: 56820   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:24:31,483-Speed 5985.67 samples/sec   Loss 11.2144   LearningRate 0.2603   Epoch: 5   Global Step: 56830   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:24:38,358-Speed 5959.13 samples/sec   Loss 11.0928   LearningRate 0.2602   Epoch: 5   Global Step: 56840   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-01-08 07:24:45,206-Speed 5982.78 samples/sec   Loss 11.1562   LearningRate 0.2602   Epoch: 5   Global Step: 56850   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:24:52,052-Speed 5983.43 samples/sec   Loss 11.1492   LearningRate 0.2602   Epoch: 5   Global Step: 56860   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:24:58,929-Speed 5983.41 samples/sec   Loss 11.1047   LearningRate 0.2601   Epoch: 5   Global Step: 56870   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:25:05,797-Speed 5964.21 samples/sec   Loss 11.1577   LearningRate 0.2601   Epoch: 5   Global Step: 56880   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-01-08 07:25:12,669-Speed 5962.20 samples/sec   Loss 11.2167   LearningRate 0.2600   Epoch: 5   Global Step: 56890   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:25:19,525-Speed 5975.20 samples/sec   Loss 11.1963   LearningRate 0.2600   Epoch: 5   Global Step: 56900   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:25:26,389-Speed 5968.33 samples/sec   Loss 11.2066   LearningRate 0.2600   Epoch: 5   Global Step: 56910   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:25:33,236-Speed 5983.67 samples/sec   Loss 11.1415   LearningRate 0.2599   Epoch: 5   Global Step: 56920   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:25:40,080-Speed 5986.09 samples/sec   Loss 11.2373   LearningRate 0.2599   Epoch: 5   Global Step: 56930   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:25:46,934-Speed 5976.31 samples/sec   Loss 11.1345   LearningRate 0.2599   Epoch: 5   Global Step: 56940   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:25:53,780-Speed 5984.50 samples/sec   Loss 11.1296   LearningRate 0.2598   Epoch: 5   Global Step: 56950   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:26:00,629-Speed 5981.02 samples/sec   Loss 11.1936   LearningRate 0.2598   Epoch: 5   Global Step: 56960   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:26:07,473-Speed 5985.41 samples/sec   Loss 11.1373   LearningRate 0.2598   Epoch: 5   Global Step: 56970   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:26:14,341-Speed 5965.40 samples/sec   Loss 11.2421   LearningRate 0.2597   Epoch: 5   Global Step: 56980   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:26:21,196-Speed 5976.15 samples/sec   Loss 11.1271   LearningRate 0.2597   Epoch: 5   Global Step: 56990   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:26:28,060-Speed 5971.05 samples/sec   Loss 11.1361   LearningRate 0.2597   Epoch: 5   Global Step: 57000   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:26:34,924-Speed 5968.22 samples/sec   Loss 11.0740   LearningRate 0.2596   Epoch: 5   Global Step: 57010   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:26:41,789-Speed 5968.12 samples/sec   Loss 11.1932   LearningRate 0.2596   Epoch: 5   Global Step: 57020   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:26:48,633-Speed 5985.37 samples/sec   Loss 11.1386   LearningRate 0.2596   Epoch: 5   Global Step: 57030   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:26:55,550-Speed 5922.64 samples/sec   Loss 11.1205   LearningRate 0.2595   Epoch: 5   Global Step: 57040   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:27:02,422-Speed 5961.91 samples/sec   Loss 11.1001   LearningRate 0.2595   Epoch: 5   Global Step: 57050   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:27:09,302-Speed 5953.74 samples/sec   Loss 11.1075   LearningRate 0.2595   Epoch: 5   Global Step: 57060   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:27:16,165-Speed 5970.24 samples/sec   Loss 11.1885   LearningRate 0.2594   Epoch: 5   Global Step: 57070   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:27:23,012-Speed 5983.08 samples/sec   Loss 11.0905   LearningRate 0.2594   Epoch: 5   Global Step: 57080   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:27:29,882-Speed 5963.30 samples/sec   Loss 11.1587   LearningRate 0.2594   Epoch: 5   Global Step: 57090   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:27:36,758-Speed 5958.12 samples/sec   Loss 11.2329   LearningRate 0.2593   Epoch: 5   Global Step: 57100   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:27:43,645-Speed 5948.76 samples/sec   Loss 11.0608   LearningRate 0.2593   Epoch: 5   Global Step: 57110   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:27:50,492-Speed 5983.10 samples/sec   Loss 11.1414   LearningRate 0.2593   Epoch: 5   Global Step: 57120   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:27:57,352-Speed 5972.07 samples/sec   Loss 11.1557   LearningRate 0.2592   Epoch: 5   Global Step: 57130   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:28:04,205-Speed 5978.01 samples/sec   Loss 11.0733   LearningRate 0.2592   Epoch: 5   Global Step: 57140   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:28:11,075-Speed 5963.80 samples/sec   Loss 11.0784   LearningRate 0.2592   Epoch: 5   Global Step: 57150   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:28:17,949-Speed 5959.26 samples/sec   Loss 11.1448   LearningRate 0.2591   Epoch: 5   Global Step: 57160   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:28:24,810-Speed 5971.50 samples/sec   Loss 11.1127   LearningRate 0.2591   Epoch: 5   Global Step: 57170   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:28:31,661-Speed 5980.31 samples/sec   Loss 11.1240   LearningRate 0.2590   Epoch: 5   Global Step: 57180   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:28:38,514-Speed 5977.55 samples/sec   Loss 11.2201   LearningRate 0.2590   Epoch: 5   Global Step: 57190   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:28:45,379-Speed 5967.96 samples/sec   Loss 11.1560   LearningRate 0.2590   Epoch: 5   Global Step: 57200   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:28:52,232-Speed 5977.79 samples/sec   Loss 11.1579   LearningRate 0.2589   Epoch: 5   Global Step: 57210   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:28:59,076-Speed 5985.55 samples/sec   Loss 11.1047   LearningRate 0.2589   Epoch: 5   Global Step: 57220   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:29:05,951-Speed 5959.64 samples/sec   Loss 11.1447   LearningRate 0.2589   Epoch: 5   Global Step: 57230   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:29:12,801-Speed 5979.78 samples/sec   Loss 11.0994   LearningRate 0.2588   Epoch: 5   Global Step: 57240   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:29:19,657-Speed 5975.81 samples/sec   Loss 11.1860   LearningRate 0.2588   Epoch: 5   Global Step: 57250   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:29:26,534-Speed 5957.26 samples/sec   Loss 11.2132   LearningRate 0.2588   Epoch: 5   Global Step: 57260   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:29:33,405-Speed 5961.82 samples/sec   Loss 11.1179   LearningRate 0.2587   Epoch: 5   Global Step: 57270   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:29:40,285-Speed 5956.86 samples/sec   Loss 11.1156   LearningRate 0.2587   Epoch: 5   Global Step: 57280   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:29:47,134-Speed 5982.49 samples/sec   Loss 11.1894   LearningRate 0.2587   Epoch: 5   Global Step: 57290   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:29:53,989-Speed 5975.10 samples/sec   Loss 11.1284   LearningRate 0.2586   Epoch: 5   Global Step: 57300   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:30:00,841-Speed 5979.80 samples/sec   Loss 11.1461   LearningRate 0.2586   Epoch: 5   Global Step: 57310   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:30:07,691-Speed 5980.36 samples/sec   Loss 11.1647   LearningRate 0.2586   Epoch: 5   Global Step: 57320   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:30:14,544-Speed 5977.80 samples/sec   Loss 11.1459   LearningRate 0.2585   Epoch: 5   Global Step: 57330   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:30:21,418-Speed 5959.83 samples/sec   Loss 11.0691   LearningRate 0.2585   Epoch: 5   Global Step: 57340   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:30:28,296-Speed 5956.78 samples/sec   Loss 11.0994   LearningRate 0.2585   Epoch: 5   Global Step: 57350   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:30:35,144-Speed 5982.08 samples/sec   Loss 11.1196   LearningRate 0.2584   Epoch: 5   Global Step: 57360   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:30:41,985-Speed 5988.01 samples/sec   Loss 11.1049   LearningRate 0.2584   Epoch: 5   Global Step: 57370   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:30:48,843-Speed 5974.27 samples/sec   Loss 11.0957   LearningRate 0.2584   Epoch: 5   Global Step: 57380   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:30:55,687-Speed 5985.35 samples/sec   Loss 11.1965   LearningRate 0.2583   Epoch: 5   Global Step: 57390   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:31:02,559-Speed 5961.62 samples/sec   Loss 11.1426   LearningRate 0.2583   Epoch: 5   Global Step: 57400   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:31:09,417-Speed 5973.99 samples/sec   Loss 11.2073   LearningRate 0.2583   Epoch: 5   Global Step: 57410   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:31:16,262-Speed 5984.24 samples/sec   Loss 11.1915   LearningRate 0.2582   Epoch: 5   Global Step: 57420   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:31:23,107-Speed 5985.61 samples/sec   Loss 11.0933   LearningRate 0.2582   Epoch: 5   Global Step: 57430   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:31:29,952-Speed 5985.05 samples/sec   Loss 11.1224   LearningRate 0.2582   Epoch: 5   Global Step: 57440   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:31:36,846-Speed 5941.89 samples/sec   Loss 11.1737   LearningRate 0.2581   Epoch: 5   Global Step: 57450   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:31:43,694-Speed 5982.49 samples/sec   Loss 11.1387   LearningRate 0.2581   Epoch: 5   Global Step: 57460   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:31:50,539-Speed 5984.61 samples/sec   Loss 11.1369   LearningRate 0.2580   Epoch: 5   Global Step: 57470   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:31:57,476-Speed 5905.47 samples/sec   Loss 11.2035   LearningRate 0.2580   Epoch: 5   Global Step: 57480   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:32:04,449-Speed 5874.88 samples/sec   Loss 11.1196   LearningRate 0.2580   Epoch: 5   Global Step: 57490   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:32:11,303-Speed 5977.33 samples/sec   Loss 11.1174   LearningRate 0.2579   Epoch: 5   Global Step: 57500   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:32:18,250-Speed 5897.02 samples/sec   Loss 11.1657   LearningRate 0.2579   Epoch: 5   Global Step: 57510   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:32:25,127-Speed 5957.43 samples/sec   Loss 11.1426   LearningRate 0.2579   Epoch: 5   Global Step: 57520   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:32:31,982-Speed 5976.38 samples/sec   Loss 11.2112   LearningRate 0.2578   Epoch: 5   Global Step: 57530   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:32:38,840-Speed 5974.20 samples/sec   Loss 11.1323   LearningRate 0.2578   Epoch: 5   Global Step: 57540   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:32:45,694-Speed 5976.85 samples/sec   Loss 11.0882   LearningRate 0.2578   Epoch: 5   Global Step: 57550   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:32:52,559-Speed 5967.43 samples/sec   Loss 11.0962   LearningRate 0.2577   Epoch: 5   Global Step: 57560   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:32:59,418-Speed 5973.16 samples/sec   Loss 11.1366   LearningRate 0.2577   Epoch: 5   Global Step: 57570   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:33:06,279-Speed 5971.36 samples/sec   Loss 11.1911   LearningRate 0.2577   Epoch: 5   Global Step: 57580   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:33:13,156-Speed 5956.77 samples/sec   Loss 11.0539   LearningRate 0.2576   Epoch: 5   Global Step: 57590   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:33:20,030-Speed 5960.48 samples/sec   Loss 11.1174   LearningRate 0.2576   Epoch: 5   Global Step: 57600   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:33:26,872-Speed 5986.98 samples/sec   Loss 11.1796   LearningRate 0.2576   Epoch: 5   Global Step: 57610   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:33:33,746-Speed 5959.81 samples/sec   Loss 11.0866   LearningRate 0.2575   Epoch: 5   Global Step: 57620   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:33:40,595-Speed 5981.28 samples/sec   Loss 11.1026   LearningRate 0.2575   Epoch: 5   Global Step: 57630   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:33:47,442-Speed 5983.86 samples/sec   Loss 11.0983   LearningRate 0.2575   Epoch: 5   Global Step: 57640   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:33:54,294-Speed 5979.20 samples/sec   Loss 11.1819   LearningRate 0.2574   Epoch: 5   Global Step: 57650   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:34:01,145-Speed 5979.14 samples/sec   Loss 11.1339   LearningRate 0.2574   Epoch: 5   Global Step: 57660   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:34:08,008-Speed 5969.61 samples/sec   Loss 11.0390   LearningRate 0.2574   Epoch: 5   Global Step: 57670   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:34:14,858-Speed 5980.97 samples/sec   Loss 11.0895   LearningRate 0.2573   Epoch: 5   Global Step: 57680   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:34:21,726-Speed 5964.44 samples/sec   Loss 11.1013   LearningRate 0.2573   Epoch: 5   Global Step: 57690   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:34:28,584-Speed 5974.32 samples/sec   Loss 11.1065   LearningRate 0.2573   Epoch: 5   Global Step: 57700   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:34:35,430-Speed 5984.33 samples/sec   Loss 11.1088   LearningRate 0.2572   Epoch: 5   Global Step: 57710   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:34:42,265-Speed 5993.13 samples/sec   Loss 11.1229   LearningRate 0.2572   Epoch: 5   Global Step: 57720   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:34:49,109-Speed 5985.74 samples/sec   Loss 11.0316   LearningRate 0.2572   Epoch: 5   Global Step: 57730   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:34:55,965-Speed 5975.47 samples/sec   Loss 11.0862   LearningRate 0.2571   Epoch: 5   Global Step: 57740   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:35:02,818-Speed 5977.46 samples/sec   Loss 11.1019   LearningRate 0.2571   Epoch: 5   Global Step: 57750   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:35:09,696-Speed 5956.97 samples/sec   Loss 11.1071   LearningRate 0.2571   Epoch: 5   Global Step: 57760   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:35:16,545-Speed 5981.64 samples/sec   Loss 11.1159   LearningRate 0.2570   Epoch: 5   Global Step: 57770   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:35:23,405-Speed 5975.03 samples/sec   Loss 10.9872   LearningRate 0.2570   Epoch: 5   Global Step: 57780   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:35:30,260-Speed 5977.40 samples/sec   Loss 11.0568   LearningRate 0.2569   Epoch: 5   Global Step: 57790   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:35:37,119-Speed 5972.55 samples/sec   Loss 11.0807   LearningRate 0.2569   Epoch: 5   Global Step: 57800   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:35:44,004-Speed 5950.23 samples/sec   Loss 11.0724   LearningRate 0.2569   Epoch: 5   Global Step: 57810   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:35:50,855-Speed 5980.13 samples/sec   Loss 11.0874   LearningRate 0.2568   Epoch: 5   Global Step: 57820   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:35:57,731-Speed 5959.17 samples/sec   Loss 11.1861   LearningRate 0.2568   Epoch: 5   Global Step: 57830   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:36:04,586-Speed 5976.16 samples/sec   Loss 11.0084   LearningRate 0.2568   Epoch: 5   Global Step: 57840   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:36:11,446-Speed 5981.34 samples/sec   Loss 11.0982   LearningRate 0.2567   Epoch: 5   Global Step: 57850   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:36:18,297-Speed 5979.95 samples/sec   Loss 11.0477   LearningRate 0.2567   Epoch: 5   Global Step: 57860   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:36:25,148-Speed 5979.23 samples/sec   Loss 11.0856   LearningRate 0.2567   Epoch: 5   Global Step: 57870   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:36:31,988-Speed 5989.77 samples/sec   Loss 11.1261   LearningRate 0.2566   Epoch: 5   Global Step: 57880   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:36:38,878-Speed 5945.70 samples/sec   Loss 11.0950   LearningRate 0.2566   Epoch: 5   Global Step: 57890   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:36:45,738-Speed 5971.49 samples/sec   Loss 11.1002   LearningRate 0.2566   Epoch: 5   Global Step: 57900   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:36:52,593-Speed 5976.85 samples/sec   Loss 10.9918   LearningRate 0.2565   Epoch: 5   Global Step: 57910   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:36:59,459-Speed 5966.21 samples/sec   Loss 11.1239   LearningRate 0.2565   Epoch: 5   Global Step: 57920   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:37:06,318-Speed 5973.08 samples/sec   Loss 11.1765   LearningRate 0.2565   Epoch: 5   Global Step: 57930   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:37:13,166-Speed 5981.83 samples/sec   Loss 11.1475   LearningRate 0.2564   Epoch: 5   Global Step: 57940   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:37:20,043-Speed 5969.11 samples/sec   Loss 10.9884   LearningRate 0.2564   Epoch: 5   Global Step: 57950   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:37:26,901-Speed 5973.92 samples/sec   Loss 11.1155   LearningRate 0.2564   Epoch: 5   Global Step: 57960   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:37:33,762-Speed 5971.25 samples/sec   Loss 11.1073   LearningRate 0.2563   Epoch: 5   Global Step: 57970   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:37:40,614-Speed 5980.86 samples/sec   Loss 11.0834   LearningRate 0.2563   Epoch: 5   Global Step: 57980   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:37:47,466-Speed 5978.99 samples/sec   Loss 11.0480   LearningRate 0.2563   Epoch: 5   Global Step: 57990   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:37:54,323-Speed 5974.28 samples/sec   Loss 11.0886   LearningRate 0.2562   Epoch: 5   Global Step: 58000   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:38:01,181-Speed 5973.63 samples/sec   Loss 11.1076   LearningRate 0.2562   Epoch: 5   Global Step: 58010   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:38:08,037-Speed 5975.24 samples/sec   Loss 11.1031   LearningRate 0.2562   Epoch: 5   Global Step: 58020   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:38:14,888-Speed 5981.75 samples/sec   Loss 11.0716   LearningRate 0.2561   Epoch: 5   Global Step: 58030   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:38:21,774-Speed 5949.18 samples/sec   Loss 11.0694   LearningRate 0.2561   Epoch: 5   Global Step: 58040   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:38:28,634-Speed 5971.75 samples/sec   Loss 11.1110   LearningRate 0.2561   Epoch: 5   Global Step: 58050   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:38:35,479-Speed 5985.53 samples/sec   Loss 11.0947   LearningRate 0.2560   Epoch: 5   Global Step: 58060   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:38:42,351-Speed 5962.56 samples/sec   Loss 11.0262   LearningRate 0.2560   Epoch: 5   Global Step: 58070   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:38:49,222-Speed 5961.90 samples/sec   Loss 11.0271   LearningRate 0.2560   Epoch: 5   Global Step: 58080   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:38:56,092-Speed 5963.74 samples/sec   Loss 11.1219   LearningRate 0.2559   Epoch: 5   Global Step: 58090   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:39:02,962-Speed 5963.05 samples/sec   Loss 11.0562   LearningRate 0.2559   Epoch: 5   Global Step: 58100   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:39:09,847-Speed 5950.68 samples/sec   Loss 11.0265   LearningRate 0.2559   Epoch: 5   Global Step: 58110   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:39:16,703-Speed 5975.63 samples/sec   Loss 11.0775   LearningRate 0.2558   Epoch: 5   Global Step: 58120   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:39:23,551-Speed 5981.94 samples/sec   Loss 11.0685   LearningRate 0.2558   Epoch: 5   Global Step: 58130   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:39:30,419-Speed 5965.24 samples/sec   Loss 11.0872   LearningRate 0.2557   Epoch: 5   Global Step: 58140   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:39:37,268-Speed 5983.92 samples/sec   Loss 11.0401   LearningRate 0.2557   Epoch: 5   Global Step: 58150   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:39:44,113-Speed 5984.11 samples/sec   Loss 11.0559   LearningRate 0.2557   Epoch: 5   Global Step: 58160   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:39:50,971-Speed 5973.72 samples/sec   Loss 11.0799   LearningRate 0.2556   Epoch: 5   Global Step: 58170   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:39:57,840-Speed 5964.40 samples/sec   Loss 11.1330   LearningRate 0.2556   Epoch: 5   Global Step: 58180   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:40:04,690-Speed 5980.78 samples/sec   Loss 11.0602   LearningRate 0.2556   Epoch: 5   Global Step: 58190   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:40:11,550-Speed 5971.83 samples/sec   Loss 11.1420   LearningRate 0.2555   Epoch: 5   Global Step: 58200   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:40:18,409-Speed 5973.19 samples/sec   Loss 11.1628   LearningRate 0.2555   Epoch: 5   Global Step: 58210   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:40:25,256-Speed 5982.68 samples/sec   Loss 11.0155   LearningRate 0.2555   Epoch: 5   Global Step: 58220   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:40:32,107-Speed 5980.08 samples/sec   Loss 11.1588   LearningRate 0.2554   Epoch: 5   Global Step: 58230   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:40:38,947-Speed 5989.27 samples/sec   Loss 11.1036   LearningRate 0.2554   Epoch: 5   Global Step: 58240   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:40:45,823-Speed 5957.96 samples/sec   Loss 11.0565   LearningRate 0.2554   Epoch: 5   Global Step: 58250   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:40:52,667-Speed 5985.68 samples/sec   Loss 11.0798   LearningRate 0.2553   Epoch: 5   Global Step: 58260   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:40:59,542-Speed 5959.45 samples/sec   Loss 11.0949   LearningRate 0.2553   Epoch: 5   Global Step: 58270   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:41:06,406-Speed 5971.67 samples/sec   Loss 11.0450   LearningRate 0.2553   Epoch: 5   Global Step: 58280   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:41:13,266-Speed 5971.34 samples/sec   Loss 11.1341   LearningRate 0.2552   Epoch: 5   Global Step: 58290   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:41:20,126-Speed 5971.95 samples/sec   Loss 11.0812   LearningRate 0.2552   Epoch: 5   Global Step: 58300   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:41:26,978-Speed 5979.03 samples/sec   Loss 11.0968   LearningRate 0.2552   Epoch: 5   Global Step: 58310   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:41:33,830-Speed 5978.88 samples/sec   Loss 11.0267   LearningRate 0.2551   Epoch: 5   Global Step: 58320   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:41:40,686-Speed 5975.36 samples/sec   Loss 11.0452   LearningRate 0.2551   Epoch: 5   Global Step: 58330   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:41:47,553-Speed 5966.25 samples/sec   Loss 11.0405   LearningRate 0.2551   Epoch: 5   Global Step: 58340   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:41:54,420-Speed 5965.09 samples/sec   Loss 11.0739   LearningRate 0.2550   Epoch: 5   Global Step: 58350   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:42:01,268-Speed 5983.02 samples/sec   Loss 11.0903   LearningRate 0.2550   Epoch: 5   Global Step: 58360   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:42:08,149-Speed 5953.23 samples/sec   Loss 11.0045   LearningRate 0.2550   Epoch: 5   Global Step: 58370   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:42:15,006-Speed 5975.03 samples/sec   Loss 11.0410   LearningRate 0.2549   Epoch: 5   Global Step: 58380   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:42:21,870-Speed 5968.16 samples/sec   Loss 11.0567   LearningRate 0.2549   Epoch: 5   Global Step: 58390   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:42:28,751-Speed 5953.92 samples/sec   Loss 11.1302   LearningRate 0.2549   Epoch: 5   Global Step: 58400   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:42:35,612-Speed 5971.08 samples/sec   Loss 11.0888   LearningRate 0.2548   Epoch: 5   Global Step: 58410   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:42:42,500-Speed 5948.24 samples/sec   Loss 11.0910   LearningRate 0.2548   Epoch: 5   Global Step: 58420   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:42:49,442-Speed 5901.31 samples/sec   Loss 11.0970   LearningRate 0.2548   Epoch: 5   Global Step: 58430   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:42:56,376-Speed 5908.00 samples/sec   Loss 11.0870   LearningRate 0.2547   Epoch: 5   Global Step: 58440   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:43:03,266-Speed 5945.66 samples/sec   Loss 10.9941   LearningRate 0.2547   Epoch: 5   Global Step: 58450   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:43:10,118-Speed 5978.90 samples/sec   Loss 11.0833   LearningRate 0.2547   Epoch: 5   Global Step: 58460   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:43:16,975-Speed 5974.51 samples/sec   Loss 11.0130   LearningRate 0.2546   Epoch: 5   Global Step: 58470   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:43:23,841-Speed 5967.21 samples/sec   Loss 11.0876   LearningRate 0.2546   Epoch: 5   Global Step: 58480   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:43:30,706-Speed 5969.10 samples/sec   Loss 10.9609   LearningRate 0.2545   Epoch: 5   Global Step: 58490   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:43:37,542-Speed 5992.98 samples/sec   Loss 11.0762   LearningRate 0.2545   Epoch: 5   Global Step: 58500   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:43:44,413-Speed 5962.20 samples/sec   Loss 11.0874   LearningRate 0.2545   Epoch: 5   Global Step: 58510   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:43:51,261-Speed 5982.51 samples/sec   Loss 11.0222   LearningRate 0.2544   Epoch: 5   Global Step: 58520   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:43:58,154-Speed 5944.04 samples/sec   Loss 11.0884   LearningRate 0.2544   Epoch: 5   Global Step: 58530   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:44:05,005-Speed 5979.45 samples/sec   Loss 11.0198   LearningRate 0.2544   Epoch: 5   Global Step: 58540   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:44:11,923-Speed 5924.12 samples/sec   Loss 11.0267   LearningRate 0.2543   Epoch: 5   Global Step: 58550   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:44:18,831-Speed 5930.24 samples/sec   Loss 11.0143   LearningRate 0.2543   Epoch: 5   Global Step: 58560   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:44:25,689-Speed 5974.00 samples/sec   Loss 10.9875   LearningRate 0.2543   Epoch: 5   Global Step: 58570   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:44:32,530-Speed 5988.21 samples/sec   Loss 11.0393   LearningRate 0.2542   Epoch: 5   Global Step: 58580   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:44:39,376-Speed 5983.68 samples/sec   Loss 10.9956   LearningRate 0.2542   Epoch: 5   Global Step: 58590   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:44:46,236-Speed 5973.34 samples/sec   Loss 11.0871   LearningRate 0.2542   Epoch: 5   Global Step: 58600   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:44:53,092-Speed 5975.73 samples/sec   Loss 11.0653   LearningRate 0.2541   Epoch: 5   Global Step: 58610   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:44:59,928-Speed 5993.30 samples/sec   Loss 11.0956   LearningRate 0.2541   Epoch: 5   Global Step: 58620   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:45:06,809-Speed 5953.11 samples/sec   Loss 11.0465   LearningRate 0.2541   Epoch: 5   Global Step: 58630   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:45:13,663-Speed 5978.08 samples/sec   Loss 11.1326   LearningRate 0.2540   Epoch: 5   Global Step: 58640   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:45:20,519-Speed 5978.14 samples/sec   Loss 10.9417   LearningRate 0.2540   Epoch: 5   Global Step: 58650   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:45:27,394-Speed 5958.30 samples/sec   Loss 11.0481   LearningRate 0.2540   Epoch: 5   Global Step: 58660   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:45:34,265-Speed 5962.66 samples/sec   Loss 11.0433   LearningRate 0.2539   Epoch: 5   Global Step: 58670   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:45:41,106-Speed 5988.56 samples/sec   Loss 11.0357   LearningRate 0.2539   Epoch: 5   Global Step: 58680   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:45:47,961-Speed 5976.50 samples/sec   Loss 11.0352   LearningRate 0.2539   Epoch: 5   Global Step: 58690   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:45:54,807-Speed 5984.62 samples/sec   Loss 11.1124   LearningRate 0.2538   Epoch: 5   Global Step: 58700   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:46:01,647-Speed 5988.98 samples/sec   Loss 11.0719   LearningRate 0.2538   Epoch: 5   Global Step: 58710   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:46:08,496-Speed 5981.75 samples/sec   Loss 11.0618   LearningRate 0.2538   Epoch: 5   Global Step: 58720   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:46:15,351-Speed 5977.17 samples/sec   Loss 11.0119   LearningRate 0.2537   Epoch: 5   Global Step: 58730   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:46:22,316-Speed 5881.81 samples/sec   Loss 11.0061   LearningRate 0.2537   Epoch: 5   Global Step: 58740   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:46:29,274-Speed 5888.41 samples/sec   Loss 11.0078   LearningRate 0.2537   Epoch: 5   Global Step: 58750   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:46:36,149-Speed 5959.37 samples/sec   Loss 11.0765   LearningRate 0.2536   Epoch: 5   Global Step: 58760   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:46:43,010-Speed 5970.39 samples/sec   Loss 11.0603   LearningRate 0.2536   Epoch: 5   Global Step: 58770   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:46:49,876-Speed 5969.68 samples/sec   Loss 11.0413   LearningRate 0.2536   Epoch: 5   Global Step: 58780   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:46:56,833-Speed 5889.84 samples/sec   Loss 11.0742   LearningRate 0.2535   Epoch: 5   Global Step: 58790   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:47:03,675-Speed 5987.12 samples/sec   Loss 11.0079   LearningRate 0.2535   Epoch: 5   Global Step: 58800   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:47:10,511-Speed 5992.96 samples/sec   Loss 11.0312   LearningRate 0.2535   Epoch: 5   Global Step: 58810   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 07:47:17,357-Speed 5983.99 samples/sec   Loss 11.0495   LearningRate 0.2534   Epoch: 5   Global Step: 58820   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 07:47:24,209-Speed 5978.76 samples/sec   Loss 11.0208   LearningRate 0.2534   Epoch: 5   Global Step: 58830   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 07:47:31,062-Speed 5978.14 samples/sec   Loss 11.0827   LearningRate 0.2534   Epoch: 5   Global Step: 58840   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 07:47:37,909-Speed 5983.20 samples/sec   Loss 11.0113   LearningRate 0.2533   Epoch: 5   Global Step: 58850   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 07:47:44,756-Speed 5983.63 samples/sec   Loss 10.9380   LearningRate 0.2533   Epoch: 5   Global Step: 58860   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 07:47:51,609-Speed 5978.19 samples/sec   Loss 11.0759   LearningRate 0.2533   Epoch: 5   Global Step: 58870   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 07:47:58,462-Speed 5978.14 samples/sec   Loss 11.0827   LearningRate 0.2532   Epoch: 5   Global Step: 58880   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 07:48:05,314-Speed 5978.59 samples/sec   Loss 11.0036   LearningRate 0.2532   Epoch: 5   Global Step: 58890   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 07:48:12,164-Speed 5980.71 samples/sec   Loss 11.0328   LearningRate 0.2531   Epoch: 5   Global Step: 58900   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 07:48:19,015-Speed 5979.42 samples/sec   Loss 11.0462   LearningRate 0.2531   Epoch: 5   Global Step: 58910   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:48:25,874-Speed 5972.61 samples/sec   Loss 11.0549   LearningRate 0.2531   Epoch: 5   Global Step: 58920   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:48:32,721-Speed 5983.42 samples/sec   Loss 11.0295   LearningRate 0.2530   Epoch: 5   Global Step: 58930   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:48:39,575-Speed 5977.45 samples/sec   Loss 11.0109   LearningRate 0.2530   Epoch: 5   Global Step: 58940   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:48:46,454-Speed 5955.94 samples/sec   Loss 11.0670   LearningRate 0.2530   Epoch: 5   Global Step: 58950   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:48:53,330-Speed 5958.17 samples/sec   Loss 10.9993   LearningRate 0.2529   Epoch: 5   Global Step: 58960   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:49:00,204-Speed 5963.21 samples/sec   Loss 11.0279   LearningRate 0.2529   Epoch: 5   Global Step: 58970   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:49:07,054-Speed 5980.42 samples/sec   Loss 11.0889   LearningRate 0.2529   Epoch: 5   Global Step: 58980   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:49:13,909-Speed 5977.93 samples/sec   Loss 10.9382   LearningRate 0.2528   Epoch: 5   Global Step: 58990   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:49:21,560-Speed 5354.41 samples/sec   Loss 10.9662   LearningRate 0.2528   Epoch: 5   Global Step: 59000   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:49:28,470-Speed 5929.34 samples/sec   Loss 10.9707   LearningRate 0.2528   Epoch: 5   Global Step: 59010   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:49:35,320-Speed 5980.68 samples/sec   Loss 10.9518   LearningRate 0.2527   Epoch: 5   Global Step: 59020   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:49:42,187-Speed 5965.83 samples/sec   Loss 10.8839   LearningRate 0.2527   Epoch: 5   Global Step: 59030   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:49:49,054-Speed 5966.19 samples/sec   Loss 11.0263   LearningRate 0.2527   Epoch: 5   Global Step: 59040   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:49:55,912-Speed 5973.87 samples/sec   Loss 11.0752   LearningRate 0.2526   Epoch: 5   Global Step: 59050   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:50:02,768-Speed 5976.18 samples/sec   Loss 10.9715   LearningRate 0.2526   Epoch: 5   Global Step: 59060   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:50:09,627-Speed 5972.06 samples/sec   Loss 11.1035   LearningRate 0.2526   Epoch: 5   Global Step: 59070   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:50:16,493-Speed 5969.59 samples/sec   Loss 10.9970   LearningRate 0.2525   Epoch: 5   Global Step: 59080   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:50:23,340-Speed 5983.24 samples/sec   Loss 11.0890   LearningRate 0.2525   Epoch: 5   Global Step: 59090   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:50:30,188-Speed 5982.74 samples/sec   Loss 11.0306   LearningRate 0.2525   Epoch: 5   Global Step: 59100   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:50:37,045-Speed 5973.86 samples/sec   Loss 11.0322   LearningRate 0.2524   Epoch: 5   Global Step: 59110   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:50:43,894-Speed 5981.89 samples/sec   Loss 11.0122   LearningRate 0.2524   Epoch: 5   Global Step: 59120   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:50:50,763-Speed 5964.61 samples/sec   Loss 11.0965   LearningRate 0.2524   Epoch: 5   Global Step: 59130   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:50:57,629-Speed 5968.66 samples/sec   Loss 11.0256   LearningRate 0.2523   Epoch: 5   Global Step: 59140   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:51:04,478-Speed 5981.38 samples/sec   Loss 10.9240   LearningRate 0.2523   Epoch: 5   Global Step: 59150   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:51:11,371-Speed 5942.81 samples/sec   Loss 10.8674   LearningRate 0.2523   Epoch: 5   Global Step: 59160   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:51:18,276-Speed 5933.14 samples/sec   Loss 10.9846   LearningRate 0.2522   Epoch: 5   Global Step: 59170   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:51:25,171-Speed 5941.67 samples/sec   Loss 10.9719   LearningRate 0.2522   Epoch: 5   Global Step: 59180   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:51:32,052-Speed 5953.18 samples/sec   Loss 10.9629   LearningRate 0.2522   Epoch: 5   Global Step: 59190   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:51:38,969-Speed 5923.21 samples/sec   Loss 10.9577   LearningRate 0.2521   Epoch: 5   Global Step: 59200   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:51:45,850-Speed 5953.67 samples/sec   Loss 10.9293   LearningRate 0.2521   Epoch: 5   Global Step: 59210   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:51:52,711-Speed 5970.90 samples/sec   Loss 11.0120   LearningRate 0.2521   Epoch: 5   Global Step: 59220   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:51:59,606-Speed 5942.90 samples/sec   Loss 11.0442   LearningRate 0.2520   Epoch: 5   Global Step: 59230   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:52:06,491-Speed 5950.45 samples/sec   Loss 11.0809   LearningRate 0.2520   Epoch: 5   Global Step: 59240   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:52:13,361-Speed 5963.52 samples/sec   Loss 11.0392   LearningRate 0.2520   Epoch: 5   Global Step: 59250   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:52:20,215-Speed 5976.68 samples/sec   Loss 11.0345   LearningRate 0.2519   Epoch: 5   Global Step: 59260   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:52:27,084-Speed 5968.27 samples/sec   Loss 11.0307   LearningRate 0.2519   Epoch: 5   Global Step: 59270   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:52:33,971-Speed 5948.42 samples/sec   Loss 10.9880   LearningRate 0.2519   Epoch: 5   Global Step: 59280   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:52:40,818-Speed 5983.04 samples/sec   Loss 10.9725   LearningRate 0.2518   Epoch: 5   Global Step: 59290   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:52:47,678-Speed 5971.85 samples/sec   Loss 11.0437   LearningRate 0.2518   Epoch: 5   Global Step: 59300   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:52:54,539-Speed 5971.51 samples/sec   Loss 10.9414   LearningRate 0.2518   Epoch: 5   Global Step: 59310   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:53:01,794-Speed 5646.63 samples/sec   Loss 10.9843   LearningRate 0.2517   Epoch: 5   Global Step: 59320   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:53:08,641-Speed 5983.37 samples/sec   Loss 10.9755   LearningRate 0.2517   Epoch: 5   Global Step: 59330   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:53:15,511-Speed 5962.97 samples/sec   Loss 11.0004   LearningRate 0.2517   Epoch: 5   Global Step: 59340   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:53:22,396-Speed 5950.78 samples/sec   Loss 10.9435   LearningRate 0.2516   Epoch: 5   Global Step: 59350   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:53:29,237-Speed 5988.61 samples/sec   Loss 10.8997   LearningRate 0.2516   Epoch: 5   Global Step: 59360   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 07:53:36,086-Speed 5981.27 samples/sec   Loss 11.0354   LearningRate 0.2515   Epoch: 5   Global Step: 59370   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 07:53:42,931-Speed 5985.22 samples/sec   Loss 10.9805   LearningRate 0.2515   Epoch: 5   Global Step: 59380   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 07:53:49,786-Speed 5975.98 samples/sec   Loss 11.0633   LearningRate 0.2515   Epoch: 5   Global Step: 59390   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 07:53:56,636-Speed 5980.59 samples/sec   Loss 10.9948   LearningRate 0.2514   Epoch: 5   Global Step: 59400   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 07:54:03,503-Speed 5966.13 samples/sec   Loss 10.9563   LearningRate 0.2514   Epoch: 5   Global Step: 59410   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 07:54:10,350-Speed 5986.38 samples/sec   Loss 11.0525   LearningRate 0.2514   Epoch: 5   Global Step: 59420   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 07:54:17,198-Speed 5981.79 samples/sec   Loss 11.0245   LearningRate 0.2513   Epoch: 5   Global Step: 59430   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 07:54:24,049-Speed 5981.25 samples/sec   Loss 11.0032   LearningRate 0.2513   Epoch: 5   Global Step: 59440   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 07:54:30,897-Speed 5982.49 samples/sec   Loss 10.9790   LearningRate 0.2513   Epoch: 5   Global Step: 59450   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 07:54:37,741-Speed 5985.45 samples/sec   Loss 10.9759   LearningRate 0.2512   Epoch: 5   Global Step: 59460   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:54:44,594-Speed 5977.75 samples/sec   Loss 10.9656   LearningRate 0.2512   Epoch: 5   Global Step: 59470   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:54:51,559-Speed 5883.22 samples/sec   Loss 10.9650   LearningRate 0.2512   Epoch: 5   Global Step: 59480   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:54:58,525-Speed 5881.27 samples/sec   Loss 11.0114   LearningRate 0.2511   Epoch: 5   Global Step: 59490   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:55:05,391-Speed 5966.69 samples/sec   Loss 10.8976   LearningRate 0.2511   Epoch: 5   Global Step: 59500   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:55:12,244-Speed 5978.55 samples/sec   Loss 10.9894   LearningRate 0.2511   Epoch: 5   Global Step: 59510   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:55:19,092-Speed 5981.59 samples/sec   Loss 10.8820   LearningRate 0.2510   Epoch: 5   Global Step: 59520   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:55:25,935-Speed 5987.53 samples/sec   Loss 10.9460   LearningRate 0.2510   Epoch: 5   Global Step: 59530   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:55:32,805-Speed 5962.86 samples/sec   Loss 11.0768   LearningRate 0.2510   Epoch: 5   Global Step: 59540   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:55:39,657-Speed 5978.91 samples/sec   Loss 10.9802   LearningRate 0.2509   Epoch: 5   Global Step: 59550   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:55:46,516-Speed 5973.78 samples/sec   Loss 10.9879   LearningRate 0.2509   Epoch: 5   Global Step: 59560   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:55:53,374-Speed 5973.57 samples/sec   Loss 10.9036   LearningRate 0.2509   Epoch: 5   Global Step: 59570   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:56:00,207-Speed 5995.31 samples/sec   Loss 10.9563   LearningRate 0.2508   Epoch: 5   Global Step: 59580   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:56:07,065-Speed 5974.15 samples/sec   Loss 11.0150   LearningRate 0.2508   Epoch: 5   Global Step: 59590   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:56:13,983-Speed 5922.77 samples/sec   Loss 10.9823   LearningRate 0.2508   Epoch: 5   Global Step: 59600   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:56:20,902-Speed 5920.96 samples/sec   Loss 10.9775   LearningRate 0.2507   Epoch: 5   Global Step: 59610   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:56:27,825-Speed 5917.26 samples/sec   Loss 10.9021   LearningRate 0.2507   Epoch: 5   Global Step: 59620   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:56:34,664-Speed 5990.12 samples/sec   Loss 10.9294   LearningRate 0.2507   Epoch: 5   Global Step: 59630   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:56:41,536-Speed 5962.05 samples/sec   Loss 10.9624   LearningRate 0.2506   Epoch: 5   Global Step: 59640   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:56:48,979-Speed 5506.61 samples/sec   Loss 10.9593   LearningRate 0.2506   Epoch: 5   Global Step: 59650   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:56:55,885-Speed 5933.29 samples/sec   Loss 10.9514   LearningRate 0.2506   Epoch: 5   Global Step: 59660   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:57:02,795-Speed 5928.86 samples/sec   Loss 11.0047   LearningRate 0.2505   Epoch: 5   Global Step: 59670   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:57:09,641-Speed 5983.96 samples/sec   Loss 10.9492   LearningRate 0.2505   Epoch: 5   Global Step: 59680   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:57:16,490-Speed 5982.24 samples/sec   Loss 10.9737   LearningRate 0.2505   Epoch: 5   Global Step: 59690   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:57:23,327-Speed 5991.77 samples/sec   Loss 10.9394   LearningRate 0.2504   Epoch: 5   Global Step: 59700   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:57:30,184-Speed 5974.49 samples/sec   Loss 10.9986   LearningRate 0.2504   Epoch: 5   Global Step: 59710   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:57:37,036-Speed 5979.25 samples/sec   Loss 10.9485   LearningRate 0.2504   Epoch: 5   Global Step: 59720   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:57:43,885-Speed 5981.83 samples/sec   Loss 11.0077   LearningRate 0.2503   Epoch: 5   Global Step: 59730   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:57:50,744-Speed 5973.59 samples/sec   Loss 10.9824   LearningRate 0.2503   Epoch: 5   Global Step: 59740   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:57:57,607-Speed 5969.21 samples/sec   Loss 10.8995   LearningRate 0.2503   Epoch: 5   Global Step: 59750   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:58:04,468-Speed 5971.69 samples/sec   Loss 10.9534   LearningRate 0.2502   Epoch: 5   Global Step: 59760   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:58:11,335-Speed 5965.76 samples/sec   Loss 10.9706   LearningRate 0.2502   Epoch: 5   Global Step: 59770   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:58:18,212-Speed 5956.92 samples/sec   Loss 10.9758   LearningRate 0.2502   Epoch: 5   Global Step: 59780   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:58:25,059-Speed 5983.49 samples/sec   Loss 10.9378   LearningRate 0.2501   Epoch: 5   Global Step: 59790   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:58:31,906-Speed 5983.45 samples/sec   Loss 11.0278   LearningRate 0.2501   Epoch: 5   Global Step: 59800   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:58:38,788-Speed 5954.87 samples/sec   Loss 11.0931   LearningRate 0.2501   Epoch: 5   Global Step: 59810   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:58:45,642-Speed 5977.11 samples/sec   Loss 10.9661   LearningRate 0.2500   Epoch: 5   Global Step: 59820   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:58:52,517-Speed 5958.67 samples/sec   Loss 10.9694   LearningRate 0.2500   Epoch: 5   Global Step: 59830   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 07:58:59,380-Speed 5969.59 samples/sec   Loss 11.0279   LearningRate 0.2500   Epoch: 5   Global Step: 59840   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:59:06,255-Speed 5959.08 samples/sec   Loss 10.9211   LearningRate 0.2499   Epoch: 5   Global Step: 59850   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:59:13,109-Speed 5976.76 samples/sec   Loss 10.9068   LearningRate 0.2499   Epoch: 5   Global Step: 59860   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:59:19,965-Speed 5977.60 samples/sec   Loss 10.9365   LearningRate 0.2499   Epoch: 5   Global Step: 59870   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:59:26,824-Speed 5972.74 samples/sec   Loss 11.0105   LearningRate 0.2498   Epoch: 5   Global Step: 59880   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:59:33,690-Speed 5966.31 samples/sec   Loss 10.9756   LearningRate 0.2498   Epoch: 5   Global Step: 59890   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:59:40,549-Speed 5973.50 samples/sec   Loss 10.9696   LearningRate 0.2498   Epoch: 5   Global Step: 59900   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:59:47,402-Speed 5977.55 samples/sec   Loss 10.9700   LearningRate 0.2497   Epoch: 5   Global Step: 59910   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 07:59:54,264-Speed 5969.54 samples/sec   Loss 11.0115   LearningRate 0.2497   Epoch: 5   Global Step: 59920   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:00:01,121-Speed 5974.73 samples/sec   Loss 10.9204   LearningRate 0.2496   Epoch: 5   Global Step: 59930   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:00:07,995-Speed 5959.93 samples/sec   Loss 10.9400   LearningRate 0.2496   Epoch: 5   Global Step: 59940   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:00:14,879-Speed 5951.12 samples/sec   Loss 10.9752   LearningRate 0.2496   Epoch: 5   Global Step: 59950   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:00:21,740-Speed 5972.03 samples/sec   Loss 11.0660   LearningRate 0.2495   Epoch: 5   Global Step: 59960   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:00:28,611-Speed 5961.91 samples/sec   Loss 11.0093   LearningRate 0.2495   Epoch: 5   Global Step: 59970   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:00:35,460-Speed 5981.75 samples/sec   Loss 11.0031   LearningRate 0.2495   Epoch: 5   Global Step: 59980   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:00:42,332-Speed 5961.95 samples/sec   Loss 10.9845   LearningRate 0.2494   Epoch: 5   Global Step: 59990   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:00:49,211-Speed 5955.11 samples/sec   Loss 10.9592   LearningRate 0.2494   Epoch: 5   Global Step: 60000   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:01:16,394-[lfw][60000]XNorm: 23.897016
Training: 2022-01-08 08:01:16,395-[lfw][60000]Accuracy-Flip: 0.99683+-0.00293
Training: 2022-01-08 08:01:16,396-[lfw][60000]Accuracy-Highest: 0.99700
Training: 2022-01-08 08:01:47,711-[cfp_fp][60000]XNorm: 20.987406
Training: 2022-01-08 08:01:47,712-[cfp_fp][60000]Accuracy-Flip: 0.97557+-0.00773
Training: 2022-01-08 08:01:47,713-[cfp_fp][60000]Accuracy-Highest: 0.97686
Training: 2022-01-08 08:02:14,789-[agedb_30][60000]XNorm: 23.517364
Training: 2022-01-08 08:02:14,790-[agedb_30][60000]Accuracy-Flip: 0.96400+-0.00834
Training: 2022-01-08 08:02:14,790-[agedb_30][60000]Accuracy-Highest: 0.96400
Training: 2022-01-08 08:02:21,622-Speed 443.25 samples/sec   Loss 10.9680   LearningRate 0.2494   Epoch: 5   Global Step: 60010   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:02:28,448-Speed 6002.05 samples/sec   Loss 10.8945   LearningRate 0.2493   Epoch: 5   Global Step: 60020   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:02:35,292-Speed 5987.14 samples/sec   Loss 11.0046   LearningRate 0.2493   Epoch: 5   Global Step: 60030   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:02:42,154-Speed 5970.26 samples/sec   Loss 10.9984   LearningRate 0.2493   Epoch: 5   Global Step: 60040   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:02:49,004-Speed 5983.22 samples/sec   Loss 10.9467   LearningRate 0.2492   Epoch: 5   Global Step: 60050   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:02:55,842-Speed 5991.05 samples/sec   Loss 10.9290   LearningRate 0.2492   Epoch: 5   Global Step: 60060   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:03:02,692-Speed 5980.37 samples/sec   Loss 10.9721   LearningRate 0.2492   Epoch: 5   Global Step: 60070   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:03:09,570-Speed 5956.57 samples/sec   Loss 10.9121   LearningRate 0.2491   Epoch: 5   Global Step: 60080   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:03:16,417-Speed 5983.19 samples/sec   Loss 10.8770   LearningRate 0.2491   Epoch: 5   Global Step: 60090   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:03:23,284-Speed 5966.24 samples/sec   Loss 11.0155   LearningRate 0.2491   Epoch: 5   Global Step: 60100   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 08:03:30,129-Speed 5984.67 samples/sec   Loss 10.9281   LearningRate 0.2490   Epoch: 5   Global Step: 60110   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 08:03:36,971-Speed 5988.00 samples/sec   Loss 10.9760   LearningRate 0.2490   Epoch: 5   Global Step: 60120   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 08:03:44,834-Speed 5213.05 samples/sec   Loss 10.9977   LearningRate 0.2490   Epoch: 5   Global Step: 60130   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 08:03:51,671-Speed 5990.95 samples/sec   Loss 10.9258   LearningRate 0.2489   Epoch: 5   Global Step: 60140   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 08:03:58,513-Speed 5990.25 samples/sec   Loss 10.9163   LearningRate 0.2489   Epoch: 5   Global Step: 60150   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 08:04:05,357-Speed 5986.81 samples/sec   Loss 10.8595   LearningRate 0.2489   Epoch: 5   Global Step: 60160   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 08:04:12,210-Speed 5978.02 samples/sec   Loss 11.0014   LearningRate 0.2488   Epoch: 5   Global Step: 60170   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 08:04:19,051-Speed 5990.76 samples/sec   Loss 10.8937   LearningRate 0.2488   Epoch: 5   Global Step: 60180   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 08:04:25,904-Speed 5978.74 samples/sec   Loss 10.9154   LearningRate 0.2488   Epoch: 5   Global Step: 60190   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 08:04:32,776-Speed 5961.52 samples/sec   Loss 10.9069   LearningRate 0.2487   Epoch: 5   Global Step: 60200   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:04:39,619-Speed 5986.92 samples/sec   Loss 10.9328   LearningRate 0.2487   Epoch: 5   Global Step: 60210   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:04:46,461-Speed 5988.00 samples/sec   Loss 10.9427   LearningRate 0.2487   Epoch: 5   Global Step: 60220   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:04:53,315-Speed 5977.27 samples/sec   Loss 11.0262   LearningRate 0.2486   Epoch: 5   Global Step: 60230   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:05:00,186-Speed 5962.30 samples/sec   Loss 10.9520   LearningRate 0.2486   Epoch: 5   Global Step: 60240   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:05:07,059-Speed 5961.18 samples/sec   Loss 11.0108   LearningRate 0.2486   Epoch: 5   Global Step: 60250   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:05:13,930-Speed 5961.79 samples/sec   Loss 10.8619   LearningRate 0.2485   Epoch: 5   Global Step: 60260   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:05:20,793-Speed 5969.25 samples/sec   Loss 10.9756   LearningRate 0.2485   Epoch: 5   Global Step: 60270   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:05:27,662-Speed 5964.58 samples/sec   Loss 10.9069   LearningRate 0.2485   Epoch: 5   Global Step: 60280   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:05:34,532-Speed 5963.37 samples/sec   Loss 10.9767   LearningRate 0.2484   Epoch: 5   Global Step: 60290   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:05:41,415-Speed 5951.99 samples/sec   Loss 10.8803   LearningRate 0.2484   Epoch: 5   Global Step: 60300   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:05:48,291-Speed 5966.85 samples/sec   Loss 10.8752   LearningRate 0.2484   Epoch: 5   Global Step: 60310   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:05:55,144-Speed 5978.05 samples/sec   Loss 10.9502   LearningRate 0.2483   Epoch: 5   Global Step: 60320   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:06:02,003-Speed 5972.06 samples/sec   Loss 10.9574   LearningRate 0.2483   Epoch: 5   Global Step: 60330   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:06:08,861-Speed 5974.18 samples/sec   Loss 10.8674   LearningRate 0.2483   Epoch: 5   Global Step: 60340   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:06:15,716-Speed 5976.65 samples/sec   Loss 10.9358   LearningRate 0.2482   Epoch: 5   Global Step: 60350   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:06:22,562-Speed 5984.46 samples/sec   Loss 10.8773   LearningRate 0.2482   Epoch: 5   Global Step: 60360   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:06:29,439-Speed 5958.95 samples/sec   Loss 10.9015   LearningRate 0.2482   Epoch: 5   Global Step: 60370   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:06:36,299-Speed 5972.44 samples/sec   Loss 11.0007   LearningRate 0.2481   Epoch: 5   Global Step: 60380   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:06:43,165-Speed 5966.75 samples/sec   Loss 10.9490   LearningRate 0.2481   Epoch: 5   Global Step: 60390   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:06:50,096-Speed 5910.65 samples/sec   Loss 10.9847   LearningRate 0.2481   Epoch: 5   Global Step: 60400   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:06:56,933-Speed 5991.79 samples/sec   Loss 10.9042   LearningRate 0.2480   Epoch: 5   Global Step: 60410   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:07:03,776-Speed 5987.70 samples/sec   Loss 10.9488   LearningRate 0.2480   Epoch: 5   Global Step: 60420   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:07:10,664-Speed 5948.41 samples/sec   Loss 10.9353   LearningRate 0.2480   Epoch: 5   Global Step: 60430   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:07:17,511-Speed 5983.26 samples/sec   Loss 10.9974   LearningRate 0.2479   Epoch: 5   Global Step: 60440   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:07:24,389-Speed 5956.36 samples/sec   Loss 10.9412   LearningRate 0.2479   Epoch: 5   Global Step: 60450   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:07:31,230-Speed 5989.66 samples/sec   Loss 10.9169   LearningRate 0.2479   Epoch: 5   Global Step: 60460   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:07:38,096-Speed 5966.30 samples/sec   Loss 10.8702   LearningRate 0.2478   Epoch: 5   Global Step: 60470   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:07:44,957-Speed 5971.31 samples/sec   Loss 10.8963   LearningRate 0.2478   Epoch: 5   Global Step: 60480   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:07:51,824-Speed 5966.30 samples/sec   Loss 10.9371   LearningRate 0.2478   Epoch: 5   Global Step: 60490   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:07:58,680-Speed 5974.97 samples/sec   Loss 10.9120   LearningRate 0.2477   Epoch: 5   Global Step: 60500   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:08:05,531-Speed 5980.58 samples/sec   Loss 10.9212   LearningRate 0.2477   Epoch: 5   Global Step: 60510   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:08:12,398-Speed 5966.41 samples/sec   Loss 10.8908   LearningRate 0.2477   Epoch: 5   Global Step: 60520   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:08:19,265-Speed 5965.40 samples/sec   Loss 10.8940   LearningRate 0.2476   Epoch: 5   Global Step: 60530   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:08:26,123-Speed 5974.17 samples/sec   Loss 10.9227   LearningRate 0.2476   Epoch: 5   Global Step: 60540   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:08:32,986-Speed 5969.17 samples/sec   Loss 10.9005   LearningRate 0.2476   Epoch: 5   Global Step: 60550   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:08:39,834-Speed 5982.24 samples/sec   Loss 10.8940   LearningRate 0.2475   Epoch: 5   Global Step: 60560   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:08:46,738-Speed 5933.84 samples/sec   Loss 10.8291   LearningRate 0.2475   Epoch: 5   Global Step: 60570   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:08:53,613-Speed 5959.02 samples/sec   Loss 10.9181   LearningRate 0.2475   Epoch: 5   Global Step: 60580   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:09:00,473-Speed 5971.87 samples/sec   Loss 10.9345   LearningRate 0.2474   Epoch: 5   Global Step: 60590   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:09:07,336-Speed 5970.17 samples/sec   Loss 10.8666   LearningRate 0.2474   Epoch: 5   Global Step: 60600   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:09:14,203-Speed 5967.08 samples/sec   Loss 10.9920   LearningRate 0.2474   Epoch: 5   Global Step: 60610   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:09:21,038-Speed 5992.60 samples/sec   Loss 11.0234   LearningRate 0.2473   Epoch: 5   Global Step: 60620   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:09:27,907-Speed 5964.78 samples/sec   Loss 10.9318   LearningRate 0.2473   Epoch: 5   Global Step: 60630   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:09:34,785-Speed 5955.90 samples/sec   Loss 10.8720   LearningRate 0.2473   Epoch: 5   Global Step: 60640   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:09:41,640-Speed 5976.89 samples/sec   Loss 10.8694   LearningRate 0.2472   Epoch: 5   Global Step: 60650   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:09:48,487-Speed 5983.14 samples/sec   Loss 10.8322   LearningRate 0.2472   Epoch: 5   Global Step: 60660   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:09:55,360-Speed 5961.35 samples/sec   Loss 10.8621   LearningRate 0.2472   Epoch: 5   Global Step: 60670   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:10:02,250-Speed 5945.95 samples/sec   Loss 10.9026   LearningRate 0.2471   Epoch: 5   Global Step: 60680   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:10:09,116-Speed 5966.46 samples/sec   Loss 10.8950   LearningRate 0.2471   Epoch: 5   Global Step: 60690   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:10:15,977-Speed 5972.47 samples/sec   Loss 10.8576   LearningRate 0.2470   Epoch: 5   Global Step: 60700   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:10:22,934-Speed 5888.20 samples/sec   Loss 10.9985   LearningRate 0.2470   Epoch: 5   Global Step: 60710   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:10:29,894-Speed 5886.18 samples/sec   Loss 10.9249   LearningRate 0.2470   Epoch: 5   Global Step: 60720   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:10:36,759-Speed 5967.81 samples/sec   Loss 10.9191   LearningRate 0.2469   Epoch: 5   Global Step: 60730   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:10:43,632-Speed 5961.13 samples/sec   Loss 10.8871   LearningRate 0.2469   Epoch: 5   Global Step: 60740   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:10:50,495-Speed 5975.72 samples/sec   Loss 10.9144   LearningRate 0.2469   Epoch: 5   Global Step: 60750   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:10:57,349-Speed 5976.38 samples/sec   Loss 10.9265   LearningRate 0.2468   Epoch: 5   Global Step: 60760   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:11:04,216-Speed 5965.73 samples/sec   Loss 10.8936   LearningRate 0.2468   Epoch: 5   Global Step: 60770   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:11:11,099-Speed 5952.13 samples/sec   Loss 10.8737   LearningRate 0.2468   Epoch: 5   Global Step: 60780   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:11:17,997-Speed 5939.95 samples/sec   Loss 10.9099   LearningRate 0.2467   Epoch: 5   Global Step: 60790   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:11:24,851-Speed 5976.34 samples/sec   Loss 10.9108   LearningRate 0.2467   Epoch: 5   Global Step: 60800   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:11:31,700-Speed 5981.34 samples/sec   Loss 10.8104   LearningRate 0.2467   Epoch: 5   Global Step: 60810   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:11:38,549-Speed 5981.86 samples/sec   Loss 10.9811   LearningRate 0.2466   Epoch: 5   Global Step: 60820   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:11:45,402-Speed 5977.61 samples/sec   Loss 10.9390   LearningRate 0.2466   Epoch: 5   Global Step: 60830   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:11:52,258-Speed 5975.32 samples/sec   Loss 10.8913   LearningRate 0.2466   Epoch: 5   Global Step: 60840   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:11:59,111-Speed 5978.56 samples/sec   Loss 10.9106   LearningRate 0.2465   Epoch: 5   Global Step: 60850   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:12:05,992-Speed 5953.24 samples/sec   Loss 10.8682   LearningRate 0.2465   Epoch: 5   Global Step: 60860   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:12:12,869-Speed 5958.11 samples/sec   Loss 10.9418   LearningRate 0.2465   Epoch: 5   Global Step: 60870   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:12:19,750-Speed 5953.72 samples/sec   Loss 10.8841   LearningRate 0.2464   Epoch: 5   Global Step: 60880   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:12:26,623-Speed 5960.61 samples/sec   Loss 10.8138   LearningRate 0.2464   Epoch: 5   Global Step: 60890   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:12:33,482-Speed 5973.18 samples/sec   Loss 10.8256   LearningRate 0.2464   Epoch: 5   Global Step: 60900   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:12:40,350-Speed 5964.46 samples/sec   Loss 10.8977   LearningRate 0.2463   Epoch: 5   Global Step: 60910   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:12:47,206-Speed 5975.97 samples/sec   Loss 10.8798   LearningRate 0.2463   Epoch: 5   Global Step: 60920   Fp16 Grad Scale: 524288   Required: 29 hours
Training: 2022-01-08 08:12:54,064-Speed 5973.55 samples/sec   Loss 10.9124   LearningRate 0.2463   Epoch: 5   Global Step: 60930   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:13:00,930-Speed 5966.29 samples/sec   Loss 10.9386   LearningRate 0.2462   Epoch: 5   Global Step: 60940   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:13:07,807-Speed 5957.73 samples/sec   Loss 10.8742   LearningRate 0.2462   Epoch: 5   Global Step: 60950   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:13:14,667-Speed 5971.38 samples/sec   Loss 10.8130   LearningRate 0.2462   Epoch: 5   Global Step: 60960   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:13:21,517-Speed 5980.45 samples/sec   Loss 10.8941   LearningRate 0.2461   Epoch: 5   Global Step: 60970   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:13:28,399-Speed 5952.53 samples/sec   Loss 10.7931   LearningRate 0.2461   Epoch: 5   Global Step: 60980   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:13:35,281-Speed 5952.75 samples/sec   Loss 10.9300   LearningRate 0.2461   Epoch: 5   Global Step: 60990   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:13:42,132-Speed 5980.67 samples/sec   Loss 10.9069   LearningRate 0.2460   Epoch: 5   Global Step: 61000   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:13:48,981-Speed 5981.05 samples/sec   Loss 10.9363   LearningRate 0.2460   Epoch: 5   Global Step: 61010   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:13:55,840-Speed 5972.47 samples/sec   Loss 10.8537   LearningRate 0.2460   Epoch: 5   Global Step: 61020   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:14:02,693-Speed 5978.31 samples/sec   Loss 10.9714   LearningRate 0.2459   Epoch: 5   Global Step: 61030   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:14:09,540-Speed 5983.36 samples/sec   Loss 10.9354   LearningRate 0.2459   Epoch: 5   Global Step: 61040   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:14:16,415-Speed 5958.71 samples/sec   Loss 10.9316   LearningRate 0.2459   Epoch: 5   Global Step: 61050   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:14:23,262-Speed 5983.62 samples/sec   Loss 10.8777   LearningRate 0.2458   Epoch: 5   Global Step: 61060   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:14:30,115-Speed 5977.26 samples/sec   Loss 10.9061   LearningRate 0.2458   Epoch: 5   Global Step: 61070   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:14:36,980-Speed 5967.94 samples/sec   Loss 10.8400   LearningRate 0.2458   Epoch: 5   Global Step: 61080   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:14:43,872-Speed 5944.01 samples/sec   Loss 10.8433   LearningRate 0.2457   Epoch: 5   Global Step: 61090   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:14:50,740-Speed 5965.75 samples/sec   Loss 10.9081   LearningRate 0.2457   Epoch: 5   Global Step: 61100   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:14:57,597-Speed 5974.24 samples/sec   Loss 10.9189   LearningRate 0.2457   Epoch: 5   Global Step: 61110   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:15:04,448-Speed 5979.51 samples/sec   Loss 10.8371   LearningRate 0.2456   Epoch: 5   Global Step: 61120   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:15:11,304-Speed 5976.16 samples/sec   Loss 10.8579   LearningRate 0.2456   Epoch: 5   Global Step: 61130   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:15:18,164-Speed 5971.86 samples/sec   Loss 10.8910   LearningRate 0.2456   Epoch: 5   Global Step: 61140   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:15:25,017-Speed 5979.71 samples/sec   Loss 10.7978   LearningRate 0.2455   Epoch: 5   Global Step: 61150   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:15:31,864-Speed 5983.60 samples/sec   Loss 10.8719   LearningRate 0.2455   Epoch: 5   Global Step: 61160   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:15:38,707-Speed 5986.34 samples/sec   Loss 10.9315   LearningRate 0.2455   Epoch: 5   Global Step: 61170   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:15:45,583-Speed 5958.00 samples/sec   Loss 10.8892   LearningRate 0.2454   Epoch: 5   Global Step: 61180   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:15:52,462-Speed 5955.18 samples/sec   Loss 10.9736   LearningRate 0.2454   Epoch: 5   Global Step: 61190   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:15:59,347-Speed 5950.32 samples/sec   Loss 10.8796   LearningRate 0.2454   Epoch: 5   Global Step: 61200   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:16:06,224-Speed 5957.04 samples/sec   Loss 10.8503   LearningRate 0.2453   Epoch: 5   Global Step: 61210   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:16:13,077-Speed 5978.22 samples/sec   Loss 10.8996   LearningRate 0.2453   Epoch: 5   Global Step: 61220   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:16:19,983-Speed 5933.76 samples/sec   Loss 10.8308   LearningRate 0.2453   Epoch: 5   Global Step: 61230   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:16:26,832-Speed 5982.23 samples/sec   Loss 10.8788   LearningRate 0.2452   Epoch: 5   Global Step: 61240   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:16:33,700-Speed 5965.06 samples/sec   Loss 10.8007   LearningRate 0.2452   Epoch: 5   Global Step: 61250   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:16:40,567-Speed 5966.45 samples/sec   Loss 10.8542   LearningRate 0.2452   Epoch: 5   Global Step: 61260   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:16:47,516-Speed 5895.34 samples/sec   Loss 10.8654   LearningRate 0.2451   Epoch: 5   Global Step: 61270   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:16:54,455-Speed 5904.16 samples/sec   Loss 10.9050   LearningRate 0.2451   Epoch: 5   Global Step: 61280   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:17:01,387-Speed 5911.31 samples/sec   Loss 10.9123   LearningRate 0.2451   Epoch: 5   Global Step: 61290   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:17:08,313-Speed 5915.48 samples/sec   Loss 10.8972   LearningRate 0.2450   Epoch: 5   Global Step: 61300   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 08:17:15,223-Speed 5928.73 samples/sec   Loss 10.8613   LearningRate 0.2450   Epoch: 5   Global Step: 61310   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 08:17:22,097-Speed 5960.78 samples/sec   Loss 10.8203   LearningRate 0.2450   Epoch: 5   Global Step: 61320   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 08:17:28,965-Speed 5965.42 samples/sec   Loss 10.8662   LearningRate 0.2449   Epoch: 5   Global Step: 61330   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 08:17:35,819-Speed 5979.12 samples/sec   Loss 10.8793   LearningRate 0.2449   Epoch: 5   Global Step: 61340   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 08:17:42,675-Speed 5974.93 samples/sec   Loss 10.7946   LearningRate 0.2449   Epoch: 5   Global Step: 61350   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 08:17:49,528-Speed 5979.16 samples/sec   Loss 10.7732   LearningRate 0.2448   Epoch: 5   Global Step: 61360   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 08:17:56,396-Speed 5964.18 samples/sec   Loss 10.8268   LearningRate 0.2448   Epoch: 5   Global Step: 61370   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 08:18:03,263-Speed 5966.69 samples/sec   Loss 10.8466   LearningRate 0.2448   Epoch: 5   Global Step: 61380   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 08:18:10,140-Speed 5956.70 samples/sec   Loss 10.8161   LearningRate 0.2447   Epoch: 5   Global Step: 61390   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-01-08 08:18:16,994-Speed 5977.67 samples/sec   Loss 10.8169   LearningRate 0.2447   Epoch: 5   Global Step: 61400   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:18:23,877-Speed 5951.58 samples/sec   Loss 10.8461   LearningRate 0.2447   Epoch: 5   Global Step: 61410   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:18:30,739-Speed 5971.56 samples/sec   Loss 10.9019   LearningRate 0.2446   Epoch: 5   Global Step: 61420   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:18:37,586-Speed 5982.72 samples/sec   Loss 10.9134   LearningRate 0.2446   Epoch: 5   Global Step: 61430   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:18:44,438-Speed 5979.29 samples/sec   Loss 10.8788   LearningRate 0.2446   Epoch: 5   Global Step: 61440   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:18:51,303-Speed 5968.37 samples/sec   Loss 10.9044   LearningRate 0.2445   Epoch: 5   Global Step: 61450   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:18:58,158-Speed 5975.30 samples/sec   Loss 10.8323   LearningRate 0.2445   Epoch: 5   Global Step: 61460   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:19:05,028-Speed 5964.08 samples/sec   Loss 10.8476   LearningRate 0.2445   Epoch: 5   Global Step: 61470   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:19:11,889-Speed 5970.92 samples/sec   Loss 10.8736   LearningRate 0.2444   Epoch: 5   Global Step: 61480   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:19:18,745-Speed 5975.53 samples/sec   Loss 10.7797   LearningRate 0.2444   Epoch: 5   Global Step: 61490   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:19:25,607-Speed 5973.36 samples/sec   Loss 10.8510   LearningRate 0.2444   Epoch: 5   Global Step: 61500   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:19:32,459-Speed 5979.51 samples/sec   Loss 10.8269   LearningRate 0.2443   Epoch: 5   Global Step: 61510   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:19:39,315-Speed 5975.36 samples/sec   Loss 10.7864   LearningRate 0.2443   Epoch: 5   Global Step: 61520   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:19:46,168-Speed 5979.82 samples/sec   Loss 10.8317   LearningRate 0.2443   Epoch: 5   Global Step: 61530   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:19:53,031-Speed 5969.57 samples/sec   Loss 10.8514   LearningRate 0.2442   Epoch: 5   Global Step: 61540   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:19:59,885-Speed 5977.13 samples/sec   Loss 10.8932   LearningRate 0.2442   Epoch: 5   Global Step: 61550   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:20:06,756-Speed 5961.93 samples/sec   Loss 10.8208   LearningRate 0.2442   Epoch: 5   Global Step: 61560   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:20:13,629-Speed 5960.68 samples/sec   Loss 10.8890   LearningRate 0.2441   Epoch: 5   Global Step: 61570   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:20:20,486-Speed 5974.91 samples/sec   Loss 10.8440   LearningRate 0.2441   Epoch: 5   Global Step: 61580   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:20:27,346-Speed 5971.42 samples/sec   Loss 10.8674   LearningRate 0.2441   Epoch: 5   Global Step: 61590   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:20:34,235-Speed 5949.68 samples/sec   Loss 10.7700   LearningRate 0.2440   Epoch: 5   Global Step: 61600   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:20:41,085-Speed 5980.48 samples/sec   Loss 10.8245   LearningRate 0.2440   Epoch: 5   Global Step: 61610   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:20:47,952-Speed 5966.52 samples/sec   Loss 10.9218   LearningRate 0.2440   Epoch: 5   Global Step: 61620   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:20:54,808-Speed 5975.15 samples/sec   Loss 10.9206   LearningRate 0.2439   Epoch: 5   Global Step: 61630   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:21:01,679-Speed 5962.15 samples/sec   Loss 10.8487   LearningRate 0.2439   Epoch: 5   Global Step: 61640   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:21:08,552-Speed 5961.15 samples/sec   Loss 10.7762   LearningRate 0.2439   Epoch: 5   Global Step: 61650   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:21:15,486-Speed 5909.29 samples/sec   Loss 10.8365   LearningRate 0.2438   Epoch: 5   Global Step: 61660   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:21:22,431-Speed 5898.44 samples/sec   Loss 10.8344   LearningRate 0.2438   Epoch: 5   Global Step: 61670   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:21:29,291-Speed 5973.44 samples/sec   Loss 10.8457   LearningRate 0.2438   Epoch: 5   Global Step: 61680   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:21:36,159-Speed 5968.13 samples/sec   Loss 10.7700   LearningRate 0.2437   Epoch: 5   Global Step: 61690   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:21:43,032-Speed 5961.06 samples/sec   Loss 10.8865   LearningRate 0.2437   Epoch: 5   Global Step: 61700   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:21:49,885-Speed 5978.03 samples/sec   Loss 10.8536   LearningRate 0.2437   Epoch: 5   Global Step: 61710   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:21:56,756-Speed 5962.83 samples/sec   Loss 10.8948   LearningRate 0.2436   Epoch: 5   Global Step: 61720   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:22:03,611-Speed 5976.02 samples/sec   Loss 10.7191   LearningRate 0.2436   Epoch: 5   Global Step: 61730   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:22:10,474-Speed 5969.61 samples/sec   Loss 10.8598   LearningRate 0.2436   Epoch: 5   Global Step: 61740   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:22:17,365-Speed 5945.73 samples/sec   Loss 10.8810   LearningRate 0.2435   Epoch: 5   Global Step: 61750   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:22:24,236-Speed 5961.65 samples/sec   Loss 10.8403   LearningRate 0.2435   Epoch: 5   Global Step: 61760   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:22:31,106-Speed 5975.05 samples/sec   Loss 10.8881   LearningRate 0.2435   Epoch: 5   Global Step: 61770   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:22:37,967-Speed 5971.56 samples/sec   Loss 10.7927   LearningRate 0.2434   Epoch: 5   Global Step: 61780   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:22:51,447-Speed 3038.78 samples/sec   Loss 10.9354   LearningRate 0.2434   Epoch: 5   Global Step: 61790   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:22:58,289-Speed 5988.59 samples/sec   Loss 10.8659   LearningRate 0.2434   Epoch: 5   Global Step: 61800   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:23:05,130-Speed 5988.31 samples/sec   Loss 10.7560   LearningRate 0.2433   Epoch: 5   Global Step: 61810   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:23:12,001-Speed 5965.24 samples/sec   Loss 10.8500   LearningRate 0.2433   Epoch: 5   Global Step: 61820   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:23:18,840-Speed 5990.28 samples/sec   Loss 10.8100   LearningRate 0.2433   Epoch: 5   Global Step: 61830   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:23:25,675-Speed 5993.56 samples/sec   Loss 10.7720   LearningRate 0.2432   Epoch: 5   Global Step: 61840   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:23:32,527-Speed 5979.12 samples/sec   Loss 10.8474   LearningRate 0.2432   Epoch: 5   Global Step: 61850   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:23:39,369-Speed 5987.84 samples/sec   Loss 10.8557   LearningRate 0.2432   Epoch: 5   Global Step: 61860   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:23:46,214-Speed 5984.12 samples/sec   Loss 10.7936   LearningRate 0.2431   Epoch: 5   Global Step: 61870   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:23:53,069-Speed 5976.06 samples/sec   Loss 10.8466   LearningRate 0.2431   Epoch: 5   Global Step: 61880   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:23:59,919-Speed 5981.21 samples/sec   Loss 10.8385   LearningRate 0.2431   Epoch: 5   Global Step: 61890   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:24:06,764-Speed 5984.71 samples/sec   Loss 10.7986   LearningRate 0.2430   Epoch: 5   Global Step: 61900   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:24:13,618-Speed 5983.89 samples/sec   Loss 10.8999   LearningRate 0.2430   Epoch: 5   Global Step: 61910   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:24:20,475-Speed 5976.80 samples/sec   Loss 10.8393   LearningRate 0.2430   Epoch: 5   Global Step: 61920   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:24:27,343-Speed 5964.54 samples/sec   Loss 10.8715   LearningRate 0.2429   Epoch: 5   Global Step: 61930   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:24:34,224-Speed 5955.84 samples/sec   Loss 10.8218   LearningRate 0.2429   Epoch: 5   Global Step: 61940   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-01-08 08:24:41,084-Speed 5971.68 samples/sec   Loss 10.8218   LearningRate 0.2429   Epoch: 5   Global Step: 61950   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:24:47,950-Speed 5966.79 samples/sec   Loss 10.7873   LearningRate 0.2428   Epoch: 5   Global Step: 61960   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:24:54,823-Speed 5960.81 samples/sec   Loss 10.7777   LearningRate 0.2428   Epoch: 5   Global Step: 61970   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-01-08 08:25:01,689-Speed 5966.83 samples/sec   Loss 10.8395   LearningRate 0.2428   Epoch: 5   Global Step: 61980   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:25:08,565-Speed 5958.21 samples/sec   Loss 10.7285   LearningRate 0.2427   Epoch: 5   Global Step: 61990   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:25:15,502-Speed 5906.61 samples/sec   Loss 10.8225   LearningRate 0.2427   Epoch: 5   Global Step: 62000   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:25:22,500-Speed 5854.30 samples/sec   Loss 10.7793   LearningRate 0.2427   Epoch: 5   Global Step: 62010   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:25:29,351-Speed 5979.96 samples/sec   Loss 10.7516   LearningRate 0.2426   Epoch: 5   Global Step: 62020   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:25:36,229-Speed 5956.82 samples/sec   Loss 10.8572   LearningRate 0.2426   Epoch: 5   Global Step: 62030   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:25:43,087-Speed 5974.42 samples/sec   Loss 10.6895   LearningRate 0.2426   Epoch: 5   Global Step: 62040   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:25:49,963-Speed 5957.53 samples/sec   Loss 10.7883   LearningRate 0.2425   Epoch: 5   Global Step: 62050   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:25:56,844-Speed 5954.52 samples/sec   Loss 10.8530   LearningRate 0.2425   Epoch: 5   Global Step: 62060   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:26:03,837-Speed 5858.00 samples/sec   Loss 10.8370   LearningRate 0.2425   Epoch: 5   Global Step: 62070   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:26:10,781-Speed 5900.09 samples/sec   Loss 10.9688   LearningRate 0.2424   Epoch: 5   Global Step: 62080   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:26:17,758-Speed 5873.08 samples/sec   Loss 10.8259   LearningRate 0.2424   Epoch: 5   Global Step: 62090   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:26:24,610-Speed 5978.53 samples/sec   Loss 10.9467   LearningRate 0.2424   Epoch: 5   Global Step: 62100   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:26:31,469-Speed 5973.12 samples/sec   Loss 10.8156   LearningRate 0.2423   Epoch: 5   Global Step: 62110   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:26:38,321-Speed 5978.48 samples/sec   Loss 10.8370   LearningRate 0.2423   Epoch: 5   Global Step: 62120   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:26:45,170-Speed 5982.38 samples/sec   Loss 10.8448   LearningRate 0.2423   Epoch: 5   Global Step: 62130   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:26:52,019-Speed 5980.72 samples/sec   Loss 10.7631   LearningRate 0.2422   Epoch: 5   Global Step: 62140   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:26:58,885-Speed 5967.14 samples/sec   Loss 10.8314   LearningRate 0.2422   Epoch: 5   Global Step: 62150   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:27:05,736-Speed 5980.00 samples/sec   Loss 10.8678   LearningRate 0.2422   Epoch: 5   Global Step: 62160   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:27:12,585-Speed 5981.06 samples/sec   Loss 10.7911   LearningRate 0.2421   Epoch: 5   Global Step: 62170   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:27:19,435-Speed 5981.78 samples/sec   Loss 10.8309   LearningRate 0.2421   Epoch: 5   Global Step: 62180   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:27:26,303-Speed 5966.79 samples/sec   Loss 10.7911   LearningRate 0.2421   Epoch: 5   Global Step: 62190   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:27:33,156-Speed 5978.03 samples/sec   Loss 10.8484   LearningRate 0.2420   Epoch: 5   Global Step: 62200   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:27:40,003-Speed 5983.45 samples/sec   Loss 10.8340   LearningRate 0.2420   Epoch: 5   Global Step: 62210   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:28:03,307-Speed 1757.75 samples/sec   Loss 10.7980   LearningRate 0.2420   Epoch: 6   Global Step: 62220   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:28:10,365-Speed 5804.38 samples/sec   Loss 10.8225   LearningRate 0.2419   Epoch: 6   Global Step: 62230   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:28:17,207-Speed 5987.77 samples/sec   Loss 10.8946   LearningRate 0.2419   Epoch: 6   Global Step: 62240   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:28:24,039-Speed 5996.73 samples/sec   Loss 10.8294   LearningRate 0.2419   Epoch: 6   Global Step: 62250   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:28:30,870-Speed 5997.88 samples/sec   Loss 10.8055   LearningRate 0.2418   Epoch: 6   Global Step: 62260   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:28:37,706-Speed 5993.38 samples/sec   Loss 10.8095   LearningRate 0.2418   Epoch: 6   Global Step: 62270   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:28:44,553-Speed 5983.06 samples/sec   Loss 10.8185   LearningRate 0.2418   Epoch: 6   Global Step: 62280   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:28:51,403-Speed 5980.07 samples/sec   Loss 10.8079   LearningRate 0.2417   Epoch: 6   Global Step: 62290   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:28:58,304-Speed 5938.01 samples/sec   Loss 10.8336   LearningRate 0.2417   Epoch: 6   Global Step: 62300   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:29:05,279-Speed 5874.40 samples/sec   Loss 10.7769   LearningRate 0.2417   Epoch: 6   Global Step: 62310   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:29:12,247-Speed 5879.76 samples/sec   Loss 10.7213   LearningRate 0.2416   Epoch: 6   Global Step: 62320   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:29:19,229-Speed 5867.62 samples/sec   Loss 10.7869   LearningRate 0.2416   Epoch: 6   Global Step: 62330   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:29:26,079-Speed 5980.85 samples/sec   Loss 10.7224   LearningRate 0.2416   Epoch: 6   Global Step: 62340   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:29:32,970-Speed 5945.40 samples/sec   Loss 10.7583   LearningRate 0.2415   Epoch: 6   Global Step: 62350   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:29:39,823-Speed 5977.73 samples/sec   Loss 10.7406   LearningRate 0.2415   Epoch: 6   Global Step: 62360   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:29:46,719-Speed 5941.30 samples/sec   Loss 10.8074   LearningRate 0.2415   Epoch: 6   Global Step: 62370   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:29:53,615-Speed 5940.82 samples/sec   Loss 10.7467   LearningRate 0.2414   Epoch: 6   Global Step: 62380   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:30:00,478-Speed 5969.72 samples/sec   Loss 10.7390   LearningRate 0.2414   Epoch: 6   Global Step: 62390   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:30:07,400-Speed 5918.17 samples/sec   Loss 10.8627   LearningRate 0.2414   Epoch: 6   Global Step: 62400   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:30:14,341-Speed 5902.15 samples/sec   Loss 10.8696   LearningRate 0.2413   Epoch: 6   Global Step: 62410   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:30:21,276-Speed 5907.16 samples/sec   Loss 10.8439   LearningRate 0.2413   Epoch: 6   Global Step: 62420   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:30:28,203-Speed 5914.01 samples/sec   Loss 10.7685   LearningRate 0.2413   Epoch: 6   Global Step: 62430   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:30:35,086-Speed 5951.70 samples/sec   Loss 10.7824   LearningRate 0.2412   Epoch: 6   Global Step: 62440   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:30:41,946-Speed 5971.95 samples/sec   Loss 10.8586   LearningRate 0.2412   Epoch: 6   Global Step: 62450   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:30:48,797-Speed 5980.02 samples/sec   Loss 10.7543   LearningRate 0.2412   Epoch: 6   Global Step: 62460   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:30:55,687-Speed 5945.43 samples/sec   Loss 10.8291   LearningRate 0.2411   Epoch: 6   Global Step: 62470   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:31:02,534-Speed 5985.38 samples/sec   Loss 10.8026   LearningRate 0.2411   Epoch: 6   Global Step: 62480   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:31:09,402-Speed 5964.84 samples/sec   Loss 10.7830   LearningRate 0.2411   Epoch: 6   Global Step: 62490   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:31:16,242-Speed 5988.62 samples/sec   Loss 10.7935   LearningRate 0.2410   Epoch: 6   Global Step: 62500   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:31:23,107-Speed 5967.71 samples/sec   Loss 10.7723   LearningRate 0.2410   Epoch: 6   Global Step: 62510   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:31:29,989-Speed 5952.18 samples/sec   Loss 10.7928   LearningRate 0.2410   Epoch: 6   Global Step: 62520   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:31:36,842-Speed 5979.07 samples/sec   Loss 10.7510   LearningRate 0.2409   Epoch: 6   Global Step: 62530   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:31:43,706-Speed 5968.25 samples/sec   Loss 10.7694   LearningRate 0.2409   Epoch: 6   Global Step: 62540   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:31:50,574-Speed 5964.73 samples/sec   Loss 10.7738   LearningRate 0.2409   Epoch: 6   Global Step: 62550   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:31:57,428-Speed 5977.55 samples/sec   Loss 10.7258   LearningRate 0.2408   Epoch: 6   Global Step: 62560   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:32:04,283-Speed 5978.15 samples/sec   Loss 10.7469   LearningRate 0.2408   Epoch: 6   Global Step: 62570   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:32:11,136-Speed 5977.68 samples/sec   Loss 10.7791   LearningRate 0.2408   Epoch: 6   Global Step: 62580   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:32:18,003-Speed 5966.52 samples/sec   Loss 10.7806   LearningRate 0.2407   Epoch: 6   Global Step: 62590   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:32:24,877-Speed 5959.37 samples/sec   Loss 10.7733   LearningRate 0.2407   Epoch: 6   Global Step: 62600   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:32:31,760-Speed 5952.22 samples/sec   Loss 10.7833   LearningRate 0.2407   Epoch: 6   Global Step: 62610   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:32:38,620-Speed 5971.69 samples/sec   Loss 10.7015   LearningRate 0.2406   Epoch: 6   Global Step: 62620   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:32:45,492-Speed 5962.02 samples/sec   Loss 10.7830   LearningRate 0.2406   Epoch: 6   Global Step: 62630   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:32:52,352-Speed 5972.08 samples/sec   Loss 10.7015   LearningRate 0.2406   Epoch: 6   Global Step: 62640   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:32:59,210-Speed 5973.82 samples/sec   Loss 10.8031   LearningRate 0.2405   Epoch: 6   Global Step: 62650   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:33:06,068-Speed 5973.10 samples/sec   Loss 10.8831   LearningRate 0.2405   Epoch: 6   Global Step: 62660   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:33:12,923-Speed 5979.13 samples/sec   Loss 10.7940   LearningRate 0.2405   Epoch: 6   Global Step: 62670   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:33:19,773-Speed 5980.74 samples/sec   Loss 10.8324   LearningRate 0.2404   Epoch: 6   Global Step: 62680   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:33:26,627-Speed 5976.86 samples/sec   Loss 10.8621   LearningRate 0.2404   Epoch: 6   Global Step: 62690   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:33:33,485-Speed 5975.32 samples/sec   Loss 10.8347   LearningRate 0.2404   Epoch: 6   Global Step: 62700   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:33:40,335-Speed 5980.59 samples/sec   Loss 10.8163   LearningRate 0.2403   Epoch: 6   Global Step: 62710   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:33:47,186-Speed 5979.69 samples/sec   Loss 10.8250   LearningRate 0.2403   Epoch: 6   Global Step: 62720   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:33:54,063-Speed 5957.16 samples/sec   Loss 10.7946   LearningRate 0.2403   Epoch: 6   Global Step: 62730   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:34:00,923-Speed 5972.01 samples/sec   Loss 10.7868   LearningRate 0.2402   Epoch: 6   Global Step: 62740   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:34:07,794-Speed 5961.83 samples/sec   Loss 10.7139   LearningRate 0.2402   Epoch: 6   Global Step: 62750   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:34:14,655-Speed 5973.12 samples/sec   Loss 10.7174   LearningRate 0.2402   Epoch: 6   Global Step: 62760   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:34:21,535-Speed 5955.42 samples/sec   Loss 10.6757   LearningRate 0.2401   Epoch: 6   Global Step: 62770   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:34:28,396-Speed 5970.42 samples/sec   Loss 10.7562   LearningRate 0.2401   Epoch: 6   Global Step: 62780   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:34:35,250-Speed 5978.15 samples/sec   Loss 10.7158   LearningRate 0.2401   Epoch: 6   Global Step: 62790   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:34:42,137-Speed 5948.75 samples/sec   Loss 10.6826   LearningRate 0.2400   Epoch: 6   Global Step: 62800   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:34:48,990-Speed 5977.37 samples/sec   Loss 10.7506   LearningRate 0.2400   Epoch: 6   Global Step: 62810   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:34:55,823-Speed 5995.08 samples/sec   Loss 10.7683   LearningRate 0.2400   Epoch: 6   Global Step: 62820   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:35:02,669-Speed 5985.00 samples/sec   Loss 10.6950   LearningRate 0.2399   Epoch: 6   Global Step: 62830   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:35:09,517-Speed 5981.80 samples/sec   Loss 10.7095   LearningRate 0.2399   Epoch: 6   Global Step: 62840   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:35:16,367-Speed 5980.72 samples/sec   Loss 10.7871   LearningRate 0.2399   Epoch: 6   Global Step: 62850   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:35:23,230-Speed 5972.16 samples/sec   Loss 10.8212   LearningRate 0.2398   Epoch: 6   Global Step: 62860   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:35:30,158-Speed 5913.09 samples/sec   Loss 10.8128   LearningRate 0.2398   Epoch: 6   Global Step: 62870   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:35:37,015-Speed 5974.84 samples/sec   Loss 10.7120   LearningRate 0.2398   Epoch: 6   Global Step: 62880   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 08:35:43,889-Speed 5959.73 samples/sec   Loss 10.7666   LearningRate 0.2397   Epoch: 6   Global Step: 62890   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 08:35:50,780-Speed 5945.72 samples/sec   Loss 10.7793   LearningRate 0.2397   Epoch: 6   Global Step: 62900   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 08:35:57,665-Speed 5952.22 samples/sec   Loss 10.8491   LearningRate 0.2397   Epoch: 6   Global Step: 62910   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 08:36:04,553-Speed 5948.28 samples/sec   Loss 10.7710   LearningRate 0.2396   Epoch: 6   Global Step: 62920   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 08:36:11,406-Speed 5977.87 samples/sec   Loss 10.7609   LearningRate 0.2396   Epoch: 6   Global Step: 62930   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 08:36:18,265-Speed 5972.94 samples/sec   Loss 10.7234   LearningRate 0.2396   Epoch: 6   Global Step: 62940   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 08:36:25,131-Speed 5966.57 samples/sec   Loss 10.8534   LearningRate 0.2395   Epoch: 6   Global Step: 62950   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 08:36:32,011-Speed 5954.92 samples/sec   Loss 10.7240   LearningRate 0.2395   Epoch: 6   Global Step: 62960   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 08:36:38,882-Speed 5962.67 samples/sec   Loss 10.7522   LearningRate 0.2395   Epoch: 6   Global Step: 62970   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 08:36:45,751-Speed 5964.24 samples/sec   Loss 10.6780   LearningRate 0.2394   Epoch: 6   Global Step: 62980   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:36:52,632-Speed 5953.74 samples/sec   Loss 10.7685   LearningRate 0.2394   Epoch: 6   Global Step: 62990   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:36:59,496-Speed 5968.19 samples/sec   Loss 10.7442   LearningRate 0.2394   Epoch: 6   Global Step: 63000   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:37:06,372-Speed 5959.23 samples/sec   Loss 10.7661   LearningRate 0.2393   Epoch: 6   Global Step: 63010   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:37:13,240-Speed 5964.86 samples/sec   Loss 10.8228   LearningRate 0.2393   Epoch: 6   Global Step: 63020   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:37:20,112-Speed 5961.31 samples/sec   Loss 10.8114   LearningRate 0.2393   Epoch: 6   Global Step: 63030   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:37:26,964-Speed 5978.71 samples/sec   Loss 10.6907   LearningRate 0.2392   Epoch: 6   Global Step: 63040   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:37:33,826-Speed 5970.66 samples/sec   Loss 10.7215   LearningRate 0.2392   Epoch: 6   Global Step: 63050   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:37:40,681-Speed 5976.92 samples/sec   Loss 10.7854   LearningRate 0.2392   Epoch: 6   Global Step: 63060   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:37:47,551-Speed 5962.43 samples/sec   Loss 10.7410   LearningRate 0.2391   Epoch: 6   Global Step: 63070   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:37:54,381-Speed 5998.72 samples/sec   Loss 10.7121   LearningRate 0.2391   Epoch: 6   Global Step: 63080   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 08:38:01,243-Speed 5970.58 samples/sec   Loss 10.8641   LearningRate 0.2391   Epoch: 6   Global Step: 63090   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 08:38:08,100-Speed 5975.46 samples/sec   Loss 10.8400   LearningRate 0.2390   Epoch: 6   Global Step: 63100   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 08:38:14,962-Speed 5971.06 samples/sec   Loss 10.7597   LearningRate 0.2390   Epoch: 6   Global Step: 63110   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 08:38:21,831-Speed 5963.95 samples/sec   Loss 10.7742   LearningRate 0.2390   Epoch: 6   Global Step: 63120   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 08:38:28,689-Speed 5973.88 samples/sec   Loss 10.7319   LearningRate 0.2389   Epoch: 6   Global Step: 63130   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 08:38:35,545-Speed 5975.88 samples/sec   Loss 10.7249   LearningRate 0.2389   Epoch: 6   Global Step: 63140   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 08:38:42,409-Speed 5969.29 samples/sec   Loss 10.7353   LearningRate 0.2389   Epoch: 6   Global Step: 63150   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 08:38:49,251-Speed 5987.41 samples/sec   Loss 10.7728   LearningRate 0.2388   Epoch: 6   Global Step: 63160   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 08:38:56,094-Speed 5987.09 samples/sec   Loss 10.7852   LearningRate 0.2388   Epoch: 6   Global Step: 63170   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-01-08 08:39:02,944-Speed 5980.85 samples/sec   Loss 10.6956   LearningRate 0.2388   Epoch: 6   Global Step: 63180   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 08:39:09,803-Speed 5972.99 samples/sec   Loss 10.7600   LearningRate 0.2387   Epoch: 6   Global Step: 63190   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 08:39:16,695-Speed 5944.46 samples/sec   Loss 10.7271   LearningRate 0.2387   Epoch: 6   Global Step: 63200   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 08:39:23,539-Speed 5985.92 samples/sec   Loss 10.7093   LearningRate 0.2387   Epoch: 6   Global Step: 63210   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 08:39:30,384-Speed 5984.59 samples/sec   Loss 10.7438   LearningRate 0.2386   Epoch: 6   Global Step: 63220   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 08:39:37,242-Speed 5973.86 samples/sec   Loss 10.7072   LearningRate 0.2386   Epoch: 6   Global Step: 63230   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 08:39:44,128-Speed 5949.86 samples/sec   Loss 10.7168   LearningRate 0.2386   Epoch: 6   Global Step: 63240   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 08:39:50,998-Speed 5963.19 samples/sec   Loss 10.8060   LearningRate 0.2385   Epoch: 6   Global Step: 63250   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 08:39:57,846-Speed 5982.54 samples/sec   Loss 10.6934   LearningRate 0.2385   Epoch: 6   Global Step: 63260   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 08:40:04,709-Speed 5970.00 samples/sec   Loss 10.7716   LearningRate 0.2385   Epoch: 6   Global Step: 63270   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 08:40:11,562-Speed 5978.00 samples/sec   Loss 10.7966   LearningRate 0.2384   Epoch: 6   Global Step: 63280   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:40:18,437-Speed 5959.00 samples/sec   Loss 10.6640   LearningRate 0.2384   Epoch: 6   Global Step: 63290   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:40:25,285-Speed 5982.32 samples/sec   Loss 10.7023   LearningRate 0.2384   Epoch: 6   Global Step: 63300   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:40:32,132-Speed 5983.28 samples/sec   Loss 10.7073   LearningRate 0.2383   Epoch: 6   Global Step: 63310   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:40:38,975-Speed 5987.04 samples/sec   Loss 10.6616   LearningRate 0.2383   Epoch: 6   Global Step: 63320   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:40:47,395-Speed 4865.03 samples/sec   Loss 10.6889   LearningRate 0.2383   Epoch: 6   Global Step: 63330   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:40:54,257-Speed 5970.77 samples/sec   Loss 10.6459   LearningRate 0.2382   Epoch: 6   Global Step: 63340   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:41:01,147-Speed 5945.28 samples/sec   Loss 10.6668   LearningRate 0.2382   Epoch: 6   Global Step: 63350   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:41:07,994-Speed 5983.87 samples/sec   Loss 10.7309   LearningRate 0.2382   Epoch: 6   Global Step: 63360   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:41:14,854-Speed 5971.86 samples/sec   Loss 10.7968   LearningRate 0.2381   Epoch: 6   Global Step: 63370   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:41:21,702-Speed 5982.16 samples/sec   Loss 10.6907   LearningRate 0.2381   Epoch: 6   Global Step: 63380   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:41:28,571-Speed 5964.77 samples/sec   Loss 10.7732   LearningRate 0.2381   Epoch: 6   Global Step: 63390   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:41:35,419-Speed 5982.29 samples/sec   Loss 10.7416   LearningRate 0.2380   Epoch: 6   Global Step: 63400   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:41:42,270-Speed 5979.31 samples/sec   Loss 10.7764   LearningRate 0.2380   Epoch: 6   Global Step: 63410   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:41:49,131-Speed 5971.83 samples/sec   Loss 10.7337   LearningRate 0.2380   Epoch: 6   Global Step: 63420   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:41:55,978-Speed 5983.47 samples/sec   Loss 10.7065   LearningRate 0.2379   Epoch: 6   Global Step: 63430   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:42:02,829-Speed 5979.26 samples/sec   Loss 10.6975   LearningRate 0.2379   Epoch: 6   Global Step: 63440   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:42:09,706-Speed 5958.33 samples/sec   Loss 10.7531   LearningRate 0.2379   Epoch: 6   Global Step: 63450   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:42:16,565-Speed 5972.52 samples/sec   Loss 10.7030   LearningRate 0.2378   Epoch: 6   Global Step: 63460   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:42:23,424-Speed 5974.91 samples/sec   Loss 10.6800   LearningRate 0.2378   Epoch: 6   Global Step: 63470   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:42:30,303-Speed 5955.15 samples/sec   Loss 10.7206   LearningRate 0.2378   Epoch: 6   Global Step: 63480   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:42:37,165-Speed 5969.95 samples/sec   Loss 10.7959   LearningRate 0.2377   Epoch: 6   Global Step: 63490   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:42:44,069-Speed 5933.76 samples/sec   Loss 10.7818   LearningRate 0.2377   Epoch: 6   Global Step: 63500   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:42:50,914-Speed 5984.52 samples/sec   Loss 10.7259   LearningRate 0.2377   Epoch: 6   Global Step: 63510   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:42:57,799-Speed 5951.83 samples/sec   Loss 10.7197   LearningRate 0.2376   Epoch: 6   Global Step: 63520   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:43:04,643-Speed 5985.74 samples/sec   Loss 10.6803   LearningRate 0.2376   Epoch: 6   Global Step: 63530   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:43:11,517-Speed 5959.89 samples/sec   Loss 10.7533   LearningRate 0.2376   Epoch: 6   Global Step: 63540   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:43:18,432-Speed 5924.18 samples/sec   Loss 10.6526   LearningRate 0.2375   Epoch: 6   Global Step: 63550   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:43:25,368-Speed 5907.20 samples/sec   Loss 10.6908   LearningRate 0.2375   Epoch: 6   Global Step: 63560   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:43:32,295-Speed 5914.35 samples/sec   Loss 10.6169   LearningRate 0.2375   Epoch: 6   Global Step: 63570   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:43:39,173-Speed 5956.42 samples/sec   Loss 10.7501   LearningRate 0.2374   Epoch: 6   Global Step: 63580   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:43:46,051-Speed 5956.50 samples/sec   Loss 10.7576   LearningRate 0.2374   Epoch: 6   Global Step: 63590   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:43:52,922-Speed 5964.20 samples/sec   Loss 10.7668   LearningRate 0.2374   Epoch: 6   Global Step: 63600   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:43:59,769-Speed 5982.93 samples/sec   Loss 10.7274   LearningRate 0.2373   Epoch: 6   Global Step: 63610   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:44:06,641-Speed 5961.63 samples/sec   Loss 10.6334   LearningRate 0.2373   Epoch: 6   Global Step: 63620   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:44:13,486-Speed 5985.07 samples/sec   Loss 10.7552   LearningRate 0.2373   Epoch: 6   Global Step: 63630   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:44:20,343-Speed 5974.71 samples/sec   Loss 10.6877   LearningRate 0.2372   Epoch: 6   Global Step: 63640   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:44:27,287-Speed 5899.93 samples/sec   Loss 10.6450   LearningRate 0.2372   Epoch: 6   Global Step: 63650   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:44:34,148-Speed 5971.62 samples/sec   Loss 10.7024   LearningRate 0.2372   Epoch: 6   Global Step: 63660   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:44:41,022-Speed 5959.49 samples/sec   Loss 10.6793   LearningRate 0.2371   Epoch: 6   Global Step: 63670   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:44:47,899-Speed 5957.34 samples/sec   Loss 10.6575   LearningRate 0.2371   Epoch: 6   Global Step: 63680   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:44:54,775-Speed 5958.07 samples/sec   Loss 10.6684   LearningRate 0.2371   Epoch: 6   Global Step: 63690   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:45:01,645-Speed 5964.09 samples/sec   Loss 10.6893   LearningRate 0.2370   Epoch: 6   Global Step: 63700   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:45:08,555-Speed 5928.34 samples/sec   Loss 10.7846   LearningRate 0.2370   Epoch: 6   Global Step: 63710   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:45:15,497-Speed 5902.29 samples/sec   Loss 10.7648   LearningRate 0.2370   Epoch: 6   Global Step: 63720   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:45:22,382-Speed 5950.12 samples/sec   Loss 10.6004   LearningRate 0.2369   Epoch: 6   Global Step: 63730   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:45:29,239-Speed 5974.91 samples/sec   Loss 10.6830   LearningRate 0.2369   Epoch: 6   Global Step: 63740   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:45:36,107-Speed 5964.50 samples/sec   Loss 10.8254   LearningRate 0.2369   Epoch: 6   Global Step: 63750   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:45:42,961-Speed 5977.35 samples/sec   Loss 10.6894   LearningRate 0.2368   Epoch: 6   Global Step: 63760   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:45:49,827-Speed 5966.31 samples/sec   Loss 10.7057   LearningRate 0.2368   Epoch: 6   Global Step: 63770   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:45:56,715-Speed 5947.39 samples/sec   Loss 10.7196   LearningRate 0.2368   Epoch: 6   Global Step: 63780   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:46:03,642-Speed 5917.12 samples/sec   Loss 10.7077   LearningRate 0.2367   Epoch: 6   Global Step: 63790   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:46:10,559-Speed 5923.99 samples/sec   Loss 10.7355   LearningRate 0.2367   Epoch: 6   Global Step: 63800   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:46:17,416-Speed 5974.63 samples/sec   Loss 10.6166   LearningRate 0.2367   Epoch: 6   Global Step: 63810   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:46:24,291-Speed 5958.72 samples/sec   Loss 10.7343   LearningRate 0.2367   Epoch: 6   Global Step: 63820   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:46:31,151-Speed 5972.53 samples/sec   Loss 10.6724   LearningRate 0.2366   Epoch: 6   Global Step: 63830   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:46:38,021-Speed 5964.90 samples/sec   Loss 10.6503   LearningRate 0.2366   Epoch: 6   Global Step: 63840   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:46:44,908-Speed 5948.47 samples/sec   Loss 10.6201   LearningRate 0.2366   Epoch: 6   Global Step: 63850   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:46:51,765-Speed 5975.11 samples/sec   Loss 10.6812   LearningRate 0.2365   Epoch: 6   Global Step: 63860   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:46:58,618-Speed 5977.65 samples/sec   Loss 10.7335   LearningRate 0.2365   Epoch: 6   Global Step: 63870   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:47:05,492-Speed 5982.26 samples/sec   Loss 10.7006   LearningRate 0.2365   Epoch: 6   Global Step: 63880   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:47:12,474-Speed 5867.67 samples/sec   Loss 10.5846   LearningRate 0.2364   Epoch: 6   Global Step: 63890   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:47:19,370-Speed 5940.78 samples/sec   Loss 10.6456   LearningRate 0.2364   Epoch: 6   Global Step: 63900   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:47:26,220-Speed 5979.95 samples/sec   Loss 10.7159   LearningRate 0.2364   Epoch: 6   Global Step: 63910   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:47:33,077-Speed 5974.39 samples/sec   Loss 10.7325   LearningRate 0.2363   Epoch: 6   Global Step: 63920   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:47:39,944-Speed 5966.70 samples/sec   Loss 10.6585   LearningRate 0.2363   Epoch: 6   Global Step: 63930   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:47:46,800-Speed 5974.69 samples/sec   Loss 10.6988   LearningRate 0.2363   Epoch: 6   Global Step: 63940   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:47:53,678-Speed 5956.75 samples/sec   Loss 10.6670   LearningRate 0.2362   Epoch: 6   Global Step: 63950   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:48:00,532-Speed 5977.57 samples/sec   Loss 10.6802   LearningRate 0.2362   Epoch: 6   Global Step: 63960   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:48:07,422-Speed 5946.00 samples/sec   Loss 10.7210   LearningRate 0.2362   Epoch: 6   Global Step: 63970   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:48:14,277-Speed 5976.64 samples/sec   Loss 10.6548   LearningRate 0.2361   Epoch: 6   Global Step: 63980   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:48:21,138-Speed 5970.86 samples/sec   Loss 10.6841   LearningRate 0.2361   Epoch: 6   Global Step: 63990   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:48:28,023-Speed 5950.88 samples/sec   Loss 10.6410   LearningRate 0.2361   Epoch: 6   Global Step: 64000   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:48:34,875-Speed 5978.89 samples/sec   Loss 10.6694   LearningRate 0.2360   Epoch: 6   Global Step: 64010   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:48:41,761-Speed 5949.59 samples/sec   Loss 10.6891   LearningRate 0.2360   Epoch: 6   Global Step: 64020   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:48:48,646-Speed 5950.27 samples/sec   Loss 10.6893   LearningRate 0.2360   Epoch: 6   Global Step: 64030   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:48:55,523-Speed 5957.54 samples/sec   Loss 10.7640   LearningRate 0.2359   Epoch: 6   Global Step: 64040   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:49:02,368-Speed 5984.54 samples/sec   Loss 10.8019   LearningRate 0.2359   Epoch: 6   Global Step: 64050   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:49:09,236-Speed 5967.65 samples/sec   Loss 10.6706   LearningRate 0.2359   Epoch: 6   Global Step: 64060   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:49:16,094-Speed 5973.81 samples/sec   Loss 10.6684   LearningRate 0.2358   Epoch: 6   Global Step: 64070   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:49:22,951-Speed 5974.93 samples/sec   Loss 10.6999   LearningRate 0.2358   Epoch: 6   Global Step: 64080   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:49:29,817-Speed 5966.88 samples/sec   Loss 10.5975   LearningRate 0.2358   Epoch: 6   Global Step: 64090   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:49:36,679-Speed 5971.39 samples/sec   Loss 10.6365   LearningRate 0.2357   Epoch: 6   Global Step: 64100   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:49:43,518-Speed 5989.68 samples/sec   Loss 10.6194   LearningRate 0.2357   Epoch: 6   Global Step: 64110   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:49:50,360-Speed 5987.38 samples/sec   Loss 10.7146   LearningRate 0.2357   Epoch: 6   Global Step: 64120   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:49:57,225-Speed 5968.02 samples/sec   Loss 10.5972   LearningRate 0.2356   Epoch: 6   Global Step: 64130   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:50:04,107-Speed 5955.36 samples/sec   Loss 10.5935   LearningRate 0.2356   Epoch: 6   Global Step: 64140   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:50:10,971-Speed 5970.46 samples/sec   Loss 10.6975   LearningRate 0.2356   Epoch: 6   Global Step: 64150   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:50:17,917-Speed 5897.91 samples/sec   Loss 10.6647   LearningRate 0.2355   Epoch: 6   Global Step: 64160   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:50:24,780-Speed 5969.94 samples/sec   Loss 10.6857   LearningRate 0.2355   Epoch: 6   Global Step: 64170   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:50:31,636-Speed 5975.30 samples/sec   Loss 10.5878   LearningRate 0.2355   Epoch: 6   Global Step: 64180   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:50:38,484-Speed 5984.68 samples/sec   Loss 10.7546   LearningRate 0.2354   Epoch: 6   Global Step: 64190   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:50:45,335-Speed 5980.00 samples/sec   Loss 10.6631   LearningRate 0.2354   Epoch: 6   Global Step: 64200   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:50:52,203-Speed 5965.16 samples/sec   Loss 10.6315   LearningRate 0.2354   Epoch: 6   Global Step: 64210   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:50:59,048-Speed 5985.19 samples/sec   Loss 10.6280   LearningRate 0.2353   Epoch: 6   Global Step: 64220   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:51:05,891-Speed 5986.56 samples/sec   Loss 10.6639   LearningRate 0.2353   Epoch: 6   Global Step: 64230   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:51:12,737-Speed 5984.75 samples/sec   Loss 10.6805   LearningRate 0.2353   Epoch: 6   Global Step: 64240   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:51:19,582-Speed 5984.68 samples/sec   Loss 10.6685   LearningRate 0.2352   Epoch: 6   Global Step: 64250   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:51:26,449-Speed 5965.43 samples/sec   Loss 10.6428   LearningRate 0.2352   Epoch: 6   Global Step: 64260   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:51:33,304-Speed 5976.51 samples/sec   Loss 10.7061   LearningRate 0.2352   Epoch: 6   Global Step: 64270   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:51:40,162-Speed 5973.65 samples/sec   Loss 10.5891   LearningRate 0.2351   Epoch: 6   Global Step: 64280   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:51:47,035-Speed 5960.65 samples/sec   Loss 10.7539   LearningRate 0.2351   Epoch: 6   Global Step: 64290   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:51:53,889-Speed 5977.16 samples/sec   Loss 10.6776   LearningRate 0.2351   Epoch: 6   Global Step: 64300   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:52:00,736-Speed 5982.75 samples/sec   Loss 10.6301   LearningRate 0.2350   Epoch: 6   Global Step: 64310   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:52:07,593-Speed 5974.72 samples/sec   Loss 10.6235   LearningRate 0.2350   Epoch: 6   Global Step: 64320   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:52:14,460-Speed 5966.14 samples/sec   Loss 10.6286   LearningRate 0.2350   Epoch: 6   Global Step: 64330   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:52:21,310-Speed 5980.05 samples/sec   Loss 10.7118   LearningRate 0.2349   Epoch: 6   Global Step: 64340   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:52:28,160-Speed 5980.39 samples/sec   Loss 10.6215   LearningRate 0.2349   Epoch: 6   Global Step: 64350   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:52:35,012-Speed 5979.02 samples/sec   Loss 10.5619   LearningRate 0.2349   Epoch: 6   Global Step: 64360   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:52:41,878-Speed 5966.45 samples/sec   Loss 10.6354   LearningRate 0.2348   Epoch: 6   Global Step: 64370   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:52:48,728-Speed 5980.80 samples/sec   Loss 10.7103   LearningRate 0.2348   Epoch: 6   Global Step: 64380   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:52:55,579-Speed 5982.86 samples/sec   Loss 10.7838   LearningRate 0.2348   Epoch: 6   Global Step: 64390   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:53:02,423-Speed 5986.26 samples/sec   Loss 10.7049   LearningRate 0.2347   Epoch: 6   Global Step: 64400   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:53:09,261-Speed 5992.05 samples/sec   Loss 10.6094   LearningRate 0.2347   Epoch: 6   Global Step: 64410   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:53:16,103-Speed 5987.01 samples/sec   Loss 10.6303   LearningRate 0.2347   Epoch: 6   Global Step: 64420   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:53:22,966-Speed 5971.14 samples/sec   Loss 10.6094   LearningRate 0.2346   Epoch: 6   Global Step: 64430   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:53:29,819-Speed 5977.73 samples/sec   Loss 10.6978   LearningRate 0.2346   Epoch: 6   Global Step: 64440   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:53:36,679-Speed 5971.96 samples/sec   Loss 10.6417   LearningRate 0.2346   Epoch: 6   Global Step: 64450   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:53:43,543-Speed 5968.20 samples/sec   Loss 10.5266   LearningRate 0.2345   Epoch: 6   Global Step: 64460   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:53:50,394-Speed 5980.25 samples/sec   Loss 10.6794   LearningRate 0.2345   Epoch: 6   Global Step: 64470   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:53:57,244-Speed 5980.95 samples/sec   Loss 10.5758   LearningRate 0.2345   Epoch: 6   Global Step: 64480   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:54:04,090-Speed 5983.76 samples/sec   Loss 10.6105   LearningRate 0.2344   Epoch: 6   Global Step: 64490   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:54:10,938-Speed 5985.42 samples/sec   Loss 10.7009   LearningRate 0.2344   Epoch: 6   Global Step: 64500   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:54:17,782-Speed 5985.88 samples/sec   Loss 10.6498   LearningRate 0.2344   Epoch: 6   Global Step: 64510   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:54:24,625-Speed 5987.30 samples/sec   Loss 10.6126   LearningRate 0.2343   Epoch: 6   Global Step: 64520   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:54:31,511-Speed 5950.96 samples/sec   Loss 10.6700   LearningRate 0.2343   Epoch: 6   Global Step: 64530   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:54:38,408-Speed 5939.22 samples/sec   Loss 10.6585   LearningRate 0.2343   Epoch: 6   Global Step: 64540   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:54:45,281-Speed 5963.66 samples/sec   Loss 10.6039   LearningRate 0.2343   Epoch: 6   Global Step: 64550   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:54:52,147-Speed 5967.41 samples/sec   Loss 10.6602   LearningRate 0.2342   Epoch: 6   Global Step: 64560   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:54:59,028-Speed 5953.76 samples/sec   Loss 10.6169   LearningRate 0.2342   Epoch: 6   Global Step: 64570   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:55:05,941-Speed 5926.09 samples/sec   Loss 10.6216   LearningRate 0.2342   Epoch: 6   Global Step: 64580   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:55:12,833-Speed 5944.54 samples/sec   Loss 10.6851   LearningRate 0.2341   Epoch: 6   Global Step: 64590   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:55:19,683-Speed 5982.55 samples/sec   Loss 10.6353   LearningRate 0.2341   Epoch: 6   Global Step: 64600   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:55:26,545-Speed 5972.72 samples/sec   Loss 10.7280   LearningRate 0.2341   Epoch: 6   Global Step: 64610   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:55:33,388-Speed 5986.89 samples/sec   Loss 10.6563   LearningRate 0.2340   Epoch: 6   Global Step: 64620   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:55:40,251-Speed 5969.14 samples/sec   Loss 10.6215   LearningRate 0.2340   Epoch: 6   Global Step: 64630   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:55:47,129-Speed 5956.88 samples/sec   Loss 10.6855   LearningRate 0.2340   Epoch: 6   Global Step: 64640   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:55:53,992-Speed 5969.64 samples/sec   Loss 10.6452   LearningRate 0.2339   Epoch: 6   Global Step: 64650   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:56:00,857-Speed 5967.34 samples/sec   Loss 10.5474   LearningRate 0.2339   Epoch: 6   Global Step: 64660   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:56:07,718-Speed 5971.05 samples/sec   Loss 10.6726   LearningRate 0.2339   Epoch: 6   Global Step: 64670   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:56:14,593-Speed 5959.69 samples/sec   Loss 10.6221   LearningRate 0.2338   Epoch: 6   Global Step: 64680   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:56:21,440-Speed 5983.68 samples/sec   Loss 10.6739   LearningRate 0.2338   Epoch: 6   Global Step: 64690   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:56:28,326-Speed 5949.19 samples/sec   Loss 10.6654   LearningRate 0.2338   Epoch: 6   Global Step: 64700   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:56:35,201-Speed 5959.31 samples/sec   Loss 10.6559   LearningRate 0.2337   Epoch: 6   Global Step: 64710   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:56:42,056-Speed 5976.47 samples/sec   Loss 10.6939   LearningRate 0.2337   Epoch: 6   Global Step: 64720   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:56:48,919-Speed 5968.96 samples/sec   Loss 10.6353   LearningRate 0.2337   Epoch: 6   Global Step: 64730   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:56:55,775-Speed 5975.70 samples/sec   Loss 10.6115   LearningRate 0.2336   Epoch: 6   Global Step: 64740   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:57:02,641-Speed 5966.76 samples/sec   Loss 10.6626   LearningRate 0.2336   Epoch: 6   Global Step: 64750   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:57:09,510-Speed 5964.74 samples/sec   Loss 10.7294   LearningRate 0.2336   Epoch: 6   Global Step: 64760   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:57:16,369-Speed 5975.92 samples/sec   Loss 10.6292   LearningRate 0.2335   Epoch: 6   Global Step: 64770   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:57:23,224-Speed 5976.33 samples/sec   Loss 10.6701   LearningRate 0.2335   Epoch: 6   Global Step: 64780   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:57:30,078-Speed 5976.91 samples/sec   Loss 10.6445   LearningRate 0.2335   Epoch: 6   Global Step: 64790   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:57:36,951-Speed 5961.11 samples/sec   Loss 10.6580   LearningRate 0.2334   Epoch: 6   Global Step: 64800   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:57:43,800-Speed 5980.73 samples/sec   Loss 10.7117   LearningRate 0.2334   Epoch: 6   Global Step: 64810   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:57:50,673-Speed 5960.56 samples/sec   Loss 10.6103   LearningRate 0.2334   Epoch: 6   Global Step: 64820   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:57:57,619-Speed 5898.45 samples/sec   Loss 10.5697   LearningRate 0.2333   Epoch: 6   Global Step: 64830   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:58:04,470-Speed 5979.19 samples/sec   Loss 10.6268   LearningRate 0.2333   Epoch: 6   Global Step: 64840   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:58:11,326-Speed 5975.58 samples/sec   Loss 10.5985   LearningRate 0.2333   Epoch: 6   Global Step: 64850   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:58:18,177-Speed 5980.19 samples/sec   Loss 10.6333   LearningRate 0.2332   Epoch: 6   Global Step: 64860   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:58:25,045-Speed 5965.45 samples/sec   Loss 10.5853   LearningRate 0.2332   Epoch: 6   Global Step: 64870   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:58:31,973-Speed 5913.43 samples/sec   Loss 10.6578   LearningRate 0.2332   Epoch: 6   Global Step: 64880   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:58:38,910-Speed 5905.47 samples/sec   Loss 10.6018   LearningRate 0.2331   Epoch: 6   Global Step: 64890   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:58:46,646-Speed 5295.69 samples/sec   Loss 10.6330   LearningRate 0.2331   Epoch: 6   Global Step: 64900   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:58:53,493-Speed 5983.71 samples/sec   Loss 10.5792   LearningRate 0.2331   Epoch: 6   Global Step: 64910   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:59:00,359-Speed 5966.48 samples/sec   Loss 10.6072   LearningRate 0.2330   Epoch: 6   Global Step: 64920   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:59:07,205-Speed 5983.88 samples/sec   Loss 10.5957   LearningRate 0.2330   Epoch: 6   Global Step: 64930   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:59:14,079-Speed 5959.91 samples/sec   Loss 10.5834   LearningRate 0.2330   Epoch: 6   Global Step: 64940   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:59:20,921-Speed 5988.17 samples/sec   Loss 10.5960   LearningRate 0.2329   Epoch: 6   Global Step: 64950   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:59:27,784-Speed 5969.84 samples/sec   Loss 10.6321   LearningRate 0.2329   Epoch: 6   Global Step: 64960   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:59:34,760-Speed 5872.95 samples/sec   Loss 10.7057   LearningRate 0.2329   Epoch: 6   Global Step: 64970   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:59:41,631-Speed 5962.31 samples/sec   Loss 10.6494   LearningRate 0.2328   Epoch: 6   Global Step: 64980   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 08:59:48,492-Speed 5969.90 samples/sec   Loss 10.5446   LearningRate 0.2328   Epoch: 6   Global Step: 64990   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 08:59:55,349-Speed 5975.39 samples/sec   Loss 10.5977   LearningRate 0.2328   Epoch: 6   Global Step: 65000   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:00:21,977-[lfw][65000]XNorm: 22.888245
Training: 2022-01-08 09:00:21,977-[lfw][65000]Accuracy-Flip: 0.99617+-0.00279
Training: 2022-01-08 09:00:21,978-[lfw][65000]Accuracy-Highest: 0.99700
Training: 2022-01-08 09:00:52,799-[cfp_fp][65000]XNorm: 20.112521
Training: 2022-01-08 09:00:52,800-[cfp_fp][65000]Accuracy-Flip: 0.97129+-0.00893
Training: 2022-01-08 09:00:52,801-[cfp_fp][65000]Accuracy-Highest: 0.97686
Training: 2022-01-08 09:01:19,461-[agedb_30][65000]XNorm: 22.095434
Training: 2022-01-08 09:01:19,462-[agedb_30][65000]Accuracy-Flip: 0.96633+-0.00777
Training: 2022-01-08 09:01:19,463-[agedb_30][65000]Accuracy-Highest: 0.96633
Training: 2022-01-08 09:01:26,344-Speed 450.14 samples/sec   Loss 10.6133   LearningRate 0.2327   Epoch: 6   Global Step: 65010   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:01:33,188-Speed 5986.94 samples/sec   Loss 10.6660   LearningRate 0.2327   Epoch: 6   Global Step: 65020   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:01:40,062-Speed 5959.95 samples/sec   Loss 10.6214   LearningRate 0.2327   Epoch: 6   Global Step: 65030   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:01:46,917-Speed 5976.37 samples/sec   Loss 10.5562   LearningRate 0.2326   Epoch: 6   Global Step: 65040   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:01:53,783-Speed 5966.54 samples/sec   Loss 10.5842   LearningRate 0.2326   Epoch: 6   Global Step: 65050   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:02:00,668-Speed 5950.57 samples/sec   Loss 10.6301   LearningRate 0.2326   Epoch: 6   Global Step: 65060   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:02:07,538-Speed 5963.59 samples/sec   Loss 10.6431   LearningRate 0.2325   Epoch: 6   Global Step: 65070   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:02:14,415-Speed 5956.80 samples/sec   Loss 10.6042   LearningRate 0.2325   Epoch: 6   Global Step: 65080   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:02:21,325-Speed 5929.27 samples/sec   Loss 10.6584   LearningRate 0.2325   Epoch: 6   Global Step: 65090   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:02:28,268-Speed 5900.60 samples/sec   Loss 10.6332   LearningRate 0.2324   Epoch: 6   Global Step: 65100   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:02:35,157-Speed 5947.08 samples/sec   Loss 10.6278   LearningRate 0.2324   Epoch: 6   Global Step: 65110   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:02:42,037-Speed 5954.18 samples/sec   Loss 10.6548   LearningRate 0.2324   Epoch: 6   Global Step: 65120   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:02:48,890-Speed 5977.36 samples/sec   Loss 10.5392   LearningRate 0.2324   Epoch: 6   Global Step: 65130   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:02:55,752-Speed 5971.03 samples/sec   Loss 10.6050   LearningRate 0.2323   Epoch: 6   Global Step: 65140   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:03:02,610-Speed 5975.90 samples/sec   Loss 10.5267   LearningRate 0.2323   Epoch: 6   Global Step: 65150   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:03:09,459-Speed 5981.43 samples/sec   Loss 10.6128   LearningRate 0.2323   Epoch: 6   Global Step: 65160   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:03:16,327-Speed 5965.24 samples/sec   Loss 10.6263   LearningRate 0.2322   Epoch: 6   Global Step: 65170   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:03:23,204-Speed 5959.51 samples/sec   Loss 10.6447   LearningRate 0.2322   Epoch: 6   Global Step: 65180   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:03:30,051-Speed 5983.62 samples/sec   Loss 10.6914   LearningRate 0.2322   Epoch: 6   Global Step: 65190   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:03:36,922-Speed 5961.86 samples/sec   Loss 10.6459   LearningRate 0.2321   Epoch: 6   Global Step: 65200   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:03:43,804-Speed 5955.80 samples/sec   Loss 10.5656   LearningRate 0.2321   Epoch: 6   Global Step: 65210   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:03:50,666-Speed 5970.12 samples/sec   Loss 10.6039   LearningRate 0.2321   Epoch: 6   Global Step: 65220   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:03:57,519-Speed 5977.93 samples/sec   Loss 10.6253   LearningRate 0.2320   Epoch: 6   Global Step: 65230   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:04:04,374-Speed 5976.09 samples/sec   Loss 10.6174   LearningRate 0.2320   Epoch: 6   Global Step: 65240   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:04:11,225-Speed 5979.89 samples/sec   Loss 10.6464   LearningRate 0.2320   Epoch: 6   Global Step: 65250   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:04:18,081-Speed 5975.60 samples/sec   Loss 10.6118   LearningRate 0.2319   Epoch: 6   Global Step: 65260   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:04:24,944-Speed 5970.23 samples/sec   Loss 10.6076   LearningRate 0.2319   Epoch: 6   Global Step: 65270   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:04:31,796-Speed 5978.75 samples/sec   Loss 10.5401   LearningRate 0.2319   Epoch: 6   Global Step: 65280   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:04:38,663-Speed 5966.55 samples/sec   Loss 10.6075   LearningRate 0.2318   Epoch: 6   Global Step: 65290   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:04:45,539-Speed 5958.01 samples/sec   Loss 10.6178   LearningRate 0.2318   Epoch: 6   Global Step: 65300   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:04:52,407-Speed 5964.44 samples/sec   Loss 10.5755   LearningRate 0.2318   Epoch: 6   Global Step: 65310   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:04:59,271-Speed 5969.03 samples/sec   Loss 10.5894   LearningRate 0.2317   Epoch: 6   Global Step: 65320   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:05:06,149-Speed 5956.47 samples/sec   Loss 10.6551   LearningRate 0.2317   Epoch: 6   Global Step: 65330   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:05:13,030-Speed 5954.18 samples/sec   Loss 10.7148   LearningRate 0.2317   Epoch: 6   Global Step: 65340   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:05:19,895-Speed 5967.64 samples/sec   Loss 10.5804   LearningRate 0.2316   Epoch: 6   Global Step: 65350   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:05:26,749-Speed 5977.55 samples/sec   Loss 10.5976   LearningRate 0.2316   Epoch: 6   Global Step: 65360   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:05:33,612-Speed 5969.25 samples/sec   Loss 10.5840   LearningRate 0.2316   Epoch: 6   Global Step: 65370   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:05:40,464-Speed 5981.96 samples/sec   Loss 10.5484   LearningRate 0.2315   Epoch: 6   Global Step: 65380   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:05:47,322-Speed 5972.87 samples/sec   Loss 10.5813   LearningRate 0.2315   Epoch: 6   Global Step: 65390   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:05:54,273-Speed 5894.11 samples/sec   Loss 10.5782   LearningRate 0.2315   Epoch: 6   Global Step: 65400   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:06:01,240-Speed 5880.98 samples/sec   Loss 10.5766   LearningRate 0.2314   Epoch: 6   Global Step: 65410   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:06:08,187-Speed 5896.90 samples/sec   Loss 10.6271   LearningRate 0.2314   Epoch: 6   Global Step: 65420   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:06:15,129-Speed 5901.98 samples/sec   Loss 10.6035   LearningRate 0.2314   Epoch: 6   Global Step: 65430   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:06:22,049-Speed 5920.17 samples/sec   Loss 10.6003   LearningRate 0.2313   Epoch: 6   Global Step: 65440   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:06:28,968-Speed 5922.41 samples/sec   Loss 10.6754   LearningRate 0.2313   Epoch: 6   Global Step: 65450   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:06:35,821-Speed 5981.47 samples/sec   Loss 10.5832   LearningRate 0.2313   Epoch: 6   Global Step: 65460   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:06:42,671-Speed 5980.16 samples/sec   Loss 10.5582   LearningRate 0.2312   Epoch: 6   Global Step: 65470   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:06:49,530-Speed 5972.64 samples/sec   Loss 10.5422   LearningRate 0.2312   Epoch: 6   Global Step: 65480   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:06:56,394-Speed 5968.44 samples/sec   Loss 10.6379   LearningRate 0.2312   Epoch: 6   Global Step: 65490   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:07:03,275-Speed 5953.61 samples/sec   Loss 10.5908   LearningRate 0.2311   Epoch: 6   Global Step: 65500   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:07:10,151-Speed 5961.24 samples/sec   Loss 10.5224   LearningRate 0.2311   Epoch: 6   Global Step: 65510   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:07:16,996-Speed 5984.43 samples/sec   Loss 10.5992   LearningRate 0.2311   Epoch: 6   Global Step: 65520   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:07:23,852-Speed 5975.52 samples/sec   Loss 10.6024   LearningRate 0.2310   Epoch: 6   Global Step: 65530   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:07:30,705-Speed 5978.18 samples/sec   Loss 10.5765   LearningRate 0.2310   Epoch: 6   Global Step: 65540   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:07:37,572-Speed 5966.74 samples/sec   Loss 10.5534   LearningRate 0.2310   Epoch: 6   Global Step: 65550   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:07:44,442-Speed 5965.70 samples/sec   Loss 10.5645   LearningRate 0.2309   Epoch: 6   Global Step: 65560   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:07:51,302-Speed 5971.63 samples/sec   Loss 10.6355   LearningRate 0.2309   Epoch: 6   Global Step: 65570   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:07:58,170-Speed 5965.26 samples/sec   Loss 10.4668   LearningRate 0.2309   Epoch: 6   Global Step: 65580   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:08:05,049-Speed 5955.41 samples/sec   Loss 10.5975   LearningRate 0.2309   Epoch: 6   Global Step: 65590   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:08:11,917-Speed 5964.88 samples/sec   Loss 10.5549   LearningRate 0.2308   Epoch: 6   Global Step: 65600   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:08:18,800-Speed 5952.46 samples/sec   Loss 10.5455   LearningRate 0.2308   Epoch: 6   Global Step: 65610   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:08:25,652-Speed 5979.17 samples/sec   Loss 10.5467   LearningRate 0.2308   Epoch: 6   Global Step: 65620   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:08:32,529-Speed 5956.24 samples/sec   Loss 10.5363   LearningRate 0.2307   Epoch: 6   Global Step: 65630   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:08:39,394-Speed 5968.17 samples/sec   Loss 10.5298   LearningRate 0.2307   Epoch: 6   Global Step: 65640   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:08:46,251-Speed 5975.06 samples/sec   Loss 10.4721   LearningRate 0.2307   Epoch: 6   Global Step: 65650   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:08:53,115-Speed 5968.03 samples/sec   Loss 10.6230   LearningRate 0.2306   Epoch: 6   Global Step: 65660   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:08:59,969-Speed 5976.81 samples/sec   Loss 10.5874   LearningRate 0.2306   Epoch: 6   Global Step: 65670   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:09:06,825-Speed 5975.91 samples/sec   Loss 10.5905   LearningRate 0.2306   Epoch: 6   Global Step: 65680   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:09:13,690-Speed 5967.24 samples/sec   Loss 10.5595   LearningRate 0.2305   Epoch: 6   Global Step: 65690   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:09:20,542-Speed 5979.42 samples/sec   Loss 10.5446   LearningRate 0.2305   Epoch: 6   Global Step: 65700   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:09:27,411-Speed 5964.12 samples/sec   Loss 10.6220   LearningRate 0.2305   Epoch: 6   Global Step: 65710   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:09:34,266-Speed 5975.71 samples/sec   Loss 10.5656   LearningRate 0.2304   Epoch: 6   Global Step: 65720   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:09:41,133-Speed 5965.86 samples/sec   Loss 10.5175   LearningRate 0.2304   Epoch: 6   Global Step: 65730   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:09:48,017-Speed 5952.14 samples/sec   Loss 10.5752   LearningRate 0.2304   Epoch: 6   Global Step: 65740   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:09:54,900-Speed 5951.24 samples/sec   Loss 10.5985   LearningRate 0.2303   Epoch: 6   Global Step: 65750   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:10:01,757-Speed 5974.89 samples/sec   Loss 10.6393   LearningRate 0.2303   Epoch: 6   Global Step: 65760   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:10:08,618-Speed 5971.02 samples/sec   Loss 10.6276   LearningRate 0.2303   Epoch: 6   Global Step: 65770   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:10:15,472-Speed 5976.63 samples/sec   Loss 10.5937   LearningRate 0.2302   Epoch: 6   Global Step: 65780   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:10:22,354-Speed 5954.93 samples/sec   Loss 10.5380   LearningRate 0.2302   Epoch: 6   Global Step: 65790   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:10:29,239-Speed 5950.04 samples/sec   Loss 10.5954   LearningRate 0.2302   Epoch: 6   Global Step: 65800   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:10:36,084-Speed 5984.97 samples/sec   Loss 10.5497   LearningRate 0.2301   Epoch: 6   Global Step: 65810   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:10:42,939-Speed 5976.52 samples/sec   Loss 10.4875   LearningRate 0.2301   Epoch: 6   Global Step: 65820   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:10:49,794-Speed 5979.55 samples/sec   Loss 10.5323   LearningRate 0.2301   Epoch: 6   Global Step: 65830   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:10:56,669-Speed 5958.44 samples/sec   Loss 10.6024   LearningRate 0.2300   Epoch: 6   Global Step: 65840   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:11:03,520-Speed 5979.70 samples/sec   Loss 10.5345   LearningRate 0.2300   Epoch: 6   Global Step: 65850   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:11:10,384-Speed 5968.69 samples/sec   Loss 10.5635   LearningRate 0.2300   Epoch: 6   Global Step: 65860   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:11:17,278-Speed 5942.76 samples/sec   Loss 10.5068   LearningRate 0.2299   Epoch: 6   Global Step: 65870   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:11:24,135-Speed 5974.74 samples/sec   Loss 10.5438   LearningRate 0.2299   Epoch: 6   Global Step: 65880   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:11:30,989-Speed 5976.98 samples/sec   Loss 10.5270   LearningRate 0.2299   Epoch: 6   Global Step: 65890   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:11:37,861-Speed 5960.89 samples/sec   Loss 10.5806   LearningRate 0.2298   Epoch: 6   Global Step: 65900   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:11:44,736-Speed 5959.44 samples/sec   Loss 10.5364   LearningRate 0.2298   Epoch: 6   Global Step: 65910   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:11:51,604-Speed 5967.35 samples/sec   Loss 10.5248   LearningRate 0.2298   Epoch: 6   Global Step: 65920   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:11:58,452-Speed 5982.07 samples/sec   Loss 10.5560   LearningRate 0.2297   Epoch: 6   Global Step: 65930   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 09:12:05,328-Speed 5958.83 samples/sec   Loss 10.5737   LearningRate 0.2297   Epoch: 6   Global Step: 65940   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 09:12:12,201-Speed 5961.32 samples/sec   Loss 10.6359   LearningRate 0.2297   Epoch: 6   Global Step: 65950   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 09:12:19,062-Speed 5971.48 samples/sec   Loss 10.5676   LearningRate 0.2296   Epoch: 6   Global Step: 65960   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 09:12:25,942-Speed 5954.52 samples/sec   Loss 10.5909   LearningRate 0.2296   Epoch: 6   Global Step: 65970   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 09:12:32,806-Speed 5968.50 samples/sec   Loss 10.4957   LearningRate 0.2296   Epoch: 6   Global Step: 65980   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 09:12:39,677-Speed 5962.34 samples/sec   Loss 10.5451   LearningRate 0.2296   Epoch: 6   Global Step: 65990   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 09:12:46,537-Speed 5972.44 samples/sec   Loss 10.5075   LearningRate 0.2295   Epoch: 6   Global Step: 66000   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 09:12:53,429-Speed 5944.27 samples/sec   Loss 10.5002   LearningRate 0.2295   Epoch: 6   Global Step: 66010   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 09:13:00,288-Speed 5972.12 samples/sec   Loss 10.4738   LearningRate 0.2295   Epoch: 6   Global Step: 66020   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 09:13:07,183-Speed 5941.53 samples/sec   Loss 10.5184   LearningRate 0.2294   Epoch: 6   Global Step: 66030   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:13:14,043-Speed 5972.14 samples/sec   Loss 10.5228   LearningRate 0.2294   Epoch: 6   Global Step: 66040   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:13:20,939-Speed 5941.05 samples/sec   Loss 10.4903   LearningRate 0.2294   Epoch: 6   Global Step: 66050   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:13:27,820-Speed 5953.95 samples/sec   Loss 10.5282   LearningRate 0.2293   Epoch: 6   Global Step: 66060   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:13:34,668-Speed 5982.34 samples/sec   Loss 10.4852   LearningRate 0.2293   Epoch: 6   Global Step: 66070   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:13:41,536-Speed 5965.26 samples/sec   Loss 10.5411   LearningRate 0.2293   Epoch: 6   Global Step: 66080   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:13:48,413-Speed 5957.19 samples/sec   Loss 10.6155   LearningRate 0.2292   Epoch: 6   Global Step: 66090   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:13:55,357-Speed 5899.58 samples/sec   Loss 10.5497   LearningRate 0.2292   Epoch: 6   Global Step: 66100   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:14:02,216-Speed 5972.88 samples/sec   Loss 10.4992   LearningRate 0.2292   Epoch: 6   Global Step: 66110   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:14:09,092-Speed 5958.13 samples/sec   Loss 10.5346   LearningRate 0.2291   Epoch: 6   Global Step: 66120   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:14:15,960-Speed 5965.73 samples/sec   Loss 10.5364   LearningRate 0.2291   Epoch: 6   Global Step: 66130   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:14:22,873-Speed 5926.19 samples/sec   Loss 10.5425   LearningRate 0.2291   Epoch: 6   Global Step: 66140   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:14:29,719-Speed 5983.50 samples/sec   Loss 10.5482   LearningRate 0.2290   Epoch: 6   Global Step: 66150   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:14:36,607-Speed 5948.24 samples/sec   Loss 10.5481   LearningRate 0.2290   Epoch: 6   Global Step: 66160   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:14:43,477-Speed 5962.81 samples/sec   Loss 10.4913   LearningRate 0.2290   Epoch: 6   Global Step: 66170   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:14:50,350-Speed 5961.49 samples/sec   Loss 10.4771   LearningRate 0.2289   Epoch: 6   Global Step: 66180   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:14:57,215-Speed 5966.96 samples/sec   Loss 10.5271   LearningRate 0.2289   Epoch: 6   Global Step: 66190   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:15:04,069-Speed 5977.19 samples/sec   Loss 10.5257   LearningRate 0.2289   Epoch: 6   Global Step: 66200   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:15:10,926-Speed 5985.90 samples/sec   Loss 10.5441   LearningRate 0.2288   Epoch: 6   Global Step: 66210   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:15:17,773-Speed 5983.64 samples/sec   Loss 10.4649   LearningRate 0.2288   Epoch: 6   Global Step: 66220   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:15:24,628-Speed 5975.59 samples/sec   Loss 10.5454   LearningRate 0.2288   Epoch: 6   Global Step: 66230   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:15:31,499-Speed 5964.03 samples/sec   Loss 10.5737   LearningRate 0.2287   Epoch: 6   Global Step: 66240   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:15:38,369-Speed 5963.45 samples/sec   Loss 10.5359   LearningRate 0.2287   Epoch: 6   Global Step: 66250   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:15:45,225-Speed 5974.93 samples/sec   Loss 10.4769   LearningRate 0.2287   Epoch: 6   Global Step: 66260   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:15:52,073-Speed 5982.41 samples/sec   Loss 10.5121   LearningRate 0.2286   Epoch: 6   Global Step: 66270   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:15:58,949-Speed 5958.20 samples/sec   Loss 10.5588   LearningRate 0.2286   Epoch: 6   Global Step: 66280   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:16:05,803-Speed 5976.78 samples/sec   Loss 10.5326   LearningRate 0.2286   Epoch: 6   Global Step: 66290   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:16:12,662-Speed 5972.39 samples/sec   Loss 10.5411   LearningRate 0.2285   Epoch: 6   Global Step: 66300   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:16:19,534-Speed 5961.63 samples/sec   Loss 10.5517   LearningRate 0.2285   Epoch: 6   Global Step: 66310   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:16:26,415-Speed 5953.97 samples/sec   Loss 10.4870   LearningRate 0.2285   Epoch: 6   Global Step: 66320   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:16:33,327-Speed 5926.60 samples/sec   Loss 10.4995   LearningRate 0.2284   Epoch: 6   Global Step: 66330   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:16:40,199-Speed 5961.69 samples/sec   Loss 10.5185   LearningRate 0.2284   Epoch: 6   Global Step: 66340   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:16:47,054-Speed 5980.43 samples/sec   Loss 10.5102   LearningRate 0.2284   Epoch: 6   Global Step: 66350   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:16:53,893-Speed 5989.87 samples/sec   Loss 10.4629   LearningRate 0.2284   Epoch: 6   Global Step: 66360   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:17:00,750-Speed 5976.21 samples/sec   Loss 10.5549   LearningRate 0.2283   Epoch: 6   Global Step: 66370   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:17:07,627-Speed 5956.62 samples/sec   Loss 10.4156   LearningRate 0.2283   Epoch: 6   Global Step: 66380   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:17:14,490-Speed 5969.60 samples/sec   Loss 10.5524   LearningRate 0.2283   Epoch: 6   Global Step: 66390   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:17:21,353-Speed 5969.14 samples/sec   Loss 10.4900   LearningRate 0.2282   Epoch: 6   Global Step: 66400   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:17:28,215-Speed 5981.51 samples/sec   Loss 10.4578   LearningRate 0.2282   Epoch: 6   Global Step: 66410   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:17:35,064-Speed 5981.22 samples/sec   Loss 10.5398   LearningRate 0.2282   Epoch: 6   Global Step: 66420   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:17:41,946-Speed 5952.54 samples/sec   Loss 10.5005   LearningRate 0.2281   Epoch: 6   Global Step: 66430   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:17:48,816-Speed 5963.23 samples/sec   Loss 10.5139   LearningRate 0.2281   Epoch: 6   Global Step: 66440   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:17:55,666-Speed 5980.90 samples/sec   Loss 10.5128   LearningRate 0.2281   Epoch: 6   Global Step: 66450   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:18:02,520-Speed 5976.76 samples/sec   Loss 10.4889   LearningRate 0.2280   Epoch: 6   Global Step: 66460   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:18:09,378-Speed 5974.29 samples/sec   Loss 10.5245   LearningRate 0.2280   Epoch: 6   Global Step: 66470   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:18:16,226-Speed 5981.10 samples/sec   Loss 10.4563   LearningRate 0.2280   Epoch: 6   Global Step: 66480   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:18:23,080-Speed 5977.29 samples/sec   Loss 10.5169   LearningRate 0.2279   Epoch: 6   Global Step: 66490   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:18:30,307-Speed 5668.89 samples/sec   Loss 10.5460   LearningRate 0.2279   Epoch: 6   Global Step: 66500   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:18:37,176-Speed 5964.60 samples/sec   Loss 10.5861   LearningRate 0.2279   Epoch: 6   Global Step: 66510   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 09:18:44,037-Speed 5971.12 samples/sec   Loss 10.4917   LearningRate 0.2278   Epoch: 6   Global Step: 66520   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 09:18:50,918-Speed 5954.61 samples/sec   Loss 10.3961   LearningRate 0.2278   Epoch: 6   Global Step: 66530   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 09:18:57,797-Speed 5955.78 samples/sec   Loss 10.4971   LearningRate 0.2278   Epoch: 6   Global Step: 66540   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 09:19:04,651-Speed 5977.15 samples/sec   Loss 10.5283   LearningRate 0.2277   Epoch: 6   Global Step: 66550   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 09:19:11,502-Speed 5980.07 samples/sec   Loss 10.6117   LearningRate 0.2277   Epoch: 6   Global Step: 66560   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 09:19:18,366-Speed 5970.39 samples/sec   Loss 10.5263   LearningRate 0.2277   Epoch: 6   Global Step: 66570   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 09:19:25,226-Speed 5972.55 samples/sec   Loss 10.4839   LearningRate 0.2276   Epoch: 6   Global Step: 66580   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 09:19:32,077-Speed 5979.84 samples/sec   Loss 10.3837   LearningRate 0.2276   Epoch: 6   Global Step: 66590   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 09:19:38,955-Speed 5956.29 samples/sec   Loss 10.4422   LearningRate 0.2276   Epoch: 6   Global Step: 66600   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-01-08 09:19:45,940-Speed 5864.71 samples/sec   Loss 10.5011   LearningRate 0.2275   Epoch: 6   Global Step: 66610   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:19:52,823-Speed 5954.26 samples/sec   Loss 10.5155   LearningRate 0.2275   Epoch: 6   Global Step: 66620   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:19:59,689-Speed 5966.67 samples/sec   Loss 10.5735   LearningRate 0.2275   Epoch: 6   Global Step: 66630   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:20:06,560-Speed 5962.29 samples/sec   Loss 10.5229   LearningRate 0.2274   Epoch: 6   Global Step: 66640   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:20:13,436-Speed 5959.00 samples/sec   Loss 10.4780   LearningRate 0.2274   Epoch: 6   Global Step: 66650   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:20:20,329-Speed 5943.56 samples/sec   Loss 10.4364   LearningRate 0.2274   Epoch: 6   Global Step: 66660   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:20:27,229-Speed 5937.52 samples/sec   Loss 10.4736   LearningRate 0.2273   Epoch: 6   Global Step: 66670   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:20:34,105-Speed 5958.97 samples/sec   Loss 10.5564   LearningRate 0.2273   Epoch: 6   Global Step: 66680   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:20:41,031-Speed 5914.98 samples/sec   Loss 10.5558   LearningRate 0.2273   Epoch: 6   Global Step: 66690   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:20:47,923-Speed 5943.87 samples/sec   Loss 10.4771   LearningRate 0.2273   Epoch: 6   Global Step: 66700   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:20:54,815-Speed 5944.63 samples/sec   Loss 10.5581   LearningRate 0.2272   Epoch: 6   Global Step: 66710   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:21:01,671-Speed 5975.40 samples/sec   Loss 10.4548   LearningRate 0.2272   Epoch: 6   Global Step: 66720   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:21:08,527-Speed 5975.85 samples/sec   Loss 10.5183   LearningRate 0.2272   Epoch: 6   Global Step: 66730   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:21:15,374-Speed 5983.25 samples/sec   Loss 10.4610   LearningRate 0.2271   Epoch: 6   Global Step: 66740   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:21:22,235-Speed 5971.57 samples/sec   Loss 10.5353   LearningRate 0.2271   Epoch: 6   Global Step: 66750   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:21:29,107-Speed 5964.30 samples/sec   Loss 10.5088   LearningRate 0.2271   Epoch: 6   Global Step: 66760   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:21:35,995-Speed 5948.51 samples/sec   Loss 10.4583   LearningRate 0.2270   Epoch: 6   Global Step: 66770   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:21:42,849-Speed 5976.72 samples/sec   Loss 10.4176   LearningRate 0.2270   Epoch: 6   Global Step: 66780   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:21:49,701-Speed 5979.39 samples/sec   Loss 10.4433   LearningRate 0.2270   Epoch: 6   Global Step: 66790   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:21:56,569-Speed 5965.20 samples/sec   Loss 10.5771   LearningRate 0.2269   Epoch: 6   Global Step: 66800   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:22:03,445-Speed 5957.37 samples/sec   Loss 10.5141   LearningRate 0.2269   Epoch: 6   Global Step: 66810   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:22:10,286-Speed 5988.85 samples/sec   Loss 10.4870   LearningRate 0.2269   Epoch: 6   Global Step: 66820   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:22:17,151-Speed 5967.52 samples/sec   Loss 10.4096   LearningRate 0.2268   Epoch: 6   Global Step: 66830   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:22:24,131-Speed 5869.56 samples/sec   Loss 10.4653   LearningRate 0.2268   Epoch: 6   Global Step: 66840   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:22:31,013-Speed 5954.34 samples/sec   Loss 10.4367   LearningRate 0.2268   Epoch: 6   Global Step: 66850   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:22:37,900-Speed 5949.16 samples/sec   Loss 10.5024   LearningRate 0.2267   Epoch: 6   Global Step: 66860   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:22:44,770-Speed 5962.64 samples/sec   Loss 10.4492   LearningRate 0.2267   Epoch: 6   Global Step: 66870   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:22:51,631-Speed 5972.07 samples/sec   Loss 10.5038   LearningRate 0.2267   Epoch: 6   Global Step: 66880   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:22:58,498-Speed 5965.31 samples/sec   Loss 10.4683   LearningRate 0.2266   Epoch: 6   Global Step: 66890   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:23:05,351-Speed 5978.16 samples/sec   Loss 10.3679   LearningRate 0.2266   Epoch: 6   Global Step: 66900   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:23:12,202-Speed 5981.34 samples/sec   Loss 10.4878   LearningRate 0.2266   Epoch: 6   Global Step: 66910   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:23:19,067-Speed 5967.91 samples/sec   Loss 10.4106   LearningRate 0.2265   Epoch: 6   Global Step: 66920   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:23:25,939-Speed 5961.46 samples/sec   Loss 10.4848   LearningRate 0.2265   Epoch: 6   Global Step: 66930   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:23:32,791-Speed 5978.63 samples/sec   Loss 10.5061   LearningRate 0.2265   Epoch: 6   Global Step: 66940   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:23:39,663-Speed 5962.03 samples/sec   Loss 10.4832   LearningRate 0.2264   Epoch: 6   Global Step: 66950   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:23:46,517-Speed 5976.52 samples/sec   Loss 10.5236   LearningRate 0.2264   Epoch: 6   Global Step: 66960   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:23:53,367-Speed 5980.27 samples/sec   Loss 10.4724   LearningRate 0.2264   Epoch: 6   Global Step: 66970   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:24:00,235-Speed 5965.11 samples/sec   Loss 10.4818   LearningRate 0.2263   Epoch: 6   Global Step: 66980   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:24:07,109-Speed 5960.13 samples/sec   Loss 10.4337   LearningRate 0.2263   Epoch: 6   Global Step: 66990   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:24:13,986-Speed 5957.56 samples/sec   Loss 10.4097   LearningRate 0.2263   Epoch: 6   Global Step: 67000   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:24:20,882-Speed 5942.76 samples/sec   Loss 10.4558   LearningRate 0.2263   Epoch: 6   Global Step: 67010   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:24:27,771-Speed 5946.92 samples/sec   Loss 10.4518   LearningRate 0.2262   Epoch: 6   Global Step: 67020   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:24:34,616-Speed 5985.04 samples/sec   Loss 10.4987   LearningRate 0.2262   Epoch: 6   Global Step: 67030   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:24:41,464-Speed 5982.66 samples/sec   Loss 10.4887   LearningRate 0.2262   Epoch: 6   Global Step: 67040   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:24:48,328-Speed 5968.52 samples/sec   Loss 10.5133   LearningRate 0.2261   Epoch: 6   Global Step: 67050   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-01-08 09:24:55,175-Speed 5983.65 samples/sec   Loss 10.4385   LearningRate 0.2261   Epoch: 6   Global Step: 67060   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:25:02,167-Speed 5860.78 samples/sec   Loss 10.4554   LearningRate 0.2261   Epoch: 6   Global Step: 67070   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:25:09,033-Speed 5966.51 samples/sec   Loss 10.4702   LearningRate 0.2260   Epoch: 6   Global Step: 67080   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:25:15,889-Speed 5976.99 samples/sec   Loss 10.4637   LearningRate 0.2260   Epoch: 6   Global Step: 67090   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:25:22,744-Speed 5976.81 samples/sec   Loss 10.5532   LearningRate 0.2260   Epoch: 6   Global Step: 67100   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-01-08 09:25:29,591-Speed 5982.42 samples/sec   Loss 10.4648   LearningRate 0.2259   Epoch: 6   Global Step: 67110   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:25:36,447-Speed 5975.44 samples/sec   Loss 10.3794   LearningRate 0.2259   Epoch: 6   Global Step: 67120   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:25:43,335-Speed 5948.12 samples/sec   Loss 10.4418   LearningRate 0.2259   Epoch: 6   Global Step: 67130   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:25:50,216-Speed 5953.47 samples/sec   Loss 10.4780   LearningRate 0.2258   Epoch: 6   Global Step: 67140   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:25:57,109-Speed 5943.91 samples/sec   Loss 10.3894   LearningRate 0.2258   Epoch: 6   Global Step: 67150   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:26:03,979-Speed 5963.63 samples/sec   Loss 10.5559   LearningRate 0.2258   Epoch: 6   Global Step: 67160   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:26:10,852-Speed 5960.45 samples/sec   Loss 10.5401   LearningRate 0.2257   Epoch: 6   Global Step: 67170   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:26:18,895-Speed 5093.56 samples/sec   Loss 10.4038   LearningRate 0.2257   Epoch: 6   Global Step: 67180   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:26:25,737-Speed 5988.47 samples/sec   Loss 10.4231   LearningRate 0.2257   Epoch: 6   Global Step: 67190   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:26:32,602-Speed 5967.16 samples/sec   Loss 10.4587   LearningRate 0.2256   Epoch: 6   Global Step: 67200   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:26:39,447-Speed 5985.65 samples/sec   Loss 10.4848   LearningRate 0.2256   Epoch: 6   Global Step: 67210   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:26:46,411-Speed 5882.61 samples/sec   Loss 10.4856   LearningRate 0.2256   Epoch: 6   Global Step: 67220   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:26:53,256-Speed 5985.90 samples/sec   Loss 10.4379   LearningRate 0.2255   Epoch: 6   Global Step: 67230   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:27:00,118-Speed 5969.93 samples/sec   Loss 10.4176   LearningRate 0.2255   Epoch: 6   Global Step: 67240   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:27:06,967-Speed 5981.83 samples/sec   Loss 10.3998   LearningRate 0.2255   Epoch: 6   Global Step: 67250   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:27:13,821-Speed 5976.55 samples/sec   Loss 10.3824   LearningRate 0.2254   Epoch: 6   Global Step: 67260   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:27:20,701-Speed 5954.85 samples/sec   Loss 10.3792   LearningRate 0.2254   Epoch: 6   Global Step: 67270   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:27:27,554-Speed 5978.74 samples/sec   Loss 10.4500   LearningRate 0.2254   Epoch: 6   Global Step: 67280   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:27:34,431-Speed 5956.40 samples/sec   Loss 10.4407   LearningRate 0.2253   Epoch: 6   Global Step: 67290   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:27:41,279-Speed 5982.76 samples/sec   Loss 10.3543   LearningRate 0.2253   Epoch: 6   Global Step: 67300   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:27:48,152-Speed 5963.53 samples/sec   Loss 10.4864   LearningRate 0.2253   Epoch: 6   Global Step: 67310   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:27:55,010-Speed 5974.72 samples/sec   Loss 10.4660   LearningRate 0.2253   Epoch: 6   Global Step: 67320   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:28:01,870-Speed 5971.64 samples/sec   Loss 10.4728   LearningRate 0.2252   Epoch: 6   Global Step: 67330   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:28:08,732-Speed 5970.54 samples/sec   Loss 10.5210   LearningRate 0.2252   Epoch: 6   Global Step: 67340   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:28:15,588-Speed 5976.21 samples/sec   Loss 10.4603   LearningRate 0.2252   Epoch: 6   Global Step: 67350   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:28:22,463-Speed 5961.08 samples/sec   Loss 10.4035   LearningRate 0.2251   Epoch: 6   Global Step: 67360   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:28:29,324-Speed 5970.35 samples/sec   Loss 10.4360   LearningRate 0.2251   Epoch: 6   Global Step: 67370   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:28:36,187-Speed 5971.80 samples/sec   Loss 10.4248   LearningRate 0.2251   Epoch: 6   Global Step: 67380   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:28:43,023-Speed 5992.15 samples/sec   Loss 10.4955   LearningRate 0.2250   Epoch: 6   Global Step: 67390   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:28:49,875-Speed 5981.52 samples/sec   Loss 10.3886   LearningRate 0.2250   Epoch: 6   Global Step: 67400   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:28:56,739-Speed 5969.42 samples/sec   Loss 10.4562   LearningRate 0.2250   Epoch: 6   Global Step: 67410   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:29:03,597-Speed 5974.00 samples/sec   Loss 10.4149   LearningRate 0.2249   Epoch: 6   Global Step: 67420   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:29:10,458-Speed 5970.77 samples/sec   Loss 10.4479   LearningRate 0.2249   Epoch: 6   Global Step: 67430   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:29:17,315-Speed 5974.88 samples/sec   Loss 10.4429   LearningRate 0.2249   Epoch: 6   Global Step: 67440   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:29:24,189-Speed 5960.12 samples/sec   Loss 10.3805   LearningRate 0.2248   Epoch: 6   Global Step: 67450   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:29:31,033-Speed 5985.68 samples/sec   Loss 10.4390   LearningRate 0.2248   Epoch: 6   Global Step: 67460   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:29:37,882-Speed 5981.52 samples/sec   Loss 10.4561   LearningRate 0.2248   Epoch: 6   Global Step: 67470   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:29:44,733-Speed 5980.27 samples/sec   Loss 10.4136   LearningRate 0.2247   Epoch: 6   Global Step: 67480   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:29:51,584-Speed 5979.56 samples/sec   Loss 10.4158   LearningRate 0.2247   Epoch: 6   Global Step: 67490   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:29:58,435-Speed 5979.53 samples/sec   Loss 10.4564   LearningRate 0.2247   Epoch: 6   Global Step: 67500   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:30:05,300-Speed 5969.37 samples/sec   Loss 10.4107   LearningRate 0.2246   Epoch: 6   Global Step: 67510   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:30:12,151-Speed 5978.91 samples/sec   Loss 10.3823   LearningRate 0.2246   Epoch: 6   Global Step: 67520   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:30:19,002-Speed 5980.27 samples/sec   Loss 10.3656   LearningRate 0.2246   Epoch: 6   Global Step: 67530   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:30:25,866-Speed 5968.63 samples/sec   Loss 10.4010   LearningRate 0.2245   Epoch: 6   Global Step: 67540   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:30:32,729-Speed 5969.32 samples/sec   Loss 10.4134   LearningRate 0.2245   Epoch: 6   Global Step: 67550   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:30:39,604-Speed 5961.53 samples/sec   Loss 10.4265   LearningRate 0.2245   Epoch: 6   Global Step: 67560   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:30:46,460-Speed 5976.02 samples/sec   Loss 10.3169   LearningRate 0.2244   Epoch: 6   Global Step: 67570   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:30:53,311-Speed 5979.20 samples/sec   Loss 10.4299   LearningRate 0.2244   Epoch: 6   Global Step: 67580   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:31:00,163-Speed 5979.38 samples/sec   Loss 10.3436   LearningRate 0.2244   Epoch: 6   Global Step: 67590   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 09:31:07,022-Speed 5973.03 samples/sec   Loss 10.4440   LearningRate 0.2244   Epoch: 6   Global Step: 67600   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 09:31:13,899-Speed 5956.96 samples/sec   Loss 10.4215   LearningRate 0.2243   Epoch: 6   Global Step: 67610   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 09:31:20,773-Speed 5960.54 samples/sec   Loss 10.3673   LearningRate 0.2243   Epoch: 6   Global Step: 67620   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 09:31:27,628-Speed 5976.37 samples/sec   Loss 10.4807   LearningRate 0.2243   Epoch: 6   Global Step: 67630   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 09:31:34,479-Speed 5979.78 samples/sec   Loss 10.5291   LearningRate 0.2242   Epoch: 6   Global Step: 67640   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 09:31:41,414-Speed 5907.06 samples/sec   Loss 10.3749   LearningRate 0.2242   Epoch: 6   Global Step: 67650   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 09:31:48,387-Speed 5875.44 samples/sec   Loss 10.4330   LearningRate 0.2242   Epoch: 6   Global Step: 67660   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 09:31:55,339-Speed 5893.35 samples/sec   Loss 10.4525   LearningRate 0.2241   Epoch: 6   Global Step: 67670   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 09:32:02,205-Speed 5966.60 samples/sec   Loss 10.4148   LearningRate 0.2241   Epoch: 6   Global Step: 67680   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 09:32:09,049-Speed 5985.45 samples/sec   Loss 10.4297   LearningRate 0.2241   Epoch: 6   Global Step: 67690   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:32:15,943-Speed 5943.14 samples/sec   Loss 10.3528   LearningRate 0.2240   Epoch: 6   Global Step: 67700   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:32:22,854-Speed 5927.84 samples/sec   Loss 10.4605   LearningRate 0.2240   Epoch: 6   Global Step: 67710   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:32:29,726-Speed 5960.58 samples/sec   Loss 10.4870   LearningRate 0.2240   Epoch: 6   Global Step: 67720   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:32:36,571-Speed 5985.89 samples/sec   Loss 10.4427   LearningRate 0.2239   Epoch: 6   Global Step: 67730   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:32:43,433-Speed 5969.98 samples/sec   Loss 10.4311   LearningRate 0.2239   Epoch: 6   Global Step: 67740   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:32:50,287-Speed 5977.33 samples/sec   Loss 10.3941   LearningRate 0.2239   Epoch: 6   Global Step: 67750   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:32:57,165-Speed 5956.83 samples/sec   Loss 10.4476   LearningRate 0.2238   Epoch: 6   Global Step: 67760   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:33:04,051-Speed 5948.90 samples/sec   Loss 10.4841   LearningRate 0.2238   Epoch: 6   Global Step: 67770   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:33:10,905-Speed 5977.30 samples/sec   Loss 10.4639   LearningRate 0.2238   Epoch: 6   Global Step: 67780   Fp16 Grad Scale: 16384   Required: 27 hours
Training: 2022-01-08 09:33:17,772-Speed 5966.26 samples/sec   Loss 10.4526   LearningRate 0.2237   Epoch: 6   Global Step: 67790   Fp16 Grad Scale: 16384   Required: 27 hours
Training: 2022-01-08 09:33:24,649-Speed 5957.39 samples/sec   Loss 10.4965   LearningRate 0.2237   Epoch: 6   Global Step: 67800   Fp16 Grad Scale: 16384   Required: 27 hours
Training: 2022-01-08 09:33:31,498-Speed 5981.19 samples/sec   Loss 10.4829   LearningRate 0.2237   Epoch: 6   Global Step: 67810   Fp16 Grad Scale: 16384   Required: 27 hours
Training: 2022-01-08 09:33:38,401-Speed 5937.24 samples/sec   Loss 10.4068   LearningRate 0.2236   Epoch: 6   Global Step: 67820   Fp16 Grad Scale: 16384   Required: 27 hours
Training: 2022-01-08 09:33:45,282-Speed 5953.89 samples/sec   Loss 10.4150   LearningRate 0.2236   Epoch: 6   Global Step: 67830   Fp16 Grad Scale: 16384   Required: 27 hours
Training: 2022-01-08 09:33:52,179-Speed 5939.64 samples/sec   Loss 10.4238   LearningRate 0.2236   Epoch: 6   Global Step: 67840   Fp16 Grad Scale: 16384   Required: 27 hours
Training: 2022-01-08 09:33:59,048-Speed 5965.16 samples/sec   Loss 10.4293   LearningRate 0.2236   Epoch: 6   Global Step: 67850   Fp16 Grad Scale: 16384   Required: 27 hours
Training: 2022-01-08 09:34:05,923-Speed 5958.64 samples/sec   Loss 10.3823   LearningRate 0.2235   Epoch: 6   Global Step: 67860   Fp16 Grad Scale: 16384   Required: 27 hours
Training: 2022-01-08 09:34:12,769-Speed 5984.06 samples/sec   Loss 10.4577   LearningRate 0.2235   Epoch: 6   Global Step: 67870   Fp16 Grad Scale: 16384   Required: 27 hours
Training: 2022-01-08 09:34:19,651-Speed 5953.15 samples/sec   Loss 10.3904   LearningRate 0.2235   Epoch: 6   Global Step: 67880   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 09:34:26,527-Speed 5960.68 samples/sec   Loss 10.3716   LearningRate 0.2234   Epoch: 6   Global Step: 67890   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 09:34:33,381-Speed 5977.44 samples/sec   Loss 10.4249   LearningRate 0.2234   Epoch: 6   Global Step: 67900   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 09:34:40,238-Speed 5974.79 samples/sec   Loss 10.4668   LearningRate 0.2234   Epoch: 6   Global Step: 67910   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 09:34:47,088-Speed 5981.10 samples/sec   Loss 10.5049   LearningRate 0.2233   Epoch: 6   Global Step: 67920   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 09:34:53,944-Speed 5975.56 samples/sec   Loss 10.4329   LearningRate 0.2233   Epoch: 6   Global Step: 67930   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 09:35:00,812-Speed 5965.01 samples/sec   Loss 10.4130   LearningRate 0.2233   Epoch: 6   Global Step: 67940   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 09:35:07,692-Speed 5955.41 samples/sec   Loss 10.3856   LearningRate 0.2232   Epoch: 6   Global Step: 67950   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 09:35:14,554-Speed 5970.71 samples/sec   Loss 10.3629   LearningRate 0.2232   Epoch: 6   Global Step: 67960   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 09:35:21,413-Speed 5972.86 samples/sec   Loss 10.3740   LearningRate 0.2232   Epoch: 6   Global Step: 67970   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 09:35:28,259-Speed 5984.62 samples/sec   Loss 10.3689   LearningRate 0.2231   Epoch: 6   Global Step: 67980   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:35:35,110-Speed 5978.77 samples/sec   Loss 10.3722   LearningRate 0.2231   Epoch: 6   Global Step: 67990   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:35:41,962-Speed 5979.02 samples/sec   Loss 10.4262   LearningRate 0.2231   Epoch: 6   Global Step: 68000   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:35:48,835-Speed 5961.62 samples/sec   Loss 10.3858   LearningRate 0.2230   Epoch: 6   Global Step: 68010   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:35:55,689-Speed 5979.12 samples/sec   Loss 10.4289   LearningRate 0.2230   Epoch: 6   Global Step: 68020   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:36:02,564-Speed 5958.82 samples/sec   Loss 10.3255   LearningRate 0.2230   Epoch: 6   Global Step: 68030   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:36:09,434-Speed 5962.76 samples/sec   Loss 10.3542   LearningRate 0.2229   Epoch: 6   Global Step: 68040   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:36:16,280-Speed 5984.08 samples/sec   Loss 10.3942   LearningRate 0.2229   Epoch: 6   Global Step: 68050   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:36:23,124-Speed 5985.37 samples/sec   Loss 10.4232   LearningRate 0.2229   Epoch: 6   Global Step: 68060   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:36:29,979-Speed 5976.69 samples/sec   Loss 10.3727   LearningRate 0.2228   Epoch: 6   Global Step: 68070   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:36:36,877-Speed 5938.61 samples/sec   Loss 10.4198   LearningRate 0.2228   Epoch: 6   Global Step: 68080   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:36:43,747-Speed 5965.04 samples/sec   Loss 10.3766   LearningRate 0.2228   Epoch: 6   Global Step: 68090   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:36:50,596-Speed 5980.97 samples/sec   Loss 10.3959   LearningRate 0.2228   Epoch: 6   Global Step: 68100   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:36:57,443-Speed 5983.99 samples/sec   Loss 10.3914   LearningRate 0.2227   Epoch: 6   Global Step: 68110   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:37:04,294-Speed 5980.06 samples/sec   Loss 10.3779   LearningRate 0.2227   Epoch: 6   Global Step: 68120   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:37:11,142-Speed 5983.98 samples/sec   Loss 10.4092   LearningRate 0.2227   Epoch: 6   Global Step: 68130   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:37:17,988-Speed 5983.77 samples/sec   Loss 10.3493   LearningRate 0.2226   Epoch: 6   Global Step: 68140   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:37:24,849-Speed 5970.83 samples/sec   Loss 10.3609   LearningRate 0.2226   Epoch: 6   Global Step: 68150   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:37:31,714-Speed 5967.65 samples/sec   Loss 10.3901   LearningRate 0.2226   Epoch: 6   Global Step: 68160   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:37:38,557-Speed 5987.63 samples/sec   Loss 10.3463   LearningRate 0.2225   Epoch: 6   Global Step: 68170   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:37:45,408-Speed 5979.10 samples/sec   Loss 10.2874   LearningRate 0.2225   Epoch: 6   Global Step: 68180   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:37:52,264-Speed 5975.51 samples/sec   Loss 10.3035   LearningRate 0.2225   Epoch: 6   Global Step: 68190   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:37:59,119-Speed 5976.19 samples/sec   Loss 10.3946   LearningRate 0.2224   Epoch: 6   Global Step: 68200   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:38:05,994-Speed 5959.51 samples/sec   Loss 10.3614   LearningRate 0.2224   Epoch: 6   Global Step: 68210   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:38:12,854-Speed 5971.42 samples/sec   Loss 10.4044   LearningRate 0.2224   Epoch: 6   Global Step: 68220   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:38:19,726-Speed 5961.60 samples/sec   Loss 10.4028   LearningRate 0.2223   Epoch: 6   Global Step: 68230   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:38:26,636-Speed 5927.95 samples/sec   Loss 10.4187   LearningRate 0.2223   Epoch: 6   Global Step: 68240   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:38:33,485-Speed 5982.22 samples/sec   Loss 10.3543   LearningRate 0.2223   Epoch: 6   Global Step: 68250   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:38:40,330-Speed 5984.96 samples/sec   Loss 10.4243   LearningRate 0.2222   Epoch: 6   Global Step: 68260   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:38:47,180-Speed 5979.98 samples/sec   Loss 10.2942   LearningRate 0.2222   Epoch: 6   Global Step: 68270   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:38:54,034-Speed 5977.74 samples/sec   Loss 10.4162   LearningRate 0.2222   Epoch: 6   Global Step: 68280   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:39:00,882-Speed 5982.63 samples/sec   Loss 10.4635   LearningRate 0.2221   Epoch: 6   Global Step: 68290   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:39:07,743-Speed 5970.55 samples/sec   Loss 10.3575   LearningRate 0.2221   Epoch: 6   Global Step: 68300   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:39:14,584-Speed 5988.20 samples/sec   Loss 10.3744   LearningRate 0.2221   Epoch: 6   Global Step: 68310   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:39:21,431-Speed 5983.44 samples/sec   Loss 10.3238   LearningRate 0.2220   Epoch: 6   Global Step: 68320   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:39:28,287-Speed 5975.22 samples/sec   Loss 10.3134   LearningRate 0.2220   Epoch: 6   Global Step: 68330   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:39:35,140-Speed 5978.17 samples/sec   Loss 10.4295   LearningRate 0.2220   Epoch: 6   Global Step: 68340   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:39:41,993-Speed 5978.55 samples/sec   Loss 10.3722   LearningRate 0.2220   Epoch: 6   Global Step: 68350   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:39:48,838-Speed 5984.38 samples/sec   Loss 10.3758   LearningRate 0.2219   Epoch: 6   Global Step: 68360   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:39:55,692-Speed 5977.73 samples/sec   Loss 10.3916   LearningRate 0.2219   Epoch: 6   Global Step: 68370   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:40:02,544-Speed 5978.74 samples/sec   Loss 10.2791   LearningRate 0.2219   Epoch: 6   Global Step: 68380   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:40:09,418-Speed 5959.96 samples/sec   Loss 10.3109   LearningRate 0.2218   Epoch: 6   Global Step: 68390   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:40:16,275-Speed 5973.66 samples/sec   Loss 10.3629   LearningRate 0.2218   Epoch: 6   Global Step: 68400   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:40:23,152-Speed 5957.52 samples/sec   Loss 10.3316   LearningRate 0.2218   Epoch: 6   Global Step: 68410   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:40:30,009-Speed 5975.07 samples/sec   Loss 10.3222   LearningRate 0.2217   Epoch: 6   Global Step: 68420   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:40:36,879-Speed 5962.78 samples/sec   Loss 10.3639   LearningRate 0.2217   Epoch: 6   Global Step: 68430   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:40:43,734-Speed 5976.54 samples/sec   Loss 10.3418   LearningRate 0.2217   Epoch: 6   Global Step: 68440   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:40:50,577-Speed 5989.15 samples/sec   Loss 10.3246   LearningRate 0.2216   Epoch: 6   Global Step: 68450   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:40:57,425-Speed 5982.52 samples/sec   Loss 10.4064   LearningRate 0.2216   Epoch: 6   Global Step: 68460   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:41:04,288-Speed 5969.39 samples/sec   Loss 10.4117   LearningRate 0.2216   Epoch: 6   Global Step: 68470   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:41:11,154-Speed 5966.65 samples/sec   Loss 10.3741   LearningRate 0.2215   Epoch: 6   Global Step: 68480   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:41:18,007-Speed 5977.85 samples/sec   Loss 10.3109   LearningRate 0.2215   Epoch: 6   Global Step: 68490   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:41:24,856-Speed 5981.50 samples/sec   Loss 10.2858   LearningRate 0.2215   Epoch: 6   Global Step: 68500   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:41:31,727-Speed 5961.98 samples/sec   Loss 10.4232   LearningRate 0.2214   Epoch: 6   Global Step: 68510   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:41:38,564-Speed 5991.85 samples/sec   Loss 10.3573   LearningRate 0.2214   Epoch: 6   Global Step: 68520   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:41:45,423-Speed 5973.27 samples/sec   Loss 10.3935   LearningRate 0.2214   Epoch: 6   Global Step: 68530   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:41:52,293-Speed 5963.32 samples/sec   Loss 10.3351   LearningRate 0.2213   Epoch: 6   Global Step: 68540   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:41:59,165-Speed 5962.04 samples/sec   Loss 10.3363   LearningRate 0.2213   Epoch: 6   Global Step: 68550   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:42:06,016-Speed 5979.30 samples/sec   Loss 10.3474   LearningRate 0.2213   Epoch: 6   Global Step: 68560   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:42:12,867-Speed 5980.28 samples/sec   Loss 10.3508   LearningRate 0.2212   Epoch: 6   Global Step: 68570   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:42:19,725-Speed 5973.80 samples/sec   Loss 10.3776   LearningRate 0.2212   Epoch: 6   Global Step: 68580   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:42:26,597-Speed 5961.30 samples/sec   Loss 10.3030   LearningRate 0.2212   Epoch: 6   Global Step: 68590   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:42:33,465-Speed 5964.89 samples/sec   Loss 10.4233   LearningRate 0.2212   Epoch: 6   Global Step: 68600   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:42:40,305-Speed 5989.35 samples/sec   Loss 10.4253   LearningRate 0.2211   Epoch: 6   Global Step: 68610   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:42:47,169-Speed 5968.68 samples/sec   Loss 10.4208   LearningRate 0.2211   Epoch: 6   Global Step: 68620   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:42:54,048-Speed 5954.98 samples/sec   Loss 10.3453   LearningRate 0.2211   Epoch: 6   Global Step: 68630   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:43:00,898-Speed 5980.45 samples/sec   Loss 10.4126   LearningRate 0.2210   Epoch: 6   Global Step: 68640   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:43:07,746-Speed 5982.46 samples/sec   Loss 10.3762   LearningRate 0.2210   Epoch: 6   Global Step: 68650   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:43:14,591-Speed 5985.17 samples/sec   Loss 10.3364   LearningRate 0.2210   Epoch: 6   Global Step: 68660   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:43:21,442-Speed 5980.17 samples/sec   Loss 10.3434   LearningRate 0.2209   Epoch: 6   Global Step: 68670   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:43:28,291-Speed 5981.18 samples/sec   Loss 10.4259   LearningRate 0.2209   Epoch: 6   Global Step: 68680   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:43:35,136-Speed 5986.01 samples/sec   Loss 10.4341   LearningRate 0.2209   Epoch: 6   Global Step: 68690   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:43:42,025-Speed 5947.93 samples/sec   Loss 10.3109   LearningRate 0.2208   Epoch: 6   Global Step: 68700   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:43:48,877-Speed 5978.88 samples/sec   Loss 10.2741   LearningRate 0.2208   Epoch: 6   Global Step: 68710   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:43:55,764-Speed 5948.38 samples/sec   Loss 10.3607   LearningRate 0.2208   Epoch: 6   Global Step: 68720   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:44:02,644-Speed 5954.85 samples/sec   Loss 10.3834   LearningRate 0.2207   Epoch: 6   Global Step: 68730   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:44:09,533-Speed 5946.97 samples/sec   Loss 10.3033   LearningRate 0.2207   Epoch: 6   Global Step: 68740   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:44:16,485-Speed 5893.35 samples/sec   Loss 10.3649   LearningRate 0.2207   Epoch: 6   Global Step: 68750   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:44:23,340-Speed 5976.50 samples/sec   Loss 10.2482   LearningRate 0.2206   Epoch: 6   Global Step: 68760   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:44:30,317-Speed 5871.97 samples/sec   Loss 10.2899   LearningRate 0.2206   Epoch: 6   Global Step: 68770   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:44:37,272-Speed 5890.93 samples/sec   Loss 10.3466   LearningRate 0.2206   Epoch: 6   Global Step: 68780   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:44:44,126-Speed 5977.68 samples/sec   Loss 10.3421   LearningRate 0.2205   Epoch: 6   Global Step: 68790   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:44:50,975-Speed 5980.95 samples/sec   Loss 10.3675   LearningRate 0.2205   Epoch: 6   Global Step: 68800   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:44:57,846-Speed 5963.42 samples/sec   Loss 10.2500   LearningRate 0.2205   Epoch: 6   Global Step: 68810   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:45:04,736-Speed 5945.91 samples/sec   Loss 10.3965   LearningRate 0.2205   Epoch: 6   Global Step: 68820   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:45:11,583-Speed 5983.24 samples/sec   Loss 10.3902   LearningRate 0.2204   Epoch: 6   Global Step: 68830   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:45:18,453-Speed 5963.24 samples/sec   Loss 10.3374   LearningRate 0.2204   Epoch: 6   Global Step: 68840   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:45:25,290-Speed 5992.94 samples/sec   Loss 10.3754   LearningRate 0.2204   Epoch: 6   Global Step: 68850   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:45:32,137-Speed 5982.64 samples/sec   Loss 10.4253   LearningRate 0.2203   Epoch: 6   Global Step: 68860   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:45:39,015-Speed 5956.63 samples/sec   Loss 10.3819   LearningRate 0.2203   Epoch: 6   Global Step: 68870   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:45:45,885-Speed 5963.56 samples/sec   Loss 10.2676   LearningRate 0.2203   Epoch: 6   Global Step: 68880   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:45:52,746-Speed 5970.99 samples/sec   Loss 10.4188   LearningRate 0.2202   Epoch: 6   Global Step: 68890   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:45:59,611-Speed 5967.34 samples/sec   Loss 10.2960   LearningRate 0.2202   Epoch: 6   Global Step: 68900   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:46:06,468-Speed 5974.72 samples/sec   Loss 10.3143   LearningRate 0.2202   Epoch: 6   Global Step: 68910   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:46:13,325-Speed 5977.27 samples/sec   Loss 10.3251   LearningRate 0.2201   Epoch: 6   Global Step: 68920   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:46:20,179-Speed 5977.69 samples/sec   Loss 10.3669   LearningRate 0.2201   Epoch: 6   Global Step: 68930   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:46:27,064-Speed 5950.81 samples/sec   Loss 10.2600   LearningRate 0.2201   Epoch: 6   Global Step: 68940   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:46:33,942-Speed 5956.49 samples/sec   Loss 10.3337   LearningRate 0.2200   Epoch: 6   Global Step: 68950   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:46:40,805-Speed 5971.55 samples/sec   Loss 10.3391   LearningRate 0.2200   Epoch: 6   Global Step: 68960   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:46:47,651-Speed 5983.70 samples/sec   Loss 10.3009   LearningRate 0.2200   Epoch: 6   Global Step: 68970   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:46:54,521-Speed 5963.35 samples/sec   Loss 10.2749   LearningRate 0.2199   Epoch: 6   Global Step: 68980   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:47:01,387-Speed 5967.27 samples/sec   Loss 10.3406   LearningRate 0.2199   Epoch: 6   Global Step: 68990   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:47:08,294-Speed 5931.59 samples/sec   Loss 10.2748   LearningRate 0.2199   Epoch: 6   Global Step: 69000   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:47:15,191-Speed 5940.10 samples/sec   Loss 10.3573   LearningRate 0.2198   Epoch: 6   Global Step: 69010   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:47:22,064-Speed 5959.84 samples/sec   Loss 10.2763   LearningRate 0.2198   Epoch: 6   Global Step: 69020   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:47:28,983-Speed 5921.93 samples/sec   Loss 10.2898   LearningRate 0.2198   Epoch: 6   Global Step: 69030   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:47:35,900-Speed 5922.58 samples/sec   Loss 10.3749   LearningRate 0.2198   Epoch: 6   Global Step: 69040   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:47:42,745-Speed 5985.01 samples/sec   Loss 10.3408   LearningRate 0.2197   Epoch: 6   Global Step: 69050   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:47:49,602-Speed 5975.46 samples/sec   Loss 10.3636   LearningRate 0.2197   Epoch: 6   Global Step: 69060   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:47:56,453-Speed 5979.75 samples/sec   Loss 10.2698   LearningRate 0.2197   Epoch: 6   Global Step: 69070   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:48:03,340-Speed 5948.21 samples/sec   Loss 10.2888   LearningRate 0.2196   Epoch: 6   Global Step: 69080   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:48:10,202-Speed 5970.90 samples/sec   Loss 10.3244   LearningRate 0.2196   Epoch: 6   Global Step: 69090   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:48:17,059-Speed 5975.14 samples/sec   Loss 10.2540   LearningRate 0.2196   Epoch: 6   Global Step: 69100   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:48:23,921-Speed 5970.32 samples/sec   Loss 10.3194   LearningRate 0.2195   Epoch: 6   Global Step: 69110   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:48:30,784-Speed 5970.87 samples/sec   Loss 10.3641   LearningRate 0.2195   Epoch: 6   Global Step: 69120   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:48:37,622-Speed 5990.91 samples/sec   Loss 10.3812   LearningRate 0.2195   Epoch: 6   Global Step: 69130   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:48:44,508-Speed 5950.08 samples/sec   Loss 10.2493   LearningRate 0.2194   Epoch: 6   Global Step: 69140   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:48:51,372-Speed 5969.05 samples/sec   Loss 10.3126   LearningRate 0.2194   Epoch: 6   Global Step: 69150   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:48:58,233-Speed 5970.92 samples/sec   Loss 10.2960   LearningRate 0.2194   Epoch: 6   Global Step: 69160   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:49:05,092-Speed 5973.16 samples/sec   Loss 10.3229   LearningRate 0.2193   Epoch: 6   Global Step: 69170   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:49:11,956-Speed 5969.70 samples/sec   Loss 10.3736   LearningRate 0.2193   Epoch: 6   Global Step: 69180   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:49:18,833-Speed 5957.57 samples/sec   Loss 10.3300   LearningRate 0.2193   Epoch: 6   Global Step: 69190   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:49:25,706-Speed 5960.21 samples/sec   Loss 10.2306   LearningRate 0.2192   Epoch: 6   Global Step: 69200   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:49:32,573-Speed 5970.00 samples/sec   Loss 10.3529   LearningRate 0.2192   Epoch: 6   Global Step: 69210   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:49:39,428-Speed 5975.83 samples/sec   Loss 10.2320   LearningRate 0.2192   Epoch: 6   Global Step: 69220   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:49:46,290-Speed 5970.33 samples/sec   Loss 10.2703   LearningRate 0.2192   Epoch: 6   Global Step: 69230   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:49:53,152-Speed 5970.29 samples/sec   Loss 10.3145   LearningRate 0.2191   Epoch: 6   Global Step: 69240   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:50:00,014-Speed 5970.36 samples/sec   Loss 10.4137   LearningRate 0.2191   Epoch: 6   Global Step: 69250   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:50:06,884-Speed 5963.65 samples/sec   Loss 10.3467   LearningRate 0.2191   Epoch: 6   Global Step: 69260   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:50:13,723-Speed 5990.06 samples/sec   Loss 10.2616   LearningRate 0.2190   Epoch: 6   Global Step: 69270   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:50:20,572-Speed 5982.41 samples/sec   Loss 10.3381   LearningRate 0.2190   Epoch: 6   Global Step: 69280   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:50:27,421-Speed 5981.07 samples/sec   Loss 10.3071   LearningRate 0.2190   Epoch: 6   Global Step: 69290   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:50:34,276-Speed 5976.82 samples/sec   Loss 10.2755   LearningRate 0.2189   Epoch: 6   Global Step: 69300   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:50:41,115-Speed 5989.24 samples/sec   Loss 10.3760   LearningRate 0.2189   Epoch: 6   Global Step: 69310   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:50:48,000-Speed 5950.36 samples/sec   Loss 10.2415   LearningRate 0.2189   Epoch: 6   Global Step: 69320   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:50:54,858-Speed 5974.48 samples/sec   Loss 10.3022   LearningRate 0.2188   Epoch: 6   Global Step: 69330   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:51:01,710-Speed 5980.63 samples/sec   Loss 10.3160   LearningRate 0.2188   Epoch: 6   Global Step: 69340   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:51:08,566-Speed 5975.91 samples/sec   Loss 10.2791   LearningRate 0.2188   Epoch: 6   Global Step: 69350   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:51:15,414-Speed 5982.55 samples/sec   Loss 10.2709   LearningRate 0.2187   Epoch: 6   Global Step: 69360   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:51:22,285-Speed 5962.14 samples/sec   Loss 10.2686   LearningRate 0.2187   Epoch: 6   Global Step: 69370   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:51:29,140-Speed 5976.42 samples/sec   Loss 10.2501   LearningRate 0.2187   Epoch: 6   Global Step: 69380   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:51:36,007-Speed 5966.14 samples/sec   Loss 10.2612   LearningRate 0.2186   Epoch: 6   Global Step: 69390   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:51:42,887-Speed 5954.71 samples/sec   Loss 10.2860   LearningRate 0.2186   Epoch: 6   Global Step: 69400   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:51:49,743-Speed 5978.20 samples/sec   Loss 10.2289   LearningRate 0.2186   Epoch: 6   Global Step: 69410   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:51:56,593-Speed 5980.19 samples/sec   Loss 10.3016   LearningRate 0.2185   Epoch: 6   Global Step: 69420   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:52:03,526-Speed 5909.95 samples/sec   Loss 10.3014   LearningRate 0.2185   Epoch: 6   Global Step: 69430   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:52:10,376-Speed 5980.04 samples/sec   Loss 10.3361   LearningRate 0.2185   Epoch: 6   Global Step: 69440   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:52:17,241-Speed 5967.74 samples/sec   Loss 10.2362   LearningRate 0.2185   Epoch: 6   Global Step: 69450   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:52:24,111-Speed 5963.23 samples/sec   Loss 10.2193   LearningRate 0.2184   Epoch: 6   Global Step: 69460   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:52:30,997-Speed 5949.86 samples/sec   Loss 10.1795   LearningRate 0.2184   Epoch: 6   Global Step: 69470   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:52:37,844-Speed 5983.69 samples/sec   Loss 10.2635   LearningRate 0.2184   Epoch: 6   Global Step: 69480   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:52:44,693-Speed 5981.66 samples/sec   Loss 10.2509   LearningRate 0.2183   Epoch: 6   Global Step: 69490   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:52:51,550-Speed 5975.17 samples/sec   Loss 10.2351   LearningRate 0.2183   Epoch: 6   Global Step: 69500   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:52:58,399-Speed 5981.49 samples/sec   Loss 10.3131   LearningRate 0.2183   Epoch: 6   Global Step: 69510   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:53:05,254-Speed 5976.30 samples/sec   Loss 10.1928   LearningRate 0.2182   Epoch: 6   Global Step: 69520   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:53:12,121-Speed 5966.36 samples/sec   Loss 10.2435   LearningRate 0.2182   Epoch: 6   Global Step: 69530   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:53:18,969-Speed 5982.29 samples/sec   Loss 10.3273   LearningRate 0.2182   Epoch: 6   Global Step: 69540   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:53:25,827-Speed 5973.68 samples/sec   Loss 10.2046   LearningRate 0.2181   Epoch: 6   Global Step: 69550   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:53:32,677-Speed 5982.64 samples/sec   Loss 10.2787   LearningRate 0.2181   Epoch: 6   Global Step: 69560   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:53:39,525-Speed 5982.93 samples/sec   Loss 10.3189   LearningRate 0.2181   Epoch: 6   Global Step: 69570   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:53:46,369-Speed 5985.42 samples/sec   Loss 10.2769   LearningRate 0.2180   Epoch: 6   Global Step: 69580   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:53:53,246-Speed 5961.15 samples/sec   Loss 10.2442   LearningRate 0.2180   Epoch: 6   Global Step: 69590   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:54:00,106-Speed 5971.69 samples/sec   Loss 10.2373   LearningRate 0.2180   Epoch: 6   Global Step: 69600   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:54:06,980-Speed 5960.05 samples/sec   Loss 10.3351   LearningRate 0.2179   Epoch: 6   Global Step: 69610   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:54:13,845-Speed 5967.33 samples/sec   Loss 10.2824   LearningRate 0.2179   Epoch: 6   Global Step: 69620   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:54:20,711-Speed 5967.32 samples/sec   Loss 10.3935   LearningRate 0.2179   Epoch: 6   Global Step: 69630   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:54:27,573-Speed 5969.49 samples/sec   Loss 10.3155   LearningRate 0.2179   Epoch: 6   Global Step: 69640   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:54:34,449-Speed 5958.08 samples/sec   Loss 10.3524   LearningRate 0.2178   Epoch: 6   Global Step: 69650   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:54:41,296-Speed 5984.16 samples/sec   Loss 10.2547   LearningRate 0.2178   Epoch: 6   Global Step: 69660   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:54:48,155-Speed 5972.55 samples/sec   Loss 10.3088   LearningRate 0.2178   Epoch: 6   Global Step: 69670   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:54:55,009-Speed 5976.78 samples/sec   Loss 10.3263   LearningRate 0.2177   Epoch: 6   Global Step: 69680   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:55:01,889-Speed 5955.79 samples/sec   Loss 10.3056   LearningRate 0.2177   Epoch: 6   Global Step: 69690   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:55:08,754-Speed 5967.43 samples/sec   Loss 10.2029   LearningRate 0.2177   Epoch: 6   Global Step: 69700   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:55:15,597-Speed 5986.16 samples/sec   Loss 10.2519   LearningRate 0.2176   Epoch: 6   Global Step: 69710   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:55:22,442-Speed 5985.34 samples/sec   Loss 10.2570   LearningRate 0.2176   Epoch: 6   Global Step: 69720   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:55:29,298-Speed 5975.10 samples/sec   Loss 10.3262   LearningRate 0.2176   Epoch: 6   Global Step: 69730   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:55:36,153-Speed 5977.04 samples/sec   Loss 10.3264   LearningRate 0.2175   Epoch: 6   Global Step: 69740   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:55:43,033-Speed 5954.62 samples/sec   Loss 10.2562   LearningRate 0.2175   Epoch: 6   Global Step: 69750   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:55:49,888-Speed 5976.07 samples/sec   Loss 10.2452   LearningRate 0.2175   Epoch: 6   Global Step: 69760   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 09:55:56,738-Speed 5983.92 samples/sec   Loss 10.3291   LearningRate 0.2174   Epoch: 6   Global Step: 69770   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:56:03,597-Speed 5972.93 samples/sec   Loss 10.2102   LearningRate 0.2174   Epoch: 6   Global Step: 69780   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:56:10,458-Speed 5971.03 samples/sec   Loss 10.2356   LearningRate 0.2174   Epoch: 6   Global Step: 69790   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:56:17,328-Speed 5963.61 samples/sec   Loss 10.2619   LearningRate 0.2173   Epoch: 6   Global Step: 69800   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:56:24,190-Speed 5969.84 samples/sec   Loss 10.3415   LearningRate 0.2173   Epoch: 6   Global Step: 69810   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:56:31,038-Speed 5981.99 samples/sec   Loss 10.2730   LearningRate 0.2173   Epoch: 6   Global Step: 69820   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:56:37,896-Speed 5973.24 samples/sec   Loss 10.2830   LearningRate 0.2173   Epoch: 6   Global Step: 69830   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:56:44,752-Speed 5975.95 samples/sec   Loss 10.1927   LearningRate 0.2172   Epoch: 6   Global Step: 69840   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:56:51,624-Speed 5961.52 samples/sec   Loss 10.2925   LearningRate 0.2172   Epoch: 6   Global Step: 69850   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:56:58,483-Speed 5972.59 samples/sec   Loss 10.2550   LearningRate 0.2172   Epoch: 6   Global Step: 69860   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:57:05,328-Speed 5985.17 samples/sec   Loss 10.3151   LearningRate 0.2171   Epoch: 6   Global Step: 69870   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:57:12,202-Speed 5960.18 samples/sec   Loss 10.3343   LearningRate 0.2171   Epoch: 6   Global Step: 69880   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:57:19,066-Speed 5968.20 samples/sec   Loss 10.1464   LearningRate 0.2171   Epoch: 6   Global Step: 69890   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:57:25,934-Speed 5965.59 samples/sec   Loss 10.2330   LearningRate 0.2170   Epoch: 6   Global Step: 69900   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:57:32,780-Speed 5984.40 samples/sec   Loss 10.2246   LearningRate 0.2170   Epoch: 6   Global Step: 69910   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:57:39,636-Speed 5974.80 samples/sec   Loss 10.1519   LearningRate 0.2170   Epoch: 6   Global Step: 69920   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:57:46,485-Speed 5981.78 samples/sec   Loss 10.2268   LearningRate 0.2169   Epoch: 6   Global Step: 69930   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:57:53,331-Speed 5984.86 samples/sec   Loss 10.2443   LearningRate 0.2169   Epoch: 6   Global Step: 69940   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:58:00,208-Speed 5957.21 samples/sec   Loss 10.1509   LearningRate 0.2169   Epoch: 6   Global Step: 69950   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:58:07,058-Speed 5981.31 samples/sec   Loss 10.2486   LearningRate 0.2168   Epoch: 6   Global Step: 69960   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:58:13,908-Speed 5980.25 samples/sec   Loss 10.2663   LearningRate 0.2168   Epoch: 6   Global Step: 69970   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:58:20,768-Speed 5971.99 samples/sec   Loss 10.2582   LearningRate 0.2168   Epoch: 6   Global Step: 69980   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:58:27,642-Speed 5962.53 samples/sec   Loss 10.3201   LearningRate 0.2167   Epoch: 6   Global Step: 69990   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 09:58:34,500-Speed 5973.42 samples/sec   Loss 10.2610   LearningRate 0.2167   Epoch: 6   Global Step: 70000   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 09:59:01,283-[lfw][70000]XNorm: 23.052480
Training: 2022-01-08 09:59:01,284-[lfw][70000]Accuracy-Flip: 0.99750+-0.00291
Training: 2022-01-08 09:59:01,285-[lfw][70000]Accuracy-Highest: 0.99750
Training: 2022-01-08 09:59:32,303-[cfp_fp][70000]XNorm: 20.241400
Training: 2022-01-08 09:59:32,304-[cfp_fp][70000]Accuracy-Flip: 0.97443+-0.00781
Training: 2022-01-08 09:59:32,305-[cfp_fp][70000]Accuracy-Highest: 0.97686
Training: 2022-01-08 09:59:59,096-[agedb_30][70000]XNorm: 22.574474
Training: 2022-01-08 09:59:59,097-[agedb_30][70000]Accuracy-Flip: 0.96050+-0.00940
Training: 2022-01-08 09:59:59,098-[agedb_30][70000]Accuracy-Highest: 0.96633
Training: 2022-01-08 10:00:05,971-Speed 447.80 samples/sec   Loss 10.2294   LearningRate 0.2167   Epoch: 6   Global Step: 70010   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:00:12,800-Speed 6001.42 samples/sec   Loss 10.3211   LearningRate 0.2167   Epoch: 6   Global Step: 70020   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:00:19,637-Speed 5992.20 samples/sec   Loss 10.2896   LearningRate 0.2166   Epoch: 6   Global Step: 70030   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:00:26,478-Speed 5988.29 samples/sec   Loss 10.2625   LearningRate 0.2166   Epoch: 6   Global Step: 70040   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:00:33,332-Speed 5976.84 samples/sec   Loss 10.2206   LearningRate 0.2166   Epoch: 6   Global Step: 70050   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:00:40,194-Speed 5970.48 samples/sec   Loss 10.2120   LearningRate 0.2165   Epoch: 6   Global Step: 70060   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:00:47,050-Speed 5977.53 samples/sec   Loss 10.2554   LearningRate 0.2165   Epoch: 6   Global Step: 70070   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:00:53,899-Speed 5980.50 samples/sec   Loss 10.2569   LearningRate 0.2165   Epoch: 6   Global Step: 70080   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:01:00,762-Speed 5969.91 samples/sec   Loss 10.2690   LearningRate 0.2164   Epoch: 6   Global Step: 70090   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:01:07,648-Speed 5949.93 samples/sec   Loss 10.2747   LearningRate 0.2164   Epoch: 6   Global Step: 70100   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:01:14,500-Speed 5978.93 samples/sec   Loss 10.2850   LearningRate 0.2164   Epoch: 6   Global Step: 70110   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:01:21,364-Speed 5968.49 samples/sec   Loss 10.2120   LearningRate 0.2163   Epoch: 6   Global Step: 70120   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:01:28,213-Speed 5981.83 samples/sec   Loss 10.2170   LearningRate 0.2163   Epoch: 6   Global Step: 70130   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:01:35,055-Speed 5987.52 samples/sec   Loss 10.2717   LearningRate 0.2163   Epoch: 6   Global Step: 70140   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:01:41,919-Speed 5968.54 samples/sec   Loss 10.2397   LearningRate 0.2162   Epoch: 6   Global Step: 70150   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:01:48,830-Speed 5928.20 samples/sec   Loss 10.1622   LearningRate 0.2162   Epoch: 6   Global Step: 70160   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:01:55,677-Speed 5983.02 samples/sec   Loss 10.2206   LearningRate 0.2162   Epoch: 6   Global Step: 70170   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:02:02,531-Speed 5977.54 samples/sec   Loss 10.3114   LearningRate 0.2161   Epoch: 6   Global Step: 70180   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:02:09,386-Speed 5978.04 samples/sec   Loss 10.2031   LearningRate 0.2161   Epoch: 6   Global Step: 70190   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:02:16,227-Speed 5988.80 samples/sec   Loss 10.1912   LearningRate 0.2161   Epoch: 6   Global Step: 70200   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:02:23,082-Speed 5975.63 samples/sec   Loss 10.2933   LearningRate 0.2161   Epoch: 6   Global Step: 70210   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:02:29,934-Speed 5980.38 samples/sec   Loss 10.1821   LearningRate 0.2160   Epoch: 6   Global Step: 70220   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:02:36,787-Speed 5977.42 samples/sec   Loss 10.2340   LearningRate 0.2160   Epoch: 6   Global Step: 70230   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:02:43,637-Speed 5981.01 samples/sec   Loss 10.2120   LearningRate 0.2160   Epoch: 6   Global Step: 70240   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:02:50,545-Speed 5930.23 samples/sec   Loss 10.1845   LearningRate 0.2159   Epoch: 6   Global Step: 70250   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:02:57,414-Speed 5964.45 samples/sec   Loss 10.2397   LearningRate 0.2159   Epoch: 6   Global Step: 70260   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:03:04,274-Speed 5972.60 samples/sec   Loss 10.2252   LearningRate 0.2159   Epoch: 6   Global Step: 70270   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:03:11,111-Speed 5991.74 samples/sec   Loss 10.2175   LearningRate 0.2158   Epoch: 6   Global Step: 70280   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:03:17,970-Speed 5973.76 samples/sec   Loss 10.2831   LearningRate 0.2158   Epoch: 6   Global Step: 70290   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:03:24,833-Speed 5969.33 samples/sec   Loss 10.2561   LearningRate 0.2158   Epoch: 6   Global Step: 70300   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:03:31,721-Speed 5947.96 samples/sec   Loss 10.2248   LearningRate 0.2157   Epoch: 6   Global Step: 70310   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:03:38,639-Speed 5921.51 samples/sec   Loss 10.1749   LearningRate 0.2157   Epoch: 6   Global Step: 70320   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:03:45,567-Speed 5913.65 samples/sec   Loss 10.2258   LearningRate 0.2157   Epoch: 6   Global Step: 70330   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:03:52,496-Speed 5912.66 samples/sec   Loss 10.1840   LearningRate 0.2156   Epoch: 6   Global Step: 70340   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:03:59,430-Speed 5908.56 samples/sec   Loss 10.2935   LearningRate 0.2156   Epoch: 6   Global Step: 70350   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:04:06,363-Speed 5909.23 samples/sec   Loss 10.2934   LearningRate 0.2156   Epoch: 6   Global Step: 70360   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:04:13,234-Speed 5963.04 samples/sec   Loss 10.2056   LearningRate 0.2155   Epoch: 6   Global Step: 70370   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:04:20,083-Speed 5981.83 samples/sec   Loss 10.1766   LearningRate 0.2155   Epoch: 6   Global Step: 70380   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:04:26,934-Speed 5979.89 samples/sec   Loss 10.2259   LearningRate 0.2155   Epoch: 6   Global Step: 70390   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:04:33,802-Speed 5965.72 samples/sec   Loss 10.3350   LearningRate 0.2155   Epoch: 6   Global Step: 70400   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:04:40,654-Speed 5978.08 samples/sec   Loss 10.1904   LearningRate 0.2154   Epoch: 6   Global Step: 70410   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:04:47,515-Speed 5970.93 samples/sec   Loss 10.2228   LearningRate 0.2154   Epoch: 6   Global Step: 70420   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:04:54,369-Speed 5977.46 samples/sec   Loss 10.1912   LearningRate 0.2154   Epoch: 6   Global Step: 70430   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:05:01,230-Speed 5970.97 samples/sec   Loss 10.2236   LearningRate 0.2153   Epoch: 6   Global Step: 70440   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:05:08,090-Speed 5972.09 samples/sec   Loss 10.2614   LearningRate 0.2153   Epoch: 6   Global Step: 70450   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:05:14,968-Speed 5956.56 samples/sec   Loss 10.2640   LearningRate 0.2153   Epoch: 6   Global Step: 70460   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:05:21,831-Speed 5968.84 samples/sec   Loss 10.2373   LearningRate 0.2152   Epoch: 6   Global Step: 70470   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:05:28,740-Speed 5929.87 samples/sec   Loss 10.1946   LearningRate 0.2152   Epoch: 6   Global Step: 70480   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:05:35,603-Speed 5969.07 samples/sec   Loss 10.2465   LearningRate 0.2152   Epoch: 6   Global Step: 70490   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:05:42,451-Speed 5982.54 samples/sec   Loss 10.2430   LearningRate 0.2151   Epoch: 6   Global Step: 70500   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:05:49,330-Speed 5956.48 samples/sec   Loss 10.1875   LearningRate 0.2151   Epoch: 6   Global Step: 70510   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:05:56,185-Speed 5976.19 samples/sec   Loss 10.2942   LearningRate 0.2151   Epoch: 6   Global Step: 70520   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:06:03,070-Speed 5950.34 samples/sec   Loss 10.1184   LearningRate 0.2150   Epoch: 6   Global Step: 70530   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:06:09,912-Speed 5987.89 samples/sec   Loss 10.2649   LearningRate 0.2150   Epoch: 6   Global Step: 70540   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:06:16,772-Speed 5975.06 samples/sec   Loss 10.3100   LearningRate 0.2150   Epoch: 6   Global Step: 70550   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:06:23,625-Speed 5977.56 samples/sec   Loss 10.2209   LearningRate 0.2150   Epoch: 6   Global Step: 70560   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:06:30,479-Speed 5977.47 samples/sec   Loss 10.2070   LearningRate 0.2149   Epoch: 6   Global Step: 70570   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:06:37,341-Speed 5970.31 samples/sec   Loss 10.1631   LearningRate 0.2149   Epoch: 6   Global Step: 70580   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:06:44,193-Speed 5978.95 samples/sec   Loss 10.2651   LearningRate 0.2149   Epoch: 6   Global Step: 70590   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:06:51,039-Speed 5984.11 samples/sec   Loss 10.1550   LearningRate 0.2148   Epoch: 6   Global Step: 70600   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:06:57,919-Speed 5954.26 samples/sec   Loss 10.1715   LearningRate 0.2148   Epoch: 6   Global Step: 70610   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:07:04,780-Speed 5971.46 samples/sec   Loss 10.2134   LearningRate 0.2148   Epoch: 6   Global Step: 70620   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:07:11,671-Speed 5945.59 samples/sec   Loss 10.1189   LearningRate 0.2147   Epoch: 6   Global Step: 70630   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:07:18,545-Speed 5960.29 samples/sec   Loss 10.1366   LearningRate 0.2147   Epoch: 6   Global Step: 70640   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:07:25,426-Speed 5953.44 samples/sec   Loss 10.1725   LearningRate 0.2147   Epoch: 6   Global Step: 70650   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:07:32,301-Speed 5959.27 samples/sec   Loss 10.2947   LearningRate 0.2146   Epoch: 6   Global Step: 70660   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:07:39,172-Speed 5964.44 samples/sec   Loss 10.1658   LearningRate 0.2146   Epoch: 6   Global Step: 70670   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:07:46,092-Speed 5919.46 samples/sec   Loss 10.1715   LearningRate 0.2146   Epoch: 6   Global Step: 70680   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:07:52,966-Speed 5962.51 samples/sec   Loss 10.2221   LearningRate 0.2145   Epoch: 6   Global Step: 70690   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:07:59,831-Speed 5967.72 samples/sec   Loss 10.2016   LearningRate 0.2145   Epoch: 6   Global Step: 70700   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:08:06,688-Speed 5974.17 samples/sec   Loss 10.1840   LearningRate 0.2145   Epoch: 6   Global Step: 70710   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:08:13,554-Speed 5967.63 samples/sec   Loss 10.1983   LearningRate 0.2144   Epoch: 6   Global Step: 70720   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 10:08:20,420-Speed 5966.40 samples/sec   Loss 10.1855   LearningRate 0.2144   Epoch: 6   Global Step: 70730   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 10:08:27,273-Speed 5980.50 samples/sec   Loss 10.1543   LearningRate 0.2144   Epoch: 6   Global Step: 70740   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 10:08:34,133-Speed 5971.29 samples/sec   Loss 10.1543   LearningRate 0.2144   Epoch: 6   Global Step: 70750   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 10:08:41,001-Speed 5965.89 samples/sec   Loss 10.1761   LearningRate 0.2143   Epoch: 6   Global Step: 70760   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 10:08:47,878-Speed 5956.85 samples/sec   Loss 10.2465   LearningRate 0.2143   Epoch: 6   Global Step: 70770   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 10:08:54,732-Speed 5977.36 samples/sec   Loss 10.1592   LearningRate 0.2143   Epoch: 6   Global Step: 70780   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 10:09:01,589-Speed 5975.05 samples/sec   Loss 10.2246   LearningRate 0.2142   Epoch: 6   Global Step: 70790   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 10:09:08,465-Speed 5957.70 samples/sec   Loss 10.1845   LearningRate 0.2142   Epoch: 6   Global Step: 70800   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 10:09:15,323-Speed 5973.49 samples/sec   Loss 10.1643   LearningRate 0.2142   Epoch: 6   Global Step: 70810   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 10:09:22,207-Speed 5951.09 samples/sec   Loss 10.2034   LearningRate 0.2141   Epoch: 6   Global Step: 70820   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:09:29,072-Speed 5968.01 samples/sec   Loss 10.2351   LearningRate 0.2141   Epoch: 6   Global Step: 70830   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:09:35,931-Speed 5974.68 samples/sec   Loss 10.2063   LearningRate 0.2141   Epoch: 6   Global Step: 70840   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:09:42,779-Speed 5982.70 samples/sec   Loss 10.2182   LearningRate 0.2140   Epoch: 6   Global Step: 70850   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:09:49,646-Speed 5967.66 samples/sec   Loss 10.1416   LearningRate 0.2140   Epoch: 6   Global Step: 70860   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:09:56,497-Speed 5979.51 samples/sec   Loss 10.2204   LearningRate 0.2140   Epoch: 6   Global Step: 70870   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:10:03,345-Speed 5984.82 samples/sec   Loss 10.1748   LearningRate 0.2139   Epoch: 6   Global Step: 70880   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:10:10,197-Speed 5979.38 samples/sec   Loss 10.1341   LearningRate 0.2139   Epoch: 6   Global Step: 70890   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:10:17,082-Speed 5950.67 samples/sec   Loss 10.1072   LearningRate 0.2139   Epoch: 6   Global Step: 70900   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:10:23,938-Speed 5975.92 samples/sec   Loss 10.2076   LearningRate 0.2139   Epoch: 6   Global Step: 70910   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:10:30,794-Speed 5975.85 samples/sec   Loss 10.0977   LearningRate 0.2138   Epoch: 6   Global Step: 70920   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:10:37,655-Speed 5971.06 samples/sec   Loss 10.1442   LearningRate 0.2138   Epoch: 6   Global Step: 70930   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:10:44,537-Speed 5952.56 samples/sec   Loss 10.2912   LearningRate 0.2138   Epoch: 6   Global Step: 70940   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:10:51,408-Speed 5962.23 samples/sec   Loss 10.1708   LearningRate 0.2137   Epoch: 6   Global Step: 70950   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:10:58,272-Speed 5968.44 samples/sec   Loss 10.1940   LearningRate 0.2137   Epoch: 6   Global Step: 70960   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:11:05,142-Speed 5963.52 samples/sec   Loss 10.2339   LearningRate 0.2137   Epoch: 6   Global Step: 70970   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:11:11,999-Speed 5974.37 samples/sec   Loss 10.1339   LearningRate 0.2136   Epoch: 6   Global Step: 70980   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:11:18,855-Speed 5975.45 samples/sec   Loss 10.0804   LearningRate 0.2136   Epoch: 6   Global Step: 70990   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:11:25,704-Speed 5981.73 samples/sec   Loss 10.1584   LearningRate 0.2136   Epoch: 6   Global Step: 71000   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:11:32,556-Speed 5978.83 samples/sec   Loss 10.0881   LearningRate 0.2135   Epoch: 6   Global Step: 71010   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:11:39,425-Speed 5964.46 samples/sec   Loss 10.1705   LearningRate 0.2135   Epoch: 6   Global Step: 71020   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:11:46,378-Speed 5894.82 samples/sec   Loss 10.1008   LearningRate 0.2135   Epoch: 6   Global Step: 71030   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:11:53,255-Speed 5956.57 samples/sec   Loss 10.2114   LearningRate 0.2134   Epoch: 6   Global Step: 71040   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:12:00,129-Speed 5962.36 samples/sec   Loss 10.2278   LearningRate 0.2134   Epoch: 6   Global Step: 71050   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:12:06,975-Speed 5986.19 samples/sec   Loss 10.1461   LearningRate 0.2134   Epoch: 6   Global Step: 71060   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:12:13,828-Speed 5977.61 samples/sec   Loss 10.1703   LearningRate 0.2134   Epoch: 6   Global Step: 71070   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:12:20,701-Speed 5962.69 samples/sec   Loss 10.0980   LearningRate 0.2133   Epoch: 6   Global Step: 71080   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:12:27,568-Speed 5965.46 samples/sec   Loss 10.1662   LearningRate 0.2133   Epoch: 6   Global Step: 71090   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:12:34,447-Speed 5956.62 samples/sec   Loss 10.1289   LearningRate 0.2133   Epoch: 6   Global Step: 71100   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:12:41,305-Speed 5973.50 samples/sec   Loss 10.1365   LearningRate 0.2132   Epoch: 6   Global Step: 71110   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:12:48,170-Speed 5967.90 samples/sec   Loss 10.2330   LearningRate 0.2132   Epoch: 6   Global Step: 71120   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:12:55,045-Speed 5965.16 samples/sec   Loss 10.1536   LearningRate 0.2132   Epoch: 6   Global Step: 71130   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:13:01,911-Speed 5966.84 samples/sec   Loss 10.1164   LearningRate 0.2131   Epoch: 6   Global Step: 71140   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:13:08,768-Speed 5974.55 samples/sec   Loss 10.1285   LearningRate 0.2131   Epoch: 6   Global Step: 71150   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:13:15,636-Speed 5965.07 samples/sec   Loss 10.1888   LearningRate 0.2131   Epoch: 6   Global Step: 71160   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:13:22,535-Speed 5937.95 samples/sec   Loss 10.2258   LearningRate 0.2130   Epoch: 6   Global Step: 71170   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:13:29,398-Speed 5972.13 samples/sec   Loss 10.1761   LearningRate 0.2130   Epoch: 6   Global Step: 71180   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:13:36,255-Speed 5974.13 samples/sec   Loss 10.1760   LearningRate 0.2130   Epoch: 6   Global Step: 71190   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:13:43,118-Speed 5969.79 samples/sec   Loss 10.2310   LearningRate 0.2129   Epoch: 6   Global Step: 71200   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:13:49,973-Speed 5976.23 samples/sec   Loss 10.1775   LearningRate 0.2129   Epoch: 6   Global Step: 71210   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:13:56,844-Speed 5962.71 samples/sec   Loss 10.1302   LearningRate 0.2129   Epoch: 6   Global Step: 71220   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:14:03,725-Speed 5954.28 samples/sec   Loss 10.1230   LearningRate 0.2129   Epoch: 6   Global Step: 71230   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:14:10,610-Speed 5950.31 samples/sec   Loss 10.0984   LearningRate 0.2128   Epoch: 6   Global Step: 71240   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:14:17,503-Speed 5942.84 samples/sec   Loss 10.2127   LearningRate 0.2128   Epoch: 6   Global Step: 71250   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:14:24,354-Speed 5980.22 samples/sec   Loss 10.1375   LearningRate 0.2128   Epoch: 6   Global Step: 71260   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:14:31,215-Speed 5972.02 samples/sec   Loss 10.1732   LearningRate 0.2127   Epoch: 6   Global Step: 71270   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:14:38,069-Speed 5977.16 samples/sec   Loss 10.2182   LearningRate 0.2127   Epoch: 6   Global Step: 71280   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:14:44,922-Speed 5977.23 samples/sec   Loss 10.1286   LearningRate 0.2127   Epoch: 6   Global Step: 71290   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:14:51,778-Speed 5975.83 samples/sec   Loss 10.1257   LearningRate 0.2126   Epoch: 6   Global Step: 71300   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:14:58,659-Speed 5953.56 samples/sec   Loss 10.1545   LearningRate 0.2126   Epoch: 6   Global Step: 71310   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:15:05,524-Speed 5968.54 samples/sec   Loss 10.1786   LearningRate 0.2126   Epoch: 6   Global Step: 71320   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:15:12,376-Speed 5978.53 samples/sec   Loss 10.1317   LearningRate 0.2125   Epoch: 6   Global Step: 71330   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:15:19,252-Speed 5957.96 samples/sec   Loss 10.1392   LearningRate 0.2125   Epoch: 6   Global Step: 71340   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:15:26,106-Speed 5977.43 samples/sec   Loss 10.0900   LearningRate 0.2125   Epoch: 6   Global Step: 71350   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:15:32,984-Speed 5957.19 samples/sec   Loss 10.1422   LearningRate 0.2124   Epoch: 6   Global Step: 71360   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:15:39,858-Speed 5959.39 samples/sec   Loss 10.1086   LearningRate 0.2124   Epoch: 6   Global Step: 71370   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:15:46,750-Speed 5944.04 samples/sec   Loss 10.1064   LearningRate 0.2124   Epoch: 6   Global Step: 71380   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:15:53,582-Speed 5996.50 samples/sec   Loss 10.1743   LearningRate 0.2124   Epoch: 6   Global Step: 71390   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 10:16:00,431-Speed 5980.89 samples/sec   Loss 10.1550   LearningRate 0.2123   Epoch: 6   Global Step: 71400   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 10:16:07,294-Speed 5969.91 samples/sec   Loss 10.1956   LearningRate 0.2123   Epoch: 6   Global Step: 71410   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 10:16:14,162-Speed 5965.22 samples/sec   Loss 10.1865   LearningRate 0.2123   Epoch: 6   Global Step: 71420   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 10:16:21,008-Speed 5983.62 samples/sec   Loss 10.1905   LearningRate 0.2122   Epoch: 6   Global Step: 71430   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 10:16:27,868-Speed 5972.98 samples/sec   Loss 10.1793   LearningRate 0.2122   Epoch: 6   Global Step: 71440   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 10:16:34,718-Speed 5981.41 samples/sec   Loss 10.1682   LearningRate 0.2122   Epoch: 6   Global Step: 71450   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 10:16:41,555-Speed 5991.86 samples/sec   Loss 10.1152   LearningRate 0.2121   Epoch: 6   Global Step: 71460   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 10:16:48,404-Speed 5982.12 samples/sec   Loss 10.1309   LearningRate 0.2121   Epoch: 6   Global Step: 71470   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 10:16:55,266-Speed 5972.34 samples/sec   Loss 10.1668   LearningRate 0.2121   Epoch: 6   Global Step: 71480   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-01-08 10:17:02,148-Speed 5953.48 samples/sec   Loss 10.1922   LearningRate 0.2120   Epoch: 6   Global Step: 71490   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 10:17:09,022-Speed 5959.56 samples/sec   Loss 10.2432   LearningRate 0.2120   Epoch: 6   Global Step: 71500   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 10:17:15,884-Speed 5970.00 samples/sec   Loss 10.0898   LearningRate 0.2120   Epoch: 6   Global Step: 71510   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 10:17:22,746-Speed 5970.67 samples/sec   Loss 10.0655   LearningRate 0.2119   Epoch: 6   Global Step: 71520   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 10:17:29,618-Speed 5962.00 samples/sec   Loss 10.0876   LearningRate 0.2119   Epoch: 6   Global Step: 71530   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 10:17:36,473-Speed 5978.25 samples/sec   Loss 10.1201   LearningRate 0.2119   Epoch: 6   Global Step: 71540   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 10:17:43,319-Speed 5984.25 samples/sec   Loss 10.0674   LearningRate 0.2119   Epoch: 6   Global Step: 71550   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 10:17:50,172-Speed 5980.85 samples/sec   Loss 10.1131   LearningRate 0.2118   Epoch: 6   Global Step: 71560   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 10:17:57,023-Speed 5980.05 samples/sec   Loss 10.1400   LearningRate 0.2118   Epoch: 6   Global Step: 71570   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 10:18:03,898-Speed 5959.06 samples/sec   Loss 10.1315   LearningRate 0.2118   Epoch: 6   Global Step: 71580   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-01-08 10:18:10,764-Speed 5966.78 samples/sec   Loss 10.2174   LearningRate 0.2117   Epoch: 6   Global Step: 71590   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:18:17,639-Speed 5959.47 samples/sec   Loss 10.0425   LearningRate 0.2117   Epoch: 6   Global Step: 71600   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:18:24,493-Speed 5977.44 samples/sec   Loss 10.0693   LearningRate 0.2117   Epoch: 6   Global Step: 71610   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:18:31,347-Speed 5977.29 samples/sec   Loss 10.1531   LearningRate 0.2116   Epoch: 6   Global Step: 71620   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:18:39,155-Speed 5246.80 samples/sec   Loss 10.0777   LearningRate 0.2116   Epoch: 6   Global Step: 71630   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:18:45,995-Speed 5989.20 samples/sec   Loss 10.1602   LearningRate 0.2116   Epoch: 6   Global Step: 71640   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:18:52,836-Speed 5988.21 samples/sec   Loss 10.1250   LearningRate 0.2115   Epoch: 6   Global Step: 71650   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:18:59,682-Speed 5983.98 samples/sec   Loss 10.1474   LearningRate 0.2115   Epoch: 6   Global Step: 71660   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:19:06,541-Speed 5972.22 samples/sec   Loss 10.1579   LearningRate 0.2115   Epoch: 6   Global Step: 71670   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:19:13,387-Speed 5984.41 samples/sec   Loss 10.0566   LearningRate 0.2114   Epoch: 6   Global Step: 71680   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:19:20,232-Speed 5984.28 samples/sec   Loss 10.1023   LearningRate 0.2114   Epoch: 6   Global Step: 71690   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:19:27,090-Speed 5974.02 samples/sec   Loss 10.1163   LearningRate 0.2114   Epoch: 6   Global Step: 71700   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:19:33,992-Speed 5936.58 samples/sec   Loss 10.1234   LearningRate 0.2114   Epoch: 6   Global Step: 71710   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:19:40,853-Speed 5970.50 samples/sec   Loss 10.1708   LearningRate 0.2113   Epoch: 6   Global Step: 71720   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:19:47,707-Speed 5977.30 samples/sec   Loss 10.1322   LearningRate 0.2113   Epoch: 6   Global Step: 71730   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:19:54,560-Speed 5978.29 samples/sec   Loss 10.0940   LearningRate 0.2113   Epoch: 6   Global Step: 71740   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:20:01,439-Speed 5955.01 samples/sec   Loss 10.1563   LearningRate 0.2112   Epoch: 6   Global Step: 71750   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:20:08,293-Speed 5977.11 samples/sec   Loss 10.1021   LearningRate 0.2112   Epoch: 6   Global Step: 71760   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:20:15,158-Speed 5969.54 samples/sec   Loss 10.0739   LearningRate 0.2112   Epoch: 6   Global Step: 71770   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:20:22,010-Speed 5978.45 samples/sec   Loss 10.0550   LearningRate 0.2111   Epoch: 6   Global Step: 71780   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:20:28,861-Speed 5979.77 samples/sec   Loss 10.1531   LearningRate 0.2111   Epoch: 6   Global Step: 71790   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:20:35,740-Speed 5955.98 samples/sec   Loss 10.0547   LearningRate 0.2111   Epoch: 6   Global Step: 71800   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:20:42,599-Speed 5972.60 samples/sec   Loss 10.0968   LearningRate 0.2110   Epoch: 6   Global Step: 71810   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:20:49,465-Speed 5967.00 samples/sec   Loss 10.0564   LearningRate 0.2110   Epoch: 6   Global Step: 71820   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:20:56,338-Speed 5960.61 samples/sec   Loss 10.1261   LearningRate 0.2110   Epoch: 6   Global Step: 71830   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:21:03,220-Speed 5953.28 samples/sec   Loss 10.0875   LearningRate 0.2109   Epoch: 6   Global Step: 71840   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:21:10,072-Speed 5978.33 samples/sec   Loss 10.1829   LearningRate 0.2109   Epoch: 6   Global Step: 71850   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:21:16,926-Speed 5977.86 samples/sec   Loss 10.1447   LearningRate 0.2109   Epoch: 6   Global Step: 71860   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:21:23,788-Speed 5969.85 samples/sec   Loss 10.1551   LearningRate 0.2109   Epoch: 6   Global Step: 71870   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:21:30,643-Speed 5976.22 samples/sec   Loss 10.1169   LearningRate 0.2108   Epoch: 6   Global Step: 71880   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:21:37,491-Speed 5982.34 samples/sec   Loss 10.1301   LearningRate 0.2108   Epoch: 6   Global Step: 71890   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:21:44,339-Speed 5982.48 samples/sec   Loss 10.1021   LearningRate 0.2108   Epoch: 6   Global Step: 71900   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:21:51,187-Speed 5981.51 samples/sec   Loss 10.0844   LearningRate 0.2107   Epoch: 6   Global Step: 71910   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:21:58,059-Speed 5961.30 samples/sec   Loss 10.1751   LearningRate 0.2107   Epoch: 6   Global Step: 71920   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:22:04,928-Speed 5973.52 samples/sec   Loss 10.0245   LearningRate 0.2107   Epoch: 6   Global Step: 71930   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:22:11,813-Speed 5950.59 samples/sec   Loss 10.0987   LearningRate 0.2106   Epoch: 6   Global Step: 71940   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:22:18,666-Speed 5977.56 samples/sec   Loss 9.9746   LearningRate 0.2106   Epoch: 6   Global Step: 71950   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:22:25,533-Speed 5966.10 samples/sec   Loss 10.1477   LearningRate 0.2106   Epoch: 6   Global Step: 71960   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:22:32,393-Speed 5972.14 samples/sec   Loss 10.0779   LearningRate 0.2105   Epoch: 6   Global Step: 71970   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:22:39,258-Speed 5967.55 samples/sec   Loss 10.0878   LearningRate 0.2105   Epoch: 6   Global Step: 71980   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:22:46,137-Speed 5955.46 samples/sec   Loss 10.1559   LearningRate 0.2105   Epoch: 6   Global Step: 71990   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:22:53,000-Speed 5971.51 samples/sec   Loss 10.1251   LearningRate 0.2105   Epoch: 6   Global Step: 72000   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:22:59,884-Speed 5951.61 samples/sec   Loss 10.0540   LearningRate 0.2104   Epoch: 6   Global Step: 72010   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:23:06,753-Speed 5964.44 samples/sec   Loss 10.1330   LearningRate 0.2104   Epoch: 6   Global Step: 72020   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:23:13,624-Speed 5963.90 samples/sec   Loss 10.1215   LearningRate 0.2104   Epoch: 6   Global Step: 72030   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:23:20,490-Speed 5967.41 samples/sec   Loss 10.1348   LearningRate 0.2103   Epoch: 6   Global Step: 72040   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:23:27,353-Speed 5969.88 samples/sec   Loss 10.1223   LearningRate 0.2103   Epoch: 6   Global Step: 72050   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-01-08 10:23:34,216-Speed 5971.57 samples/sec   Loss 10.1222   LearningRate 0.2103   Epoch: 6   Global Step: 72060   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:23:41,092-Speed 5958.19 samples/sec   Loss 10.0221   LearningRate 0.2102   Epoch: 6   Global Step: 72070   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:23:47,969-Speed 5957.26 samples/sec   Loss 10.1186   LearningRate 0.2102   Epoch: 6   Global Step: 72080   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:23:54,829-Speed 5972.04 samples/sec   Loss 10.0721   LearningRate 0.2102   Epoch: 6   Global Step: 72090   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:24:01,693-Speed 5968.07 samples/sec   Loss 10.0785   LearningRate 0.2101   Epoch: 6   Global Step: 72100   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:24:08,556-Speed 5969.51 samples/sec   Loss 10.0702   LearningRate 0.2101   Epoch: 6   Global Step: 72110   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:24:15,416-Speed 5972.15 samples/sec   Loss 10.0648   LearningRate 0.2101   Epoch: 6   Global Step: 72120   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:24:22,287-Speed 5962.35 samples/sec   Loss 10.1124   LearningRate 0.2100   Epoch: 6   Global Step: 72130   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:24:29,145-Speed 5973.15 samples/sec   Loss 10.1571   LearningRate 0.2100   Epoch: 6   Global Step: 72140   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:24:36,035-Speed 5949.72 samples/sec   Loss 10.1061   LearningRate 0.2100   Epoch: 6   Global Step: 72150   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:24:42,882-Speed 5983.69 samples/sec   Loss 10.0405   LearningRate 0.2100   Epoch: 6   Global Step: 72160   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:24:49,743-Speed 5970.22 samples/sec   Loss 10.0998   LearningRate 0.2099   Epoch: 6   Global Step: 72170   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-01-08 10:24:56,613-Speed 5964.03 samples/sec   Loss 10.0698   LearningRate 0.2099   Epoch: 6   Global Step: 72180   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:25:03,464-Speed 5980.35 samples/sec   Loss 10.1085   LearningRate 0.2099   Epoch: 6   Global Step: 72190   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:25:10,308-Speed 5985.07 samples/sec   Loss 10.1437   LearningRate 0.2098   Epoch: 6   Global Step: 72200   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:25:17,177-Speed 5964.31 samples/sec   Loss 10.0677   LearningRate 0.2098   Epoch: 6   Global Step: 72210   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:25:24,058-Speed 5954.14 samples/sec   Loss 10.0854   LearningRate 0.2098   Epoch: 6   Global Step: 72220   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:25:30,926-Speed 5964.22 samples/sec   Loss 10.1225   LearningRate 0.2097   Epoch: 6   Global Step: 72230   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:25:37,787-Speed 5971.05 samples/sec   Loss 10.0314   LearningRate 0.2097   Epoch: 6   Global Step: 72240   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:25:44,629-Speed 5987.52 samples/sec   Loss 10.0630   LearningRate 0.2097   Epoch: 6   Global Step: 72250   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:25:51,474-Speed 5984.79 samples/sec   Loss 10.0026   LearningRate 0.2096   Epoch: 6   Global Step: 72260   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:25:58,322-Speed 5981.90 samples/sec   Loss 10.0442   LearningRate 0.2096   Epoch: 6   Global Step: 72270   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:26:05,167-Speed 5984.93 samples/sec   Loss 9.9779   LearningRate 0.2096   Epoch: 6   Global Step: 72280   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:26:12,022-Speed 5976.78 samples/sec   Loss 10.0792   LearningRate 0.2095   Epoch: 6   Global Step: 72290   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:26:18,877-Speed 5975.89 samples/sec   Loss 10.0635   LearningRate 0.2095   Epoch: 6   Global Step: 72300   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:26:25,788-Speed 5928.35 samples/sec   Loss 10.0846   LearningRate 0.2095   Epoch: 6   Global Step: 72310   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:26:32,686-Speed 5938.69 samples/sec   Loss 10.1049   LearningRate 0.2095   Epoch: 6   Global Step: 72320   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:26:39,606-Speed 5920.85 samples/sec   Loss 9.9751   LearningRate 0.2094   Epoch: 6   Global Step: 72330   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:26:47,061-Speed 5494.83 samples/sec   Loss 10.0761   LearningRate 0.2094   Epoch: 6   Global Step: 72340   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:26:53,956-Speed 5941.66 samples/sec   Loss 10.0786   LearningRate 0.2094   Epoch: 6   Global Step: 72350   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:27:00,805-Speed 5981.09 samples/sec   Loss 9.9856   LearningRate 0.2093   Epoch: 6   Global Step: 72360   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:27:07,655-Speed 5981.38 samples/sec   Loss 10.0957   LearningRate 0.2093   Epoch: 6   Global Step: 72370   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:27:14,568-Speed 5925.72 samples/sec   Loss 10.0949   LearningRate 0.2093   Epoch: 6   Global Step: 72380   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:27:21,426-Speed 5973.56 samples/sec   Loss 10.0777   LearningRate 0.2092   Epoch: 6   Global Step: 72390   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:27:28,289-Speed 5969.98 samples/sec   Loss 10.0585   LearningRate 0.2092   Epoch: 6   Global Step: 72400   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:27:35,156-Speed 5967.28 samples/sec   Loss 10.0830   LearningRate 0.2092   Epoch: 6   Global Step: 72410   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:27:42,021-Speed 5977.95 samples/sec   Loss 10.1077   LearningRate 0.2091   Epoch: 6   Global Step: 72420   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:27:48,872-Speed 5979.77 samples/sec   Loss 10.0571   LearningRate 0.2091   Epoch: 6   Global Step: 72430   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:27:55,730-Speed 5974.31 samples/sec   Loss 10.0415   LearningRate 0.2091   Epoch: 6   Global Step: 72440   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:28:02,587-Speed 5975.94 samples/sec   Loss 10.0163   LearningRate 0.2091   Epoch: 6   Global Step: 72450   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:28:09,446-Speed 5972.41 samples/sec   Loss 10.1428   LearningRate 0.2090   Epoch: 6   Global Step: 72460   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:28:16,308-Speed 5970.14 samples/sec   Loss 10.1581   LearningRate 0.2090   Epoch: 6   Global Step: 72470   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:28:23,187-Speed 5956.43 samples/sec   Loss 10.0283   LearningRate 0.2090   Epoch: 6   Global Step: 72480   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:28:30,054-Speed 5965.92 samples/sec   Loss 10.0630   LearningRate 0.2089   Epoch: 6   Global Step: 72490   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:28:36,938-Speed 5950.95 samples/sec   Loss 10.0361   LearningRate 0.2089   Epoch: 6   Global Step: 72500   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:28:43,805-Speed 5968.46 samples/sec   Loss 10.1141   LearningRate 0.2089   Epoch: 6   Global Step: 72510   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:28:50,641-Speed 5992.40 samples/sec   Loss 10.1116   LearningRate 0.2088   Epoch: 6   Global Step: 72520   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:28:57,493-Speed 5979.40 samples/sec   Loss 10.1246   LearningRate 0.2088   Epoch: 6   Global Step: 72530   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:29:04,380-Speed 5948.08 samples/sec   Loss 10.0223   LearningRate 0.2088   Epoch: 6   Global Step: 72540   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:29:11,235-Speed 5976.61 samples/sec   Loss 10.0714   LearningRate 0.2087   Epoch: 6   Global Step: 72550   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:29:18,110-Speed 5958.65 samples/sec   Loss 10.0620   LearningRate 0.2087   Epoch: 6   Global Step: 72560   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:29:25,015-Speed 5953.25 samples/sec   Loss 10.0647   LearningRate 0.2087   Epoch: 6   Global Step: 72570   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:29:31,858-Speed 5986.64 samples/sec   Loss 10.0320   LearningRate 0.2087   Epoch: 6   Global Step: 72580   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:29:55,540-Speed 1729.90 samples/sec   Loss 10.1085   LearningRate 0.2086   Epoch: 7   Global Step: 72590   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:30:02,355-Speed 6011.76 samples/sec   Loss 10.1052   LearningRate 0.2086   Epoch: 7   Global Step: 72600   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:30:09,187-Speed 5996.03 samples/sec   Loss 10.1062   LearningRate 0.2086   Epoch: 7   Global Step: 72610   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:30:16,019-Speed 5996.39 samples/sec   Loss 10.0749   LearningRate 0.2085   Epoch: 7   Global Step: 72620   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:30:22,855-Speed 5993.23 samples/sec   Loss 10.0265   LearningRate 0.2085   Epoch: 7   Global Step: 72630   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:30:29,717-Speed 5969.76 samples/sec   Loss 10.0312   LearningRate 0.2085   Epoch: 7   Global Step: 72640   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:30:36,573-Speed 5975.30 samples/sec   Loss 10.0442   LearningRate 0.2084   Epoch: 7   Global Step: 72650   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:30:43,414-Speed 5989.25 samples/sec   Loss 10.1393   LearningRate 0.2084   Epoch: 7   Global Step: 72660   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:30:50,293-Speed 5955.39 samples/sec   Loss 10.0856   LearningRate 0.2084   Epoch: 7   Global Step: 72670   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:30:57,207-Speed 5924.41 samples/sec   Loss 10.1029   LearningRate 0.2083   Epoch: 7   Global Step: 72680   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:31:04,093-Speed 5951.70 samples/sec   Loss 9.9594   LearningRate 0.2083   Epoch: 7   Global Step: 72690   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:31:10,961-Speed 5965.64 samples/sec   Loss 9.9396   LearningRate 0.2083   Epoch: 7   Global Step: 72700   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:31:17,867-Speed 5934.90 samples/sec   Loss 9.9967   LearningRate 0.2082   Epoch: 7   Global Step: 72710   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:31:24,746-Speed 5955.22 samples/sec   Loss 9.9787   LearningRate 0.2082   Epoch: 7   Global Step: 72720   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:31:31,623-Speed 5958.36 samples/sec   Loss 9.9772   LearningRate 0.2082   Epoch: 7   Global Step: 72730   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:31:38,487-Speed 5967.67 samples/sec   Loss 10.0316   LearningRate 0.2082   Epoch: 7   Global Step: 72740   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:31:45,364-Speed 5957.49 samples/sec   Loss 9.9787   LearningRate 0.2081   Epoch: 7   Global Step: 72750   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:31:52,224-Speed 5976.85 samples/sec   Loss 9.9674   LearningRate 0.2081   Epoch: 7   Global Step: 72760   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:31:59,110-Speed 5949.21 samples/sec   Loss 9.9511   LearningRate 0.2081   Epoch: 7   Global Step: 72770   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:32:05,983-Speed 5961.20 samples/sec   Loss 10.0194   LearningRate 0.2080   Epoch: 7   Global Step: 72780   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:32:12,853-Speed 5964.88 samples/sec   Loss 9.9745   LearningRate 0.2080   Epoch: 7   Global Step: 72790   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:32:19,728-Speed 5961.03 samples/sec   Loss 10.0118   LearningRate 0.2080   Epoch: 7   Global Step: 72800   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:32:26,571-Speed 5986.48 samples/sec   Loss 10.0459   LearningRate 0.2079   Epoch: 7   Global Step: 72810   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:32:33,420-Speed 5981.91 samples/sec   Loss 10.0214   LearningRate 0.2079   Epoch: 7   Global Step: 72820   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:32:40,276-Speed 5975.83 samples/sec   Loss 10.0582   LearningRate 0.2079   Epoch: 7   Global Step: 72830   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:32:47,139-Speed 5970.20 samples/sec   Loss 10.0050   LearningRate 0.2078   Epoch: 7   Global Step: 72840   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:32:53,990-Speed 5980.16 samples/sec   Loss 9.9780   LearningRate 0.2078   Epoch: 7   Global Step: 72850   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:33:00,863-Speed 5960.83 samples/sec   Loss 10.0755   LearningRate 0.2078   Epoch: 7   Global Step: 72860   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:33:07,721-Speed 5973.43 samples/sec   Loss 10.0119   LearningRate 0.2078   Epoch: 7   Global Step: 72870   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:33:14,594-Speed 5960.75 samples/sec   Loss 10.0419   LearningRate 0.2077   Epoch: 7   Global Step: 72880   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:33:21,466-Speed 5962.36 samples/sec   Loss 10.0148   LearningRate 0.2077   Epoch: 7   Global Step: 72890   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:33:28,326-Speed 5971.34 samples/sec   Loss 10.0384   LearningRate 0.2077   Epoch: 7   Global Step: 72900   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:33:35,183-Speed 5981.14 samples/sec   Loss 10.0849   LearningRate 0.2076   Epoch: 7   Global Step: 72910   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:33:42,046-Speed 5968.41 samples/sec   Loss 10.0646   LearningRate 0.2076   Epoch: 7   Global Step: 72920   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:33:48,893-Speed 5983.63 samples/sec   Loss 10.0019   LearningRate 0.2076   Epoch: 7   Global Step: 72930   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:33:55,813-Speed 5920.51 samples/sec   Loss 10.0519   LearningRate 0.2075   Epoch: 7   Global Step: 72940   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:34:02,669-Speed 5977.55 samples/sec   Loss 10.0580   LearningRate 0.2075   Epoch: 7   Global Step: 72950   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:34:09,522-Speed 5977.17 samples/sec   Loss 10.0093   LearningRate 0.2075   Epoch: 7   Global Step: 72960   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:34:16,371-Speed 5982.23 samples/sec   Loss 9.9986   LearningRate 0.2074   Epoch: 7   Global Step: 72970   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:34:23,228-Speed 5974.52 samples/sec   Loss 10.0169   LearningRate 0.2074   Epoch: 7   Global Step: 72980   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:34:30,087-Speed 5972.96 samples/sec   Loss 9.9821   LearningRate 0.2074   Epoch: 7   Global Step: 72990   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:34:36,936-Speed 5980.87 samples/sec   Loss 10.0564   LearningRate 0.2074   Epoch: 7   Global Step: 73000   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:34:43,794-Speed 5975.52 samples/sec   Loss 10.0584   LearningRate 0.2073   Epoch: 7   Global Step: 73010   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:34:50,671-Speed 5957.36 samples/sec   Loss 10.0011   LearningRate 0.2073   Epoch: 7   Global Step: 73020   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:34:57,571-Speed 5941.47 samples/sec   Loss 10.1651   LearningRate 0.2073   Epoch: 7   Global Step: 73030   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:35:04,438-Speed 5965.88 samples/sec   Loss 10.0621   LearningRate 0.2072   Epoch: 7   Global Step: 73040   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:35:11,304-Speed 5966.66 samples/sec   Loss 10.0116   LearningRate 0.2072   Epoch: 7   Global Step: 73050   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:35:18,185-Speed 5953.75 samples/sec   Loss 9.9896   LearningRate 0.2072   Epoch: 7   Global Step: 73060   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:35:25,035-Speed 5981.06 samples/sec   Loss 10.0111   LearningRate 0.2071   Epoch: 7   Global Step: 73070   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:35:31,899-Speed 5968.20 samples/sec   Loss 10.0552   LearningRate 0.2071   Epoch: 7   Global Step: 73080   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:35:38,760-Speed 5971.20 samples/sec   Loss 10.0148   LearningRate 0.2071   Epoch: 7   Global Step: 73090   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:35:45,622-Speed 5969.91 samples/sec   Loss 10.0137   LearningRate 0.2070   Epoch: 7   Global Step: 73100   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:35:52,477-Speed 5976.65 samples/sec   Loss 10.0313   LearningRate 0.2070   Epoch: 7   Global Step: 73110   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:35:59,336-Speed 5972.10 samples/sec   Loss 10.0426   LearningRate 0.2070   Epoch: 7   Global Step: 73120   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:36:06,225-Speed 5947.25 samples/sec   Loss 9.9597   LearningRate 0.2070   Epoch: 7   Global Step: 73130   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:36:13,066-Speed 5988.42 samples/sec   Loss 10.0070   LearningRate 0.2069   Epoch: 7   Global Step: 73140   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:36:19,941-Speed 5958.78 samples/sec   Loss 10.0143   LearningRate 0.2069   Epoch: 7   Global Step: 73150   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:36:26,792-Speed 5980.37 samples/sec   Loss 9.9880   LearningRate 0.2069   Epoch: 7   Global Step: 73160   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:36:33,644-Speed 5978.38 samples/sec   Loss 9.9389   LearningRate 0.2068   Epoch: 7   Global Step: 73170   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:36:40,525-Speed 5952.89 samples/sec   Loss 9.9210   LearningRate 0.2068   Epoch: 7   Global Step: 73180   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:36:47,379-Speed 5976.96 samples/sec   Loss 10.0104   LearningRate 0.2068   Epoch: 7   Global Step: 73190   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:36:54,222-Speed 5986.61 samples/sec   Loss 10.0392   LearningRate 0.2067   Epoch: 7   Global Step: 73200   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:37:01,078-Speed 5975.93 samples/sec   Loss 10.0533   LearningRate 0.2067   Epoch: 7   Global Step: 73210   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:37:07,936-Speed 5973.89 samples/sec   Loss 10.0256   LearningRate 0.2067   Epoch: 7   Global Step: 73220   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:37:14,826-Speed 5945.17 samples/sec   Loss 9.9584   LearningRate 0.2066   Epoch: 7   Global Step: 73230   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:37:21,680-Speed 5977.38 samples/sec   Loss 9.9151   LearningRate 0.2066   Epoch: 7   Global Step: 73240   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:37:28,545-Speed 5968.14 samples/sec   Loss 10.0181   LearningRate 0.2066   Epoch: 7   Global Step: 73250   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:37:35,409-Speed 5970.47 samples/sec   Loss 10.0418   LearningRate 0.2066   Epoch: 7   Global Step: 73260   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:37:42,267-Speed 5973.11 samples/sec   Loss 9.9165   LearningRate 0.2065   Epoch: 7   Global Step: 73270   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:37:49,131-Speed 5971.23 samples/sec   Loss 10.0475   LearningRate 0.2065   Epoch: 7   Global Step: 73280   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:37:55,978-Speed 5983.70 samples/sec   Loss 10.0066   LearningRate 0.2065   Epoch: 7   Global Step: 73290   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:38:02,824-Speed 5983.73 samples/sec   Loss 10.0198   LearningRate 0.2064   Epoch: 7   Global Step: 73300   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:38:09,674-Speed 5983.74 samples/sec   Loss 9.9717   LearningRate 0.2064   Epoch: 7   Global Step: 73310   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:38:16,527-Speed 5978.00 samples/sec   Loss 9.9789   LearningRate 0.2064   Epoch: 7   Global Step: 73320   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:38:23,385-Speed 5974.71 samples/sec   Loss 9.9643   LearningRate 0.2063   Epoch: 7   Global Step: 73330   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:38:30,238-Speed 5977.75 samples/sec   Loss 10.0039   LearningRate 0.2063   Epoch: 7   Global Step: 73340   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:38:37,123-Speed 5950.55 samples/sec   Loss 9.9993   LearningRate 0.2063   Epoch: 7   Global Step: 73350   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:38:44,013-Speed 5945.52 samples/sec   Loss 9.9698   LearningRate 0.2062   Epoch: 7   Global Step: 73360   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:38:50,875-Speed 5970.59 samples/sec   Loss 9.9711   LearningRate 0.2062   Epoch: 7   Global Step: 73370   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:38:57,739-Speed 5969.18 samples/sec   Loss 9.9428   LearningRate 0.2062   Epoch: 7   Global Step: 73380   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:39:04,598-Speed 5972.70 samples/sec   Loss 9.9719   LearningRate 0.2062   Epoch: 7   Global Step: 73390   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:39:11,452-Speed 5977.37 samples/sec   Loss 10.0124   LearningRate 0.2061   Epoch: 7   Global Step: 73400   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:39:18,325-Speed 5960.87 samples/sec   Loss 9.9797   LearningRate 0.2061   Epoch: 7   Global Step: 73410   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:39:25,210-Speed 5950.42 samples/sec   Loss 9.9093   LearningRate 0.2061   Epoch: 7   Global Step: 73420   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:39:32,070-Speed 5971.90 samples/sec   Loss 9.9865   LearningRate 0.2060   Epoch: 7   Global Step: 73430   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:39:38,918-Speed 5982.82 samples/sec   Loss 10.0443   LearningRate 0.2060   Epoch: 7   Global Step: 73440   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:39:45,851-Speed 5909.39 samples/sec   Loss 9.9627   LearningRate 0.2060   Epoch: 7   Global Step: 73450   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:39:52,815-Speed 5882.97 samples/sec   Loss 10.1128   LearningRate 0.2059   Epoch: 7   Global Step: 73460   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:39:59,694-Speed 5955.16 samples/sec   Loss 9.9950   LearningRate 0.2059   Epoch: 7   Global Step: 73470   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:40:06,543-Speed 5982.12 samples/sec   Loss 10.0172   LearningRate 0.2059   Epoch: 7   Global Step: 73480   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:40:13,407-Speed 5968.25 samples/sec   Loss 10.0192   LearningRate 0.2058   Epoch: 7   Global Step: 73490   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:40:20,263-Speed 5975.30 samples/sec   Loss 9.9874   LearningRate 0.2058   Epoch: 7   Global Step: 73500   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:40:27,108-Speed 5985.41 samples/sec   Loss 9.9780   LearningRate 0.2058   Epoch: 7   Global Step: 73510   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:40:33,998-Speed 5946.73 samples/sec   Loss 9.9874   LearningRate 0.2058   Epoch: 7   Global Step: 73520   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:40:40,936-Speed 5905.37 samples/sec   Loss 9.9817   LearningRate 0.2057   Epoch: 7   Global Step: 73530   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:40:47,795-Speed 5973.03 samples/sec   Loss 10.0311   LearningRate 0.2057   Epoch: 7   Global Step: 73540   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:40:54,664-Speed 5963.94 samples/sec   Loss 9.9042   LearningRate 0.2057   Epoch: 7   Global Step: 73550   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:41:01,565-Speed 5936.39 samples/sec   Loss 9.8344   LearningRate 0.2056   Epoch: 7   Global Step: 73560   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:41:08,444-Speed 5955.95 samples/sec   Loss 9.9114   LearningRate 0.2056   Epoch: 7   Global Step: 73570   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:41:15,343-Speed 5938.97 samples/sec   Loss 10.0045   LearningRate 0.2056   Epoch: 7   Global Step: 73580   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:41:22,208-Speed 5967.38 samples/sec   Loss 9.9404   LearningRate 0.2055   Epoch: 7   Global Step: 73590   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:41:29,075-Speed 5964.99 samples/sec   Loss 9.9887   LearningRate 0.2055   Epoch: 7   Global Step: 73600   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:41:35,954-Speed 5955.23 samples/sec   Loss 9.9425   LearningRate 0.2055   Epoch: 7   Global Step: 73610   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:41:42,813-Speed 5973.29 samples/sec   Loss 10.0240   LearningRate 0.2054   Epoch: 7   Global Step: 73620   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:41:49,665-Speed 5978.89 samples/sec   Loss 9.9386   LearningRate 0.2054   Epoch: 7   Global Step: 73630   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:41:56,535-Speed 5963.13 samples/sec   Loss 9.9028   LearningRate 0.2054   Epoch: 7   Global Step: 73640   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:42:03,414-Speed 5955.74 samples/sec   Loss 9.9214   LearningRate 0.2054   Epoch: 7   Global Step: 73650   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:42:10,280-Speed 5967.80 samples/sec   Loss 9.9261   LearningRate 0.2053   Epoch: 7   Global Step: 73660   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:42:17,195-Speed 5924.23 samples/sec   Loss 10.0051   LearningRate 0.2053   Epoch: 7   Global Step: 73670   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:42:24,129-Speed 5908.04 samples/sec   Loss 10.0361   LearningRate 0.2053   Epoch: 7   Global Step: 73680   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:42:31,068-Speed 5904.47 samples/sec   Loss 9.9912   LearningRate 0.2052   Epoch: 7   Global Step: 73690   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:42:38,005-Speed 5905.69 samples/sec   Loss 9.9383   LearningRate 0.2052   Epoch: 7   Global Step: 73700   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:42:44,898-Speed 5942.98 samples/sec   Loss 9.9252   LearningRate 0.2052   Epoch: 7   Global Step: 73710   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:42:51,752-Speed 5976.92 samples/sec   Loss 9.9168   LearningRate 0.2051   Epoch: 7   Global Step: 73720   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:42:58,601-Speed 5982.25 samples/sec   Loss 10.0044   LearningRate 0.2051   Epoch: 7   Global Step: 73730   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:43:05,465-Speed 5967.64 samples/sec   Loss 9.9174   LearningRate 0.2051   Epoch: 7   Global Step: 73740   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:43:12,318-Speed 5978.27 samples/sec   Loss 9.9183   LearningRate 0.2050   Epoch: 7   Global Step: 73750   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:43:19,196-Speed 5956.85 samples/sec   Loss 10.0018   LearningRate 0.2050   Epoch: 7   Global Step: 73760   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:43:26,055-Speed 5972.94 samples/sec   Loss 9.9672   LearningRate 0.2050   Epoch: 7   Global Step: 73770   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:43:32,939-Speed 5951.82 samples/sec   Loss 9.9125   LearningRate 0.2050   Epoch: 7   Global Step: 73780   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:43:39,866-Speed 5916.55 samples/sec   Loss 9.9906   LearningRate 0.2049   Epoch: 7   Global Step: 73790   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:43:46,731-Speed 5967.62 samples/sec   Loss 10.0053   LearningRate 0.2049   Epoch: 7   Global Step: 73800   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:43:53,564-Speed 5995.08 samples/sec   Loss 10.0330   LearningRate 0.2049   Epoch: 7   Global Step: 73810   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:44:00,438-Speed 5959.91 samples/sec   Loss 9.9916   LearningRate 0.2048   Epoch: 7   Global Step: 73820   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:44:07,294-Speed 5975.90 samples/sec   Loss 9.9371   LearningRate 0.2048   Epoch: 7   Global Step: 73830   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:44:14,159-Speed 5967.81 samples/sec   Loss 9.9568   LearningRate 0.2048   Epoch: 7   Global Step: 73840   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:44:21,028-Speed 5964.39 samples/sec   Loss 9.9283   LearningRate 0.2047   Epoch: 7   Global Step: 73850   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:44:27,878-Speed 5980.18 samples/sec   Loss 9.9621   LearningRate 0.2047   Epoch: 7   Global Step: 73860   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:44:34,723-Speed 5984.66 samples/sec   Loss 10.0050   LearningRate 0.2047   Epoch: 7   Global Step: 73870   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:44:41,586-Speed 5969.44 samples/sec   Loss 9.9334   LearningRate 0.2046   Epoch: 7   Global Step: 73880   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:44:48,437-Speed 5979.78 samples/sec   Loss 10.0118   LearningRate 0.2046   Epoch: 7   Global Step: 73890   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:44:55,294-Speed 5977.44 samples/sec   Loss 9.8754   LearningRate 0.2046   Epoch: 7   Global Step: 73900   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:45:02,183-Speed 5947.01 samples/sec   Loss 9.8480   LearningRate 0.2046   Epoch: 7   Global Step: 73910   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:45:09,041-Speed 5973.61 samples/sec   Loss 9.9305   LearningRate 0.2045   Epoch: 7   Global Step: 73920   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:45:15,886-Speed 5985.61 samples/sec   Loss 9.9289   LearningRate 0.2045   Epoch: 7   Global Step: 73930   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:45:22,776-Speed 5946.23 samples/sec   Loss 9.9648   LearningRate 0.2045   Epoch: 7   Global Step: 73940   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:45:29,652-Speed 5957.52 samples/sec   Loss 9.9766   LearningRate 0.2044   Epoch: 7   Global Step: 73950   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:45:36,562-Speed 5930.96 samples/sec   Loss 9.9949   LearningRate 0.2044   Epoch: 7   Global Step: 73960   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:45:43,440-Speed 5956.18 samples/sec   Loss 10.0429   LearningRate 0.2044   Epoch: 7   Global Step: 73970   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:45:50,297-Speed 5975.10 samples/sec   Loss 9.9310   LearningRate 0.2043   Epoch: 7   Global Step: 73980   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:45:57,139-Speed 5987.72 samples/sec   Loss 9.9497   LearningRate 0.2043   Epoch: 7   Global Step: 73990   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:46:03,983-Speed 5985.51 samples/sec   Loss 9.8888   LearningRate 0.2043   Epoch: 7   Global Step: 74000   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:46:10,826-Speed 5986.83 samples/sec   Loss 10.0111   LearningRate 0.2042   Epoch: 7   Global Step: 74010   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:46:17,664-Speed 5990.32 samples/sec   Loss 9.9507   LearningRate 0.2042   Epoch: 7   Global Step: 74020   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:46:24,570-Speed 5932.58 samples/sec   Loss 9.8546   LearningRate 0.2042   Epoch: 7   Global Step: 74030   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:46:31,441-Speed 5962.35 samples/sec   Loss 9.9119   LearningRate 0.2042   Epoch: 7   Global Step: 74040   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:46:38,295-Speed 5977.04 samples/sec   Loss 9.9288   LearningRate 0.2041   Epoch: 7   Global Step: 74050   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:46:45,145-Speed 5980.32 samples/sec   Loss 9.9127   LearningRate 0.2041   Epoch: 7   Global Step: 74060   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:46:52,040-Speed 5942.73 samples/sec   Loss 9.9978   LearningRate 0.2041   Epoch: 7   Global Step: 74070   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:46:58,916-Speed 5957.74 samples/sec   Loss 9.9766   LearningRate 0.2040   Epoch: 7   Global Step: 74080   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:47:05,761-Speed 5984.91 samples/sec   Loss 9.9266   LearningRate 0.2040   Epoch: 7   Global Step: 74090   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:47:12,617-Speed 5976.27 samples/sec   Loss 9.8917   LearningRate 0.2040   Epoch: 7   Global Step: 74100   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:47:19,471-Speed 5978.40 samples/sec   Loss 9.9823   LearningRate 0.2039   Epoch: 7   Global Step: 74110   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:47:26,333-Speed 5970.61 samples/sec   Loss 10.0026   LearningRate 0.2039   Epoch: 7   Global Step: 74120   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:47:33,193-Speed 5972.00 samples/sec   Loss 9.9687   LearningRate 0.2039   Epoch: 7   Global Step: 74130   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:47:40,057-Speed 5968.34 samples/sec   Loss 9.9958   LearningRate 0.2038   Epoch: 7   Global Step: 74140   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:47:46,913-Speed 5975.43 samples/sec   Loss 9.8965   LearningRate 0.2038   Epoch: 7   Global Step: 74150   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:47:53,762-Speed 5982.35 samples/sec   Loss 9.9520   LearningRate 0.2038   Epoch: 7   Global Step: 74160   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:48:00,619-Speed 5973.95 samples/sec   Loss 10.0109   LearningRate 0.2038   Epoch: 7   Global Step: 74170   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:48:07,465-Speed 5983.81 samples/sec   Loss 9.9610   LearningRate 0.2037   Epoch: 7   Global Step: 74180   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:48:14,323-Speed 5973.54 samples/sec   Loss 9.8954   LearningRate 0.2037   Epoch: 7   Global Step: 74190   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:48:21,170-Speed 5983.51 samples/sec   Loss 9.9760   LearningRate 0.2037   Epoch: 7   Global Step: 74200   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:48:28,008-Speed 5990.99 samples/sec   Loss 9.9241   LearningRate 0.2036   Epoch: 7   Global Step: 74210   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:48:34,863-Speed 5975.94 samples/sec   Loss 9.9010   LearningRate 0.2036   Epoch: 7   Global Step: 74220   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:48:41,733-Speed 5963.63 samples/sec   Loss 9.9008   LearningRate 0.2036   Epoch: 7   Global Step: 74230   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:48:48,632-Speed 5938.06 samples/sec   Loss 9.8977   LearningRate 0.2035   Epoch: 7   Global Step: 74240   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:48:55,490-Speed 5973.68 samples/sec   Loss 9.9060   LearningRate 0.2035   Epoch: 7   Global Step: 74250   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:49:02,346-Speed 5975.46 samples/sec   Loss 9.9081   LearningRate 0.2035   Epoch: 7   Global Step: 74260   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:49:09,195-Speed 5981.58 samples/sec   Loss 9.8941   LearningRate 0.2035   Epoch: 7   Global Step: 74270   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:49:16,040-Speed 5984.55 samples/sec   Loss 9.9339   LearningRate 0.2034   Epoch: 7   Global Step: 74280   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:49:22,909-Speed 5965.88 samples/sec   Loss 9.9186   LearningRate 0.2034   Epoch: 7   Global Step: 74290   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:49:29,762-Speed 5978.14 samples/sec   Loss 9.9879   LearningRate 0.2034   Epoch: 7   Global Step: 74300   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:49:36,616-Speed 5977.46 samples/sec   Loss 9.8998   LearningRate 0.2033   Epoch: 7   Global Step: 74310   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:49:43,481-Speed 5970.63 samples/sec   Loss 9.8880   LearningRate 0.2033   Epoch: 7   Global Step: 74320   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:49:50,342-Speed 5971.12 samples/sec   Loss 9.9495   LearningRate 0.2033   Epoch: 7   Global Step: 74330   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:49:57,203-Speed 5970.90 samples/sec   Loss 9.8648   LearningRate 0.2032   Epoch: 7   Global Step: 74340   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:50:04,066-Speed 5969.69 samples/sec   Loss 9.9684   LearningRate 0.2032   Epoch: 7   Global Step: 74350   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:50:10,929-Speed 5968.73 samples/sec   Loss 9.9689   LearningRate 0.2032   Epoch: 7   Global Step: 74360   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:50:17,786-Speed 5975.06 samples/sec   Loss 9.9787   LearningRate 0.2031   Epoch: 7   Global Step: 74370   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:50:24,645-Speed 5972.72 samples/sec   Loss 10.0038   LearningRate 0.2031   Epoch: 7   Global Step: 74380   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:50:31,498-Speed 5977.81 samples/sec   Loss 9.9070   LearningRate 0.2031   Epoch: 7   Global Step: 74390   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:50:38,360-Speed 5970.55 samples/sec   Loss 9.9079   LearningRate 0.2031   Epoch: 7   Global Step: 74400   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:50:45,207-Speed 5983.15 samples/sec   Loss 9.9022   LearningRate 0.2030   Epoch: 7   Global Step: 74410   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:50:52,061-Speed 5977.91 samples/sec   Loss 9.8012   LearningRate 0.2030   Epoch: 7   Global Step: 74420   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:50:58,920-Speed 5972.87 samples/sec   Loss 9.8468   LearningRate 0.2030   Epoch: 7   Global Step: 74430   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:51:05,765-Speed 5984.97 samples/sec   Loss 9.8728   LearningRate 0.2029   Epoch: 7   Global Step: 74440   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:51:12,620-Speed 5976.28 samples/sec   Loss 9.8717   LearningRate 0.2029   Epoch: 7   Global Step: 74450   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:51:19,475-Speed 5975.97 samples/sec   Loss 9.9796   LearningRate 0.2029   Epoch: 7   Global Step: 74460   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:51:26,317-Speed 5987.51 samples/sec   Loss 9.8820   LearningRate 0.2028   Epoch: 7   Global Step: 74470   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:51:33,169-Speed 5979.27 samples/sec   Loss 9.9336   LearningRate 0.2028   Epoch: 7   Global Step: 74480   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:51:40,021-Speed 5978.28 samples/sec   Loss 9.9019   LearningRate 0.2028   Epoch: 7   Global Step: 74490   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:51:46,883-Speed 5972.58 samples/sec   Loss 9.7905   LearningRate 0.2027   Epoch: 7   Global Step: 74500   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:51:53,726-Speed 5986.57 samples/sec   Loss 9.9370   LearningRate 0.2027   Epoch: 7   Global Step: 74510   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:52:00,645-Speed 5921.73 samples/sec   Loss 9.8846   LearningRate 0.2027   Epoch: 7   Global Step: 74520   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:52:07,499-Speed 5977.35 samples/sec   Loss 9.9270   LearningRate 0.2027   Epoch: 7   Global Step: 74530   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:52:14,358-Speed 5971.71 samples/sec   Loss 9.8306   LearningRate 0.2026   Epoch: 7   Global Step: 74540   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:52:21,204-Speed 5984.64 samples/sec   Loss 9.9279   LearningRate 0.2026   Epoch: 7   Global Step: 74550   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:52:28,056-Speed 5978.97 samples/sec   Loss 9.9196   LearningRate 0.2026   Epoch: 7   Global Step: 74560   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:52:34,902-Speed 5984.19 samples/sec   Loss 9.8261   LearningRate 0.2025   Epoch: 7   Global Step: 74570   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:52:41,741-Speed 5990.54 samples/sec   Loss 9.9227   LearningRate 0.2025   Epoch: 7   Global Step: 74580   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:52:48,602-Speed 5971.13 samples/sec   Loss 9.8857   LearningRate 0.2025   Epoch: 7   Global Step: 74590   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:52:55,464-Speed 5970.08 samples/sec   Loss 9.8627   LearningRate 0.2024   Epoch: 7   Global Step: 74600   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:53:02,330-Speed 5966.90 samples/sec   Loss 9.8271   LearningRate 0.2024   Epoch: 7   Global Step: 74610   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:53:09,291-Speed 5885.74 samples/sec   Loss 9.8914   LearningRate 0.2024   Epoch: 7   Global Step: 74620   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:53:16,206-Speed 5924.41 samples/sec   Loss 9.9350   LearningRate 0.2024   Epoch: 7   Global Step: 74630   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:53:23,071-Speed 5967.17 samples/sec   Loss 9.8952   LearningRate 0.2023   Epoch: 7   Global Step: 74640   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:53:29,939-Speed 5965.44 samples/sec   Loss 9.8461   LearningRate 0.2023   Epoch: 7   Global Step: 74650   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:53:36,799-Speed 5972.53 samples/sec   Loss 9.9091   LearningRate 0.2023   Epoch: 7   Global Step: 74660   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:53:43,657-Speed 5973.31 samples/sec   Loss 10.0026   LearningRate 0.2022   Epoch: 7   Global Step: 74670   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:53:50,511-Speed 5977.56 samples/sec   Loss 9.8999   LearningRate 0.2022   Epoch: 7   Global Step: 74680   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:53:57,391-Speed 5956.80 samples/sec   Loss 9.8206   LearningRate 0.2022   Epoch: 7   Global Step: 74690   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:54:04,249-Speed 5975.95 samples/sec   Loss 9.8787   LearningRate 0.2021   Epoch: 7   Global Step: 74700   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:54:11,102-Speed 5981.10 samples/sec   Loss 9.9097   LearningRate 0.2021   Epoch: 7   Global Step: 74710   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:54:17,973-Speed 5963.31 samples/sec   Loss 9.9274   LearningRate 0.2021   Epoch: 7   Global Step: 74720   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:54:24,831-Speed 5973.99 samples/sec   Loss 9.8769   LearningRate 0.2020   Epoch: 7   Global Step: 74730   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:54:31,687-Speed 5975.79 samples/sec   Loss 9.9221   LearningRate 0.2020   Epoch: 7   Global Step: 74740   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:54:38,576-Speed 5946.59 samples/sec   Loss 9.8491   LearningRate 0.2020   Epoch: 7   Global Step: 74750   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:54:45,465-Speed 5947.28 samples/sec   Loss 9.8465   LearningRate 0.2020   Epoch: 7   Global Step: 74760   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:54:52,319-Speed 5978.12 samples/sec   Loss 9.9558   LearningRate 0.2019   Epoch: 7   Global Step: 74770   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:54:59,169-Speed 5980.60 samples/sec   Loss 9.9455   LearningRate 0.2019   Epoch: 7   Global Step: 74780   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:55:06,042-Speed 5961.34 samples/sec   Loss 9.8984   LearningRate 0.2019   Epoch: 7   Global Step: 74790   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:55:12,882-Speed 5988.96 samples/sec   Loss 9.8920   LearningRate 0.2018   Epoch: 7   Global Step: 74800   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:55:19,726-Speed 5987.72 samples/sec   Loss 9.8992   LearningRate 0.2018   Epoch: 7   Global Step: 74810   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:55:26,588-Speed 5972.84 samples/sec   Loss 9.7959   LearningRate 0.2018   Epoch: 7   Global Step: 74820   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:55:33,451-Speed 5970.47 samples/sec   Loss 9.8369   LearningRate 0.2017   Epoch: 7   Global Step: 74830   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:55:40,302-Speed 5979.09 samples/sec   Loss 9.7988   LearningRate 0.2017   Epoch: 7   Global Step: 74840   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:55:47,143-Speed 5988.76 samples/sec   Loss 9.9008   LearningRate 0.2017   Epoch: 7   Global Step: 74850   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:55:53,995-Speed 5978.42 samples/sec   Loss 9.9114   LearningRate 0.2017   Epoch: 7   Global Step: 74860   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:56:00,857-Speed 5970.12 samples/sec   Loss 9.8766   LearningRate 0.2016   Epoch: 7   Global Step: 74870   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:56:07,722-Speed 5967.93 samples/sec   Loss 9.8505   LearningRate 0.2016   Epoch: 7   Global Step: 74880   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:56:14,574-Speed 5981.37 samples/sec   Loss 9.8064   LearningRate 0.2016   Epoch: 7   Global Step: 74890   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:56:21,425-Speed 5980.04 samples/sec   Loss 9.8308   LearningRate 0.2015   Epoch: 7   Global Step: 74900   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:56:28,280-Speed 5976.46 samples/sec   Loss 9.8633   LearningRate 0.2015   Epoch: 7   Global Step: 74910   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:56:35,133-Speed 5977.95 samples/sec   Loss 9.8963   LearningRate 0.2015   Epoch: 7   Global Step: 74920   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:56:42,012-Speed 5955.92 samples/sec   Loss 9.9357   LearningRate 0.2014   Epoch: 7   Global Step: 74930   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:56:48,887-Speed 5959.31 samples/sec   Loss 9.8607   LearningRate 0.2014   Epoch: 7   Global Step: 74940   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:56:55,739-Speed 5980.42 samples/sec   Loss 9.7999   LearningRate 0.2014   Epoch: 7   Global Step: 74950   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:57:02,583-Speed 5985.59 samples/sec   Loss 9.9857   LearningRate 0.2013   Epoch: 7   Global Step: 74960   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:57:09,428-Speed 5984.95 samples/sec   Loss 9.8447   LearningRate 0.2013   Epoch: 7   Global Step: 74970   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:57:16,369-Speed 5902.99 samples/sec   Loss 9.8906   LearningRate 0.2013   Epoch: 7   Global Step: 74980   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:57:23,225-Speed 5975.36 samples/sec   Loss 9.8835   LearningRate 0.2013   Epoch: 7   Global Step: 74990   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 10:57:30,089-Speed 5970.39 samples/sec   Loss 9.9370   LearningRate 0.2012   Epoch: 7   Global Step: 75000   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:57:56,807-[lfw][75000]XNorm: 23.391407
Training: 2022-01-08 10:57:56,808-[lfw][75000]Accuracy-Flip: 0.99717+-0.00279
Training: 2022-01-08 10:57:56,808-[lfw][75000]Accuracy-Highest: 0.99750
Training: 2022-01-08 10:58:27,710-[cfp_fp][75000]XNorm: 20.369652
Training: 2022-01-08 10:58:27,711-[cfp_fp][75000]Accuracy-Flip: 0.97314+-0.00820
Training: 2022-01-08 10:58:27,713-[cfp_fp][75000]Accuracy-Highest: 0.97686
Training: 2022-01-08 10:58:54,492-[agedb_30][75000]XNorm: 22.560914
Training: 2022-01-08 10:58:54,493-[agedb_30][75000]Accuracy-Flip: 0.96800+-0.00792
Training: 2022-01-08 10:58:54,493-[agedb_30][75000]Accuracy-Highest: 0.96800
Training: 2022-01-08 10:59:01,345-Speed 448.86 samples/sec   Loss 9.8661   LearningRate 0.2012   Epoch: 7   Global Step: 75010   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:59:08,186-Speed 5988.74 samples/sec   Loss 9.8623   LearningRate 0.2012   Epoch: 7   Global Step: 75020   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:59:15,071-Speed 5950.71 samples/sec   Loss 9.9138   LearningRate 0.2011   Epoch: 7   Global Step: 75030   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:59:21,940-Speed 5964.30 samples/sec   Loss 9.8377   LearningRate 0.2011   Epoch: 7   Global Step: 75040   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:59:28,856-Speed 5923.66 samples/sec   Loss 9.8907   LearningRate 0.2011   Epoch: 7   Global Step: 75050   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:59:35,724-Speed 5964.59 samples/sec   Loss 9.9059   LearningRate 0.2010   Epoch: 7   Global Step: 75060   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:59:42,618-Speed 5943.30 samples/sec   Loss 9.8782   LearningRate 0.2010   Epoch: 7   Global Step: 75070   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 10:59:49,507-Speed 5946.58 samples/sec   Loss 9.9137   LearningRate 0.2010   Epoch: 7   Global Step: 75080   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 10:59:56,400-Speed 5943.99 samples/sec   Loss 9.8888   LearningRate 0.2010   Epoch: 7   Global Step: 75090   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:00:03,267-Speed 5965.43 samples/sec   Loss 9.9988   LearningRate 0.2009   Epoch: 7   Global Step: 75100   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:00:10,174-Speed 5931.20 samples/sec   Loss 9.8769   LearningRate 0.2009   Epoch: 7   Global Step: 75110   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:00:17,054-Speed 5954.88 samples/sec   Loss 9.8333   LearningRate 0.2009   Epoch: 7   Global Step: 75120   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:00:23,963-Speed 5929.97 samples/sec   Loss 9.8276   LearningRate 0.2008   Epoch: 7   Global Step: 75130   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:00:30,826-Speed 5969.75 samples/sec   Loss 9.9001   LearningRate 0.2008   Epoch: 7   Global Step: 75140   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:00:37,683-Speed 5975.06 samples/sec   Loss 9.8614   LearningRate 0.2008   Epoch: 7   Global Step: 75150   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:00:44,532-Speed 5981.70 samples/sec   Loss 9.8142   LearningRate 0.2007   Epoch: 7   Global Step: 75160   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:00:51,392-Speed 5971.56 samples/sec   Loss 9.9512   LearningRate 0.2007   Epoch: 7   Global Step: 75170   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:00:58,285-Speed 5943.78 samples/sec   Loss 9.8145   LearningRate 0.2007   Epoch: 7   Global Step: 75180   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:01:05,151-Speed 5967.10 samples/sec   Loss 9.8769   LearningRate 0.2006   Epoch: 7   Global Step: 75190   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:01:12,011-Speed 5972.01 samples/sec   Loss 9.8497   LearningRate 0.2006   Epoch: 7   Global Step: 75200   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:01:18,874-Speed 5969.33 samples/sec   Loss 9.9488   LearningRate 0.2006   Epoch: 7   Global Step: 75210   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:01:25,722-Speed 5982.68 samples/sec   Loss 9.8119   LearningRate 0.2006   Epoch: 7   Global Step: 75220   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:01:32,621-Speed 5937.76 samples/sec   Loss 9.8429   LearningRate 0.2005   Epoch: 7   Global Step: 75230   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:01:39,477-Speed 5975.91 samples/sec   Loss 9.8838   LearningRate 0.2005   Epoch: 7   Global Step: 75240   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:01:46,345-Speed 5965.14 samples/sec   Loss 9.9020   LearningRate 0.2005   Epoch: 7   Global Step: 75250   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:01:53,238-Speed 5943.25 samples/sec   Loss 9.8185   LearningRate 0.2004   Epoch: 7   Global Step: 75260   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:02:00,106-Speed 5966.60 samples/sec   Loss 9.8728   LearningRate 0.2004   Epoch: 7   Global Step: 75270   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:02:06,976-Speed 5966.25 samples/sec   Loss 9.8593   LearningRate 0.2004   Epoch: 7   Global Step: 75280   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:02:13,844-Speed 5964.77 samples/sec   Loss 9.8436   LearningRate 0.2003   Epoch: 7   Global Step: 75290   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:02:20,699-Speed 5976.44 samples/sec   Loss 9.8876   LearningRate 0.2003   Epoch: 7   Global Step: 75300   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:02:27,567-Speed 5964.74 samples/sec   Loss 9.8409   LearningRate 0.2003   Epoch: 7   Global Step: 75310   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:02:34,420-Speed 5978.20 samples/sec   Loss 9.7889   LearningRate 0.2003   Epoch: 7   Global Step: 75320   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:02:41,304-Speed 5952.26 samples/sec   Loss 9.8511   LearningRate 0.2002   Epoch: 7   Global Step: 75330   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:02:48,164-Speed 5973.91 samples/sec   Loss 9.8549   LearningRate 0.2002   Epoch: 7   Global Step: 75340   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:02:55,037-Speed 5960.92 samples/sec   Loss 9.8054   LearningRate 0.2002   Epoch: 7   Global Step: 75350   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:03:01,903-Speed 5966.79 samples/sec   Loss 9.8712   LearningRate 0.2001   Epoch: 7   Global Step: 75360   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:03:08,769-Speed 5966.49 samples/sec   Loss 9.8696   LearningRate 0.2001   Epoch: 7   Global Step: 75370   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:03:15,633-Speed 5968.47 samples/sec   Loss 9.8934   LearningRate 0.2001   Epoch: 7   Global Step: 75380   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:03:22,487-Speed 5977.26 samples/sec   Loss 9.8327   LearningRate 0.2000   Epoch: 7   Global Step: 75390   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:03:29,403-Speed 5923.52 samples/sec   Loss 9.9169   LearningRate 0.2000   Epoch: 7   Global Step: 75400   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:03:36,268-Speed 5967.24 samples/sec   Loss 9.9138   LearningRate 0.2000   Epoch: 7   Global Step: 75410   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:03:43,165-Speed 5943.63 samples/sec   Loss 9.7794   LearningRate 0.2000   Epoch: 7   Global Step: 75420   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:03:50,085-Speed 5920.19 samples/sec   Loss 9.8948   LearningRate 0.1999   Epoch: 7   Global Step: 75430   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:03:57,019-Speed 5908.91 samples/sec   Loss 9.7909   LearningRate 0.1999   Epoch: 7   Global Step: 75440   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:04:03,877-Speed 5973.24 samples/sec   Loss 9.8285   LearningRate 0.1999   Epoch: 7   Global Step: 75450   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:04:10,739-Speed 5970.24 samples/sec   Loss 9.8102   LearningRate 0.1998   Epoch: 7   Global Step: 75460   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:04:17,620-Speed 5954.02 samples/sec   Loss 9.8087   LearningRate 0.1998   Epoch: 7   Global Step: 75470   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:04:24,484-Speed 5968.69 samples/sec   Loss 9.8708   LearningRate 0.1998   Epoch: 7   Global Step: 75480   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:04:31,340-Speed 5975.61 samples/sec   Loss 9.7870   LearningRate 0.1997   Epoch: 7   Global Step: 75490   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:04:38,198-Speed 5973.45 samples/sec   Loss 9.9104   LearningRate 0.1997   Epoch: 7   Global Step: 75500   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:04:45,075-Speed 5957.27 samples/sec   Loss 9.8948   LearningRate 0.1997   Epoch: 7   Global Step: 75510   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:04:51,934-Speed 5977.21 samples/sec   Loss 9.8742   LearningRate 0.1996   Epoch: 7   Global Step: 75520   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:04:58,791-Speed 5974.63 samples/sec   Loss 9.7520   LearningRate 0.1996   Epoch: 7   Global Step: 75530   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:05:05,672-Speed 5954.05 samples/sec   Loss 9.8320   LearningRate 0.1996   Epoch: 7   Global Step: 75540   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:05:12,528-Speed 5975.45 samples/sec   Loss 9.8432   LearningRate 0.1996   Epoch: 7   Global Step: 75550   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:05:19,436-Speed 5930.55 samples/sec   Loss 9.8553   LearningRate 0.1995   Epoch: 7   Global Step: 75560   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:05:26,313-Speed 5957.45 samples/sec   Loss 9.8436   LearningRate 0.1995   Epoch: 7   Global Step: 75570   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:05:33,194-Speed 5953.63 samples/sec   Loss 9.7698   LearningRate 0.1995   Epoch: 7   Global Step: 75580   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:05:40,053-Speed 5972.93 samples/sec   Loss 9.9043   LearningRate 0.1994   Epoch: 7   Global Step: 75590   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:05:46,955-Speed 5935.25 samples/sec   Loss 9.8689   LearningRate 0.1994   Epoch: 7   Global Step: 75600   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:05:53,822-Speed 5966.60 samples/sec   Loss 9.7917   LearningRate 0.1994   Epoch: 7   Global Step: 75610   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:06:00,674-Speed 5978.52 samples/sec   Loss 9.8244   LearningRate 0.1993   Epoch: 7   Global Step: 75620   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:06:07,549-Speed 5959.29 samples/sec   Loss 9.7667   LearningRate 0.1993   Epoch: 7   Global Step: 75630   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:06:14,401-Speed 5978.72 samples/sec   Loss 9.9326   LearningRate 0.1993   Epoch: 7   Global Step: 75640   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:06:21,238-Speed 5991.47 samples/sec   Loss 9.8202   LearningRate 0.1993   Epoch: 7   Global Step: 75650   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:06:28,116-Speed 5957.02 samples/sec   Loss 9.8206   LearningRate 0.1992   Epoch: 7   Global Step: 75660   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:06:34,971-Speed 5975.78 samples/sec   Loss 9.8561   LearningRate 0.1992   Epoch: 7   Global Step: 75670   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:06:41,830-Speed 5972.62 samples/sec   Loss 9.8051   LearningRate 0.1992   Epoch: 7   Global Step: 75680   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:06:48,684-Speed 5977.25 samples/sec   Loss 9.8713   LearningRate 0.1991   Epoch: 7   Global Step: 75690   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:06:55,531-Speed 5983.63 samples/sec   Loss 9.8888   LearningRate 0.1991   Epoch: 7   Global Step: 75700   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:07:02,387-Speed 5976.91 samples/sec   Loss 9.8179   LearningRate 0.1991   Epoch: 7   Global Step: 75710   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:07:09,269-Speed 5953.19 samples/sec   Loss 9.8336   LearningRate 0.1990   Epoch: 7   Global Step: 75720   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:07:16,112-Speed 5986.65 samples/sec   Loss 9.8023   LearningRate 0.1990   Epoch: 7   Global Step: 75730   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:07:22,970-Speed 5974.19 samples/sec   Loss 9.8615   LearningRate 0.1990   Epoch: 7   Global Step: 75740   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:07:29,831-Speed 5970.79 samples/sec   Loss 9.8153   LearningRate 0.1990   Epoch: 7   Global Step: 75750   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:07:36,684-Speed 5978.17 samples/sec   Loss 9.7460   LearningRate 0.1989   Epoch: 7   Global Step: 75760   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:07:43,603-Speed 5920.73 samples/sec   Loss 9.7718   LearningRate 0.1989   Epoch: 7   Global Step: 75770   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:07:50,463-Speed 5972.67 samples/sec   Loss 9.7968   LearningRate 0.1989   Epoch: 7   Global Step: 75780   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:07:57,322-Speed 5974.33 samples/sec   Loss 9.8150   LearningRate 0.1988   Epoch: 7   Global Step: 75790   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:08:04,180-Speed 5972.58 samples/sec   Loss 9.7667   LearningRate 0.1988   Epoch: 7   Global Step: 75800   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:08:11,058-Speed 5957.60 samples/sec   Loss 9.8012   LearningRate 0.1988   Epoch: 7   Global Step: 75810   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:08:17,923-Speed 5967.81 samples/sec   Loss 9.8082   LearningRate 0.1987   Epoch: 7   Global Step: 75820   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:08:24,769-Speed 5984.05 samples/sec   Loss 9.8084   LearningRate 0.1987   Epoch: 7   Global Step: 75830   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:08:31,636-Speed 5966.00 samples/sec   Loss 9.8115   LearningRate 0.1987   Epoch: 7   Global Step: 75840   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:08:38,498-Speed 5970.52 samples/sec   Loss 9.8623   LearningRate 0.1987   Epoch: 7   Global Step: 75850   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:08:45,367-Speed 5963.91 samples/sec   Loss 9.7898   LearningRate 0.1986   Epoch: 7   Global Step: 75860   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:08:52,254-Speed 5948.82 samples/sec   Loss 9.7237   LearningRate 0.1986   Epoch: 7   Global Step: 75870   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:08:59,128-Speed 5962.70 samples/sec   Loss 9.8045   LearningRate 0.1986   Epoch: 7   Global Step: 75880   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:09:05,980-Speed 5979.28 samples/sec   Loss 9.8494   LearningRate 0.1985   Epoch: 7   Global Step: 75890   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:09:12,850-Speed 5963.28 samples/sec   Loss 9.7710   LearningRate 0.1985   Epoch: 7   Global Step: 75900   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:09:19,731-Speed 5954.48 samples/sec   Loss 9.8214   LearningRate 0.1985   Epoch: 7   Global Step: 75910   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:09:26,586-Speed 5976.01 samples/sec   Loss 9.7732   LearningRate 0.1984   Epoch: 7   Global Step: 75920   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:09:33,440-Speed 5976.99 samples/sec   Loss 9.7940   LearningRate 0.1984   Epoch: 7   Global Step: 75930   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:09:40,305-Speed 5968.75 samples/sec   Loss 9.8811   LearningRate 0.1984   Epoch: 7   Global Step: 75940   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:09:47,255-Speed 5895.05 samples/sec   Loss 9.8224   LearningRate 0.1983   Epoch: 7   Global Step: 75950   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:09:54,216-Speed 5885.64 samples/sec   Loss 9.7928   LearningRate 0.1983   Epoch: 7   Global Step: 75960   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:10:01,100-Speed 5951.22 samples/sec   Loss 9.7886   LearningRate 0.1983   Epoch: 7   Global Step: 75970   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:10:07,960-Speed 5972.36 samples/sec   Loss 9.7771   LearningRate 0.1983   Epoch: 7   Global Step: 75980   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:10:15,551-Speed 5975.68 samples/sec   Loss 9.7438   LearningRate 0.1982   Epoch: 7   Global Step: 75990   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:10:22,405-Speed 5977.62 samples/sec   Loss 9.7987   LearningRate 0.1982   Epoch: 7   Global Step: 76000   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:10:29,260-Speed 5976.54 samples/sec   Loss 9.8743   LearningRate 0.1982   Epoch: 7   Global Step: 76010   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:10:36,119-Speed 5972.71 samples/sec   Loss 9.8284   LearningRate 0.1981   Epoch: 7   Global Step: 76020   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:10:42,975-Speed 5975.48 samples/sec   Loss 9.7465   LearningRate 0.1981   Epoch: 7   Global Step: 76030   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:10:49,857-Speed 5952.51 samples/sec   Loss 9.6812   LearningRate 0.1981   Epoch: 7   Global Step: 76040   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:10:56,731-Speed 5959.31 samples/sec   Loss 9.8149   LearningRate 0.1980   Epoch: 7   Global Step: 76050   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:11:03,622-Speed 5945.52 samples/sec   Loss 9.8545   LearningRate 0.1980   Epoch: 7   Global Step: 76060   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:11:10,506-Speed 5951.07 samples/sec   Loss 9.7984   LearningRate 0.1980   Epoch: 7   Global Step: 76070   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:11:17,377-Speed 5962.64 samples/sec   Loss 9.7632   LearningRate 0.1980   Epoch: 7   Global Step: 76080   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:11:24,233-Speed 5976.22 samples/sec   Loss 9.8171   LearningRate 0.1979   Epoch: 7   Global Step: 76090   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:11:31,089-Speed 5975.34 samples/sec   Loss 9.7801   LearningRate 0.1979   Epoch: 7   Global Step: 76100   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:11:37,942-Speed 5977.71 samples/sec   Loss 9.8155   LearningRate 0.1979   Epoch: 7   Global Step: 76110   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:11:44,806-Speed 5968.33 samples/sec   Loss 9.7797   LearningRate 0.1978   Epoch: 7   Global Step: 76120   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:11:51,668-Speed 5969.80 samples/sec   Loss 9.9077   LearningRate 0.1978   Epoch: 7   Global Step: 76130   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:11:58,527-Speed 5972.77 samples/sec   Loss 9.7723   LearningRate 0.1978   Epoch: 7   Global Step: 76140   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:12:05,393-Speed 5966.72 samples/sec   Loss 9.7709   LearningRate 0.1977   Epoch: 7   Global Step: 76150   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:12:12,272-Speed 5955.21 samples/sec   Loss 9.8159   LearningRate 0.1977   Epoch: 7   Global Step: 76160   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:12:19,137-Speed 5967.43 samples/sec   Loss 9.7984   LearningRate 0.1977   Epoch: 7   Global Step: 76170   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:12:25,990-Speed 5978.15 samples/sec   Loss 9.7824   LearningRate 0.1977   Epoch: 7   Global Step: 76180   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:12:32,843-Speed 5978.24 samples/sec   Loss 9.8508   LearningRate 0.1976   Epoch: 7   Global Step: 76190   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:12:39,693-Speed 5980.22 samples/sec   Loss 9.7996   LearningRate 0.1976   Epoch: 7   Global Step: 76200   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:12:46,556-Speed 5969.28 samples/sec   Loss 9.7699   LearningRate 0.1976   Epoch: 7   Global Step: 76210   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:12:53,392-Speed 5992.77 samples/sec   Loss 9.8243   LearningRate 0.1975   Epoch: 7   Global Step: 76220   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:13:00,259-Speed 5966.38 samples/sec   Loss 9.8308   LearningRate 0.1975   Epoch: 7   Global Step: 76230   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:13:07,128-Speed 5964.72 samples/sec   Loss 9.8833   LearningRate 0.1975   Epoch: 7   Global Step: 76240   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:13:14,015-Speed 5948.95 samples/sec   Loss 9.7974   LearningRate 0.1974   Epoch: 7   Global Step: 76250   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:13:21,946-Speed 5165.32 samples/sec   Loss 9.7456   LearningRate 0.1974   Epoch: 7   Global Step: 76260   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:13:28,813-Speed 5966.01 samples/sec   Loss 9.8474   LearningRate 0.1974   Epoch: 7   Global Step: 76270   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:13:35,679-Speed 5966.97 samples/sec   Loss 9.7649   LearningRate 0.1974   Epoch: 7   Global Step: 76280   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:13:42,530-Speed 5979.72 samples/sec   Loss 9.8252   LearningRate 0.1973   Epoch: 7   Global Step: 76290   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:13:49,392-Speed 5969.84 samples/sec   Loss 9.7767   LearningRate 0.1973   Epoch: 7   Global Step: 76300   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:13:56,237-Speed 5985.01 samples/sec   Loss 9.7582   LearningRate 0.1973   Epoch: 7   Global Step: 76310   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:14:03,116-Speed 5955.58 samples/sec   Loss 9.8149   LearningRate 0.1972   Epoch: 7   Global Step: 76320   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:14:09,981-Speed 5967.14 samples/sec   Loss 9.7148   LearningRate 0.1972   Epoch: 7   Global Step: 76330   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:14:16,840-Speed 5973.58 samples/sec   Loss 9.7043   LearningRate 0.1972   Epoch: 7   Global Step: 76340   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:14:23,696-Speed 5974.47 samples/sec   Loss 9.8140   LearningRate 0.1971   Epoch: 7   Global Step: 76350   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:14:30,687-Speed 5860.36 samples/sec   Loss 9.8135   LearningRate 0.1971   Epoch: 7   Global Step: 76360   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:14:37,557-Speed 5963.59 samples/sec   Loss 9.7494   LearningRate 0.1971   Epoch: 7   Global Step: 76370   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:14:44,414-Speed 5974.01 samples/sec   Loss 9.8532   LearningRate 0.1971   Epoch: 7   Global Step: 76380   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:14:51,270-Speed 5975.77 samples/sec   Loss 9.7503   LearningRate 0.1970   Epoch: 7   Global Step: 76390   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:14:58,135-Speed 5969.75 samples/sec   Loss 9.7703   LearningRate 0.1970   Epoch: 7   Global Step: 76400   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:15:04,992-Speed 5975.03 samples/sec   Loss 9.7466   LearningRate 0.1970   Epoch: 7   Global Step: 76410   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:15:11,843-Speed 5980.31 samples/sec   Loss 9.7500   LearningRate 0.1969   Epoch: 7   Global Step: 76420   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:15:21,207-Speed 4374.52 samples/sec   Loss 9.7238   LearningRate 0.1969   Epoch: 7   Global Step: 76430   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:15:28,065-Speed 5974.65 samples/sec   Loss 9.7144   LearningRate 0.1969   Epoch: 7   Global Step: 76440   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:15:34,925-Speed 5972.35 samples/sec   Loss 9.8415   LearningRate 0.1968   Epoch: 7   Global Step: 76450   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:15:41,781-Speed 5976.64 samples/sec   Loss 9.7391   LearningRate 0.1968   Epoch: 7   Global Step: 76460   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:15:48,634-Speed 5977.77 samples/sec   Loss 9.8163   LearningRate 0.1968   Epoch: 7   Global Step: 76470   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:15:55,512-Speed 5956.17 samples/sec   Loss 9.7419   LearningRate 0.1968   Epoch: 7   Global Step: 76480   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:16:02,364-Speed 5981.46 samples/sec   Loss 9.8793   LearningRate 0.1967   Epoch: 7   Global Step: 76490   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:16:09,238-Speed 5959.61 samples/sec   Loss 9.8032   LearningRate 0.1967   Epoch: 7   Global Step: 76500   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:16:16,123-Speed 5951.15 samples/sec   Loss 9.7177   LearningRate 0.1967   Epoch: 7   Global Step: 76510   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:16:22,984-Speed 5970.60 samples/sec   Loss 9.8230   LearningRate 0.1966   Epoch: 7   Global Step: 76520   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:16:29,886-Speed 5935.80 samples/sec   Loss 9.7539   LearningRate 0.1966   Epoch: 7   Global Step: 76530   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:16:36,798-Speed 5927.05 samples/sec   Loss 9.7879   LearningRate 0.1966   Epoch: 7   Global Step: 76540   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:16:43,650-Speed 5978.60 samples/sec   Loss 9.7966   LearningRate 0.1965   Epoch: 7   Global Step: 76550   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:16:50,519-Speed 5964.01 samples/sec   Loss 9.6802   LearningRate 0.1965   Epoch: 7   Global Step: 76560   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:16:57,386-Speed 5965.77 samples/sec   Loss 9.6578   LearningRate 0.1965   Epoch: 7   Global Step: 76570   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:17:04,253-Speed 5966.35 samples/sec   Loss 9.7603   LearningRate 0.1965   Epoch: 7   Global Step: 76580   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:17:11,118-Speed 5967.10 samples/sec   Loss 9.7856   LearningRate 0.1964   Epoch: 7   Global Step: 76590   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:17:18,001-Speed 5952.54 samples/sec   Loss 9.7843   LearningRate 0.1964   Epoch: 7   Global Step: 76600   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:17:24,867-Speed 5967.15 samples/sec   Loss 9.8387   LearningRate 0.1964   Epoch: 7   Global Step: 76610   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:17:31,724-Speed 5974.15 samples/sec   Loss 9.7311   LearningRate 0.1963   Epoch: 7   Global Step: 76620   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:17:38,577-Speed 5978.43 samples/sec   Loss 9.7992   LearningRate 0.1963   Epoch: 7   Global Step: 76630   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:17:45,425-Speed 5981.77 samples/sec   Loss 9.7368   LearningRate 0.1963   Epoch: 7   Global Step: 76640   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:17:52,308-Speed 5951.90 samples/sec   Loss 9.8049   LearningRate 0.1962   Epoch: 7   Global Step: 76650   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:17:59,179-Speed 5962.60 samples/sec   Loss 9.7424   LearningRate 0.1962   Epoch: 7   Global Step: 76660   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:18:06,059-Speed 5956.92 samples/sec   Loss 9.7322   LearningRate 0.1962   Epoch: 7   Global Step: 76670   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:18:12,913-Speed 5976.57 samples/sec   Loss 9.7962   LearningRate 0.1962   Epoch: 7   Global Step: 76680   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:18:19,774-Speed 5970.89 samples/sec   Loss 9.7111   LearningRate 0.1961   Epoch: 7   Global Step: 76690   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:18:26,632-Speed 5973.76 samples/sec   Loss 9.7731   LearningRate 0.1961   Epoch: 7   Global Step: 76700   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:18:33,488-Speed 5975.21 samples/sec   Loss 9.7503   LearningRate 0.1961   Epoch: 7   Global Step: 76710   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:18:40,346-Speed 5973.45 samples/sec   Loss 9.7937   LearningRate 0.1960   Epoch: 7   Global Step: 76720   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:18:47,190-Speed 5986.07 samples/sec   Loss 9.7378   LearningRate 0.1960   Epoch: 7   Global Step: 76730   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:18:54,043-Speed 5978.70 samples/sec   Loss 9.7873   LearningRate 0.1960   Epoch: 7   Global Step: 76740   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:19:00,883-Speed 5989.04 samples/sec   Loss 9.7104   LearningRate 0.1959   Epoch: 7   Global Step: 76750   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:19:07,753-Speed 5962.79 samples/sec   Loss 9.6779   LearningRate 0.1959   Epoch: 7   Global Step: 76760   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:19:14,602-Speed 5981.97 samples/sec   Loss 9.7230   LearningRate 0.1959   Epoch: 7   Global Step: 76770   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:19:21,467-Speed 5967.33 samples/sec   Loss 9.7230   LearningRate 0.1959   Epoch: 7   Global Step: 76780   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:19:28,320-Speed 5978.00 samples/sec   Loss 9.7245   LearningRate 0.1958   Epoch: 7   Global Step: 76790   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:19:35,183-Speed 5969.54 samples/sec   Loss 9.7638   LearningRate 0.1958   Epoch: 7   Global Step: 76800   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:19:42,033-Speed 5980.82 samples/sec   Loss 9.6759   LearningRate 0.1958   Epoch: 7   Global Step: 76810   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:19:48,892-Speed 5972.49 samples/sec   Loss 9.7603   LearningRate 0.1957   Epoch: 7   Global Step: 76820   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:19:55,748-Speed 5976.18 samples/sec   Loss 9.7838   LearningRate 0.1957   Epoch: 7   Global Step: 76830   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:20:02,609-Speed 5970.86 samples/sec   Loss 9.7337   LearningRate 0.1957   Epoch: 7   Global Step: 76840   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:20:09,462-Speed 5978.62 samples/sec   Loss 9.7440   LearningRate 0.1956   Epoch: 7   Global Step: 76850   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:20:16,331-Speed 5964.02 samples/sec   Loss 9.7147   LearningRate 0.1956   Epoch: 7   Global Step: 76860   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:20:23,182-Speed 5980.19 samples/sec   Loss 9.7735   LearningRate 0.1956   Epoch: 7   Global Step: 76870   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:20:30,109-Speed 5914.01 samples/sec   Loss 9.7570   LearningRate 0.1956   Epoch: 7   Global Step: 76880   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:20:36,965-Speed 5975.00 samples/sec   Loss 9.7494   LearningRate 0.1955   Epoch: 7   Global Step: 76890   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:20:43,797-Speed 5996.49 samples/sec   Loss 9.6655   LearningRate 0.1955   Epoch: 7   Global Step: 76900   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:20:50,640-Speed 5986.55 samples/sec   Loss 9.6667   LearningRate 0.1955   Epoch: 7   Global Step: 76910   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:20:57,486-Speed 5983.53 samples/sec   Loss 9.7537   LearningRate 0.1954   Epoch: 7   Global Step: 76920   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:21:04,341-Speed 5978.22 samples/sec   Loss 9.7719   LearningRate 0.1954   Epoch: 7   Global Step: 76930   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:21:11,194-Speed 5978.23 samples/sec   Loss 9.6635   LearningRate 0.1954   Epoch: 7   Global Step: 76940   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:21:18,076-Speed 5952.92 samples/sec   Loss 9.7035   LearningRate 0.1953   Epoch: 7   Global Step: 76950   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:21:24,939-Speed 5968.81 samples/sec   Loss 9.6889   LearningRate 0.1953   Epoch: 7   Global Step: 76960   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:21:31,805-Speed 5967.08 samples/sec   Loss 9.6437   LearningRate 0.1953   Epoch: 7   Global Step: 76970   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:21:38,672-Speed 5966.11 samples/sec   Loss 9.8016   LearningRate 0.1953   Epoch: 7   Global Step: 76980   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:21:45,536-Speed 5968.48 samples/sec   Loss 9.7654   LearningRate 0.1952   Epoch: 7   Global Step: 76990   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-01-08 11:21:52,396-Speed 5972.58 samples/sec   Loss 9.8288   LearningRate 0.1952   Epoch: 7   Global Step: 77000   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:21:59,250-Speed 5976.50 samples/sec   Loss 9.7937   LearningRate 0.1952   Epoch: 7   Global Step: 77010   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:22:06,111-Speed 5973.96 samples/sec   Loss 9.6763   LearningRate 0.1951   Epoch: 7   Global Step: 77020   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:22:12,967-Speed 5975.78 samples/sec   Loss 9.6943   LearningRate 0.1951   Epoch: 7   Global Step: 77030   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:22:19,822-Speed 5975.78 samples/sec   Loss 9.8525   LearningRate 0.1951   Epoch: 7   Global Step: 77040   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:22:26,726-Speed 5934.01 samples/sec   Loss 9.7292   LearningRate 0.1950   Epoch: 7   Global Step: 77050   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:22:33,570-Speed 5985.09 samples/sec   Loss 9.7892   LearningRate 0.1950   Epoch: 7   Global Step: 77060   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:22:40,418-Speed 5984.46 samples/sec   Loss 9.7376   LearningRate 0.1950   Epoch: 7   Global Step: 77070   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:22:47,275-Speed 5973.76 samples/sec   Loss 9.7498   LearningRate 0.1950   Epoch: 7   Global Step: 77080   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:22:54,135-Speed 5972.10 samples/sec   Loss 9.6974   LearningRate 0.1949   Epoch: 7   Global Step: 77090   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-01-08 11:23:00,992-Speed 5974.50 samples/sec   Loss 9.7521   LearningRate 0.1949   Epoch: 7   Global Step: 77100   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:23:12,584-Speed 3533.74 samples/sec   Loss 9.6776   LearningRate 0.1949   Epoch: 7   Global Step: 77110   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:23:19,450-Speed 5967.31 samples/sec   Loss 9.7044   LearningRate 0.1948   Epoch: 7   Global Step: 77120   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:23:26,342-Speed 5945.00 samples/sec   Loss 9.7334   LearningRate 0.1948   Epoch: 7   Global Step: 77130   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:23:33,192-Speed 5980.45 samples/sec   Loss 9.7285   LearningRate 0.1948   Epoch: 7   Global Step: 77140   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:23:40,050-Speed 5973.83 samples/sec   Loss 9.7064   LearningRate 0.1947   Epoch: 7   Global Step: 77150   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:23:46,965-Speed 5926.37 samples/sec   Loss 9.6901   LearningRate 0.1947   Epoch: 7   Global Step: 77160   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:23:53,840-Speed 5958.47 samples/sec   Loss 9.7015   LearningRate 0.1947   Epoch: 7   Global Step: 77170   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:24:00,700-Speed 5972.29 samples/sec   Loss 9.6768   LearningRate 0.1947   Epoch: 7   Global Step: 77180   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:24:07,552-Speed 5978.92 samples/sec   Loss 9.7202   LearningRate 0.1946   Epoch: 7   Global Step: 77190   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:24:14,422-Speed 5965.69 samples/sec   Loss 9.7414   LearningRate 0.1946   Epoch: 7   Global Step: 77200   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:24:21,306-Speed 5951.44 samples/sec   Loss 9.7794   LearningRate 0.1946   Epoch: 7   Global Step: 77210   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:24:28,174-Speed 5963.98 samples/sec   Loss 9.6775   LearningRate 0.1945   Epoch: 7   Global Step: 77220   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-01-08 11:24:35,054-Speed 5955.02 samples/sec   Loss 9.6526   LearningRate 0.1945   Epoch: 7   Global Step: 77230   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:24:41,931-Speed 5957.50 samples/sec   Loss 9.7504   LearningRate 0.1945   Epoch: 7   Global Step: 77240   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:24:48,790-Speed 5973.04 samples/sec   Loss 9.6483   LearningRate 0.1944   Epoch: 7   Global Step: 77250   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:24:55,643-Speed 5978.19 samples/sec   Loss 9.7103   LearningRate 0.1944   Epoch: 7   Global Step: 77260   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:25:02,575-Speed 5910.17 samples/sec   Loss 9.6605   LearningRate 0.1944   Epoch: 7   Global Step: 77270   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:25:09,423-Speed 5982.34 samples/sec   Loss 9.7954   LearningRate 0.1944   Epoch: 7   Global Step: 77280   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:25:16,292-Speed 5964.96 samples/sec   Loss 9.7158   LearningRate 0.1943   Epoch: 7   Global Step: 77290   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:25:23,143-Speed 5979.35 samples/sec   Loss 9.7524   LearningRate 0.1943   Epoch: 7   Global Step: 77300   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:25:30,004-Speed 5971.77 samples/sec   Loss 9.6653   LearningRate 0.1943   Epoch: 7   Global Step: 77310   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-01-08 11:25:36,874-Speed 5963.36 samples/sec   Loss 9.6916   LearningRate 0.1942   Epoch: 7   Global Step: 77320   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:25:43,716-Speed 5986.62 samples/sec   Loss 9.7217   LearningRate 0.1942   Epoch: 7   Global Step: 77330   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:25:50,588-Speed 5962.37 samples/sec   Loss 9.7035   LearningRate 0.1942   Epoch: 7   Global Step: 77340   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:25:57,438-Speed 5980.52 samples/sec   Loss 9.6619   LearningRate 0.1941   Epoch: 7   Global Step: 77350   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:26:04,327-Speed 5946.75 samples/sec   Loss 9.6803   LearningRate 0.1941   Epoch: 7   Global Step: 77360   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:26:11,177-Speed 5980.77 samples/sec   Loss 9.6905   LearningRate 0.1941   Epoch: 7   Global Step: 77370   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:26:18,023-Speed 5983.83 samples/sec   Loss 9.6745   LearningRate 0.1941   Epoch: 7   Global Step: 77380   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:26:24,905-Speed 5954.85 samples/sec   Loss 9.7452   LearningRate 0.1940   Epoch: 7   Global Step: 77390   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:26:31,763-Speed 5973.13 samples/sec   Loss 9.6548   LearningRate 0.1940   Epoch: 7   Global Step: 77400   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:26:38,622-Speed 5973.92 samples/sec   Loss 9.7167   LearningRate 0.1940   Epoch: 7   Global Step: 77410   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:26:45,494-Speed 5961.24 samples/sec   Loss 9.6947   LearningRate 0.1939   Epoch: 7   Global Step: 77420   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:26:52,375-Speed 5955.28 samples/sec   Loss 9.7208   LearningRate 0.1939   Epoch: 7   Global Step: 77430   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:26:59,222-Speed 5983.62 samples/sec   Loss 9.7594   LearningRate 0.1939   Epoch: 7   Global Step: 77440   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:27:06,060-Speed 5991.06 samples/sec   Loss 9.7413   LearningRate 0.1938   Epoch: 7   Global Step: 77450   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:27:13,019-Speed 5887.13 samples/sec   Loss 9.7070   LearningRate 0.1938   Epoch: 7   Global Step: 77460   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:27:19,876-Speed 5974.64 samples/sec   Loss 9.6946   LearningRate 0.1938   Epoch: 7   Global Step: 77470   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:27:26,773-Speed 5940.31 samples/sec   Loss 9.7538   LearningRate 0.1938   Epoch: 7   Global Step: 77480   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:27:33,664-Speed 5945.20 samples/sec   Loss 9.6800   LearningRate 0.1937   Epoch: 7   Global Step: 77490   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:27:40,553-Speed 5947.26 samples/sec   Loss 9.6736   LearningRate 0.1937   Epoch: 7   Global Step: 77500   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:27:47,441-Speed 5947.84 samples/sec   Loss 9.6715   LearningRate 0.1937   Epoch: 7   Global Step: 77510   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:27:54,302-Speed 5971.64 samples/sec   Loss 9.7354   LearningRate 0.1936   Epoch: 7   Global Step: 77520   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:28:01,156-Speed 5977.39 samples/sec   Loss 9.7045   LearningRate 0.1936   Epoch: 7   Global Step: 77530   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:28:08,006-Speed 5981.24 samples/sec   Loss 9.6208   LearningRate 0.1936   Epoch: 7   Global Step: 77540   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:28:14,878-Speed 5961.85 samples/sec   Loss 9.7040   LearningRate 0.1935   Epoch: 7   Global Step: 77550   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:28:21,721-Speed 5986.37 samples/sec   Loss 9.7032   LearningRate 0.1935   Epoch: 7   Global Step: 77560   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:28:28,575-Speed 5977.13 samples/sec   Loss 9.7815   LearningRate 0.1935   Epoch: 7   Global Step: 77570   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:28:35,428-Speed 5979.92 samples/sec   Loss 9.7471   LearningRate 0.1935   Epoch: 7   Global Step: 77580   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:28:42,296-Speed 5965.09 samples/sec   Loss 9.6164   LearningRate 0.1934   Epoch: 7   Global Step: 77590   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:28:49,148-Speed 5978.41 samples/sec   Loss 9.6621   LearningRate 0.1934   Epoch: 7   Global Step: 77600   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:28:55,999-Speed 5980.16 samples/sec   Loss 9.6391   LearningRate 0.1934   Epoch: 7   Global Step: 77610   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:29:02,861-Speed 5970.17 samples/sec   Loss 9.6814   LearningRate 0.1933   Epoch: 7   Global Step: 77620   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:29:09,727-Speed 5965.76 samples/sec   Loss 9.7232   LearningRate 0.1933   Epoch: 7   Global Step: 77630   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:29:16,576-Speed 5981.58 samples/sec   Loss 9.7006   LearningRate 0.1933   Epoch: 7   Global Step: 77640   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:29:23,439-Speed 5969.65 samples/sec   Loss 9.6273   LearningRate 0.1933   Epoch: 7   Global Step: 77650   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:29:30,303-Speed 5968.37 samples/sec   Loss 9.6421   LearningRate 0.1932   Epoch: 7   Global Step: 77660   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:29:37,164-Speed 5970.52 samples/sec   Loss 9.7131   LearningRate 0.1932   Epoch: 7   Global Step: 77670   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:29:44,031-Speed 5966.60 samples/sec   Loss 9.7177   LearningRate 0.1932   Epoch: 7   Global Step: 77680   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:29:50,907-Speed 5957.35 samples/sec   Loss 9.6672   LearningRate 0.1931   Epoch: 7   Global Step: 77690   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:29:57,760-Speed 5977.66 samples/sec   Loss 9.6993   LearningRate 0.1931   Epoch: 7   Global Step: 77700   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:30:04,618-Speed 5974.31 samples/sec   Loss 9.6446   LearningRate 0.1931   Epoch: 7   Global Step: 77710   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:30:11,461-Speed 5986.22 samples/sec   Loss 9.6196   LearningRate 0.1930   Epoch: 7   Global Step: 77720   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:30:18,340-Speed 5955.23 samples/sec   Loss 9.6208   LearningRate 0.1930   Epoch: 7   Global Step: 77730   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:30:25,184-Speed 5985.85 samples/sec   Loss 9.7250   LearningRate 0.1930   Epoch: 7   Global Step: 77740   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:30:32,035-Speed 5979.50 samples/sec   Loss 9.7224   LearningRate 0.1930   Epoch: 7   Global Step: 77750   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:30:38,921-Speed 5957.13 samples/sec   Loss 9.5628   LearningRate 0.1929   Epoch: 7   Global Step: 77760   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:30:45,775-Speed 5976.29 samples/sec   Loss 9.6909   LearningRate 0.1929   Epoch: 7   Global Step: 77770   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:30:52,720-Speed 5899.50 samples/sec   Loss 9.6547   LearningRate 0.1929   Epoch: 7   Global Step: 77780   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:30:59,566-Speed 5985.54 samples/sec   Loss 9.6789   LearningRate 0.1928   Epoch: 7   Global Step: 77790   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:31:06,424-Speed 5974.28 samples/sec   Loss 9.6545   LearningRate 0.1928   Epoch: 7   Global Step: 77800   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:31:13,275-Speed 5978.87 samples/sec   Loss 9.6421   LearningRate 0.1928   Epoch: 7   Global Step: 77810   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:31:20,137-Speed 5970.09 samples/sec   Loss 9.6424   LearningRate 0.1927   Epoch: 7   Global Step: 77820   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:31:26,993-Speed 5976.28 samples/sec   Loss 9.6624   LearningRate 0.1927   Epoch: 7   Global Step: 77830   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:31:33,833-Speed 5989.42 samples/sec   Loss 9.6600   LearningRate 0.1927   Epoch: 7   Global Step: 77840   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:31:40,678-Speed 5984.52 samples/sec   Loss 9.7061   LearningRate 0.1927   Epoch: 7   Global Step: 77850   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:31:47,536-Speed 5973.42 samples/sec   Loss 9.6754   LearningRate 0.1926   Epoch: 7   Global Step: 77860   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:31:54,390-Speed 5976.75 samples/sec   Loss 9.6961   LearningRate 0.1926   Epoch: 7   Global Step: 77870   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:32:01,252-Speed 5970.07 samples/sec   Loss 9.6621   LearningRate 0.1926   Epoch: 7   Global Step: 77880   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:32:08,176-Speed 5917.06 samples/sec   Loss 9.6221   LearningRate 0.1925   Epoch: 7   Global Step: 77890   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:32:15,098-Speed 5917.89 samples/sec   Loss 9.6785   LearningRate 0.1925   Epoch: 7   Global Step: 77900   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:32:22,024-Speed 5914.84 samples/sec   Loss 9.7187   LearningRate 0.1925   Epoch: 7   Global Step: 77910   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:32:28,946-Speed 5918.39 samples/sec   Loss 9.6155   LearningRate 0.1924   Epoch: 7   Global Step: 77920   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:32:35,882-Speed 5909.82 samples/sec   Loss 9.5967   LearningRate 0.1924   Epoch: 7   Global Step: 77930   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:32:42,760-Speed 5956.14 samples/sec   Loss 9.6842   LearningRate 0.1924   Epoch: 7   Global Step: 77940   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:32:49,708-Speed 5897.67 samples/sec   Loss 9.7107   LearningRate 0.1924   Epoch: 7   Global Step: 77950   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:32:56,557-Speed 5981.41 samples/sec   Loss 9.6809   LearningRate 0.1923   Epoch: 7   Global Step: 77960   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:33:03,424-Speed 5968.41 samples/sec   Loss 9.6103   LearningRate 0.1923   Epoch: 7   Global Step: 77970   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:33:10,280-Speed 5976.48 samples/sec   Loss 9.6049   LearningRate 0.1923   Epoch: 7   Global Step: 77980   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:33:17,125-Speed 5984.38 samples/sec   Loss 9.6904   LearningRate 0.1922   Epoch: 7   Global Step: 77990   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:33:23,994-Speed 5964.61 samples/sec   Loss 9.6920   LearningRate 0.1922   Epoch: 7   Global Step: 78000   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:33:30,851-Speed 5973.70 samples/sec   Loss 9.6797   LearningRate 0.1922   Epoch: 7   Global Step: 78010   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:33:37,709-Speed 5974.14 samples/sec   Loss 9.7218   LearningRate 0.1922   Epoch: 7   Global Step: 78020   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:33:44,547-Speed 5991.50 samples/sec   Loss 9.6173   LearningRate 0.1921   Epoch: 7   Global Step: 78030   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:33:51,408-Speed 5970.49 samples/sec   Loss 9.6496   LearningRate 0.1921   Epoch: 7   Global Step: 78040   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:33:58,264-Speed 5975.62 samples/sec   Loss 9.6628   LearningRate 0.1921   Epoch: 7   Global Step: 78050   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:34:05,132-Speed 5965.93 samples/sec   Loss 9.6873   LearningRate 0.1920   Epoch: 7   Global Step: 78060   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:34:11,977-Speed 5984.66 samples/sec   Loss 9.6727   LearningRate 0.1920   Epoch: 7   Global Step: 78070   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:34:18,851-Speed 5960.28 samples/sec   Loss 9.6520   LearningRate 0.1920   Epoch: 7   Global Step: 78080   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:34:25,712-Speed 5971.21 samples/sec   Loss 9.6549   LearningRate 0.1919   Epoch: 7   Global Step: 78090   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:34:32,661-Speed 5895.01 samples/sec   Loss 9.7628   LearningRate 0.1919   Epoch: 7   Global Step: 78100   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:34:39,551-Speed 5946.22 samples/sec   Loss 9.6858   LearningRate 0.1919   Epoch: 7   Global Step: 78110   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:34:46,399-Speed 5982.95 samples/sec   Loss 9.6214   LearningRate 0.1919   Epoch: 7   Global Step: 78120   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:34:53,270-Speed 5962.12 samples/sec   Loss 9.6497   LearningRate 0.1918   Epoch: 7   Global Step: 78130   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:35:00,149-Speed 5955.91 samples/sec   Loss 9.6930   LearningRate 0.1918   Epoch: 7   Global Step: 78140   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:35:07,007-Speed 5973.49 samples/sec   Loss 9.6955   LearningRate 0.1918   Epoch: 7   Global Step: 78150   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:35:13,854-Speed 5983.16 samples/sec   Loss 9.7206   LearningRate 0.1917   Epoch: 7   Global Step: 78160   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:35:20,721-Speed 5966.24 samples/sec   Loss 9.6850   LearningRate 0.1917   Epoch: 7   Global Step: 78170   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:35:27,576-Speed 5975.81 samples/sec   Loss 9.6551   LearningRate 0.1917   Epoch: 7   Global Step: 78180   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:35:34,446-Speed 5964.95 samples/sec   Loss 9.6584   LearningRate 0.1916   Epoch: 7   Global Step: 78190   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:35:41,307-Speed 5972.37 samples/sec   Loss 9.6161   LearningRate 0.1916   Epoch: 7   Global Step: 78200   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:35:48,160-Speed 5978.55 samples/sec   Loss 9.5329   LearningRate 0.1916   Epoch: 7   Global Step: 78210   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:35:55,010-Speed 5980.91 samples/sec   Loss 9.6331   LearningRate 0.1916   Epoch: 7   Global Step: 78220   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:36:01,859-Speed 5981.58 samples/sec   Loss 9.7121   LearningRate 0.1915   Epoch: 7   Global Step: 78230   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:36:08,711-Speed 5979.92 samples/sec   Loss 9.6819   LearningRate 0.1915   Epoch: 7   Global Step: 78240   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:36:15,585-Speed 5964.91 samples/sec   Loss 9.6126   LearningRate 0.1915   Epoch: 7   Global Step: 78250   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:36:22,442-Speed 5974.76 samples/sec   Loss 9.6676   LearningRate 0.1914   Epoch: 7   Global Step: 78260   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:36:29,310-Speed 5965.06 samples/sec   Loss 9.6709   LearningRate 0.1914   Epoch: 7   Global Step: 78270   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:36:36,176-Speed 5966.52 samples/sec   Loss 9.6799   LearningRate 0.1914   Epoch: 7   Global Step: 78280   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:36:43,066-Speed 5948.38 samples/sec   Loss 9.6494   LearningRate 0.1913   Epoch: 7   Global Step: 78290   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:36:49,925-Speed 5975.59 samples/sec   Loss 9.6029   LearningRate 0.1913   Epoch: 7   Global Step: 78300   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:36:56,773-Speed 5981.64 samples/sec   Loss 9.6233   LearningRate 0.1913   Epoch: 7   Global Step: 78310   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:37:03,620-Speed 5983.66 samples/sec   Loss 9.6943   LearningRate 0.1913   Epoch: 7   Global Step: 78320   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:37:10,478-Speed 5975.54 samples/sec   Loss 9.5528   LearningRate 0.1912   Epoch: 7   Global Step: 78330   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:37:17,346-Speed 5965.50 samples/sec   Loss 9.6277   LearningRate 0.1912   Epoch: 7   Global Step: 78340   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:37:24,208-Speed 5968.97 samples/sec   Loss 9.6756   LearningRate 0.1912   Epoch: 7   Global Step: 78350   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:37:31,071-Speed 5969.63 samples/sec   Loss 9.6094   LearningRate 0.1911   Epoch: 7   Global Step: 78360   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:37:37,945-Speed 5960.05 samples/sec   Loss 9.6569   LearningRate 0.1911   Epoch: 7   Global Step: 78370   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:37:44,800-Speed 5976.20 samples/sec   Loss 9.6556   LearningRate 0.1911   Epoch: 7   Global Step: 78380   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:37:51,658-Speed 5974.01 samples/sec   Loss 9.6149   LearningRate 0.1911   Epoch: 7   Global Step: 78390   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:37:58,511-Speed 5977.18 samples/sec   Loss 9.6532   LearningRate 0.1910   Epoch: 7   Global Step: 78400   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:38:05,358-Speed 5982.92 samples/sec   Loss 9.7388   LearningRate 0.1910   Epoch: 7   Global Step: 78410   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:38:12,226-Speed 5965.06 samples/sec   Loss 9.6528   LearningRate 0.1910   Epoch: 7   Global Step: 78420   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:38:19,080-Speed 5977.44 samples/sec   Loss 9.6208   LearningRate 0.1909   Epoch: 7   Global Step: 78430   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:38:25,947-Speed 5965.61 samples/sec   Loss 9.4936   LearningRate 0.1909   Epoch: 7   Global Step: 78440   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:38:32,841-Speed 5943.02 samples/sec   Loss 9.6574   LearningRate 0.1909   Epoch: 7   Global Step: 78450   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:38:39,697-Speed 5974.72 samples/sec   Loss 9.5436   LearningRate 0.1908   Epoch: 7   Global Step: 78460   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:38:46,572-Speed 5960.58 samples/sec   Loss 9.5046   LearningRate 0.1908   Epoch: 7   Global Step: 78470   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:38:53,428-Speed 5975.61 samples/sec   Loss 9.5768   LearningRate 0.1908   Epoch: 7   Global Step: 78480   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:39:00,305-Speed 5957.34 samples/sec   Loss 9.6054   LearningRate 0.1908   Epoch: 7   Global Step: 78490   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:39:07,157-Speed 5978.54 samples/sec   Loss 9.6585   LearningRate 0.1907   Epoch: 7   Global Step: 78500   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:39:14,030-Speed 5961.08 samples/sec   Loss 9.6160   LearningRate 0.1907   Epoch: 7   Global Step: 78510   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:39:20,887-Speed 5974.78 samples/sec   Loss 9.6033   LearningRate 0.1907   Epoch: 7   Global Step: 78520   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:39:27,744-Speed 5976.71 samples/sec   Loss 9.6040   LearningRate 0.1906   Epoch: 7   Global Step: 78530   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:39:34,598-Speed 5976.90 samples/sec   Loss 9.5513   LearningRate 0.1906   Epoch: 7   Global Step: 78540   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:39:41,500-Speed 5977.37 samples/sec   Loss 9.6019   LearningRate 0.1906   Epoch: 7   Global Step: 78550   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:39:48,359-Speed 5973.37 samples/sec   Loss 9.5931   LearningRate 0.1905   Epoch: 7   Global Step: 78560   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:39:55,226-Speed 5966.18 samples/sec   Loss 9.6342   LearningRate 0.1905   Epoch: 7   Global Step: 78570   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:40:02,074-Speed 5982.07 samples/sec   Loss 9.6179   LearningRate 0.1905   Epoch: 7   Global Step: 78580   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:40:08,945-Speed 5963.97 samples/sec   Loss 9.5617   LearningRate 0.1905   Epoch: 7   Global Step: 78590   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:40:15,792-Speed 5984.07 samples/sec   Loss 9.6373   LearningRate 0.1904   Epoch: 7   Global Step: 78600   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:40:22,651-Speed 5972.44 samples/sec   Loss 9.6010   LearningRate 0.1904   Epoch: 7   Global Step: 78610   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:40:29,530-Speed 5955.78 samples/sec   Loss 9.6622   LearningRate 0.1904   Epoch: 7   Global Step: 78620   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:40:36,387-Speed 5974.65 samples/sec   Loss 9.6177   LearningRate 0.1903   Epoch: 7   Global Step: 78630   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:40:43,259-Speed 5961.35 samples/sec   Loss 9.5432   LearningRate 0.1903   Epoch: 7   Global Step: 78640   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:40:50,114-Speed 5975.88 samples/sec   Loss 9.5999   LearningRate 0.1903   Epoch: 7   Global Step: 78650   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:40:56,973-Speed 5973.43 samples/sec   Loss 9.5573   LearningRate 0.1903   Epoch: 7   Global Step: 78660   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:41:03,833-Speed 5972.23 samples/sec   Loss 9.6367   LearningRate 0.1902   Epoch: 7   Global Step: 78670   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:41:10,685-Speed 5979.17 samples/sec   Loss 9.5932   LearningRate 0.1902   Epoch: 7   Global Step: 78680   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:41:17,548-Speed 5969.28 samples/sec   Loss 9.5990   LearningRate 0.1902   Epoch: 7   Global Step: 78690   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:41:24,417-Speed 5964.54 samples/sec   Loss 9.5971   LearningRate 0.1901   Epoch: 7   Global Step: 78700   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:41:31,283-Speed 5967.20 samples/sec   Loss 9.6279   LearningRate 0.1901   Epoch: 7   Global Step: 78710   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:41:38,143-Speed 5971.67 samples/sec   Loss 9.6074   LearningRate 0.1901   Epoch: 7   Global Step: 78720   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:41:45,054-Speed 5927.78 samples/sec   Loss 9.6150   LearningRate 0.1900   Epoch: 7   Global Step: 78730   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:41:51,913-Speed 5972.92 samples/sec   Loss 9.6743   LearningRate 0.1900   Epoch: 7   Global Step: 78740   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:41:58,770-Speed 5974.49 samples/sec   Loss 9.6107   LearningRate 0.1900   Epoch: 7   Global Step: 78750   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:42:05,617-Speed 5983.44 samples/sec   Loss 9.5691   LearningRate 0.1900   Epoch: 7   Global Step: 78760   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:42:12,476-Speed 5972.42 samples/sec   Loss 9.6306   LearningRate 0.1899   Epoch: 7   Global Step: 78770   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:42:19,359-Speed 5952.97 samples/sec   Loss 9.5430   LearningRate 0.1899   Epoch: 7   Global Step: 78780   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:42:26,263-Speed 5933.28 samples/sec   Loss 9.5912   LearningRate 0.1899   Epoch: 7   Global Step: 78790   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:42:33,178-Speed 5924.20 samples/sec   Loss 9.5605   LearningRate 0.1898   Epoch: 7   Global Step: 78800   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:42:40,062-Speed 5951.44 samples/sec   Loss 9.5921   LearningRate 0.1898   Epoch: 7   Global Step: 78810   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:42:46,943-Speed 5953.52 samples/sec   Loss 9.5995   LearningRate 0.1898   Epoch: 7   Global Step: 78820   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:42:53,794-Speed 5980.42 samples/sec   Loss 9.6135   LearningRate 0.1898   Epoch: 7   Global Step: 78830   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:43:00,657-Speed 5969.15 samples/sec   Loss 9.5097   LearningRate 0.1897   Epoch: 7   Global Step: 78840   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:43:07,523-Speed 5966.77 samples/sec   Loss 9.6004   LearningRate 0.1897   Epoch: 7   Global Step: 78850   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:43:14,371-Speed 5981.79 samples/sec   Loss 9.6341   LearningRate 0.1897   Epoch: 7   Global Step: 78860   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:43:21,224-Speed 5978.02 samples/sec   Loss 9.6168   LearningRate 0.1896   Epoch: 7   Global Step: 78870   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:43:28,096-Speed 5962.84 samples/sec   Loss 9.5753   LearningRate 0.1896   Epoch: 7   Global Step: 78880   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:43:34,978-Speed 5952.27 samples/sec   Loss 9.6524   LearningRate 0.1896   Epoch: 7   Global Step: 78890   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:43:41,828-Speed 5980.50 samples/sec   Loss 9.6109   LearningRate 0.1895   Epoch: 7   Global Step: 78900   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:43:48,688-Speed 5973.62 samples/sec   Loss 9.5836   LearningRate 0.1895   Epoch: 7   Global Step: 78910   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:43:55,535-Speed 5983.67 samples/sec   Loss 9.4990   LearningRate 0.1895   Epoch: 7   Global Step: 78920   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:44:02,397-Speed 5970.59 samples/sec   Loss 9.5929   LearningRate 0.1895   Epoch: 7   Global Step: 78930   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:44:09,241-Speed 5987.76 samples/sec   Loss 9.6404   LearningRate 0.1894   Epoch: 7   Global Step: 78940   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:44:16,149-Speed 5930.76 samples/sec   Loss 9.6790   LearningRate 0.1894   Epoch: 7   Global Step: 78950   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:44:23,026-Speed 5958.03 samples/sec   Loss 9.6457   LearningRate 0.1894   Epoch: 7   Global Step: 78960   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:44:29,881-Speed 5976.41 samples/sec   Loss 9.6012   LearningRate 0.1893   Epoch: 7   Global Step: 78970   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:44:36,744-Speed 5970.03 samples/sec   Loss 9.6226   LearningRate 0.1893   Epoch: 7   Global Step: 78980   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:44:43,596-Speed 5978.77 samples/sec   Loss 9.6830   LearningRate 0.1893   Epoch: 7   Global Step: 78990   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:44:50,444-Speed 5981.95 samples/sec   Loss 9.6446   LearningRate 0.1893   Epoch: 7   Global Step: 79000   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:44:57,298-Speed 5977.07 samples/sec   Loss 9.5844   LearningRate 0.1892   Epoch: 7   Global Step: 79010   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:45:04,155-Speed 5977.99 samples/sec   Loss 9.5337   LearningRate 0.1892   Epoch: 7   Global Step: 79020   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:45:11,000-Speed 5984.59 samples/sec   Loss 9.6409   LearningRate 0.1892   Epoch: 7   Global Step: 79030   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:45:17,853-Speed 5978.63 samples/sec   Loss 9.6007   LearningRate 0.1891   Epoch: 7   Global Step: 79040   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:45:24,713-Speed 5972.44 samples/sec   Loss 9.6003   LearningRate 0.1891   Epoch: 7   Global Step: 79050   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:45:31,586-Speed 5960.62 samples/sec   Loss 9.5894   LearningRate 0.1891   Epoch: 7   Global Step: 79060   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:45:38,439-Speed 5978.65 samples/sec   Loss 9.6359   LearningRate 0.1890   Epoch: 7   Global Step: 79070   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:45:45,358-Speed 5922.81 samples/sec   Loss 9.6267   LearningRate 0.1890   Epoch: 7   Global Step: 79080   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:45:52,224-Speed 5966.58 samples/sec   Loss 9.6247   LearningRate 0.1890   Epoch: 7   Global Step: 79090   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:45:59,107-Speed 5952.14 samples/sec   Loss 9.5552   LearningRate 0.1890   Epoch: 7   Global Step: 79100   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:46:05,988-Speed 5954.45 samples/sec   Loss 9.5714   LearningRate 0.1889   Epoch: 7   Global Step: 79110   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:46:12,854-Speed 5966.33 samples/sec   Loss 9.6368   LearningRate 0.1889   Epoch: 7   Global Step: 79120   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:46:19,735-Speed 5954.10 samples/sec   Loss 9.5669   LearningRate 0.1889   Epoch: 7   Global Step: 79130   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:46:26,591-Speed 5975.34 samples/sec   Loss 9.5393   LearningRate 0.1888   Epoch: 7   Global Step: 79140   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:46:33,442-Speed 5980.13 samples/sec   Loss 9.6504   LearningRate 0.1888   Epoch: 7   Global Step: 79150   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:46:40,285-Speed 5986.15 samples/sec   Loss 9.5942   LearningRate 0.1888   Epoch: 7   Global Step: 79160   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:46:47,136-Speed 5979.59 samples/sec   Loss 9.5823   LearningRate 0.1887   Epoch: 7   Global Step: 79170   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:46:53,989-Speed 5977.78 samples/sec   Loss 9.6217   LearningRate 0.1887   Epoch: 7   Global Step: 79180   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:47:00,830-Speed 5988.87 samples/sec   Loss 9.5978   LearningRate 0.1887   Epoch: 7   Global Step: 79190   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:47:07,675-Speed 5985.08 samples/sec   Loss 9.6152   LearningRate 0.1887   Epoch: 7   Global Step: 79200   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:47:14,563-Speed 5947.81 samples/sec   Loss 9.5068   LearningRate 0.1886   Epoch: 7   Global Step: 79210   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:47:21,424-Speed 5970.90 samples/sec   Loss 9.5578   LearningRate 0.1886   Epoch: 7   Global Step: 79220   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:47:28,399-Speed 5874.08 samples/sec   Loss 9.5335   LearningRate 0.1886   Epoch: 7   Global Step: 79230   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:47:35,266-Speed 5967.53 samples/sec   Loss 9.5805   LearningRate 0.1885   Epoch: 7   Global Step: 79240   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:47:42,155-Speed 5946.40 samples/sec   Loss 9.5534   LearningRate 0.1885   Epoch: 7   Global Step: 79250   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:47:49,010-Speed 5976.68 samples/sec   Loss 9.5377   LearningRate 0.1885   Epoch: 7   Global Step: 79260   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:47:55,889-Speed 5954.94 samples/sec   Loss 9.6167   LearningRate 0.1885   Epoch: 7   Global Step: 79270   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:48:02,744-Speed 5976.43 samples/sec   Loss 9.6468   LearningRate 0.1884   Epoch: 7   Global Step: 79280   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:48:09,666-Speed 5921.47 samples/sec   Loss 9.5608   LearningRate 0.1884   Epoch: 7   Global Step: 79290   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:48:16,552-Speed 5948.89 samples/sec   Loss 9.6391   LearningRate 0.1884   Epoch: 7   Global Step: 79300   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:48:23,474-Speed 5918.48 samples/sec   Loss 9.5947   LearningRate 0.1883   Epoch: 7   Global Step: 79310   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:48:30,382-Speed 5931.22 samples/sec   Loss 9.5583   LearningRate 0.1883   Epoch: 7   Global Step: 79320   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:48:37,255-Speed 5960.69 samples/sec   Loss 9.5307   LearningRate 0.1883   Epoch: 7   Global Step: 79330   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:48:44,125-Speed 5963.64 samples/sec   Loss 9.5643   LearningRate 0.1882   Epoch: 7   Global Step: 79340   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:48:50,978-Speed 5977.37 samples/sec   Loss 9.5994   LearningRate 0.1882   Epoch: 7   Global Step: 79350   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:48:57,826-Speed 5982.53 samples/sec   Loss 9.5313   LearningRate 0.1882   Epoch: 7   Global Step: 79360   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:49:04,687-Speed 5971.62 samples/sec   Loss 9.6137   LearningRate 0.1882   Epoch: 7   Global Step: 79370   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:49:11,545-Speed 5973.57 samples/sec   Loss 9.5886   LearningRate 0.1881   Epoch: 7   Global Step: 79380   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:49:18,430-Speed 5949.49 samples/sec   Loss 9.6017   LearningRate 0.1881   Epoch: 7   Global Step: 79390   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:49:25,328-Speed 5939.84 samples/sec   Loss 9.6249   LearningRate 0.1881   Epoch: 7   Global Step: 79400   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:49:32,193-Speed 5967.17 samples/sec   Loss 9.5834   LearningRate 0.1880   Epoch: 7   Global Step: 79410   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:49:39,051-Speed 5974.26 samples/sec   Loss 9.5443   LearningRate 0.1880   Epoch: 7   Global Step: 79420   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:49:45,935-Speed 5951.35 samples/sec   Loss 9.5734   LearningRate 0.1880   Epoch: 7   Global Step: 79430   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:49:52,791-Speed 5975.34 samples/sec   Loss 9.5454   LearningRate 0.1880   Epoch: 7   Global Step: 79440   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:49:59,644-Speed 5978.92 samples/sec   Loss 9.6537   LearningRate 0.1879   Epoch: 7   Global Step: 79450   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:50:06,490-Speed 5984.36 samples/sec   Loss 9.5864   LearningRate 0.1879   Epoch: 7   Global Step: 79460   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-01-08 11:50:13,351-Speed 5971.00 samples/sec   Loss 9.5968   LearningRate 0.1879   Epoch: 7   Global Step: 79470   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-01-08 11:50:20,200-Speed 5981.52 samples/sec   Loss 9.5078   LearningRate 0.1878   Epoch: 7   Global Step: 79480   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-01-08 11:50:27,046-Speed 5984.15 samples/sec   Loss 9.5599   LearningRate 0.1878   Epoch: 7   Global Step: 79490   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-01-08 11:50:33,918-Speed 5961.52 samples/sec   Loss 9.5727   LearningRate 0.1878   Epoch: 7   Global Step: 79500   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-01-08 11:50:40,799-Speed 5953.78 samples/sec   Loss 9.4855   LearningRate 0.1877   Epoch: 7   Global Step: 79510   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-01-08 11:50:47,653-Speed 5977.47 samples/sec   Loss 9.5125   LearningRate 0.1877   Epoch: 7   Global Step: 79520   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-01-08 11:50:54,508-Speed 5975.98 samples/sec   Loss 9.5962   LearningRate 0.1877   Epoch: 7   Global Step: 79530   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-01-08 11:51:01,357-Speed 5981.50 samples/sec   Loss 9.6399   LearningRate 0.1877   Epoch: 7   Global Step: 79540   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-01-08 11:51:08,266-Speed 5929.89 samples/sec   Loss 9.6618   LearningRate 0.1876   Epoch: 7   Global Step: 79550   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-01-08 11:51:15,219-Speed 5892.68 samples/sec   Loss 9.5404   LearningRate 0.1876   Epoch: 7   Global Step: 79560   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 11:51:22,071-Speed 5980.18 samples/sec   Loss 9.4591   LearningRate 0.1876   Epoch: 7   Global Step: 79570   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 11:51:28,937-Speed 5966.20 samples/sec   Loss 9.6284   LearningRate 0.1875   Epoch: 7   Global Step: 79580   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 11:51:35,799-Speed 5970.41 samples/sec   Loss 9.6079   LearningRate 0.1875   Epoch: 7   Global Step: 79590   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 11:51:42,644-Speed 5984.64 samples/sec   Loss 9.5349   LearningRate 0.1875   Epoch: 7   Global Step: 79600   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 11:51:49,503-Speed 5973.16 samples/sec   Loss 9.5305   LearningRate 0.1875   Epoch: 7   Global Step: 79610   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 11:51:56,339-Speed 5992.37 samples/sec   Loss 9.6283   LearningRate 0.1874   Epoch: 7   Global Step: 79620   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 11:52:03,199-Speed 5972.61 samples/sec   Loss 9.4766   LearningRate 0.1874   Epoch: 7   Global Step: 79630   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 11:52:10,040-Speed 5988.66 samples/sec   Loss 9.5876   LearningRate 0.1874   Epoch: 7   Global Step: 79640   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 11:52:16,886-Speed 5984.46 samples/sec   Loss 9.5538   LearningRate 0.1873   Epoch: 7   Global Step: 79650   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-01-08 11:52:23,762-Speed 5957.66 samples/sec   Loss 9.5640   LearningRate 0.1873   Epoch: 7   Global Step: 79660   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:52:30,627-Speed 5968.02 samples/sec   Loss 9.5228   LearningRate 0.1873   Epoch: 7   Global Step: 79670   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:52:37,500-Speed 5960.37 samples/sec   Loss 9.5331   LearningRate 0.1873   Epoch: 7   Global Step: 79680   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:52:44,363-Speed 5969.30 samples/sec   Loss 9.5451   LearningRate 0.1872   Epoch: 7   Global Step: 79690   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:52:51,238-Speed 5959.44 samples/sec   Loss 9.5264   LearningRate 0.1872   Epoch: 7   Global Step: 79700   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:52:58,094-Speed 5974.76 samples/sec   Loss 9.4435   LearningRate 0.1872   Epoch: 7   Global Step: 79710   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:53:04,947-Speed 5978.50 samples/sec   Loss 9.4963   LearningRate 0.1871   Epoch: 7   Global Step: 79720   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:53:11,803-Speed 5976.08 samples/sec   Loss 9.5346   LearningRate 0.1871   Epoch: 7   Global Step: 79730   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:53:18,687-Speed 5956.68 samples/sec   Loss 9.6122   LearningRate 0.1871   Epoch: 7   Global Step: 79740   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:53:25,565-Speed 5955.78 samples/sec   Loss 9.5094   LearningRate 0.1870   Epoch: 7   Global Step: 79750   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 11:53:32,424-Speed 5972.78 samples/sec   Loss 9.4621   LearningRate 0.1870   Epoch: 7   Global Step: 79760   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:53:39,277-Speed 5977.85 samples/sec   Loss 9.4942   LearningRate 0.1870   Epoch: 7   Global Step: 79770   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:53:46,146-Speed 5964.05 samples/sec   Loss 9.5875   LearningRate 0.1870   Epoch: 7   Global Step: 79780   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:53:53,014-Speed 5967.89 samples/sec   Loss 9.5110   LearningRate 0.1869   Epoch: 7   Global Step: 79790   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:53:59,868-Speed 5977.00 samples/sec   Loss 9.4948   LearningRate 0.1869   Epoch: 7   Global Step: 79800   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:54:06,714-Speed 5984.46 samples/sec   Loss 9.5054   LearningRate 0.1869   Epoch: 7   Global Step: 79810   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:54:13,596-Speed 5952.91 samples/sec   Loss 9.4955   LearningRate 0.1868   Epoch: 7   Global Step: 79820   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:54:20,448-Speed 5978.71 samples/sec   Loss 9.4660   LearningRate 0.1868   Epoch: 7   Global Step: 79830   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:54:27,343-Speed 5942.11 samples/sec   Loss 9.5510   LearningRate 0.1868   Epoch: 7   Global Step: 79840   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:54:34,189-Speed 5983.46 samples/sec   Loss 9.4905   LearningRate 0.1868   Epoch: 7   Global Step: 79850   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:54:41,045-Speed 5975.62 samples/sec   Loss 9.5010   LearningRate 0.1867   Epoch: 7   Global Step: 79860   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:54:48,671-Speed 5373.73 samples/sec   Loss 9.4900   LearningRate 0.1867   Epoch: 7   Global Step: 79870   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:54:55,540-Speed 5967.35 samples/sec   Loss 9.5148   LearningRate 0.1867   Epoch: 7   Global Step: 79880   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:55:02,397-Speed 5973.91 samples/sec   Loss 9.5700   LearningRate 0.1866   Epoch: 7   Global Step: 79890   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:55:09,257-Speed 5971.68 samples/sec   Loss 9.5856   LearningRate 0.1866   Epoch: 7   Global Step: 79900   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:55:16,128-Speed 5962.93 samples/sec   Loss 9.5497   LearningRate 0.1866   Epoch: 7   Global Step: 79910   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:55:23,024-Speed 5941.33 samples/sec   Loss 9.4699   LearningRate 0.1865   Epoch: 7   Global Step: 79920   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:55:29,911-Speed 5948.44 samples/sec   Loss 9.5203   LearningRate 0.1865   Epoch: 7   Global Step: 79930   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:55:36,789-Speed 5956.34 samples/sec   Loss 9.5646   LearningRate 0.1865   Epoch: 7   Global Step: 79940   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:55:43,648-Speed 5973.01 samples/sec   Loss 9.5517   LearningRate 0.1865   Epoch: 7   Global Step: 79950   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:55:50,523-Speed 5960.88 samples/sec   Loss 9.5407   LearningRate 0.1864   Epoch: 7   Global Step: 79960   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:55:57,372-Speed 5982.22 samples/sec   Loss 9.5041   LearningRate 0.1864   Epoch: 7   Global Step: 79970   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:56:04,222-Speed 5982.99 samples/sec   Loss 9.5452   LearningRate 0.1864   Epoch: 7   Global Step: 79980   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:56:11,104-Speed 5952.63 samples/sec   Loss 9.5323   LearningRate 0.1863   Epoch: 7   Global Step: 79990   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:56:17,976-Speed 5962.58 samples/sec   Loss 9.4844   LearningRate 0.1863   Epoch: 7   Global Step: 80000   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:56:49,243-[lfw][80000]XNorm: 22.646365
Training: 2022-01-08 11:56:49,244-[lfw][80000]Accuracy-Flip: 0.99750+-0.00327
Training: 2022-01-08 11:56:49,245-[lfw][80000]Accuracy-Highest: 0.99750
Training: 2022-01-08 11:57:20,102-[cfp_fp][80000]XNorm: 19.757936
Training: 2022-01-08 11:57:20,102-[cfp_fp][80000]Accuracy-Flip: 0.97786+-0.00659
Training: 2022-01-08 11:57:20,102-[cfp_fp][80000]Accuracy-Highest: 0.97786
Training: 2022-01-08 11:57:46,565-[agedb_30][80000]XNorm: 22.282000
Training: 2022-01-08 11:57:46,566-[agedb_30][80000]Accuracy-Flip: 0.96883+-0.00742
Training: 2022-01-08 11:57:46,566-[agedb_30][80000]Accuracy-Highest: 0.96883
Training: 2022-01-08 11:57:53,425-Speed 429.13 samples/sec   Loss 9.5204   LearningRate 0.1863   Epoch: 7   Global Step: 80010   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:58:00,290-Speed 5968.42 samples/sec   Loss 9.5827   LearningRate 0.1863   Epoch: 7   Global Step: 80020   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:58:07,142-Speed 5979.23 samples/sec   Loss 9.5709   LearningRate 0.1862   Epoch: 7   Global Step: 80030   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:58:13,981-Speed 5990.83 samples/sec   Loss 9.5513   LearningRate 0.1862   Epoch: 7   Global Step: 80040   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:58:20,834-Speed 5977.83 samples/sec   Loss 9.5179   LearningRate 0.1862   Epoch: 7   Global Step: 80050   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:58:27,705-Speed 5962.07 samples/sec   Loss 9.5389   LearningRate 0.1861   Epoch: 7   Global Step: 80060   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:58:34,567-Speed 5970.53 samples/sec   Loss 9.5530   LearningRate 0.1861   Epoch: 7   Global Step: 80070   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:58:41,433-Speed 5967.46 samples/sec   Loss 9.5194   LearningRate 0.1861   Epoch: 7   Global Step: 80080   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:58:48,299-Speed 5966.53 samples/sec   Loss 9.5177   LearningRate 0.1861   Epoch: 7   Global Step: 80090   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:58:55,169-Speed 5963.10 samples/sec   Loss 9.5463   LearningRate 0.1860   Epoch: 7   Global Step: 80100   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:59:02,021-Speed 5979.39 samples/sec   Loss 9.5778   LearningRate 0.1860   Epoch: 7   Global Step: 80110   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:59:08,921-Speed 5938.10 samples/sec   Loss 9.5347   LearningRate 0.1860   Epoch: 7   Global Step: 80120   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:59:15,766-Speed 5984.23 samples/sec   Loss 9.4851   LearningRate 0.1859   Epoch: 7   Global Step: 80130   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:59:22,635-Speed 5964.31 samples/sec   Loss 9.5789   LearningRate 0.1859   Epoch: 7   Global Step: 80140   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:59:29,495-Speed 5972.11 samples/sec   Loss 9.4104   LearningRate 0.1859   Epoch: 7   Global Step: 80150   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 11:59:36,349-Speed 5977.22 samples/sec   Loss 9.4212   LearningRate 0.1858   Epoch: 7   Global Step: 80160   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:59:43,226-Speed 5956.94 samples/sec   Loss 9.5122   LearningRate 0.1858   Epoch: 7   Global Step: 80170   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:59:50,078-Speed 5979.72 samples/sec   Loss 9.5882   LearningRate 0.1858   Epoch: 7   Global Step: 80180   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 11:59:56,922-Speed 5985.11 samples/sec   Loss 9.5256   LearningRate 0.1858   Epoch: 7   Global Step: 80190   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:00:03,771-Speed 5981.14 samples/sec   Loss 9.5012   LearningRate 0.1857   Epoch: 7   Global Step: 80200   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:00:10,625-Speed 5978.25 samples/sec   Loss 9.4985   LearningRate 0.1857   Epoch: 7   Global Step: 80210   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:00:17,468-Speed 5986.47 samples/sec   Loss 9.4694   LearningRate 0.1857   Epoch: 7   Global Step: 80220   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:00:24,324-Speed 5974.90 samples/sec   Loss 9.4948   LearningRate 0.1856   Epoch: 7   Global Step: 80230   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:00:31,182-Speed 5974.14 samples/sec   Loss 9.5539   LearningRate 0.1856   Epoch: 7   Global Step: 80240   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:00:38,084-Speed 5935.85 samples/sec   Loss 9.5141   LearningRate 0.1856   Epoch: 7   Global Step: 80250   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:00:44,971-Speed 5950.14 samples/sec   Loss 9.5297   LearningRate 0.1856   Epoch: 7   Global Step: 80260   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:00:51,851-Speed 5953.97 samples/sec   Loss 9.4881   LearningRate 0.1855   Epoch: 7   Global Step: 80270   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:00:58,701-Speed 5982.23 samples/sec   Loss 9.4859   LearningRate 0.1855   Epoch: 7   Global Step: 80280   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:01:05,552-Speed 5979.99 samples/sec   Loss 9.4559   LearningRate 0.1855   Epoch: 7   Global Step: 80290   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:01:12,401-Speed 5981.66 samples/sec   Loss 9.6212   LearningRate 0.1854   Epoch: 7   Global Step: 80300   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:01:19,263-Speed 5970.31 samples/sec   Loss 9.5068   LearningRate 0.1854   Epoch: 7   Global Step: 80310   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:01:26,113-Speed 5980.58 samples/sec   Loss 9.6056   LearningRate 0.1854   Epoch: 7   Global Step: 80320   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:01:32,965-Speed 5978.98 samples/sec   Loss 9.5623   LearningRate 0.1853   Epoch: 7   Global Step: 80330   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:01:39,807-Speed 5987.41 samples/sec   Loss 9.4717   LearningRate 0.1853   Epoch: 7   Global Step: 80340   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:01:46,665-Speed 5975.09 samples/sec   Loss 9.5138   LearningRate 0.1853   Epoch: 7   Global Step: 80350   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:01:53,518-Speed 5978.36 samples/sec   Loss 9.4482   LearningRate 0.1853   Epoch: 7   Global Step: 80360   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:02:00,377-Speed 5972.92 samples/sec   Loss 9.5298   LearningRate 0.1852   Epoch: 7   Global Step: 80370   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:02:07,237-Speed 5974.76 samples/sec   Loss 9.4438   LearningRate 0.1852   Epoch: 7   Global Step: 80380   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:02:14,092-Speed 5985.60 samples/sec   Loss 9.4692   LearningRate 0.1852   Epoch: 7   Global Step: 80390   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:02:20,944-Speed 5981.58 samples/sec   Loss 9.4284   LearningRate 0.1851   Epoch: 7   Global Step: 80400   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:02:27,826-Speed 5952.89 samples/sec   Loss 9.4733   LearningRate 0.1851   Epoch: 7   Global Step: 80410   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:02:34,680-Speed 5977.72 samples/sec   Loss 9.4840   LearningRate 0.1851   Epoch: 7   Global Step: 80420   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:02:41,527-Speed 5982.63 samples/sec   Loss 9.4532   LearningRate 0.1851   Epoch: 7   Global Step: 80430   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:02:48,386-Speed 5975.98 samples/sec   Loss 9.4421   LearningRate 0.1850   Epoch: 7   Global Step: 80440   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:02:55,226-Speed 5990.14 samples/sec   Loss 9.5348   LearningRate 0.1850   Epoch: 7   Global Step: 80450   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:03:02,076-Speed 5981.01 samples/sec   Loss 9.4853   LearningRate 0.1850   Epoch: 7   Global Step: 80460   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:03:08,938-Speed 5969.48 samples/sec   Loss 9.3668   LearningRate 0.1849   Epoch: 7   Global Step: 80470   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:03:15,793-Speed 5979.40 samples/sec   Loss 9.5373   LearningRate 0.1849   Epoch: 7   Global Step: 80480   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:03:22,654-Speed 5971.57 samples/sec   Loss 9.4630   LearningRate 0.1849   Epoch: 7   Global Step: 80490   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:03:29,508-Speed 5976.79 samples/sec   Loss 9.5376   LearningRate 0.1849   Epoch: 7   Global Step: 80500   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:03:36,387-Speed 5957.77 samples/sec   Loss 9.4602   LearningRate 0.1848   Epoch: 7   Global Step: 80510   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:03:43,229-Speed 5988.60 samples/sec   Loss 9.5352   LearningRate 0.1848   Epoch: 7   Global Step: 80520   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:03:50,078-Speed 5981.42 samples/sec   Loss 9.4863   LearningRate 0.1848   Epoch: 7   Global Step: 80530   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:03:56,942-Speed 5968.19 samples/sec   Loss 9.4750   LearningRate 0.1847   Epoch: 7   Global Step: 80540   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:04:03,813-Speed 5963.15 samples/sec   Loss 9.5047   LearningRate 0.1847   Epoch: 7   Global Step: 80550   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:04:10,665-Speed 5978.43 samples/sec   Loss 9.5001   LearningRate 0.1847   Epoch: 7   Global Step: 80560   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:04:17,522-Speed 5974.91 samples/sec   Loss 9.4964   LearningRate 0.1846   Epoch: 7   Global Step: 80570   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:04:24,396-Speed 5959.42 samples/sec   Loss 9.5056   LearningRate 0.1846   Epoch: 7   Global Step: 80580   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:04:31,256-Speed 5972.56 samples/sec   Loss 9.4805   LearningRate 0.1846   Epoch: 7   Global Step: 80590   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:04:38,110-Speed 5976.94 samples/sec   Loss 9.6440   LearningRate 0.1846   Epoch: 7   Global Step: 80600   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:04:44,995-Speed 5950.72 samples/sec   Loss 9.5293   LearningRate 0.1845   Epoch: 7   Global Step: 80610   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:04:51,847-Speed 5978.70 samples/sec   Loss 9.5175   LearningRate 0.1845   Epoch: 7   Global Step: 80620   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:04:58,727-Speed 5954.53 samples/sec   Loss 9.5153   LearningRate 0.1845   Epoch: 7   Global Step: 80630   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:05:05,597-Speed 5963.46 samples/sec   Loss 9.5230   LearningRate 0.1844   Epoch: 7   Global Step: 80640   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:05:12,453-Speed 5974.86 samples/sec   Loss 9.4571   LearningRate 0.1844   Epoch: 7   Global Step: 80650   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:05:19,305-Speed 5978.72 samples/sec   Loss 9.5047   LearningRate 0.1844   Epoch: 7   Global Step: 80660   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:05:26,173-Speed 5965.52 samples/sec   Loss 9.4999   LearningRate 0.1844   Epoch: 7   Global Step: 80670   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:05:33,024-Speed 5979.34 samples/sec   Loss 9.5124   LearningRate 0.1843   Epoch: 7   Global Step: 80680   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:05:39,880-Speed 5975.19 samples/sec   Loss 9.5498   LearningRate 0.1843   Epoch: 7   Global Step: 80690   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:05:46,750-Speed 5964.15 samples/sec   Loss 9.4766   LearningRate 0.1843   Epoch: 7   Global Step: 80700   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:05:53,618-Speed 5964.68 samples/sec   Loss 9.5756   LearningRate 0.1842   Epoch: 7   Global Step: 80710   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:06:00,478-Speed 5971.36 samples/sec   Loss 9.4559   LearningRate 0.1842   Epoch: 7   Global Step: 80720   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:06:07,323-Speed 5984.92 samples/sec   Loss 9.4780   LearningRate 0.1842   Epoch: 7   Global Step: 80730   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:06:14,163-Speed 5988.94 samples/sec   Loss 9.4690   LearningRate 0.1842   Epoch: 7   Global Step: 80740   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:06:21,057-Speed 5943.59 samples/sec   Loss 9.5478   LearningRate 0.1841   Epoch: 7   Global Step: 80750   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:06:27,913-Speed 5974.73 samples/sec   Loss 9.5014   LearningRate 0.1841   Epoch: 7   Global Step: 80760   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:06:34,791-Speed 5956.58 samples/sec   Loss 9.5687   LearningRate 0.1841   Epoch: 7   Global Step: 80770   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:06:41,638-Speed 5983.33 samples/sec   Loss 9.4663   LearningRate 0.1840   Epoch: 7   Global Step: 80780   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:06:48,501-Speed 5969.43 samples/sec   Loss 9.5124   LearningRate 0.1840   Epoch: 7   Global Step: 80790   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:06:55,383-Speed 5952.24 samples/sec   Loss 9.5592   LearningRate 0.1840   Epoch: 7   Global Step: 80800   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:07:02,237-Speed 5977.31 samples/sec   Loss 9.3976   LearningRate 0.1840   Epoch: 7   Global Step: 80810   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:07:09,088-Speed 5980.05 samples/sec   Loss 9.4367   LearningRate 0.1839   Epoch: 7   Global Step: 80820   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:07:15,949-Speed 5973.58 samples/sec   Loss 9.4167   LearningRate 0.1839   Epoch: 7   Global Step: 80830   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:07:22,813-Speed 5968.46 samples/sec   Loss 9.4742   LearningRate 0.1839   Epoch: 7   Global Step: 80840   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:07:29,702-Speed 5946.30 samples/sec   Loss 9.5133   LearningRate 0.1838   Epoch: 7   Global Step: 80850   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:07:36,557-Speed 5976.95 samples/sec   Loss 9.4694   LearningRate 0.1838   Epoch: 7   Global Step: 80860   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:07:43,430-Speed 5960.51 samples/sec   Loss 9.4055   LearningRate 0.1838   Epoch: 7   Global Step: 80870   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:07:50,284-Speed 5979.27 samples/sec   Loss 9.4394   LearningRate 0.1837   Epoch: 7   Global Step: 80880   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:07:57,144-Speed 5972.18 samples/sec   Loss 9.4653   LearningRate 0.1837   Epoch: 7   Global Step: 80890   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:08:04,017-Speed 5960.29 samples/sec   Loss 9.4052   LearningRate 0.1837   Epoch: 7   Global Step: 80900   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:08:10,868-Speed 5979.99 samples/sec   Loss 9.4659   LearningRate 0.1837   Epoch: 7   Global Step: 80910   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:08:17,741-Speed 5959.82 samples/sec   Loss 9.4983   LearningRate 0.1836   Epoch: 7   Global Step: 80920   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:08:24,654-Speed 5926.12 samples/sec   Loss 9.4641   LearningRate 0.1836   Epoch: 7   Global Step: 80930   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:08:31,548-Speed 5942.87 samples/sec   Loss 9.4927   LearningRate 0.1836   Epoch: 7   Global Step: 80940   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:08:38,426-Speed 5956.79 samples/sec   Loss 9.5120   LearningRate 0.1835   Epoch: 7   Global Step: 80950   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:08:45,293-Speed 5965.86 samples/sec   Loss 9.4999   LearningRate 0.1835   Epoch: 7   Global Step: 80960   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:08:52,161-Speed 5965.03 samples/sec   Loss 9.4568   LearningRate 0.1835   Epoch: 7   Global Step: 80970   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:08:59,027-Speed 5966.91 samples/sec   Loss 9.4052   LearningRate 0.1835   Epoch: 7   Global Step: 80980   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:09:05,934-Speed 5931.38 samples/sec   Loss 9.4150   LearningRate 0.1834   Epoch: 7   Global Step: 80990   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:09:12,822-Speed 5948.10 samples/sec   Loss 9.4296   LearningRate 0.1834   Epoch: 7   Global Step: 81000   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:09:19,672-Speed 5980.71 samples/sec   Loss 9.5394   LearningRate 0.1834   Epoch: 7   Global Step: 81010   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:09:26,542-Speed 5963.98 samples/sec   Loss 9.4532   LearningRate 0.1833   Epoch: 7   Global Step: 81020   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:09:33,403-Speed 5971.36 samples/sec   Loss 9.4439   LearningRate 0.1833   Epoch: 7   Global Step: 81030   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:09:40,288-Speed 5950.19 samples/sec   Loss 9.4789   LearningRate 0.1833   Epoch: 7   Global Step: 81040   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:09:47,165-Speed 5957.41 samples/sec   Loss 9.5030   LearningRate 0.1833   Epoch: 7   Global Step: 81050   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:09:54,031-Speed 5967.88 samples/sec   Loss 9.4417   LearningRate 0.1832   Epoch: 7   Global Step: 81060   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:10:00,885-Speed 5976.76 samples/sec   Loss 9.4316   LearningRate 0.1832   Epoch: 7   Global Step: 81070   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:10:07,877-Speed 5859.85 samples/sec   Loss 9.4534   LearningRate 0.1832   Epoch: 7   Global Step: 81080   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:10:14,848-Speed 5876.74 samples/sec   Loss 9.4572   LearningRate 0.1831   Epoch: 7   Global Step: 81090   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:10:21,770-Speed 5918.51 samples/sec   Loss 9.4684   LearningRate 0.1831   Epoch: 7   Global Step: 81100   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:10:28,660-Speed 5946.96 samples/sec   Loss 9.4856   LearningRate 0.1831   Epoch: 7   Global Step: 81110   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:10:35,528-Speed 5964.88 samples/sec   Loss 9.4176   LearningRate 0.1831   Epoch: 7   Global Step: 81120   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:10:42,408-Speed 5954.29 samples/sec   Loss 9.4307   LearningRate 0.1830   Epoch: 7   Global Step: 81130   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:10:49,257-Speed 5982.09 samples/sec   Loss 9.3939   LearningRate 0.1830   Epoch: 7   Global Step: 81140   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:10:56,106-Speed 5981.23 samples/sec   Loss 9.4714   LearningRate 0.1830   Epoch: 7   Global Step: 81150   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:11:02,969-Speed 5968.85 samples/sec   Loss 9.4783   LearningRate 0.1829   Epoch: 7   Global Step: 81160   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:11:09,830-Speed 5971.03 samples/sec   Loss 9.4714   LearningRate 0.1829   Epoch: 7   Global Step: 81170   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:11:16,702-Speed 5962.06 samples/sec   Loss 9.5003   LearningRate 0.1829   Epoch: 7   Global Step: 81180   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:11:23,590-Speed 5947.72 samples/sec   Loss 9.4677   LearningRate 0.1828   Epoch: 7   Global Step: 81190   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:11:30,477-Speed 5949.38 samples/sec   Loss 9.4692   LearningRate 0.1828   Epoch: 7   Global Step: 81200   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:11:37,362-Speed 5949.92 samples/sec   Loss 9.4539   LearningRate 0.1828   Epoch: 7   Global Step: 81210   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:11:44,259-Speed 5939.53 samples/sec   Loss 9.4942   LearningRate 0.1828   Epoch: 7   Global Step: 81220   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:11:51,196-Speed 5905.98 samples/sec   Loss 9.3586   LearningRate 0.1827   Epoch: 7   Global Step: 81230   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:11:58,056-Speed 5974.26 samples/sec   Loss 9.3721   LearningRate 0.1827   Epoch: 7   Global Step: 81240   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:12:04,910-Speed 5977.05 samples/sec   Loss 9.5127   LearningRate 0.1827   Epoch: 7   Global Step: 81250   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:12:11,753-Speed 5986.80 samples/sec   Loss 9.5181   LearningRate 0.1826   Epoch: 7   Global Step: 81260   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:12:18,598-Speed 5985.79 samples/sec   Loss 9.3521   LearningRate 0.1826   Epoch: 7   Global Step: 81270   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:12:25,462-Speed 5968.76 samples/sec   Loss 9.4581   LearningRate 0.1826   Epoch: 7   Global Step: 81280   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:12:32,321-Speed 5972.34 samples/sec   Loss 9.3900   LearningRate 0.1826   Epoch: 7   Global Step: 81290   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:12:39,187-Speed 5966.96 samples/sec   Loss 9.4388   LearningRate 0.1825   Epoch: 7   Global Step: 81300   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:12:46,062-Speed 5958.53 samples/sec   Loss 9.4019   LearningRate 0.1825   Epoch: 7   Global Step: 81310   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:12:52,925-Speed 5969.60 samples/sec   Loss 9.4093   LearningRate 0.1825   Epoch: 7   Global Step: 81320   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:12:59,779-Speed 5977.79 samples/sec   Loss 9.4510   LearningRate 0.1824   Epoch: 7   Global Step: 81330   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:13:06,637-Speed 5973.39 samples/sec   Loss 9.4259   LearningRate 0.1824   Epoch: 7   Global Step: 81340   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:13:13,482-Speed 5985.05 samples/sec   Loss 9.4301   LearningRate 0.1824   Epoch: 7   Global Step: 81350   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:13:20,335-Speed 5976.97 samples/sec   Loss 9.4003   LearningRate 0.1824   Epoch: 7   Global Step: 81360   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:13:27,215-Speed 5955.20 samples/sec   Loss 9.4481   LearningRate 0.1823   Epoch: 7   Global Step: 81370   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:13:34,071-Speed 5976.27 samples/sec   Loss 9.3942   LearningRate 0.1823   Epoch: 7   Global Step: 81380   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:13:40,937-Speed 5966.40 samples/sec   Loss 9.4540   LearningRate 0.1823   Epoch: 7   Global Step: 81390   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:13:47,791-Speed 5977.82 samples/sec   Loss 9.4445   LearningRate 0.1822   Epoch: 7   Global Step: 81400   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:13:54,643-Speed 5978.11 samples/sec   Loss 9.4403   LearningRate 0.1822   Epoch: 7   Global Step: 81410   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:14:01,490-Speed 5982.56 samples/sec   Loss 9.4523   LearningRate 0.1822   Epoch: 7   Global Step: 81420   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:14:08,455-Speed 5882.31 samples/sec   Loss 9.3611   LearningRate 0.1822   Epoch: 7   Global Step: 81430   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:14:15,316-Speed 5971.22 samples/sec   Loss 9.4809   LearningRate 0.1821   Epoch: 7   Global Step: 81440   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:14:22,176-Speed 5971.64 samples/sec   Loss 9.5191   LearningRate 0.1821   Epoch: 7   Global Step: 81450   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:14:29,061-Speed 5950.20 samples/sec   Loss 9.4071   LearningRate 0.1821   Epoch: 7   Global Step: 81460   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:14:35,939-Speed 5956.49 samples/sec   Loss 9.3883   LearningRate 0.1820   Epoch: 7   Global Step: 81470   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:14:42,812-Speed 5961.05 samples/sec   Loss 9.3989   LearningRate 0.1820   Epoch: 7   Global Step: 81480   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:14:49,703-Speed 5945.02 samples/sec   Loss 9.4736   LearningRate 0.1820   Epoch: 7   Global Step: 81490   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:14:56,568-Speed 5967.87 samples/sec   Loss 9.3818   LearningRate 0.1820   Epoch: 7   Global Step: 81500   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:15:03,438-Speed 5963.35 samples/sec   Loss 9.3853   LearningRate 0.1819   Epoch: 7   Global Step: 81510   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:15:10,297-Speed 5972.66 samples/sec   Loss 9.3260   LearningRate 0.1819   Epoch: 7   Global Step: 81520   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:15:17,151-Speed 5978.10 samples/sec   Loss 9.3524   LearningRate 0.1819   Epoch: 7   Global Step: 81530   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:15:24,000-Speed 5981.01 samples/sec   Loss 9.4105   LearningRate 0.1818   Epoch: 7   Global Step: 81540   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:15:30,875-Speed 5958.50 samples/sec   Loss 9.5040   LearningRate 0.1818   Epoch: 7   Global Step: 81550   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:15:37,739-Speed 5968.31 samples/sec   Loss 9.4766   LearningRate 0.1818   Epoch: 7   Global Step: 81560   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:15:44,589-Speed 5981.00 samples/sec   Loss 9.5059   LearningRate 0.1817   Epoch: 7   Global Step: 81570   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:15:51,457-Speed 5964.84 samples/sec   Loss 9.4718   LearningRate 0.1817   Epoch: 7   Global Step: 81580   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:15:58,310-Speed 5977.64 samples/sec   Loss 9.4507   LearningRate 0.1817   Epoch: 7   Global Step: 81590   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:16:05,161-Speed 5979.65 samples/sec   Loss 9.4515   LearningRate 0.1817   Epoch: 7   Global Step: 81600   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:16:12,015-Speed 5977.54 samples/sec   Loss 9.4610   LearningRate 0.1816   Epoch: 7   Global Step: 81610   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:16:18,869-Speed 5978.29 samples/sec   Loss 9.4023   LearningRate 0.1816   Epoch: 7   Global Step: 81620   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:16:25,738-Speed 5964.26 samples/sec   Loss 9.4317   LearningRate 0.1816   Epoch: 7   Global Step: 81630   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:16:32,593-Speed 5978.30 samples/sec   Loss 9.4201   LearningRate 0.1815   Epoch: 7   Global Step: 81640   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:16:39,457-Speed 5968.40 samples/sec   Loss 9.3999   LearningRate 0.1815   Epoch: 7   Global Step: 81650   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:16:46,328-Speed 5962.93 samples/sec   Loss 9.4291   LearningRate 0.1815   Epoch: 7   Global Step: 81660   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:16:53,184-Speed 5975.00 samples/sec   Loss 9.3615   LearningRate 0.1815   Epoch: 7   Global Step: 81670   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:17:00,040-Speed 5975.79 samples/sec   Loss 9.4143   LearningRate 0.1814   Epoch: 7   Global Step: 81680   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:17:06,889-Speed 5981.39 samples/sec   Loss 9.4230   LearningRate 0.1814   Epoch: 7   Global Step: 81690   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:17:13,749-Speed 5973.77 samples/sec   Loss 9.3473   LearningRate 0.1814   Epoch: 7   Global Step: 81700   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:17:20,603-Speed 5976.95 samples/sec   Loss 9.4460   LearningRate 0.1813   Epoch: 7   Global Step: 81710   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:17:27,466-Speed 5969.06 samples/sec   Loss 9.3885   LearningRate 0.1813   Epoch: 7   Global Step: 81720   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:17:34,319-Speed 5978.43 samples/sec   Loss 9.4636   LearningRate 0.1813   Epoch: 7   Global Step: 81730   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:17:41,179-Speed 5972.42 samples/sec   Loss 9.4892   LearningRate 0.1813   Epoch: 7   Global Step: 81740   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:17:48,020-Speed 5988.47 samples/sec   Loss 9.4757   LearningRate 0.1812   Epoch: 7   Global Step: 81750   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:17:54,869-Speed 5980.98 samples/sec   Loss 9.4230   LearningRate 0.1812   Epoch: 7   Global Step: 81760   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:18:01,720-Speed 5980.33 samples/sec   Loss 9.3762   LearningRate 0.1812   Epoch: 7   Global Step: 81770   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:18:08,574-Speed 5976.08 samples/sec   Loss 9.4517   LearningRate 0.1811   Epoch: 7   Global Step: 81780   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:18:15,432-Speed 5974.49 samples/sec   Loss 9.4461   LearningRate 0.1811   Epoch: 7   Global Step: 81790   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:18:22,370-Speed 5905.00 samples/sec   Loss 9.3623   LearningRate 0.1811   Epoch: 7   Global Step: 81800   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:18:29,340-Speed 5877.51 samples/sec   Loss 9.4836   LearningRate 0.1811   Epoch: 7   Global Step: 81810   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:18:36,191-Speed 5980.26 samples/sec   Loss 9.4028   LearningRate 0.1810   Epoch: 7   Global Step: 81820   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:18:43,063-Speed 5961.79 samples/sec   Loss 9.3505   LearningRate 0.1810   Epoch: 7   Global Step: 81830   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:18:49,926-Speed 5969.19 samples/sec   Loss 9.3375   LearningRate 0.1810   Epoch: 7   Global Step: 81840   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:18:56,775-Speed 5981.64 samples/sec   Loss 9.3848   LearningRate 0.1809   Epoch: 7   Global Step: 81850   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:19:03,659-Speed 5952.56 samples/sec   Loss 9.4702   LearningRate 0.1809   Epoch: 7   Global Step: 81860   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:19:10,523-Speed 5968.44 samples/sec   Loss 9.4133   LearningRate 0.1809   Epoch: 7   Global Step: 81870   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:19:17,383-Speed 5975.19 samples/sec   Loss 9.3162   LearningRate 0.1809   Epoch: 7   Global Step: 81880   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:19:24,243-Speed 5971.82 samples/sec   Loss 9.3981   LearningRate 0.1808   Epoch: 7   Global Step: 81890   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:19:31,097-Speed 5977.42 samples/sec   Loss 9.4172   LearningRate 0.1808   Epoch: 7   Global Step: 81900   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:19:37,975-Speed 5956.45 samples/sec   Loss 9.4302   LearningRate 0.1808   Epoch: 7   Global Step: 81910   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:19:44,830-Speed 5978.06 samples/sec   Loss 9.4550   LearningRate 0.1807   Epoch: 7   Global Step: 81920   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:19:51,730-Speed 5939.98 samples/sec   Loss 9.4102   LearningRate 0.1807   Epoch: 7   Global Step: 81930   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:19:58,574-Speed 5985.58 samples/sec   Loss 9.4177   LearningRate 0.1807   Epoch: 7   Global Step: 81940   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:20:05,428-Speed 5977.16 samples/sec   Loss 9.4341   LearningRate 0.1807   Epoch: 7   Global Step: 81950   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:20:12,296-Speed 5965.64 samples/sec   Loss 9.4667   LearningRate 0.1806   Epoch: 7   Global Step: 81960   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:20:19,162-Speed 5966.87 samples/sec   Loss 9.4528   LearningRate 0.1806   Epoch: 7   Global Step: 81970   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:20:26,028-Speed 5966.80 samples/sec   Loss 9.3555   LearningRate 0.1806   Epoch: 7   Global Step: 81980   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:20:32,891-Speed 5969.43 samples/sec   Loss 9.3225   LearningRate 0.1805   Epoch: 7   Global Step: 81990   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:20:39,757-Speed 5967.18 samples/sec   Loss 9.3505   LearningRate 0.1805   Epoch: 7   Global Step: 82000   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:20:46,698-Speed 5901.93 samples/sec   Loss 9.4294   LearningRate 0.1805   Epoch: 7   Global Step: 82010   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:20:53,575-Speed 5957.11 samples/sec   Loss 9.4408   LearningRate 0.1805   Epoch: 7   Global Step: 82020   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:21:00,437-Speed 5970.58 samples/sec   Loss 9.2990   LearningRate 0.1804   Epoch: 7   Global Step: 82030   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:21:07,291-Speed 5976.63 samples/sec   Loss 9.3615   LearningRate 0.1804   Epoch: 7   Global Step: 82040   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:21:14,147-Speed 5974.98 samples/sec   Loss 9.3890   LearningRate 0.1804   Epoch: 7   Global Step: 82050   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:21:21,018-Speed 5962.42 samples/sec   Loss 9.4276   LearningRate 0.1803   Epoch: 7   Global Step: 82060   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:21:27,875-Speed 5974.61 samples/sec   Loss 9.3292   LearningRate 0.1803   Epoch: 7   Global Step: 82070   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:21:34,744-Speed 5963.95 samples/sec   Loss 9.3866   LearningRate 0.1803   Epoch: 7   Global Step: 82080   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:21:41,589-Speed 5984.34 samples/sec   Loss 9.3424   LearningRate 0.1802   Epoch: 7   Global Step: 82090   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:21:48,446-Speed 5975.25 samples/sec   Loss 9.3827   LearningRate 0.1802   Epoch: 7   Global Step: 82100   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:21:55,308-Speed 5970.06 samples/sec   Loss 9.3495   LearningRate 0.1802   Epoch: 7   Global Step: 82110   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:22:02,166-Speed 5974.04 samples/sec   Loss 9.3806   LearningRate 0.1802   Epoch: 7   Global Step: 82120   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:22:09,046-Speed 5954.08 samples/sec   Loss 9.3842   LearningRate 0.1801   Epoch: 7   Global Step: 82130   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:22:15,910-Speed 5969.25 samples/sec   Loss 9.3007   LearningRate 0.1801   Epoch: 7   Global Step: 82140   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:22:22,758-Speed 5982.04 samples/sec   Loss 9.3782   LearningRate 0.1801   Epoch: 7   Global Step: 82150   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:22:29,611-Speed 5978.40 samples/sec   Loss 9.3882   LearningRate 0.1800   Epoch: 7   Global Step: 82160   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:22:36,472-Speed 5971.13 samples/sec   Loss 9.3653   LearningRate 0.1800   Epoch: 7   Global Step: 82170   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:22:43,336-Speed 5969.37 samples/sec   Loss 9.4015   LearningRate 0.1800   Epoch: 7   Global Step: 82180   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:22:50,194-Speed 5973.36 samples/sec   Loss 9.3018   LearningRate 0.1800   Epoch: 7   Global Step: 82190   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:22:57,067-Speed 5960.81 samples/sec   Loss 9.3863   LearningRate 0.1799   Epoch: 7   Global Step: 82200   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:23:03,973-Speed 5932.74 samples/sec   Loss 9.3903   LearningRate 0.1799   Epoch: 7   Global Step: 82210   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:23:10,878-Speed 5932.88 samples/sec   Loss 9.3690   LearningRate 0.1799   Epoch: 7   Global Step: 82220   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:23:17,754-Speed 5960.55 samples/sec   Loss 9.4583   LearningRate 0.1798   Epoch: 7   Global Step: 82230   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-01-08 12:23:24,604-Speed 5980.24 samples/sec   Loss 9.4296   LearningRate 0.1798   Epoch: 7   Global Step: 82240   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:23:31,447-Speed 5987.41 samples/sec   Loss 9.3510   LearningRate 0.1798   Epoch: 7   Global Step: 82250   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:23:38,326-Speed 5955.18 samples/sec   Loss 9.3413   LearningRate 0.1798   Epoch: 7   Global Step: 82260   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:23:45,185-Speed 5973.07 samples/sec   Loss 9.3923   LearningRate 0.1797   Epoch: 7   Global Step: 82270   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:23:52,044-Speed 5972.18 samples/sec   Loss 9.3903   LearningRate 0.1797   Epoch: 7   Global Step: 82280   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:23:58,889-Speed 5985.09 samples/sec   Loss 9.3929   LearningRate 0.1797   Epoch: 7   Global Step: 82290   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:24:05,762-Speed 5961.16 samples/sec   Loss 9.3719   LearningRate 0.1796   Epoch: 7   Global Step: 82300   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:24:12,625-Speed 5969.54 samples/sec   Loss 9.4120   LearningRate 0.1796   Epoch: 7   Global Step: 82310   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:24:19,498-Speed 5960.98 samples/sec   Loss 9.3064   LearningRate 0.1796   Epoch: 7   Global Step: 82320   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:24:26,351-Speed 5977.83 samples/sec   Loss 9.3345   LearningRate 0.1796   Epoch: 7   Global Step: 82330   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-01-08 12:24:33,196-Speed 5985.17 samples/sec   Loss 9.4427   LearningRate 0.1795   Epoch: 7   Global Step: 82340   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:24:40,047-Speed 5979.84 samples/sec   Loss 9.3771   LearningRate 0.1795   Epoch: 7   Global Step: 82350   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:24:46,899-Speed 5981.63 samples/sec   Loss 9.3438   LearningRate 0.1795   Epoch: 7   Global Step: 82360   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:24:53,763-Speed 5968.70 samples/sec   Loss 9.4032   LearningRate 0.1794   Epoch: 7   Global Step: 82370   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:25:00,607-Speed 5985.35 samples/sec   Loss 9.3591   LearningRate 0.1794   Epoch: 7   Global Step: 82380   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:25:07,463-Speed 5975.64 samples/sec   Loss 9.3669   LearningRate 0.1794   Epoch: 7   Global Step: 82390   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-01-08 12:25:14,321-Speed 5974.00 samples/sec   Loss 9.3405   LearningRate 0.1794   Epoch: 7   Global Step: 82400   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:25:21,172-Speed 5979.10 samples/sec   Loss 9.3519   LearningRate 0.1793   Epoch: 7   Global Step: 82410   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:25:28,023-Speed 5980.36 samples/sec   Loss 9.3753   LearningRate 0.1793   Epoch: 7   Global Step: 82420   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:25:34,883-Speed 5973.48 samples/sec   Loss 9.3377   LearningRate 0.1793   Epoch: 7   Global Step: 82430   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:25:41,741-Speed 5973.18 samples/sec   Loss 9.4138   LearningRate 0.1792   Epoch: 7   Global Step: 82440   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:25:48,589-Speed 5982.80 samples/sec   Loss 9.4312   LearningRate 0.1792   Epoch: 7   Global Step: 82450   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:25:55,463-Speed 5959.69 samples/sec   Loss 9.4258   LearningRate 0.1792   Epoch: 7   Global Step: 82460   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:26:02,315-Speed 5979.10 samples/sec   Loss 9.4408   LearningRate 0.1792   Epoch: 7   Global Step: 82470   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:26:09,183-Speed 5964.24 samples/sec   Loss 9.3987   LearningRate 0.1791   Epoch: 7   Global Step: 82480   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:26:16,044-Speed 5971.51 samples/sec   Loss 9.3713   LearningRate 0.1791   Epoch: 7   Global Step: 82490   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:26:22,894-Speed 5980.99 samples/sec   Loss 9.4469   LearningRate 0.1791   Epoch: 7   Global Step: 82500   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:26:29,747-Speed 5978.00 samples/sec   Loss 9.2891   LearningRate 0.1790   Epoch: 7   Global Step: 82510   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:26:36,591-Speed 5985.58 samples/sec   Loss 9.3214   LearningRate 0.1790   Epoch: 7   Global Step: 82520   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:26:43,448-Speed 5975.19 samples/sec   Loss 9.3626   LearningRate 0.1790   Epoch: 7   Global Step: 82530   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:26:50,300-Speed 5978.30 samples/sec   Loss 9.3128   LearningRate 0.1790   Epoch: 7   Global Step: 82540   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:26:57,151-Speed 5980.01 samples/sec   Loss 9.3767   LearningRate 0.1789   Epoch: 7   Global Step: 82550   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:27:04,027-Speed 5957.48 samples/sec   Loss 9.3289   LearningRate 0.1789   Epoch: 7   Global Step: 82560   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:27:10,887-Speed 5972.49 samples/sec   Loss 9.3703   LearningRate 0.1789   Epoch: 7   Global Step: 82570   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:27:17,775-Speed 5947.32 samples/sec   Loss 9.3494   LearningRate 0.1788   Epoch: 7   Global Step: 82580   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:27:24,626-Speed 5979.99 samples/sec   Loss 9.2729   LearningRate 0.1788   Epoch: 7   Global Step: 82590   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:27:31,477-Speed 5980.10 samples/sec   Loss 9.3508   LearningRate 0.1788   Epoch: 7   Global Step: 82600   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:27:38,341-Speed 5968.79 samples/sec   Loss 9.3023   LearningRate 0.1788   Epoch: 7   Global Step: 82610   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:27:45,198-Speed 5973.97 samples/sec   Loss 9.3243   LearningRate 0.1787   Epoch: 7   Global Step: 82620   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:27:52,048-Speed 5980.80 samples/sec   Loss 9.3755   LearningRate 0.1787   Epoch: 7   Global Step: 82630   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:27:58,918-Speed 5963.72 samples/sec   Loss 9.3797   LearningRate 0.1787   Epoch: 7   Global Step: 82640   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:28:05,774-Speed 5974.72 samples/sec   Loss 9.4342   LearningRate 0.1786   Epoch: 7   Global Step: 82650   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:28:12,635-Speed 5971.67 samples/sec   Loss 9.4006   LearningRate 0.1786   Epoch: 7   Global Step: 82660   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:28:19,491-Speed 5975.72 samples/sec   Loss 9.3303   LearningRate 0.1786   Epoch: 7   Global Step: 82670   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:28:26,362-Speed 5962.02 samples/sec   Loss 9.3459   LearningRate 0.1786   Epoch: 7   Global Step: 82680   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:28:33,226-Speed 5968.27 samples/sec   Loss 9.3757   LearningRate 0.1785   Epoch: 7   Global Step: 82690   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:28:40,095-Speed 5964.20 samples/sec   Loss 9.3725   LearningRate 0.1785   Epoch: 7   Global Step: 82700   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:28:46,955-Speed 5972.09 samples/sec   Loss 9.3575   LearningRate 0.1785   Epoch: 7   Global Step: 82710   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:28:53,823-Speed 5965.55 samples/sec   Loss 9.3783   LearningRate 0.1784   Epoch: 7   Global Step: 82720   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:29:00,666-Speed 5987.07 samples/sec   Loss 9.3288   LearningRate 0.1784   Epoch: 7   Global Step: 82730   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:29:07,514-Speed 5982.95 samples/sec   Loss 9.3107   LearningRate 0.1784   Epoch: 7   Global Step: 82740   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:29:14,395-Speed 5955.27 samples/sec   Loss 9.4096   LearningRate 0.1784   Epoch: 7   Global Step: 82750   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:29:21,248-Speed 5980.18 samples/sec   Loss 9.2857   LearningRate 0.1783   Epoch: 7   Global Step: 82760   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:29:28,100-Speed 5978.79 samples/sec   Loss 9.3808   LearningRate 0.1783   Epoch: 7   Global Step: 82770   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:29:34,949-Speed 5981.16 samples/sec   Loss 9.4254   LearningRate 0.1783   Epoch: 7   Global Step: 82780   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:29:41,801-Speed 5979.14 samples/sec   Loss 9.3201   LearningRate 0.1782   Epoch: 7   Global Step: 82790   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:29:48,696-Speed 5942.56 samples/sec   Loss 9.3382   LearningRate 0.1782   Epoch: 7   Global Step: 82800   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:29:55,545-Speed 5981.34 samples/sec   Loss 9.3581   LearningRate 0.1782   Epoch: 7   Global Step: 82810   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:30:02,395-Speed 5980.70 samples/sec   Loss 9.3504   LearningRate 0.1782   Epoch: 7   Global Step: 82820   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:30:09,242-Speed 5983.31 samples/sec   Loss 9.3402   LearningRate 0.1781   Epoch: 7   Global Step: 82830   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:30:16,114-Speed 5961.85 samples/sec   Loss 9.3632   LearningRate 0.1781   Epoch: 7   Global Step: 82840   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:30:22,985-Speed 5962.01 samples/sec   Loss 9.3331   LearningRate 0.1781   Epoch: 7   Global Step: 82850   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:30:29,850-Speed 5967.74 samples/sec   Loss 9.3210   LearningRate 0.1780   Epoch: 7   Global Step: 82860   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:30:36,710-Speed 5972.23 samples/sec   Loss 9.3947   LearningRate 0.1780   Epoch: 7   Global Step: 82870   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:30:43,577-Speed 5966.35 samples/sec   Loss 9.3186   LearningRate 0.1780   Epoch: 7   Global Step: 82880   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:30:50,430-Speed 5978.01 samples/sec   Loss 9.3535   LearningRate 0.1780   Epoch: 7   Global Step: 82890   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:30:57,281-Speed 5980.25 samples/sec   Loss 9.2777   LearningRate 0.1779   Epoch: 7   Global Step: 82900   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:31:04,179-Speed 5939.22 samples/sec   Loss 9.3568   LearningRate 0.1779   Epoch: 7   Global Step: 82910   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:31:11,039-Speed 5971.38 samples/sec   Loss 9.3770   LearningRate 0.1779   Epoch: 7   Global Step: 82920   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:31:17,888-Speed 5981.81 samples/sec   Loss 9.3649   LearningRate 0.1778   Epoch: 7   Global Step: 82930   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:31:24,745-Speed 5978.11 samples/sec   Loss 9.3819   LearningRate 0.1778   Epoch: 7   Global Step: 82940   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:31:31,603-Speed 5973.16 samples/sec   Loss 9.3072   LearningRate 0.1778   Epoch: 7   Global Step: 82950   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:31:56,699-Speed 1632.45 samples/sec   Loss 9.3326   LearningRate 0.1778   Epoch: 8   Global Step: 82960   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:32:03,510-Speed 6015.42 samples/sec   Loss 9.3149   LearningRate 0.1777   Epoch: 8   Global Step: 82970   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:32:10,322-Speed 6013.51 samples/sec   Loss 9.3662   LearningRate 0.1777   Epoch: 8   Global Step: 82980   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:32:17,153-Speed 5997.11 samples/sec   Loss 9.3140   LearningRate 0.1777   Epoch: 8   Global Step: 82990   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:32:24,008-Speed 5977.07 samples/sec   Loss 9.3551   LearningRate 0.1776   Epoch: 8   Global Step: 83000   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:32:30,858-Speed 5980.74 samples/sec   Loss 9.3382   LearningRate 0.1776   Epoch: 8   Global Step: 83010   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:32:37,730-Speed 5961.53 samples/sec   Loss 9.3018   LearningRate 0.1776   Epoch: 8   Global Step: 83020   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:32:44,632-Speed 5936.56 samples/sec   Loss 9.2431   LearningRate 0.1776   Epoch: 8   Global Step: 83030   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:32:51,526-Speed 5941.98 samples/sec   Loss 9.2942   LearningRate 0.1775   Epoch: 8   Global Step: 83040   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:32:58,418-Speed 5944.63 samples/sec   Loss 9.3675   LearningRate 0.1775   Epoch: 8   Global Step: 83050   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:33:06,180-Speed 5278.27 samples/sec   Loss 9.3641   LearningRate 0.1775   Epoch: 8   Global Step: 83060   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:33:13,057-Speed 5956.97 samples/sec   Loss 9.2384   LearningRate 0.1774   Epoch: 8   Global Step: 83070   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:33:19,935-Speed 5956.37 samples/sec   Loss 9.3371   LearningRate 0.1774   Epoch: 8   Global Step: 83080   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:33:26,850-Speed 5925.13 samples/sec   Loss 9.3046   LearningRate 0.1774   Epoch: 8   Global Step: 83090   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:33:33,717-Speed 5966.00 samples/sec   Loss 9.2502   LearningRate 0.1774   Epoch: 8   Global Step: 83100   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:33:40,590-Speed 5960.99 samples/sec   Loss 9.3197   LearningRate 0.1773   Epoch: 8   Global Step: 83110   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:33:47,463-Speed 5960.52 samples/sec   Loss 9.3337   LearningRate 0.1773   Epoch: 8   Global Step: 83120   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:33:54,306-Speed 5987.07 samples/sec   Loss 9.2741   LearningRate 0.1773   Epoch: 8   Global Step: 83130   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:34:01,162-Speed 5975.34 samples/sec   Loss 9.3103   LearningRate 0.1772   Epoch: 8   Global Step: 83140   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:34:08,044-Speed 5953.00 samples/sec   Loss 9.3321   LearningRate 0.1772   Epoch: 8   Global Step: 83150   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:34:14,894-Speed 5980.58 samples/sec   Loss 9.3517   LearningRate 0.1772   Epoch: 8   Global Step: 83160   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:34:21,741-Speed 5983.41 samples/sec   Loss 9.3512   LearningRate 0.1772   Epoch: 8   Global Step: 83170   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:34:28,604-Speed 5969.52 samples/sec   Loss 9.3126   LearningRate 0.1771   Epoch: 8   Global Step: 83180   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:34:35,496-Speed 5944.03 samples/sec   Loss 9.3281   LearningRate 0.1771   Epoch: 8   Global Step: 83190   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:34:42,341-Speed 5984.81 samples/sec   Loss 9.2284   LearningRate 0.1771   Epoch: 8   Global Step: 83200   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:34:49,203-Speed 5970.96 samples/sec   Loss 9.3418   LearningRate 0.1770   Epoch: 8   Global Step: 83210   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:34:56,067-Speed 5967.92 samples/sec   Loss 9.3135   LearningRate 0.1770   Epoch: 8   Global Step: 83220   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:35:02,920-Speed 5978.42 samples/sec   Loss 9.3041   LearningRate 0.1770   Epoch: 8   Global Step: 83230   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:35:12,005-Speed 4509.36 samples/sec   Loss 9.2651   LearningRate 0.1770   Epoch: 8   Global Step: 83240   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:35:18,855-Speed 5980.64 samples/sec   Loss 9.3013   LearningRate 0.1769   Epoch: 8   Global Step: 83250   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:35:25,709-Speed 5978.11 samples/sec   Loss 9.2488   LearningRate 0.1769   Epoch: 8   Global Step: 83260   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:35:32,569-Speed 5971.86 samples/sec   Loss 9.2812   LearningRate 0.1769   Epoch: 8   Global Step: 83270   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:35:39,438-Speed 5967.50 samples/sec   Loss 9.3236   LearningRate 0.1768   Epoch: 8   Global Step: 83280   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:35:46,291-Speed 5977.83 samples/sec   Loss 9.3103   LearningRate 0.1768   Epoch: 8   Global Step: 83290   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:35:53,146-Speed 5975.98 samples/sec   Loss 9.2511   LearningRate 0.1768   Epoch: 8   Global Step: 83300   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:36:00,006-Speed 5972.54 samples/sec   Loss 9.3437   LearningRate 0.1768   Epoch: 8   Global Step: 83310   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:36:06,860-Speed 5977.13 samples/sec   Loss 9.3050   LearningRate 0.1767   Epoch: 8   Global Step: 83320   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:36:13,722-Speed 5970.37 samples/sec   Loss 9.3420   LearningRate 0.1767   Epoch: 8   Global Step: 83330   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:36:20,585-Speed 5968.98 samples/sec   Loss 9.3579   LearningRate 0.1767   Epoch: 8   Global Step: 83340   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:36:27,441-Speed 5976.28 samples/sec   Loss 9.3557   LearningRate 0.1766   Epoch: 8   Global Step: 83350   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:36:34,306-Speed 5966.95 samples/sec   Loss 9.3796   LearningRate 0.1766   Epoch: 8   Global Step: 83360   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:36:41,162-Speed 5975.55 samples/sec   Loss 9.2559   LearningRate 0.1766   Epoch: 8   Global Step: 83370   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:36:48,015-Speed 5980.79 samples/sec   Loss 9.3300   LearningRate 0.1766   Epoch: 8   Global Step: 83380   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:36:54,861-Speed 5984.22 samples/sec   Loss 9.3196   LearningRate 0.1765   Epoch: 8   Global Step: 83390   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:37:01,714-Speed 5978.20 samples/sec   Loss 9.3146   LearningRate 0.1765   Epoch: 8   Global Step: 83400   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:37:08,587-Speed 5961.02 samples/sec   Loss 9.3554   LearningRate 0.1765   Epoch: 8   Global Step: 83410   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:37:15,447-Speed 5971.36 samples/sec   Loss 9.3322   LearningRate 0.1764   Epoch: 8   Global Step: 83420   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:37:22,353-Speed 5932.40 samples/sec   Loss 9.3520   LearningRate 0.1764   Epoch: 8   Global Step: 83430   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:37:29,290-Speed 5905.86 samples/sec   Loss 9.2462   LearningRate 0.1764   Epoch: 8   Global Step: 83440   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:37:36,141-Speed 5979.52 samples/sec   Loss 9.3202   LearningRate 0.1764   Epoch: 8   Global Step: 83450   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:37:43,014-Speed 5960.96 samples/sec   Loss 9.2772   LearningRate 0.1763   Epoch: 8   Global Step: 83460   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:37:49,869-Speed 5977.04 samples/sec   Loss 9.2625   LearningRate 0.1763   Epoch: 8   Global Step: 83470   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:37:56,721-Speed 5978.78 samples/sec   Loss 9.2800   LearningRate 0.1763   Epoch: 8   Global Step: 83480   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:38:03,586-Speed 5968.39 samples/sec   Loss 9.2925   LearningRate 0.1762   Epoch: 8   Global Step: 83490   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:38:10,447-Speed 5971.11 samples/sec   Loss 9.3398   LearningRate 0.1762   Epoch: 8   Global Step: 83500   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:38:17,305-Speed 5973.00 samples/sec   Loss 9.2845   LearningRate 0.1762   Epoch: 8   Global Step: 83510   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:38:24,154-Speed 5982.07 samples/sec   Loss 9.2797   LearningRate 0.1762   Epoch: 8   Global Step: 83520   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:38:31,007-Speed 5980.31 samples/sec   Loss 9.3045   LearningRate 0.1761   Epoch: 8   Global Step: 83530   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:38:37,871-Speed 5968.00 samples/sec   Loss 9.3594   LearningRate 0.1761   Epoch: 8   Global Step: 83540   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:38:44,725-Speed 5977.22 samples/sec   Loss 9.2563   LearningRate 0.1761   Epoch: 8   Global Step: 83550   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:38:51,585-Speed 5971.89 samples/sec   Loss 9.3222   LearningRate 0.1760   Epoch: 8   Global Step: 83560   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:38:58,471-Speed 5949.46 samples/sec   Loss 9.2670   LearningRate 0.1760   Epoch: 8   Global Step: 83570   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:39:05,319-Speed 5982.22 samples/sec   Loss 9.2776   LearningRate 0.1760   Epoch: 8   Global Step: 83580   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:39:12,196-Speed 5959.24 samples/sec   Loss 9.2889   LearningRate 0.1760   Epoch: 8   Global Step: 83590   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:39:19,346-Speed 5729.42 samples/sec   Loss 9.2587   LearningRate 0.1759   Epoch: 8   Global Step: 83600   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:39:26,213-Speed 5965.82 samples/sec   Loss 9.3014   LearningRate 0.1759   Epoch: 8   Global Step: 83610   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:39:33,041-Speed 5999.70 samples/sec   Loss 9.2892   LearningRate 0.1759   Epoch: 8   Global Step: 83620   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-01-08 12:39:39,898-Speed 5974.09 samples/sec   Loss 9.3063   LearningRate 0.1758   Epoch: 8   Global Step: 83630   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-01-08 12:39:46,801-Speed 5935.17 samples/sec   Loss 9.3307   LearningRate 0.1758   Epoch: 8   Global Step: 83640   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-01-08 12:39:53,696-Speed 5941.61 samples/sec   Loss 9.3232   LearningRate 0.1758   Epoch: 8   Global Step: 83650   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-01-08 12:40:00,551-Speed 5976.26 samples/sec   Loss 9.3208   LearningRate 0.1758   Epoch: 8   Global Step: 83660   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-01-08 12:40:07,409-Speed 5974.55 samples/sec   Loss 9.2937   LearningRate 0.1757   Epoch: 8   Global Step: 83670   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-01-08 12:40:14,266-Speed 5974.98 samples/sec   Loss 9.2890   LearningRate 0.1757   Epoch: 8   Global Step: 83680   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-01-08 12:40:21,143-Speed 5957.05 samples/sec   Loss 9.3312   LearningRate 0.1757   Epoch: 8   Global Step: 83690   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-01-08 12:40:28,009-Speed 5966.44 samples/sec   Loss 9.2508   LearningRate 0.1756   Epoch: 8   Global Step: 83700   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-01-08 12:40:34,863-Speed 5977.74 samples/sec   Loss 9.3031   LearningRate 0.1756   Epoch: 8   Global Step: 83710   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-01-08 12:40:41,715-Speed 5978.76 samples/sec   Loss 9.2568   LearningRate 0.1756   Epoch: 8   Global Step: 83720   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 12:40:48,572-Speed 5974.71 samples/sec   Loss 9.2891   LearningRate 0.1756   Epoch: 8   Global Step: 83730   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 12:40:55,435-Speed 5971.74 samples/sec   Loss 9.2838   LearningRate 0.1755   Epoch: 8   Global Step: 83740   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 12:41:02,318-Speed 5952.33 samples/sec   Loss 9.2026   LearningRate 0.1755   Epoch: 8   Global Step: 83750   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 12:41:09,180-Speed 5970.21 samples/sec   Loss 9.2620   LearningRate 0.1755   Epoch: 8   Global Step: 83760   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 12:41:16,049-Speed 5964.19 samples/sec   Loss 9.2606   LearningRate 0.1754   Epoch: 8   Global Step: 83770   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 12:41:22,940-Speed 5944.24 samples/sec   Loss 9.1789   LearningRate 0.1754   Epoch: 8   Global Step: 83780   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 12:41:29,821-Speed 5954.05 samples/sec   Loss 9.3218   LearningRate 0.1754   Epoch: 8   Global Step: 83790   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 12:41:36,691-Speed 5966.71 samples/sec   Loss 9.2382   LearningRate 0.1754   Epoch: 8   Global Step: 83800   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 12:41:43,580-Speed 5946.01 samples/sec   Loss 9.2551   LearningRate 0.1753   Epoch: 8   Global Step: 83810   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 12:41:50,439-Speed 5973.80 samples/sec   Loss 9.3349   LearningRate 0.1753   Epoch: 8   Global Step: 83820   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:41:57,296-Speed 5975.24 samples/sec   Loss 9.2444   LearningRate 0.1753   Epoch: 8   Global Step: 83830   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:42:04,146-Speed 5980.15 samples/sec   Loss 9.3824   LearningRate 0.1752   Epoch: 8   Global Step: 83840   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:42:11,030-Speed 5951.45 samples/sec   Loss 9.2622   LearningRate 0.1752   Epoch: 8   Global Step: 83850   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:42:17,900-Speed 5963.15 samples/sec   Loss 9.3022   LearningRate 0.1752   Epoch: 8   Global Step: 83860   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:42:24,749-Speed 5981.56 samples/sec   Loss 9.3407   LearningRate 0.1752   Epoch: 8   Global Step: 83870   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:42:31,616-Speed 5966.18 samples/sec   Loss 9.2133   LearningRate 0.1751   Epoch: 8   Global Step: 83880   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:42:38,499-Speed 5952.35 samples/sec   Loss 9.2167   LearningRate 0.1751   Epoch: 8   Global Step: 83890   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:42:45,372-Speed 5961.32 samples/sec   Loss 9.2295   LearningRate 0.1751   Epoch: 8   Global Step: 83900   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:42:52,220-Speed 5982.25 samples/sec   Loss 9.2751   LearningRate 0.1751   Epoch: 8   Global Step: 83910   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 12:42:59,091-Speed 5962.50 samples/sec   Loss 9.2489   LearningRate 0.1750   Epoch: 8   Global Step: 83920   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:43:05,947-Speed 5974.94 samples/sec   Loss 9.2996   LearningRate 0.1750   Epoch: 8   Global Step: 83930   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:43:12,811-Speed 5969.41 samples/sec   Loss 9.2724   LearningRate 0.1750   Epoch: 8   Global Step: 83940   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:43:19,673-Speed 5969.71 samples/sec   Loss 9.2711   LearningRate 0.1749   Epoch: 8   Global Step: 83950   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:43:26,551-Speed 5956.37 samples/sec   Loss 9.2846   LearningRate 0.1749   Epoch: 8   Global Step: 83960   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:43:33,415-Speed 5968.27 samples/sec   Loss 9.2197   LearningRate 0.1749   Epoch: 8   Global Step: 83970   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:43:40,270-Speed 5978.15 samples/sec   Loss 9.1881   LearningRate 0.1749   Epoch: 8   Global Step: 83980   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:43:47,193-Speed 5918.05 samples/sec   Loss 9.2973   LearningRate 0.1748   Epoch: 8   Global Step: 83990   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:43:54,118-Speed 5915.79 samples/sec   Loss 9.3061   LearningRate 0.1748   Epoch: 8   Global Step: 84000   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:44:01,048-Speed 5913.38 samples/sec   Loss 9.2599   LearningRate 0.1748   Epoch: 8   Global Step: 84010   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:44:07,979-Speed 5910.72 samples/sec   Loss 9.2366   LearningRate 0.1747   Epoch: 8   Global Step: 84020   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:44:14,884-Speed 5932.78 samples/sec   Loss 9.2632   LearningRate 0.1747   Epoch: 8   Global Step: 84030   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:44:21,748-Speed 5970.57 samples/sec   Loss 9.2182   LearningRate 0.1747   Epoch: 8   Global Step: 84040   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:44:28,605-Speed 5973.94 samples/sec   Loss 9.2809   LearningRate 0.1747   Epoch: 8   Global Step: 84050   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:44:35,478-Speed 5960.81 samples/sec   Loss 9.2668   LearningRate 0.1746   Epoch: 8   Global Step: 84060   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:44:42,379-Speed 5937.14 samples/sec   Loss 9.2509   LearningRate 0.1746   Epoch: 8   Global Step: 84070   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:44:49,248-Speed 5963.59 samples/sec   Loss 9.2307   LearningRate 0.1746   Epoch: 8   Global Step: 84080   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:44:56,142-Speed 5942.77 samples/sec   Loss 9.2306   LearningRate 0.1745   Epoch: 8   Global Step: 84090   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:45:03,008-Speed 5966.83 samples/sec   Loss 9.2252   LearningRate 0.1745   Epoch: 8   Global Step: 84100   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:45:09,862-Speed 5977.24 samples/sec   Loss 9.3093   LearningRate 0.1745   Epoch: 8   Global Step: 84110   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:45:16,713-Speed 5979.45 samples/sec   Loss 9.2129   LearningRate 0.1745   Epoch: 8   Global Step: 84120   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:45:23,568-Speed 5976.96 samples/sec   Loss 9.2221   LearningRate 0.1744   Epoch: 8   Global Step: 84130   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:45:30,425-Speed 5974.48 samples/sec   Loss 9.2411   LearningRate 0.1744   Epoch: 8   Global Step: 84140   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:45:37,305-Speed 5955.05 samples/sec   Loss 9.2405   LearningRate 0.1744   Epoch: 8   Global Step: 84150   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:45:44,177-Speed 5960.48 samples/sec   Loss 9.2824   LearningRate 0.1743   Epoch: 8   Global Step: 84160   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:45:51,040-Speed 5969.63 samples/sec   Loss 9.2219   LearningRate 0.1743   Epoch: 8   Global Step: 84170   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:45:57,894-Speed 5977.26 samples/sec   Loss 9.2562   LearningRate 0.1743   Epoch: 8   Global Step: 84180   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:46:04,761-Speed 5966.28 samples/sec   Loss 9.2478   LearningRate 0.1743   Epoch: 8   Global Step: 84190   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:46:11,630-Speed 5963.95 samples/sec   Loss 9.1947   LearningRate 0.1742   Epoch: 8   Global Step: 84200   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:46:18,513-Speed 5952.05 samples/sec   Loss 9.2393   LearningRate 0.1742   Epoch: 8   Global Step: 84210   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:46:25,399-Speed 5949.10 samples/sec   Loss 9.1917   LearningRate 0.1742   Epoch: 8   Global Step: 84220   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:46:32,259-Speed 5972.18 samples/sec   Loss 9.2458   LearningRate 0.1741   Epoch: 8   Global Step: 84230   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:46:39,143-Speed 5951.28 samples/sec   Loss 9.2890   LearningRate 0.1741   Epoch: 8   Global Step: 84240   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:46:46,010-Speed 5968.92 samples/sec   Loss 9.2025   LearningRate 0.1741   Epoch: 8   Global Step: 84250   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:46:52,867-Speed 5973.94 samples/sec   Loss 9.2095   LearningRate 0.1741   Epoch: 8   Global Step: 84260   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:46:59,730-Speed 5969.86 samples/sec   Loss 9.1316   LearningRate 0.1740   Epoch: 8   Global Step: 84270   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:47:06,598-Speed 5965.19 samples/sec   Loss 9.2094   LearningRate 0.1740   Epoch: 8   Global Step: 84280   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:47:13,470-Speed 5960.61 samples/sec   Loss 9.2477   LearningRate 0.1740   Epoch: 8   Global Step: 84290   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:47:20,349-Speed 5955.45 samples/sec   Loss 9.2627   LearningRate 0.1739   Epoch: 8   Global Step: 84300   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:47:27,200-Speed 5980.02 samples/sec   Loss 9.2882   LearningRate 0.1739   Epoch: 8   Global Step: 84310   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:47:34,056-Speed 5975.24 samples/sec   Loss 9.3144   LearningRate 0.1739   Epoch: 8   Global Step: 84320   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:47:40,904-Speed 5982.13 samples/sec   Loss 9.1903   LearningRate 0.1739   Epoch: 8   Global Step: 84330   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:47:47,755-Speed 5980.21 samples/sec   Loss 9.1579   LearningRate 0.1738   Epoch: 8   Global Step: 84340   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:47:54,623-Speed 5964.79 samples/sec   Loss 9.2362   LearningRate 0.1738   Epoch: 8   Global Step: 84350   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:48:01,477-Speed 5976.94 samples/sec   Loss 9.2279   LearningRate 0.1738   Epoch: 8   Global Step: 84360   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:48:08,347-Speed 5963.91 samples/sec   Loss 9.3288   LearningRate 0.1737   Epoch: 8   Global Step: 84370   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:48:15,215-Speed 5964.49 samples/sec   Loss 9.2505   LearningRate 0.1737   Epoch: 8   Global Step: 84380   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:48:22,071-Speed 5975.86 samples/sec   Loss 9.2715   LearningRate 0.1737   Epoch: 8   Global Step: 84390   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:48:28,970-Speed 5938.99 samples/sec   Loss 9.1754   LearningRate 0.1737   Epoch: 8   Global Step: 84400   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:48:35,847-Speed 5956.41 samples/sec   Loss 9.2321   LearningRate 0.1736   Epoch: 8   Global Step: 84410   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:48:42,729-Speed 5953.85 samples/sec   Loss 9.2410   LearningRate 0.1736   Epoch: 8   Global Step: 84420   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:48:49,591-Speed 5970.16 samples/sec   Loss 9.2785   LearningRate 0.1736   Epoch: 8   Global Step: 84430   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:48:56,454-Speed 5969.35 samples/sec   Loss 9.2281   LearningRate 0.1736   Epoch: 8   Global Step: 84440   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:49:03,322-Speed 5965.40 samples/sec   Loss 9.3417   LearningRate 0.1735   Epoch: 8   Global Step: 84450   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:49:10,175-Speed 5977.97 samples/sec   Loss 9.2163   LearningRate 0.1735   Epoch: 8   Global Step: 84460   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:49:17,030-Speed 5976.22 samples/sec   Loss 9.2313   LearningRate 0.1735   Epoch: 8   Global Step: 84470   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:49:23,905-Speed 5959.35 samples/sec   Loss 9.2344   LearningRate 0.1734   Epoch: 8   Global Step: 84480   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:49:30,787-Speed 5953.37 samples/sec   Loss 9.2234   LearningRate 0.1734   Epoch: 8   Global Step: 84490   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:49:37,682-Speed 5942.20 samples/sec   Loss 9.2556   LearningRate 0.1734   Epoch: 8   Global Step: 84500   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:49:44,577-Speed 5941.61 samples/sec   Loss 9.1572   LearningRate 0.1734   Epoch: 8   Global Step: 84510   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:49:51,455-Speed 5956.84 samples/sec   Loss 9.2565   LearningRate 0.1733   Epoch: 8   Global Step: 84520   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:49:58,314-Speed 5972.19 samples/sec   Loss 9.3060   LearningRate 0.1733   Epoch: 8   Global Step: 84530   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:50:05,175-Speed 5971.84 samples/sec   Loss 9.2420   LearningRate 0.1733   Epoch: 8   Global Step: 84540   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:50:12,029-Speed 5978.86 samples/sec   Loss 9.1420   LearningRate 0.1732   Epoch: 8   Global Step: 84550   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:50:18,913-Speed 5951.23 samples/sec   Loss 9.2545   LearningRate 0.1732   Epoch: 8   Global Step: 84560   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:50:25,776-Speed 5969.57 samples/sec   Loss 9.1300   LearningRate 0.1732   Epoch: 8   Global Step: 84570   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:50:32,644-Speed 5965.56 samples/sec   Loss 9.2417   LearningRate 0.1732   Epoch: 8   Global Step: 84580   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:50:39,511-Speed 5966.06 samples/sec   Loss 9.1548   LearningRate 0.1731   Epoch: 8   Global Step: 84590   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:50:46,378-Speed 5966.30 samples/sec   Loss 9.2347   LearningRate 0.1731   Epoch: 8   Global Step: 84600   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:50:53,261-Speed 5951.75 samples/sec   Loss 9.1339   LearningRate 0.1731   Epoch: 8   Global Step: 84610   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:51:00,150-Speed 5946.42 samples/sec   Loss 9.1976   LearningRate 0.1730   Epoch: 8   Global Step: 84620   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:51:07,020-Speed 5963.65 samples/sec   Loss 9.1197   LearningRate 0.1730   Epoch: 8   Global Step: 84630   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:51:13,888-Speed 5965.29 samples/sec   Loss 9.2453   LearningRate 0.1730   Epoch: 8   Global Step: 84640   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:51:20,749-Speed 5971.02 samples/sec   Loss 9.2499   LearningRate 0.1730   Epoch: 8   Global Step: 84650   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:51:27,611-Speed 5972.20 samples/sec   Loss 9.1812   LearningRate 0.1729   Epoch: 8   Global Step: 84660   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:51:34,475-Speed 5968.77 samples/sec   Loss 9.2049   LearningRate 0.1729   Epoch: 8   Global Step: 84670   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:51:41,407-Speed 5909.95 samples/sec   Loss 9.1857   LearningRate 0.1729   Epoch: 8   Global Step: 84680   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:51:48,271-Speed 5968.54 samples/sec   Loss 9.1767   LearningRate 0.1728   Epoch: 8   Global Step: 84690   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:51:55,152-Speed 5956.05 samples/sec   Loss 9.1340   LearningRate 0.1728   Epoch: 8   Global Step: 84700   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:52:02,016-Speed 5968.43 samples/sec   Loss 9.1858   LearningRate 0.1728   Epoch: 8   Global Step: 84710   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:52:08,893-Speed 5957.45 samples/sec   Loss 9.1093   LearningRate 0.1728   Epoch: 8   Global Step: 84720   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:52:15,769-Speed 5958.03 samples/sec   Loss 9.1635   LearningRate 0.1727   Epoch: 8   Global Step: 84730   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:52:22,627-Speed 5973.57 samples/sec   Loss 9.1566   LearningRate 0.1727   Epoch: 8   Global Step: 84740   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:52:29,489-Speed 5970.78 samples/sec   Loss 9.2486   LearningRate 0.1727   Epoch: 8   Global Step: 84750   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:52:36,348-Speed 5976.23 samples/sec   Loss 9.2900   LearningRate 0.1726   Epoch: 8   Global Step: 84760   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:52:43,324-Speed 5872.72 samples/sec   Loss 9.1816   LearningRate 0.1726   Epoch: 8   Global Step: 84770   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:52:50,169-Speed 5984.37 samples/sec   Loss 9.2363   LearningRate 0.1726   Epoch: 8   Global Step: 84780   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:52:57,094-Speed 5916.51 samples/sec   Loss 9.1633   LearningRate 0.1726   Epoch: 8   Global Step: 84790   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:53:04,019-Speed 5915.14 samples/sec   Loss 9.1856   LearningRate 0.1725   Epoch: 8   Global Step: 84800   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:53:10,919-Speed 5939.96 samples/sec   Loss 9.2298   LearningRate 0.1725   Epoch: 8   Global Step: 84810   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:53:17,780-Speed 5971.92 samples/sec   Loss 9.2275   LearningRate 0.1725   Epoch: 8   Global Step: 84820   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:53:24,643-Speed 5968.59 samples/sec   Loss 9.2969   LearningRate 0.1725   Epoch: 8   Global Step: 84830   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:53:31,530-Speed 5948.80 samples/sec   Loss 9.2136   LearningRate 0.1724   Epoch: 8   Global Step: 84840   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:53:38,429-Speed 5941.21 samples/sec   Loss 9.2247   LearningRate 0.1724   Epoch: 8   Global Step: 84850   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:53:45,321-Speed 5944.38 samples/sec   Loss 9.1299   LearningRate 0.1724   Epoch: 8   Global Step: 84860   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:53:52,233-Speed 5926.70 samples/sec   Loss 9.1953   LearningRate 0.1723   Epoch: 8   Global Step: 84870   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:53:59,143-Speed 5929.05 samples/sec   Loss 9.1544   LearningRate 0.1723   Epoch: 8   Global Step: 84880   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:54:06,022-Speed 5955.37 samples/sec   Loss 9.2645   LearningRate 0.1723   Epoch: 8   Global Step: 84890   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:54:12,908-Speed 5950.05 samples/sec   Loss 9.1332   LearningRate 0.1723   Epoch: 8   Global Step: 84900   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:54:19,818-Speed 5929.21 samples/sec   Loss 9.1482   LearningRate 0.1722   Epoch: 8   Global Step: 84910   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:54:26,692-Speed 5960.20 samples/sec   Loss 9.1359   LearningRate 0.1722   Epoch: 8   Global Step: 84920   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:54:33,562-Speed 5962.99 samples/sec   Loss 9.1913   LearningRate 0.1722   Epoch: 8   Global Step: 84930   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:54:40,432-Speed 5963.61 samples/sec   Loss 9.1355   LearningRate 0.1721   Epoch: 8   Global Step: 84940   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:54:47,281-Speed 5981.22 samples/sec   Loss 9.2300   LearningRate 0.1721   Epoch: 8   Global Step: 84950   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:54:54,142-Speed 5974.83 samples/sec   Loss 9.2231   LearningRate 0.1721   Epoch: 8   Global Step: 84960   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:55:01,016-Speed 5960.13 samples/sec   Loss 9.0993   LearningRate 0.1721   Epoch: 8   Global Step: 84970   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:55:07,899-Speed 5951.49 samples/sec   Loss 9.1361   LearningRate 0.1720   Epoch: 8   Global Step: 84980   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:55:14,764-Speed 5968.83 samples/sec   Loss 9.1688   LearningRate 0.1720   Epoch: 8   Global Step: 84990   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:55:21,640-Speed 5958.25 samples/sec   Loss 9.2153   LearningRate 0.1720   Epoch: 8   Global Step: 85000   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:55:59,326-[lfw][85000]XNorm: 24.676382
Training: 2022-01-08 12:55:59,327-[lfw][85000]Accuracy-Flip: 0.99683+-0.00241
Training: 2022-01-08 12:55:59,328-[lfw][85000]Accuracy-Highest: 0.99750
Training: 2022-01-08 12:56:30,214-[cfp_fp][85000]XNorm: 21.869296
Training: 2022-01-08 12:56:30,215-[cfp_fp][85000]Accuracy-Flip: 0.98114+-0.00498
Training: 2022-01-08 12:56:30,216-[cfp_fp][85000]Accuracy-Highest: 0.98114
Training: 2022-01-08 12:56:56,978-[agedb_30][85000]XNorm: 24.313713
Training: 2022-01-08 12:56:56,979-[agedb_30][85000]Accuracy-Flip: 0.96583+-0.00704
Training: 2022-01-08 12:56:56,979-[agedb_30][85000]Accuracy-Highest: 0.96883
Training: 2022-01-08 12:57:03,848-Speed 400.76 samples/sec   Loss 9.2098   LearningRate 0.1719   Epoch: 8   Global Step: 85010   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:57:10,707-Speed 5973.50 samples/sec   Loss 9.2582   LearningRate 0.1719   Epoch: 8   Global Step: 85020   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:57:17,604-Speed 5940.15 samples/sec   Loss 9.2257   LearningRate 0.1719   Epoch: 8   Global Step: 85030   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:57:24,476-Speed 5962.41 samples/sec   Loss 9.1773   LearningRate 0.1719   Epoch: 8   Global Step: 85040   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:57:31,350-Speed 5959.95 samples/sec   Loss 9.1134   LearningRate 0.1718   Epoch: 8   Global Step: 85050   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:57:38,384-Speed 5824.04 samples/sec   Loss 9.1807   LearningRate 0.1718   Epoch: 8   Global Step: 85060   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:57:45,255-Speed 5963.04 samples/sec   Loss 9.2165   LearningRate 0.1718   Epoch: 8   Global Step: 85070   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:57:52,142-Speed 5948.35 samples/sec   Loss 9.1850   LearningRate 0.1717   Epoch: 8   Global Step: 85080   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:57:59,022-Speed 5954.35 samples/sec   Loss 9.1437   LearningRate 0.1717   Epoch: 8   Global Step: 85090   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:58:05,891-Speed 5964.82 samples/sec   Loss 9.1489   LearningRate 0.1717   Epoch: 8   Global Step: 85100   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:58:12,808-Speed 5921.94 samples/sec   Loss 9.1823   LearningRate 0.1717   Epoch: 8   Global Step: 85110   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:58:19,701-Speed 5943.90 samples/sec   Loss 9.1345   LearningRate 0.1716   Epoch: 8   Global Step: 85120   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:58:26,555-Speed 5977.76 samples/sec   Loss 9.1905   LearningRate 0.1716   Epoch: 8   Global Step: 85130   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:58:33,410-Speed 5975.95 samples/sec   Loss 9.1434   LearningRate 0.1716   Epoch: 8   Global Step: 85140   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:58:40,263-Speed 5978.62 samples/sec   Loss 9.1532   LearningRate 0.1716   Epoch: 8   Global Step: 85150   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:58:47,110-Speed 5983.11 samples/sec   Loss 9.2167   LearningRate 0.1715   Epoch: 8   Global Step: 85160   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:58:54,118-Speed 5846.11 samples/sec   Loss 9.1580   LearningRate 0.1715   Epoch: 8   Global Step: 85170   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:59:01,067-Speed 5895.71 samples/sec   Loss 9.2088   LearningRate 0.1715   Epoch: 8   Global Step: 85180   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:59:07,978-Speed 5927.64 samples/sec   Loss 9.2017   LearningRate 0.1714   Epoch: 8   Global Step: 85190   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:59:14,828-Speed 5981.41 samples/sec   Loss 9.1566   LearningRate 0.1714   Epoch: 8   Global Step: 85200   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:59:21,786-Speed 5887.69 samples/sec   Loss 9.1372   LearningRate 0.1714   Epoch: 8   Global Step: 85210   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:59:28,657-Speed 5962.90 samples/sec   Loss 9.1684   LearningRate 0.1714   Epoch: 8   Global Step: 85220   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:59:35,506-Speed 5981.41 samples/sec   Loss 9.1776   LearningRate 0.1713   Epoch: 8   Global Step: 85230   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 12:59:42,351-Speed 5984.77 samples/sec   Loss 9.1444   LearningRate 0.1713   Epoch: 8   Global Step: 85240   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:59:49,202-Speed 5980.27 samples/sec   Loss 9.1900   LearningRate 0.1713   Epoch: 8   Global Step: 85250   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 12:59:56,054-Speed 5978.98 samples/sec   Loss 9.1562   LearningRate 0.1712   Epoch: 8   Global Step: 85260   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:00:02,958-Speed 5934.57 samples/sec   Loss 9.1667   LearningRate 0.1712   Epoch: 8   Global Step: 85270   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:00:09,848-Speed 5945.56 samples/sec   Loss 9.1693   LearningRate 0.1712   Epoch: 8   Global Step: 85280   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:00:16,695-Speed 5983.65 samples/sec   Loss 9.2392   LearningRate 0.1712   Epoch: 8   Global Step: 85290   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:00:23,542-Speed 5982.73 samples/sec   Loss 9.2009   LearningRate 0.1711   Epoch: 8   Global Step: 85300   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:00:30,397-Speed 5977.09 samples/sec   Loss 9.1353   LearningRate 0.1711   Epoch: 8   Global Step: 85310   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:00:37,253-Speed 5975.19 samples/sec   Loss 9.1449   LearningRate 0.1711   Epoch: 8   Global Step: 85320   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:00:44,126-Speed 5961.46 samples/sec   Loss 9.1522   LearningRate 0.1710   Epoch: 8   Global Step: 85330   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:00:51,000-Speed 5959.58 samples/sec   Loss 9.1556   LearningRate 0.1710   Epoch: 8   Global Step: 85340   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 13:00:58,220-Speed 5674.53 samples/sec   Loss 9.1916   LearningRate 0.1710   Epoch: 8   Global Step: 85350   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:01:05,193-Speed 5875.07 samples/sec   Loss 9.1733   LearningRate 0.1710   Epoch: 8   Global Step: 85360   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:01:12,169-Speed 5874.91 samples/sec   Loss 9.2379   LearningRate 0.1709   Epoch: 8   Global Step: 85370   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:01:19,034-Speed 5967.72 samples/sec   Loss 9.2023   LearningRate 0.1709   Epoch: 8   Global Step: 85380   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:01:25,904-Speed 5963.26 samples/sec   Loss 9.1868   LearningRate 0.1709   Epoch: 8   Global Step: 85390   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:01:32,803-Speed 5938.70 samples/sec   Loss 9.1770   LearningRate 0.1709   Epoch: 8   Global Step: 85400   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:01:39,743-Speed 5903.02 samples/sec   Loss 9.1124   LearningRate 0.1708   Epoch: 8   Global Step: 85410   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:01:46,615-Speed 5961.23 samples/sec   Loss 9.1397   LearningRate 0.1708   Epoch: 8   Global Step: 85420   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:01:53,477-Speed 5970.14 samples/sec   Loss 9.1964   LearningRate 0.1708   Epoch: 8   Global Step: 85430   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:02:00,342-Speed 5967.91 samples/sec   Loss 9.1821   LearningRate 0.1707   Epoch: 8   Global Step: 85440   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:02:07,213-Speed 5962.64 samples/sec   Loss 9.2031   LearningRate 0.1707   Epoch: 8   Global Step: 85450   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 13:02:14,082-Speed 5964.67 samples/sec   Loss 9.1691   LearningRate 0.1707   Epoch: 8   Global Step: 85460   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 13:02:20,939-Speed 5974.63 samples/sec   Loss 9.1557   LearningRate 0.1707   Epoch: 8   Global Step: 85470   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 13:02:27,808-Speed 5965.19 samples/sec   Loss 9.0612   LearningRate 0.1706   Epoch: 8   Global Step: 85480   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 13:02:34,660-Speed 5978.53 samples/sec   Loss 9.1282   LearningRate 0.1706   Epoch: 8   Global Step: 85490   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:02:41,542-Speed 5952.93 samples/sec   Loss 9.1552   LearningRate 0.1706   Epoch: 8   Global Step: 85500   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:02:48,406-Speed 5970.55 samples/sec   Loss 9.1449   LearningRate 0.1705   Epoch: 8   Global Step: 85510   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:02:55,258-Speed 5978.99 samples/sec   Loss 9.1371   LearningRate 0.1705   Epoch: 8   Global Step: 85520   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:03:02,114-Speed 5974.72 samples/sec   Loss 9.1829   LearningRate 0.1705   Epoch: 8   Global Step: 85530   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:03:09,026-Speed 5927.80 samples/sec   Loss 9.1258   LearningRate 0.1705   Epoch: 8   Global Step: 85540   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:03:15,878-Speed 5978.50 samples/sec   Loss 9.1269   LearningRate 0.1704   Epoch: 8   Global Step: 85550   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:03:22,750-Speed 5961.23 samples/sec   Loss 9.1389   LearningRate 0.1704   Epoch: 8   Global Step: 85560   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:03:29,603-Speed 5978.68 samples/sec   Loss 9.1890   LearningRate 0.1704   Epoch: 8   Global Step: 85570   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:03:36,479-Speed 5958.59 samples/sec   Loss 9.0745   LearningRate 0.1703   Epoch: 8   Global Step: 85580   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:03:43,321-Speed 5987.20 samples/sec   Loss 9.1074   LearningRate 0.1703   Epoch: 8   Global Step: 85590   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:03:50,180-Speed 5973.16 samples/sec   Loss 9.1281   LearningRate 0.1703   Epoch: 8   Global Step: 85600   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:03:57,048-Speed 5965.17 samples/sec   Loss 9.1559   LearningRate 0.1703   Epoch: 8   Global Step: 85610   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:04:03,908-Speed 5972.42 samples/sec   Loss 9.2488   LearningRate 0.1702   Epoch: 8   Global Step: 85620   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:04:10,779-Speed 5962.81 samples/sec   Loss 9.1424   LearningRate 0.1702   Epoch: 8   Global Step: 85630   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:04:17,652-Speed 5960.91 samples/sec   Loss 9.1537   LearningRate 0.1702   Epoch: 8   Global Step: 85640   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:04:24,520-Speed 5964.62 samples/sec   Loss 9.0688   LearningRate 0.1702   Epoch: 8   Global Step: 85650   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:04:31,371-Speed 5979.81 samples/sec   Loss 9.1092   LearningRate 0.1701   Epoch: 8   Global Step: 85660   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:04:38,229-Speed 5973.46 samples/sec   Loss 9.2079   LearningRate 0.1701   Epoch: 8   Global Step: 85670   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:04:45,087-Speed 5973.62 samples/sec   Loss 9.1774   LearningRate 0.1701   Epoch: 8   Global Step: 85680   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:04:51,935-Speed 5982.46 samples/sec   Loss 9.1632   LearningRate 0.1700   Epoch: 8   Global Step: 85690   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 13:04:58,835-Speed 5937.57 samples/sec   Loss 9.1125   LearningRate 0.1700   Epoch: 8   Global Step: 85700   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 13:05:05,690-Speed 5976.70 samples/sec   Loss 9.1089   LearningRate 0.1700   Epoch: 8   Global Step: 85710   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:05:12,547-Speed 5974.80 samples/sec   Loss 9.1441   LearningRate 0.1700   Epoch: 8   Global Step: 85720   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:05:19,412-Speed 5967.57 samples/sec   Loss 9.1668   LearningRate 0.1699   Epoch: 8   Global Step: 85730   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:05:26,294-Speed 5953.02 samples/sec   Loss 9.1385   LearningRate 0.1699   Epoch: 8   Global Step: 85740   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:05:33,168-Speed 5960.54 samples/sec   Loss 9.1835   LearningRate 0.1699   Epoch: 8   Global Step: 85750   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:05:40,057-Speed 5947.18 samples/sec   Loss 9.1322   LearningRate 0.1698   Epoch: 8   Global Step: 85760   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:05:46,940-Speed 5952.35 samples/sec   Loss 9.1868   LearningRate 0.1698   Epoch: 8   Global Step: 85770   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:05:53,797-Speed 5975.11 samples/sec   Loss 9.1734   LearningRate 0.1698   Epoch: 8   Global Step: 85780   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:06:00,671-Speed 5959.40 samples/sec   Loss 9.0615   LearningRate 0.1698   Epoch: 8   Global Step: 85790   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:06:07,570-Speed 5938.29 samples/sec   Loss 9.0766   LearningRate 0.1697   Epoch: 8   Global Step: 85800   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:06:14,450-Speed 5957.14 samples/sec   Loss 9.1488   LearningRate 0.1697   Epoch: 8   Global Step: 85810   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 13:06:21,321-Speed 5962.64 samples/sec   Loss 9.1407   LearningRate 0.1697   Epoch: 8   Global Step: 85820   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:06:28,191-Speed 5962.68 samples/sec   Loss 9.1682   LearningRate 0.1696   Epoch: 8   Global Step: 85830   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:06:35,063-Speed 5963.60 samples/sec   Loss 9.0593   LearningRate 0.1696   Epoch: 8   Global Step: 85840   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:06:41,934-Speed 5962.62 samples/sec   Loss 9.0953   LearningRate 0.1696   Epoch: 8   Global Step: 85850   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:06:48,787-Speed 5977.78 samples/sec   Loss 9.1345   LearningRate 0.1696   Epoch: 8   Global Step: 85860   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:06:55,644-Speed 5975.53 samples/sec   Loss 9.0832   LearningRate 0.1695   Epoch: 8   Global Step: 85870   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:07:02,514-Speed 5963.69 samples/sec   Loss 9.1766   LearningRate 0.1695   Epoch: 8   Global Step: 85880   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:07:09,382-Speed 5964.52 samples/sec   Loss 9.1310   LearningRate 0.1695   Epoch: 8   Global Step: 85890   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:07:16,285-Speed 5935.18 samples/sec   Loss 9.1443   LearningRate 0.1695   Epoch: 8   Global Step: 85900   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:07:23,144-Speed 5972.82 samples/sec   Loss 9.1509   LearningRate 0.1694   Epoch: 8   Global Step: 85910   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:07:30,010-Speed 5966.65 samples/sec   Loss 9.1465   LearningRate 0.1694   Epoch: 8   Global Step: 85920   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 13:07:36,880-Speed 5964.80 samples/sec   Loss 9.1192   LearningRate 0.1694   Epoch: 8   Global Step: 85930   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 13:07:43,744-Speed 5968.90 samples/sec   Loss 9.1625   LearningRate 0.1693   Epoch: 8   Global Step: 85940   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 13:07:50,626-Speed 5954.74 samples/sec   Loss 9.1470   LearningRate 0.1693   Epoch: 8   Global Step: 85950   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 13:07:57,518-Speed 5944.61 samples/sec   Loss 9.1628   LearningRate 0.1693   Epoch: 8   Global Step: 85960   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 13:08:04,451-Speed 5908.94 samples/sec   Loss 9.1270   LearningRate 0.1693   Epoch: 8   Global Step: 85970   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 13:08:11,325-Speed 5960.38 samples/sec   Loss 9.0889   LearningRate 0.1692   Epoch: 8   Global Step: 85980   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:08:18,186-Speed 5971.36 samples/sec   Loss 9.0986   LearningRate 0.1692   Epoch: 8   Global Step: 85990   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:08:25,051-Speed 5967.94 samples/sec   Loss 9.0863   LearningRate 0.1692   Epoch: 8   Global Step: 86000   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:08:31,917-Speed 5966.95 samples/sec   Loss 9.1037   LearningRate 0.1691   Epoch: 8   Global Step: 86010   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:08:38,779-Speed 5970.26 samples/sec   Loss 9.1353   LearningRate 0.1691   Epoch: 8   Global Step: 86020   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 13:08:45,656-Speed 5956.83 samples/sec   Loss 9.1735   LearningRate 0.1691   Epoch: 8   Global Step: 86030   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 13:08:52,525-Speed 5963.94 samples/sec   Loss 9.1280   LearningRate 0.1691   Epoch: 8   Global Step: 86040   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 13:08:59,408-Speed 5952.02 samples/sec   Loss 9.1007   LearningRate 0.1690   Epoch: 8   Global Step: 86050   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 13:09:06,268-Speed 5973.32 samples/sec   Loss 9.1471   LearningRate 0.1690   Epoch: 8   Global Step: 86060   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 13:09:13,132-Speed 5968.50 samples/sec   Loss 9.1841   LearningRate 0.1690   Epoch: 8   Global Step: 86070   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 13:09:19,989-Speed 5973.96 samples/sec   Loss 9.0865   LearningRate 0.1690   Epoch: 8   Global Step: 86080   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 13:09:26,875-Speed 5952.44 samples/sec   Loss 9.1124   LearningRate 0.1689   Epoch: 8   Global Step: 86090   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 13:09:33,728-Speed 5977.73 samples/sec   Loss 9.0930   LearningRate 0.1689   Epoch: 8   Global Step: 86100   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 13:09:40,595-Speed 5965.97 samples/sec   Loss 9.1295   LearningRate 0.1689   Epoch: 8   Global Step: 86110   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-01-08 13:09:47,472-Speed 5957.32 samples/sec   Loss 9.2116   LearningRate 0.1688   Epoch: 8   Global Step: 86120   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:09:54,358-Speed 5949.20 samples/sec   Loss 9.1026   LearningRate 0.1688   Epoch: 8   Global Step: 86130   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:10:01,207-Speed 5981.99 samples/sec   Loss 9.0902   LearningRate 0.1688   Epoch: 8   Global Step: 86140   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:10:08,081-Speed 5959.24 samples/sec   Loss 9.1448   LearningRate 0.1688   Epoch: 8   Global Step: 86150   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:10:14,980-Speed 5938.87 samples/sec   Loss 9.1428   LearningRate 0.1687   Epoch: 8   Global Step: 86160   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:10:21,843-Speed 5969.60 samples/sec   Loss 9.1121   LearningRate 0.1687   Epoch: 8   Global Step: 86170   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:10:28,705-Speed 5969.79 samples/sec   Loss 9.0934   LearningRate 0.1687   Epoch: 8   Global Step: 86180   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:10:35,582-Speed 5957.38 samples/sec   Loss 9.0483   LearningRate 0.1686   Epoch: 8   Global Step: 86190   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:10:42,468-Speed 5950.17 samples/sec   Loss 9.0755   LearningRate 0.1686   Epoch: 8   Global Step: 86200   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:10:49,362-Speed 5942.51 samples/sec   Loss 9.0332   LearningRate 0.1686   Epoch: 8   Global Step: 86210   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:10:56,241-Speed 5954.99 samples/sec   Loss 9.0686   LearningRate 0.1686   Epoch: 8   Global Step: 86220   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:11:03,112-Speed 5963.21 samples/sec   Loss 9.1465   LearningRate 0.1685   Epoch: 8   Global Step: 86230   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:11:09,974-Speed 5970.34 samples/sec   Loss 9.0854   LearningRate 0.1685   Epoch: 8   Global Step: 86240   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:11:16,829-Speed 5975.75 samples/sec   Loss 9.2217   LearningRate 0.1685   Epoch: 8   Global Step: 86250   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:11:23,698-Speed 5964.30 samples/sec   Loss 9.1201   LearningRate 0.1685   Epoch: 8   Global Step: 86260   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:11:30,559-Speed 5971.72 samples/sec   Loss 9.1551   LearningRate 0.1684   Epoch: 8   Global Step: 86270   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:11:37,441-Speed 5952.75 samples/sec   Loss 9.1008   LearningRate 0.1684   Epoch: 8   Global Step: 86280   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:11:44,302-Speed 5970.95 samples/sec   Loss 9.0636   LearningRate 0.1684   Epoch: 8   Global Step: 86290   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:11:51,169-Speed 5966.59 samples/sec   Loss 9.1015   LearningRate 0.1683   Epoch: 8   Global Step: 86300   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:11:58,027-Speed 5973.63 samples/sec   Loss 9.0883   LearningRate 0.1683   Epoch: 8   Global Step: 86310   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:12:04,903-Speed 5958.85 samples/sec   Loss 9.1180   LearningRate 0.1683   Epoch: 8   Global Step: 86320   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:12:11,782-Speed 5955.51 samples/sec   Loss 9.1147   LearningRate 0.1683   Epoch: 8   Global Step: 86330   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:12:18,649-Speed 5965.73 samples/sec   Loss 9.1432   LearningRate 0.1682   Epoch: 8   Global Step: 86340   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:12:25,509-Speed 5971.95 samples/sec   Loss 9.0465   LearningRate 0.1682   Epoch: 8   Global Step: 86350   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:12:32,373-Speed 5969.20 samples/sec   Loss 9.1730   LearningRate 0.1682   Epoch: 8   Global Step: 86360   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:12:39,229-Speed 5974.56 samples/sec   Loss 9.0540   LearningRate 0.1681   Epoch: 8   Global Step: 86370   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:12:46,097-Speed 5966.14 samples/sec   Loss 9.0253   LearningRate 0.1681   Epoch: 8   Global Step: 86380   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:12:52,995-Speed 5938.84 samples/sec   Loss 9.0970   LearningRate 0.1681   Epoch: 8   Global Step: 86390   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:12:59,886-Speed 5945.83 samples/sec   Loss 9.0205   LearningRate 0.1681   Epoch: 8   Global Step: 86400   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:13:06,752-Speed 5967.00 samples/sec   Loss 9.1114   LearningRate 0.1680   Epoch: 8   Global Step: 86410   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:13:13,624-Speed 5962.01 samples/sec   Loss 9.1092   LearningRate 0.1680   Epoch: 8   Global Step: 86420   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:13:20,491-Speed 5965.71 samples/sec   Loss 9.0737   LearningRate 0.1680   Epoch: 8   Global Step: 86430   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:13:27,349-Speed 5973.71 samples/sec   Loss 9.0363   LearningRate 0.1680   Epoch: 8   Global Step: 86440   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:13:34,213-Speed 5969.01 samples/sec   Loss 9.0941   LearningRate 0.1679   Epoch: 8   Global Step: 86450   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:13:41,081-Speed 5965.35 samples/sec   Loss 9.0115   LearningRate 0.1679   Epoch: 8   Global Step: 86460   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:13:47,942-Speed 5970.93 samples/sec   Loss 9.0584   LearningRate 0.1679   Epoch: 8   Global Step: 86470   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:13:54,806-Speed 5968.40 samples/sec   Loss 9.0554   LearningRate 0.1678   Epoch: 8   Global Step: 86480   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:14:01,681-Speed 5959.26 samples/sec   Loss 9.0908   LearningRate 0.1678   Epoch: 8   Global Step: 86490   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 13:14:11,145-Speed 4328.57 samples/sec   Loss 9.0950   LearningRate 0.1678   Epoch: 8   Global Step: 86500   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 13:14:17,995-Speed 5981.48 samples/sec   Loss 9.0964   LearningRate 0.1678   Epoch: 8   Global Step: 86510   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:14:24,874-Speed 5954.75 samples/sec   Loss 9.1014   LearningRate 0.1677   Epoch: 8   Global Step: 86520   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:14:31,736-Speed 5971.15 samples/sec   Loss 9.0775   LearningRate 0.1677   Epoch: 8   Global Step: 86530   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:14:38,631-Speed 5941.45 samples/sec   Loss 9.0739   LearningRate 0.1677   Epoch: 8   Global Step: 86540   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:14:45,483-Speed 5978.32 samples/sec   Loss 9.0832   LearningRate 0.1676   Epoch: 8   Global Step: 86550   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:14:52,344-Speed 5971.67 samples/sec   Loss 9.0505   LearningRate 0.1676   Epoch: 8   Global Step: 86560   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:14:59,210-Speed 5966.66 samples/sec   Loss 9.1199   LearningRate 0.1676   Epoch: 8   Global Step: 86570   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:15:06,073-Speed 5969.04 samples/sec   Loss 9.1205   LearningRate 0.1676   Epoch: 8   Global Step: 86580   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:15:12,929-Speed 5975.98 samples/sec   Loss 9.1344   LearningRate 0.1675   Epoch: 8   Global Step: 86590   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:15:19,786-Speed 5974.24 samples/sec   Loss 9.0481   LearningRate 0.1675   Epoch: 8   Global Step: 86600   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:15:26,647-Speed 5971.25 samples/sec   Loss 9.0665   LearningRate 0.1675   Epoch: 8   Global Step: 86610   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:15:33,514-Speed 5966.02 samples/sec   Loss 9.0846   LearningRate 0.1675   Epoch: 8   Global Step: 86620   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:15:40,369-Speed 5976.50 samples/sec   Loss 9.0986   LearningRate 0.1674   Epoch: 8   Global Step: 86630   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:15:47,254-Speed 5950.27 samples/sec   Loss 8.9905   LearningRate 0.1674   Epoch: 8   Global Step: 86640   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:15:54,129-Speed 5959.03 samples/sec   Loss 9.1102   LearningRate 0.1674   Epoch: 8   Global Step: 86650   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:16:00,996-Speed 5965.68 samples/sec   Loss 9.0621   LearningRate 0.1673   Epoch: 8   Global Step: 86660   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:16:07,856-Speed 5972.50 samples/sec   Loss 9.0461   LearningRate 0.1673   Epoch: 8   Global Step: 86670   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:16:14,725-Speed 5964.37 samples/sec   Loss 9.1278   LearningRate 0.1673   Epoch: 8   Global Step: 86680   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:16:21,599-Speed 5959.59 samples/sec   Loss 9.0922   LearningRate 0.1673   Epoch: 8   Global Step: 86690   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:16:28,460-Speed 5971.27 samples/sec   Loss 9.0892   LearningRate 0.1672   Epoch: 8   Global Step: 86700   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:16:35,330-Speed 5963.37 samples/sec   Loss 9.1281   LearningRate 0.1672   Epoch: 8   Global Step: 86710   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:16:42,200-Speed 5963.56 samples/sec   Loss 9.0035   LearningRate 0.1672   Epoch: 8   Global Step: 86720   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:16:49,108-Speed 5930.24 samples/sec   Loss 9.0183   LearningRate 0.1671   Epoch: 8   Global Step: 86730   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:16:55,972-Speed 5969.06 samples/sec   Loss 9.0700   LearningRate 0.1671   Epoch: 8   Global Step: 86740   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:17:02,840-Speed 5965.48 samples/sec   Loss 9.0867   LearningRate 0.1671   Epoch: 8   Global Step: 86750   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:17:09,711-Speed 5961.84 samples/sec   Loss 8.9989   LearningRate 0.1671   Epoch: 8   Global Step: 86760   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:17:16,580-Speed 5964.60 samples/sec   Loss 9.1123   LearningRate 0.1670   Epoch: 8   Global Step: 86770   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:17:23,454-Speed 5959.49 samples/sec   Loss 9.0577   LearningRate 0.1670   Epoch: 8   Global Step: 86780   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:17:30,307-Speed 5978.14 samples/sec   Loss 9.1001   LearningRate 0.1670   Epoch: 8   Global Step: 86790   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:17:37,174-Speed 5966.12 samples/sec   Loss 9.0533   LearningRate 0.1670   Epoch: 8   Global Step: 86800   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:17:44,051-Speed 5956.92 samples/sec   Loss 8.9999   LearningRate 0.1669   Epoch: 8   Global Step: 86810   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:17:50,915-Speed 5968.30 samples/sec   Loss 8.9591   LearningRate 0.1669   Epoch: 8   Global Step: 86820   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:17:57,791-Speed 5958.71 samples/sec   Loss 9.0752   LearningRate 0.1669   Epoch: 8   Global Step: 86830   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:18:04,691-Speed 5939.96 samples/sec   Loss 9.0282   LearningRate 0.1668   Epoch: 8   Global Step: 86840   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:18:11,531-Speed 5989.29 samples/sec   Loss 9.0941   LearningRate 0.1668   Epoch: 8   Global Step: 86850   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:18:18,404-Speed 5960.49 samples/sec   Loss 9.0041   LearningRate 0.1668   Epoch: 8   Global Step: 86860   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:18:25,270-Speed 5967.26 samples/sec   Loss 9.0224   LearningRate 0.1668   Epoch: 8   Global Step: 86870   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:18:32,134-Speed 5968.66 samples/sec   Loss 9.0144   LearningRate 0.1667   Epoch: 8   Global Step: 86880   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:18:38,998-Speed 5968.38 samples/sec   Loss 9.0535   LearningRate 0.1667   Epoch: 8   Global Step: 86890   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:18:45,863-Speed 5967.84 samples/sec   Loss 9.0819   LearningRate 0.1667   Epoch: 8   Global Step: 86900   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:18:52,717-Speed 5977.02 samples/sec   Loss 9.1402   LearningRate 0.1666   Epoch: 8   Global Step: 86910   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:18:59,563-Speed 5983.82 samples/sec   Loss 9.0872   LearningRate 0.1666   Epoch: 8   Global Step: 86920   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:19:06,443-Speed 5954.59 samples/sec   Loss 9.0546   LearningRate 0.1666   Epoch: 8   Global Step: 86930   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:19:13,298-Speed 5976.49 samples/sec   Loss 9.0061   LearningRate 0.1666   Epoch: 8   Global Step: 86940   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:19:20,214-Speed 5924.24 samples/sec   Loss 8.9606   LearningRate 0.1665   Epoch: 8   Global Step: 86950   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:19:27,101-Speed 5949.64 samples/sec   Loss 9.0020   LearningRate 0.1665   Epoch: 8   Global Step: 86960   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:19:33,981-Speed 5954.68 samples/sec   Loss 8.9649   LearningRate 0.1665   Epoch: 8   Global Step: 86970   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:19:40,842-Speed 5971.33 samples/sec   Loss 9.0208   LearningRate 0.1665   Epoch: 8   Global Step: 86980   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:19:47,705-Speed 5969.54 samples/sec   Loss 9.0839   LearningRate 0.1664   Epoch: 8   Global Step: 86990   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:19:54,582-Speed 5956.83 samples/sec   Loss 9.1570   LearningRate 0.1664   Epoch: 8   Global Step: 87000   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:20:01,452-Speed 5963.69 samples/sec   Loss 9.0541   LearningRate 0.1664   Epoch: 8   Global Step: 87010   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:20:08,315-Speed 5971.61 samples/sec   Loss 9.0047   LearningRate 0.1663   Epoch: 8   Global Step: 87020   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:20:15,183-Speed 5964.97 samples/sec   Loss 9.0520   LearningRate 0.1663   Epoch: 8   Global Step: 87030   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:20:22,053-Speed 5963.96 samples/sec   Loss 9.1114   LearningRate 0.1663   Epoch: 8   Global Step: 87040   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:20:28,933-Speed 5954.91 samples/sec   Loss 9.0957   LearningRate 0.1663   Epoch: 8   Global Step: 87050   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:20:35,799-Speed 5966.08 samples/sec   Loss 9.0549   LearningRate 0.1662   Epoch: 8   Global Step: 87060   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:20:42,662-Speed 5969.45 samples/sec   Loss 9.0519   LearningRate 0.1662   Epoch: 8   Global Step: 87070   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:20:49,528-Speed 5966.92 samples/sec   Loss 9.0337   LearningRate 0.1662   Epoch: 8   Global Step: 87080   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 13:20:56,467-Speed 5904.58 samples/sec   Loss 9.0329   LearningRate 0.1661   Epoch: 8   Global Step: 87090   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:21:03,319-Speed 5978.93 samples/sec   Loss 9.0001   LearningRate 0.1661   Epoch: 8   Global Step: 87100   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:21:10,174-Speed 5976.99 samples/sec   Loss 9.0399   LearningRate 0.1661   Epoch: 8   Global Step: 87110   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:21:17,043-Speed 5963.45 samples/sec   Loss 8.9887   LearningRate 0.1661   Epoch: 8   Global Step: 87120   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:21:25,426-Speed 4887.14 samples/sec   Loss 8.9950   LearningRate 0.1660   Epoch: 8   Global Step: 87130   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:21:32,308-Speed 5954.41 samples/sec   Loss 8.9937   LearningRate 0.1660   Epoch: 8   Global Step: 87140   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:21:39,162-Speed 5976.85 samples/sec   Loss 9.0880   LearningRate 0.1660   Epoch: 8   Global Step: 87150   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:21:46,037-Speed 5959.39 samples/sec   Loss 8.9811   LearningRate 0.1660   Epoch: 8   Global Step: 87160   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:21:52,907-Speed 5964.89 samples/sec   Loss 9.0048   LearningRate 0.1659   Epoch: 8   Global Step: 87170   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:21:59,760-Speed 5977.31 samples/sec   Loss 8.9834   LearningRate 0.1659   Epoch: 8   Global Step: 87180   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:22:06,600-Speed 5989.88 samples/sec   Loss 9.0431   LearningRate 0.1659   Epoch: 8   Global Step: 87190   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:22:13,456-Speed 5975.46 samples/sec   Loss 8.9971   LearningRate 0.1658   Epoch: 8   Global Step: 87200   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:22:20,323-Speed 5965.59 samples/sec   Loss 9.0475   LearningRate 0.1658   Epoch: 8   Global Step: 87210   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:22:27,190-Speed 5965.70 samples/sec   Loss 9.0003   LearningRate 0.1658   Epoch: 8   Global Step: 87220   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:22:34,052-Speed 5973.76 samples/sec   Loss 9.0575   LearningRate 0.1658   Epoch: 8   Global Step: 87230   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:22:40,903-Speed 5979.47 samples/sec   Loss 9.0065   LearningRate 0.1657   Epoch: 8   Global Step: 87240   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:22:47,751-Speed 5985.09 samples/sec   Loss 8.9979   LearningRate 0.1657   Epoch: 8   Global Step: 87250   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:22:54,613-Speed 5969.55 samples/sec   Loss 9.1282   LearningRate 0.1657   Epoch: 8   Global Step: 87260   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:23:01,478-Speed 5968.34 samples/sec   Loss 9.0170   LearningRate 0.1657   Epoch: 8   Global Step: 87270   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:23:08,340-Speed 5969.73 samples/sec   Loss 8.9908   LearningRate 0.1656   Epoch: 8   Global Step: 87280   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:23:15,197-Speed 5975.50 samples/sec   Loss 9.0760   LearningRate 0.1656   Epoch: 8   Global Step: 87290   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 13:23:22,082-Speed 5950.09 samples/sec   Loss 9.0539   LearningRate 0.1656   Epoch: 8   Global Step: 87300   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-01-08 13:23:28,951-Speed 5964.15 samples/sec   Loss 9.0606   LearningRate 0.1655   Epoch: 8   Global Step: 87310   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:23:35,806-Speed 5977.10 samples/sec   Loss 9.0505   LearningRate 0.1655   Epoch: 8   Global Step: 87320   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:23:42,664-Speed 5973.73 samples/sec   Loss 8.9265   LearningRate 0.1655   Epoch: 8   Global Step: 87330   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:23:49,560-Speed 5941.57 samples/sec   Loss 9.0285   LearningRate 0.1655   Epoch: 8   Global Step: 87340   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:23:56,457-Speed 5948.00 samples/sec   Loss 9.0931   LearningRate 0.1654   Epoch: 8   Global Step: 87350   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:24:03,328-Speed 5962.20 samples/sec   Loss 8.9937   LearningRate 0.1654   Epoch: 8   Global Step: 87360   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:24:10,197-Speed 5964.01 samples/sec   Loss 8.9847   LearningRate 0.1654   Epoch: 8   Global Step: 87370   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:24:17,046-Speed 5982.86 samples/sec   Loss 8.9755   LearningRate 0.1653   Epoch: 8   Global Step: 87380   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:24:23,903-Speed 5975.03 samples/sec   Loss 9.0489   LearningRate 0.1653   Epoch: 8   Global Step: 87390   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:24:30,780-Speed 5957.51 samples/sec   Loss 9.0075   LearningRate 0.1653   Epoch: 8   Global Step: 87400   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:24:37,631-Speed 5979.60 samples/sec   Loss 9.0850   LearningRate 0.1653   Epoch: 8   Global Step: 87410   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:24:44,478-Speed 5983.24 samples/sec   Loss 9.0399   LearningRate 0.1652   Epoch: 8   Global Step: 87420   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:24:51,321-Speed 5987.58 samples/sec   Loss 8.9592   LearningRate 0.1652   Epoch: 8   Global Step: 87430   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:24:58,175-Speed 5977.80 samples/sec   Loss 8.9702   LearningRate 0.1652   Epoch: 8   Global Step: 87440   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:25:05,052-Speed 5957.18 samples/sec   Loss 9.0045   LearningRate 0.1652   Epoch: 8   Global Step: 87450   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:25:11,909-Speed 5974.86 samples/sec   Loss 9.0413   LearningRate 0.1651   Epoch: 8   Global Step: 87460   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:25:18,782-Speed 5960.72 samples/sec   Loss 9.0003   LearningRate 0.1651   Epoch: 8   Global Step: 87470   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:25:25,642-Speed 5971.53 samples/sec   Loss 8.9786   LearningRate 0.1651   Epoch: 8   Global Step: 87480   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:25:32,490-Speed 5983.49 samples/sec   Loss 8.9050   LearningRate 0.1650   Epoch: 8   Global Step: 87490   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:25:39,338-Speed 5982.60 samples/sec   Loss 8.9706   LearningRate 0.1650   Epoch: 8   Global Step: 87500   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:25:46,212-Speed 5959.49 samples/sec   Loss 8.9474   LearningRate 0.1650   Epoch: 8   Global Step: 87510   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:25:53,078-Speed 5967.43 samples/sec   Loss 9.0083   LearningRate 0.1650   Epoch: 8   Global Step: 87520   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:25:59,932-Speed 5977.17 samples/sec   Loss 8.9973   LearningRate 0.1649   Epoch: 8   Global Step: 87530   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-01-08 13:26:06,793-Speed 5971.66 samples/sec   Loss 9.0851   LearningRate 0.1649   Epoch: 8   Global Step: 87540   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-01-08 13:26:13,656-Speed 5969.09 samples/sec   Loss 9.0116   LearningRate 0.1649   Epoch: 8   Global Step: 87550   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:26:21,009-Speed 5571.54 samples/sec   Loss 8.8967   LearningRate 0.1649   Epoch: 8   Global Step: 87560   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:26:27,876-Speed 5965.70 samples/sec   Loss 8.9699   LearningRate 0.1648   Epoch: 8   Global Step: 87570   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:26:34,774-Speed 5939.30 samples/sec   Loss 9.1285   LearningRate 0.1648   Epoch: 8   Global Step: 87580   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:26:41,648-Speed 5960.40 samples/sec   Loss 9.0058   LearningRate 0.1648   Epoch: 8   Global Step: 87590   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:26:48,500-Speed 5978.72 samples/sec   Loss 9.0037   LearningRate 0.1647   Epoch: 8   Global Step: 87600   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:26:55,391-Speed 5945.36 samples/sec   Loss 8.9320   LearningRate 0.1647   Epoch: 8   Global Step: 87610   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:27:02,260-Speed 5963.84 samples/sec   Loss 8.9895   LearningRate 0.1647   Epoch: 8   Global Step: 87620   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:27:09,138-Speed 5956.44 samples/sec   Loss 9.0776   LearningRate 0.1647   Epoch: 8   Global Step: 87630   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:27:16,005-Speed 5965.74 samples/sec   Loss 8.9920   LearningRate 0.1646   Epoch: 8   Global Step: 87640   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 13:27:22,886-Speed 5954.60 samples/sec   Loss 9.0130   LearningRate 0.1646   Epoch: 8   Global Step: 87650   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 13:27:29,758-Speed 5961.47 samples/sec   Loss 8.9959   LearningRate 0.1646   Epoch: 8   Global Step: 87660   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 13:27:36,622-Speed 5968.36 samples/sec   Loss 9.0034   LearningRate 0.1646   Epoch: 8   Global Step: 87670   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 13:27:43,494-Speed 5961.94 samples/sec   Loss 8.9714   LearningRate 0.1645   Epoch: 8   Global Step: 87680   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 13:27:50,337-Speed 5986.35 samples/sec   Loss 9.0013   LearningRate 0.1645   Epoch: 8   Global Step: 87690   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 13:27:57,229-Speed 5946.22 samples/sec   Loss 8.9459   LearningRate 0.1645   Epoch: 8   Global Step: 87700   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:28:04,078-Speed 5981.69 samples/sec   Loss 8.9295   LearningRate 0.1644   Epoch: 8   Global Step: 87710   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:28:10,923-Speed 5984.90 samples/sec   Loss 9.0471   LearningRate 0.1644   Epoch: 8   Global Step: 87720   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:28:17,778-Speed 5976.02 samples/sec   Loss 9.0276   LearningRate 0.1644   Epoch: 8   Global Step: 87730   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:28:24,624-Speed 5984.51 samples/sec   Loss 8.9952   LearningRate 0.1644   Epoch: 8   Global Step: 87740   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:28:31,476-Speed 5978.64 samples/sec   Loss 8.9399   LearningRate 0.1643   Epoch: 8   Global Step: 87750   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:28:38,315-Speed 5989.61 samples/sec   Loss 8.9694   LearningRate 0.1643   Epoch: 8   Global Step: 87760   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:28:45,184-Speed 5965.77 samples/sec   Loss 9.0154   LearningRate 0.1643   Epoch: 8   Global Step: 87770   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:28:52,032-Speed 5982.42 samples/sec   Loss 8.9978   LearningRate 0.1642   Epoch: 8   Global Step: 87780   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:28:58,893-Speed 5971.15 samples/sec   Loss 8.9846   LearningRate 0.1642   Epoch: 8   Global Step: 87790   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:29:05,736-Speed 5986.92 samples/sec   Loss 8.9323   LearningRate 0.1642   Epoch: 8   Global Step: 87800   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:29:12,615-Speed 5954.92 samples/sec   Loss 8.9552   LearningRate 0.1642   Epoch: 8   Global Step: 87810   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:29:19,469-Speed 5977.26 samples/sec   Loss 8.9894   LearningRate 0.1641   Epoch: 8   Global Step: 87820   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:29:26,309-Speed 5989.57 samples/sec   Loss 9.0009   LearningRate 0.1641   Epoch: 8   Global Step: 87830   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:29:33,175-Speed 5966.59 samples/sec   Loss 9.0020   LearningRate 0.1641   Epoch: 8   Global Step: 87840   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:29:40,057-Speed 5956.41 samples/sec   Loss 8.9751   LearningRate 0.1641   Epoch: 8   Global Step: 87850   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:29:46,927-Speed 5963.91 samples/sec   Loss 8.9595   LearningRate 0.1640   Epoch: 8   Global Step: 87860   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:29:53,787-Speed 5971.73 samples/sec   Loss 8.9925   LearningRate 0.1640   Epoch: 8   Global Step: 87870   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:30:00,648-Speed 5971.64 samples/sec   Loss 8.9351   LearningRate 0.1640   Epoch: 8   Global Step: 87880   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:30:07,538-Speed 5946.13 samples/sec   Loss 9.0021   LearningRate 0.1639   Epoch: 8   Global Step: 87890   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:30:14,432-Speed 5942.32 samples/sec   Loss 9.0458   LearningRate 0.1639   Epoch: 8   Global Step: 87900   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 13:30:21,274-Speed 5988.15 samples/sec   Loss 9.0007   LearningRate 0.1639   Epoch: 8   Global Step: 87910   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:30:28,127-Speed 5981.42 samples/sec   Loss 8.9538   LearningRate 0.1639   Epoch: 8   Global Step: 87920   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:30:35,037-Speed 5928.50 samples/sec   Loss 9.0013   LearningRate 0.1638   Epoch: 8   Global Step: 87930   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:30:41,909-Speed 5961.64 samples/sec   Loss 8.9211   LearningRate 0.1638   Epoch: 8   Global Step: 87940   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:30:48,796-Speed 5948.96 samples/sec   Loss 8.9954   LearningRate 0.1638   Epoch: 8   Global Step: 87950   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:30:55,642-Speed 5984.64 samples/sec   Loss 8.9615   LearningRate 0.1638   Epoch: 8   Global Step: 87960   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:31:02,538-Speed 5940.46 samples/sec   Loss 8.9678   LearningRate 0.1637   Epoch: 8   Global Step: 87970   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:31:09,400-Speed 5971.19 samples/sec   Loss 8.8999   LearningRate 0.1637   Epoch: 8   Global Step: 87980   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:31:16,239-Speed 5989.63 samples/sec   Loss 9.0366   LearningRate 0.1637   Epoch: 8   Global Step: 87990   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:31:23,127-Speed 5948.80 samples/sec   Loss 8.9250   LearningRate 0.1636   Epoch: 8   Global Step: 88000   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:31:29,983-Speed 5978.49 samples/sec   Loss 9.0020   LearningRate 0.1636   Epoch: 8   Global Step: 88010   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:31:36,847-Speed 5967.72 samples/sec   Loss 9.0246   LearningRate 0.1636   Epoch: 8   Global Step: 88020   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:31:43,716-Speed 5964.73 samples/sec   Loss 8.9016   LearningRate 0.1636   Epoch: 8   Global Step: 88030   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:31:50,574-Speed 5973.41 samples/sec   Loss 8.9854   LearningRate 0.1635   Epoch: 8   Global Step: 88040   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:31:57,431-Speed 5974.06 samples/sec   Loss 8.9731   LearningRate 0.1635   Epoch: 8   Global Step: 88050   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:32:04,327-Speed 5942.02 samples/sec   Loss 8.9761   LearningRate 0.1635   Epoch: 8   Global Step: 88060   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:32:11,171-Speed 5986.44 samples/sec   Loss 9.0159   LearningRate 0.1635   Epoch: 8   Global Step: 88070   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:32:18,050-Speed 5955.63 samples/sec   Loss 8.9598   LearningRate 0.1634   Epoch: 8   Global Step: 88080   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:32:24,903-Speed 5978.24 samples/sec   Loss 9.0476   LearningRate 0.1634   Epoch: 8   Global Step: 88090   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:32:31,758-Speed 5976.71 samples/sec   Loss 8.9219   LearningRate 0.1634   Epoch: 8   Global Step: 88100   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:32:38,603-Speed 5984.65 samples/sec   Loss 8.9267   LearningRate 0.1633   Epoch: 8   Global Step: 88110   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:32:45,515-Speed 5976.90 samples/sec   Loss 8.9529   LearningRate 0.1633   Epoch: 8   Global Step: 88120   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:32:52,391-Speed 5958.37 samples/sec   Loss 8.9393   LearningRate 0.1633   Epoch: 8   Global Step: 88130   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:32:59,266-Speed 5961.34 samples/sec   Loss 8.9432   LearningRate 0.1633   Epoch: 8   Global Step: 88140   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:33:06,113-Speed 5982.70 samples/sec   Loss 8.8700   LearningRate 0.1632   Epoch: 8   Global Step: 88150   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:33:12,973-Speed 5972.46 samples/sec   Loss 8.9178   LearningRate 0.1632   Epoch: 8   Global Step: 88160   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:33:19,826-Speed 5980.24 samples/sec   Loss 8.9539   LearningRate 0.1632   Epoch: 8   Global Step: 88170   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:33:26,734-Speed 5930.26 samples/sec   Loss 8.8745   LearningRate 0.1632   Epoch: 8   Global Step: 88180   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:33:33,592-Speed 5973.53 samples/sec   Loss 8.8297   LearningRate 0.1631   Epoch: 8   Global Step: 88190   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:33:40,447-Speed 5978.55 samples/sec   Loss 8.9633   LearningRate 0.1631   Epoch: 8   Global Step: 88200   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:33:47,325-Speed 5955.86 samples/sec   Loss 8.9309   LearningRate 0.1631   Epoch: 8   Global Step: 88210   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:33:54,216-Speed 5945.94 samples/sec   Loss 8.9317   LearningRate 0.1630   Epoch: 8   Global Step: 88220   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:34:01,108-Speed 5945.70 samples/sec   Loss 8.9243   LearningRate 0.1630   Epoch: 8   Global Step: 88230   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:34:07,983-Speed 5958.97 samples/sec   Loss 8.9939   LearningRate 0.1630   Epoch: 8   Global Step: 88240   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:34:14,829-Speed 5984.40 samples/sec   Loss 8.9234   LearningRate 0.1630   Epoch: 8   Global Step: 88250   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:34:21,675-Speed 5983.90 samples/sec   Loss 8.8985   LearningRate 0.1629   Epoch: 8   Global Step: 88260   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:34:28,514-Speed 5990.28 samples/sec   Loss 8.9564   LearningRate 0.1629   Epoch: 8   Global Step: 88270   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:34:35,369-Speed 5977.13 samples/sec   Loss 8.9944   LearningRate 0.1629   Epoch: 8   Global Step: 88280   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:34:42,214-Speed 5985.38 samples/sec   Loss 8.9421   LearningRate 0.1629   Epoch: 8   Global Step: 88290   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:34:49,084-Speed 5962.70 samples/sec   Loss 8.9756   LearningRate 0.1628   Epoch: 8   Global Step: 88300   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:34:55,942-Speed 5974.18 samples/sec   Loss 8.9768   LearningRate 0.1628   Epoch: 8   Global Step: 88310   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:35:02,803-Speed 5971.13 samples/sec   Loss 9.0328   LearningRate 0.1628   Epoch: 8   Global Step: 88320   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:35:09,663-Speed 5972.09 samples/sec   Loss 8.9216   LearningRate 0.1627   Epoch: 8   Global Step: 88330   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:35:16,531-Speed 5965.50 samples/sec   Loss 8.9036   LearningRate 0.1627   Epoch: 8   Global Step: 88340   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:35:23,403-Speed 5961.00 samples/sec   Loss 8.9301   LearningRate 0.1627   Epoch: 8   Global Step: 88350   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:35:30,252-Speed 5981.61 samples/sec   Loss 8.9843   LearningRate 0.1627   Epoch: 8   Global Step: 88360   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:35:37,106-Speed 5977.76 samples/sec   Loss 8.9709   LearningRate 0.1626   Epoch: 8   Global Step: 88370   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:35:43,954-Speed 5982.66 samples/sec   Loss 8.9751   LearningRate 0.1626   Epoch: 8   Global Step: 88380   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:35:50,793-Speed 5990.04 samples/sec   Loss 8.9935   LearningRate 0.1626   Epoch: 8   Global Step: 88390   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:35:57,645-Speed 5981.03 samples/sec   Loss 8.9237   LearningRate 0.1626   Epoch: 8   Global Step: 88400   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:36:04,494-Speed 5981.88 samples/sec   Loss 8.9167   LearningRate 0.1625   Epoch: 8   Global Step: 88410   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:36:11,339-Speed 5984.92 samples/sec   Loss 8.8726   LearningRate 0.1625   Epoch: 8   Global Step: 88420   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:36:18,214-Speed 5960.11 samples/sec   Loss 8.8997   LearningRate 0.1625   Epoch: 8   Global Step: 88430   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:36:25,066-Speed 5979.87 samples/sec   Loss 8.9484   LearningRate 0.1624   Epoch: 8   Global Step: 88440   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:36:31,922-Speed 5975.70 samples/sec   Loss 8.9433   LearningRate 0.1624   Epoch: 8   Global Step: 88450   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:36:38,795-Speed 5960.44 samples/sec   Loss 8.9498   LearningRate 0.1624   Epoch: 8   Global Step: 88460   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:36:45,647-Speed 5981.88 samples/sec   Loss 8.9681   LearningRate 0.1624   Epoch: 8   Global Step: 88470   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:36:52,498-Speed 5979.78 samples/sec   Loss 8.9842   LearningRate 0.1623   Epoch: 8   Global Step: 88480   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:36:59,359-Speed 5973.85 samples/sec   Loss 8.9557   LearningRate 0.1623   Epoch: 8   Global Step: 88490   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:37:06,237-Speed 5956.25 samples/sec   Loss 9.0095   LearningRate 0.1623   Epoch: 8   Global Step: 88500   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:37:13,127-Speed 5946.10 samples/sec   Loss 9.0182   LearningRate 0.1623   Epoch: 8   Global Step: 88510   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:37:19,984-Speed 5974.41 samples/sec   Loss 8.9445   LearningRate 0.1622   Epoch: 8   Global Step: 88520   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:37:26,867-Speed 5952.61 samples/sec   Loss 8.8899   LearningRate 0.1622   Epoch: 8   Global Step: 88530   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:37:33,742-Speed 5958.67 samples/sec   Loss 8.9649   LearningRate 0.1622   Epoch: 8   Global Step: 88540   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 13:37:40,580-Speed 5991.76 samples/sec   Loss 8.9161   LearningRate 0.1621   Epoch: 8   Global Step: 88550   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:37:47,449-Speed 5964.58 samples/sec   Loss 9.0259   LearningRate 0.1621   Epoch: 8   Global Step: 88560   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:37:54,320-Speed 5962.53 samples/sec   Loss 9.0078   LearningRate 0.1621   Epoch: 8   Global Step: 88570   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:38:01,174-Speed 5980.59 samples/sec   Loss 8.9144   LearningRate 0.1621   Epoch: 8   Global Step: 88580   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:38:08,038-Speed 5969.11 samples/sec   Loss 8.9212   LearningRate 0.1620   Epoch: 8   Global Step: 88590   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:38:14,899-Speed 5970.82 samples/sec   Loss 8.8849   LearningRate 0.1620   Epoch: 8   Global Step: 88600   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:38:21,776-Speed 5959.19 samples/sec   Loss 8.9159   LearningRate 0.1620   Epoch: 8   Global Step: 88610   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:38:28,652-Speed 5958.43 samples/sec   Loss 8.8900   LearningRate 0.1620   Epoch: 8   Global Step: 88620   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:38:35,591-Speed 5904.17 samples/sec   Loss 8.9791   LearningRate 0.1619   Epoch: 8   Global Step: 88630   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:38:42,446-Speed 5976.01 samples/sec   Loss 8.8470   LearningRate 0.1619   Epoch: 8   Global Step: 88640   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:38:49,315-Speed 5964.32 samples/sec   Loss 8.9011   LearningRate 0.1619   Epoch: 8   Global Step: 88650   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:38:56,173-Speed 5973.77 samples/sec   Loss 8.9565   LearningRate 0.1618   Epoch: 8   Global Step: 88660   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:39:03,021-Speed 5986.66 samples/sec   Loss 8.9569   LearningRate 0.1618   Epoch: 8   Global Step: 88670   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:39:10,341-Speed 5596.19 samples/sec   Loss 8.9810   LearningRate 0.1618   Epoch: 8   Global Step: 88680   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:39:17,181-Speed 5989.31 samples/sec   Loss 8.9875   LearningRate 0.1618   Epoch: 8   Global Step: 88690   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:39:24,029-Speed 5982.88 samples/sec   Loss 8.9104   LearningRate 0.1617   Epoch: 8   Global Step: 88700   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:39:30,873-Speed 5985.87 samples/sec   Loss 8.8650   LearningRate 0.1617   Epoch: 8   Global Step: 88710   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:39:37,745-Speed 5960.55 samples/sec   Loss 8.8801   LearningRate 0.1617   Epoch: 8   Global Step: 88720   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:39:44,639-Speed 5943.47 samples/sec   Loss 8.9347   LearningRate 0.1617   Epoch: 8   Global Step: 88730   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:39:51,497-Speed 5973.58 samples/sec   Loss 8.9274   LearningRate 0.1616   Epoch: 8   Global Step: 88740   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:39:58,364-Speed 5964.82 samples/sec   Loss 8.9152   LearningRate 0.1616   Epoch: 8   Global Step: 88750   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 13:40:05,314-Speed 5895.04 samples/sec   Loss 9.0013   LearningRate 0.1616   Epoch: 8   Global Step: 88760   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 13:40:12,164-Speed 5984.81 samples/sec   Loss 8.9193   LearningRate 0.1615   Epoch: 8   Global Step: 88770   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:40:19,016-Speed 5978.35 samples/sec   Loss 8.8248   LearningRate 0.1615   Epoch: 8   Global Step: 88780   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:40:25,876-Speed 5974.37 samples/sec   Loss 8.8953   LearningRate 0.1615   Epoch: 8   Global Step: 88790   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:40:32,724-Speed 5983.19 samples/sec   Loss 8.8991   LearningRate 0.1615   Epoch: 8   Global Step: 88800   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:40:39,578-Speed 5976.86 samples/sec   Loss 8.9382   LearningRate 0.1614   Epoch: 8   Global Step: 88810   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:40:46,457-Speed 5954.85 samples/sec   Loss 8.9055   LearningRate 0.1614   Epoch: 8   Global Step: 88820   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:40:53,306-Speed 5982.32 samples/sec   Loss 8.9617   LearningRate 0.1614   Epoch: 8   Global Step: 88830   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:41:00,156-Speed 5980.64 samples/sec   Loss 8.9340   LearningRate 0.1614   Epoch: 8   Global Step: 88840   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:41:07,022-Speed 5966.45 samples/sec   Loss 8.8874   LearningRate 0.1613   Epoch: 8   Global Step: 88850   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:41:13,871-Speed 5982.81 samples/sec   Loss 8.9086   LearningRate 0.1613   Epoch: 8   Global Step: 88860   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:41:20,721-Speed 5980.03 samples/sec   Loss 8.7981   LearningRate 0.1613   Epoch: 8   Global Step: 88870   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:41:27,570-Speed 5981.90 samples/sec   Loss 8.9013   LearningRate 0.1612   Epoch: 8   Global Step: 88880   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:41:34,430-Speed 5972.36 samples/sec   Loss 8.8847   LearningRate 0.1612   Epoch: 8   Global Step: 88890   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:41:41,288-Speed 5973.86 samples/sec   Loss 8.8981   LearningRate 0.1612   Epoch: 8   Global Step: 88900   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:41:48,141-Speed 5978.23 samples/sec   Loss 8.9457   LearningRate 0.1612   Epoch: 8   Global Step: 88910   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:41:55,014-Speed 5960.32 samples/sec   Loss 8.9379   LearningRate 0.1611   Epoch: 8   Global Step: 88920   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:42:01,863-Speed 5981.86 samples/sec   Loss 8.9414   LearningRate 0.1611   Epoch: 8   Global Step: 88930   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:42:08,713-Speed 5982.57 samples/sec   Loss 8.8656   LearningRate 0.1611   Epoch: 8   Global Step: 88940   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:42:15,562-Speed 5981.57 samples/sec   Loss 8.9063   LearningRate 0.1611   Epoch: 8   Global Step: 88950   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:42:22,415-Speed 5978.48 samples/sec   Loss 8.9721   LearningRate 0.1610   Epoch: 8   Global Step: 88960   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:42:29,266-Speed 5979.95 samples/sec   Loss 8.8874   LearningRate 0.1610   Epoch: 8   Global Step: 88970   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:42:36,124-Speed 5977.22 samples/sec   Loss 8.8265   LearningRate 0.1610   Epoch: 8   Global Step: 88980   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:42:42,985-Speed 5970.06 samples/sec   Loss 8.8698   LearningRate 0.1609   Epoch: 8   Global Step: 88990   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:42:49,843-Speed 5974.21 samples/sec   Loss 8.9209   LearningRate 0.1609   Epoch: 8   Global Step: 89000   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:42:56,704-Speed 5971.39 samples/sec   Loss 8.9412   LearningRate 0.1609   Epoch: 8   Global Step: 89010   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:43:03,567-Speed 5968.73 samples/sec   Loss 8.8864   LearningRate 0.1609   Epoch: 8   Global Step: 89020   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:43:10,431-Speed 5968.07 samples/sec   Loss 8.8744   LearningRate 0.1608   Epoch: 8   Global Step: 89030   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 13:43:17,313-Speed 5955.99 samples/sec   Loss 8.9189   LearningRate 0.1608   Epoch: 8   Global Step: 89040   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 13:43:24,169-Speed 5975.04 samples/sec   Loss 8.8064   LearningRate 0.1608   Epoch: 8   Global Step: 89050   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:43:31,024-Speed 5976.44 samples/sec   Loss 8.9146   LearningRate 0.1608   Epoch: 8   Global Step: 89060   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:43:37,881-Speed 5974.83 samples/sec   Loss 8.9243   LearningRate 0.1607   Epoch: 8   Global Step: 89070   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:43:44,746-Speed 5970.13 samples/sec   Loss 8.8691   LearningRate 0.1607   Epoch: 8   Global Step: 89080   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:43:51,591-Speed 5985.39 samples/sec   Loss 8.8902   LearningRate 0.1607   Epoch: 8   Global Step: 89090   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:43:58,460-Speed 5964.55 samples/sec   Loss 8.8844   LearningRate 0.1606   Epoch: 8   Global Step: 89100   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:44:05,328-Speed 5964.77 samples/sec   Loss 8.8403   LearningRate 0.1606   Epoch: 8   Global Step: 89110   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:44:12,194-Speed 5966.49 samples/sec   Loss 8.8595   LearningRate 0.1606   Epoch: 8   Global Step: 89120   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:44:19,053-Speed 5973.24 samples/sec   Loss 8.8719   LearningRate 0.1606   Epoch: 8   Global Step: 89130   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:44:25,917-Speed 5968.42 samples/sec   Loss 8.9143   LearningRate 0.1605   Epoch: 8   Global Step: 89140   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:44:32,764-Speed 5983.46 samples/sec   Loss 8.8234   LearningRate 0.1605   Epoch: 8   Global Step: 89150   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:44:39,621-Speed 5975.36 samples/sec   Loss 8.9040   LearningRate 0.1605   Epoch: 8   Global Step: 89160   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:44:46,472-Speed 5979.10 samples/sec   Loss 8.8932   LearningRate 0.1605   Epoch: 8   Global Step: 89170   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:44:53,331-Speed 5973.02 samples/sec   Loss 8.8627   LearningRate 0.1604   Epoch: 8   Global Step: 89180   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:45:00,212-Speed 5953.64 samples/sec   Loss 8.8891   LearningRate 0.1604   Epoch: 8   Global Step: 89190   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:45:07,075-Speed 5968.79 samples/sec   Loss 8.8625   LearningRate 0.1604   Epoch: 8   Global Step: 89200   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:45:13,942-Speed 5966.30 samples/sec   Loss 8.7971   LearningRate 0.1603   Epoch: 8   Global Step: 89210   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:45:21,465-Speed 5446.01 samples/sec   Loss 8.8313   LearningRate 0.1603   Epoch: 8   Global Step: 89220   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:45:28,329-Speed 5968.28 samples/sec   Loss 8.8647   LearningRate 0.1603   Epoch: 8   Global Step: 89230   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:45:35,186-Speed 5974.60 samples/sec   Loss 8.9280   LearningRate 0.1603   Epoch: 8   Global Step: 89240   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:45:42,042-Speed 5975.56 samples/sec   Loss 8.7865   LearningRate 0.1602   Epoch: 8   Global Step: 89250   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:45:48,894-Speed 5978.52 samples/sec   Loss 8.9154   LearningRate 0.1602   Epoch: 8   Global Step: 89260   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:45:55,834-Speed 5903.71 samples/sec   Loss 8.9396   LearningRate 0.1602   Epoch: 8   Global Step: 89270   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:46:02,710-Speed 5958.58 samples/sec   Loss 8.8531   LearningRate 0.1602   Epoch: 8   Global Step: 89280   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:46:09,575-Speed 5969.76 samples/sec   Loss 8.8998   LearningRate 0.1601   Epoch: 8   Global Step: 89290   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:46:16,442-Speed 5965.43 samples/sec   Loss 8.8553   LearningRate 0.1601   Epoch: 8   Global Step: 89300   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:46:23,309-Speed 5965.91 samples/sec   Loss 8.8922   LearningRate 0.1601   Epoch: 8   Global Step: 89310   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:46:30,170-Speed 5971.26 samples/sec   Loss 8.8818   LearningRate 0.1600   Epoch: 8   Global Step: 89320   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:46:37,027-Speed 5974.32 samples/sec   Loss 8.9378   LearningRate 0.1600   Epoch: 8   Global Step: 89330   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:46:43,884-Speed 5974.84 samples/sec   Loss 8.9489   LearningRate 0.1600   Epoch: 8   Global Step: 89340   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:46:50,733-Speed 5981.70 samples/sec   Loss 8.8868   LearningRate 0.1600   Epoch: 8   Global Step: 89350   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:46:57,593-Speed 5972.26 samples/sec   Loss 8.9523   LearningRate 0.1599   Epoch: 8   Global Step: 89360   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:47:04,447-Speed 5978.79 samples/sec   Loss 8.9176   LearningRate 0.1599   Epoch: 8   Global Step: 89370   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:47:11,334-Speed 5948.39 samples/sec   Loss 8.8580   LearningRate 0.1599   Epoch: 8   Global Step: 89380   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:47:18,272-Speed 5904.39 samples/sec   Loss 8.8826   LearningRate 0.1599   Epoch: 8   Global Step: 89390   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:47:25,210-Speed 5906.23 samples/sec   Loss 8.8834   LearningRate 0.1598   Epoch: 8   Global Step: 89400   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:47:32,091-Speed 5954.08 samples/sec   Loss 8.8119   LearningRate 0.1598   Epoch: 8   Global Step: 89410   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:47:38,948-Speed 5975.17 samples/sec   Loss 8.8741   LearningRate 0.1598   Epoch: 8   Global Step: 89420   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:47:45,799-Speed 5979.50 samples/sec   Loss 8.8371   LearningRate 0.1597   Epoch: 8   Global Step: 89430   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:47:52,649-Speed 5980.58 samples/sec   Loss 8.8240   LearningRate 0.1597   Epoch: 8   Global Step: 89440   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:47:59,493-Speed 5987.84 samples/sec   Loss 8.9122   LearningRate 0.1597   Epoch: 8   Global Step: 89450   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:48:06,370-Speed 5957.18 samples/sec   Loss 8.8995   LearningRate 0.1597   Epoch: 8   Global Step: 89460   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:48:13,222-Speed 5978.93 samples/sec   Loss 8.7924   LearningRate 0.1596   Epoch: 8   Global Step: 89470   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:48:20,077-Speed 5976.82 samples/sec   Loss 8.8372   LearningRate 0.1596   Epoch: 8   Global Step: 89480   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:48:26,959-Speed 5953.35 samples/sec   Loss 8.8484   LearningRate 0.1596   Epoch: 8   Global Step: 89490   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:48:33,808-Speed 5981.29 samples/sec   Loss 8.8039   LearningRate 0.1596   Epoch: 8   Global Step: 89500   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:48:40,678-Speed 5963.19 samples/sec   Loss 8.9483   LearningRate 0.1595   Epoch: 8   Global Step: 89510   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:48:47,523-Speed 5984.67 samples/sec   Loss 8.8351   LearningRate 0.1595   Epoch: 8   Global Step: 89520   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:48:54,386-Speed 5969.93 samples/sec   Loss 8.8409   LearningRate 0.1595   Epoch: 8   Global Step: 89530   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:49:01,239-Speed 5978.10 samples/sec   Loss 8.8836   LearningRate 0.1595   Epoch: 8   Global Step: 89540   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:49:08,106-Speed 5966.22 samples/sec   Loss 8.9540   LearningRate 0.1594   Epoch: 8   Global Step: 89550   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:49:14,984-Speed 5956.13 samples/sec   Loss 8.8603   LearningRate 0.1594   Epoch: 8   Global Step: 89560   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:49:21,847-Speed 5972.11 samples/sec   Loss 8.8894   LearningRate 0.1594   Epoch: 8   Global Step: 89570   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:49:28,735-Speed 5947.02 samples/sec   Loss 8.8228   LearningRate 0.1593   Epoch: 8   Global Step: 89580   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:49:35,671-Speed 5906.79 samples/sec   Loss 8.7977   LearningRate 0.1593   Epoch: 8   Global Step: 89590   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:49:42,563-Speed 5945.08 samples/sec   Loss 8.8426   LearningRate 0.1593   Epoch: 8   Global Step: 89600   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:49:49,418-Speed 5976.27 samples/sec   Loss 8.8273   LearningRate 0.1593   Epoch: 8   Global Step: 89610   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:49:56,278-Speed 5971.31 samples/sec   Loss 8.8686   LearningRate 0.1592   Epoch: 8   Global Step: 89620   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:50:03,127-Speed 5981.91 samples/sec   Loss 8.8714   LearningRate 0.1592   Epoch: 8   Global Step: 89630   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:50:09,992-Speed 5967.96 samples/sec   Loss 8.9098   LearningRate 0.1592   Epoch: 8   Global Step: 89640   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:50:16,958-Speed 5881.08 samples/sec   Loss 8.8855   LearningRate 0.1592   Epoch: 8   Global Step: 89650   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:50:23,811-Speed 5978.29 samples/sec   Loss 8.9023   LearningRate 0.1591   Epoch: 8   Global Step: 89660   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:50:30,661-Speed 5980.28 samples/sec   Loss 8.8127   LearningRate 0.1591   Epoch: 8   Global Step: 89670   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:50:37,523-Speed 5970.20 samples/sec   Loss 8.8148   LearningRate 0.1591   Epoch: 8   Global Step: 89680   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:50:44,376-Speed 5979.04 samples/sec   Loss 8.8867   LearningRate 0.1590   Epoch: 8   Global Step: 89690   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:50:51,244-Speed 5966.10 samples/sec   Loss 8.8274   LearningRate 0.1590   Epoch: 8   Global Step: 89700   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:50:58,101-Speed 5973.84 samples/sec   Loss 8.8386   LearningRate 0.1590   Epoch: 8   Global Step: 89710   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:51:04,951-Speed 5981.23 samples/sec   Loss 8.9232   LearningRate 0.1590   Epoch: 8   Global Step: 89720   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:51:11,797-Speed 5983.91 samples/sec   Loss 8.8387   LearningRate 0.1589   Epoch: 8   Global Step: 89730   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:51:18,666-Speed 5964.29 samples/sec   Loss 8.8294   LearningRate 0.1589   Epoch: 8   Global Step: 89740   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:51:25,512-Speed 5984.42 samples/sec   Loss 8.8622   LearningRate 0.1589   Epoch: 8   Global Step: 89750   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:51:32,373-Speed 5972.46 samples/sec   Loss 8.8419   LearningRate 0.1589   Epoch: 8   Global Step: 89760   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:51:39,243-Speed 5965.92 samples/sec   Loss 8.8720   LearningRate 0.1588   Epoch: 8   Global Step: 89770   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:51:46,109-Speed 5967.10 samples/sec   Loss 8.8005   LearningRate 0.1588   Epoch: 8   Global Step: 89780   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:51:52,963-Speed 5977.29 samples/sec   Loss 8.8235   LearningRate 0.1588   Epoch: 8   Global Step: 89790   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:51:59,848-Speed 5949.72 samples/sec   Loss 8.8342   LearningRate 0.1587   Epoch: 8   Global Step: 89800   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:52:06,714-Speed 5967.48 samples/sec   Loss 8.8656   LearningRate 0.1587   Epoch: 8   Global Step: 89810   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:52:13,565-Speed 5979.63 samples/sec   Loss 8.8688   LearningRate 0.1587   Epoch: 8   Global Step: 89820   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:52:20,418-Speed 5978.08 samples/sec   Loss 8.8897   LearningRate 0.1587   Epoch: 8   Global Step: 89830   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:52:27,288-Speed 5964.06 samples/sec   Loss 8.8955   LearningRate 0.1586   Epoch: 8   Global Step: 89840   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:52:34,155-Speed 5966.28 samples/sec   Loss 8.8578   LearningRate 0.1586   Epoch: 8   Global Step: 89850   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:52:41,005-Speed 5979.99 samples/sec   Loss 8.8489   LearningRate 0.1586   Epoch: 8   Global Step: 89860   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:52:47,873-Speed 5965.23 samples/sec   Loss 8.8473   LearningRate 0.1586   Epoch: 8   Global Step: 89870   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:52:54,748-Speed 5961.94 samples/sec   Loss 8.8437   LearningRate 0.1585   Epoch: 8   Global Step: 89880   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:53:01,603-Speed 5975.80 samples/sec   Loss 8.8072   LearningRate 0.1585   Epoch: 8   Global Step: 89890   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:53:08,496-Speed 5943.23 samples/sec   Loss 8.7818   LearningRate 0.1585   Epoch: 8   Global Step: 89900   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:53:15,388-Speed 5946.32 samples/sec   Loss 8.8597   LearningRate 0.1585   Epoch: 8   Global Step: 89910   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:53:22,274-Speed 5949.40 samples/sec   Loss 8.7564   LearningRate 0.1584   Epoch: 8   Global Step: 89920   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:53:29,200-Speed 5915.28 samples/sec   Loss 8.8522   LearningRate 0.1584   Epoch: 8   Global Step: 89930   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:53:36,064-Speed 5969.89 samples/sec   Loss 8.8964   LearningRate 0.1584   Epoch: 8   Global Step: 89940   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:53:42,916-Speed 5979.66 samples/sec   Loss 8.8028   LearningRate 0.1583   Epoch: 8   Global Step: 89950   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:53:49,782-Speed 5966.33 samples/sec   Loss 8.8560   LearningRate 0.1583   Epoch: 8   Global Step: 89960   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:53:56,632-Speed 5983.87 samples/sec   Loss 8.8233   LearningRate 0.1583   Epoch: 8   Global Step: 89970   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:54:03,478-Speed 5983.63 samples/sec   Loss 8.8749   LearningRate 0.1583   Epoch: 8   Global Step: 89980   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:54:10,390-Speed 5927.36 samples/sec   Loss 8.7826   LearningRate 0.1582   Epoch: 8   Global Step: 89990   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:54:17,238-Speed 5982.02 samples/sec   Loss 8.7688   LearningRate 0.1582   Epoch: 8   Global Step: 90000   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:54:44,267-[lfw][90000]XNorm: 24.866661
Training: 2022-01-08 13:54:44,268-[lfw][90000]Accuracy-Flip: 0.99733+-0.00291
Training: 2022-01-08 13:54:44,269-[lfw][90000]Accuracy-Highest: 0.99750
Training: 2022-01-08 13:55:15,331-[cfp_fp][90000]XNorm: 21.526887
Training: 2022-01-08 13:55:15,332-[cfp_fp][90000]Accuracy-Flip: 0.97757+-0.01110
Training: 2022-01-08 13:55:15,333-[cfp_fp][90000]Accuracy-Highest: 0.98114
Training: 2022-01-08 13:55:42,142-[agedb_30][90000]XNorm: 23.877042
Training: 2022-01-08 13:55:42,143-[agedb_30][90000]Accuracy-Flip: 0.96650+-0.00848
Training: 2022-01-08 13:55:42,144-[agedb_30][90000]Accuracy-Highest: 0.96883
Training: 2022-01-08 13:55:49,002-Speed 446.37 samples/sec   Loss 8.8322   LearningRate 0.1582   Epoch: 8   Global Step: 90010   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:55:55,843-Speed 5988.87 samples/sec   Loss 8.8285   LearningRate 0.1582   Epoch: 8   Global Step: 90020   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:56:02,697-Speed 5979.60 samples/sec   Loss 8.8735   LearningRate 0.1581   Epoch: 8   Global Step: 90030   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:56:09,570-Speed 5960.52 samples/sec   Loss 8.8437   LearningRate 0.1581   Epoch: 8   Global Step: 90040   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:56:16,428-Speed 5973.69 samples/sec   Loss 8.7870   LearningRate 0.1581   Epoch: 8   Global Step: 90050   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:56:23,348-Speed 5920.78 samples/sec   Loss 8.8299   LearningRate 0.1580   Epoch: 8   Global Step: 90060   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:56:30,241-Speed 5943.83 samples/sec   Loss 8.8102   LearningRate 0.1580   Epoch: 8   Global Step: 90070   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:56:37,134-Speed 5942.92 samples/sec   Loss 8.7856   LearningRate 0.1580   Epoch: 8   Global Step: 90080   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:56:44,028-Speed 5943.08 samples/sec   Loss 8.8004   LearningRate 0.1580   Epoch: 8   Global Step: 90090   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:56:50,907-Speed 5954.75 samples/sec   Loss 8.7866   LearningRate 0.1579   Epoch: 8   Global Step: 90100   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:56:57,791-Speed 5951.05 samples/sec   Loss 8.8302   LearningRate 0.1579   Epoch: 8   Global Step: 90110   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:57:04,663-Speed 5963.18 samples/sec   Loss 8.8127   LearningRate 0.1579   Epoch: 8   Global Step: 90120   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:57:11,527-Speed 5968.29 samples/sec   Loss 8.7702   LearningRate 0.1579   Epoch: 8   Global Step: 90130   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:57:18,533-Speed 5847.38 samples/sec   Loss 8.7165   LearningRate 0.1578   Epoch: 8   Global Step: 90140   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:57:25,374-Speed 5989.64 samples/sec   Loss 8.8046   LearningRate 0.1578   Epoch: 8   Global Step: 90150   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:57:32,311-Speed 5905.60 samples/sec   Loss 8.8150   LearningRate 0.1578   Epoch: 8   Global Step: 90160   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:57:39,267-Speed 5889.98 samples/sec   Loss 8.8387   LearningRate 0.1578   Epoch: 8   Global Step: 90170   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:57:46,262-Speed 5857.36 samples/sec   Loss 8.7182   LearningRate 0.1577   Epoch: 8   Global Step: 90180   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:57:53,192-Speed 5911.62 samples/sec   Loss 8.8097   LearningRate 0.1577   Epoch: 8   Global Step: 90190   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:58:00,129-Speed 5905.51 samples/sec   Loss 8.7723   LearningRate 0.1577   Epoch: 8   Global Step: 90200   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:58:06,979-Speed 5980.76 samples/sec   Loss 8.7812   LearningRate 0.1576   Epoch: 8   Global Step: 90210   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:58:13,838-Speed 5973.28 samples/sec   Loss 8.7902   LearningRate 0.1576   Epoch: 8   Global Step: 90220   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:58:20,715-Speed 5957.13 samples/sec   Loss 8.8311   LearningRate 0.1576   Epoch: 8   Global Step: 90230   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:58:27,584-Speed 5965.00 samples/sec   Loss 8.7915   LearningRate 0.1576   Epoch: 8   Global Step: 90240   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 13:58:34,470-Speed 5949.64 samples/sec   Loss 8.8638   LearningRate 0.1575   Epoch: 8   Global Step: 90250   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:58:41,379-Speed 5929.70 samples/sec   Loss 8.8211   LearningRate 0.1575   Epoch: 8   Global Step: 90260   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:58:48,238-Speed 5972.90 samples/sec   Loss 8.7821   LearningRate 0.1575   Epoch: 8   Global Step: 90270   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:58:55,109-Speed 5962.95 samples/sec   Loss 8.7424   LearningRate 0.1575   Epoch: 8   Global Step: 90280   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:59:01,983-Speed 5959.72 samples/sec   Loss 8.7903   LearningRate 0.1574   Epoch: 8   Global Step: 90290   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:59:08,833-Speed 5980.76 samples/sec   Loss 8.7778   LearningRate 0.1574   Epoch: 8   Global Step: 90300   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:59:15,696-Speed 5969.15 samples/sec   Loss 8.8553   LearningRate 0.1574   Epoch: 8   Global Step: 90310   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:59:22,552-Speed 5975.59 samples/sec   Loss 8.8228   LearningRate 0.1573   Epoch: 8   Global Step: 90320   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:59:29,405-Speed 5978.66 samples/sec   Loss 8.8579   LearningRate 0.1573   Epoch: 8   Global Step: 90330   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:59:36,275-Speed 5963.35 samples/sec   Loss 8.8069   LearningRate 0.1573   Epoch: 8   Global Step: 90340   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 13:59:43,138-Speed 5969.50 samples/sec   Loss 8.8342   LearningRate 0.1573   Epoch: 8   Global Step: 90350   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:59:49,999-Speed 5970.95 samples/sec   Loss 8.7974   LearningRate 0.1572   Epoch: 8   Global Step: 90360   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 13:59:56,864-Speed 5967.62 samples/sec   Loss 8.8307   LearningRate 0.1572   Epoch: 8   Global Step: 90370   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:00:03,810-Speed 5897.87 samples/sec   Loss 8.7767   LearningRate 0.1572   Epoch: 8   Global Step: 90380   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:00:10,663-Speed 5977.67 samples/sec   Loss 8.8424   LearningRate 0.1572   Epoch: 8   Global Step: 90390   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:00:17,518-Speed 5976.72 samples/sec   Loss 8.7851   LearningRate 0.1571   Epoch: 8   Global Step: 90400   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:00:24,380-Speed 5969.42 samples/sec   Loss 8.7581   LearningRate 0.1571   Epoch: 8   Global Step: 90410   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:00:31,233-Speed 5978.07 samples/sec   Loss 8.8125   LearningRate 0.1571   Epoch: 8   Global Step: 90420   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:00:38,113-Speed 5955.12 samples/sec   Loss 8.7776   LearningRate 0.1571   Epoch: 8   Global Step: 90430   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:00:44,968-Speed 5976.37 samples/sec   Loss 8.8421   LearningRate 0.1570   Epoch: 8   Global Step: 90440   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:00:51,870-Speed 5935.60 samples/sec   Loss 8.7963   LearningRate 0.1570   Epoch: 8   Global Step: 90450   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:00:58,736-Speed 5967.17 samples/sec   Loss 8.7915   LearningRate 0.1570   Epoch: 8   Global Step: 90460   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:01:05,595-Speed 5972.78 samples/sec   Loss 8.7540   LearningRate 0.1569   Epoch: 8   Global Step: 90470   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:01:12,455-Speed 5972.10 samples/sec   Loss 8.6955   LearningRate 0.1569   Epoch: 8   Global Step: 90480   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:01:19,320-Speed 5967.12 samples/sec   Loss 8.7677   LearningRate 0.1569   Epoch: 8   Global Step: 90490   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:01:26,175-Speed 5975.95 samples/sec   Loss 8.8275   LearningRate 0.1569   Epoch: 8   Global Step: 90500   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:01:35,735-Speed 5976.44 samples/sec   Loss 8.8126   LearningRate 0.1568   Epoch: 8   Global Step: 90510   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:01:42,583-Speed 5981.77 samples/sec   Loss 8.8110   LearningRate 0.1568   Epoch: 8   Global Step: 90520   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:01:49,449-Speed 5967.51 samples/sec   Loss 8.8310   LearningRate 0.1568   Epoch: 8   Global Step: 90530   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:01:56,331-Speed 5952.64 samples/sec   Loss 8.7948   LearningRate 0.1568   Epoch: 8   Global Step: 90540   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:02:03,228-Speed 5940.28 samples/sec   Loss 8.8155   LearningRate 0.1567   Epoch: 8   Global Step: 90550   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:02:10,089-Speed 5971.72 samples/sec   Loss 8.7384   LearningRate 0.1567   Epoch: 8   Global Step: 90560   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:02:16,982-Speed 5943.71 samples/sec   Loss 8.8435   LearningRate 0.1567   Epoch: 8   Global Step: 90570   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:02:23,855-Speed 5960.36 samples/sec   Loss 8.8408   LearningRate 0.1566   Epoch: 8   Global Step: 90580   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:02:30,740-Speed 5950.59 samples/sec   Loss 8.7715   LearningRate 0.1566   Epoch: 8   Global Step: 90590   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:02:37,642-Speed 5937.15 samples/sec   Loss 8.8306   LearningRate 0.1566   Epoch: 8   Global Step: 90600   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:02:44,505-Speed 5969.18 samples/sec   Loss 8.8359   LearningRate 0.1566   Epoch: 8   Global Step: 90610   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:02:51,373-Speed 5965.45 samples/sec   Loss 8.7444   LearningRate 0.1565   Epoch: 8   Global Step: 90620   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:02:58,254-Speed 5953.51 samples/sec   Loss 8.8094   LearningRate 0.1565   Epoch: 8   Global Step: 90630   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:03:05,117-Speed 5969.47 samples/sec   Loss 8.7487   LearningRate 0.1565   Epoch: 8   Global Step: 90640   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:03:12,004-Speed 5948.99 samples/sec   Loss 8.7642   LearningRate 0.1565   Epoch: 8   Global Step: 90650   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:03:18,863-Speed 5972.76 samples/sec   Loss 8.8067   LearningRate 0.1564   Epoch: 8   Global Step: 90660   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:03:25,715-Speed 5978.57 samples/sec   Loss 8.8246   LearningRate 0.1564   Epoch: 8   Global Step: 90670   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:03:32,602-Speed 5949.04 samples/sec   Loss 8.7327   LearningRate 0.1564   Epoch: 8   Global Step: 90680   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:03:39,462-Speed 5971.74 samples/sec   Loss 8.7446   LearningRate 0.1564   Epoch: 8   Global Step: 90690   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:03:46,330-Speed 5965.23 samples/sec   Loss 8.7691   LearningRate 0.1563   Epoch: 8   Global Step: 90700   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:03:53,188-Speed 5973.35 samples/sec   Loss 8.7597   LearningRate 0.1563   Epoch: 8   Global Step: 90710   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:04:00,052-Speed 5968.66 samples/sec   Loss 8.7874   LearningRate 0.1563   Epoch: 8   Global Step: 90720   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:04:06,910-Speed 5973.99 samples/sec   Loss 8.7433   LearningRate 0.1562   Epoch: 8   Global Step: 90730   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:04:13,767-Speed 5974.61 samples/sec   Loss 8.7528   LearningRate 0.1562   Epoch: 8   Global Step: 90740   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:04:20,624-Speed 5974.72 samples/sec   Loss 8.8368   LearningRate 0.1562   Epoch: 8   Global Step: 90750   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:04:27,465-Speed 5988.00 samples/sec   Loss 8.7997   LearningRate 0.1562   Epoch: 8   Global Step: 90760   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:04:34,317-Speed 5981.51 samples/sec   Loss 8.7352   LearningRate 0.1561   Epoch: 8   Global Step: 90770   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:04:41,163-Speed 5983.99 samples/sec   Loss 8.7741   LearningRate 0.1561   Epoch: 8   Global Step: 90780   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:04:48,026-Speed 5968.76 samples/sec   Loss 8.7565   LearningRate 0.1561   Epoch: 8   Global Step: 90790   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:04:54,873-Speed 5983.48 samples/sec   Loss 8.7061   LearningRate 0.1561   Epoch: 8   Global Step: 90800   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:05:01,747-Speed 5963.50 samples/sec   Loss 8.7970   LearningRate 0.1560   Epoch: 8   Global Step: 90810   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:05:08,615-Speed 5964.59 samples/sec   Loss 8.7714   LearningRate 0.1560   Epoch: 8   Global Step: 90820   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:05:15,481-Speed 5966.81 samples/sec   Loss 8.7875   LearningRate 0.1560   Epoch: 8   Global Step: 90830   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:05:22,344-Speed 5970.35 samples/sec   Loss 8.7616   LearningRate 0.1560   Epoch: 8   Global Step: 90840   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:05:29,194-Speed 5980.53 samples/sec   Loss 8.7711   LearningRate 0.1559   Epoch: 8   Global Step: 90850   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:05:36,058-Speed 5968.40 samples/sec   Loss 8.7871   LearningRate 0.1559   Epoch: 8   Global Step: 90860   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:05:42,925-Speed 5968.60 samples/sec   Loss 8.7453   LearningRate 0.1559   Epoch: 8   Global Step: 90870   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:05:49,776-Speed 5979.73 samples/sec   Loss 8.7420   LearningRate 0.1558   Epoch: 8   Global Step: 90880   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:05:56,628-Speed 5978.62 samples/sec   Loss 8.7220   LearningRate 0.1558   Epoch: 8   Global Step: 90890   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:06:03,476-Speed 5983.28 samples/sec   Loss 8.7672   LearningRate 0.1558   Epoch: 8   Global Step: 90900   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:06:10,342-Speed 5965.92 samples/sec   Loss 8.7720   LearningRate 0.1558   Epoch: 8   Global Step: 90910   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:06:17,219-Speed 5959.86 samples/sec   Loss 8.7571   LearningRate 0.1557   Epoch: 8   Global Step: 90920   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:06:24,086-Speed 5967.79 samples/sec   Loss 8.8040   LearningRate 0.1557   Epoch: 8   Global Step: 90930   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:06:30,971-Speed 5950.48 samples/sec   Loss 8.7024   LearningRate 0.1557   Epoch: 8   Global Step: 90940   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:06:37,831-Speed 5971.84 samples/sec   Loss 8.8043   LearningRate 0.1557   Epoch: 8   Global Step: 90950   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:06:44,682-Speed 5980.27 samples/sec   Loss 8.7667   LearningRate 0.1556   Epoch: 8   Global Step: 90960   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:06:51,524-Speed 5986.59 samples/sec   Loss 8.6794   LearningRate 0.1556   Epoch: 8   Global Step: 90970   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:06:58,374-Speed 5981.57 samples/sec   Loss 8.7875   LearningRate 0.1556   Epoch: 8   Global Step: 90980   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:07:05,237-Speed 5969.21 samples/sec   Loss 8.8027   LearningRate 0.1556   Epoch: 8   Global Step: 90990   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:07:12,100-Speed 5969.16 samples/sec   Loss 8.6854   LearningRate 0.1555   Epoch: 8   Global Step: 91000   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:07:18,961-Speed 5971.56 samples/sec   Loss 8.8077   LearningRate 0.1555   Epoch: 8   Global Step: 91010   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:07:25,839-Speed 5956.55 samples/sec   Loss 8.7699   LearningRate 0.1555   Epoch: 8   Global Step: 91020   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:07:32,707-Speed 5965.29 samples/sec   Loss 8.7667   LearningRate 0.1554   Epoch: 8   Global Step: 91030   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:07:39,593-Speed 5952.04 samples/sec   Loss 8.8465   LearningRate 0.1554   Epoch: 8   Global Step: 91040   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:07:46,454-Speed 5973.79 samples/sec   Loss 8.7529   LearningRate 0.1554   Epoch: 8   Global Step: 91050   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:07:53,308-Speed 5976.45 samples/sec   Loss 8.7619   LearningRate 0.1554   Epoch: 8   Global Step: 91060   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:08:00,169-Speed 5973.57 samples/sec   Loss 8.7746   LearningRate 0.1553   Epoch: 8   Global Step: 91070   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:08:07,035-Speed 5967.04 samples/sec   Loss 8.8075   LearningRate 0.1553   Epoch: 8   Global Step: 91080   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:08:13,890-Speed 5976.44 samples/sec   Loss 8.7435   LearningRate 0.1553   Epoch: 8   Global Step: 91090   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:08:20,770-Speed 5954.81 samples/sec   Loss 8.6939   LearningRate 0.1553   Epoch: 8   Global Step: 91100   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:08:27,636-Speed 5966.06 samples/sec   Loss 8.7221   LearningRate 0.1552   Epoch: 8   Global Step: 91110   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:08:34,490-Speed 5978.08 samples/sec   Loss 8.7192   LearningRate 0.1552   Epoch: 8   Global Step: 91120   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:08:41,371-Speed 5952.89 samples/sec   Loss 8.7490   LearningRate 0.1552   Epoch: 8   Global Step: 91130   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:08:48,243-Speed 5961.73 samples/sec   Loss 8.7753   LearningRate 0.1552   Epoch: 8   Global Step: 91140   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:08:55,092-Speed 5981.53 samples/sec   Loss 8.7402   LearningRate 0.1551   Epoch: 8   Global Step: 91150   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:09:01,953-Speed 5972.06 samples/sec   Loss 8.7855   LearningRate 0.1551   Epoch: 8   Global Step: 91160   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:09:08,785-Speed 5996.62 samples/sec   Loss 8.7488   LearningRate 0.1551   Epoch: 8   Global Step: 91170   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 14:09:15,685-Speed 5937.30 samples/sec   Loss 8.8082   LearningRate 0.1550   Epoch: 8   Global Step: 91180   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 14:09:22,544-Speed 5972.50 samples/sec   Loss 8.7820   LearningRate 0.1550   Epoch: 8   Global Step: 91190   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 14:09:29,405-Speed 5970.92 samples/sec   Loss 8.6945   LearningRate 0.1550   Epoch: 8   Global Step: 91200   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 14:09:36,251-Speed 5984.82 samples/sec   Loss 8.7543   LearningRate 0.1550   Epoch: 8   Global Step: 91210   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 14:09:43,099-Speed 5981.83 samples/sec   Loss 8.7611   LearningRate 0.1549   Epoch: 8   Global Step: 91220   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 14:09:49,980-Speed 5953.17 samples/sec   Loss 8.7171   LearningRate 0.1549   Epoch: 8   Global Step: 91230   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 14:09:56,832-Speed 5979.21 samples/sec   Loss 8.6918   LearningRate 0.1549   Epoch: 8   Global Step: 91240   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 14:10:03,679-Speed 5983.83 samples/sec   Loss 8.7371   LearningRate 0.1549   Epoch: 8   Global Step: 91250   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 14:10:10,536-Speed 5974.80 samples/sec   Loss 8.7278   LearningRate 0.1548   Epoch: 8   Global Step: 91260   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 14:10:17,398-Speed 5970.34 samples/sec   Loss 8.6512   LearningRate 0.1548   Epoch: 8   Global Step: 91270   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:10:24,244-Speed 5984.43 samples/sec   Loss 8.7994   LearningRate 0.1548   Epoch: 8   Global Step: 91280   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:10:31,087-Speed 5986.99 samples/sec   Loss 8.7026   LearningRate 0.1548   Epoch: 8   Global Step: 91290   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:10:37,952-Speed 5967.93 samples/sec   Loss 8.7431   LearningRate 0.1547   Epoch: 8   Global Step: 91300   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:10:44,798-Speed 5984.09 samples/sec   Loss 8.7323   LearningRate 0.1547   Epoch: 8   Global Step: 91310   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:10:51,645-Speed 5985.91 samples/sec   Loss 8.7379   LearningRate 0.1547   Epoch: 8   Global Step: 91320   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:10:58,502-Speed 5973.96 samples/sec   Loss 8.7978   LearningRate 0.1546   Epoch: 8   Global Step: 91330   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:11:05,371-Speed 5964.54 samples/sec   Loss 8.7099   LearningRate 0.1546   Epoch: 8   Global Step: 91340   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:11:12,219-Speed 5982.80 samples/sec   Loss 8.7564   LearningRate 0.1546   Epoch: 8   Global Step: 91350   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:11:19,081-Speed 5970.10 samples/sec   Loss 8.7462   LearningRate 0.1546   Epoch: 8   Global Step: 91360   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:11:25,950-Speed 5964.12 samples/sec   Loss 8.7125   LearningRate 0.1545   Epoch: 8   Global Step: 91370   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:11:32,828-Speed 5956.93 samples/sec   Loss 8.7315   LearningRate 0.1545   Epoch: 8   Global Step: 91380   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:11:39,682-Speed 5976.46 samples/sec   Loss 8.7329   LearningRate 0.1545   Epoch: 8   Global Step: 91390   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:11:46,544-Speed 5970.33 samples/sec   Loss 8.6834   LearningRate 0.1545   Epoch: 8   Global Step: 91400   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:11:53,393-Speed 5983.40 samples/sec   Loss 8.6628   LearningRate 0.1544   Epoch: 8   Global Step: 91410   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:12:00,250-Speed 5974.39 samples/sec   Loss 8.7249   LearningRate 0.1544   Epoch: 8   Global Step: 91420   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:12:07,131-Speed 5953.81 samples/sec   Loss 8.7299   LearningRate 0.1544   Epoch: 8   Global Step: 91430   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:12:14,000-Speed 5964.74 samples/sec   Loss 8.6337   LearningRate 0.1544   Epoch: 8   Global Step: 91440   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:12:20,867-Speed 5965.26 samples/sec   Loss 8.7234   LearningRate 0.1543   Epoch: 8   Global Step: 91450   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:12:27,737-Speed 5964.16 samples/sec   Loss 8.6975   LearningRate 0.1543   Epoch: 8   Global Step: 91460   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:12:34,596-Speed 5972.53 samples/sec   Loss 8.6844   LearningRate 0.1543   Epoch: 8   Global Step: 91470   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:12:41,454-Speed 5973.95 samples/sec   Loss 8.7350   LearningRate 0.1542   Epoch: 8   Global Step: 91480   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:12:48,304-Speed 5980.56 samples/sec   Loss 8.6877   LearningRate 0.1542   Epoch: 8   Global Step: 91490   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:12:55,150-Speed 5987.58 samples/sec   Loss 8.7238   LearningRate 0.1542   Epoch: 8   Global Step: 91500   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:13:02,021-Speed 5962.57 samples/sec   Loss 8.7496   LearningRate 0.1542   Epoch: 8   Global Step: 91510   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:13:08,888-Speed 5967.57 samples/sec   Loss 8.7354   LearningRate 0.1541   Epoch: 8   Global Step: 91520   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:13:15,750-Speed 5970.38 samples/sec   Loss 8.7150   LearningRate 0.1541   Epoch: 8   Global Step: 91530   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:13:22,608-Speed 5973.43 samples/sec   Loss 8.7362   LearningRate 0.1541   Epoch: 8   Global Step: 91540   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:13:29,475-Speed 5965.80 samples/sec   Loss 8.7260   LearningRate 0.1541   Epoch: 8   Global Step: 91550   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:13:36,334-Speed 5973.88 samples/sec   Loss 8.7559   LearningRate 0.1540   Epoch: 8   Global Step: 91560   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:13:43,187-Speed 5977.89 samples/sec   Loss 8.8022   LearningRate 0.1540   Epoch: 8   Global Step: 91570   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:13:50,052-Speed 5967.52 samples/sec   Loss 8.7596   LearningRate 0.1540   Epoch: 8   Global Step: 91580   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:13:56,924-Speed 5961.28 samples/sec   Loss 8.6740   LearningRate 0.1540   Epoch: 8   Global Step: 91590   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:14:03,783-Speed 5973.15 samples/sec   Loss 8.6914   LearningRate 0.1539   Epoch: 8   Global Step: 91600   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:14:10,648-Speed 5969.32 samples/sec   Loss 8.6591   LearningRate 0.1539   Epoch: 8   Global Step: 91610   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:14:17,519-Speed 5962.74 samples/sec   Loss 8.6839   LearningRate 0.1539   Epoch: 8   Global Step: 91620   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:14:24,384-Speed 5967.28 samples/sec   Loss 8.7239   LearningRate 0.1538   Epoch: 8   Global Step: 91630   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:14:31,257-Speed 5960.34 samples/sec   Loss 8.6736   LearningRate 0.1538   Epoch: 8   Global Step: 91640   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:14:38,121-Speed 5972.25 samples/sec   Loss 8.6788   LearningRate 0.1538   Epoch: 8   Global Step: 91650   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:14:44,965-Speed 5985.37 samples/sec   Loss 8.6665   LearningRate 0.1538   Epoch: 8   Global Step: 91660   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:14:51,881-Speed 5923.12 samples/sec   Loss 8.7218   LearningRate 0.1537   Epoch: 8   Global Step: 91670   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:14:58,726-Speed 5984.77 samples/sec   Loss 8.6720   LearningRate 0.1537   Epoch: 8   Global Step: 91680   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:15:05,587-Speed 5971.41 samples/sec   Loss 8.6858   LearningRate 0.1537   Epoch: 8   Global Step: 91690   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:15:12,442-Speed 5976.37 samples/sec   Loss 8.6817   LearningRate 0.1537   Epoch: 8   Global Step: 91700   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:15:19,296-Speed 5976.90 samples/sec   Loss 8.7448   LearningRate 0.1536   Epoch: 8   Global Step: 91710   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:15:26,182-Speed 5949.19 samples/sec   Loss 8.7116   LearningRate 0.1536   Epoch: 8   Global Step: 91720   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:15:33,034-Speed 5979.76 samples/sec   Loss 8.6965   LearningRate 0.1536   Epoch: 8   Global Step: 91730   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:15:39,903-Speed 5964.36 samples/sec   Loss 8.7323   LearningRate 0.1536   Epoch: 8   Global Step: 91740   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:15:46,773-Speed 5963.16 samples/sec   Loss 8.6852   LearningRate 0.1535   Epoch: 8   Global Step: 91750   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:15:53,639-Speed 5966.74 samples/sec   Loss 8.6404   LearningRate 0.1535   Epoch: 8   Global Step: 91760   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:16:00,518-Speed 5955.97 samples/sec   Loss 8.7037   LearningRate 0.1535   Epoch: 8   Global Step: 91770   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:16:07,375-Speed 5974.22 samples/sec   Loss 8.6228   LearningRate 0.1534   Epoch: 8   Global Step: 91780   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:16:14,232-Speed 5974.26 samples/sec   Loss 8.6404   LearningRate 0.1534   Epoch: 8   Global Step: 91790   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:16:21,079-Speed 5983.90 samples/sec   Loss 8.6540   LearningRate 0.1534   Epoch: 8   Global Step: 91800   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:16:27,941-Speed 5969.70 samples/sec   Loss 8.6293   LearningRate 0.1534   Epoch: 8   Global Step: 91810   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:16:34,807-Speed 5966.52 samples/sec   Loss 8.6406   LearningRate 0.1533   Epoch: 8   Global Step: 91820   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:16:41,669-Speed 5969.94 samples/sec   Loss 8.6968   LearningRate 0.1533   Epoch: 8   Global Step: 91830   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:16:48,530-Speed 5970.97 samples/sec   Loss 8.6926   LearningRate 0.1533   Epoch: 8   Global Step: 91840   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:16:55,397-Speed 5966.32 samples/sec   Loss 8.7252   LearningRate 0.1533   Epoch: 8   Global Step: 91850   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:17:02,261-Speed 5968.15 samples/sec   Loss 8.6732   LearningRate 0.1532   Epoch: 8   Global Step: 91860   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:17:09,112-Speed 5979.53 samples/sec   Loss 8.6543   LearningRate 0.1532   Epoch: 8   Global Step: 91870   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:17:15,958-Speed 5983.96 samples/sec   Loss 8.6015   LearningRate 0.1532   Epoch: 8   Global Step: 91880   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:17:22,800-Speed 5987.91 samples/sec   Loss 8.6925   LearningRate 0.1532   Epoch: 8   Global Step: 91890   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:17:29,646-Speed 5984.00 samples/sec   Loss 8.7021   LearningRate 0.1531   Epoch: 8   Global Step: 91900   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:17:36,497-Speed 5980.11 samples/sec   Loss 8.7745   LearningRate 0.1531   Epoch: 8   Global Step: 91910   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:17:43,360-Speed 5969.10 samples/sec   Loss 8.6995   LearningRate 0.1531   Epoch: 8   Global Step: 91920   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:17:50,213-Speed 5977.70 samples/sec   Loss 8.7094   LearningRate 0.1530   Epoch: 8   Global Step: 91930   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:17:57,065-Speed 5978.48 samples/sec   Loss 8.6336   LearningRate 0.1530   Epoch: 8   Global Step: 91940   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:18:03,939-Speed 5960.02 samples/sec   Loss 8.6409   LearningRate 0.1530   Epoch: 8   Global Step: 91950   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:18:10,787-Speed 5981.75 samples/sec   Loss 8.6769   LearningRate 0.1530   Epoch: 8   Global Step: 91960   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:18:17,661-Speed 5960.42 samples/sec   Loss 8.6893   LearningRate 0.1529   Epoch: 8   Global Step: 91970   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:18:24,518-Speed 5973.95 samples/sec   Loss 8.7155   LearningRate 0.1529   Epoch: 8   Global Step: 91980   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:18:31,376-Speed 5974.35 samples/sec   Loss 8.6404   LearningRate 0.1529   Epoch: 8   Global Step: 91990   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:18:38,248-Speed 5963.26 samples/sec   Loss 8.6834   LearningRate 0.1529   Epoch: 8   Global Step: 92000   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:18:45,127-Speed 5955.47 samples/sec   Loss 8.6923   LearningRate 0.1528   Epoch: 8   Global Step: 92010   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:18:52,003-Speed 5957.81 samples/sec   Loss 8.6766   LearningRate 0.1528   Epoch: 8   Global Step: 92020   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:18:58,875-Speed 5962.02 samples/sec   Loss 8.7165   LearningRate 0.1528   Epoch: 8   Global Step: 92030   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:19:05,734-Speed 5973.82 samples/sec   Loss 8.7094   LearningRate 0.1528   Epoch: 8   Global Step: 92040   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:19:12,617-Speed 5952.18 samples/sec   Loss 8.6178   LearningRate 0.1527   Epoch: 8   Global Step: 92050   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:19:19,484-Speed 5966.03 samples/sec   Loss 8.6893   LearningRate 0.1527   Epoch: 8   Global Step: 92060   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:19:26,331-Speed 5983.15 samples/sec   Loss 8.6357   LearningRate 0.1527   Epoch: 8   Global Step: 92070   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:19:33,211-Speed 5954.80 samples/sec   Loss 8.6496   LearningRate 0.1527   Epoch: 8   Global Step: 92080   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:19:40,085-Speed 5959.94 samples/sec   Loss 8.6777   LearningRate 0.1526   Epoch: 8   Global Step: 92090   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:19:46,940-Speed 5976.00 samples/sec   Loss 8.6432   LearningRate 0.1526   Epoch: 8   Global Step: 92100   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:19:53,801-Speed 5971.44 samples/sec   Loss 8.6499   LearningRate 0.1526   Epoch: 8   Global Step: 92110   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:20:00,649-Speed 5982.43 samples/sec   Loss 8.7233   LearningRate 0.1525   Epoch: 8   Global Step: 92120   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:20:07,505-Speed 5974.78 samples/sec   Loss 8.6755   LearningRate 0.1525   Epoch: 8   Global Step: 92130   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:20:14,380-Speed 5958.93 samples/sec   Loss 8.6844   LearningRate 0.1525   Epoch: 8   Global Step: 92140   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:20:21,260-Speed 5955.44 samples/sec   Loss 8.6387   LearningRate 0.1525   Epoch: 8   Global Step: 92150   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:20:28,124-Speed 5968.89 samples/sec   Loss 8.7214   LearningRate 0.1524   Epoch: 8   Global Step: 92160   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:20:34,987-Speed 5969.25 samples/sec   Loss 8.7215   LearningRate 0.1524   Epoch: 8   Global Step: 92170   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:20:41,838-Speed 5979.25 samples/sec   Loss 8.6413   LearningRate 0.1524   Epoch: 8   Global Step: 92180   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:20:48,698-Speed 5972.73 samples/sec   Loss 8.6019   LearningRate 0.1524   Epoch: 8   Global Step: 92190   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:20:55,572-Speed 5959.18 samples/sec   Loss 8.6799   LearningRate 0.1523   Epoch: 8   Global Step: 92200   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:21:02,442-Speed 5964.04 samples/sec   Loss 8.6838   LearningRate 0.1523   Epoch: 8   Global Step: 92210   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:21:09,314-Speed 5961.90 samples/sec   Loss 8.6041   LearningRate 0.1523   Epoch: 8   Global Step: 92220   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:21:16,161-Speed 5983.55 samples/sec   Loss 8.6918   LearningRate 0.1523   Epoch: 8   Global Step: 92230   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:21:23,019-Speed 5973.92 samples/sec   Loss 8.6391   LearningRate 0.1522   Epoch: 8   Global Step: 92240   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:21:29,877-Speed 5974.16 samples/sec   Loss 8.6723   LearningRate 0.1522   Epoch: 8   Global Step: 92250   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:21:36,721-Speed 5985.33 samples/sec   Loss 8.6222   LearningRate 0.1522   Epoch: 8   Global Step: 92260   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-01-08 14:21:43,572-Speed 5979.89 samples/sec   Loss 8.7579   LearningRate 0.1521   Epoch: 8   Global Step: 92270   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:21:50,452-Speed 5954.04 samples/sec   Loss 8.6978   LearningRate 0.1521   Epoch: 8   Global Step: 92280   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:21:57,310-Speed 5974.12 samples/sec   Loss 8.5962   LearningRate 0.1521   Epoch: 8   Global Step: 92290   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:22:04,182-Speed 5961.01 samples/sec   Loss 8.6437   LearningRate 0.1521   Epoch: 8   Global Step: 92300   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:22:11,061-Speed 5955.52 samples/sec   Loss 8.6621   LearningRate 0.1520   Epoch: 8   Global Step: 92310   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:22:17,958-Speed 5940.12 samples/sec   Loss 8.6956   LearningRate 0.1520   Epoch: 8   Global Step: 92320   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:22:24,822-Speed 5968.82 samples/sec   Loss 8.6270   LearningRate 0.1520   Epoch: 8   Global Step: 92330   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:22:31,666-Speed 5985.82 samples/sec   Loss 8.6053   LearningRate 0.1520   Epoch: 8   Global Step: 92340   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:22:38,548-Speed 5952.47 samples/sec   Loss 8.6238   LearningRate 0.1519   Epoch: 8   Global Step: 92350   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:22:45,395-Speed 5984.26 samples/sec   Loss 8.6701   LearningRate 0.1519   Epoch: 8   Global Step: 92360   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:22:52,254-Speed 5973.23 samples/sec   Loss 8.6481   LearningRate 0.1519   Epoch: 8   Global Step: 92370   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:22:59,135-Speed 5953.19 samples/sec   Loss 8.6283   LearningRate 0.1519   Epoch: 8   Global Step: 92380   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:23:05,983-Speed 5982.64 samples/sec   Loss 8.6727   LearningRate 0.1518   Epoch: 8   Global Step: 92390   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:23:12,874-Speed 5945.07 samples/sec   Loss 8.6602   LearningRate 0.1518   Epoch: 8   Global Step: 92400   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:23:19,729-Speed 5976.32 samples/sec   Loss 8.6816   LearningRate 0.1518   Epoch: 8   Global Step: 92410   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:23:26,610-Speed 5953.97 samples/sec   Loss 8.6619   LearningRate 0.1518   Epoch: 8   Global Step: 92420   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:23:33,473-Speed 5971.80 samples/sec   Loss 8.6146   LearningRate 0.1517   Epoch: 8   Global Step: 92430   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:23:40,317-Speed 5984.81 samples/sec   Loss 8.5019   LearningRate 0.1517   Epoch: 8   Global Step: 92440   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:23:47,177-Speed 5973.88 samples/sec   Loss 8.6827   LearningRate 0.1517   Epoch: 8   Global Step: 92450   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:23:54,031-Speed 5977.14 samples/sec   Loss 8.6909   LearningRate 0.1516   Epoch: 8   Global Step: 92460   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:24:00,901-Speed 5963.64 samples/sec   Loss 8.6716   LearningRate 0.1516   Epoch: 8   Global Step: 92470   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:24:07,773-Speed 5961.25 samples/sec   Loss 8.6681   LearningRate 0.1516   Epoch: 8   Global Step: 92480   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:24:14,637-Speed 5972.11 samples/sec   Loss 8.6608   LearningRate 0.1516   Epoch: 8   Global Step: 92490   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-01-08 14:24:21,483-Speed 5983.95 samples/sec   Loss 8.6268   LearningRate 0.1515   Epoch: 8   Global Step: 92500   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:24:28,354-Speed 5962.66 samples/sec   Loss 8.6471   LearningRate 0.1515   Epoch: 8   Global Step: 92510   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 14:24:35,196-Speed 5987.52 samples/sec   Loss 8.6829   LearningRate 0.1515   Epoch: 8   Global Step: 92520   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 14:24:42,102-Speed 5932.49 samples/sec   Loss 8.6276   LearningRate 0.1515   Epoch: 8   Global Step: 92530   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 14:24:49,020-Speed 5922.18 samples/sec   Loss 8.7300   LearningRate 0.1514   Epoch: 8   Global Step: 92540   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 14:24:55,881-Speed 5971.06 samples/sec   Loss 8.6137   LearningRate 0.1514   Epoch: 8   Global Step: 92550   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 14:25:02,735-Speed 5977.63 samples/sec   Loss 8.6129   LearningRate 0.1514   Epoch: 8   Global Step: 92560   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 14:25:09,593-Speed 5973.68 samples/sec   Loss 8.6814   LearningRate 0.1514   Epoch: 8   Global Step: 92570   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 14:25:16,455-Speed 5970.57 samples/sec   Loss 8.6388   LearningRate 0.1513   Epoch: 8   Global Step: 92580   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 14:25:23,331-Speed 5957.57 samples/sec   Loss 8.5286   LearningRate 0.1513   Epoch: 8   Global Step: 92590   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 14:25:30,205-Speed 5960.36 samples/sec   Loss 8.5870   LearningRate 0.1513   Epoch: 8   Global Step: 92600   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-01-08 14:25:37,072-Speed 5966.93 samples/sec   Loss 8.6627   LearningRate 0.1513   Epoch: 8   Global Step: 92610   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:25:43,929-Speed 5974.55 samples/sec   Loss 8.6787   LearningRate 0.1512   Epoch: 8   Global Step: 92620   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-01-08 14:25:50,801-Speed 5961.38 samples/sec   Loss 8.6827   LearningRate 0.1512   Epoch: 8   Global Step: 92630   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:25:57,699-Speed 5939.18 samples/sec   Loss 8.6082   LearningRate 0.1512   Epoch: 8   Global Step: 92640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:26:04,629-Speed 5911.68 samples/sec   Loss 8.6374   LearningRate 0.1511   Epoch: 8   Global Step: 92650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:26:11,560-Speed 5910.49 samples/sec   Loss 8.6986   LearningRate 0.1511   Epoch: 8   Global Step: 92660   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:26:18,483-Speed 5917.50 samples/sec   Loss 8.6325   LearningRate 0.1511   Epoch: 8   Global Step: 92670   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:26:25,409-Speed 5914.55 samples/sec   Loss 8.6980   LearningRate 0.1511   Epoch: 8   Global Step: 92680   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:26:32,340-Speed 5911.39 samples/sec   Loss 8.6094   LearningRate 0.1510   Epoch: 8   Global Step: 92690   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:26:39,210-Speed 5962.85 samples/sec   Loss 8.6171   LearningRate 0.1510   Epoch: 8   Global Step: 92700   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:26:46,061-Speed 5980.01 samples/sec   Loss 8.5834   LearningRate 0.1510   Epoch: 8   Global Step: 92710   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:26:52,924-Speed 5969.45 samples/sec   Loss 8.6277   LearningRate 0.1510   Epoch: 8   Global Step: 92720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:26:59,832-Speed 5930.03 samples/sec   Loss 8.5973   LearningRate 0.1509   Epoch: 8   Global Step: 92730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:27:06,747-Speed 5924.25 samples/sec   Loss 8.6158   LearningRate 0.1509   Epoch: 8   Global Step: 92740   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:27:13,606-Speed 5973.45 samples/sec   Loss 8.6008   LearningRate 0.1509   Epoch: 8   Global Step: 92750   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:27:20,462-Speed 5975.73 samples/sec   Loss 8.6375   LearningRate 0.1509   Epoch: 8   Global Step: 92760   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:27:27,327-Speed 5967.62 samples/sec   Loss 8.5733   LearningRate 0.1508   Epoch: 8   Global Step: 92770   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:27:34,192-Speed 5967.91 samples/sec   Loss 8.6267   LearningRate 0.1508   Epoch: 8   Global Step: 92780   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:27:41,073-Speed 5953.48 samples/sec   Loss 8.6769   LearningRate 0.1508   Epoch: 8   Global Step: 92790   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:27:47,956-Speed 5951.92 samples/sec   Loss 8.6366   LearningRate 0.1508   Epoch: 8   Global Step: 92800   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:27:54,834-Speed 5956.49 samples/sec   Loss 8.6460   LearningRate 0.1507   Epoch: 8   Global Step: 92810   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 14:28:01,709-Speed 5959.71 samples/sec   Loss 8.7022   LearningRate 0.1507   Epoch: 8   Global Step: 92820   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 14:28:08,563-Speed 5976.94 samples/sec   Loss 8.6539   LearningRate 0.1507   Epoch: 8   Global Step: 92830   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:28:15,414-Speed 5980.05 samples/sec   Loss 8.6353   LearningRate 0.1506   Epoch: 8   Global Step: 92840   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:28:22,276-Speed 5970.63 samples/sec   Loss 8.6022   LearningRate 0.1506   Epoch: 8   Global Step: 92850   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:28:29,160-Speed 5950.83 samples/sec   Loss 8.5930   LearningRate 0.1506   Epoch: 8   Global Step: 92860   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:28:36,014-Speed 5977.33 samples/sec   Loss 8.6232   LearningRate 0.1506   Epoch: 8   Global Step: 92870   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:28:42,890-Speed 5958.05 samples/sec   Loss 8.6098   LearningRate 0.1505   Epoch: 8   Global Step: 92880   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:28:49,749-Speed 5973.18 samples/sec   Loss 8.5442   LearningRate 0.1505   Epoch: 8   Global Step: 92890   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:28:56,634-Speed 5951.40 samples/sec   Loss 8.6919   LearningRate 0.1505   Epoch: 8   Global Step: 92900   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:29:03,494-Speed 5972.65 samples/sec   Loss 8.5724   LearningRate 0.1505   Epoch: 8   Global Step: 92910   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:29:10,343-Speed 5981.34 samples/sec   Loss 8.6986   LearningRate 0.1504   Epoch: 8   Global Step: 92920   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:29:17,219-Speed 5958.72 samples/sec   Loss 8.5772   LearningRate 0.1504   Epoch: 8   Global Step: 92930   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 14:29:24,066-Speed 5982.47 samples/sec   Loss 8.5965   LearningRate 0.1504   Epoch: 8   Global Step: 92940   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:29:30,927-Speed 5970.74 samples/sec   Loss 8.5502   LearningRate 0.1504   Epoch: 8   Global Step: 92950   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:29:37,784-Speed 5974.20 samples/sec   Loss 8.5456   LearningRate 0.1503   Epoch: 8   Global Step: 92960   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:29:44,641-Speed 5975.21 samples/sec   Loss 8.5947   LearningRate 0.1503   Epoch: 8   Global Step: 92970   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:29:51,489-Speed 5981.74 samples/sec   Loss 8.5404   LearningRate 0.1503   Epoch: 8   Global Step: 92980   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:29:58,367-Speed 5955.95 samples/sec   Loss 8.5706   LearningRate 0.1503   Epoch: 8   Global Step: 92990   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:30:05,264-Speed 5940.91 samples/sec   Loss 8.5979   LearningRate 0.1502   Epoch: 8   Global Step: 93000   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:30:12,119-Speed 5976.01 samples/sec   Loss 8.6329   LearningRate 0.1502   Epoch: 8   Global Step: 93010   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:30:18,994-Speed 5958.99 samples/sec   Loss 8.6584   LearningRate 0.1502   Epoch: 8   Global Step: 93020   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:30:25,845-Speed 5980.50 samples/sec   Loss 8.5946   LearningRate 0.1501   Epoch: 8   Global Step: 93030   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:30:32,708-Speed 5969.16 samples/sec   Loss 8.5647   LearningRate 0.1501   Epoch: 8   Global Step: 93040   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 14:30:39,553-Speed 5985.17 samples/sec   Loss 8.6360   LearningRate 0.1501   Epoch: 8   Global Step: 93050   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:30:46,431-Speed 5956.10 samples/sec   Loss 8.6178   LearningRate 0.1501   Epoch: 8   Global Step: 93060   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:30:53,283-Speed 5979.28 samples/sec   Loss 8.6141   LearningRate 0.1500   Epoch: 8   Global Step: 93070   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:31:00,150-Speed 5966.30 samples/sec   Loss 8.5823   LearningRate 0.1500   Epoch: 8   Global Step: 93080   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:31:07,017-Speed 5965.52 samples/sec   Loss 8.6059   LearningRate 0.1500   Epoch: 8   Global Step: 93090   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:31:13,945-Speed 5913.46 samples/sec   Loss 8.6187   LearningRate 0.1500   Epoch: 8   Global Step: 93100   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:31:20,807-Speed 5970.44 samples/sec   Loss 8.6291   LearningRate 0.1499   Epoch: 8   Global Step: 93110   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:31:27,678-Speed 5963.01 samples/sec   Loss 8.6043   LearningRate 0.1499   Epoch: 8   Global Step: 93120   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:31:34,541-Speed 5969.29 samples/sec   Loss 8.6315   LearningRate 0.1499   Epoch: 8   Global Step: 93130   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:31:41,418-Speed 5957.65 samples/sec   Loss 8.6111   LearningRate 0.1499   Epoch: 8   Global Step: 93140   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:31:48,292-Speed 5962.05 samples/sec   Loss 8.6330   LearningRate 0.1498   Epoch: 8   Global Step: 93150   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:31:55,161-Speed 5964.29 samples/sec   Loss 8.5966   LearningRate 0.1498   Epoch: 8   Global Step: 93160   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:32:02,017-Speed 5975.36 samples/sec   Loss 8.5948   LearningRate 0.1498   Epoch: 8   Global Step: 93170   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:32:08,906-Speed 5946.73 samples/sec   Loss 8.5613   LearningRate 0.1498   Epoch: 8   Global Step: 93180   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:32:15,752-Speed 5983.93 samples/sec   Loss 8.6189   LearningRate 0.1497   Epoch: 8   Global Step: 93190   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:32:22,626-Speed 5960.24 samples/sec   Loss 8.6492   LearningRate 0.1497   Epoch: 8   Global Step: 93200   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:32:29,504-Speed 5956.73 samples/sec   Loss 8.6294   LearningRate 0.1497   Epoch: 8   Global Step: 93210   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:32:36,359-Speed 5976.29 samples/sec   Loss 8.5223   LearningRate 0.1496   Epoch: 8   Global Step: 93220   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:32:43,227-Speed 5964.84 samples/sec   Loss 8.6294   LearningRate 0.1496   Epoch: 8   Global Step: 93230   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:32:50,073-Speed 5983.82 samples/sec   Loss 8.6466   LearningRate 0.1496   Epoch: 8   Global Step: 93240   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:32:56,936-Speed 5968.84 samples/sec   Loss 8.5670   LearningRate 0.1496   Epoch: 8   Global Step: 93250   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:33:03,785-Speed 5981.42 samples/sec   Loss 8.5639   LearningRate 0.1495   Epoch: 8   Global Step: 93260   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:33:10,657-Speed 5961.92 samples/sec   Loss 8.5819   LearningRate 0.1495   Epoch: 8   Global Step: 93270   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:33:17,546-Speed 5946.98 samples/sec   Loss 8.6346   LearningRate 0.1495   Epoch: 8   Global Step: 93280   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:33:24,400-Speed 5977.49 samples/sec   Loss 8.6010   LearningRate 0.1495   Epoch: 8   Global Step: 93290   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:33:31,267-Speed 5965.85 samples/sec   Loss 8.6214   LearningRate 0.1494   Epoch: 8   Global Step: 93300   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:33:38,137-Speed 5963.09 samples/sec   Loss 8.5980   LearningRate 0.1494   Epoch: 8   Global Step: 93310   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:33:45,025-Speed 5949.83 samples/sec   Loss 8.6122   LearningRate 0.1494   Epoch: 8   Global Step: 93320   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:34:10,090-Speed 1634.28 samples/sec   Loss 8.6018   LearningRate 0.1494   Epoch: 9   Global Step: 93330   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:34:16,960-Speed 5964.04 samples/sec   Loss 8.6006   LearningRate 0.1493   Epoch: 9   Global Step: 93340   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:34:23,869-Speed 5929.61 samples/sec   Loss 8.6021   LearningRate 0.1493   Epoch: 9   Global Step: 93350   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 14:34:30,786-Speed 5923.34 samples/sec   Loss 8.5335   LearningRate 0.1493   Epoch: 9   Global Step: 93360   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 14:34:37,681-Speed 5942.55 samples/sec   Loss 8.5867   LearningRate 0.1493   Epoch: 9   Global Step: 93370   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:34:44,584-Speed 5934.45 samples/sec   Loss 8.5694   LearningRate 0.1492   Epoch: 9   Global Step: 93380   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:34:51,455-Speed 5962.71 samples/sec   Loss 8.6281   LearningRate 0.1492   Epoch: 9   Global Step: 93390   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 14:34:58,291-Speed 5993.44 samples/sec   Loss 8.6078   LearningRate 0.1492   Epoch: 9   Global Step: 93400   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 14:35:05,130-Speed 5990.04 samples/sec   Loss 8.5538   LearningRate 0.1491   Epoch: 9   Global Step: 93410   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 14:35:11,984-Speed 5977.32 samples/sec   Loss 8.5810   LearningRate 0.1491   Epoch: 9   Global Step: 93420   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 14:35:18,834-Speed 5980.95 samples/sec   Loss 8.5631   LearningRate 0.1491   Epoch: 9   Global Step: 93430   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 14:35:25,704-Speed 5963.13 samples/sec   Loss 8.5577   LearningRate 0.1491   Epoch: 9   Global Step: 93440   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 14:35:32,572-Speed 5965.38 samples/sec   Loss 8.5963   LearningRate 0.1490   Epoch: 9   Global Step: 93450   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 14:35:39,495-Speed 5917.62 samples/sec   Loss 8.6240   LearningRate 0.1490   Epoch: 9   Global Step: 93460   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 14:35:46,417-Speed 5919.30 samples/sec   Loss 8.5147   LearningRate 0.1490   Epoch: 9   Global Step: 93470   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 14:35:53,398-Speed 5868.39 samples/sec   Loss 8.5592   LearningRate 0.1490   Epoch: 9   Global Step: 93480   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 14:36:00,361-Speed 5884.56 samples/sec   Loss 8.5592   LearningRate 0.1489   Epoch: 9   Global Step: 93490   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:36:07,271-Speed 5929.00 samples/sec   Loss 8.5497   LearningRate 0.1489   Epoch: 9   Global Step: 93500   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:36:14,196-Speed 5915.80 samples/sec   Loss 8.5039   LearningRate 0.1489   Epoch: 9   Global Step: 93510   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:36:21,093-Speed 5940.37 samples/sec   Loss 8.5950   LearningRate 0.1489   Epoch: 9   Global Step: 93520   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:36:28,004-Speed 5927.47 samples/sec   Loss 8.5369   LearningRate 0.1488   Epoch: 9   Global Step: 93530   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:36:34,907-Speed 5934.78 samples/sec   Loss 8.5058   LearningRate 0.1488   Epoch: 9   Global Step: 93540   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:36:41,817-Speed 5928.90 samples/sec   Loss 8.6027   LearningRate 0.1488   Epoch: 9   Global Step: 93550   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:36:48,676-Speed 5972.98 samples/sec   Loss 8.6153   LearningRate 0.1488   Epoch: 9   Global Step: 93560   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:36:55,536-Speed 5972.31 samples/sec   Loss 8.5984   LearningRate 0.1487   Epoch: 9   Global Step: 93570   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:37:02,394-Speed 5973.54 samples/sec   Loss 8.4898   LearningRate 0.1487   Epoch: 9   Global Step: 93580   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:37:09,246-Speed 5979.30 samples/sec   Loss 8.4640   LearningRate 0.1487   Epoch: 9   Global Step: 93590   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:37:16,104-Speed 5974.05 samples/sec   Loss 8.5527   LearningRate 0.1487   Epoch: 9   Global Step: 93600   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:37:22,950-Speed 5983.64 samples/sec   Loss 8.6179   LearningRate 0.1486   Epoch: 9   Global Step: 93610   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:37:29,807-Speed 5974.74 samples/sec   Loss 8.5476   LearningRate 0.1486   Epoch: 9   Global Step: 93620   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:37:36,677-Speed 5963.32 samples/sec   Loss 8.5651   LearningRate 0.1486   Epoch: 9   Global Step: 93630   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:37:43,538-Speed 5970.36 samples/sec   Loss 8.5095   LearningRate 0.1485   Epoch: 9   Global Step: 93640   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:37:50,414-Speed 5958.27 samples/sec   Loss 8.5645   LearningRate 0.1485   Epoch: 9   Global Step: 93650   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:37:57,271-Speed 5974.94 samples/sec   Loss 8.5903   LearningRate 0.1485   Epoch: 9   Global Step: 93660   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:38:04,223-Speed 5892.92 samples/sec   Loss 8.5522   LearningRate 0.1485   Epoch: 9   Global Step: 93670   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:38:11,094-Speed 5962.66 samples/sec   Loss 8.5857   LearningRate 0.1484   Epoch: 9   Global Step: 93680   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:38:17,935-Speed 5988.39 samples/sec   Loss 8.6214   LearningRate 0.1484   Epoch: 9   Global Step: 93690   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:38:24,795-Speed 5972.28 samples/sec   Loss 8.5957   LearningRate 0.1484   Epoch: 9   Global Step: 93700   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:38:31,652-Speed 5974.51 samples/sec   Loss 8.5012   LearningRate 0.1484   Epoch: 9   Global Step: 93710   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:38:38,533-Speed 5953.19 samples/sec   Loss 8.6468   LearningRate 0.1483   Epoch: 9   Global Step: 93720   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:38:45,434-Speed 5937.16 samples/sec   Loss 8.5930   LearningRate 0.1483   Epoch: 9   Global Step: 93730   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:38:52,333-Speed 5939.03 samples/sec   Loss 8.5796   LearningRate 0.1483   Epoch: 9   Global Step: 93740   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:38:59,200-Speed 5969.38 samples/sec   Loss 8.5654   LearningRate 0.1483   Epoch: 9   Global Step: 93750   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:39:06,065-Speed 5967.44 samples/sec   Loss 8.5725   LearningRate 0.1482   Epoch: 9   Global Step: 93760   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:39:12,983-Speed 5921.70 samples/sec   Loss 8.5157   LearningRate 0.1482   Epoch: 9   Global Step: 93770   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:39:19,834-Speed 5979.43 samples/sec   Loss 8.5580   LearningRate 0.1482   Epoch: 9   Global Step: 93780   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:39:26,682-Speed 5983.01 samples/sec   Loss 8.5990   LearningRate 0.1482   Epoch: 9   Global Step: 93790   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:39:33,545-Speed 5971.65 samples/sec   Loss 8.5310   LearningRate 0.1481   Epoch: 9   Global Step: 93800   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:39:40,419-Speed 5959.20 samples/sec   Loss 8.6051   LearningRate 0.1481   Epoch: 9   Global Step: 93810   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:39:47,286-Speed 5970.56 samples/sec   Loss 8.5811   LearningRate 0.1481   Epoch: 9   Global Step: 93820   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:39:54,144-Speed 5975.70 samples/sec   Loss 8.5989   LearningRate 0.1481   Epoch: 9   Global Step: 93830   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:40:01,006-Speed 5970.63 samples/sec   Loss 8.5320   LearningRate 0.1480   Epoch: 9   Global Step: 93840   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:40:07,858-Speed 5978.84 samples/sec   Loss 8.5167   LearningRate 0.1480   Epoch: 9   Global Step: 93850   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:40:14,715-Speed 5974.48 samples/sec   Loss 8.4865   LearningRate 0.1480   Epoch: 9   Global Step: 93860   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:40:21,584-Speed 5964.02 samples/sec   Loss 8.5588   LearningRate 0.1479   Epoch: 9   Global Step: 93870   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:40:28,447-Speed 5969.65 samples/sec   Loss 8.4700   LearningRate 0.1479   Epoch: 9   Global Step: 93880   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:40:35,301-Speed 5977.50 samples/sec   Loss 8.4780   LearningRate 0.1479   Epoch: 9   Global Step: 93890   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:40:42,176-Speed 5958.77 samples/sec   Loss 8.6121   LearningRate 0.1479   Epoch: 9   Global Step: 93900   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:40:49,035-Speed 5973.71 samples/sec   Loss 8.5202   LearningRate 0.1478   Epoch: 9   Global Step: 93910   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:40:55,892-Speed 5974.37 samples/sec   Loss 8.5411   LearningRate 0.1478   Epoch: 9   Global Step: 93920   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:41:02,741-Speed 5981.87 samples/sec   Loss 8.5682   LearningRate 0.1478   Epoch: 9   Global Step: 93930   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:41:09,596-Speed 5975.43 samples/sec   Loss 8.5380   LearningRate 0.1478   Epoch: 9   Global Step: 93940   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:41:16,450-Speed 5977.05 samples/sec   Loss 8.5079   LearningRate 0.1477   Epoch: 9   Global Step: 93950   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:41:23,308-Speed 5973.81 samples/sec   Loss 8.4924   LearningRate 0.1477   Epoch: 9   Global Step: 93960   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:41:30,164-Speed 5975.69 samples/sec   Loss 8.5613   LearningRate 0.1477   Epoch: 9   Global Step: 93970   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:41:37,016-Speed 5978.94 samples/sec   Loss 8.5533   LearningRate 0.1477   Epoch: 9   Global Step: 93980   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:41:43,976-Speed 5886.62 samples/sec   Loss 8.5687   LearningRate 0.1476   Epoch: 9   Global Step: 93990   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:41:50,863-Speed 5949.93 samples/sec   Loss 8.4304   LearningRate 0.1476   Epoch: 9   Global Step: 94000   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:41:57,718-Speed 5975.84 samples/sec   Loss 8.5611   LearningRate 0.1476   Epoch: 9   Global Step: 94010   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:42:04,564-Speed 5983.48 samples/sec   Loss 8.5431   LearningRate 0.1476   Epoch: 9   Global Step: 94020   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:42:11,415-Speed 5979.75 samples/sec   Loss 8.5164   LearningRate 0.1475   Epoch: 9   Global Step: 94030   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:42:18,288-Speed 5960.62 samples/sec   Loss 8.5568   LearningRate 0.1475   Epoch: 9   Global Step: 94040   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:42:25,133-Speed 5985.11 samples/sec   Loss 8.5038   LearningRate 0.1475   Epoch: 9   Global Step: 94050   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:42:32,007-Speed 5959.57 samples/sec   Loss 8.5626   LearningRate 0.1475   Epoch: 9   Global Step: 94060   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:42:38,860-Speed 5978.31 samples/sec   Loss 8.5771   LearningRate 0.1474   Epoch: 9   Global Step: 94070   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:42:45,710-Speed 5980.49 samples/sec   Loss 8.5879   LearningRate 0.1474   Epoch: 9   Global Step: 94080   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:42:52,560-Speed 5980.03 samples/sec   Loss 8.4936   LearningRate 0.1474   Epoch: 9   Global Step: 94090   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:42:59,428-Speed 5967.55 samples/sec   Loss 8.5467   LearningRate 0.1473   Epoch: 9   Global Step: 94100   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:43:06,289-Speed 5971.19 samples/sec   Loss 8.5786   LearningRate 0.1473   Epoch: 9   Global Step: 94110   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:43:13,150-Speed 5971.28 samples/sec   Loss 8.5228   LearningRate 0.1473   Epoch: 9   Global Step: 94120   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:43:20,039-Speed 5947.05 samples/sec   Loss 8.4929   LearningRate 0.1473   Epoch: 9   Global Step: 94130   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:43:26,909-Speed 5967.08 samples/sec   Loss 8.5082   LearningRate 0.1472   Epoch: 9   Global Step: 94140   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:43:33,770-Speed 5970.99 samples/sec   Loss 8.5144   LearningRate 0.1472   Epoch: 9   Global Step: 94150   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:43:40,621-Speed 5979.51 samples/sec   Loss 8.4777   LearningRate 0.1472   Epoch: 9   Global Step: 94160   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:43:47,476-Speed 5976.15 samples/sec   Loss 8.5247   LearningRate 0.1472   Epoch: 9   Global Step: 94170   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:43:54,354-Speed 5957.17 samples/sec   Loss 8.5446   LearningRate 0.1471   Epoch: 9   Global Step: 94180   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:44:01,209-Speed 5975.95 samples/sec   Loss 8.5224   LearningRate 0.1471   Epoch: 9   Global Step: 94190   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 14:44:08,062-Speed 5978.70 samples/sec   Loss 8.5389   LearningRate 0.1471   Epoch: 9   Global Step: 94200   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 14:44:14,924-Speed 5971.04 samples/sec   Loss 8.6043   LearningRate 0.1471   Epoch: 9   Global Step: 94210   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 14:44:21,803-Speed 5955.80 samples/sec   Loss 8.5643   LearningRate 0.1470   Epoch: 9   Global Step: 94220   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 14:44:28,673-Speed 5963.95 samples/sec   Loss 8.5091   LearningRate 0.1470   Epoch: 9   Global Step: 94230   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:44:35,531-Speed 5973.81 samples/sec   Loss 8.5107   LearningRate 0.1470   Epoch: 9   Global Step: 94240   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:44:42,386-Speed 5976.77 samples/sec   Loss 8.5990   LearningRate 0.1470   Epoch: 9   Global Step: 94250   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:44:49,263-Speed 5959.85 samples/sec   Loss 8.4748   LearningRate 0.1469   Epoch: 9   Global Step: 94260   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:44:56,125-Speed 5970.44 samples/sec   Loss 8.5106   LearningRate 0.1469   Epoch: 9   Global Step: 94270   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:45:03,006-Speed 5953.45 samples/sec   Loss 8.5284   LearningRate 0.1469   Epoch: 9   Global Step: 94280   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:45:09,861-Speed 5977.15 samples/sec   Loss 8.5269   LearningRate 0.1469   Epoch: 9   Global Step: 94290   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:45:16,721-Speed 5973.53 samples/sec   Loss 8.5217   LearningRate 0.1468   Epoch: 9   Global Step: 94300   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:45:23,555-Speed 5994.38 samples/sec   Loss 8.5393   LearningRate 0.1468   Epoch: 9   Global Step: 94310   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:45:30,408-Speed 5978.21 samples/sec   Loss 8.6126   LearningRate 0.1468   Epoch: 9   Global Step: 94320   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:45:37,258-Speed 5980.79 samples/sec   Loss 8.5081   LearningRate 0.1468   Epoch: 9   Global Step: 94330   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:45:44,117-Speed 5972.91 samples/sec   Loss 8.4557   LearningRate 0.1467   Epoch: 9   Global Step: 94340   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:45:51,036-Speed 5921.52 samples/sec   Loss 8.5024   LearningRate 0.1467   Epoch: 9   Global Step: 94350   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:45:57,898-Speed 5970.22 samples/sec   Loss 8.5188   LearningRate 0.1467   Epoch: 9   Global Step: 94360   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:46:04,751-Speed 5979.48 samples/sec   Loss 8.5121   LearningRate 0.1466   Epoch: 9   Global Step: 94370   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:46:11,631-Speed 5954.51 samples/sec   Loss 8.4240   LearningRate 0.1466   Epoch: 9   Global Step: 94380   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:46:18,483-Speed 5979.68 samples/sec   Loss 8.4269   LearningRate 0.1466   Epoch: 9   Global Step: 94390   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:46:25,344-Speed 5971.07 samples/sec   Loss 8.5239   LearningRate 0.1466   Epoch: 9   Global Step: 94400   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:46:32,203-Speed 5972.90 samples/sec   Loss 8.5564   LearningRate 0.1465   Epoch: 9   Global Step: 94410   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:46:39,065-Speed 5970.79 samples/sec   Loss 8.4707   LearningRate 0.1465   Epoch: 9   Global Step: 94420   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:46:45,930-Speed 5969.58 samples/sec   Loss 8.5504   LearningRate 0.1465   Epoch: 9   Global Step: 94430   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:46:52,783-Speed 5977.95 samples/sec   Loss 8.5255   LearningRate 0.1465   Epoch: 9   Global Step: 94440   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:46:59,645-Speed 5970.21 samples/sec   Loss 8.4512   LearningRate 0.1464   Epoch: 9   Global Step: 94450   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:47:06,505-Speed 5972.34 samples/sec   Loss 8.5675   LearningRate 0.1464   Epoch: 9   Global Step: 94460   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:47:13,383-Speed 5958.13 samples/sec   Loss 8.4981   LearningRate 0.1464   Epoch: 9   Global Step: 94470   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:47:20,233-Speed 5980.63 samples/sec   Loss 8.5204   LearningRate 0.1464   Epoch: 9   Global Step: 94480   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:47:27,104-Speed 5962.53 samples/sec   Loss 8.5080   LearningRate 0.1463   Epoch: 9   Global Step: 94490   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:47:33,947-Speed 5987.00 samples/sec   Loss 8.5033   LearningRate 0.1463   Epoch: 9   Global Step: 94500   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:47:40,792-Speed 5984.75 samples/sec   Loss 8.4324   LearningRate 0.1463   Epoch: 9   Global Step: 94510   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 14:47:47,655-Speed 5972.30 samples/sec   Loss 8.4608   LearningRate 0.1463   Epoch: 9   Global Step: 94520   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 14:47:54,525-Speed 5963.66 samples/sec   Loss 8.4531   LearningRate 0.1462   Epoch: 9   Global Step: 94530   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 14:48:01,377-Speed 5978.40 samples/sec   Loss 8.4875   LearningRate 0.1462   Epoch: 9   Global Step: 94540   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 14:48:08,263-Speed 5949.76 samples/sec   Loss 8.5204   LearningRate 0.1462   Epoch: 9   Global Step: 94550   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 14:48:15,161-Speed 5939.41 samples/sec   Loss 8.5140   LearningRate 0.1462   Epoch: 9   Global Step: 94560   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 14:48:22,035-Speed 5960.24 samples/sec   Loss 8.4840   LearningRate 0.1461   Epoch: 9   Global Step: 94570   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:48:28,903-Speed 5964.88 samples/sec   Loss 8.5002   LearningRate 0.1461   Epoch: 9   Global Step: 94580   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:48:35,767-Speed 5969.24 samples/sec   Loss 8.5241   LearningRate 0.1461   Epoch: 9   Global Step: 94590   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:48:42,616-Speed 5981.75 samples/sec   Loss 8.4691   LearningRate 0.1461   Epoch: 9   Global Step: 94600   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:48:49,477-Speed 5970.79 samples/sec   Loss 8.4837   LearningRate 0.1460   Epoch: 9   Global Step: 94610   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:48:56,330-Speed 5978.49 samples/sec   Loss 8.4813   LearningRate 0.1460   Epoch: 9   Global Step: 94620   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:49:03,205-Speed 5958.41 samples/sec   Loss 8.4594   LearningRate 0.1460   Epoch: 9   Global Step: 94630   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:49:10,078-Speed 5961.20 samples/sec   Loss 8.5356   LearningRate 0.1459   Epoch: 9   Global Step: 94640   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:49:16,946-Speed 5964.87 samples/sec   Loss 8.5254   LearningRate 0.1459   Epoch: 9   Global Step: 94650   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:49:23,790-Speed 5985.26 samples/sec   Loss 8.4482   LearningRate 0.1459   Epoch: 9   Global Step: 94660   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:49:30,659-Speed 5964.36 samples/sec   Loss 8.5002   LearningRate 0.1459   Epoch: 9   Global Step: 94670   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 14:49:37,514-Speed 5975.74 samples/sec   Loss 8.4142   LearningRate 0.1458   Epoch: 9   Global Step: 94680   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 14:49:44,348-Speed 5997.67 samples/sec   Loss 8.4980   LearningRate 0.1458   Epoch: 9   Global Step: 94690   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:49:51,199-Speed 5979.69 samples/sec   Loss 8.5033   LearningRate 0.1458   Epoch: 9   Global Step: 94700   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:49:58,057-Speed 5973.87 samples/sec   Loss 8.5505   LearningRate 0.1458   Epoch: 9   Global Step: 94710   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:50:04,921-Speed 5968.01 samples/sec   Loss 8.4542   LearningRate 0.1457   Epoch: 9   Global Step: 94720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:50:11,774-Speed 5978.26 samples/sec   Loss 8.4480   LearningRate 0.1457   Epoch: 9   Global Step: 94730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:50:18,619-Speed 5985.00 samples/sec   Loss 8.5048   LearningRate 0.1457   Epoch: 9   Global Step: 94740   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:50:25,494-Speed 5958.44 samples/sec   Loss 8.4821   LearningRate 0.1457   Epoch: 9   Global Step: 94750   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:50:32,362-Speed 5965.51 samples/sec   Loss 8.5401   LearningRate 0.1456   Epoch: 9   Global Step: 94760   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:50:39,204-Speed 5987.18 samples/sec   Loss 8.5126   LearningRate 0.1456   Epoch: 9   Global Step: 94770   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:50:46,047-Speed 5986.56 samples/sec   Loss 8.4650   LearningRate 0.1456   Epoch: 9   Global Step: 94780   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:50:52,966-Speed 5920.55 samples/sec   Loss 8.4403   LearningRate 0.1456   Epoch: 9   Global Step: 94790   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 14:50:59,853-Speed 5948.61 samples/sec   Loss 8.5058   LearningRate 0.1455   Epoch: 9   Global Step: 94800   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:51:06,702-Speed 5981.73 samples/sec   Loss 8.5397   LearningRate 0.1455   Epoch: 9   Global Step: 94810   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:51:13,562-Speed 5971.91 samples/sec   Loss 8.5284   LearningRate 0.1455   Epoch: 9   Global Step: 94820   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:51:20,423-Speed 5971.30 samples/sec   Loss 8.4941   LearningRate 0.1455   Epoch: 9   Global Step: 94830   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:51:27,275-Speed 5979.49 samples/sec   Loss 8.4885   LearningRate 0.1454   Epoch: 9   Global Step: 94840   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:51:34,153-Speed 5957.62 samples/sec   Loss 8.5219   LearningRate 0.1454   Epoch: 9   Global Step: 94850   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:51:41,011-Speed 5975.31 samples/sec   Loss 8.4710   LearningRate 0.1454   Epoch: 9   Global Step: 94860   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:51:47,865-Speed 5977.95 samples/sec   Loss 8.4645   LearningRate 0.1454   Epoch: 9   Global Step: 94870   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:51:54,717-Speed 5981.60 samples/sec   Loss 8.5146   LearningRate 0.1453   Epoch: 9   Global Step: 94880   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:52:01,577-Speed 5972.64 samples/sec   Loss 8.4724   LearningRate 0.1453   Epoch: 9   Global Step: 94890   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:52:08,423-Speed 5983.93 samples/sec   Loss 8.5015   LearningRate 0.1453   Epoch: 9   Global Step: 94900   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 14:52:15,274-Speed 5979.30 samples/sec   Loss 8.4068   LearningRate 0.1452   Epoch: 9   Global Step: 94910   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 14:52:22,136-Speed 5970.65 samples/sec   Loss 8.4158   LearningRate 0.1452   Epoch: 9   Global Step: 94920   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:52:28,996-Speed 5975.29 samples/sec   Loss 8.5034   LearningRate 0.1452   Epoch: 9   Global Step: 94930   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:52:35,952-Speed 5889.73 samples/sec   Loss 8.4773   LearningRate 0.1452   Epoch: 9   Global Step: 94940   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:52:42,825-Speed 5960.64 samples/sec   Loss 8.4502   LearningRate 0.1451   Epoch: 9   Global Step: 94950   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:52:49,696-Speed 5962.71 samples/sec   Loss 8.4352   LearningRate 0.1451   Epoch: 9   Global Step: 94960   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:52:56,569-Speed 5961.30 samples/sec   Loss 8.4770   LearningRate 0.1451   Epoch: 9   Global Step: 94970   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:53:03,448-Speed 5955.65 samples/sec   Loss 8.4380   LearningRate 0.1451   Epoch: 9   Global Step: 94980   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:53:10,314-Speed 5966.88 samples/sec   Loss 8.4307   LearningRate 0.1450   Epoch: 9   Global Step: 94990   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:53:17,172-Speed 5973.67 samples/sec   Loss 8.4288   LearningRate 0.1450   Epoch: 9   Global Step: 95000   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:53:43,723-[lfw][95000]XNorm: 22.545525
Training: 2022-01-08 14:53:43,724-[lfw][95000]Accuracy-Flip: 0.99700+-0.00267
Training: 2022-01-08 14:53:43,725-[lfw][95000]Accuracy-Highest: 0.99750
Training: 2022-01-08 14:54:14,771-[cfp_fp][95000]XNorm: 19.472820
Training: 2022-01-08 14:54:14,772-[cfp_fp][95000]Accuracy-Flip: 0.97957+-0.00808
Training: 2022-01-08 14:54:14,773-[cfp_fp][95000]Accuracy-Highest: 0.98114
Training: 2022-01-08 14:54:41,583-[agedb_30][95000]XNorm: 21.829587
Training: 2022-01-08 14:54:41,583-[agedb_30][95000]Accuracy-Flip: 0.97150+-0.00762
Training: 2022-01-08 14:54:41,584-[agedb_30][95000]Accuracy-Highest: 0.97150
Training: 2022-01-08 14:54:48,440-Speed 448.79 samples/sec   Loss 8.4634   LearningRate 0.1450   Epoch: 9   Global Step: 95010   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:54:55,300-Speed 5978.92 samples/sec   Loss 8.4394   LearningRate 0.1450   Epoch: 9   Global Step: 95020   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:55:02,157-Speed 5988.70 samples/sec   Loss 8.4882   LearningRate 0.1449   Epoch: 9   Global Step: 95030   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:55:09,015-Speed 5974.23 samples/sec   Loss 8.4639   LearningRate 0.1449   Epoch: 9   Global Step: 95040   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:55:15,858-Speed 5986.55 samples/sec   Loss 8.4449   LearningRate 0.1449   Epoch: 9   Global Step: 95050   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:55:22,713-Speed 5975.90 samples/sec   Loss 8.3620   LearningRate 0.1449   Epoch: 9   Global Step: 95060   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:55:29,587-Speed 5962.62 samples/sec   Loss 8.3835   LearningRate 0.1448   Epoch: 9   Global Step: 95070   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:55:36,563-Speed 5873.04 samples/sec   Loss 8.4737   LearningRate 0.1448   Epoch: 9   Global Step: 95080   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:55:43,446-Speed 5951.63 samples/sec   Loss 8.5341   LearningRate 0.1448   Epoch: 9   Global Step: 95090   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:55:50,326-Speed 5959.89 samples/sec   Loss 8.4734   LearningRate 0.1448   Epoch: 9   Global Step: 95100   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:55:57,188-Speed 5972.18 samples/sec   Loss 8.4913   LearningRate 0.1447   Epoch: 9   Global Step: 95110   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:56:04,066-Speed 5956.68 samples/sec   Loss 8.4386   LearningRate 0.1447   Epoch: 9   Global Step: 95120   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 14:56:10,911-Speed 5984.81 samples/sec   Loss 8.4819   LearningRate 0.1447   Epoch: 9   Global Step: 95130   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:56:17,772-Speed 5971.36 samples/sec   Loss 8.4183   LearningRate 0.1447   Epoch: 9   Global Step: 95140   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:56:24,620-Speed 5981.76 samples/sec   Loss 8.4623   LearningRate 0.1446   Epoch: 9   Global Step: 95150   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:56:31,475-Speed 5976.69 samples/sec   Loss 8.4561   LearningRate 0.1446   Epoch: 9   Global Step: 95160   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:56:38,337-Speed 5970.38 samples/sec   Loss 8.4387   LearningRate 0.1446   Epoch: 9   Global Step: 95170   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:56:45,207-Speed 5962.79 samples/sec   Loss 8.4338   LearningRate 0.1446   Epoch: 9   Global Step: 95180   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:56:52,054-Speed 5983.23 samples/sec   Loss 8.3854   LearningRate 0.1445   Epoch: 9   Global Step: 95190   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:56:58,916-Speed 5970.59 samples/sec   Loss 8.4721   LearningRate 0.1445   Epoch: 9   Global Step: 95200   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:57:05,784-Speed 5964.46 samples/sec   Loss 8.4417   LearningRate 0.1445   Epoch: 9   Global Step: 95210   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:57:12,644-Speed 5972.06 samples/sec   Loss 8.4605   LearningRate 0.1444   Epoch: 9   Global Step: 95220   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:57:19,506-Speed 5970.24 samples/sec   Loss 8.4064   LearningRate 0.1444   Epoch: 9   Global Step: 95230   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:57:26,356-Speed 5983.30 samples/sec   Loss 8.4413   LearningRate 0.1444   Epoch: 9   Global Step: 95240   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:57:33,218-Speed 5970.21 samples/sec   Loss 8.5052   LearningRate 0.1444   Epoch: 9   Global Step: 95250   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:57:40,052-Speed 5994.24 samples/sec   Loss 8.4274   LearningRate 0.1443   Epoch: 9   Global Step: 95260   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:57:46,970-Speed 5922.27 samples/sec   Loss 8.4579   LearningRate 0.1443   Epoch: 9   Global Step: 95270   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:57:53,922-Speed 5893.32 samples/sec   Loss 8.4356   LearningRate 0.1443   Epoch: 9   Global Step: 95280   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:58:00,836-Speed 5925.12 samples/sec   Loss 8.3978   LearningRate 0.1443   Epoch: 9   Global Step: 95290   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:58:07,722-Speed 5950.36 samples/sec   Loss 8.4149   LearningRate 0.1442   Epoch: 9   Global Step: 95300   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:58:14,591-Speed 5963.37 samples/sec   Loss 8.3628   LearningRate 0.1442   Epoch: 9   Global Step: 95310   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:58:21,433-Speed 5987.87 samples/sec   Loss 8.4107   LearningRate 0.1442   Epoch: 9   Global Step: 95320   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:58:28,278-Speed 5984.98 samples/sec   Loss 8.4953   LearningRate 0.1442   Epoch: 9   Global Step: 95330   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:58:35,124-Speed 5983.69 samples/sec   Loss 8.4198   LearningRate 0.1441   Epoch: 9   Global Step: 95340   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:58:41,976-Speed 5979.18 samples/sec   Loss 8.5021   LearningRate 0.1441   Epoch: 9   Global Step: 95350   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:58:48,828-Speed 5978.34 samples/sec   Loss 8.4859   LearningRate 0.1441   Epoch: 9   Global Step: 95360   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:58:55,677-Speed 5981.73 samples/sec   Loss 8.4718   LearningRate 0.1441   Epoch: 9   Global Step: 95370   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 14:59:02,527-Speed 5980.05 samples/sec   Loss 8.4545   LearningRate 0.1440   Epoch: 9   Global Step: 95380   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:59:09,382-Speed 5975.21 samples/sec   Loss 8.5070   LearningRate 0.1440   Epoch: 9   Global Step: 95390   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:59:16,238-Speed 5976.20 samples/sec   Loss 8.4452   LearningRate 0.1440   Epoch: 9   Global Step: 95400   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:59:23,207-Speed 5878.20 samples/sec   Loss 8.4474   LearningRate 0.1440   Epoch: 9   Global Step: 95410   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:59:30,089-Speed 5953.27 samples/sec   Loss 8.3916   LearningRate 0.1439   Epoch: 9   Global Step: 95420   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:59:36,942-Speed 5977.53 samples/sec   Loss 8.4204   LearningRate 0.1439   Epoch: 9   Global Step: 95430   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:59:43,808-Speed 5966.44 samples/sec   Loss 8.4021   LearningRate 0.1439   Epoch: 9   Global Step: 95440   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:59:50,652-Speed 5986.66 samples/sec   Loss 8.4189   LearningRate 0.1439   Epoch: 9   Global Step: 95450   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 14:59:57,494-Speed 5987.28 samples/sec   Loss 8.4680   LearningRate 0.1438   Epoch: 9   Global Step: 95460   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:00:04,346-Speed 5979.06 samples/sec   Loss 8.4123   LearningRate 0.1438   Epoch: 9   Global Step: 95470   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:00:11,190-Speed 5985.89 samples/sec   Loss 8.3728   LearningRate 0.1438   Epoch: 9   Global Step: 95480   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:00:18,050-Speed 5971.51 samples/sec   Loss 8.4114   LearningRate 0.1438   Epoch: 9   Global Step: 95490   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:00:24,900-Speed 5981.32 samples/sec   Loss 8.4818   LearningRate 0.1437   Epoch: 9   Global Step: 95500   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:00:31,745-Speed 5984.50 samples/sec   Loss 8.4626   LearningRate 0.1437   Epoch: 9   Global Step: 95510   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:00:38,619-Speed 5959.50 samples/sec   Loss 8.4089   LearningRate 0.1437   Epoch: 9   Global Step: 95520   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:00:45,493-Speed 5959.98 samples/sec   Loss 8.3127   LearningRate 0.1437   Epoch: 9   Global Step: 95530   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:00:52,380-Speed 5948.36 samples/sec   Loss 8.4802   LearningRate 0.1436   Epoch: 9   Global Step: 95540   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:00:59,243-Speed 5971.58 samples/sec   Loss 8.4205   LearningRate 0.1436   Epoch: 9   Global Step: 95550   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:01:06,105-Speed 5970.32 samples/sec   Loss 8.3881   LearningRate 0.1436   Epoch: 9   Global Step: 95560   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:01:12,984-Speed 5955.40 samples/sec   Loss 8.4143   LearningRate 0.1435   Epoch: 9   Global Step: 95570   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:01:19,841-Speed 5974.59 samples/sec   Loss 8.5028   LearningRate 0.1435   Epoch: 9   Global Step: 95580   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:01:26,682-Speed 5988.70 samples/sec   Loss 8.4154   LearningRate 0.1435   Epoch: 9   Global Step: 95590   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:01:33,534-Speed 5979.03 samples/sec   Loss 8.3527   LearningRate 0.1435   Epoch: 9   Global Step: 95600   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:01:40,389-Speed 5976.03 samples/sec   Loss 8.4355   LearningRate 0.1434   Epoch: 9   Global Step: 95610   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:01:47,247-Speed 5973.60 samples/sec   Loss 8.4150   LearningRate 0.1434   Epoch: 9   Global Step: 95620   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:01:54,110-Speed 5969.47 samples/sec   Loss 8.3903   LearningRate 0.1434   Epoch: 9   Global Step: 95630   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:02:00,957-Speed 5983.68 samples/sec   Loss 8.3490   LearningRate 0.1434   Epoch: 9   Global Step: 95640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:02:07,842-Speed 5950.26 samples/sec   Loss 8.3753   LearningRate 0.1433   Epoch: 9   Global Step: 95650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:02:14,719-Speed 5956.69 samples/sec   Loss 8.3956   LearningRate 0.1433   Epoch: 9   Global Step: 95660   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:02:21,593-Speed 5961.00 samples/sec   Loss 8.4381   LearningRate 0.1433   Epoch: 9   Global Step: 95670   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:02:28,463-Speed 5963.84 samples/sec   Loss 8.4177   LearningRate 0.1433   Epoch: 9   Global Step: 95680   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:02:35,317-Speed 5977.13 samples/sec   Loss 8.4060   LearningRate 0.1432   Epoch: 9   Global Step: 95690   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:02:42,165-Speed 5984.15 samples/sec   Loss 8.4420   LearningRate 0.1432   Epoch: 9   Global Step: 95700   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:02:49,024-Speed 5972.74 samples/sec   Loss 8.4102   LearningRate 0.1432   Epoch: 9   Global Step: 95710   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:02:55,883-Speed 5973.25 samples/sec   Loss 8.4111   LearningRate 0.1432   Epoch: 9   Global Step: 95720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:03:02,746-Speed 5968.79 samples/sec   Loss 8.4903   LearningRate 0.1431   Epoch: 9   Global Step: 95730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:03:09,620-Speed 5960.47 samples/sec   Loss 8.5000   LearningRate 0.1431   Epoch: 9   Global Step: 95740   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:03:16,487-Speed 5965.39 samples/sec   Loss 8.4118   LearningRate 0.1431   Epoch: 9   Global Step: 95750   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:03:23,356-Speed 5964.76 samples/sec   Loss 8.4275   LearningRate 0.1431   Epoch: 9   Global Step: 95760   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:03:30,238-Speed 5952.65 samples/sec   Loss 8.3920   LearningRate 0.1430   Epoch: 9   Global Step: 95770   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:03:37,101-Speed 5969.65 samples/sec   Loss 8.3880   LearningRate 0.1430   Epoch: 9   Global Step: 95780   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:03:43,967-Speed 5966.34 samples/sec   Loss 8.4160   LearningRate 0.1430   Epoch: 9   Global Step: 95790   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:03:50,825-Speed 5974.13 samples/sec   Loss 8.4547   LearningRate 0.1430   Epoch: 9   Global Step: 95800   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:03:57,680-Speed 5976.32 samples/sec   Loss 8.3791   LearningRate 0.1429   Epoch: 9   Global Step: 95810   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:04:04,543-Speed 5968.87 samples/sec   Loss 8.4244   LearningRate 0.1429   Epoch: 9   Global Step: 95820   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:04:11,390-Speed 5983.84 samples/sec   Loss 8.3792   LearningRate 0.1429   Epoch: 9   Global Step: 95830   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:04:18,248-Speed 5974.27 samples/sec   Loss 8.4533   LearningRate 0.1429   Epoch: 9   Global Step: 95840   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:04:25,088-Speed 5989.38 samples/sec   Loss 8.4107   LearningRate 0.1428   Epoch: 9   Global Step: 95850   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:04:31,974-Speed 5948.92 samples/sec   Loss 8.3669   LearningRate 0.1428   Epoch: 9   Global Step: 95860   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:04:38,844-Speed 5963.18 samples/sec   Loss 8.3996   LearningRate 0.1428   Epoch: 9   Global Step: 95870   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:04:45,708-Speed 5968.55 samples/sec   Loss 8.4004   LearningRate 0.1428   Epoch: 9   Global Step: 95880   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:04:52,557-Speed 5981.77 samples/sec   Loss 8.4378   LearningRate 0.1427   Epoch: 9   Global Step: 95890   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:04:59,432-Speed 5958.50 samples/sec   Loss 8.3625   LearningRate 0.1427   Epoch: 9   Global Step: 95900   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:05:06,284-Speed 5979.04 samples/sec   Loss 8.3206   LearningRate 0.1427   Epoch: 9   Global Step: 95910   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:05:13,136-Speed 5978.40 samples/sec   Loss 8.3369   LearningRate 0.1427   Epoch: 9   Global Step: 95920   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:05:19,993-Speed 5975.69 samples/sec   Loss 8.4505   LearningRate 0.1426   Epoch: 9   Global Step: 95930   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:05:26,846-Speed 5977.73 samples/sec   Loss 8.3538   LearningRate 0.1426   Epoch: 9   Global Step: 95940   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:05:33,707-Speed 5970.76 samples/sec   Loss 8.4741   LearningRate 0.1426   Epoch: 9   Global Step: 95950   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:05:40,566-Speed 5973.26 samples/sec   Loss 8.3989   LearningRate 0.1426   Epoch: 9   Global Step: 95960   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:05:47,485-Speed 5920.66 samples/sec   Loss 8.4722   LearningRate 0.1425   Epoch: 9   Global Step: 95970   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:05:54,424-Speed 5904.16 samples/sec   Loss 8.3912   LearningRate 0.1425   Epoch: 9   Global Step: 95980   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:06:01,340-Speed 5923.15 samples/sec   Loss 8.3972   LearningRate 0.1425   Epoch: 9   Global Step: 95990   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:06:08,199-Speed 5973.16 samples/sec   Loss 8.4312   LearningRate 0.1424   Epoch: 9   Global Step: 96000   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:06:15,057-Speed 5973.41 samples/sec   Loss 8.3672   LearningRate 0.1424   Epoch: 9   Global Step: 96010   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:06:21,923-Speed 5966.89 samples/sec   Loss 8.3665   LearningRate 0.1424   Epoch: 9   Global Step: 96020   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:06:28,788-Speed 5968.84 samples/sec   Loss 8.4062   LearningRate 0.1424   Epoch: 9   Global Step: 96030   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:06:35,708-Speed 5920.32 samples/sec   Loss 8.4285   LearningRate 0.1423   Epoch: 9   Global Step: 96040   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:06:42,600-Speed 5943.89 samples/sec   Loss 8.4188   LearningRate 0.1423   Epoch: 9   Global Step: 96050   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:06:49,477-Speed 5958.12 samples/sec   Loss 8.3967   LearningRate 0.1423   Epoch: 9   Global Step: 96060   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:06:56,327-Speed 5980.28 samples/sec   Loss 8.3804   LearningRate 0.1423   Epoch: 9   Global Step: 96070   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:07:03,181-Speed 5977.71 samples/sec   Loss 8.3786   LearningRate 0.1422   Epoch: 9   Global Step: 96080   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:07:10,028-Speed 5982.41 samples/sec   Loss 8.3455   LearningRate 0.1422   Epoch: 9   Global Step: 96090   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:07:16,876-Speed 5982.97 samples/sec   Loss 8.4753   LearningRate 0.1422   Epoch: 9   Global Step: 96100   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:07:23,718-Speed 5987.42 samples/sec   Loss 8.4000   LearningRate 0.1422   Epoch: 9   Global Step: 96110   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 15:07:30,563-Speed 5984.51 samples/sec   Loss 8.3709   LearningRate 0.1421   Epoch: 9   Global Step: 96120   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 15:07:37,422-Speed 5973.33 samples/sec   Loss 8.3593   LearningRate 0.1421   Epoch: 9   Global Step: 96130   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 15:07:44,301-Speed 5954.77 samples/sec   Loss 8.4109   LearningRate 0.1421   Epoch: 9   Global Step: 96140   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 15:07:51,169-Speed 5965.35 samples/sec   Loss 8.3501   LearningRate 0.1421   Epoch: 9   Global Step: 96150   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 15:07:58,032-Speed 5968.64 samples/sec   Loss 8.4638   LearningRate 0.1420   Epoch: 9   Global Step: 96160   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 15:08:04,896-Speed 5968.84 samples/sec   Loss 8.3681   LearningRate 0.1420   Epoch: 9   Global Step: 96170   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 15:08:11,760-Speed 5969.42 samples/sec   Loss 8.3789   LearningRate 0.1420   Epoch: 9   Global Step: 96180   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 15:08:18,615-Speed 5976.75 samples/sec   Loss 8.3841   LearningRate 0.1420   Epoch: 9   Global Step: 96190   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 15:08:25,495-Speed 5953.93 samples/sec   Loss 8.3360   LearningRate 0.1419   Epoch: 9   Global Step: 96200   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 15:08:32,350-Speed 5976.91 samples/sec   Loss 8.3489   LearningRate 0.1419   Epoch: 9   Global Step: 96210   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:08:39,217-Speed 5966.38 samples/sec   Loss 8.3010   LearningRate 0.1419   Epoch: 9   Global Step: 96220   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:08:46,082-Speed 5967.61 samples/sec   Loss 8.3657   LearningRate 0.1419   Epoch: 9   Global Step: 96230   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:08:52,937-Speed 5975.77 samples/sec   Loss 8.4454   LearningRate 0.1418   Epoch: 9   Global Step: 96240   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:08:59,861-Speed 5917.48 samples/sec   Loss 8.3511   LearningRate 0.1418   Epoch: 9   Global Step: 96250   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:09:06,745-Speed 5950.94 samples/sec   Loss 8.3929   LearningRate 0.1418   Epoch: 9   Global Step: 96260   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:09:13,616-Speed 5962.41 samples/sec   Loss 8.3657   LearningRate 0.1418   Epoch: 9   Global Step: 96270   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:09:20,468-Speed 5979.67 samples/sec   Loss 8.4458   LearningRate 0.1417   Epoch: 9   Global Step: 96280   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 15:09:27,335-Speed 5965.39 samples/sec   Loss 8.3895   LearningRate 0.1417   Epoch: 9   Global Step: 96290   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 15:09:34,194-Speed 5973.07 samples/sec   Loss 8.3733   LearningRate 0.1417   Epoch: 9   Global Step: 96300   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 15:09:41,087-Speed 5943.84 samples/sec   Loss 8.4680   LearningRate 0.1417   Epoch: 9   Global Step: 96310   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 15:09:47,961-Speed 5959.43 samples/sec   Loss 8.4305   LearningRate 0.1416   Epoch: 9   Global Step: 96320   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 15:09:54,854-Speed 5943.81 samples/sec   Loss 8.4254   LearningRate 0.1416   Epoch: 9   Global Step: 96330   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 15:10:01,707-Speed 5978.38 samples/sec   Loss 8.3950   LearningRate 0.1416   Epoch: 9   Global Step: 96340   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 15:10:08,580-Speed 5960.87 samples/sec   Loss 8.3683   LearningRate 0.1416   Epoch: 9   Global Step: 96350   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 15:10:15,450-Speed 5963.64 samples/sec   Loss 8.3843   LearningRate 0.1415   Epoch: 9   Global Step: 96360   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 15:10:22,301-Speed 5982.30 samples/sec   Loss 8.3933   LearningRate 0.1415   Epoch: 9   Global Step: 96370   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-01-08 15:10:29,173-Speed 5960.76 samples/sec   Loss 8.3623   LearningRate 0.1415   Epoch: 9   Global Step: 96380   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:10:36,023-Speed 5985.38 samples/sec   Loss 8.3621   LearningRate 0.1415   Epoch: 9   Global Step: 96390   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:10:42,880-Speed 5973.61 samples/sec   Loss 8.4056   LearningRate 0.1414   Epoch: 9   Global Step: 96400   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:10:49,731-Speed 5979.78 samples/sec   Loss 8.3184   LearningRate 0.1414   Epoch: 9   Global Step: 96410   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:10:56,581-Speed 5980.18 samples/sec   Loss 8.3489   LearningRate 0.1414   Epoch: 9   Global Step: 96420   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:11:03,429-Speed 5982.65 samples/sec   Loss 8.3755   LearningRate 0.1414   Epoch: 9   Global Step: 96430   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:11:10,284-Speed 5977.31 samples/sec   Loss 8.3948   LearningRate 0.1413   Epoch: 9   Global Step: 96440   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:11:17,167-Speed 5952.14 samples/sec   Loss 8.3987   LearningRate 0.1413   Epoch: 9   Global Step: 96450   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:11:24,023-Speed 5975.18 samples/sec   Loss 8.3968   LearningRate 0.1413   Epoch: 9   Global Step: 96460   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:11:30,903-Speed 5954.99 samples/sec   Loss 8.4277   LearningRate 0.1412   Epoch: 9   Global Step: 96470   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-01-08 15:11:37,781-Speed 5956.76 samples/sec   Loss 8.3924   LearningRate 0.1412   Epoch: 9   Global Step: 96480   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:11:44,632-Speed 5979.62 samples/sec   Loss 8.4070   LearningRate 0.1412   Epoch: 9   Global Step: 96490   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:11:51,499-Speed 5966.20 samples/sec   Loss 8.3515   LearningRate 0.1412   Epoch: 9   Global Step: 96500   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:11:58,358-Speed 5980.54 samples/sec   Loss 8.3485   LearningRate 0.1411   Epoch: 9   Global Step: 96510   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:12:05,216-Speed 5973.57 samples/sec   Loss 8.3403   LearningRate 0.1411   Epoch: 9   Global Step: 96520   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:12:12,090-Speed 5959.81 samples/sec   Loss 8.3820   LearningRate 0.1411   Epoch: 9   Global Step: 96530   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:12:18,954-Speed 5968.18 samples/sec   Loss 8.3230   LearningRate 0.1411   Epoch: 9   Global Step: 96540   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:12:25,834-Speed 5955.09 samples/sec   Loss 8.2899   LearningRate 0.1410   Epoch: 9   Global Step: 96550   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:12:32,689-Speed 5976.28 samples/sec   Loss 8.3845   LearningRate 0.1410   Epoch: 9   Global Step: 96560   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:12:39,560-Speed 5962.86 samples/sec   Loss 8.3024   LearningRate 0.1410   Epoch: 9   Global Step: 96570   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:12:46,426-Speed 5966.46 samples/sec   Loss 8.3523   LearningRate 0.1410   Epoch: 9   Global Step: 96580   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:12:53,280-Speed 5977.59 samples/sec   Loss 8.3950   LearningRate 0.1409   Epoch: 9   Global Step: 96590   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:13:00,140-Speed 5973.39 samples/sec   Loss 8.3815   LearningRate 0.1409   Epoch: 9   Global Step: 96600   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:13:06,992-Speed 5979.60 samples/sec   Loss 8.3813   LearningRate 0.1409   Epoch: 9   Global Step: 96610   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:13:13,877-Speed 5959.61 samples/sec   Loss 8.3583   LearningRate 0.1409   Epoch: 9   Global Step: 96620   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:13:20,744-Speed 5965.39 samples/sec   Loss 8.3729   LearningRate 0.1408   Epoch: 9   Global Step: 96630   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:13:27,597-Speed 5977.91 samples/sec   Loss 8.3514   LearningRate 0.1408   Epoch: 9   Global Step: 96640   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:13:34,463-Speed 5966.45 samples/sec   Loss 8.3170   LearningRate 0.1408   Epoch: 9   Global Step: 96650   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:13:41,330-Speed 5965.41 samples/sec   Loss 8.4102   LearningRate 0.1408   Epoch: 9   Global Step: 96660   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:13:48,197-Speed 5966.27 samples/sec   Loss 8.3227   LearningRate 0.1407   Epoch: 9   Global Step: 96670   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:13:55,065-Speed 5965.22 samples/sec   Loss 8.3194   LearningRate 0.1407   Epoch: 9   Global Step: 96680   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:14:01,924-Speed 5972.49 samples/sec   Loss 8.3454   LearningRate 0.1407   Epoch: 9   Global Step: 96690   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:14:08,794-Speed 5966.26 samples/sec   Loss 8.3308   LearningRate 0.1407   Epoch: 9   Global Step: 96700   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:14:15,654-Speed 5972.74 samples/sec   Loss 8.3379   LearningRate 0.1406   Epoch: 9   Global Step: 96710   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:14:22,510-Speed 5974.86 samples/sec   Loss 8.3213   LearningRate 0.1406   Epoch: 9   Global Step: 96720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:14:29,401-Speed 5945.22 samples/sec   Loss 8.3437   LearningRate 0.1406   Epoch: 9   Global Step: 96730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:14:36,260-Speed 5973.24 samples/sec   Loss 8.3192   LearningRate 0.1406   Epoch: 9   Global Step: 96740   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:14:43,151-Speed 5945.23 samples/sec   Loss 8.3132   LearningRate 0.1405   Epoch: 9   Global Step: 96750   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:14:50,014-Speed 5969.90 samples/sec   Loss 8.3918   LearningRate 0.1405   Epoch: 9   Global Step: 96760   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:14:56,879-Speed 5967.01 samples/sec   Loss 8.2619   LearningRate 0.1405   Epoch: 9   Global Step: 96770   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:15:03,732-Speed 5978.13 samples/sec   Loss 8.3256   LearningRate 0.1405   Epoch: 9   Global Step: 96780   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:15:10,584-Speed 5978.72 samples/sec   Loss 8.3987   LearningRate 0.1404   Epoch: 9   Global Step: 96790   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:15:17,447-Speed 5970.66 samples/sec   Loss 8.3794   LearningRate 0.1404   Epoch: 9   Global Step: 96800   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:15:24,302-Speed 5975.36 samples/sec   Loss 8.3483   LearningRate 0.1404   Epoch: 9   Global Step: 96810   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:15:31,149-Speed 5983.73 samples/sec   Loss 8.2937   LearningRate 0.1404   Epoch: 9   Global Step: 96820   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:15:38,005-Speed 5975.25 samples/sec   Loss 8.3325   LearningRate 0.1403   Epoch: 9   Global Step: 96830   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:15:44,877-Speed 5964.17 samples/sec   Loss 8.3130   LearningRate 0.1403   Epoch: 9   Global Step: 96840   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:15:51,753-Speed 5958.26 samples/sec   Loss 8.3472   LearningRate 0.1403   Epoch: 9   Global Step: 96850   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:15:58,611-Speed 5972.77 samples/sec   Loss 8.3188   LearningRate 0.1403   Epoch: 9   Global Step: 96860   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:16:05,464-Speed 5981.45 samples/sec   Loss 8.3073   LearningRate 0.1402   Epoch: 9   Global Step: 96870   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:16:12,345-Speed 5953.59 samples/sec   Loss 8.3368   LearningRate 0.1402   Epoch: 9   Global Step: 96880   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:16:19,194-Speed 5982.04 samples/sec   Loss 8.3244   LearningRate 0.1402   Epoch: 9   Global Step: 96890   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:16:26,050-Speed 5976.26 samples/sec   Loss 8.3671   LearningRate 0.1402   Epoch: 9   Global Step: 96900   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:16:32,895-Speed 5984.82 samples/sec   Loss 8.3994   LearningRate 0.1401   Epoch: 9   Global Step: 96910   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:16:39,750-Speed 5975.86 samples/sec   Loss 8.3429   LearningRate 0.1401   Epoch: 9   Global Step: 96920   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:16:46,630-Speed 5955.38 samples/sec   Loss 8.3431   LearningRate 0.1401   Epoch: 9   Global Step: 96930   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:16:53,521-Speed 5945.32 samples/sec   Loss 8.2962   LearningRate 0.1401   Epoch: 9   Global Step: 96940   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:17:00,371-Speed 5980.77 samples/sec   Loss 8.4429   LearningRate 0.1400   Epoch: 9   Global Step: 96950   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:17:07,223-Speed 5978.97 samples/sec   Loss 8.3358   LearningRate 0.1400   Epoch: 9   Global Step: 96960   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:17:14,072-Speed 5983.69 samples/sec   Loss 8.3365   LearningRate 0.1400   Epoch: 9   Global Step: 96970   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:17:20,933-Speed 5971.32 samples/sec   Loss 8.3554   LearningRate 0.1400   Epoch: 9   Global Step: 96980   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:17:27,785-Speed 5978.11 samples/sec   Loss 8.3545   LearningRate 0.1399   Epoch: 9   Global Step: 96990   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:17:34,647-Speed 5970.66 samples/sec   Loss 8.3328   LearningRate 0.1399   Epoch: 9   Global Step: 97000   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:17:41,510-Speed 5969.42 samples/sec   Loss 8.3448   LearningRate 0.1399   Epoch: 9   Global Step: 97010   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:17:48,359-Speed 5981.63 samples/sec   Loss 8.3645   LearningRate 0.1399   Epoch: 9   Global Step: 97020   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:17:55,244-Speed 5950.71 samples/sec   Loss 8.3597   LearningRate 0.1398   Epoch: 9   Global Step: 97030   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:18:02,098-Speed 5976.85 samples/sec   Loss 8.3061   LearningRate 0.1398   Epoch: 9   Global Step: 97040   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:18:08,986-Speed 5948.25 samples/sec   Loss 8.3026   LearningRate 0.1398   Epoch: 9   Global Step: 97050   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:18:15,893-Speed 5931.10 samples/sec   Loss 8.2879   LearningRate 0.1397   Epoch: 9   Global Step: 97060   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:18:22,753-Speed 5972.11 samples/sec   Loss 8.2855   LearningRate 0.1397   Epoch: 9   Global Step: 97070   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:18:29,616-Speed 5969.36 samples/sec   Loss 8.3109   LearningRate 0.1397   Epoch: 9   Global Step: 97080   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:18:36,571-Speed 5890.74 samples/sec   Loss 8.2593   LearningRate 0.1397   Epoch: 9   Global Step: 97090   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:18:43,443-Speed 5961.28 samples/sec   Loss 8.2925   LearningRate 0.1396   Epoch: 9   Global Step: 97100   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:18:50,294-Speed 5979.70 samples/sec   Loss 8.2591   LearningRate 0.1396   Epoch: 9   Global Step: 97110   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:18:57,151-Speed 5974.99 samples/sec   Loss 8.3029   LearningRate 0.1396   Epoch: 9   Global Step: 97120   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:19:04,003-Speed 5978.27 samples/sec   Loss 8.3612   LearningRate 0.1396   Epoch: 9   Global Step: 97130   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:19:10,867-Speed 5968.68 samples/sec   Loss 8.2615   LearningRate 0.1395   Epoch: 9   Global Step: 97140   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:19:17,719-Speed 5980.76 samples/sec   Loss 8.2903   LearningRate 0.1395   Epoch: 9   Global Step: 97150   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:19:24,579-Speed 5975.32 samples/sec   Loss 8.2707   LearningRate 0.1395   Epoch: 9   Global Step: 97160   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:19:31,436-Speed 5974.34 samples/sec   Loss 8.2932   LearningRate 0.1395   Epoch: 9   Global Step: 97170   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:19:38,321-Speed 5949.91 samples/sec   Loss 8.2741   LearningRate 0.1394   Epoch: 9   Global Step: 97180   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:19:45,171-Speed 5982.05 samples/sec   Loss 8.3311   LearningRate 0.1394   Epoch: 9   Global Step: 97190   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:19:52,031-Speed 5971.84 samples/sec   Loss 8.2866   LearningRate 0.1394   Epoch: 9   Global Step: 97200   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:19:58,891-Speed 5971.72 samples/sec   Loss 8.3543   LearningRate 0.1394   Epoch: 9   Global Step: 97210   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:20:05,855-Speed 5882.99 samples/sec   Loss 8.2688   LearningRate 0.1393   Epoch: 9   Global Step: 97220   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:20:12,856-Speed 5853.02 samples/sec   Loss 8.3828   LearningRate 0.1393   Epoch: 9   Global Step: 97230   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:20:19,709-Speed 5978.54 samples/sec   Loss 8.3579   LearningRate 0.1393   Epoch: 9   Global Step: 97240   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:20:26,561-Speed 5979.49 samples/sec   Loss 8.2823   LearningRate 0.1393   Epoch: 9   Global Step: 97250   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:20:33,412-Speed 5979.25 samples/sec   Loss 8.2964   LearningRate 0.1392   Epoch: 9   Global Step: 97260   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:20:40,377-Speed 5883.03 samples/sec   Loss 8.2977   LearningRate 0.1392   Epoch: 9   Global Step: 97270   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:20:47,378-Speed 5852.62 samples/sec   Loss 8.2688   LearningRate 0.1392   Epoch: 9   Global Step: 97280   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:20:54,323-Speed 5898.98 samples/sec   Loss 8.3110   LearningRate 0.1392   Epoch: 9   Global Step: 97290   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:21:01,297-Speed 5874.71 samples/sec   Loss 8.2903   LearningRate 0.1391   Epoch: 9   Global Step: 97300   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:21:08,140-Speed 5986.88 samples/sec   Loss 8.2418   LearningRate 0.1391   Epoch: 9   Global Step: 97310   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:21:14,992-Speed 5979.33 samples/sec   Loss 8.2061   LearningRate 0.1391   Epoch: 9   Global Step: 97320   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:21:21,871-Speed 5956.09 samples/sec   Loss 8.3449   LearningRate 0.1391   Epoch: 9   Global Step: 97330   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:21:28,736-Speed 5967.24 samples/sec   Loss 8.3462   LearningRate 0.1390   Epoch: 9   Global Step: 97340   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:21:35,600-Speed 5968.46 samples/sec   Loss 8.2842   LearningRate 0.1390   Epoch: 9   Global Step: 97350   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:21:42,488-Speed 5947.76 samples/sec   Loss 8.2633   LearningRate 0.1390   Epoch: 9   Global Step: 97360   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:21:49,347-Speed 5974.68 samples/sec   Loss 8.3623   LearningRate 0.1390   Epoch: 9   Global Step: 97370   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:21:56,200-Speed 5977.99 samples/sec   Loss 8.3407   LearningRate 0.1389   Epoch: 9   Global Step: 97380   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:22:03,064-Speed 5968.12 samples/sec   Loss 8.2615   LearningRate 0.1389   Epoch: 9   Global Step: 97390   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:22:09,940-Speed 5958.63 samples/sec   Loss 8.3161   LearningRate 0.1389   Epoch: 9   Global Step: 97400   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:22:16,794-Speed 5977.97 samples/sec   Loss 8.3301   LearningRate 0.1389   Epoch: 9   Global Step: 97410   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:22:23,670-Speed 5958.27 samples/sec   Loss 8.2898   LearningRate 0.1388   Epoch: 9   Global Step: 97420   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:22:30,569-Speed 5937.57 samples/sec   Loss 8.3544   LearningRate 0.1388   Epoch: 9   Global Step: 97430   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:22:37,427-Speed 5974.32 samples/sec   Loss 8.2872   LearningRate 0.1388   Epoch: 9   Global Step: 97440   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:22:44,282-Speed 5976.98 samples/sec   Loss 8.2930   LearningRate 0.1388   Epoch: 9   Global Step: 97450   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:22:51,144-Speed 5973.38 samples/sec   Loss 8.3039   LearningRate 0.1387   Epoch: 9   Global Step: 97460   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:22:58,000-Speed 5975.18 samples/sec   Loss 8.2258   LearningRate 0.1387   Epoch: 9   Global Step: 97470   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:23:04,861-Speed 5970.47 samples/sec   Loss 8.2841   LearningRate 0.1387   Epoch: 9   Global Step: 97480   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:23:11,710-Speed 5981.84 samples/sec   Loss 8.2683   LearningRate 0.1387   Epoch: 9   Global Step: 97490   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:23:18,569-Speed 5972.91 samples/sec   Loss 8.3439   LearningRate 0.1386   Epoch: 9   Global Step: 97500   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:23:25,424-Speed 5976.06 samples/sec   Loss 8.3595   LearningRate 0.1386   Epoch: 9   Global Step: 97510   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:23:32,304-Speed 5957.13 samples/sec   Loss 8.2271   LearningRate 0.1386   Epoch: 9   Global Step: 97520   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:23:39,172-Speed 5964.84 samples/sec   Loss 8.3356   LearningRate 0.1386   Epoch: 9   Global Step: 97530   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:23:46,019-Speed 5983.37 samples/sec   Loss 8.2181   LearningRate 0.1385   Epoch: 9   Global Step: 97540   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:23:52,880-Speed 5971.03 samples/sec   Loss 8.2866   LearningRate 0.1385   Epoch: 9   Global Step: 97550   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:23:59,735-Speed 5976.64 samples/sec   Loss 8.3274   LearningRate 0.1385   Epoch: 9   Global Step: 97560   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:24:06,604-Speed 5964.74 samples/sec   Loss 8.2674   LearningRate 0.1385   Epoch: 9   Global Step: 97570   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:24:13,467-Speed 5969.23 samples/sec   Loss 8.2679   LearningRate 0.1384   Epoch: 9   Global Step: 97580   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:24:20,332-Speed 5969.69 samples/sec   Loss 8.2621   LearningRate 0.1384   Epoch: 9   Global Step: 97590   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:24:27,178-Speed 5983.84 samples/sec   Loss 8.3020   LearningRate 0.1384   Epoch: 9   Global Step: 97600   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:24:34,032-Speed 5977.37 samples/sec   Loss 8.2743   LearningRate 0.1384   Epoch: 9   Global Step: 97610   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:24:40,898-Speed 5966.79 samples/sec   Loss 8.3558   LearningRate 0.1383   Epoch: 9   Global Step: 97620   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:24:47,761-Speed 5969.40 samples/sec   Loss 8.3771   LearningRate 0.1383   Epoch: 9   Global Step: 97630   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:24:54,621-Speed 5971.37 samples/sec   Loss 8.2599   LearningRate 0.1383   Epoch: 9   Global Step: 97640   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:25:01,493-Speed 5962.35 samples/sec   Loss 8.2968   LearningRate 0.1383   Epoch: 9   Global Step: 97650   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:25:08,374-Speed 5953.46 samples/sec   Loss 8.2643   LearningRate 0.1382   Epoch: 9   Global Step: 97660   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:25:15,220-Speed 5983.73 samples/sec   Loss 8.2625   LearningRate 0.1382   Epoch: 9   Global Step: 97670   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:25:22,072-Speed 5979.33 samples/sec   Loss 8.2031   LearningRate 0.1382   Epoch: 9   Global Step: 97680   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:25:28,945-Speed 5960.34 samples/sec   Loss 8.2362   LearningRate 0.1382   Epoch: 9   Global Step: 97690   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:25:35,852-Speed 5931.80 samples/sec   Loss 8.2200   LearningRate 0.1381   Epoch: 9   Global Step: 97700   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:25:42,705-Speed 5978.16 samples/sec   Loss 8.2570   LearningRate 0.1381   Epoch: 9   Global Step: 97710   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:25:49,583-Speed 5956.51 samples/sec   Loss 8.3377   LearningRate 0.1381   Epoch: 9   Global Step: 97720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-01-08 15:25:56,452-Speed 5964.12 samples/sec   Loss 8.2374   LearningRate 0.1381   Epoch: 9   Global Step: 97730   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-01-08 15:26:03,314-Speed 5970.36 samples/sec   Loss 8.3224   LearningRate 0.1380   Epoch: 9   Global Step: 97740   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 15:26:10,172-Speed 5976.45 samples/sec   Loss 8.2956   LearningRate 0.1380   Epoch: 9   Global Step: 97750   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:26:17,017-Speed 5984.73 samples/sec   Loss 8.2910   LearningRate 0.1380   Epoch: 9   Global Step: 97760   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:26:23,879-Speed 5970.15 samples/sec   Loss 8.2502   LearningRate 0.1380   Epoch: 9   Global Step: 97770   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:26:30,738-Speed 5973.49 samples/sec   Loss 8.2795   LearningRate 0.1379   Epoch: 9   Global Step: 97780   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:26:37,603-Speed 5968.89 samples/sec   Loss 8.2746   LearningRate 0.1379   Epoch: 9   Global Step: 97790   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:26:44,450-Speed 5983.36 samples/sec   Loss 8.2756   LearningRate 0.1379   Epoch: 9   Global Step: 97800   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:26:51,316-Speed 5967.44 samples/sec   Loss 8.2783   LearningRate 0.1379   Epoch: 9   Global Step: 97810   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:26:58,182-Speed 5967.03 samples/sec   Loss 8.2767   LearningRate 0.1378   Epoch: 9   Global Step: 97820   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:27:05,032-Speed 5981.54 samples/sec   Loss 8.2690   LearningRate 0.1378   Epoch: 9   Global Step: 97830   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:27:11,887-Speed 5976.28 samples/sec   Loss 8.2621   LearningRate 0.1378   Epoch: 9   Global Step: 97840   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:27:18,750-Speed 5968.78 samples/sec   Loss 8.2846   LearningRate 0.1378   Epoch: 9   Global Step: 97850   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 15:27:25,605-Speed 5977.21 samples/sec   Loss 8.2962   LearningRate 0.1377   Epoch: 9   Global Step: 97860   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:27:32,461-Speed 5975.23 samples/sec   Loss 8.2346   LearningRate 0.1377   Epoch: 9   Global Step: 97870   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:27:39,389-Speed 5913.59 samples/sec   Loss 8.2725   LearningRate 0.1377   Epoch: 9   Global Step: 97880   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:27:46,245-Speed 5975.92 samples/sec   Loss 8.2901   LearningRate 0.1377   Epoch: 9   Global Step: 97890   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:27:53,095-Speed 5980.17 samples/sec   Loss 8.3533   LearningRate 0.1376   Epoch: 9   Global Step: 97900   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:27:59,957-Speed 5970.12 samples/sec   Loss 8.2865   LearningRate 0.1376   Epoch: 9   Global Step: 97910   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:28:06,858-Speed 5938.03 samples/sec   Loss 8.2713   LearningRate 0.1376   Epoch: 9   Global Step: 97920   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:28:13,755-Speed 5940.17 samples/sec   Loss 8.2311   LearningRate 0.1376   Epoch: 9   Global Step: 97930   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:28:20,611-Speed 5975.68 samples/sec   Loss 8.2068   LearningRate 0.1375   Epoch: 9   Global Step: 97940   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:28:27,501-Speed 5945.60 samples/sec   Loss 8.2877   LearningRate 0.1375   Epoch: 9   Global Step: 97950   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:28:34,391-Speed 5945.91 samples/sec   Loss 8.2512   LearningRate 0.1375   Epoch: 9   Global Step: 97960   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:28:41,247-Speed 5976.03 samples/sec   Loss 8.2536   LearningRate 0.1375   Epoch: 9   Global Step: 97970   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:28:48,098-Speed 5979.87 samples/sec   Loss 8.2541   LearningRate 0.1374   Epoch: 9   Global Step: 97980   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:28:54,968-Speed 5963.13 samples/sec   Loss 8.2490   LearningRate 0.1374   Epoch: 9   Global Step: 97990   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:29:01,827-Speed 5973.38 samples/sec   Loss 8.1416   LearningRate 0.1374   Epoch: 9   Global Step: 98000   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:29:08,692-Speed 5967.58 samples/sec   Loss 8.2225   LearningRate 0.1374   Epoch: 9   Global Step: 98010   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:29:15,538-Speed 5983.84 samples/sec   Loss 8.3038   LearningRate 0.1373   Epoch: 9   Global Step: 98020   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:29:22,406-Speed 5965.71 samples/sec   Loss 8.2993   LearningRate 0.1373   Epoch: 9   Global Step: 98030   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:29:29,279-Speed 5960.70 samples/sec   Loss 8.2742   LearningRate 0.1373   Epoch: 9   Global Step: 98040   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:29:36,139-Speed 5972.10 samples/sec   Loss 8.2663   LearningRate 0.1373   Epoch: 9   Global Step: 98050   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:29:43,037-Speed 5940.05 samples/sec   Loss 8.2489   LearningRate 0.1372   Epoch: 9   Global Step: 98060   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 15:29:49,894-Speed 5974.44 samples/sec   Loss 8.2492   LearningRate 0.1372   Epoch: 9   Global Step: 98070   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 15:29:56,749-Speed 5978.29 samples/sec   Loss 8.2580   LearningRate 0.1372   Epoch: 9   Global Step: 98080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:30:03,621-Speed 5962.41 samples/sec   Loss 8.2940   LearningRate 0.1372   Epoch: 9   Global Step: 98090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:30:10,487-Speed 5966.70 samples/sec   Loss 8.2041   LearningRate 0.1371   Epoch: 9   Global Step: 98100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:30:17,377-Speed 5945.87 samples/sec   Loss 8.2132   LearningRate 0.1371   Epoch: 9   Global Step: 98110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:30:24,228-Speed 5980.08 samples/sec   Loss 8.2686   LearningRate 0.1371   Epoch: 9   Global Step: 98120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:30:31,086-Speed 5973.91 samples/sec   Loss 8.2292   LearningRate 0.1371   Epoch: 9   Global Step: 98130   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:30:37,990-Speed 5934.12 samples/sec   Loss 8.2460   LearningRate 0.1370   Epoch: 9   Global Step: 98140   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:30:44,862-Speed 5961.69 samples/sec   Loss 8.2650   LearningRate 0.1370   Epoch: 9   Global Step: 98150   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:30:51,733-Speed 5962.65 samples/sec   Loss 8.1891   LearningRate 0.1370   Epoch: 9   Global Step: 98160   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:30:58,585-Speed 5979.49 samples/sec   Loss 8.3401   LearningRate 0.1370   Epoch: 9   Global Step: 98170   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:31:05,460-Speed 5959.18 samples/sec   Loss 8.2303   LearningRate 0.1369   Epoch: 9   Global Step: 98180   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 15:31:12,325-Speed 5968.23 samples/sec   Loss 8.2491   LearningRate 0.1369   Epoch: 9   Global Step: 98190   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:31:19,180-Speed 5975.49 samples/sec   Loss 8.3168   LearningRate 0.1369   Epoch: 9   Global Step: 98200   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:31:26,049-Speed 5964.99 samples/sec   Loss 8.2436   LearningRate 0.1369   Epoch: 9   Global Step: 98210   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:31:32,912-Speed 5969.59 samples/sec   Loss 8.2803   LearningRate 0.1368   Epoch: 9   Global Step: 98220   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:31:39,778-Speed 5966.69 samples/sec   Loss 8.2225   LearningRate 0.1368   Epoch: 9   Global Step: 98230   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:31:46,644-Speed 5967.04 samples/sec   Loss 8.2615   LearningRate 0.1368   Epoch: 9   Global Step: 98240   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:31:53,495-Speed 5979.85 samples/sec   Loss 8.2266   LearningRate 0.1368   Epoch: 9   Global Step: 98250   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:32:00,366-Speed 5962.65 samples/sec   Loss 8.2555   LearningRate 0.1367   Epoch: 9   Global Step: 98260   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:32:07,253-Speed 5949.47 samples/sec   Loss 8.2067   LearningRate 0.1367   Epoch: 9   Global Step: 98270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:32:14,109-Speed 5975.48 samples/sec   Loss 8.2822   LearningRate 0.1367   Epoch: 9   Global Step: 98280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:32:20,973-Speed 5970.20 samples/sec   Loss 8.1825   LearningRate 0.1367   Epoch: 9   Global Step: 98290   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:32:27,870-Speed 5939.86 samples/sec   Loss 8.2746   LearningRate 0.1366   Epoch: 9   Global Step: 98300   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:32:34,745-Speed 5958.54 samples/sec   Loss 8.2554   LearningRate 0.1366   Epoch: 9   Global Step: 98310   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:32:41,593-Speed 5983.12 samples/sec   Loss 8.2397   LearningRate 0.1366   Epoch: 9   Global Step: 98320   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:32:48,511-Speed 5923.02 samples/sec   Loss 8.2082   LearningRate 0.1366   Epoch: 9   Global Step: 98330   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:32:55,399-Speed 5947.60 samples/sec   Loss 8.1974   LearningRate 0.1365   Epoch: 9   Global Step: 98340   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:33:02,288-Speed 5946.65 samples/sec   Loss 8.1584   LearningRate 0.1365   Epoch: 9   Global Step: 98350   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:33:09,147-Speed 5973.67 samples/sec   Loss 8.2138   LearningRate 0.1365   Epoch: 9   Global Step: 98360   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:33:15,991-Speed 5985.85 samples/sec   Loss 8.2126   LearningRate 0.1365   Epoch: 9   Global Step: 98370   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:33:22,871-Speed 5953.88 samples/sec   Loss 8.1624   LearningRate 0.1364   Epoch: 9   Global Step: 98380   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:33:29,736-Speed 5967.69 samples/sec   Loss 8.2622   LearningRate 0.1364   Epoch: 9   Global Step: 98390   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:33:36,601-Speed 5968.11 samples/sec   Loss 8.2412   LearningRate 0.1364   Epoch: 9   Global Step: 98400   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:33:43,462-Speed 5970.25 samples/sec   Loss 8.2111   LearningRate 0.1363   Epoch: 9   Global Step: 98410   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:33:50,350-Speed 5949.52 samples/sec   Loss 8.2805   LearningRate 0.1363   Epoch: 9   Global Step: 98420   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:33:57,191-Speed 5988.40 samples/sec   Loss 8.2371   LearningRate 0.1363   Epoch: 9   Global Step: 98430   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:34:04,051-Speed 5971.50 samples/sec   Loss 8.1165   LearningRate 0.1363   Epoch: 9   Global Step: 98440   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:34:10,897-Speed 5984.77 samples/sec   Loss 8.2776   LearningRate 0.1362   Epoch: 9   Global Step: 98450   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:34:17,762-Speed 5967.33 samples/sec   Loss 8.1864   LearningRate 0.1362   Epoch: 9   Global Step: 98460   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:34:24,640-Speed 5956.68 samples/sec   Loss 8.1777   LearningRate 0.1362   Epoch: 9   Global Step: 98470   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:34:31,520-Speed 5953.93 samples/sec   Loss 8.2214   LearningRate 0.1362   Epoch: 9   Global Step: 98480   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:34:38,363-Speed 5986.84 samples/sec   Loss 8.2398   LearningRate 0.1361   Epoch: 9   Global Step: 98490   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:34:45,216-Speed 5977.85 samples/sec   Loss 8.1783   LearningRate 0.1361   Epoch: 9   Global Step: 98500   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:34:52,089-Speed 5960.63 samples/sec   Loss 8.1769   LearningRate 0.1361   Epoch: 9   Global Step: 98510   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:34:58,942-Speed 5977.55 samples/sec   Loss 8.2323   LearningRate 0.1361   Epoch: 9   Global Step: 98520   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:35:05,804-Speed 5970.55 samples/sec   Loss 8.2366   LearningRate 0.1360   Epoch: 9   Global Step: 98530   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:35:12,650-Speed 5984.47 samples/sec   Loss 8.2106   LearningRate 0.1360   Epoch: 9   Global Step: 98540   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:35:19,529-Speed 5955.87 samples/sec   Loss 8.1485   LearningRate 0.1360   Epoch: 9   Global Step: 98550   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:35:26,385-Speed 5975.31 samples/sec   Loss 8.1816   LearningRate 0.1360   Epoch: 9   Global Step: 98560   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:35:33,225-Speed 5989.37 samples/sec   Loss 8.2244   LearningRate 0.1359   Epoch: 9   Global Step: 98570   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:35:40,069-Speed 5985.21 samples/sec   Loss 8.2433   LearningRate 0.1359   Epoch: 9   Global Step: 98580   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:35:46,936-Speed 5965.70 samples/sec   Loss 8.2864   LearningRate 0.1359   Epoch: 9   Global Step: 98590   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:35:53,817-Speed 5954.46 samples/sec   Loss 8.2135   LearningRate 0.1359   Epoch: 9   Global Step: 98600   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:36:00,699-Speed 5952.65 samples/sec   Loss 8.2224   LearningRate 0.1358   Epoch: 9   Global Step: 98610   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:36:07,595-Speed 5940.83 samples/sec   Loss 8.2727   LearningRate 0.1358   Epoch: 9   Global Step: 98620   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:36:14,454-Speed 5975.36 samples/sec   Loss 8.2061   LearningRate 0.1358   Epoch: 9   Global Step: 98630   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:36:21,301-Speed 5983.66 samples/sec   Loss 8.1858   LearningRate 0.1358   Epoch: 9   Global Step: 98640   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:36:28,158-Speed 5973.94 samples/sec   Loss 8.2357   LearningRate 0.1358   Epoch: 9   Global Step: 98650   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:36:35,012-Speed 5977.57 samples/sec   Loss 8.1357   LearningRate 0.1357   Epoch: 9   Global Step: 98660   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:36:41,870-Speed 5973.69 samples/sec   Loss 8.1950   LearningRate 0.1357   Epoch: 9   Global Step: 98670   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:36:48,747-Speed 5957.08 samples/sec   Loss 8.1684   LearningRate 0.1357   Epoch: 9   Global Step: 98680   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:36:55,599-Speed 5979.10 samples/sec   Loss 8.1856   LearningRate 0.1357   Epoch: 9   Global Step: 98690   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 15:37:02,481-Speed 5953.16 samples/sec   Loss 8.1883   LearningRate 0.1356   Epoch: 9   Global Step: 98700   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 15:37:09,334-Speed 5978.23 samples/sec   Loss 8.1893   LearningRate 0.1356   Epoch: 9   Global Step: 98710   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:37:16,206-Speed 5961.03 samples/sec   Loss 8.2013   LearningRate 0.1356   Epoch: 9   Global Step: 98720   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:37:23,059-Speed 5978.33 samples/sec   Loss 8.1913   LearningRate 0.1356   Epoch: 9   Global Step: 98730   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:37:29,915-Speed 5975.64 samples/sec   Loss 8.2242   LearningRate 0.1355   Epoch: 9   Global Step: 98740   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:37:36,775-Speed 5972.38 samples/sec   Loss 8.2279   LearningRate 0.1355   Epoch: 9   Global Step: 98750   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:37:43,629-Speed 5976.44 samples/sec   Loss 8.2080   LearningRate 0.1355   Epoch: 9   Global Step: 98760   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:37:50,495-Speed 5966.82 samples/sec   Loss 8.2050   LearningRate 0.1355   Epoch: 9   Global Step: 98770   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:37:57,370-Speed 5959.58 samples/sec   Loss 8.1973   LearningRate 0.1354   Epoch: 9   Global Step: 98780   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:38:04,244-Speed 5959.48 samples/sec   Loss 8.1549   LearningRate 0.1354   Epoch: 9   Global Step: 98790   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:38:11,098-Speed 5977.63 samples/sec   Loss 8.2171   LearningRate 0.1354   Epoch: 9   Global Step: 98800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:38:17,960-Speed 5969.79 samples/sec   Loss 8.2406   LearningRate 0.1354   Epoch: 9   Global Step: 98810   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:38:24,811-Speed 5980.17 samples/sec   Loss 8.2407   LearningRate 0.1353   Epoch: 9   Global Step: 98820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:38:31,675-Speed 5969.19 samples/sec   Loss 8.2067   LearningRate 0.1353   Epoch: 9   Global Step: 98830   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:38:38,524-Speed 5981.07 samples/sec   Loss 8.2349   LearningRate 0.1353   Epoch: 9   Global Step: 98840   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:38:45,401-Speed 5960.47 samples/sec   Loss 8.2162   LearningRate 0.1353   Epoch: 9   Global Step: 98850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:38:52,256-Speed 5975.96 samples/sec   Loss 8.2269   LearningRate 0.1352   Epoch: 9   Global Step: 98860   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:38:59,139-Speed 5952.75 samples/sec   Loss 8.2069   LearningRate 0.1352   Epoch: 9   Global Step: 98870   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:39:05,988-Speed 5981.21 samples/sec   Loss 8.1926   LearningRate 0.1352   Epoch: 9   Global Step: 98880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:39:12,871-Speed 5952.21 samples/sec   Loss 8.1943   LearningRate 0.1352   Epoch: 9   Global Step: 98890   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:39:19,721-Speed 5983.43 samples/sec   Loss 8.2658   LearningRate 0.1351   Epoch: 9   Global Step: 98900   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:39:26,590-Speed 5963.93 samples/sec   Loss 8.1996   LearningRate 0.1351   Epoch: 9   Global Step: 98910   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:39:33,446-Speed 5975.50 samples/sec   Loss 8.2428   LearningRate 0.1351   Epoch: 9   Global Step: 98920   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:39:40,300-Speed 5977.27 samples/sec   Loss 8.2411   LearningRate 0.1351   Epoch: 9   Global Step: 98930   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:39:47,151-Speed 5978.84 samples/sec   Loss 8.1901   LearningRate 0.1350   Epoch: 9   Global Step: 98940   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:39:54,002-Speed 5981.23 samples/sec   Loss 8.1448   LearningRate 0.1350   Epoch: 9   Global Step: 98950   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:40:00,888-Speed 5949.20 samples/sec   Loss 8.1953   LearningRate 0.1350   Epoch: 9   Global Step: 98960   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:40:07,728-Speed 5988.71 samples/sec   Loss 8.2079   LearningRate 0.1350   Epoch: 9   Global Step: 98970   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:40:14,597-Speed 5964.48 samples/sec   Loss 8.1665   LearningRate 0.1349   Epoch: 9   Global Step: 98980   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:40:21,478-Speed 5955.95 samples/sec   Loss 8.2327   LearningRate 0.1349   Epoch: 9   Global Step: 98990   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:40:28,353-Speed 5976.17 samples/sec   Loss 8.1561   LearningRate 0.1349   Epoch: 9   Global Step: 99000   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:40:35,214-Speed 5970.29 samples/sec   Loss 8.2375   LearningRate 0.1349   Epoch: 9   Global Step: 99010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:40:42,092-Speed 5957.22 samples/sec   Loss 8.2395   LearningRate 0.1348   Epoch: 9   Global Step: 99020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:40:48,951-Speed 5972.78 samples/sec   Loss 8.1773   LearningRate 0.1348   Epoch: 9   Global Step: 99030   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:40:55,807-Speed 5975.50 samples/sec   Loss 8.1764   LearningRate 0.1348   Epoch: 9   Global Step: 99040   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:41:02,652-Speed 5984.79 samples/sec   Loss 8.1724   LearningRate 0.1348   Epoch: 9   Global Step: 99050   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:41:09,511-Speed 5972.71 samples/sec   Loss 8.2102   LearningRate 0.1347   Epoch: 9   Global Step: 99060   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:41:16,369-Speed 5973.81 samples/sec   Loss 8.1815   LearningRate 0.1347   Epoch: 9   Global Step: 99070   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:41:23,241-Speed 5961.53 samples/sec   Loss 8.2094   LearningRate 0.1347   Epoch: 9   Global Step: 99080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:41:30,110-Speed 5964.60 samples/sec   Loss 8.1206   LearningRate 0.1347   Epoch: 9   Global Step: 99090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:41:36,975-Speed 5967.00 samples/sec   Loss 8.1377   LearningRate 0.1346   Epoch: 9   Global Step: 99100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:41:43,832-Speed 5974.86 samples/sec   Loss 8.2276   LearningRate 0.1346   Epoch: 9   Global Step: 99110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:41:50,714-Speed 5953.87 samples/sec   Loss 8.2005   LearningRate 0.1346   Epoch: 9   Global Step: 99120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:41:57,611-Speed 5940.98 samples/sec   Loss 8.1623   LearningRate 0.1346   Epoch: 9   Global Step: 99130   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:42:04,461-Speed 5980.82 samples/sec   Loss 8.1180   LearningRate 0.1345   Epoch: 9   Global Step: 99140   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:42:11,344-Speed 5954.25 samples/sec   Loss 8.1001   LearningRate 0.1345   Epoch: 9   Global Step: 99150   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:42:18,191-Speed 5983.25 samples/sec   Loss 8.2057   LearningRate 0.1345   Epoch: 9   Global Step: 99160   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:42:25,056-Speed 5967.76 samples/sec   Loss 8.1412   LearningRate 0.1345   Epoch: 9   Global Step: 99170   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 15:42:31,929-Speed 5961.25 samples/sec   Loss 8.1758   LearningRate 0.1344   Epoch: 9   Global Step: 99180   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 15:42:38,815-Speed 5949.40 samples/sec   Loss 8.2563   LearningRate 0.1344   Epoch: 9   Global Step: 99190   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 15:42:45,668-Speed 5978.54 samples/sec   Loss 8.1552   LearningRate 0.1344   Epoch: 9   Global Step: 99200   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 15:42:52,526-Speed 5973.22 samples/sec   Loss 8.1841   LearningRate 0.1344   Epoch: 9   Global Step: 99210   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 15:42:59,370-Speed 5986.98 samples/sec   Loss 8.1932   LearningRate 0.1343   Epoch: 9   Global Step: 99220   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:43:06,239-Speed 5963.79 samples/sec   Loss 8.0924   LearningRate 0.1343   Epoch: 9   Global Step: 99230   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:43:13,080-Speed 5988.26 samples/sec   Loss 8.1686   LearningRate 0.1343   Epoch: 9   Global Step: 99240   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:43:19,929-Speed 5981.52 samples/sec   Loss 8.1840   LearningRate 0.1343   Epoch: 9   Global Step: 99250   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:43:26,784-Speed 5977.10 samples/sec   Loss 8.1794   LearningRate 0.1342   Epoch: 9   Global Step: 99260   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:43:33,653-Speed 5964.17 samples/sec   Loss 8.1802   LearningRate 0.1342   Epoch: 9   Global Step: 99270   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:43:40,513-Speed 5972.34 samples/sec   Loss 8.1780   LearningRate 0.1342   Epoch: 9   Global Step: 99280   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:43:47,382-Speed 5963.71 samples/sec   Loss 8.1523   LearningRate 0.1342   Epoch: 9   Global Step: 99290   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:43:54,319-Speed 5905.61 samples/sec   Loss 8.1522   LearningRate 0.1341   Epoch: 9   Global Step: 99300   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:44:01,175-Speed 5978.13 samples/sec   Loss 8.1329   LearningRate 0.1341   Epoch: 9   Global Step: 99310   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:44:08,024-Speed 5981.97 samples/sec   Loss 8.1847   LearningRate 0.1341   Epoch: 9   Global Step: 99320   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 15:44:14,976-Speed 5893.05 samples/sec   Loss 8.1259   LearningRate 0.1341   Epoch: 9   Global Step: 99330   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:44:21,827-Speed 5979.65 samples/sec   Loss 8.1821   LearningRate 0.1340   Epoch: 9   Global Step: 99340   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:44:28,682-Speed 5976.69 samples/sec   Loss 8.1532   LearningRate 0.1340   Epoch: 9   Global Step: 99350   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:44:35,539-Speed 5974.33 samples/sec   Loss 8.1619   LearningRate 0.1340   Epoch: 9   Global Step: 99360   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:44:42,402-Speed 5969.58 samples/sec   Loss 8.1505   LearningRate 0.1340   Epoch: 9   Global Step: 99370   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:44:49,273-Speed 5962.61 samples/sec   Loss 8.1431   LearningRate 0.1339   Epoch: 9   Global Step: 99380   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:44:56,133-Speed 5971.68 samples/sec   Loss 8.1613   LearningRate 0.1339   Epoch: 9   Global Step: 99390   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:45:02,983-Speed 5980.85 samples/sec   Loss 8.1450   LearningRate 0.1339   Epoch: 9   Global Step: 99400   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:45:09,854-Speed 5962.75 samples/sec   Loss 8.1842   LearningRate 0.1339   Epoch: 9   Global Step: 99410   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:45:16,727-Speed 5960.60 samples/sec   Loss 8.1714   LearningRate 0.1338   Epoch: 9   Global Step: 99420   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:45:23,597-Speed 5965.66 samples/sec   Loss 8.1860   LearningRate 0.1338   Epoch: 9   Global Step: 99430   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 15:45:30,440-Speed 5986.69 samples/sec   Loss 8.1792   LearningRate 0.1338   Epoch: 9   Global Step: 99440   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:45:37,284-Speed 5985.23 samples/sec   Loss 8.2296   LearningRate 0.1338   Epoch: 9   Global Step: 99450   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:45:44,131-Speed 5983.51 samples/sec   Loss 8.1709   LearningRate 0.1337   Epoch: 9   Global Step: 99460   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:45:50,991-Speed 5971.32 samples/sec   Loss 8.1073   LearningRate 0.1337   Epoch: 9   Global Step: 99470   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:45:57,886-Speed 5942.70 samples/sec   Loss 8.1364   LearningRate 0.1337   Epoch: 9   Global Step: 99480   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:46:04,744-Speed 5973.85 samples/sec   Loss 8.1453   LearningRate 0.1337   Epoch: 9   Global Step: 99490   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:46:11,611-Speed 5966.05 samples/sec   Loss 8.1031   LearningRate 0.1336   Epoch: 9   Global Step: 99500   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:46:18,463-Speed 5978.44 samples/sec   Loss 8.0761   LearningRate 0.1336   Epoch: 9   Global Step: 99510   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:46:25,320-Speed 5975.49 samples/sec   Loss 8.1261   LearningRate 0.1336   Epoch: 9   Global Step: 99520   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:46:32,201-Speed 5953.94 samples/sec   Loss 8.1693   LearningRate 0.1336   Epoch: 9   Global Step: 99530   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:46:39,077-Speed 5958.70 samples/sec   Loss 8.1564   LearningRate 0.1335   Epoch: 9   Global Step: 99540   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 15:46:45,933-Speed 5975.11 samples/sec   Loss 8.1098   LearningRate 0.1335   Epoch: 9   Global Step: 99550   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 15:46:52,786-Speed 5978.68 samples/sec   Loss 8.1123   LearningRate 0.1335   Epoch: 9   Global Step: 99560   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:46:59,635-Speed 5981.06 samples/sec   Loss 8.1017   LearningRate 0.1335   Epoch: 9   Global Step: 99570   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:47:06,487-Speed 5979.35 samples/sec   Loss 8.0916   LearningRate 0.1334   Epoch: 9   Global Step: 99580   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:47:13,336-Speed 5982.01 samples/sec   Loss 8.2304   LearningRate 0.1334   Epoch: 9   Global Step: 99590   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:47:20,184-Speed 5982.54 samples/sec   Loss 8.2088   LearningRate 0.1334   Epoch: 9   Global Step: 99600   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:47:27,031-Speed 5988.06 samples/sec   Loss 8.1875   LearningRate 0.1334   Epoch: 9   Global Step: 99610   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:47:33,885-Speed 5977.16 samples/sec   Loss 8.1500   LearningRate 0.1333   Epoch: 9   Global Step: 99620   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:47:40,749-Speed 5969.17 samples/sec   Loss 8.1679   LearningRate 0.1333   Epoch: 9   Global Step: 99630   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:47:47,609-Speed 5971.89 samples/sec   Loss 8.0741   LearningRate 0.1333   Epoch: 9   Global Step: 99640   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:47:54,459-Speed 5980.34 samples/sec   Loss 8.1762   LearningRate 0.1333   Epoch: 9   Global Step: 99650   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:48:01,317-Speed 5973.16 samples/sec   Loss 8.1143   LearningRate 0.1332   Epoch: 9   Global Step: 99660   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 15:48:08,211-Speed 5943.79 samples/sec   Loss 8.1443   LearningRate 0.1332   Epoch: 9   Global Step: 99670   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:48:15,087-Speed 5957.73 samples/sec   Loss 8.1361   LearningRate 0.1332   Epoch: 9   Global Step: 99680   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:48:21,944-Speed 5974.42 samples/sec   Loss 8.1877   LearningRate 0.1332   Epoch: 9   Global Step: 99690   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:48:28,838-Speed 5943.30 samples/sec   Loss 8.1331   LearningRate 0.1331   Epoch: 9   Global Step: 99700   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:48:35,690-Speed 5978.96 samples/sec   Loss 8.1540   LearningRate 0.1331   Epoch: 9   Global Step: 99710   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:48:42,536-Speed 5983.86 samples/sec   Loss 8.1786   LearningRate 0.1331   Epoch: 9   Global Step: 99720   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:48:49,418-Speed 5952.16 samples/sec   Loss 8.1174   LearningRate 0.1331   Epoch: 9   Global Step: 99730   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:48:56,274-Speed 5975.34 samples/sec   Loss 8.1406   LearningRate 0.1330   Epoch: 9   Global Step: 99740   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:49:03,116-Speed 5987.65 samples/sec   Loss 8.1625   LearningRate 0.1330   Epoch: 9   Global Step: 99750   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:49:09,985-Speed 5964.73 samples/sec   Loss 8.1631   LearningRate 0.1330   Epoch: 9   Global Step: 99760   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:49:16,907-Speed 5918.91 samples/sec   Loss 8.2018   LearningRate 0.1330   Epoch: 9   Global Step: 99770   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 15:49:23,763-Speed 5974.93 samples/sec   Loss 8.1783   LearningRate 0.1329   Epoch: 9   Global Step: 99780   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 15:49:30,597-Speed 5995.07 samples/sec   Loss 8.1305   LearningRate 0.1329   Epoch: 9   Global Step: 99790   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:49:37,446-Speed 5980.71 samples/sec   Loss 8.0963   LearningRate 0.1329   Epoch: 9   Global Step: 99800   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:49:44,304-Speed 5974.61 samples/sec   Loss 8.2086   LearningRate 0.1329   Epoch: 9   Global Step: 99810   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:49:51,154-Speed 5980.60 samples/sec   Loss 8.1213   LearningRate 0.1328   Epoch: 9   Global Step: 99820   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:49:58,024-Speed 5963.44 samples/sec   Loss 8.1410   LearningRate 0.1328   Epoch: 9   Global Step: 99830   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:50:04,882-Speed 5973.76 samples/sec   Loss 8.1486   LearningRate 0.1328   Epoch: 9   Global Step: 99840   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:50:11,726-Speed 5985.32 samples/sec   Loss 8.1659   LearningRate 0.1328   Epoch: 9   Global Step: 99850   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:50:18,584-Speed 5974.52 samples/sec   Loss 8.1164   LearningRate 0.1327   Epoch: 9   Global Step: 99860   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:50:25,463-Speed 5955.92 samples/sec   Loss 8.1636   LearningRate 0.1327   Epoch: 9   Global Step: 99870   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:50:32,343-Speed 5954.45 samples/sec   Loss 8.0989   LearningRate 0.1327   Epoch: 9   Global Step: 99880   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:50:39,198-Speed 5976.54 samples/sec   Loss 8.0624   LearningRate 0.1327   Epoch: 9   Global Step: 99890   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 15:50:46,072-Speed 5959.99 samples/sec   Loss 8.0713   LearningRate 0.1326   Epoch: 9   Global Step: 99900   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 15:50:52,926-Speed 5977.25 samples/sec   Loss 8.1521   LearningRate 0.1326   Epoch: 9   Global Step: 99910   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:50:59,791-Speed 5967.17 samples/sec   Loss 8.1443   LearningRate 0.1326   Epoch: 9   Global Step: 99920   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:51:06,634-Speed 5986.79 samples/sec   Loss 8.0925   LearningRate 0.1326   Epoch: 9   Global Step: 99930   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:51:13,497-Speed 5969.21 samples/sec   Loss 8.0720   LearningRate 0.1325   Epoch: 9   Global Step: 99940   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:51:20,373-Speed 5958.42 samples/sec   Loss 8.1456   LearningRate 0.1325   Epoch: 9   Global Step: 99950   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:51:27,243-Speed 5963.56 samples/sec   Loss 8.1441   LearningRate 0.1325   Epoch: 9   Global Step: 99960   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:51:34,126-Speed 5952.80 samples/sec   Loss 8.1028   LearningRate 0.1325   Epoch: 9   Global Step: 99970   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:51:40,990-Speed 5968.35 samples/sec   Loss 8.1817   LearningRate 0.1324   Epoch: 9   Global Step: 99980   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:51:47,868-Speed 5956.09 samples/sec   Loss 8.0969   LearningRate 0.1324   Epoch: 9   Global Step: 99990   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:51:54,716-Speed 5982.40 samples/sec   Loss 8.1212   LearningRate 0.1324   Epoch: 9   Global Step: 100000   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:52:21,676-[lfw][100000]XNorm: 25.080002
Training: 2022-01-08 15:52:21,677-[lfw][100000]Accuracy-Flip: 0.99717+-0.00269
Training: 2022-01-08 15:52:21,678-[lfw][100000]Accuracy-Highest: 0.99750
Training: 2022-01-08 15:52:52,927-[cfp_fp][100000]XNorm: 22.028425
Training: 2022-01-08 15:52:52,928-[cfp_fp][100000]Accuracy-Flip: 0.98257+-0.00530
Training: 2022-01-08 15:52:52,929-[cfp_fp][100000]Accuracy-Highest: 0.98257
Training: 2022-01-08 15:53:19,915-[agedb_30][100000]XNorm: 24.118641
Training: 2022-01-08 15:53:19,916-[agedb_30][100000]Accuracy-Flip: 0.96950+-0.00667
Training: 2022-01-08 15:53:19,916-[agedb_30][100000]Accuracy-Highest: 0.97150
Training: 2022-01-08 15:53:26,763-Speed 445.00 samples/sec   Loss 8.1594   LearningRate 0.1324   Epoch: 9   Global Step: 100010   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 15:53:33,579-Speed 6011.99 samples/sec   Loss 8.1189   LearningRate 0.1324   Epoch: 9   Global Step: 100020   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:53:40,442-Speed 5969.54 samples/sec   Loss 8.1343   LearningRate 0.1323   Epoch: 9   Global Step: 100030   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:53:47,309-Speed 5966.83 samples/sec   Loss 8.0438   LearningRate 0.1323   Epoch: 9   Global Step: 100040   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:53:54,158-Speed 5981.65 samples/sec   Loss 8.1311   LearningRate 0.1323   Epoch: 9   Global Step: 100050   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:54:01,035-Speed 5957.72 samples/sec   Loss 8.1485   LearningRate 0.1323   Epoch: 9   Global Step: 100060   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:54:07,922-Speed 5949.20 samples/sec   Loss 8.1221   LearningRate 0.1322   Epoch: 9   Global Step: 100070   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:54:14,806-Speed 5953.32 samples/sec   Loss 8.0612   LearningRate 0.1322   Epoch: 9   Global Step: 100080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:54:21,692-Speed 5949.35 samples/sec   Loss 8.1228   LearningRate 0.1322   Epoch: 9   Global Step: 100090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:54:28,590-Speed 5952.70 samples/sec   Loss 8.1454   LearningRate 0.1322   Epoch: 9   Global Step: 100100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:54:35,495-Speed 5947.63 samples/sec   Loss 8.1453   LearningRate 0.1321   Epoch: 9   Global Step: 100110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:54:42,471-Speed 5977.76 samples/sec   Loss 8.1020   LearningRate 0.1321   Epoch: 9   Global Step: 100120   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:54:49,354-Speed 5950.98 samples/sec   Loss 8.1753   LearningRate 0.1321   Epoch: 9   Global Step: 100130   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:54:56,284-Speed 5912.21 samples/sec   Loss 8.0929   LearningRate 0.1321   Epoch: 9   Global Step: 100140   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:55:03,141-Speed 5974.75 samples/sec   Loss 8.1083   LearningRate 0.1320   Epoch: 9   Global Step: 100150   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:55:09,992-Speed 5979.86 samples/sec   Loss 8.0710   LearningRate 0.1320   Epoch: 9   Global Step: 100160   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:55:16,887-Speed 5941.16 samples/sec   Loss 8.1171   LearningRate 0.1320   Epoch: 9   Global Step: 100170   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:55:23,753-Speed 5966.27 samples/sec   Loss 8.1434   LearningRate 0.1320   Epoch: 9   Global Step: 100180   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:55:30,612-Speed 5973.15 samples/sec   Loss 8.0478   LearningRate 0.1319   Epoch: 9   Global Step: 100190   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:55:37,473-Speed 5971.12 samples/sec   Loss 8.0923   LearningRate 0.1319   Epoch: 9   Global Step: 100200   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:55:44,326-Speed 5977.81 samples/sec   Loss 8.0735   LearningRate 0.1319   Epoch: 9   Global Step: 100210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:55:51,209-Speed 5952.15 samples/sec   Loss 8.0818   LearningRate 0.1319   Epoch: 9   Global Step: 100220   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:55:58,088-Speed 5955.39 samples/sec   Loss 8.1112   LearningRate 0.1318   Epoch: 9   Global Step: 100230   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:56:04,958-Speed 5963.68 samples/sec   Loss 8.0430   LearningRate 0.1318   Epoch: 9   Global Step: 100240   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:56:11,825-Speed 5966.80 samples/sec   Loss 8.0827   LearningRate 0.1318   Epoch: 9   Global Step: 100250   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:56:18,688-Speed 5969.52 samples/sec   Loss 8.1107   LearningRate 0.1318   Epoch: 9   Global Step: 100260   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:56:25,526-Speed 5993.71 samples/sec   Loss 8.1608   LearningRate 0.1317   Epoch: 9   Global Step: 100270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:56:32,373-Speed 5983.17 samples/sec   Loss 8.1320   LearningRate 0.1317   Epoch: 9   Global Step: 100280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:56:39,239-Speed 5967.01 samples/sec   Loss 8.1750   LearningRate 0.1317   Epoch: 9   Global Step: 100290   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:56:46,100-Speed 5971.15 samples/sec   Loss 8.1075   LearningRate 0.1317   Epoch: 9   Global Step: 100300   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:56:52,957-Speed 5974.80 samples/sec   Loss 8.0190   LearningRate 0.1316   Epoch: 9   Global Step: 100310   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:56:59,804-Speed 5983.77 samples/sec   Loss 8.1246   LearningRate 0.1316   Epoch: 9   Global Step: 100320   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:57:06,685-Speed 5953.40 samples/sec   Loss 8.0684   LearningRate 0.1316   Epoch: 9   Global Step: 100330   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:57:13,541-Speed 5975.84 samples/sec   Loss 8.0488   LearningRate 0.1316   Epoch: 9   Global Step: 100340   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:57:20,409-Speed 5964.59 samples/sec   Loss 8.0747   LearningRate 0.1315   Epoch: 9   Global Step: 100350   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:57:27,289-Speed 5955.44 samples/sec   Loss 8.0770   LearningRate 0.1315   Epoch: 9   Global Step: 100360   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:57:34,150-Speed 5971.04 samples/sec   Loss 8.1299   LearningRate 0.1315   Epoch: 9   Global Step: 100370   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:57:41,009-Speed 5972.61 samples/sec   Loss 8.1465   LearningRate 0.1315   Epoch: 9   Global Step: 100380   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:57:47,861-Speed 5979.21 samples/sec   Loss 8.1514   LearningRate 0.1314   Epoch: 9   Global Step: 100390   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:57:54,710-Speed 5981.51 samples/sec   Loss 8.1138   LearningRate 0.1314   Epoch: 9   Global Step: 100400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:58:01,554-Speed 5986.08 samples/sec   Loss 8.0883   LearningRate 0.1314   Epoch: 9   Global Step: 100410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:58:08,410-Speed 5975.15 samples/sec   Loss 8.0766   LearningRate 0.1314   Epoch: 9   Global Step: 100420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:58:15,268-Speed 5973.47 samples/sec   Loss 8.0545   LearningRate 0.1313   Epoch: 9   Global Step: 100430   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:58:22,125-Speed 5974.45 samples/sec   Loss 8.1000   LearningRate 0.1313   Epoch: 9   Global Step: 100440   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:58:29,003-Speed 5956.57 samples/sec   Loss 8.0365   LearningRate 0.1313   Epoch: 9   Global Step: 100450   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:58:35,873-Speed 5963.74 samples/sec   Loss 8.0943   LearningRate 0.1313   Epoch: 9   Global Step: 100460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:58:42,733-Speed 5972.28 samples/sec   Loss 8.0646   LearningRate 0.1312   Epoch: 9   Global Step: 100470   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:58:49,592-Speed 5972.92 samples/sec   Loss 8.0641   LearningRate 0.1312   Epoch: 9   Global Step: 100480   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:58:56,451-Speed 5973.21 samples/sec   Loss 8.0334   LearningRate 0.1312   Epoch: 9   Global Step: 100490   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:59:03,318-Speed 5966.22 samples/sec   Loss 8.0887   LearningRate 0.1312   Epoch: 9   Global Step: 100500   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:59:10,167-Speed 5981.97 samples/sec   Loss 8.0572   LearningRate 0.1311   Epoch: 9   Global Step: 100510   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:59:17,032-Speed 5975.41 samples/sec   Loss 8.1135   LearningRate 0.1311   Epoch: 9   Global Step: 100520   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 15:59:23,873-Speed 5988.83 samples/sec   Loss 8.0883   LearningRate 0.1311   Epoch: 9   Global Step: 100530   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:59:30,739-Speed 5969.23 samples/sec   Loss 8.0874   LearningRate 0.1311   Epoch: 9   Global Step: 100540   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:59:37,616-Speed 5956.69 samples/sec   Loss 8.0648   LearningRate 0.1310   Epoch: 9   Global Step: 100550   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:59:44,469-Speed 5978.25 samples/sec   Loss 8.0698   LearningRate 0.1310   Epoch: 9   Global Step: 100560   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:59:51,329-Speed 5972.45 samples/sec   Loss 8.1062   LearningRate 0.1310   Epoch: 9   Global Step: 100570   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 15:59:58,193-Speed 5968.25 samples/sec   Loss 8.0687   LearningRate 0.1310   Epoch: 9   Global Step: 100580   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:00:05,070-Speed 5959.30 samples/sec   Loss 8.0641   LearningRate 0.1309   Epoch: 9   Global Step: 100590   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:00:11,922-Speed 5980.98 samples/sec   Loss 8.0442   LearningRate 0.1309   Epoch: 9   Global Step: 100600   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:00:18,782-Speed 5972.60 samples/sec   Loss 8.0250   LearningRate 0.1309   Epoch: 9   Global Step: 100610   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:00:25,631-Speed 5981.02 samples/sec   Loss 8.0493   LearningRate 0.1309   Epoch: 9   Global Step: 100620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:00:32,496-Speed 5967.51 samples/sec   Loss 8.0190   LearningRate 0.1309   Epoch: 9   Global Step: 100630   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:00:39,356-Speed 5971.83 samples/sec   Loss 8.0346   LearningRate 0.1308   Epoch: 9   Global Step: 100640   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:00:46,229-Speed 5961.73 samples/sec   Loss 8.0620   LearningRate 0.1308   Epoch: 9   Global Step: 100650   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:00:53,091-Speed 5970.37 samples/sec   Loss 8.0641   LearningRate 0.1308   Epoch: 9   Global Step: 100660   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:00:59,943-Speed 5979.01 samples/sec   Loss 8.1076   LearningRate 0.1308   Epoch: 9   Global Step: 100670   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:01:06,794-Speed 5979.60 samples/sec   Loss 8.0600   LearningRate 0.1307   Epoch: 9   Global Step: 100680   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:01:13,655-Speed 5971.05 samples/sec   Loss 8.1224   LearningRate 0.1307   Epoch: 9   Global Step: 100690   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:01:20,533-Speed 5959.54 samples/sec   Loss 8.0297   LearningRate 0.1307   Epoch: 9   Global Step: 100700   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:01:27,421-Speed 5947.74 samples/sec   Loss 8.0738   LearningRate 0.1307   Epoch: 9   Global Step: 100710   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:01:34,320-Speed 5938.83 samples/sec   Loss 8.1454   LearningRate 0.1306   Epoch: 9   Global Step: 100720   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:01:41,180-Speed 5971.42 samples/sec   Loss 8.1043   LearningRate 0.1306   Epoch: 9   Global Step: 100730   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:01:48,061-Speed 5953.99 samples/sec   Loss 8.0992   LearningRate 0.1306   Epoch: 9   Global Step: 100740   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:01:54,927-Speed 5966.70 samples/sec   Loss 8.0596   LearningRate 0.1306   Epoch: 9   Global Step: 100750   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:02:01,773-Speed 5983.75 samples/sec   Loss 8.0392   LearningRate 0.1305   Epoch: 9   Global Step: 100760   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:02:08,627-Speed 5977.13 samples/sec   Loss 8.0400   LearningRate 0.1305   Epoch: 9   Global Step: 100770   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:02:15,507-Speed 5955.29 samples/sec   Loss 8.0186   LearningRate 0.1305   Epoch: 9   Global Step: 100780   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:02:22,370-Speed 5969.27 samples/sec   Loss 8.1210   LearningRate 0.1305   Epoch: 9   Global Step: 100790   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:02:29,229-Speed 5972.79 samples/sec   Loss 8.0707   LearningRate 0.1304   Epoch: 9   Global Step: 100800   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:02:36,085-Speed 5975.28 samples/sec   Loss 8.0917   LearningRate 0.1304   Epoch: 9   Global Step: 100810   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:02:42,959-Speed 5959.68 samples/sec   Loss 8.0491   LearningRate 0.1304   Epoch: 9   Global Step: 100820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:02:49,827-Speed 5965.84 samples/sec   Loss 8.0290   LearningRate 0.1304   Epoch: 9   Global Step: 100830   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:02:56,718-Speed 5944.45 samples/sec   Loss 8.1027   LearningRate 0.1303   Epoch: 9   Global Step: 100840   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:03:03,572-Speed 5977.51 samples/sec   Loss 8.0505   LearningRate 0.1303   Epoch: 9   Global Step: 100850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:03:10,416-Speed 5985.89 samples/sec   Loss 8.0704   LearningRate 0.1303   Epoch: 9   Global Step: 100860   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:03:17,265-Speed 5984.04 samples/sec   Loss 8.0200   LearningRate 0.1303   Epoch: 9   Global Step: 100870   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:03:24,195-Speed 5911.66 samples/sec   Loss 8.0643   LearningRate 0.1302   Epoch: 9   Global Step: 100880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:03:31,122-Speed 5914.24 samples/sec   Loss 7.9500   LearningRate 0.1302   Epoch: 9   Global Step: 100890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:03:38,011-Speed 5948.57 samples/sec   Loss 8.0554   LearningRate 0.1302   Epoch: 9   Global Step: 100900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:03:44,874-Speed 5969.40 samples/sec   Loss 8.0805   LearningRate 0.1302   Epoch: 9   Global Step: 100910   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:03:51,838-Speed 5882.93 samples/sec   Loss 8.0960   LearningRate 0.1301   Epoch: 9   Global Step: 100920   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:03:58,703-Speed 5968.09 samples/sec   Loss 8.0876   LearningRate 0.1301   Epoch: 9   Global Step: 100930   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:04:05,573-Speed 5963.54 samples/sec   Loss 8.0335   LearningRate 0.1301   Epoch: 9   Global Step: 100940   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:04:12,421-Speed 5982.50 samples/sec   Loss 8.0340   LearningRate 0.1301   Epoch: 9   Global Step: 100950   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:04:19,276-Speed 5975.89 samples/sec   Loss 8.0046   LearningRate 0.1300   Epoch: 9   Global Step: 100960   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:04:26,132-Speed 5976.13 samples/sec   Loss 8.0404   LearningRate 0.1300   Epoch: 9   Global Step: 100970   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:04:32,977-Speed 5984.26 samples/sec   Loss 8.0844   LearningRate 0.1300   Epoch: 9   Global Step: 100980   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:04:39,851-Speed 5960.24 samples/sec   Loss 8.0353   LearningRate 0.1300   Epoch: 9   Global Step: 100990   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:04:46,722-Speed 5962.66 samples/sec   Loss 8.0840   LearningRate 0.1299   Epoch: 9   Global Step: 101000   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:04:53,595-Speed 5960.95 samples/sec   Loss 8.0908   LearningRate 0.1299   Epoch: 9   Global Step: 101010   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:05:00,474-Speed 5957.13 samples/sec   Loss 8.1207   LearningRate 0.1299   Epoch: 9   Global Step: 101020   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:05:07,352-Speed 5956.11 samples/sec   Loss 8.0525   LearningRate 0.1299   Epoch: 9   Global Step: 101030   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:05:14,214-Speed 5970.34 samples/sec   Loss 8.0484   LearningRate 0.1298   Epoch: 9   Global Step: 101040   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:05:21,074-Speed 5971.98 samples/sec   Loss 8.1113   LearningRate 0.1298   Epoch: 9   Global Step: 101050   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:05:27,938-Speed 5968.51 samples/sec   Loss 8.0541   LearningRate 0.1298   Epoch: 9   Global Step: 101060   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:05:34,788-Speed 5981.15 samples/sec   Loss 8.0352   LearningRate 0.1298   Epoch: 9   Global Step: 101070   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:05:41,668-Speed 5954.37 samples/sec   Loss 8.0635   LearningRate 0.1298   Epoch: 9   Global Step: 101080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:05:48,541-Speed 5959.97 samples/sec   Loss 8.1014   LearningRate 0.1297   Epoch: 9   Global Step: 101090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:05:55,425-Speed 5955.75 samples/sec   Loss 8.0628   LearningRate 0.1297   Epoch: 9   Global Step: 101100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:06:02,284-Speed 5972.55 samples/sec   Loss 8.0448   LearningRate 0.1297   Epoch: 9   Global Step: 101110   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 16:06:09,177-Speed 5943.16 samples/sec   Loss 8.0158   LearningRate 0.1297   Epoch: 9   Global Step: 101120   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 16:06:16,030-Speed 5978.60 samples/sec   Loss 8.1284   LearningRate 0.1296   Epoch: 9   Global Step: 101130   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:06:22,892-Speed 5970.45 samples/sec   Loss 7.9992   LearningRate 0.1296   Epoch: 9   Global Step: 101140   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:06:29,754-Speed 5970.16 samples/sec   Loss 7.9808   LearningRate 0.1296   Epoch: 9   Global Step: 101150   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:06:36,622-Speed 5965.19 samples/sec   Loss 8.0437   LearningRate 0.1296   Epoch: 9   Global Step: 101160   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:06:43,494-Speed 5961.84 samples/sec   Loss 8.0324   LearningRate 0.1295   Epoch: 9   Global Step: 101170   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:06:50,389-Speed 5940.88 samples/sec   Loss 7.9935   LearningRate 0.1295   Epoch: 9   Global Step: 101180   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:06:57,240-Speed 5980.43 samples/sec   Loss 8.0222   LearningRate 0.1295   Epoch: 9   Global Step: 101190   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:07:04,100-Speed 5972.03 samples/sec   Loss 8.0030   LearningRate 0.1295   Epoch: 9   Global Step: 101200   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:07:10,961-Speed 5971.34 samples/sec   Loss 8.0489   LearningRate 0.1294   Epoch: 9   Global Step: 101210   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:07:17,823-Speed 5970.23 samples/sec   Loss 8.0476   LearningRate 0.1294   Epoch: 9   Global Step: 101220   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:07:24,682-Speed 5973.35 samples/sec   Loss 8.0478   LearningRate 0.1294   Epoch: 9   Global Step: 101230   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 16:07:31,573-Speed 5945.55 samples/sec   Loss 7.9503   LearningRate 0.1294   Epoch: 9   Global Step: 101240   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 16:07:38,421-Speed 5982.55 samples/sec   Loss 8.0173   LearningRate 0.1293   Epoch: 9   Global Step: 101250   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:07:45,273-Speed 5978.82 samples/sec   Loss 8.0673   LearningRate 0.1293   Epoch: 9   Global Step: 101260   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:07:52,135-Speed 5970.67 samples/sec   Loss 8.0119   LearningRate 0.1293   Epoch: 9   Global Step: 101270   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:07:59,015-Speed 5954.43 samples/sec   Loss 8.0549   LearningRate 0.1293   Epoch: 9   Global Step: 101280   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:08:05,907-Speed 5944.25 samples/sec   Loss 8.0138   LearningRate 0.1292   Epoch: 9   Global Step: 101290   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:08:12,774-Speed 5965.54 samples/sec   Loss 8.0791   LearningRate 0.1292   Epoch: 9   Global Step: 101300   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:08:19,642-Speed 5964.68 samples/sec   Loss 8.0241   LearningRate 0.1292   Epoch: 9   Global Step: 101310   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:08:26,493-Speed 5980.14 samples/sec   Loss 8.0530   LearningRate 0.1292   Epoch: 9   Global Step: 101320   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:08:33,421-Speed 5913.78 samples/sec   Loss 8.0069   LearningRate 0.1291   Epoch: 9   Global Step: 101330   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:08:40,335-Speed 5925.27 samples/sec   Loss 8.0136   LearningRate 0.1291   Epoch: 9   Global Step: 101340   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:08:47,188-Speed 5977.43 samples/sec   Loss 8.0312   LearningRate 0.1291   Epoch: 9   Global Step: 101350   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 16:08:54,153-Speed 5882.22 samples/sec   Loss 8.0502   LearningRate 0.1291   Epoch: 9   Global Step: 101360   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:09:01,007-Speed 5977.70 samples/sec   Loss 8.0218   LearningRate 0.1290   Epoch: 9   Global Step: 101370   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:09:07,879-Speed 5960.95 samples/sec   Loss 8.0726   LearningRate 0.1290   Epoch: 9   Global Step: 101380   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:09:14,729-Speed 5980.29 samples/sec   Loss 7.9944   LearningRate 0.1290   Epoch: 9   Global Step: 101390   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:09:21,602-Speed 5961.64 samples/sec   Loss 8.0283   LearningRate 0.1290   Epoch: 9   Global Step: 101400   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:09:28,471-Speed 5963.33 samples/sec   Loss 8.0612   LearningRate 0.1289   Epoch: 9   Global Step: 101410   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:09:35,336-Speed 5970.06 samples/sec   Loss 8.0606   LearningRate 0.1289   Epoch: 9   Global Step: 101420   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:09:42,197-Speed 5971.84 samples/sec   Loss 8.0805   LearningRate 0.1289   Epoch: 9   Global Step: 101430   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:09:49,066-Speed 5963.80 samples/sec   Loss 8.0504   LearningRate 0.1289   Epoch: 9   Global Step: 101440   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:09:55,934-Speed 5965.51 samples/sec   Loss 8.0412   LearningRate 0.1288   Epoch: 9   Global Step: 101450   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:10:02,789-Speed 5976.40 samples/sec   Loss 8.0002   LearningRate 0.1288   Epoch: 9   Global Step: 101460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:10:09,663-Speed 5960.03 samples/sec   Loss 8.0368   LearningRate 0.1288   Epoch: 9   Global Step: 101470   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:10:16,518-Speed 5976.44 samples/sec   Loss 8.0146   LearningRate 0.1288   Epoch: 9   Global Step: 101480   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:10:23,388-Speed 5963.66 samples/sec   Loss 7.9451   LearningRate 0.1288   Epoch: 9   Global Step: 101490   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:10:30,276-Speed 5950.01 samples/sec   Loss 8.0614   LearningRate 0.1287   Epoch: 9   Global Step: 101500   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:10:37,131-Speed 5976.70 samples/sec   Loss 8.0538   LearningRate 0.1287   Epoch: 9   Global Step: 101510   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:10:44,012-Speed 5954.18 samples/sec   Loss 8.0036   LearningRate 0.1287   Epoch: 9   Global Step: 101520   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:10:50,875-Speed 5969.01 samples/sec   Loss 8.0168   LearningRate 0.1287   Epoch: 9   Global Step: 101530   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:10:57,736-Speed 5971.52 samples/sec   Loss 8.0282   LearningRate 0.1286   Epoch: 9   Global Step: 101540   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:11:04,627-Speed 5945.28 samples/sec   Loss 8.0542   LearningRate 0.1286   Epoch: 9   Global Step: 101550   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:11:11,515-Speed 5950.41 samples/sec   Loss 7.9903   LearningRate 0.1286   Epoch: 9   Global Step: 101560   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:11:18,390-Speed 5958.99 samples/sec   Loss 7.9817   LearningRate 0.1286   Epoch: 9   Global Step: 101570   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:11:25,262-Speed 5961.64 samples/sec   Loss 8.0308   LearningRate 0.1285   Epoch: 9   Global Step: 101580   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:11:32,132-Speed 5963.10 samples/sec   Loss 8.0359   LearningRate 0.1285   Epoch: 9   Global Step: 101590   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:11:38,990-Speed 5974.30 samples/sec   Loss 8.0297   LearningRate 0.1285   Epoch: 9   Global Step: 101600   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:11:45,839-Speed 5982.05 samples/sec   Loss 8.0865   LearningRate 0.1285   Epoch: 9   Global Step: 101610   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:11:52,686-Speed 5982.37 samples/sec   Loss 7.9830   LearningRate 0.1284   Epoch: 9   Global Step: 101620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:11:59,564-Speed 5957.56 samples/sec   Loss 8.0699   LearningRate 0.1284   Epoch: 9   Global Step: 101630   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:12:06,482-Speed 5921.84 samples/sec   Loss 7.9886   LearningRate 0.1284   Epoch: 9   Global Step: 101640   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:12:13,336-Speed 5977.37 samples/sec   Loss 8.0489   LearningRate 0.1284   Epoch: 9   Global Step: 101650   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:12:20,199-Speed 5969.52 samples/sec   Loss 7.9798   LearningRate 0.1283   Epoch: 9   Global Step: 101660   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:12:27,059-Speed 5971.65 samples/sec   Loss 7.9687   LearningRate 0.1283   Epoch: 9   Global Step: 101670   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:12:33,914-Speed 5976.57 samples/sec   Loss 7.9670   LearningRate 0.1283   Epoch: 9   Global Step: 101680   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:12:40,775-Speed 5971.13 samples/sec   Loss 8.0576   LearningRate 0.1283   Epoch: 9   Global Step: 101690   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:12:47,629-Speed 5977.68 samples/sec   Loss 8.0329   LearningRate 0.1282   Epoch: 9   Global Step: 101700   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:12:54,482-Speed 5977.98 samples/sec   Loss 8.0034   LearningRate 0.1282   Epoch: 9   Global Step: 101710   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:13:01,344-Speed 5970.11 samples/sec   Loss 8.0031   LearningRate 0.1282   Epoch: 9   Global Step: 101720   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:13:08,201-Speed 5974.83 samples/sec   Loss 7.9504   LearningRate 0.1282   Epoch: 9   Global Step: 101730   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:13:15,053-Speed 5978.91 samples/sec   Loss 8.0967   LearningRate 0.1281   Epoch: 9   Global Step: 101740   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:13:21,902-Speed 5981.06 samples/sec   Loss 8.0610   LearningRate 0.1281   Epoch: 9   Global Step: 101750   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:13:28,756-Speed 5977.02 samples/sec   Loss 8.0020   LearningRate 0.1281   Epoch: 9   Global Step: 101760   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:13:35,629-Speed 5960.78 samples/sec   Loss 7.9898   LearningRate 0.1281   Epoch: 9   Global Step: 101770   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:13:42,475-Speed 5984.36 samples/sec   Loss 7.9795   LearningRate 0.1280   Epoch: 9   Global Step: 101780   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:13:49,360-Speed 5949.80 samples/sec   Loss 8.0074   LearningRate 0.1280   Epoch: 9   Global Step: 101790   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:13:56,236-Speed 5958.43 samples/sec   Loss 7.9902   LearningRate 0.1280   Epoch: 9   Global Step: 101800   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 16:14:03,107-Speed 5962.79 samples/sec   Loss 7.9761   LearningRate 0.1280   Epoch: 9   Global Step: 101810   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:14:09,961-Speed 5977.00 samples/sec   Loss 7.9427   LearningRate 0.1279   Epoch: 9   Global Step: 101820   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:14:16,816-Speed 5976.71 samples/sec   Loss 7.9655   LearningRate 0.1279   Epoch: 9   Global Step: 101830   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:14:23,689-Speed 5960.99 samples/sec   Loss 8.0027   LearningRate 0.1279   Epoch: 9   Global Step: 101840   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:14:30,556-Speed 5965.42 samples/sec   Loss 8.0062   LearningRate 0.1279   Epoch: 9   Global Step: 101850   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:14:37,419-Speed 5970.07 samples/sec   Loss 7.9645   LearningRate 0.1279   Epoch: 9   Global Step: 101860   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:14:44,287-Speed 5965.53 samples/sec   Loss 7.9278   LearningRate 0.1278   Epoch: 9   Global Step: 101870   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:14:51,217-Speed 5911.68 samples/sec   Loss 7.9436   LearningRate 0.1278   Epoch: 9   Global Step: 101880   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:14:58,121-Speed 5933.51 samples/sec   Loss 8.0365   LearningRate 0.1278   Epoch: 9   Global Step: 101890   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:15:05,002-Speed 5954.49 samples/sec   Loss 8.0468   LearningRate 0.1278   Epoch: 9   Global Step: 101900   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:15:11,846-Speed 5986.39 samples/sec   Loss 8.0181   LearningRate 0.1277   Epoch: 9   Global Step: 101910   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:15:18,718-Speed 5961.59 samples/sec   Loss 7.9921   LearningRate 0.1277   Epoch: 9   Global Step: 101920   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:15:25,555-Speed 5991.87 samples/sec   Loss 7.9819   LearningRate 0.1277   Epoch: 9   Global Step: 101930   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:15:32,439-Speed 5951.40 samples/sec   Loss 7.9853   LearningRate 0.1277   Epoch: 9   Global Step: 101940   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:15:39,292-Speed 5978.88 samples/sec   Loss 7.9915   LearningRate 0.1276   Epoch: 9   Global Step: 101950   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:15:46,170-Speed 5956.23 samples/sec   Loss 8.0151   LearningRate 0.1276   Epoch: 9   Global Step: 101960   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:15:53,030-Speed 5972.92 samples/sec   Loss 8.0120   LearningRate 0.1276   Epoch: 9   Global Step: 101970   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:15:59,886-Speed 5974.37 samples/sec   Loss 8.0254   LearningRate 0.1276   Epoch: 9   Global Step: 101980   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:16:06,765-Speed 5955.75 samples/sec   Loss 8.0053   LearningRate 0.1275   Epoch: 9   Global Step: 101990   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:16:13,667-Speed 5935.60 samples/sec   Loss 7.9798   LearningRate 0.1275   Epoch: 9   Global Step: 102000   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:16:20,535-Speed 5965.81 samples/sec   Loss 7.9608   LearningRate 0.1275   Epoch: 9   Global Step: 102010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:16:27,390-Speed 5976.29 samples/sec   Loss 7.9428   LearningRate 0.1275   Epoch: 9   Global Step: 102020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:16:34,263-Speed 5960.67 samples/sec   Loss 7.9537   LearningRate 0.1274   Epoch: 9   Global Step: 102030   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:16:41,123-Speed 5972.62 samples/sec   Loss 7.9705   LearningRate 0.1274   Epoch: 9   Global Step: 102040   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:16:47,972-Speed 5981.86 samples/sec   Loss 8.0115   LearningRate 0.1274   Epoch: 9   Global Step: 102050   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:16:54,842-Speed 5963.18 samples/sec   Loss 7.9295   LearningRate 0.1274   Epoch: 9   Global Step: 102060   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:17:01,692-Speed 5981.12 samples/sec   Loss 7.9564   LearningRate 0.1273   Epoch: 9   Global Step: 102070   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:17:08,576-Speed 5951.61 samples/sec   Loss 7.9466   LearningRate 0.1273   Epoch: 9   Global Step: 102080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:17:15,466-Speed 5945.88 samples/sec   Loss 7.9737   LearningRate 0.1273   Epoch: 9   Global Step: 102090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:17:22,323-Speed 5977.54 samples/sec   Loss 7.9894   LearningRate 0.1273   Epoch: 9   Global Step: 102100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:17:29,192-Speed 5963.78 samples/sec   Loss 7.9493   LearningRate 0.1272   Epoch: 9   Global Step: 102110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:17:36,051-Speed 5973.82 samples/sec   Loss 7.9793   LearningRate 0.1272   Epoch: 9   Global Step: 102120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:17:42,922-Speed 5962.19 samples/sec   Loss 7.9504   LearningRate 0.1272   Epoch: 9   Global Step: 102130   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:17:49,773-Speed 5979.18 samples/sec   Loss 7.9686   LearningRate 0.1272   Epoch: 9   Global Step: 102140   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:17:56,633-Speed 5972.36 samples/sec   Loss 7.9845   LearningRate 0.1272   Epoch: 9   Global Step: 102150   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:18:03,492-Speed 5972.37 samples/sec   Loss 7.9278   LearningRate 0.1271   Epoch: 9   Global Step: 102160   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:18:10,347-Speed 5977.80 samples/sec   Loss 7.9780   LearningRate 0.1271   Epoch: 9   Global Step: 102170   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:18:17,218-Speed 5961.63 samples/sec   Loss 7.9439   LearningRate 0.1271   Epoch: 9   Global Step: 102180   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:18:24,188-Speed 5878.24 samples/sec   Loss 7.9952   LearningRate 0.1271   Epoch: 9   Global Step: 102190   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:18:31,080-Speed 5944.85 samples/sec   Loss 7.9729   LearningRate 0.1270   Epoch: 9   Global Step: 102200   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:18:37,946-Speed 5966.62 samples/sec   Loss 7.9654   LearningRate 0.1270   Epoch: 9   Global Step: 102210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:18:44,802-Speed 5975.39 samples/sec   Loss 7.9577   LearningRate 0.1270   Epoch: 9   Global Step: 102220   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:18:51,648-Speed 5985.37 samples/sec   Loss 7.9358   LearningRate 0.1270   Epoch: 9   Global Step: 102230   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:18:58,518-Speed 5966.12 samples/sec   Loss 7.9421   LearningRate 0.1269   Epoch: 9   Global Step: 102240   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:19:05,379-Speed 5970.69 samples/sec   Loss 7.9919   LearningRate 0.1269   Epoch: 9   Global Step: 102250   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:19:12,241-Speed 5970.38 samples/sec   Loss 8.0067   LearningRate 0.1269   Epoch: 9   Global Step: 102260   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:19:19,107-Speed 5967.09 samples/sec   Loss 8.0104   LearningRate 0.1269   Epoch: 9   Global Step: 102270   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:19:25,995-Speed 5948.07 samples/sec   Loss 8.0049   LearningRate 0.1268   Epoch: 9   Global Step: 102280   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:19:32,865-Speed 5962.83 samples/sec   Loss 7.9465   LearningRate 0.1268   Epoch: 9   Global Step: 102290   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:19:39,724-Speed 5973.64 samples/sec   Loss 7.9633   LearningRate 0.1268   Epoch: 9   Global Step: 102300   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:19:46,567-Speed 5986.41 samples/sec   Loss 7.9705   LearningRate 0.1268   Epoch: 9   Global Step: 102310   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:19:53,419-Speed 5978.92 samples/sec   Loss 7.9047   LearningRate 0.1267   Epoch: 9   Global Step: 102320   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:20:00,268-Speed 5982.36 samples/sec   Loss 7.9304   LearningRate 0.1267   Epoch: 9   Global Step: 102330   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:20:07,162-Speed 5941.53 samples/sec   Loss 7.9196   LearningRate 0.1267   Epoch: 9   Global Step: 102340   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:20:14,010-Speed 5983.33 samples/sec   Loss 7.9660   LearningRate 0.1267   Epoch: 9   Global Step: 102350   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:20:20,896-Speed 5949.15 samples/sec   Loss 7.9289   LearningRate 0.1266   Epoch: 9   Global Step: 102360   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:20:27,747-Speed 5980.55 samples/sec   Loss 7.9795   LearningRate 0.1266   Epoch: 9   Global Step: 102370   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:20:34,606-Speed 5973.30 samples/sec   Loss 7.9271   LearningRate 0.1266   Epoch: 9   Global Step: 102380   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:20:41,776-Speed 5713.44 samples/sec   Loss 7.9256   LearningRate 0.1266   Epoch: 9   Global Step: 102390   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:20:48,627-Speed 5980.18 samples/sec   Loss 7.9357   LearningRate 0.1265   Epoch: 9   Global Step: 102400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:20:55,478-Speed 5980.23 samples/sec   Loss 7.9882   LearningRate 0.1265   Epoch: 9   Global Step: 102410   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:21:02,339-Speed 5971.48 samples/sec   Loss 7.9109   LearningRate 0.1265   Epoch: 9   Global Step: 102420   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:21:09,202-Speed 5968.65 samples/sec   Loss 7.9593   LearningRate 0.1265   Epoch: 9   Global Step: 102430   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:21:16,055-Speed 5978.46 samples/sec   Loss 7.9064   LearningRate 0.1265   Epoch: 9   Global Step: 102440   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:21:22,916-Speed 5970.71 samples/sec   Loss 7.8977   LearningRate 0.1264   Epoch: 9   Global Step: 102450   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:21:29,789-Speed 5961.43 samples/sec   Loss 7.9519   LearningRate 0.1264   Epoch: 9   Global Step: 102460   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:21:36,975-Speed 5701.22 samples/sec   Loss 7.9887   LearningRate 0.1264   Epoch: 9   Global Step: 102470   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:21:43,852-Speed 5956.46 samples/sec   Loss 8.0610   LearningRate 0.1264   Epoch: 9   Global Step: 102480   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:21:50,703-Speed 5980.14 samples/sec   Loss 7.9254   LearningRate 0.1263   Epoch: 9   Global Step: 102490   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:21:57,555-Speed 5979.70 samples/sec   Loss 7.9422   LearningRate 0.1263   Epoch: 9   Global Step: 102500   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:22:04,409-Speed 5977.32 samples/sec   Loss 7.9379   LearningRate 0.1263   Epoch: 9   Global Step: 102510   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 16:22:11,261-Speed 5978.80 samples/sec   Loss 7.9046   LearningRate 0.1263   Epoch: 9   Global Step: 102520   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-01-08 16:22:18,117-Speed 5976.04 samples/sec   Loss 7.9515   LearningRate 0.1262   Epoch: 9   Global Step: 102530   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:22:24,979-Speed 5971.23 samples/sec   Loss 7.9411   LearningRate 0.1262   Epoch: 9   Global Step: 102540   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:22:31,858-Speed 5955.75 samples/sec   Loss 8.0014   LearningRate 0.1262   Epoch: 9   Global Step: 102550   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:22:38,746-Speed 5947.19 samples/sec   Loss 7.9756   LearningRate 0.1262   Epoch: 9   Global Step: 102560   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:22:45,649-Speed 5937.38 samples/sec   Loss 7.9930   LearningRate 0.1261   Epoch: 9   Global Step: 102570   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:22:52,572-Speed 5917.61 samples/sec   Loss 7.9413   LearningRate 0.1261   Epoch: 9   Global Step: 102580   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:22:59,500-Speed 5914.68 samples/sec   Loss 7.9789   LearningRate 0.1261   Epoch: 9   Global Step: 102590   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:23:06,352-Speed 5978.82 samples/sec   Loss 7.9199   LearningRate 0.1261   Epoch: 9   Global Step: 102600   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:23:13,211-Speed 5972.87 samples/sec   Loss 7.9693   LearningRate 0.1260   Epoch: 9   Global Step: 102610   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:23:20,076-Speed 5967.03 samples/sec   Loss 7.9211   LearningRate 0.1260   Epoch: 9   Global Step: 102620   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:23:26,910-Speed 5994.75 samples/sec   Loss 7.9579   LearningRate 0.1260   Epoch: 9   Global Step: 102630   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:23:33,778-Speed 5965.33 samples/sec   Loss 7.9349   LearningRate 0.1260   Epoch: 9   Global Step: 102640   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:23:40,642-Speed 5968.42 samples/sec   Loss 7.9461   LearningRate 0.1259   Epoch: 9   Global Step: 102650   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:23:47,515-Speed 5961.07 samples/sec   Loss 7.9405   LearningRate 0.1259   Epoch: 9   Global Step: 102660   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:23:54,386-Speed 5962.84 samples/sec   Loss 7.9511   LearningRate 0.1259   Epoch: 9   Global Step: 102670   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:24:01,238-Speed 5978.62 samples/sec   Loss 7.9687   LearningRate 0.1259   Epoch: 9   Global Step: 102680   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:24:08,086-Speed 5982.64 samples/sec   Loss 7.9720   LearningRate 0.1258   Epoch: 9   Global Step: 102690   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:24:14,947-Speed 5970.93 samples/sec   Loss 7.9547   LearningRate 0.1258   Epoch: 9   Global Step: 102700   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:24:21,801-Speed 5978.16 samples/sec   Loss 7.8661   LearningRate 0.1258   Epoch: 9   Global Step: 102710   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:24:28,672-Speed 5961.99 samples/sec   Loss 7.9357   LearningRate 0.1258   Epoch: 9   Global Step: 102720   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:24:35,528-Speed 5974.88 samples/sec   Loss 7.8850   LearningRate 0.1258   Epoch: 9   Global Step: 102730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:24:42,387-Speed 5972.86 samples/sec   Loss 7.9032   LearningRate 0.1257   Epoch: 9   Global Step: 102740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-01-08 16:24:49,238-Speed 5979.82 samples/sec   Loss 7.9451   LearningRate 0.1257   Epoch: 9   Global Step: 102750   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:24:56,119-Speed 5955.49 samples/sec   Loss 7.9756   LearningRate 0.1257   Epoch: 9   Global Step: 102760   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:25:02,980-Speed 5971.54 samples/sec   Loss 7.9550   LearningRate 0.1257   Epoch: 9   Global Step: 102770   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:25:09,846-Speed 5966.72 samples/sec   Loss 7.9479   LearningRate 0.1256   Epoch: 9   Global Step: 102780   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:25:16,703-Speed 5975.45 samples/sec   Loss 7.9308   LearningRate 0.1256   Epoch: 9   Global Step: 102790   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:25:23,564-Speed 5971.83 samples/sec   Loss 7.9151   LearningRate 0.1256   Epoch: 9   Global Step: 102800   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:25:30,424-Speed 5971.36 samples/sec   Loss 7.9087   LearningRate 0.1256   Epoch: 9   Global Step: 102810   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-01-08 16:25:37,294-Speed 5963.49 samples/sec   Loss 7.9812   LearningRate 0.1255   Epoch: 9   Global Step: 102820   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:25:44,143-Speed 5981.77 samples/sec   Loss 7.9403   LearningRate 0.1255   Epoch: 9   Global Step: 102830   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:25:51,002-Speed 5972.41 samples/sec   Loss 7.9065   LearningRate 0.1255   Epoch: 9   Global Step: 102840   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:25:57,848-Speed 5985.65 samples/sec   Loss 7.9173   LearningRate 0.1255   Epoch: 9   Global Step: 102850   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:26:04,710-Speed 5969.94 samples/sec   Loss 7.9679   LearningRate 0.1254   Epoch: 9   Global Step: 102860   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:26:11,605-Speed 5942.19 samples/sec   Loss 7.8884   LearningRate 0.1254   Epoch: 9   Global Step: 102870   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:26:18,453-Speed 5982.51 samples/sec   Loss 7.9972   LearningRate 0.1254   Epoch: 9   Global Step: 102880   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:26:25,312-Speed 5973.06 samples/sec   Loss 7.9612   LearningRate 0.1254   Epoch: 9   Global Step: 102890   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:26:32,186-Speed 5959.19 samples/sec   Loss 7.9098   LearningRate 0.1253   Epoch: 9   Global Step: 102900   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:26:39,037-Speed 5980.85 samples/sec   Loss 7.9269   LearningRate 0.1253   Epoch: 9   Global Step: 102910   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:26:45,892-Speed 5976.20 samples/sec   Loss 7.8665   LearningRate 0.1253   Epoch: 9   Global Step: 102920   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:26:52,757-Speed 5967.42 samples/sec   Loss 7.9324   LearningRate 0.1253   Epoch: 9   Global Step: 102930   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:26:59,608-Speed 5980.43 samples/sec   Loss 7.9153   LearningRate 0.1252   Epoch: 9   Global Step: 102940   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:27:06,531-Speed 5918.31 samples/sec   Loss 7.9029   LearningRate 0.1252   Epoch: 9   Global Step: 102950   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:27:13,404-Speed 5962.02 samples/sec   Loss 7.8932   LearningRate 0.1252   Epoch: 9   Global Step: 102960   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:27:20,280-Speed 5957.75 samples/sec   Loss 7.9377   LearningRate 0.1252   Epoch: 9   Global Step: 102970   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:27:27,149-Speed 5964.76 samples/sec   Loss 7.9184   LearningRate 0.1252   Epoch: 9   Global Step: 102980   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:27:34,000-Speed 5979.53 samples/sec   Loss 7.9212   LearningRate 0.1251   Epoch: 9   Global Step: 102990   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:27:41,093-Speed 5778.19 samples/sec   Loss 7.9012   LearningRate 0.1251   Epoch: 9   Global Step: 103000   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:27:47,962-Speed 5964.92 samples/sec   Loss 7.9132   LearningRate 0.1251   Epoch: 9   Global Step: 103010   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:27:54,851-Speed 5946.82 samples/sec   Loss 7.9377   LearningRate 0.1251   Epoch: 9   Global Step: 103020   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:28:01,708-Speed 5974.23 samples/sec   Loss 7.9235   LearningRate 0.1250   Epoch: 9   Global Step: 103030   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:28:08,551-Speed 5986.70 samples/sec   Loss 7.9135   LearningRate 0.1250   Epoch: 9   Global Step: 103040   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:28:15,430-Speed 5955.74 samples/sec   Loss 7.9384   LearningRate 0.1250   Epoch: 9   Global Step: 103050   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-08 16:28:22,276-Speed 5985.30 samples/sec   Loss 7.9124   LearningRate 0.1250   Epoch: 9   Global Step: 103060   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:28:29,137-Speed 5970.39 samples/sec   Loss 7.8983   LearningRate 0.1249   Epoch: 9   Global Step: 103070   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:28:35,980-Speed 5987.18 samples/sec   Loss 7.8704   LearningRate 0.1249   Epoch: 9   Global Step: 103080   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:28:42,838-Speed 5974.45 samples/sec   Loss 7.8948   LearningRate 0.1249   Epoch: 9   Global Step: 103090   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:28:49,697-Speed 5972.74 samples/sec   Loss 7.9396   LearningRate 0.1249   Epoch: 9   Global Step: 103100   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:28:56,554-Speed 5974.23 samples/sec   Loss 7.9275   LearningRate 0.1248   Epoch: 9   Global Step: 103110   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:29:03,416-Speed 5973.28 samples/sec   Loss 7.9568   LearningRate 0.1248   Epoch: 9   Global Step: 103120   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:29:10,301-Speed 5950.50 samples/sec   Loss 7.9413   LearningRate 0.1248   Epoch: 9   Global Step: 103130   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:29:17,147-Speed 5984.86 samples/sec   Loss 7.9246   LearningRate 0.1248   Epoch: 9   Global Step: 103140   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:29:24,010-Speed 5970.16 samples/sec   Loss 7.8499   LearningRate 0.1247   Epoch: 9   Global Step: 103150   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:29:30,874-Speed 5967.75 samples/sec   Loss 7.8714   LearningRate 0.1247   Epoch: 9   Global Step: 103160   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:29:37,725-Speed 5980.60 samples/sec   Loss 7.8808   LearningRate 0.1247   Epoch: 9   Global Step: 103170   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:29:44,583-Speed 5973.09 samples/sec   Loss 7.9550   LearningRate 0.1247   Epoch: 9   Global Step: 103180   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:29:51,439-Speed 5976.09 samples/sec   Loss 7.8711   LearningRate 0.1247   Epoch: 9   Global Step: 103190   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:29:58,347-Speed 5930.71 samples/sec   Loss 7.9431   LearningRate 0.1246   Epoch: 9   Global Step: 103200   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:30:05,211-Speed 5967.98 samples/sec   Loss 7.8862   LearningRate 0.1246   Epoch: 9   Global Step: 103210   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:30:12,062-Speed 5979.58 samples/sec   Loss 7.9432   LearningRate 0.1246   Epoch: 9   Global Step: 103220   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:30:18,918-Speed 5978.66 samples/sec   Loss 7.8650   LearningRate 0.1246   Epoch: 9   Global Step: 103230   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:30:25,795-Speed 5958.48 samples/sec   Loss 7.8894   LearningRate 0.1245   Epoch: 9   Global Step: 103240   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:30:32,660-Speed 5967.55 samples/sec   Loss 7.9577   LearningRate 0.1245   Epoch: 9   Global Step: 103250   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:30:39,572-Speed 5928.02 samples/sec   Loss 7.9007   LearningRate 0.1245   Epoch: 9   Global Step: 103260   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:30:46,434-Speed 5969.74 samples/sec   Loss 7.8958   LearningRate 0.1245   Epoch: 9   Global Step: 103270   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:30:53,297-Speed 5969.69 samples/sec   Loss 7.9321   LearningRate 0.1244   Epoch: 9   Global Step: 103280   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:31:00,160-Speed 5968.22 samples/sec   Loss 7.9153   LearningRate 0.1244   Epoch: 9   Global Step: 103290   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:31:07,038-Speed 5957.01 samples/sec   Loss 7.8388   LearningRate 0.1244   Epoch: 9   Global Step: 103300   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:31:13,882-Speed 5985.88 samples/sec   Loss 7.9057   LearningRate 0.1244   Epoch: 9   Global Step: 103310   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:31:20,753-Speed 5964.60 samples/sec   Loss 7.8221   LearningRate 0.1243   Epoch: 9   Global Step: 103320   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:31:27,607-Speed 5980.69 samples/sec   Loss 7.8558   LearningRate 0.1243   Epoch: 9   Global Step: 103330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:31:34,456-Speed 5981.23 samples/sec   Loss 7.8145   LearningRate 0.1243   Epoch: 9   Global Step: 103340   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:31:41,314-Speed 5974.06 samples/sec   Loss 7.8983   LearningRate 0.1243   Epoch: 9   Global Step: 103350   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:31:48,165-Speed 5980.23 samples/sec   Loss 7.8659   LearningRate 0.1242   Epoch: 9   Global Step: 103360   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:31:55,023-Speed 5973.67 samples/sec   Loss 7.8627   LearningRate 0.1242   Epoch: 9   Global Step: 103370   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:32:01,875-Speed 5978.18 samples/sec   Loss 7.8381   LearningRate 0.1242   Epoch: 9   Global Step: 103380   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:32:08,760-Speed 5952.07 samples/sec   Loss 7.8813   LearningRate 0.1242   Epoch: 9   Global Step: 103390   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:32:15,599-Speed 5989.87 samples/sec   Loss 7.9104   LearningRate 0.1241   Epoch: 9   Global Step: 103400   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:32:22,444-Speed 5985.33 samples/sec   Loss 7.8553   LearningRate 0.1241   Epoch: 9   Global Step: 103410   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:32:29,294-Speed 5980.28 samples/sec   Loss 7.9206   LearningRate 0.1241   Epoch: 9   Global Step: 103420   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:32:36,141-Speed 5983.26 samples/sec   Loss 7.8658   LearningRate 0.1241   Epoch: 9   Global Step: 103430   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:32:43,003-Speed 5970.42 samples/sec   Loss 7.8572   LearningRate 0.1241   Epoch: 9   Global Step: 103440   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:32:49,885-Speed 5955.57 samples/sec   Loss 7.9407   LearningRate 0.1240   Epoch: 9   Global Step: 103450   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:32:56,739-Speed 5977.28 samples/sec   Loss 7.9246   LearningRate 0.1240   Epoch: 9   Global Step: 103460   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:33:03,603-Speed 5968.31 samples/sec   Loss 7.8993   LearningRate 0.1240   Epoch: 9   Global Step: 103470   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:33:10,452-Speed 5983.63 samples/sec   Loss 7.8637   LearningRate 0.1240   Epoch: 9   Global Step: 103480   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:33:17,309-Speed 5974.83 samples/sec   Loss 7.8972   LearningRate 0.1239   Epoch: 9   Global Step: 103490   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:33:24,585-Speed 5631.40 samples/sec   Loss 7.8375   LearningRate 0.1239   Epoch: 9   Global Step: 103500   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:33:31,485-Speed 5936.43 samples/sec   Loss 7.8265   LearningRate 0.1239   Epoch: 9   Global Step: 103510   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:33:38,331-Speed 5987.03 samples/sec   Loss 7.8909   LearningRate 0.1239   Epoch: 9   Global Step: 103520   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:33:45,192-Speed 5970.86 samples/sec   Loss 7.8133   LearningRate 0.1238   Epoch: 9   Global Step: 103530   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:33:52,065-Speed 5963.21 samples/sec   Loss 7.9389   LearningRate 0.1238   Epoch: 9   Global Step: 103540   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:33:58,942-Speed 5957.06 samples/sec   Loss 7.9132   LearningRate 0.1238   Epoch: 9   Global Step: 103550   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:34:05,793-Speed 5979.84 samples/sec   Loss 7.8999   LearningRate 0.1238   Epoch: 9   Global Step: 103560   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:34:12,637-Speed 5987.43 samples/sec   Loss 7.8665   LearningRate 0.1237   Epoch: 9   Global Step: 103570   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:34:19,487-Speed 5981.76 samples/sec   Loss 7.9298   LearningRate 0.1237   Epoch: 9   Global Step: 103580   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:34:26,350-Speed 5969.51 samples/sec   Loss 7.8719   LearningRate 0.1237   Epoch: 9   Global Step: 103590   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:34:33,208-Speed 5973.48 samples/sec   Loss 7.9314   LearningRate 0.1237   Epoch: 9   Global Step: 103600   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:34:40,083-Speed 5959.46 samples/sec   Loss 7.8619   LearningRate 0.1236   Epoch: 9   Global Step: 103610   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:34:46,936-Speed 5977.59 samples/sec   Loss 7.8347   LearningRate 0.1236   Epoch: 9   Global Step: 103620   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:34:53,782-Speed 5985.03 samples/sec   Loss 7.9591   LearningRate 0.1236   Epoch: 9   Global Step: 103630   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-08 16:35:00,641-Speed 5972.18 samples/sec   Loss 7.9082   LearningRate 0.1236   Epoch: 9   Global Step: 103640   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-08 16:35:07,502-Speed 5971.43 samples/sec   Loss 7.8994   LearningRate 0.1236   Epoch: 9   Global Step: 103650   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:35:14,351-Speed 5981.18 samples/sec   Loss 7.9169   LearningRate 0.1235   Epoch: 9   Global Step: 103660   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:35:21,203-Speed 5978.94 samples/sec   Loss 7.8780   LearningRate 0.1235   Epoch: 9   Global Step: 103670   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:35:28,057-Speed 5976.85 samples/sec   Loss 7.8454   LearningRate 0.1235   Epoch: 9   Global Step: 103680   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:35:34,903-Speed 5986.11 samples/sec   Loss 7.9296   LearningRate 0.1235   Epoch: 9   Global Step: 103690   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:36:00,018-Speed 1630.93 samples/sec   Loss 7.8552   LearningRate 0.1234   Epoch: 10   Global Step: 103700   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:36:06,869-Speed 5980.26 samples/sec   Loss 7.8226   LearningRate 0.1234   Epoch: 10   Global Step: 103710   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:36:13,718-Speed 5982.54 samples/sec   Loss 7.9298   LearningRate 0.1234   Epoch: 10   Global Step: 103720   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:36:20,545-Speed 6000.31 samples/sec   Loss 7.8478   LearningRate 0.1234   Epoch: 10   Global Step: 103730   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:36:27,377-Speed 5996.92 samples/sec   Loss 7.9069   LearningRate 0.1233   Epoch: 10   Global Step: 103740   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:36:34,211-Speed 5996.24 samples/sec   Loss 7.8546   LearningRate 0.1233   Epoch: 10   Global Step: 103750   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:36:41,059-Speed 5982.45 samples/sec   Loss 7.9140   LearningRate 0.1233   Epoch: 10   Global Step: 103760   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:36:47,913-Speed 5979.46 samples/sec   Loss 7.8489   LearningRate 0.1233   Epoch: 10   Global Step: 103770   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:36:54,763-Speed 5982.04 samples/sec   Loss 7.7977   LearningRate 0.1232   Epoch: 10   Global Step: 103780   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:37:01,623-Speed 5971.75 samples/sec   Loss 7.8381   LearningRate 0.1232   Epoch: 10   Global Step: 103790   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:37:08,498-Speed 5960.00 samples/sec   Loss 7.8763   LearningRate 0.1232   Epoch: 10   Global Step: 103800   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:37:15,355-Speed 5975.73 samples/sec   Loss 7.8645   LearningRate 0.1232   Epoch: 10   Global Step: 103810   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:37:22,223-Speed 5964.56 samples/sec   Loss 7.8491   LearningRate 0.1231   Epoch: 10   Global Step: 103820   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:37:29,103-Speed 5955.21 samples/sec   Loss 7.8099   LearningRate 0.1231   Epoch: 10   Global Step: 103830   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:37:35,975-Speed 5961.87 samples/sec   Loss 7.8310   LearningRate 0.1231   Epoch: 10   Global Step: 103840   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:37:42,830-Speed 5975.99 samples/sec   Loss 7.9224   LearningRate 0.1231   Epoch: 10   Global Step: 103850   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-08 16:37:49,712-Speed 5953.64 samples/sec   Loss 7.7877   LearningRate 0.1231   Epoch: 10   Global Step: 103860   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:37:56,585-Speed 5960.52 samples/sec   Loss 7.8722   LearningRate 0.1230   Epoch: 10   Global Step: 103870   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:38:03,453-Speed 5965.16 samples/sec   Loss 7.8317   LearningRate 0.1230   Epoch: 10   Global Step: 103880   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:38:10,334-Speed 5954.43 samples/sec   Loss 7.8749   LearningRate 0.1230   Epoch: 10   Global Step: 103890   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:38:17,227-Speed 5943.44 samples/sec   Loss 7.8126   LearningRate 0.1230   Epoch: 10   Global Step: 103900   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:38:24,111-Speed 5951.52 samples/sec   Loss 7.8090   LearningRate 0.1229   Epoch: 10   Global Step: 103910   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:38:30,976-Speed 5967.44 samples/sec   Loss 7.8723   LearningRate 0.1229   Epoch: 10   Global Step: 103920   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:38:37,853-Speed 5958.59 samples/sec   Loss 7.7899   LearningRate 0.1229   Epoch: 10   Global Step: 103930   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:38:44,739-Speed 5949.53 samples/sec   Loss 7.8019   LearningRate 0.1229   Epoch: 10   Global Step: 103940   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:38:51,608-Speed 5966.37 samples/sec   Loss 7.8786   LearningRate 0.1228   Epoch: 10   Global Step: 103950   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:38:58,464-Speed 5978.81 samples/sec   Loss 7.8657   LearningRate 0.1228   Epoch: 10   Global Step: 103960   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:39:05,330-Speed 5966.69 samples/sec   Loss 7.8031   LearningRate 0.1228   Epoch: 10   Global Step: 103970   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:39:12,471-Speed 5744.18 samples/sec   Loss 7.8580   LearningRate 0.1228   Epoch: 10   Global Step: 103980   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:39:19,377-Speed 5932.93 samples/sec   Loss 7.8545   LearningRate 0.1227   Epoch: 10   Global Step: 103990   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:39:26,382-Speed 5848.22 samples/sec   Loss 7.8184   LearningRate 0.1227   Epoch: 10   Global Step: 104000   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:39:33,335-Speed 5892.44 samples/sec   Loss 7.8395   LearningRate 0.1227   Epoch: 10   Global Step: 104010   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:39:40,214-Speed 5956.32 samples/sec   Loss 7.9562   LearningRate 0.1227   Epoch: 10   Global Step: 104020   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:39:47,081-Speed 5965.39 samples/sec   Loss 7.8951   LearningRate 0.1226   Epoch: 10   Global Step: 104030   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:39:53,936-Speed 5976.02 samples/sec   Loss 7.8949   LearningRate 0.1226   Epoch: 10   Global Step: 104040   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:40:00,806-Speed 5965.35 samples/sec   Loss 7.8579   LearningRate 0.1226   Epoch: 10   Global Step: 104050   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:40:07,662-Speed 5974.42 samples/sec   Loss 7.8889   LearningRate 0.1226   Epoch: 10   Global Step: 104060   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:40:14,534-Speed 5961.78 samples/sec   Loss 7.8290   LearningRate 0.1226   Epoch: 10   Global Step: 104070   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:40:21,391-Speed 5974.69 samples/sec   Loss 7.8852   LearningRate 0.1225   Epoch: 10   Global Step: 104080   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:40:28,260-Speed 5963.90 samples/sec   Loss 7.8842   LearningRate 0.1225   Epoch: 10   Global Step: 104090   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:40:35,127-Speed 5967.40 samples/sec   Loss 7.8853   LearningRate 0.1225   Epoch: 10   Global Step: 104100   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:40:41,992-Speed 5967.04 samples/sec   Loss 7.7875   LearningRate 0.1225   Epoch: 10   Global Step: 104110   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:40:48,859-Speed 5965.60 samples/sec   Loss 7.9328   LearningRate 0.1224   Epoch: 10   Global Step: 104120   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:40:55,715-Speed 5975.83 samples/sec   Loss 7.8537   LearningRate 0.1224   Epoch: 10   Global Step: 104130   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:41:02,579-Speed 5969.17 samples/sec   Loss 7.7897   LearningRate 0.1224   Epoch: 10   Global Step: 104140   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:41:09,434-Speed 5976.22 samples/sec   Loss 7.8531   LearningRate 0.1224   Epoch: 10   Global Step: 104150   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:41:16,305-Speed 5962.37 samples/sec   Loss 7.8285   LearningRate 0.1223   Epoch: 10   Global Step: 104160   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:41:23,161-Speed 5976.32 samples/sec   Loss 7.8777   LearningRate 0.1223   Epoch: 10   Global Step: 104170   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:41:30,006-Speed 5984.61 samples/sec   Loss 7.8771   LearningRate 0.1223   Epoch: 10   Global Step: 104180   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:41:36,878-Speed 5961.60 samples/sec   Loss 7.7752   LearningRate 0.1223   Epoch: 10   Global Step: 104190   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:41:43,738-Speed 5972.50 samples/sec   Loss 7.8373   LearningRate 0.1222   Epoch: 10   Global Step: 104200   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:41:50,601-Speed 5969.06 samples/sec   Loss 7.8369   LearningRate 0.1222   Epoch: 10   Global Step: 104210   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:41:57,494-Speed 5943.45 samples/sec   Loss 7.7981   LearningRate 0.1222   Epoch: 10   Global Step: 104220   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:42:04,350-Speed 5976.03 samples/sec   Loss 7.9214   LearningRate 0.1222   Epoch: 10   Global Step: 104230   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:42:11,219-Speed 5964.79 samples/sec   Loss 7.8069   LearningRate 0.1222   Epoch: 10   Global Step: 104240   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:42:18,099-Speed 5956.02 samples/sec   Loss 7.8003   LearningRate 0.1221   Epoch: 10   Global Step: 104250   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:42:24,948-Speed 5982.22 samples/sec   Loss 7.8577   LearningRate 0.1221   Epoch: 10   Global Step: 104260   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:42:31,811-Speed 5969.77 samples/sec   Loss 7.8130   LearningRate 0.1221   Epoch: 10   Global Step: 104270   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:42:38,663-Speed 5979.12 samples/sec   Loss 7.8050   LearningRate 0.1221   Epoch: 10   Global Step: 104280   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:42:45,513-Speed 5981.93 samples/sec   Loss 7.8682   LearningRate 0.1220   Epoch: 10   Global Step: 104290   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:42:52,643-Speed 5746.11 samples/sec   Loss 7.7937   LearningRate 0.1220   Epoch: 10   Global Step: 104300   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:42:59,503-Speed 5971.97 samples/sec   Loss 7.8434   LearningRate 0.1220   Epoch: 10   Global Step: 104310   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:43:06,350-Speed 5983.33 samples/sec   Loss 7.7789   LearningRate 0.1220   Epoch: 10   Global Step: 104320   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:43:13,196-Speed 5983.66 samples/sec   Loss 7.8892   LearningRate 0.1219   Epoch: 10   Global Step: 104330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:43:20,040-Speed 5986.01 samples/sec   Loss 7.8188   LearningRate 0.1219   Epoch: 10   Global Step: 104340   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:43:26,900-Speed 5972.12 samples/sec   Loss 7.8439   LearningRate 0.1219   Epoch: 10   Global Step: 104350   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:43:33,760-Speed 5971.41 samples/sec   Loss 7.8626   LearningRate 0.1219   Epoch: 10   Global Step: 104360   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:43:40,613-Speed 5978.00 samples/sec   Loss 7.7153   LearningRate 0.1218   Epoch: 10   Global Step: 104370   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:43:47,484-Speed 5963.74 samples/sec   Loss 7.7848   LearningRate 0.1218   Epoch: 10   Global Step: 104380   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:43:54,343-Speed 5972.89 samples/sec   Loss 7.7993   LearningRate 0.1218   Epoch: 10   Global Step: 104390   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:44:01,208-Speed 5967.88 samples/sec   Loss 7.8163   LearningRate 0.1218   Epoch: 10   Global Step: 104400   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:44:08,074-Speed 5967.38 samples/sec   Loss 7.7668   LearningRate 0.1217   Epoch: 10   Global Step: 104410   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:44:14,943-Speed 5963.81 samples/sec   Loss 7.9083   LearningRate 0.1217   Epoch: 10   Global Step: 104420   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:44:21,815-Speed 5962.23 samples/sec   Loss 7.8113   LearningRate 0.1217   Epoch: 10   Global Step: 104430   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:44:28,676-Speed 5970.68 samples/sec   Loss 7.8712   LearningRate 0.1217   Epoch: 10   Global Step: 104440   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:44:35,533-Speed 5975.18 samples/sec   Loss 7.8008   LearningRate 0.1217   Epoch: 10   Global Step: 104450   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:44:42,406-Speed 5961.87 samples/sec   Loss 7.7943   LearningRate 0.1216   Epoch: 10   Global Step: 104460   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:44:49,262-Speed 5975.08 samples/sec   Loss 7.7864   LearningRate 0.1216   Epoch: 10   Global Step: 104470   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:44:56,107-Speed 5985.01 samples/sec   Loss 7.7578   LearningRate 0.1216   Epoch: 10   Global Step: 104480   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:45:02,953-Speed 5984.49 samples/sec   Loss 7.8724   LearningRate 0.1216   Epoch: 10   Global Step: 104490   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:45:09,906-Speed 5891.95 samples/sec   Loss 7.8203   LearningRate 0.1215   Epoch: 10   Global Step: 104500   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-08 16:45:16,748-Speed 5987.73 samples/sec   Loss 7.7791   LearningRate 0.1215   Epoch: 10   Global Step: 104510   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:45:23,597-Speed 5981.98 samples/sec   Loss 7.8335   LearningRate 0.1215   Epoch: 10   Global Step: 104520   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:45:30,470-Speed 5960.85 samples/sec   Loss 7.7820   LearningRate 0.1215   Epoch: 10   Global Step: 104530   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:45:37,329-Speed 5972.37 samples/sec   Loss 7.7838   LearningRate 0.1214   Epoch: 10   Global Step: 104540   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:45:44,176-Speed 5984.17 samples/sec   Loss 7.8623   LearningRate 0.1214   Epoch: 10   Global Step: 104550   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:45:51,021-Speed 5985.24 samples/sec   Loss 7.7921   LearningRate 0.1214   Epoch: 10   Global Step: 104560   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:45:57,866-Speed 5985.56 samples/sec   Loss 7.8519   LearningRate 0.1214   Epoch: 10   Global Step: 104570   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:46:04,708-Speed 5987.55 samples/sec   Loss 7.8423   LearningRate 0.1213   Epoch: 10   Global Step: 104580   Fp16 Grad Scale: 8192   Required: 20 hours
Training: 2022-01-08 16:46:11,562-Speed 5978.05 samples/sec   Loss 7.8276   LearningRate 0.1213   Epoch: 10   Global Step: 104590   Fp16 Grad Scale: 8192   Required: 20 hours
Training: 2022-01-08 16:46:18,440-Speed 5956.23 samples/sec   Loss 7.8537   LearningRate 0.1213   Epoch: 10   Global Step: 104600   Fp16 Grad Scale: 8192   Required: 20 hours
Training: 2022-01-08 16:46:25,301-Speed 5970.82 samples/sec   Loss 7.8911   LearningRate 0.1213   Epoch: 10   Global Step: 104610   Fp16 Grad Scale: 8192   Required: 20 hours
Training: 2022-01-08 16:46:32,173-Speed 5961.91 samples/sec   Loss 7.8001   LearningRate 0.1213   Epoch: 10   Global Step: 104620   Fp16 Grad Scale: 8192   Required: 20 hours
Training: 2022-01-08 16:46:39,023-Speed 5979.81 samples/sec   Loss 7.7886   LearningRate 0.1212   Epoch: 10   Global Step: 104630   Fp16 Grad Scale: 8192   Required: 20 hours
Training: 2022-01-08 16:46:45,885-Speed 5971.96 samples/sec   Loss 7.7248   LearningRate 0.1212   Epoch: 10   Global Step: 104640   Fp16 Grad Scale: 8192   Required: 20 hours
Training: 2022-01-08 16:46:52,748-Speed 5968.77 samples/sec   Loss 7.7991   LearningRate 0.1212   Epoch: 10   Global Step: 104650   Fp16 Grad Scale: 8192   Required: 20 hours
Training: 2022-01-08 16:46:59,600-Speed 5978.28 samples/sec   Loss 7.7700   LearningRate 0.1212   Epoch: 10   Global Step: 104660   Fp16 Grad Scale: 8192   Required: 20 hours
Training: 2022-01-08 16:47:06,450-Speed 5983.21 samples/sec   Loss 7.7955   LearningRate 0.1211   Epoch: 10   Global Step: 104670   Fp16 Grad Scale: 8192   Required: 20 hours
Training: 2022-01-08 16:47:13,336-Speed 5950.07 samples/sec   Loss 7.7611   LearningRate 0.1211   Epoch: 10   Global Step: 104680   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-01-08 16:47:20,205-Speed 5963.83 samples/sec   Loss 7.7592   LearningRate 0.1211   Epoch: 10   Global Step: 104690   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-01-08 16:47:27,109-Speed 5934.15 samples/sec   Loss 7.7672   LearningRate 0.1211   Epoch: 10   Global Step: 104700   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-01-08 16:47:33,974-Speed 5968.80 samples/sec   Loss 7.7771   LearningRate 0.1210   Epoch: 10   Global Step: 104710   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-01-08 16:47:40,842-Speed 5964.59 samples/sec   Loss 7.8222   LearningRate 0.1210   Epoch: 10   Global Step: 104720   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-01-08 16:47:47,688-Speed 5984.31 samples/sec   Loss 7.8307   LearningRate 0.1210   Epoch: 10   Global Step: 104730   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-01-08 16:47:54,540-Speed 5979.36 samples/sec   Loss 7.8445   LearningRate 0.1210   Epoch: 10   Global Step: 104740   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-01-08 16:48:01,392-Speed 5979.93 samples/sec   Loss 7.7884   LearningRate 0.1209   Epoch: 10   Global Step: 104750   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-01-08 16:48:08,306-Speed 5925.59 samples/sec   Loss 7.7746   LearningRate 0.1209   Epoch: 10   Global Step: 104760   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-01-08 16:48:15,181-Speed 5959.48 samples/sec   Loss 7.8235   LearningRate 0.1209   Epoch: 10   Global Step: 104770   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-01-08 16:48:22,037-Speed 5975.05 samples/sec   Loss 7.7935   LearningRate 0.1209   Epoch: 10   Global Step: 104780   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 16:48:28,925-Speed 5947.81 samples/sec   Loss 7.7382   LearningRate 0.1209   Epoch: 10   Global Step: 104790   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 16:48:35,803-Speed 5957.86 samples/sec   Loss 7.8635   LearningRate 0.1208   Epoch: 10   Global Step: 104800   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 16:48:42,763-Speed 5885.59 samples/sec   Loss 7.7461   LearningRate 0.1208   Epoch: 10   Global Step: 104810   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 16:48:49,722-Speed 5887.50 samples/sec   Loss 7.7941   LearningRate 0.1208   Epoch: 10   Global Step: 104820   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 16:48:56,675-Speed 5893.38 samples/sec   Loss 7.8460   LearningRate 0.1208   Epoch: 10   Global Step: 104830   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 16:49:03,523-Speed 5982.72 samples/sec   Loss 7.8249   LearningRate 0.1207   Epoch: 10   Global Step: 104840   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 16:49:10,387-Speed 5970.36 samples/sec   Loss 7.7997   LearningRate 0.1207   Epoch: 10   Global Step: 104850   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 16:49:17,290-Speed 5934.86 samples/sec   Loss 7.7444   LearningRate 0.1207   Epoch: 10   Global Step: 104860   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 16:49:24,135-Speed 5984.64 samples/sec   Loss 7.8606   LearningRate 0.1207   Epoch: 10   Global Step: 104870   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-01-08 16:49:30,994-Speed 5972.86 samples/sec   Loss 7.8015   LearningRate 0.1206   Epoch: 10   Global Step: 104880   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:49:37,849-Speed 5976.72 samples/sec   Loss 7.7298   LearningRate 0.1206   Epoch: 10   Global Step: 104890   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:49:44,774-Speed 5916.91 samples/sec   Loss 7.7634   LearningRate 0.1206   Epoch: 10   Global Step: 104900   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:49:51,644-Speed 5963.71 samples/sec   Loss 7.7838   LearningRate 0.1206   Epoch: 10   Global Step: 104910   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:49:58,590-Speed 5900.79 samples/sec   Loss 7.7580   LearningRate 0.1205   Epoch: 10   Global Step: 104920   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:50:05,458-Speed 5965.01 samples/sec   Loss 7.8448   LearningRate 0.1205   Epoch: 10   Global Step: 104930   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:50:12,308-Speed 5980.77 samples/sec   Loss 7.7529   LearningRate 0.1205   Epoch: 10   Global Step: 104940   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:50:19,162-Speed 5977.70 samples/sec   Loss 7.7780   LearningRate 0.1205   Epoch: 10   Global Step: 104950   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:50:26,024-Speed 5969.91 samples/sec   Loss 7.7843   LearningRate 0.1205   Epoch: 10   Global Step: 104960   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:50:32,873-Speed 5981.68 samples/sec   Loss 7.6963   LearningRate 0.1204   Epoch: 10   Global Step: 104970   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:50:39,773-Speed 5938.00 samples/sec   Loss 7.7658   LearningRate 0.1204   Epoch: 10   Global Step: 104980   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:50:46,634-Speed 5971.05 samples/sec   Loss 7.7526   LearningRate 0.1204   Epoch: 10   Global Step: 104990   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:50:53,505-Speed 5962.87 samples/sec   Loss 7.8529   LearningRate 0.1204   Epoch: 10   Global Step: 105000   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:51:20,317-[lfw][105000]XNorm: 23.686709
Training: 2022-01-08 16:51:20,318-[lfw][105000]Accuracy-Flip: 0.99767+-0.00300
Training: 2022-01-08 16:51:20,319-[lfw][105000]Accuracy-Highest: 0.99767
Training: 2022-01-08 16:51:51,369-[cfp_fp][105000]XNorm: 20.803627
Training: 2022-01-08 16:51:51,370-[cfp_fp][105000]Accuracy-Flip: 0.98357+-0.00767
Training: 2022-01-08 16:51:51,371-[cfp_fp][105000]Accuracy-Highest: 0.98357
Training: 2022-01-08 16:52:18,162-[agedb_30][105000]XNorm: 23.038324
Training: 2022-01-08 16:52:18,163-[agedb_30][105000]Accuracy-Flip: 0.97200+-0.00710
Training: 2022-01-08 16:52:18,163-[agedb_30][105000]Accuracy-Highest: 0.97200
Training: 2022-01-08 16:52:25,009-Speed 447.64 samples/sec   Loss 7.7539   LearningRate 0.1203   Epoch: 10   Global Step: 105010   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:52:31,852-Speed 5987.43 samples/sec   Loss 7.7663   LearningRate 0.1203   Epoch: 10   Global Step: 105020   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:52:38,703-Speed 5979.69 samples/sec   Loss 7.7631   LearningRate 0.1203   Epoch: 10   Global Step: 105030   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:52:45,553-Speed 5981.72 samples/sec   Loss 7.7813   LearningRate 0.1203   Epoch: 10   Global Step: 105040   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:52:52,425-Speed 5961.42 samples/sec   Loss 7.7248   LearningRate 0.1202   Epoch: 10   Global Step: 105050   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:52:59,282-Speed 5975.16 samples/sec   Loss 7.7999   LearningRate 0.1202   Epoch: 10   Global Step: 105060   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:53:06,164-Speed 5953.05 samples/sec   Loss 7.8065   LearningRate 0.1202   Epoch: 10   Global Step: 105070   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:53:13,017-Speed 5978.43 samples/sec   Loss 7.8162   LearningRate 0.1202   Epoch: 10   Global Step: 105080   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-08 16:53:19,894-Speed 5956.80 samples/sec   Loss 7.7137   LearningRate 0.1201   Epoch: 10   Global Step: 105090   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:53:26,793-Speed 5937.82 samples/sec   Loss 7.7572   LearningRate 0.1201   Epoch: 10   Global Step: 105100   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:53:33,703-Speed 5928.74 samples/sec   Loss 7.8124   LearningRate 0.1201   Epoch: 10   Global Step: 105110   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:53:40,605-Speed 5936.12 samples/sec   Loss 7.7785   LearningRate 0.1201   Epoch: 10   Global Step: 105120   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:53:47,510-Speed 5933.75 samples/sec   Loss 7.7972   LearningRate 0.1201   Epoch: 10   Global Step: 105130   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:53:54,380-Speed 5962.62 samples/sec   Loss 7.7438   LearningRate 0.1200   Epoch: 10   Global Step: 105140   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:54:01,247-Speed 5966.05 samples/sec   Loss 7.7831   LearningRate 0.1200   Epoch: 10   Global Step: 105150   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:54:08,120-Speed 5960.67 samples/sec   Loss 7.7032   LearningRate 0.1200   Epoch: 10   Global Step: 105160   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:54:14,973-Speed 5978.14 samples/sec   Loss 7.7520   LearningRate 0.1200   Epoch: 10   Global Step: 105170   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:54:21,821-Speed 5982.37 samples/sec   Loss 7.7995   LearningRate 0.1199   Epoch: 10   Global Step: 105180   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:54:28,665-Speed 5986.31 samples/sec   Loss 7.7746   LearningRate 0.1199   Epoch: 10   Global Step: 105190   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-08 16:54:35,560-Speed 5941.52 samples/sec   Loss 7.7296   LearningRate 0.1199   Epoch: 10   Global Step: 105200   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-08 16:54:42,410-Speed 5980.21 samples/sec   Loss 7.8644   LearningRate 0.1199   Epoch: 10   Global Step: 105210   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:54:49,259-Speed 5981.26 samples/sec   Loss 7.8001   LearningRate 0.1198   Epoch: 10   Global Step: 105220   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:54:56,113-Speed 5976.96 samples/sec   Loss 7.7169   LearningRate 0.1198   Epoch: 10   Global Step: 105230   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:55:02,974-Speed 5971.49 samples/sec   Loss 7.7740   LearningRate 0.1198   Epoch: 10   Global Step: 105240   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:55:09,828-Speed 5977.18 samples/sec   Loss 7.7600   LearningRate 0.1198   Epoch: 10   Global Step: 105250   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:55:16,702-Speed 5959.91 samples/sec   Loss 7.7680   LearningRate 0.1197   Epoch: 10   Global Step: 105260   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:55:23,557-Speed 5978.76 samples/sec   Loss 7.7349   LearningRate 0.1197   Epoch: 10   Global Step: 105270   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:55:30,401-Speed 5985.69 samples/sec   Loss 7.7906   LearningRate 0.1197   Epoch: 10   Global Step: 105280   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:55:37,257-Speed 5975.46 samples/sec   Loss 7.7649   LearningRate 0.1197   Epoch: 10   Global Step: 105290   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:55:44,124-Speed 5967.65 samples/sec   Loss 7.7564   LearningRate 0.1197   Epoch: 10   Global Step: 105300   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:55:50,961-Speed 5991.95 samples/sec   Loss 7.7807   LearningRate 0.1196   Epoch: 10   Global Step: 105310   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:55:57,854-Speed 5943.55 samples/sec   Loss 7.7421   LearningRate 0.1196   Epoch: 10   Global Step: 105320   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:56:04,706-Speed 5978.54 samples/sec   Loss 7.7369   LearningRate 0.1196   Epoch: 10   Global Step: 105330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:56:11,579-Speed 5961.24 samples/sec   Loss 7.7375   LearningRate 0.1196   Epoch: 10   Global Step: 105340   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:56:18,459-Speed 5954.24 samples/sec   Loss 7.7297   LearningRate 0.1195   Epoch: 10   Global Step: 105350   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:56:25,303-Speed 5986.10 samples/sec   Loss 7.7810   LearningRate 0.1195   Epoch: 10   Global Step: 105360   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:56:32,185-Speed 5956.82 samples/sec   Loss 7.7060   LearningRate 0.1195   Epoch: 10   Global Step: 105370   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:56:39,033-Speed 5982.55 samples/sec   Loss 7.7418   LearningRate 0.1195   Epoch: 10   Global Step: 105380   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:56:45,884-Speed 5979.97 samples/sec   Loss 7.7826   LearningRate 0.1194   Epoch: 10   Global Step: 105390   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:56:52,765-Speed 5954.04 samples/sec   Loss 7.6851   LearningRate 0.1194   Epoch: 10   Global Step: 105400   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:56:59,630-Speed 5967.93 samples/sec   Loss 7.7721   LearningRate 0.1194   Epoch: 10   Global Step: 105410   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:57:06,483-Speed 5978.07 samples/sec   Loss 7.7764   LearningRate 0.1194   Epoch: 10   Global Step: 105420   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:57:13,337-Speed 5977.11 samples/sec   Loss 7.6915   LearningRate 0.1193   Epoch: 10   Global Step: 105430   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:57:20,199-Speed 5970.40 samples/sec   Loss 7.6853   LearningRate 0.1193   Epoch: 10   Global Step: 105440   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:57:27,076-Speed 5957.21 samples/sec   Loss 7.7087   LearningRate 0.1193   Epoch: 10   Global Step: 105450   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:57:33,967-Speed 5944.92 samples/sec   Loss 7.7353   LearningRate 0.1193   Epoch: 10   Global Step: 105460   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:57:40,823-Speed 5975.48 samples/sec   Loss 7.8076   LearningRate 0.1193   Epoch: 10   Global Step: 105470   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:57:47,780-Speed 5891.43 samples/sec   Loss 7.7773   LearningRate 0.1192   Epoch: 10   Global Step: 105480   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:57:54,674-Speed 5942.54 samples/sec   Loss 7.7188   LearningRate 0.1192   Epoch: 10   Global Step: 105490   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:58:01,559-Speed 5949.90 samples/sec   Loss 7.7604   LearningRate 0.1192   Epoch: 10   Global Step: 105500   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:58:08,413-Speed 5977.43 samples/sec   Loss 7.7393   LearningRate 0.1192   Epoch: 10   Global Step: 105510   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 16:58:15,279-Speed 5966.45 samples/sec   Loss 7.7902   LearningRate 0.1191   Epoch: 10   Global Step: 105520   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:58:22,140-Speed 5971.52 samples/sec   Loss 7.7542   LearningRate 0.1191   Epoch: 10   Global Step: 105530   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:58:29,053-Speed 5926.67 samples/sec   Loss 7.7906   LearningRate 0.1191   Epoch: 10   Global Step: 105540   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:58:35,940-Speed 5952.45 samples/sec   Loss 7.7952   LearningRate 0.1191   Epoch: 10   Global Step: 105550   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:58:42,795-Speed 5975.66 samples/sec   Loss 7.7722   LearningRate 0.1190   Epoch: 10   Global Step: 105560   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:58:49,674-Speed 5955.97 samples/sec   Loss 7.7423   LearningRate 0.1190   Epoch: 10   Global Step: 105570   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:58:56,527-Speed 5978.03 samples/sec   Loss 7.6746   LearningRate 0.1190   Epoch: 10   Global Step: 105580   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:59:03,388-Speed 5970.89 samples/sec   Loss 7.7506   LearningRate 0.1190   Epoch: 10   Global Step: 105590   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:59:10,236-Speed 5982.65 samples/sec   Loss 7.6727   LearningRate 0.1190   Epoch: 10   Global Step: 105600   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:59:17,140-Speed 5935.55 samples/sec   Loss 7.7232   LearningRate 0.1189   Epoch: 10   Global Step: 105610   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:59:24,001-Speed 5970.90 samples/sec   Loss 7.7251   LearningRate 0.1189   Epoch: 10   Global Step: 105620   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:59:30,919-Speed 5923.08 samples/sec   Loss 7.7755   LearningRate 0.1189   Epoch: 10   Global Step: 105630   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:59:37,797-Speed 5956.93 samples/sec   Loss 7.7051   LearningRate 0.1189   Epoch: 10   Global Step: 105640   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:59:44,672-Speed 5959.02 samples/sec   Loss 7.7302   LearningRate 0.1188   Epoch: 10   Global Step: 105650   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:59:51,533-Speed 5970.62 samples/sec   Loss 7.7080   LearningRate 0.1188   Epoch: 10   Global Step: 105660   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 16:59:58,394-Speed 5978.01 samples/sec   Loss 7.7756   LearningRate 0.1188   Epoch: 10   Global Step: 105670   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:00:05,245-Speed 5978.95 samples/sec   Loss 7.7581   LearningRate 0.1188   Epoch: 10   Global Step: 105680   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:00:12,111-Speed 5967.14 samples/sec   Loss 7.7674   LearningRate 0.1187   Epoch: 10   Global Step: 105690   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:00:19,032-Speed 5918.75 samples/sec   Loss 7.7187   LearningRate 0.1187   Epoch: 10   Global Step: 105700   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:00:25,886-Speed 5977.22 samples/sec   Loss 7.7752   LearningRate 0.1187   Epoch: 10   Global Step: 105710   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:00:32,754-Speed 5975.23 samples/sec   Loss 7.7314   LearningRate 0.1187   Epoch: 10   Global Step: 105720   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:00:39,628-Speed 5959.83 samples/sec   Loss 7.7011   LearningRate 0.1186   Epoch: 10   Global Step: 105730   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:00:46,488-Speed 5971.41 samples/sec   Loss 7.7165   LearningRate 0.1186   Epoch: 10   Global Step: 105740   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:00:53,345-Speed 5975.12 samples/sec   Loss 7.7058   LearningRate 0.1186   Epoch: 10   Global Step: 105750   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:01:00,207-Speed 5970.30 samples/sec   Loss 7.7152   LearningRate 0.1186   Epoch: 10   Global Step: 105760   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:01:07,066-Speed 5972.80 samples/sec   Loss 7.7125   LearningRate 0.1186   Epoch: 10   Global Step: 105770   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:01:13,946-Speed 5954.58 samples/sec   Loss 7.6962   LearningRate 0.1185   Epoch: 10   Global Step: 105780   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:01:20,818-Speed 5961.73 samples/sec   Loss 7.6909   LearningRate 0.1185   Epoch: 10   Global Step: 105790   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:01:27,690-Speed 5962.01 samples/sec   Loss 7.7002   LearningRate 0.1185   Epoch: 10   Global Step: 105800   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:01:34,546-Speed 5975.71 samples/sec   Loss 7.6535   LearningRate 0.1185   Epoch: 10   Global Step: 105810   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:01:41,419-Speed 5960.75 samples/sec   Loss 7.6644   LearningRate 0.1184   Epoch: 10   Global Step: 105820   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:01:48,278-Speed 5973.36 samples/sec   Loss 7.8131   LearningRate 0.1184   Epoch: 10   Global Step: 105830   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:01:55,150-Speed 5963.12 samples/sec   Loss 7.6628   LearningRate 0.1184   Epoch: 10   Global Step: 105840   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:02:02,030-Speed 5955.22 samples/sec   Loss 7.7403   LearningRate 0.1184   Epoch: 10   Global Step: 105850   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:02:09,009-Speed 5870.12 samples/sec   Loss 7.6850   LearningRate 0.1183   Epoch: 10   Global Step: 105860   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:02:15,897-Speed 5948.00 samples/sec   Loss 7.7018   LearningRate 0.1183   Epoch: 10   Global Step: 105870   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:02:22,780-Speed 5953.30 samples/sec   Loss 7.6315   LearningRate 0.1183   Epoch: 10   Global Step: 105880   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:02:29,674-Speed 5942.49 samples/sec   Loss 7.7083   LearningRate 0.1183   Epoch: 10   Global Step: 105890   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:02:36,637-Speed 5883.72 samples/sec   Loss 7.6744   LearningRate 0.1183   Epoch: 10   Global Step: 105900   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:02:43,525-Speed 5948.54 samples/sec   Loss 7.7462   LearningRate 0.1182   Epoch: 10   Global Step: 105910   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:02:50,399-Speed 5959.38 samples/sec   Loss 7.7075   LearningRate 0.1182   Epoch: 10   Global Step: 105920   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:02:57,273-Speed 5962.50 samples/sec   Loss 7.7263   LearningRate 0.1182   Epoch: 10   Global Step: 105930   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:03:04,133-Speed 5972.16 samples/sec   Loss 7.6920   LearningRate 0.1182   Epoch: 10   Global Step: 105940   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:03:10,999-Speed 5966.24 samples/sec   Loss 7.7662   LearningRate 0.1181   Epoch: 10   Global Step: 105950   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:03:17,888-Speed 5946.89 samples/sec   Loss 7.7201   LearningRate 0.1181   Epoch: 10   Global Step: 105960   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:03:24,756-Speed 5965.93 samples/sec   Loss 7.6861   LearningRate 0.1181   Epoch: 10   Global Step: 105970   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:03:31,611-Speed 5975.37 samples/sec   Loss 7.6781   LearningRate 0.1181   Epoch: 10   Global Step: 105980   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:03:38,484-Speed 5961.51 samples/sec   Loss 7.7123   LearningRate 0.1180   Epoch: 10   Global Step: 105990   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:03:45,364-Speed 5954.35 samples/sec   Loss 7.7198   LearningRate 0.1180   Epoch: 10   Global Step: 106000   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-08 17:03:52,244-Speed 5954.39 samples/sec   Loss 7.6522   LearningRate 0.1180   Epoch: 10   Global Step: 106010   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-08 17:03:59,094-Speed 5980.38 samples/sec   Loss 7.7397   LearningRate 0.1180   Epoch: 10   Global Step: 106020   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:04:05,947-Speed 5978.69 samples/sec   Loss 7.6569   LearningRate 0.1179   Epoch: 10   Global Step: 106030   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:04:12,818-Speed 5961.09 samples/sec   Loss 7.6347   LearningRate 0.1179   Epoch: 10   Global Step: 106040   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:04:19,680-Speed 5970.53 samples/sec   Loss 7.7201   LearningRate 0.1179   Epoch: 10   Global Step: 106050   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:04:26,563-Speed 5952.76 samples/sec   Loss 7.7405   LearningRate 0.1179   Epoch: 10   Global Step: 106060   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:04:33,428-Speed 5966.91 samples/sec   Loss 7.7301   LearningRate 0.1179   Epoch: 10   Global Step: 106070   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:04:40,282-Speed 5977.23 samples/sec   Loss 7.6951   LearningRate 0.1178   Epoch: 10   Global Step: 106080   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:04:47,139-Speed 5973.92 samples/sec   Loss 7.6323   LearningRate 0.1178   Epoch: 10   Global Step: 106090   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:04:54,001-Speed 5970.16 samples/sec   Loss 7.7124   LearningRate 0.1178   Epoch: 10   Global Step: 106100   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:05:00,869-Speed 5965.13 samples/sec   Loss 7.6989   LearningRate 0.1178   Epoch: 10   Global Step: 106110   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:05:07,719-Speed 5980.79 samples/sec   Loss 7.7445   LearningRate 0.1177   Epoch: 10   Global Step: 106120   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:05:14,574-Speed 5975.75 samples/sec   Loss 7.7069   LearningRate 0.1177   Epoch: 10   Global Step: 106130   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:05:21,436-Speed 5970.37 samples/sec   Loss 7.7198   LearningRate 0.1177   Epoch: 10   Global Step: 106140   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:05:28,298-Speed 5970.33 samples/sec   Loss 7.7703   LearningRate 0.1177   Epoch: 10   Global Step: 106150   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:05:35,150-Speed 5978.54 samples/sec   Loss 7.7007   LearningRate 0.1176   Epoch: 10   Global Step: 106160   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:05:42,006-Speed 5975.73 samples/sec   Loss 7.7396   LearningRate 0.1176   Epoch: 10   Global Step: 106170   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:05:48,881-Speed 5959.87 samples/sec   Loss 7.6975   LearningRate 0.1176   Epoch: 10   Global Step: 106180   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:05:55,734-Speed 5977.34 samples/sec   Loss 7.6846   LearningRate 0.1176   Epoch: 10   Global Step: 106190   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:06:02,595-Speed 5970.96 samples/sec   Loss 7.6993   LearningRate 0.1176   Epoch: 10   Global Step: 106200   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:06:09,447-Speed 5978.94 samples/sec   Loss 7.6620   LearningRate 0.1175   Epoch: 10   Global Step: 106210   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:06:16,307-Speed 5971.76 samples/sec   Loss 7.6843   LearningRate 0.1175   Epoch: 10   Global Step: 106220   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:06:23,169-Speed 5970.80 samples/sec   Loss 7.6430   LearningRate 0.1175   Epoch: 10   Global Step: 106230   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-08 17:06:30,040-Speed 5964.89 samples/sec   Loss 7.6151   LearningRate 0.1175   Epoch: 10   Global Step: 106240   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:06:36,911-Speed 5973.83 samples/sec   Loss 7.7072   LearningRate 0.1174   Epoch: 10   Global Step: 106250   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:06:43,778-Speed 5966.39 samples/sec   Loss 7.6748   LearningRate 0.1174   Epoch: 10   Global Step: 106260   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:06:50,665-Speed 5950.57 samples/sec   Loss 7.7615   LearningRate 0.1174   Epoch: 10   Global Step: 106270   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:06:57,559-Speed 5941.96 samples/sec   Loss 7.7429   LearningRate 0.1174   Epoch: 10   Global Step: 106280   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:07:04,456-Speed 5940.53 samples/sec   Loss 7.7280   LearningRate 0.1173   Epoch: 10   Global Step: 106290   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:07:11,315-Speed 5972.39 samples/sec   Loss 7.6690   LearningRate 0.1173   Epoch: 10   Global Step: 106300   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:07:18,169-Speed 5977.32 samples/sec   Loss 7.6520   LearningRate 0.1173   Epoch: 10   Global Step: 106310   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:07:25,023-Speed 5976.64 samples/sec   Loss 7.6662   LearningRate 0.1173   Epoch: 10   Global Step: 106320   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:07:31,888-Speed 5967.51 samples/sec   Loss 7.6552   LearningRate 0.1173   Epoch: 10   Global Step: 106330   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:07:38,747-Speed 5972.91 samples/sec   Loss 7.7265   LearningRate 0.1172   Epoch: 10   Global Step: 106340   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-08 17:07:45,594-Speed 5984.81 samples/sec   Loss 7.6878   LearningRate 0.1172   Epoch: 10   Global Step: 106350   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:07:52,449-Speed 5976.85 samples/sec   Loss 7.6678   LearningRate 0.1172   Epoch: 10   Global Step: 106360   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:07:59,318-Speed 5963.48 samples/sec   Loss 7.6898   LearningRate 0.1172   Epoch: 10   Global Step: 106370   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:08:06,205-Speed 5949.20 samples/sec   Loss 7.6352   LearningRate 0.1171   Epoch: 10   Global Step: 106380   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:08:13,060-Speed 5976.13 samples/sec   Loss 7.6456   LearningRate 0.1171   Epoch: 10   Global Step: 106390   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:08:19,916-Speed 5975.35 samples/sec   Loss 7.6492   LearningRate 0.1171   Epoch: 10   Global Step: 106400   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:08:26,772-Speed 5974.85 samples/sec   Loss 7.6792   LearningRate 0.1171   Epoch: 10   Global Step: 106410   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:08:33,647-Speed 5959.46 samples/sec   Loss 7.6883   LearningRate 0.1170   Epoch: 10   Global Step: 106420   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:08:40,503-Speed 5974.89 samples/sec   Loss 7.7703   LearningRate 0.1170   Epoch: 10   Global Step: 106430   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:08:47,358-Speed 5975.92 samples/sec   Loss 7.6734   LearningRate 0.1170   Epoch: 10   Global Step: 106440   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:08:54,206-Speed 5982.41 samples/sec   Loss 7.7563   LearningRate 0.1170   Epoch: 10   Global Step: 106450   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:09:01,066-Speed 5971.15 samples/sec   Loss 7.6586   LearningRate 0.1169   Epoch: 10   Global Step: 106460   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:09:07,920-Speed 5977.82 samples/sec   Loss 7.6729   LearningRate 0.1169   Epoch: 10   Global Step: 106470   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:09:14,782-Speed 5970.05 samples/sec   Loss 7.7336   LearningRate 0.1169   Epoch: 10   Global Step: 106480   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:09:21,653-Speed 5962.12 samples/sec   Loss 7.7043   LearningRate 0.1169   Epoch: 10   Global Step: 106490   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:09:28,500-Speed 5983.04 samples/sec   Loss 7.6495   LearningRate 0.1169   Epoch: 10   Global Step: 106500   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:09:35,348-Speed 5982.29 samples/sec   Loss 7.7604   LearningRate 0.1168   Epoch: 10   Global Step: 106510   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:09:42,264-Speed 5923.77 samples/sec   Loss 7.6328   LearningRate 0.1168   Epoch: 10   Global Step: 106520   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:09:49,123-Speed 5973.15 samples/sec   Loss 7.7044   LearningRate 0.1168   Epoch: 10   Global Step: 106530   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:09:55,980-Speed 5974.84 samples/sec   Loss 7.5745   LearningRate 0.1168   Epoch: 10   Global Step: 106540   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:10:02,837-Speed 5973.89 samples/sec   Loss 7.6505   LearningRate 0.1167   Epoch: 10   Global Step: 106550   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:10:09,688-Speed 5979.54 samples/sec   Loss 7.6909   LearningRate 0.1167   Epoch: 10   Global Step: 106560   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-08 17:10:16,529-Speed 5988.70 samples/sec   Loss 7.6717   LearningRate 0.1167   Epoch: 10   Global Step: 106570   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:10:23,395-Speed 5967.11 samples/sec   Loss 7.6604   LearningRate 0.1167   Epoch: 10   Global Step: 106580   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:10:30,256-Speed 5970.43 samples/sec   Loss 7.6437   LearningRate 0.1166   Epoch: 10   Global Step: 106590   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:10:37,122-Speed 5967.33 samples/sec   Loss 7.6766   LearningRate 0.1166   Epoch: 10   Global Step: 106600   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:10:43,979-Speed 5974.39 samples/sec   Loss 7.6051   LearningRate 0.1166   Epoch: 10   Global Step: 106610   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:10:50,844-Speed 5968.12 samples/sec   Loss 7.6147   LearningRate 0.1166   Epoch: 10   Global Step: 106620   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:10:57,690-Speed 5984.25 samples/sec   Loss 7.6288   LearningRate 0.1166   Epoch: 10   Global Step: 106630   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:11:04,565-Speed 5958.41 samples/sec   Loss 7.7263   LearningRate 0.1165   Epoch: 10   Global Step: 106640   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:11:11,416-Speed 5979.82 samples/sec   Loss 7.6863   LearningRate 0.1165   Epoch: 10   Global Step: 106650   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:11:18,294-Speed 5958.91 samples/sec   Loss 7.6727   LearningRate 0.1165   Epoch: 10   Global Step: 106660   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:11:25,139-Speed 5985.26 samples/sec   Loss 7.6911   LearningRate 0.1165   Epoch: 10   Global Step: 106670   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-08 17:11:31,985-Speed 5983.91 samples/sec   Loss 7.6294   LearningRate 0.1164   Epoch: 10   Global Step: 106680   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:11:38,836-Speed 5979.89 samples/sec   Loss 7.6230   LearningRate 0.1164   Epoch: 10   Global Step: 106690   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:11:45,697-Speed 5971.29 samples/sec   Loss 7.6678   LearningRate 0.1164   Epoch: 10   Global Step: 106700   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:11:52,649-Speed 5892.67 samples/sec   Loss 7.6990   LearningRate 0.1164   Epoch: 10   Global Step: 106710   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:11:59,510-Speed 5971.28 samples/sec   Loss 7.6900   LearningRate 0.1163   Epoch: 10   Global Step: 106720   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:12:06,354-Speed 5985.40 samples/sec   Loss 7.6168   LearningRate 0.1163   Epoch: 10   Global Step: 106730   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:12:13,219-Speed 5967.51 samples/sec   Loss 7.6891   LearningRate 0.1163   Epoch: 10   Global Step: 106740   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:12:20,091-Speed 5961.60 samples/sec   Loss 7.6130   LearningRate 0.1163   Epoch: 10   Global Step: 106750   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:12:26,939-Speed 5982.21 samples/sec   Loss 7.6476   LearningRate 0.1163   Epoch: 10   Global Step: 106760   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:12:33,788-Speed 5981.57 samples/sec   Loss 7.7045   LearningRate 0.1162   Epoch: 10   Global Step: 106770   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:12:40,632-Speed 5985.81 samples/sec   Loss 7.6655   LearningRate 0.1162   Epoch: 10   Global Step: 106780   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:12:47,485-Speed 5977.44 samples/sec   Loss 7.6620   LearningRate 0.1162   Epoch: 10   Global Step: 106790   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:12:54,342-Speed 5974.29 samples/sec   Loss 7.6621   LearningRate 0.1162   Epoch: 10   Global Step: 106800   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:13:01,207-Speed 5968.06 samples/sec   Loss 7.6719   LearningRate 0.1161   Epoch: 10   Global Step: 106810   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:13:08,058-Speed 5979.36 samples/sec   Loss 7.6303   LearningRate 0.1161   Epoch: 10   Global Step: 106820   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:13:14,917-Speed 5973.15 samples/sec   Loss 7.7005   LearningRate 0.1161   Epoch: 10   Global Step: 106830   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:13:21,775-Speed 5973.80 samples/sec   Loss 7.6301   LearningRate 0.1161   Epoch: 10   Global Step: 106840   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:13:28,624-Speed 5981.36 samples/sec   Loss 7.7048   LearningRate 0.1160   Epoch: 10   Global Step: 106850   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:13:35,474-Speed 5980.50 samples/sec   Loss 7.6204   LearningRate 0.1160   Epoch: 10   Global Step: 106860   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:13:42,341-Speed 5965.99 samples/sec   Loss 7.7093   LearningRate 0.1160   Epoch: 10   Global Step: 106870   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:13:49,208-Speed 5965.43 samples/sec   Loss 7.6454   LearningRate 0.1160   Epoch: 10   Global Step: 106880   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:13:56,066-Speed 5974.00 samples/sec   Loss 7.6008   LearningRate 0.1160   Epoch: 10   Global Step: 106890   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:14:02,930-Speed 5968.15 samples/sec   Loss 7.6247   LearningRate 0.1159   Epoch: 10   Global Step: 106900   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:14:09,818-Speed 5948.16 samples/sec   Loss 7.5985   LearningRate 0.1159   Epoch: 10   Global Step: 106910   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:14:16,680-Speed 5970.45 samples/sec   Loss 7.5653   LearningRate 0.1159   Epoch: 10   Global Step: 106920   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:14:23,527-Speed 5983.41 samples/sec   Loss 7.6057   LearningRate 0.1159   Epoch: 10   Global Step: 106930   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:14:30,376-Speed 5981.17 samples/sec   Loss 7.5915   LearningRate 0.1158   Epoch: 10   Global Step: 106940   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:14:37,236-Speed 5974.88 samples/sec   Loss 7.6235   LearningRate 0.1158   Epoch: 10   Global Step: 106950   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:14:44,093-Speed 5974.59 samples/sec   Loss 7.6165   LearningRate 0.1158   Epoch: 10   Global Step: 106960   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:14:50,959-Speed 5966.20 samples/sec   Loss 7.6819   LearningRate 0.1158   Epoch: 10   Global Step: 106970   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:14:57,816-Speed 5974.38 samples/sec   Loss 7.7382   LearningRate 0.1157   Epoch: 10   Global Step: 106980   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:15:04,672-Speed 5975.36 samples/sec   Loss 7.5993   LearningRate 0.1157   Epoch: 10   Global Step: 106990   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:15:11,528-Speed 5974.89 samples/sec   Loss 7.5872   LearningRate 0.1157   Epoch: 10   Global Step: 107000   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:15:18,387-Speed 5973.60 samples/sec   Loss 7.6216   LearningRate 0.1157   Epoch: 10   Global Step: 107010   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:15:25,239-Speed 5979.26 samples/sec   Loss 7.6063   LearningRate 0.1157   Epoch: 10   Global Step: 107020   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:15:32,124-Speed 5953.17 samples/sec   Loss 7.6687   LearningRate 0.1156   Epoch: 10   Global Step: 107030   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:15:39,010-Speed 5949.11 samples/sec   Loss 7.6415   LearningRate 0.1156   Epoch: 10   Global Step: 107040   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:15:45,868-Speed 5974.18 samples/sec   Loss 7.6459   LearningRate 0.1156   Epoch: 10   Global Step: 107050   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-08 17:15:52,738-Speed 5963.84 samples/sec   Loss 7.5752   LearningRate 0.1156   Epoch: 10   Global Step: 107060   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-08 17:15:59,603-Speed 5966.96 samples/sec   Loss 7.6085   LearningRate 0.1155   Epoch: 10   Global Step: 107070   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:16:06,454-Speed 5980.35 samples/sec   Loss 7.6073   LearningRate 0.1155   Epoch: 10   Global Step: 107080   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:16:13,305-Speed 5979.94 samples/sec   Loss 7.6941   LearningRate 0.1155   Epoch: 10   Global Step: 107090   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:16:20,176-Speed 5964.03 samples/sec   Loss 7.6580   LearningRate 0.1155   Epoch: 10   Global Step: 107100   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:16:27,071-Speed 5944.77 samples/sec   Loss 7.6368   LearningRate 0.1154   Epoch: 10   Global Step: 107110   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:16:33,952-Speed 5953.21 samples/sec   Loss 7.5958   LearningRate 0.1154   Epoch: 10   Global Step: 107120   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:16:40,807-Speed 5976.56 samples/sec   Loss 7.6667   LearningRate 0.1154   Epoch: 10   Global Step: 107130   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:16:47,665-Speed 5973.87 samples/sec   Loss 7.5764   LearningRate 0.1154   Epoch: 10   Global Step: 107140   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:16:54,508-Speed 5986.33 samples/sec   Loss 7.5980   LearningRate 0.1154   Epoch: 10   Global Step: 107150   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:17:01,358-Speed 5980.70 samples/sec   Loss 7.6097   LearningRate 0.1153   Epoch: 10   Global Step: 107160   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:17:08,211-Speed 5978.68 samples/sec   Loss 7.5740   LearningRate 0.1153   Epoch: 10   Global Step: 107170   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:17:15,079-Speed 5964.93 samples/sec   Loss 7.6633   LearningRate 0.1153   Epoch: 10   Global Step: 107180   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:17:21,939-Speed 5971.80 samples/sec   Loss 7.6785   LearningRate 0.1153   Epoch: 10   Global Step: 107190   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:17:28,798-Speed 5973.40 samples/sec   Loss 7.6491   LearningRate 0.1152   Epoch: 10   Global Step: 107200   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:17:35,649-Speed 5979.89 samples/sec   Loss 7.5834   LearningRate 0.1152   Epoch: 10   Global Step: 107210   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:17:42,499-Speed 5980.58 samples/sec   Loss 7.6038   LearningRate 0.1152   Epoch: 10   Global Step: 107220   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:17:49,387-Speed 5947.83 samples/sec   Loss 7.6344   LearningRate 0.1152   Epoch: 10   Global Step: 107230   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:17:56,280-Speed 5942.90 samples/sec   Loss 7.6407   LearningRate 0.1151   Epoch: 10   Global Step: 107240   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:18:03,143-Speed 5969.83 samples/sec   Loss 7.6320   LearningRate 0.1151   Epoch: 10   Global Step: 107250   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:18:09,995-Speed 5979.11 samples/sec   Loss 7.6250   LearningRate 0.1151   Epoch: 10   Global Step: 107260   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:18:16,848-Speed 5978.21 samples/sec   Loss 7.5919   LearningRate 0.1151   Epoch: 10   Global Step: 107270   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:18:23,717-Speed 5964.55 samples/sec   Loss 7.6717   LearningRate 0.1151   Epoch: 10   Global Step: 107280   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:18:30,573-Speed 5974.94 samples/sec   Loss 7.6324   LearningRate 0.1150   Epoch: 10   Global Step: 107290   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:18:37,438-Speed 5967.52 samples/sec   Loss 7.5748   LearningRate 0.1150   Epoch: 10   Global Step: 107300   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:18:44,305-Speed 5965.96 samples/sec   Loss 7.6473   LearningRate 0.1150   Epoch: 10   Global Step: 107310   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:18:51,171-Speed 5967.62 samples/sec   Loss 7.6228   LearningRate 0.1150   Epoch: 10   Global Step: 107320   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:18:58,032-Speed 5970.68 samples/sec   Loss 7.6131   LearningRate 0.1149   Epoch: 10   Global Step: 107330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:19:04,935-Speed 5935.36 samples/sec   Loss 7.6117   LearningRate 0.1149   Epoch: 10   Global Step: 107340   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:19:11,817-Speed 5952.07 samples/sec   Loss 7.6354   LearningRate 0.1149   Epoch: 10   Global Step: 107350   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:19:18,782-Speed 5882.66 samples/sec   Loss 7.7118   LearningRate 0.1149   Epoch: 10   Global Step: 107360   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:19:25,650-Speed 5964.40 samples/sec   Loss 7.6502   LearningRate 0.1148   Epoch: 10   Global Step: 107370   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:19:32,526-Speed 5957.98 samples/sec   Loss 7.6183   LearningRate 0.1148   Epoch: 10   Global Step: 107380   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:19:39,409-Speed 5952.40 samples/sec   Loss 7.5716   LearningRate 0.1148   Epoch: 10   Global Step: 107390   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:19:46,279-Speed 5963.76 samples/sec   Loss 7.5642   LearningRate 0.1148   Epoch: 10   Global Step: 107400   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:19:53,128-Speed 5981.45 samples/sec   Loss 7.5721   LearningRate 0.1148   Epoch: 10   Global Step: 107410   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:19:59,988-Speed 5972.16 samples/sec   Loss 7.6055   LearningRate 0.1147   Epoch: 10   Global Step: 107420   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:20:06,918-Speed 5914.28 samples/sec   Loss 7.5762   LearningRate 0.1147   Epoch: 10   Global Step: 107430   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:20:13,790-Speed 5961.75 samples/sec   Loss 7.5733   LearningRate 0.1147   Epoch: 10   Global Step: 107440   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:20:20,640-Speed 5979.98 samples/sec   Loss 7.5937   LearningRate 0.1147   Epoch: 10   Global Step: 107450   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:20:27,499-Speed 5974.38 samples/sec   Loss 7.6405   LearningRate 0.1146   Epoch: 10   Global Step: 107460   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:20:34,347-Speed 5982.20 samples/sec   Loss 7.6887   LearningRate 0.1146   Epoch: 10   Global Step: 107470   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-08 17:20:41,192-Speed 5985.16 samples/sec   Loss 7.6607   LearningRate 0.1146   Epoch: 10   Global Step: 107480   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:20:48,061-Speed 5964.51 samples/sec   Loss 7.5355   LearningRate 0.1146   Epoch: 10   Global Step: 107490   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:20:54,927-Speed 5967.34 samples/sec   Loss 7.6299   LearningRate 0.1146   Epoch: 10   Global Step: 107500   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:21:01,789-Speed 5970.31 samples/sec   Loss 7.5837   LearningRate 0.1145   Epoch: 10   Global Step: 107510   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:21:08,642-Speed 5978.15 samples/sec   Loss 7.5901   LearningRate 0.1145   Epoch: 10   Global Step: 107520   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:21:15,498-Speed 5975.11 samples/sec   Loss 7.5683   LearningRate 0.1145   Epoch: 10   Global Step: 107530   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:21:22,373-Speed 5959.32 samples/sec   Loss 7.5771   LearningRate 0.1145   Epoch: 10   Global Step: 107540   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:21:29,238-Speed 5968.35 samples/sec   Loss 7.5558   LearningRate 0.1144   Epoch: 10   Global Step: 107550   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:21:36,098-Speed 5971.66 samples/sec   Loss 7.6315   LearningRate 0.1144   Epoch: 10   Global Step: 107560   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:21:42,977-Speed 5955.67 samples/sec   Loss 7.5395   LearningRate 0.1144   Epoch: 10   Global Step: 107570   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:21:49,838-Speed 5971.81 samples/sec   Loss 7.6042   LearningRate 0.1144   Epoch: 10   Global Step: 107580   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-08 17:21:56,704-Speed 5966.15 samples/sec   Loss 7.6310   LearningRate 0.1143   Epoch: 10   Global Step: 107590   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-01-08 17:22:03,565-Speed 5970.83 samples/sec   Loss 7.5678   LearningRate 0.1143   Epoch: 10   Global Step: 107600   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:22:10,413-Speed 5984.76 samples/sec   Loss 7.5964   LearningRate 0.1143   Epoch: 10   Global Step: 107610   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:22:17,268-Speed 5977.00 samples/sec   Loss 7.5595   LearningRate 0.1143   Epoch: 10   Global Step: 107620   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:22:24,134-Speed 5966.60 samples/sec   Loss 7.6281   LearningRate 0.1143   Epoch: 10   Global Step: 107630   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:22:30,991-Speed 5974.14 samples/sec   Loss 7.6006   LearningRate 0.1142   Epoch: 10   Global Step: 107640   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:22:37,842-Speed 5980.13 samples/sec   Loss 7.5975   LearningRate 0.1142   Epoch: 10   Global Step: 107650   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:22:44,694-Speed 5978.84 samples/sec   Loss 7.5978   LearningRate 0.1142   Epoch: 10   Global Step: 107660   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:22:51,568-Speed 5959.11 samples/sec   Loss 7.5592   LearningRate 0.1142   Epoch: 10   Global Step: 107670   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:22:58,428-Speed 5973.14 samples/sec   Loss 7.5614   LearningRate 0.1141   Epoch: 10   Global Step: 107680   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:23:05,278-Speed 5979.75 samples/sec   Loss 7.5581   LearningRate 0.1141   Epoch: 10   Global Step: 107690   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:23:12,167-Speed 5946.71 samples/sec   Loss 7.6376   LearningRate 0.1141   Epoch: 10   Global Step: 107700   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:23:19,061-Speed 5946.38 samples/sec   Loss 7.6294   LearningRate 0.1141   Epoch: 10   Global Step: 107710   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:23:25,948-Speed 5947.88 samples/sec   Loss 7.5631   LearningRate 0.1140   Epoch: 10   Global Step: 107720   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:23:32,803-Speed 5977.11 samples/sec   Loss 7.5730   LearningRate 0.1140   Epoch: 10   Global Step: 107730   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:23:39,666-Speed 5970.96 samples/sec   Loss 7.5562   LearningRate 0.1140   Epoch: 10   Global Step: 107740   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:23:46,562-Speed 5940.39 samples/sec   Loss 7.5537   LearningRate 0.1140   Epoch: 10   Global Step: 107750   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:23:53,437-Speed 5960.21 samples/sec   Loss 7.6047   LearningRate 0.1140   Epoch: 10   Global Step: 107760   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:24:00,319-Speed 5952.56 samples/sec   Loss 7.6166   LearningRate 0.1139   Epoch: 10   Global Step: 107770   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:24:07,172-Speed 5978.27 samples/sec   Loss 7.5518   LearningRate 0.1139   Epoch: 10   Global Step: 107780   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:24:14,027-Speed 5976.45 samples/sec   Loss 7.5544   LearningRate 0.1139   Epoch: 10   Global Step: 107790   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:24:20,892-Speed 5967.20 samples/sec   Loss 7.6107   LearningRate 0.1139   Epoch: 10   Global Step: 107800   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:24:27,753-Speed 5970.44 samples/sec   Loss 7.5889   LearningRate 0.1138   Epoch: 10   Global Step: 107810   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:24:34,598-Speed 5985.52 samples/sec   Loss 7.6156   LearningRate 0.1138   Epoch: 10   Global Step: 107820   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:24:41,471-Speed 5961.07 samples/sec   Loss 7.5741   LearningRate 0.1138   Epoch: 10   Global Step: 107830   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:24:48,343-Speed 5961.40 samples/sec   Loss 7.5319   LearningRate 0.1138   Epoch: 10   Global Step: 107840   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:24:55,201-Speed 5974.19 samples/sec   Loss 7.5393   LearningRate 0.1137   Epoch: 10   Global Step: 107850   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:25:02,077-Speed 5958.26 samples/sec   Loss 7.5318   LearningRate 0.1137   Epoch: 10   Global Step: 107860   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:25:08,935-Speed 5974.00 samples/sec   Loss 7.5461   LearningRate 0.1137   Epoch: 10   Global Step: 107870   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:25:15,793-Speed 5973.75 samples/sec   Loss 7.5725   LearningRate 0.1137   Epoch: 10   Global Step: 107880   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:25:22,663-Speed 5963.90 samples/sec   Loss 7.5278   LearningRate 0.1137   Epoch: 10   Global Step: 107890   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:25:29,522-Speed 5972.33 samples/sec   Loss 7.5346   LearningRate 0.1136   Epoch: 10   Global Step: 107900   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-01-08 17:25:36,369-Speed 5983.41 samples/sec   Loss 7.5666   LearningRate 0.1136   Epoch: 10   Global Step: 107910   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:25:43,226-Speed 5974.72 samples/sec   Loss 7.5816   LearningRate 0.1136   Epoch: 10   Global Step: 107920   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-01-08 17:25:50,076-Speed 5980.51 samples/sec   Loss 7.5327   LearningRate 0.1136   Epoch: 10   Global Step: 107930   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:25:56,941-Speed 5968.25 samples/sec   Loss 7.5378   LearningRate 0.1135   Epoch: 10   Global Step: 107940   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:26:03,784-Speed 5987.10 samples/sec   Loss 7.5943   LearningRate 0.1135   Epoch: 10   Global Step: 107950   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:26:10,654-Speed 5962.07 samples/sec   Loss 7.5218   LearningRate 0.1135   Epoch: 10   Global Step: 107960   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:26:17,600-Speed 5901.60 samples/sec   Loss 7.4394   LearningRate 0.1135   Epoch: 10   Global Step: 107970   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:26:24,622-Speed 5834.09 samples/sec   Loss 7.5880   LearningRate 0.1135   Epoch: 10   Global Step: 107980   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:26:31,553-Speed 5911.16 samples/sec   Loss 7.5780   LearningRate 0.1134   Epoch: 10   Global Step: 107990   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:26:38,413-Speed 5972.14 samples/sec   Loss 7.5326   LearningRate 0.1134   Epoch: 10   Global Step: 108000   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:26:45,247-Speed 5994.40 samples/sec   Loss 7.5427   LearningRate 0.1134   Epoch: 10   Global Step: 108010   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:26:52,101-Speed 5976.19 samples/sec   Loss 7.5711   LearningRate 0.1134   Epoch: 10   Global Step: 108020   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:26:58,959-Speed 5974.28 samples/sec   Loss 7.5460   LearningRate 0.1133   Epoch: 10   Global Step: 108030   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:27:05,811-Speed 5981.15 samples/sec   Loss 7.6009   LearningRate 0.1133   Epoch: 10   Global Step: 108040   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:27:12,659-Speed 5981.88 samples/sec   Loss 7.5689   LearningRate 0.1133   Epoch: 10   Global Step: 108050   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:27:19,521-Speed 5970.81 samples/sec   Loss 7.5627   LearningRate 0.1133   Epoch: 10   Global Step: 108060   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:27:26,400-Speed 5955.78 samples/sec   Loss 7.5644   LearningRate 0.1132   Epoch: 10   Global Step: 108070   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:27:33,291-Speed 5944.27 samples/sec   Loss 7.5955   LearningRate 0.1132   Epoch: 10   Global Step: 108080   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:27:40,162-Speed 5962.82 samples/sec   Loss 7.6258   LearningRate 0.1132   Epoch: 10   Global Step: 108090   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:27:47,048-Speed 5949.88 samples/sec   Loss 7.5916   LearningRate 0.1132   Epoch: 10   Global Step: 108100   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:27:53,892-Speed 5985.43 samples/sec   Loss 7.5289   LearningRate 0.1132   Epoch: 10   Global Step: 108110   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:28:00,772-Speed 5955.44 samples/sec   Loss 7.5185   LearningRate 0.1131   Epoch: 10   Global Step: 108120   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:28:07,641-Speed 5964.47 samples/sec   Loss 7.5417   LearningRate 0.1131   Epoch: 10   Global Step: 108130   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:28:14,486-Speed 5984.24 samples/sec   Loss 7.5273   LearningRate 0.1131   Epoch: 10   Global Step: 108140   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:28:21,354-Speed 5966.57 samples/sec   Loss 7.5610   LearningRate 0.1131   Epoch: 10   Global Step: 108150   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:28:28,206-Speed 5980.16 samples/sec   Loss 7.5177   LearningRate 0.1130   Epoch: 10   Global Step: 108160   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:28:35,041-Speed 5993.71 samples/sec   Loss 7.5939   LearningRate 0.1130   Epoch: 10   Global Step: 108170   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:28:41,917-Speed 5960.82 samples/sec   Loss 7.5583   LearningRate 0.1130   Epoch: 10   Global Step: 108180   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:28:48,820-Speed 5934.42 samples/sec   Loss 7.5644   LearningRate 0.1130   Epoch: 10   Global Step: 108190   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:28:55,692-Speed 5961.96 samples/sec   Loss 7.5270   LearningRate 0.1130   Epoch: 10   Global Step: 108200   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:29:02,540-Speed 5983.95 samples/sec   Loss 7.5802   LearningRate 0.1129   Epoch: 10   Global Step: 108210   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:29:09,418-Speed 5956.34 samples/sec   Loss 7.5423   LearningRate 0.1129   Epoch: 10   Global Step: 108220   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:29:16,261-Speed 5986.42 samples/sec   Loss 7.5599   LearningRate 0.1129   Epoch: 10   Global Step: 108230   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:29:23,110-Speed 5984.21 samples/sec   Loss 7.5051   LearningRate 0.1129   Epoch: 10   Global Step: 108240   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:29:29,967-Speed 5974.31 samples/sec   Loss 7.6013   LearningRate 0.1128   Epoch: 10   Global Step: 108250   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:29:36,865-Speed 5939.11 samples/sec   Loss 7.5051   LearningRate 0.1128   Epoch: 10   Global Step: 108260   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:29:43,736-Speed 5962.13 samples/sec   Loss 7.5988   LearningRate 0.1128   Epoch: 10   Global Step: 108270   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:29:50,597-Speed 5971.27 samples/sec   Loss 7.5866   LearningRate 0.1128   Epoch: 10   Global Step: 108280   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:29:57,453-Speed 5974.92 samples/sec   Loss 7.4988   LearningRate 0.1127   Epoch: 10   Global Step: 108290   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:30:04,315-Speed 5970.47 samples/sec   Loss 7.4987   LearningRate 0.1127   Epoch: 10   Global Step: 108300   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:30:11,181-Speed 5967.22 samples/sec   Loss 7.4950   LearningRate 0.1127   Epoch: 10   Global Step: 108310   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:30:18,050-Speed 5964.45 samples/sec   Loss 7.5163   LearningRate 0.1127   Epoch: 10   Global Step: 108320   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:30:24,930-Speed 5954.27 samples/sec   Loss 7.5203   LearningRate 0.1127   Epoch: 10   Global Step: 108330   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:30:31,901-Speed 5877.55 samples/sec   Loss 7.5787   LearningRate 0.1126   Epoch: 10   Global Step: 108340   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:30:38,781-Speed 5953.87 samples/sec   Loss 7.5723   LearningRate 0.1126   Epoch: 10   Global Step: 108350   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:30:45,686-Speed 5933.94 samples/sec   Loss 7.5624   LearningRate 0.1126   Epoch: 10   Global Step: 108360   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:30:52,551-Speed 5967.03 samples/sec   Loss 7.5515   LearningRate 0.1126   Epoch: 10   Global Step: 108370   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:30:59,406-Speed 5976.16 samples/sec   Loss 7.5381   LearningRate 0.1125   Epoch: 10   Global Step: 108380   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:31:06,269-Speed 5969.79 samples/sec   Loss 7.5163   LearningRate 0.1125   Epoch: 10   Global Step: 108390   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:31:13,131-Speed 5969.85 samples/sec   Loss 7.4937   LearningRate 0.1125   Epoch: 10   Global Step: 108400   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:31:19,978-Speed 5983.11 samples/sec   Loss 7.5433   LearningRate 0.1125   Epoch: 10   Global Step: 108410   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:31:26,847-Speed 5964.14 samples/sec   Loss 7.4792   LearningRate 0.1125   Epoch: 10   Global Step: 108420   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:31:33,714-Speed 5966.13 samples/sec   Loss 7.5833   LearningRate 0.1124   Epoch: 10   Global Step: 108430   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:31:40,560-Speed 5984.16 samples/sec   Loss 7.5692   LearningRate 0.1124   Epoch: 10   Global Step: 108440   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:31:47,428-Speed 5965.18 samples/sec   Loss 7.4860   LearningRate 0.1124   Epoch: 10   Global Step: 108450   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:31:54,296-Speed 5965.34 samples/sec   Loss 7.5311   LearningRate 0.1124   Epoch: 10   Global Step: 108460   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:32:01,145-Speed 5981.37 samples/sec   Loss 7.5116   LearningRate 0.1123   Epoch: 10   Global Step: 108470   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:32:08,007-Speed 5970.40 samples/sec   Loss 7.5152   LearningRate 0.1123   Epoch: 10   Global Step: 108480   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:32:14,869-Speed 5969.91 samples/sec   Loss 7.5499   LearningRate 0.1123   Epoch: 10   Global Step: 108490   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:32:21,865-Speed 5856.17 samples/sec   Loss 7.4803   LearningRate 0.1123   Epoch: 10   Global Step: 108500   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:32:28,727-Speed 5970.20 samples/sec   Loss 7.4934   LearningRate 0.1122   Epoch: 10   Global Step: 108510   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:32:35,581-Speed 5977.31 samples/sec   Loss 7.5284   LearningRate 0.1122   Epoch: 10   Global Step: 108520   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:32:42,432-Speed 5980.12 samples/sec   Loss 7.4927   LearningRate 0.1122   Epoch: 10   Global Step: 108530   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:32:49,335-Speed 5934.31 samples/sec   Loss 7.5260   LearningRate 0.1122   Epoch: 10   Global Step: 108540   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:32:56,268-Speed 5909.23 samples/sec   Loss 7.5459   LearningRate 0.1122   Epoch: 10   Global Step: 108550   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:33:03,161-Speed 5943.25 samples/sec   Loss 7.5878   LearningRate 0.1121   Epoch: 10   Global Step: 108560   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:33:10,054-Speed 5944.15 samples/sec   Loss 7.5755   LearningRate 0.1121   Epoch: 10   Global Step: 108570   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-08 17:33:16,909-Speed 5975.84 samples/sec   Loss 7.5807   LearningRate 0.1121   Epoch: 10   Global Step: 108580   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:33:23,768-Speed 5972.43 samples/sec   Loss 7.4804   LearningRate 0.1121   Epoch: 10   Global Step: 108590   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:33:30,636-Speed 5965.62 samples/sec   Loss 7.5263   LearningRate 0.1120   Epoch: 10   Global Step: 108600   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:33:37,514-Speed 5956.23 samples/sec   Loss 7.5312   LearningRate 0.1120   Epoch: 10   Global Step: 108610   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:33:44,432-Speed 5921.37 samples/sec   Loss 7.5098   LearningRate 0.1120   Epoch: 10   Global Step: 108620   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:33:51,291-Speed 5973.02 samples/sec   Loss 7.4886   LearningRate 0.1120   Epoch: 10   Global Step: 108630   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:33:58,183-Speed 5944.96 samples/sec   Loss 7.4717   LearningRate 0.1120   Epoch: 10   Global Step: 108640   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:34:05,036-Speed 5977.19 samples/sec   Loss 7.5712   LearningRate 0.1119   Epoch: 10   Global Step: 108650   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:34:11,904-Speed 5965.52 samples/sec   Loss 7.5304   LearningRate 0.1119   Epoch: 10   Global Step: 108660   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:34:18,763-Speed 5972.80 samples/sec   Loss 7.5128   LearningRate 0.1119   Epoch: 10   Global Step: 108670   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:34:25,615-Speed 5978.93 samples/sec   Loss 7.5060   LearningRate 0.1119   Epoch: 10   Global Step: 108680   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:34:32,460-Speed 5985.38 samples/sec   Loss 7.4916   LearningRate 0.1118   Epoch: 10   Global Step: 108690   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:34:39,331-Speed 5962.54 samples/sec   Loss 7.4616   LearningRate 0.1118   Epoch: 10   Global Step: 108700   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:34:46,199-Speed 5965.16 samples/sec   Loss 7.5074   LearningRate 0.1118   Epoch: 10   Global Step: 108710   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:34:53,120-Speed 5919.26 samples/sec   Loss 7.4830   LearningRate 0.1118   Epoch: 10   Global Step: 108720   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:35:00,054-Speed 5908.51 samples/sec   Loss 7.4803   LearningRate 0.1117   Epoch: 10   Global Step: 108730   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:35:06,895-Speed 5987.54 samples/sec   Loss 7.4962   LearningRate 0.1117   Epoch: 10   Global Step: 108740   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:35:13,745-Speed 5980.98 samples/sec   Loss 7.5230   LearningRate 0.1117   Epoch: 10   Global Step: 108750   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:35:20,619-Speed 5959.70 samples/sec   Loss 7.4766   LearningRate 0.1117   Epoch: 10   Global Step: 108760   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:35:27,475-Speed 5975.43 samples/sec   Loss 7.5133   LearningRate 0.1117   Epoch: 10   Global Step: 108770   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:35:34,320-Speed 5984.80 samples/sec   Loss 7.4674   LearningRate 0.1116   Epoch: 10   Global Step: 108780   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:35:41,185-Speed 5968.04 samples/sec   Loss 7.5124   LearningRate 0.1116   Epoch: 10   Global Step: 108790   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:35:48,036-Speed 5979.63 samples/sec   Loss 7.5803   LearningRate 0.1116   Epoch: 10   Global Step: 108800   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:35:54,890-Speed 5977.51 samples/sec   Loss 7.5147   LearningRate 0.1116   Epoch: 10   Global Step: 108810   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:36:01,742-Speed 5978.63 samples/sec   Loss 7.5622   LearningRate 0.1115   Epoch: 10   Global Step: 108820   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:36:08,641-Speed 5938.71 samples/sec   Loss 7.5636   LearningRate 0.1115   Epoch: 10   Global Step: 108830   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:36:15,570-Speed 5912.07 samples/sec   Loss 7.4509   LearningRate 0.1115   Epoch: 10   Global Step: 108840   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:36:22,435-Speed 5967.45 samples/sec   Loss 7.4992   LearningRate 0.1115   Epoch: 10   Global Step: 108850   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:36:29,300-Speed 5967.68 samples/sec   Loss 7.5009   LearningRate 0.1115   Epoch: 10   Global Step: 108860   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:36:36,165-Speed 5967.19 samples/sec   Loss 7.5259   LearningRate 0.1114   Epoch: 10   Global Step: 108870   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:36:43,063-Speed 5939.53 samples/sec   Loss 7.5121   LearningRate 0.1114   Epoch: 10   Global Step: 108880   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:36:49,907-Speed 5985.79 samples/sec   Loss 7.4779   LearningRate 0.1114   Epoch: 10   Global Step: 108890   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:36:56,874-Speed 5880.12 samples/sec   Loss 7.4128   LearningRate 0.1114   Epoch: 10   Global Step: 108900   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:37:03,751-Speed 5957.57 samples/sec   Loss 7.5377   LearningRate 0.1113   Epoch: 10   Global Step: 108910   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:37:10,605-Speed 5976.55 samples/sec   Loss 7.4855   LearningRate 0.1113   Epoch: 10   Global Step: 108920   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:37:17,459-Speed 5977.10 samples/sec   Loss 7.4772   LearningRate 0.1113   Epoch: 10   Global Step: 108930   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:37:24,310-Speed 5979.59 samples/sec   Loss 7.4792   LearningRate 0.1113   Epoch: 10   Global Step: 108940   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:37:31,176-Speed 5966.99 samples/sec   Loss 7.4972   LearningRate 0.1112   Epoch: 10   Global Step: 108950   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:37:38,046-Speed 5963.89 samples/sec   Loss 7.4504   LearningRate 0.1112   Epoch: 10   Global Step: 108960   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:37:44,900-Speed 5977.04 samples/sec   Loss 7.4669   LearningRate 0.1112   Epoch: 10   Global Step: 108970   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:37:51,773-Speed 5960.42 samples/sec   Loss 7.5284   LearningRate 0.1112   Epoch: 10   Global Step: 108980   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:37:58,639-Speed 5966.77 samples/sec   Loss 7.4872   LearningRate 0.1112   Epoch: 10   Global Step: 108990   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:38:05,498-Speed 5972.34 samples/sec   Loss 7.5012   LearningRate 0.1111   Epoch: 10   Global Step: 109000   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:38:12,349-Speed 5980.29 samples/sec   Loss 7.4536   LearningRate 0.1111   Epoch: 10   Global Step: 109010   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:38:19,208-Speed 5972.86 samples/sec   Loss 7.5161   LearningRate 0.1111   Epoch: 10   Global Step: 109020   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:38:26,082-Speed 5960.46 samples/sec   Loss 7.4497   LearningRate 0.1111   Epoch: 10   Global Step: 109030   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:38:32,945-Speed 5970.10 samples/sec   Loss 7.4838   LearningRate 0.1110   Epoch: 10   Global Step: 109040   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:38:39,812-Speed 5965.20 samples/sec   Loss 7.4709   LearningRate 0.1110   Epoch: 10   Global Step: 109050   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:38:46,676-Speed 5968.40 samples/sec   Loss 7.4587   LearningRate 0.1110   Epoch: 10   Global Step: 109060   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:38:53,535-Speed 5972.58 samples/sec   Loss 7.4570   LearningRate 0.1110   Epoch: 10   Global Step: 109070   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:39:00,394-Speed 5973.01 samples/sec   Loss 7.3944   LearningRate 0.1110   Epoch: 10   Global Step: 109080   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:39:07,285-Speed 5945.20 samples/sec   Loss 7.5756   LearningRate 0.1109   Epoch: 10   Global Step: 109090   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:39:14,131-Speed 5984.41 samples/sec   Loss 7.5479   LearningRate 0.1109   Epoch: 10   Global Step: 109100   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:39:21,010-Speed 5955.32 samples/sec   Loss 7.5055   LearningRate 0.1109   Epoch: 10   Global Step: 109110   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:39:27,868-Speed 5973.31 samples/sec   Loss 7.4515   LearningRate 0.1109   Epoch: 10   Global Step: 109120   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:39:34,724-Speed 5976.32 samples/sec   Loss 7.4520   LearningRate 0.1108   Epoch: 10   Global Step: 109130   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:39:41,568-Speed 5985.33 samples/sec   Loss 7.4796   LearningRate 0.1108   Epoch: 10   Global Step: 109140   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:39:48,443-Speed 5959.11 samples/sec   Loss 7.4850   LearningRate 0.1108   Epoch: 10   Global Step: 109150   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:39:55,302-Speed 5973.88 samples/sec   Loss 7.5030   LearningRate 0.1108   Epoch: 10   Global Step: 109160   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:40:02,157-Speed 5976.20 samples/sec   Loss 7.5161   LearningRate 0.1108   Epoch: 10   Global Step: 109170   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:40:09,056-Speed 5938.54 samples/sec   Loss 7.5090   LearningRate 0.1107   Epoch: 10   Global Step: 109180   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:40:15,962-Speed 5932.28 samples/sec   Loss 7.4834   LearningRate 0.1107   Epoch: 10   Global Step: 109190   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:40:22,852-Speed 5946.43 samples/sec   Loss 7.4258   LearningRate 0.1107   Epoch: 10   Global Step: 109200   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:40:29,751-Speed 5938.46 samples/sec   Loss 7.4670   LearningRate 0.1107   Epoch: 10   Global Step: 109210   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:40:36,616-Speed 5967.80 samples/sec   Loss 7.4949   LearningRate 0.1106   Epoch: 10   Global Step: 109220   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:40:43,490-Speed 5959.43 samples/sec   Loss 7.4706   LearningRate 0.1106   Epoch: 10   Global Step: 109230   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:40:50,352-Speed 5970.99 samples/sec   Loss 7.4971   LearningRate 0.1106   Epoch: 10   Global Step: 109240   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:40:57,225-Speed 5960.48 samples/sec   Loss 7.4770   LearningRate 0.1106   Epoch: 10   Global Step: 109250   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:41:04,092-Speed 5966.22 samples/sec   Loss 7.4591   LearningRate 0.1105   Epoch: 10   Global Step: 109260   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:41:10,943-Speed 5979.80 samples/sec   Loss 7.4965   LearningRate 0.1105   Epoch: 10   Global Step: 109270   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:41:17,795-Speed 5978.73 samples/sec   Loss 7.4575   LearningRate 0.1105   Epoch: 10   Global Step: 109280   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:41:24,648-Speed 5978.65 samples/sec   Loss 7.4486   LearningRate 0.1105   Epoch: 10   Global Step: 109290   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:41:31,518-Speed 5963.20 samples/sec   Loss 7.4673   LearningRate 0.1105   Epoch: 10   Global Step: 109300   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:41:38,384-Speed 5967.14 samples/sec   Loss 7.5152   LearningRate 0.1104   Epoch: 10   Global Step: 109310   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:41:45,256-Speed 5961.61 samples/sec   Loss 7.4680   LearningRate 0.1104   Epoch: 10   Global Step: 109320   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:41:52,135-Speed 5955.28 samples/sec   Loss 7.5237   LearningRate 0.1104   Epoch: 10   Global Step: 109330   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-08 17:41:59,020-Speed 5950.48 samples/sec   Loss 7.4288   LearningRate 0.1104   Epoch: 10   Global Step: 109340   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:42:05,914-Speed 5942.91 samples/sec   Loss 7.5156   LearningRate 0.1103   Epoch: 10   Global Step: 109350   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:42:12,800-Speed 5949.29 samples/sec   Loss 7.4588   LearningRate 0.1103   Epoch: 10   Global Step: 109360   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:42:19,645-Speed 5984.92 samples/sec   Loss 7.4452   LearningRate 0.1103   Epoch: 10   Global Step: 109370   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:42:26,495-Speed 5980.70 samples/sec   Loss 7.4805   LearningRate 0.1103   Epoch: 10   Global Step: 109380   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:42:33,349-Speed 5976.90 samples/sec   Loss 7.4303   LearningRate 0.1103   Epoch: 10   Global Step: 109390   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:42:40,212-Speed 5970.10 samples/sec   Loss 7.4705   LearningRate 0.1102   Epoch: 10   Global Step: 109400   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:42:47,069-Speed 5974.33 samples/sec   Loss 7.4670   LearningRate 0.1102   Epoch: 10   Global Step: 109410   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:42:53,931-Speed 5970.72 samples/sec   Loss 7.5174   LearningRate 0.1102   Epoch: 10   Global Step: 109420   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:43:00,771-Speed 5989.42 samples/sec   Loss 7.4407   LearningRate 0.1102   Epoch: 10   Global Step: 109430   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:43:07,640-Speed 5964.21 samples/sec   Loss 7.4570   LearningRate 0.1101   Epoch: 10   Global Step: 109440   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:43:14,493-Speed 5977.76 samples/sec   Loss 7.4873   LearningRate 0.1101   Epoch: 10   Global Step: 109450   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:43:21,358-Speed 5968.06 samples/sec   Loss 7.4553   LearningRate 0.1101   Epoch: 10   Global Step: 109460   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:43:28,202-Speed 5984.85 samples/sec   Loss 7.4272   LearningRate 0.1101   Epoch: 10   Global Step: 109470   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:43:35,065-Speed 5969.01 samples/sec   Loss 7.3913   LearningRate 0.1101   Epoch: 10   Global Step: 109480   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:43:41,923-Speed 5973.94 samples/sec   Loss 7.4971   LearningRate 0.1100   Epoch: 10   Global Step: 109490   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:43:48,772-Speed 5981.84 samples/sec   Loss 7.4097   LearningRate 0.1100   Epoch: 10   Global Step: 109500   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:43:55,643-Speed 5962.78 samples/sec   Loss 7.4212   LearningRate 0.1100   Epoch: 10   Global Step: 109510   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:44:02,513-Speed 5963.42 samples/sec   Loss 7.4602   LearningRate 0.1100   Epoch: 10   Global Step: 109520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:44:09,362-Speed 5981.92 samples/sec   Loss 7.3830   LearningRate 0.1099   Epoch: 10   Global Step: 109530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:44:16,234-Speed 5961.28 samples/sec   Loss 7.4431   LearningRate 0.1099   Epoch: 10   Global Step: 109540   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:44:23,207-Speed 5880.95 samples/sec   Loss 7.4595   LearningRate 0.1099   Epoch: 10   Global Step: 109550   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:44:30,181-Speed 5874.40 samples/sec   Loss 7.4487   LearningRate 0.1099   Epoch: 10   Global Step: 109560   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:44:37,070-Speed 5947.24 samples/sec   Loss 7.4176   LearningRate 0.1099   Epoch: 10   Global Step: 109570   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:44:43,918-Speed 5983.88 samples/sec   Loss 7.4386   LearningRate 0.1098   Epoch: 10   Global Step: 109580   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:44:51,072-Speed 5726.54 samples/sec   Loss 7.4186   LearningRate 0.1098   Epoch: 10   Global Step: 109590   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:44:57,948-Speed 5958.61 samples/sec   Loss 7.5088   LearningRate 0.1098   Epoch: 10   Global Step: 109600   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:45:04,805-Speed 5974.69 samples/sec   Loss 7.5842   LearningRate 0.1098   Epoch: 10   Global Step: 109610   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:45:11,652-Speed 5983.01 samples/sec   Loss 7.5439   LearningRate 0.1097   Epoch: 10   Global Step: 109620   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:45:18,519-Speed 5965.66 samples/sec   Loss 7.5299   LearningRate 0.1097   Epoch: 10   Global Step: 109630   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:45:25,371-Speed 5980.37 samples/sec   Loss 7.4682   LearningRate 0.1097   Epoch: 10   Global Step: 109640   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:45:32,228-Speed 5974.31 samples/sec   Loss 7.4468   LearningRate 0.1097   Epoch: 10   Global Step: 109650   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:45:39,076-Speed 5982.43 samples/sec   Loss 7.5132   LearningRate 0.1096   Epoch: 10   Global Step: 109660   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:45:45,938-Speed 5971.10 samples/sec   Loss 7.4564   LearningRate 0.1096   Epoch: 10   Global Step: 109670   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:45:52,790-Speed 5978.83 samples/sec   Loss 7.3970   LearningRate 0.1096   Epoch: 10   Global Step: 109680   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:45:59,635-Speed 5985.14 samples/sec   Loss 7.4191   LearningRate 0.1096   Epoch: 10   Global Step: 109690   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:46:06,480-Speed 5985.24 samples/sec   Loss 7.4409   LearningRate 0.1096   Epoch: 10   Global Step: 109700   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:46:13,343-Speed 5969.29 samples/sec   Loss 7.3625   LearningRate 0.1095   Epoch: 10   Global Step: 109710   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:46:20,205-Speed 5970.52 samples/sec   Loss 7.4050   LearningRate 0.1095   Epoch: 10   Global Step: 109720   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:46:27,076-Speed 5962.47 samples/sec   Loss 7.3850   LearningRate 0.1095   Epoch: 10   Global Step: 109730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:46:33,942-Speed 5966.41 samples/sec   Loss 7.4923   LearningRate 0.1095   Epoch: 10   Global Step: 109740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:46:40,809-Speed 5966.64 samples/sec   Loss 7.4577   LearningRate 0.1094   Epoch: 10   Global Step: 109750   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:46:47,656-Speed 5982.88 samples/sec   Loss 7.4856   LearningRate 0.1094   Epoch: 10   Global Step: 109760   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:46:54,506-Speed 5982.72 samples/sec   Loss 7.4598   LearningRate 0.1094   Epoch: 10   Global Step: 109770   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:47:01,360-Speed 5978.18 samples/sec   Loss 7.4101   LearningRate 0.1094   Epoch: 10   Global Step: 109780   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:47:08,307-Speed 5897.61 samples/sec   Loss 7.4378   LearningRate 0.1094   Epoch: 10   Global Step: 109790   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:47:15,231-Speed 5916.19 samples/sec   Loss 7.4948   LearningRate 0.1093   Epoch: 10   Global Step: 109800   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:47:22,157-Speed 5915.68 samples/sec   Loss 7.4380   LearningRate 0.1093   Epoch: 10   Global Step: 109810   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:47:29,081-Speed 5916.96 samples/sec   Loss 7.5022   LearningRate 0.1093   Epoch: 10   Global Step: 109820   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:47:36,025-Speed 5898.84 samples/sec   Loss 7.3652   LearningRate 0.1093   Epoch: 10   Global Step: 109830   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:47:42,952-Speed 5914.85 samples/sec   Loss 7.4172   LearningRate 0.1092   Epoch: 10   Global Step: 109840   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:47:49,855-Speed 5934.58 samples/sec   Loss 7.3834   LearningRate 0.1092   Epoch: 10   Global Step: 109850   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:47:56,773-Speed 5922.10 samples/sec   Loss 7.4472   LearningRate 0.1092   Epoch: 10   Global Step: 109860   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:48:03,696-Speed 5917.34 samples/sec   Loss 7.3754   LearningRate 0.1092   Epoch: 10   Global Step: 109870   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:48:10,601-Speed 5933.05 samples/sec   Loss 7.3794   LearningRate 0.1092   Epoch: 10   Global Step: 109880   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-08 17:48:17,534-Speed 5909.26 samples/sec   Loss 7.4213   LearningRate 0.1091   Epoch: 10   Global Step: 109890   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-08 17:48:24,420-Speed 5949.84 samples/sec   Loss 7.4333   LearningRate 0.1091   Epoch: 10   Global Step: 109900   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:48:31,331-Speed 5927.52 samples/sec   Loss 7.4148   LearningRate 0.1091   Epoch: 10   Global Step: 109910   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:48:38,214-Speed 5952.21 samples/sec   Loss 7.4880   LearningRate 0.1091   Epoch: 10   Global Step: 109920   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:48:45,077-Speed 5969.16 samples/sec   Loss 7.4018   LearningRate 0.1090   Epoch: 10   Global Step: 109930   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:48:51,944-Speed 5966.44 samples/sec   Loss 7.4367   LearningRate 0.1090   Epoch: 10   Global Step: 109940   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:48:58,901-Speed 5888.58 samples/sec   Loss 7.4271   LearningRate 0.1090   Epoch: 10   Global Step: 109950   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:49:05,763-Speed 5969.79 samples/sec   Loss 7.3763   LearningRate 0.1090   Epoch: 10   Global Step: 109960   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:49:12,621-Speed 5973.96 samples/sec   Loss 7.4033   LearningRate 0.1090   Epoch: 10   Global Step: 109970   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:49:19,470-Speed 5981.38 samples/sec   Loss 7.4417   LearningRate 0.1089   Epoch: 10   Global Step: 109980   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:49:26,333-Speed 5969.29 samples/sec   Loss 7.3959   LearningRate 0.1089   Epoch: 10   Global Step: 109990   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:49:33,177-Speed 5985.93 samples/sec   Loss 7.4159   LearningRate 0.1089   Epoch: 10   Global Step: 110000   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:49:59,867-[lfw][110000]XNorm: 22.489254
Training: 2022-01-08 17:49:59,868-[lfw][110000]Accuracy-Flip: 0.99783+-0.00269
Training: 2022-01-08 17:49:59,868-[lfw][110000]Accuracy-Highest: 0.99783
Training: 2022-01-08 17:50:30,697-[cfp_fp][110000]XNorm: 19.859721
Training: 2022-01-08 17:50:30,698-[cfp_fp][110000]Accuracy-Flip: 0.98057+-0.00613
Training: 2022-01-08 17:50:30,699-[cfp_fp][110000]Accuracy-Highest: 0.98357
Training: 2022-01-08 17:50:57,367-[agedb_30][110000]XNorm: 21.727236
Training: 2022-01-08 17:50:57,368-[agedb_30][110000]Accuracy-Flip: 0.96850+-0.00647
Training: 2022-01-08 17:50:57,368-[agedb_30][110000]Accuracy-Highest: 0.97200
Training: 2022-01-08 17:51:04,224-Speed 449.88 samples/sec   Loss 7.4067   LearningRate 0.1089   Epoch: 10   Global Step: 110010   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:51:11,116-Speed 5945.95 samples/sec   Loss 7.4301   LearningRate 0.1088   Epoch: 10   Global Step: 110020   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:51:18,076-Speed 5886.12 samples/sec   Loss 7.4576   LearningRate 0.1088   Epoch: 10   Global Step: 110030   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:51:24,927-Speed 5979.63 samples/sec   Loss 7.4126   LearningRate 0.1088   Epoch: 10   Global Step: 110040   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:51:31,796-Speed 5964.20 samples/sec   Loss 7.3935   LearningRate 0.1088   Epoch: 10   Global Step: 110050   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:51:38,664-Speed 5965.56 samples/sec   Loss 7.3637   LearningRate 0.1088   Epoch: 10   Global Step: 110060   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:51:45,539-Speed 5961.09 samples/sec   Loss 7.4348   LearningRate 0.1087   Epoch: 10   Global Step: 110070   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:51:52,416-Speed 5956.61 samples/sec   Loss 7.3654   LearningRate 0.1087   Epoch: 10   Global Step: 110080   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:51:59,279-Speed 5969.15 samples/sec   Loss 7.4266   LearningRate 0.1087   Epoch: 10   Global Step: 110090   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:52:06,162-Speed 5954.79 samples/sec   Loss 7.3873   LearningRate 0.1087   Epoch: 10   Global Step: 110100   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:52:13,035-Speed 5960.26 samples/sec   Loss 7.4282   LearningRate 0.1086   Epoch: 10   Global Step: 110110   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:52:19,904-Speed 5964.42 samples/sec   Loss 7.4589   LearningRate 0.1086   Epoch: 10   Global Step: 110120   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:52:26,771-Speed 5965.50 samples/sec   Loss 7.4172   LearningRate 0.1086   Epoch: 10   Global Step: 110130   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:52:33,627-Speed 5975.20 samples/sec   Loss 7.4838   LearningRate 0.1086   Epoch: 10   Global Step: 110140   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:52:40,517-Speed 5946.29 samples/sec   Loss 7.4093   LearningRate 0.1086   Epoch: 10   Global Step: 110150   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:52:47,379-Speed 5970.52 samples/sec   Loss 7.3929   LearningRate 0.1085   Epoch: 10   Global Step: 110160   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:52:54,245-Speed 5966.73 samples/sec   Loss 7.4564   LearningRate 0.1085   Epoch: 10   Global Step: 110170   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-08 17:53:01,114-Speed 5964.11 samples/sec   Loss 7.4164   LearningRate 0.1085   Epoch: 10   Global Step: 110180   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-08 17:53:07,990-Speed 5957.98 samples/sec   Loss 7.3989   LearningRate 0.1085   Epoch: 10   Global Step: 110190   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:53:14,852-Speed 5970.32 samples/sec   Loss 7.4356   LearningRate 0.1084   Epoch: 10   Global Step: 110200   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:53:21,701-Speed 5981.97 samples/sec   Loss 7.4361   LearningRate 0.1084   Epoch: 10   Global Step: 110210   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:53:28,557-Speed 5975.46 samples/sec   Loss 7.4413   LearningRate 0.1084   Epoch: 10   Global Step: 110220   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:53:35,404-Speed 5983.47 samples/sec   Loss 7.4217   LearningRate 0.1084   Epoch: 10   Global Step: 110230   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:53:42,247-Speed 5986.87 samples/sec   Loss 7.3389   LearningRate 0.1084   Epoch: 10   Global Step: 110240   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:53:49,087-Speed 5988.83 samples/sec   Loss 7.3828   LearningRate 0.1083   Epoch: 10   Global Step: 110250   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:53:55,938-Speed 5979.64 samples/sec   Loss 7.4038   LearningRate 0.1083   Epoch: 10   Global Step: 110260   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:54:02,791-Speed 5978.28 samples/sec   Loss 7.3843   LearningRate 0.1083   Epoch: 10   Global Step: 110270   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:54:09,640-Speed 5981.89 samples/sec   Loss 7.4403   LearningRate 0.1083   Epoch: 10   Global Step: 110280   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:54:16,499-Speed 5972.41 samples/sec   Loss 7.4091   LearningRate 0.1082   Epoch: 10   Global Step: 110290   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:54:23,377-Speed 5956.95 samples/sec   Loss 7.4115   LearningRate 0.1082   Epoch: 10   Global Step: 110300   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:54:30,253-Speed 5958.38 samples/sec   Loss 7.3554   LearningRate 0.1082   Epoch: 10   Global Step: 110310   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:54:37,098-Speed 5984.17 samples/sec   Loss 7.4583   LearningRate 0.1082   Epoch: 10   Global Step: 110320   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:54:43,951-Speed 5980.54 samples/sec   Loss 7.3978   LearningRate 0.1082   Epoch: 10   Global Step: 110330   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:54:50,809-Speed 5977.36 samples/sec   Loss 7.4416   LearningRate 0.1081   Epoch: 10   Global Step: 110340   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:54:57,658-Speed 5980.93 samples/sec   Loss 7.4339   LearningRate 0.1081   Epoch: 10   Global Step: 110350   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:55:04,524-Speed 5968.10 samples/sec   Loss 7.4270   LearningRate 0.1081   Epoch: 10   Global Step: 110360   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:55:11,381-Speed 5974.35 samples/sec   Loss 7.3873   LearningRate 0.1081   Epoch: 10   Global Step: 110370   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:55:18,269-Speed 5947.81 samples/sec   Loss 7.4266   LearningRate 0.1080   Epoch: 10   Global Step: 110380   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:55:25,118-Speed 5980.96 samples/sec   Loss 7.4043   LearningRate 0.1080   Epoch: 10   Global Step: 110390   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:55:31,987-Speed 5965.18 samples/sec   Loss 7.3513   LearningRate 0.1080   Epoch: 10   Global Step: 110400   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:55:38,845-Speed 5972.55 samples/sec   Loss 7.3343   LearningRate 0.1080   Epoch: 10   Global Step: 110410   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:55:45,717-Speed 5961.52 samples/sec   Loss 7.3560   LearningRate 0.1080   Epoch: 10   Global Step: 110420   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:55:52,571-Speed 5977.85 samples/sec   Loss 7.3826   LearningRate 0.1079   Epoch: 10   Global Step: 110430   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-08 17:55:59,430-Speed 5971.94 samples/sec   Loss 7.3737   LearningRate 0.1079   Epoch: 10   Global Step: 110440   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-08 17:56:06,253-Speed 6004.63 samples/sec   Loss 7.4276   LearningRate 0.1079   Epoch: 10   Global Step: 110450   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:56:13,124-Speed 5962.36 samples/sec   Loss 7.4500   LearningRate 0.1079   Epoch: 10   Global Step: 110460   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:56:19,980-Speed 5975.24 samples/sec   Loss 7.3768   LearningRate 0.1078   Epoch: 10   Global Step: 110470   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:56:26,847-Speed 5966.80 samples/sec   Loss 7.4014   LearningRate 0.1078   Epoch: 10   Global Step: 110480   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:56:33,725-Speed 5956.46 samples/sec   Loss 7.4482   LearningRate 0.1078   Epoch: 10   Global Step: 110490   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:56:40,611-Speed 5949.75 samples/sec   Loss 7.3904   LearningRate 0.1078   Epoch: 10   Global Step: 110500   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:56:47,461-Speed 5981.31 samples/sec   Loss 7.4134   LearningRate 0.1078   Epoch: 10   Global Step: 110510   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:56:54,314-Speed 5980.98 samples/sec   Loss 7.3320   LearningRate 0.1077   Epoch: 10   Global Step: 110520   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:57:01,169-Speed 5975.97 samples/sec   Loss 7.4036   LearningRate 0.1077   Epoch: 10   Global Step: 110530   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:57:08,026-Speed 5975.07 samples/sec   Loss 7.4333   LearningRate 0.1077   Epoch: 10   Global Step: 110540   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-01-08 17:57:14,884-Speed 5973.75 samples/sec   Loss 7.3865   LearningRate 0.1077   Epoch: 10   Global Step: 110550   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:57:21,760-Speed 5958.25 samples/sec   Loss 7.3681   LearningRate 0.1076   Epoch: 10   Global Step: 110560   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:57:28,619-Speed 5975.57 samples/sec   Loss 7.3860   LearningRate 0.1076   Epoch: 10   Global Step: 110570   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:57:35,478-Speed 5972.47 samples/sec   Loss 7.3800   LearningRate 0.1076   Epoch: 10   Global Step: 110580   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:57:42,338-Speed 5971.56 samples/sec   Loss 7.4409   LearningRate 0.1076   Epoch: 10   Global Step: 110590   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:57:49,246-Speed 5930.55 samples/sec   Loss 7.3543   LearningRate 0.1076   Epoch: 10   Global Step: 110600   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:57:56,120-Speed 5961.16 samples/sec   Loss 7.3528   LearningRate 0.1075   Epoch: 10   Global Step: 110610   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:58:03,001-Speed 5953.83 samples/sec   Loss 7.3582   LearningRate 0.1075   Epoch: 10   Global Step: 110620   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:58:09,857-Speed 5975.65 samples/sec   Loss 7.3337   LearningRate 0.1075   Epoch: 10   Global Step: 110630   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:58:16,723-Speed 5967.24 samples/sec   Loss 7.3426   LearningRate 0.1075   Epoch: 10   Global Step: 110640   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:58:23,573-Speed 5980.07 samples/sec   Loss 7.3237   LearningRate 0.1074   Epoch: 10   Global Step: 110650   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:58:30,442-Speed 5966.29 samples/sec   Loss 7.3237   LearningRate 0.1074   Epoch: 10   Global Step: 110660   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:58:37,323-Speed 5953.54 samples/sec   Loss 7.3787   LearningRate 0.1074   Epoch: 10   Global Step: 110670   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:58:44,227-Speed 5933.89 samples/sec   Loss 7.3854   LearningRate 0.1074   Epoch: 10   Global Step: 110680   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:58:51,118-Speed 5945.25 samples/sec   Loss 7.3755   LearningRate 0.1074   Epoch: 10   Global Step: 110690   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:58:57,978-Speed 5974.24 samples/sec   Loss 7.4014   LearningRate 0.1073   Epoch: 10   Global Step: 110700   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:59:04,835-Speed 5974.60 samples/sec   Loss 7.3842   LearningRate 0.1073   Epoch: 10   Global Step: 110710   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:59:11,699-Speed 5968.01 samples/sec   Loss 7.3761   LearningRate 0.1073   Epoch: 10   Global Step: 110720   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:59:18,552-Speed 5978.38 samples/sec   Loss 7.3735   LearningRate 0.1073   Epoch: 10   Global Step: 110730   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 17:59:25,426-Speed 5960.02 samples/sec   Loss 7.4042   LearningRate 0.1072   Epoch: 10   Global Step: 110740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:59:32,271-Speed 5984.72 samples/sec   Loss 7.3652   LearningRate 0.1072   Epoch: 10   Global Step: 110750   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:59:39,138-Speed 5966.19 samples/sec   Loss 7.3482   LearningRate 0.1072   Epoch: 10   Global Step: 110760   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:59:45,984-Speed 5983.63 samples/sec   Loss 7.4506   LearningRate 0.1072   Epoch: 10   Global Step: 110770   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:59:52,862-Speed 5956.65 samples/sec   Loss 7.3787   LearningRate 0.1072   Epoch: 10   Global Step: 110780   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 17:59:59,715-Speed 5978.06 samples/sec   Loss 7.3367   LearningRate 0.1071   Epoch: 10   Global Step: 110790   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:00:06,572-Speed 5974.62 samples/sec   Loss 7.2994   LearningRate 0.1071   Epoch: 10   Global Step: 110800   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:00:13,435-Speed 5969.67 samples/sec   Loss 7.3800   LearningRate 0.1071   Epoch: 10   Global Step: 110810   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:00:20,288-Speed 5977.87 samples/sec   Loss 7.3668   LearningRate 0.1071   Epoch: 10   Global Step: 110820   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:00:27,164-Speed 5958.31 samples/sec   Loss 7.3424   LearningRate 0.1070   Epoch: 10   Global Step: 110830   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:00:34,048-Speed 5951.54 samples/sec   Loss 7.3503   LearningRate 0.1070   Epoch: 10   Global Step: 110840   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:00:40,904-Speed 5975.24 samples/sec   Loss 7.3512   LearningRate 0.1070   Epoch: 10   Global Step: 110850   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:00:47,759-Speed 5976.02 samples/sec   Loss 7.3111   LearningRate 0.1070   Epoch: 10   Global Step: 110860   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:00:54,612-Speed 5977.90 samples/sec   Loss 7.3277   LearningRate 0.1070   Epoch: 10   Global Step: 110870   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:01:01,476-Speed 5968.58 samples/sec   Loss 7.4074   LearningRate 0.1069   Epoch: 10   Global Step: 110880   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:01:08,368-Speed 5944.28 samples/sec   Loss 7.3953   LearningRate 0.1069   Epoch: 10   Global Step: 110890   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:01:15,223-Speed 5977.70 samples/sec   Loss 7.3732   LearningRate 0.1069   Epoch: 10   Global Step: 110900   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:01:22,100-Speed 5957.62 samples/sec   Loss 7.3347   LearningRate 0.1069   Epoch: 10   Global Step: 110910   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:01:28,960-Speed 5971.48 samples/sec   Loss 7.3178   LearningRate 0.1068   Epoch: 10   Global Step: 110920   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:01:35,871-Speed 5928.55 samples/sec   Loss 7.3569   LearningRate 0.1068   Epoch: 10   Global Step: 110930   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:01:42,727-Speed 5975.86 samples/sec   Loss 7.3431   LearningRate 0.1068   Epoch: 10   Global Step: 110940   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:01:49,585-Speed 5973.84 samples/sec   Loss 7.3037   LearningRate 0.1068   Epoch: 10   Global Step: 110950   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:01:56,430-Speed 5985.41 samples/sec   Loss 7.3370   LearningRate 0.1068   Epoch: 10   Global Step: 110960   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:02:03,288-Speed 5973.65 samples/sec   Loss 7.3581   LearningRate 0.1067   Epoch: 10   Global Step: 110970   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:02:10,163-Speed 5958.85 samples/sec   Loss 7.3473   LearningRate 0.1067   Epoch: 10   Global Step: 110980   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:02:17,001-Speed 5991.25 samples/sec   Loss 7.2935   LearningRate 0.1067   Epoch: 10   Global Step: 110990   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:02:23,846-Speed 5985.47 samples/sec   Loss 7.3775   LearningRate 0.1067   Epoch: 10   Global Step: 111000   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:02:30,713-Speed 5966.02 samples/sec   Loss 7.3338   LearningRate 0.1066   Epoch: 10   Global Step: 111010   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:02:37,590-Speed 5957.12 samples/sec   Loss 7.3795   LearningRate 0.1066   Epoch: 10   Global Step: 111020   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:02:44,464-Speed 5960.25 samples/sec   Loss 7.3585   LearningRate 0.1066   Epoch: 10   Global Step: 111030   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:02:51,336-Speed 5961.29 samples/sec   Loss 7.3396   LearningRate 0.1066   Epoch: 10   Global Step: 111040   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:02:58,227-Speed 5945.11 samples/sec   Loss 7.3105   LearningRate 0.1066   Epoch: 10   Global Step: 111050   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:03:05,088-Speed 5972.16 samples/sec   Loss 7.3964   LearningRate 0.1065   Epoch: 10   Global Step: 111060   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:03:11,963-Speed 5959.21 samples/sec   Loss 7.3312   LearningRate 0.1065   Epoch: 10   Global Step: 111070   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:03:18,841-Speed 5956.43 samples/sec   Loss 7.3371   LearningRate 0.1065   Epoch: 10   Global Step: 111080   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:03:25,706-Speed 5967.60 samples/sec   Loss 7.3677   LearningRate 0.1065   Epoch: 10   Global Step: 111090   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:03:32,553-Speed 5983.45 samples/sec   Loss 7.2698   LearningRate 0.1064   Epoch: 10   Global Step: 111100   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:03:39,416-Speed 5969.42 samples/sec   Loss 7.3250   LearningRate 0.1064   Epoch: 10   Global Step: 111110   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:03:46,266-Speed 5980.63 samples/sec   Loss 7.3602   LearningRate 0.1064   Epoch: 10   Global Step: 111120   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:03:53,123-Speed 5974.08 samples/sec   Loss 7.3218   LearningRate 0.1064   Epoch: 10   Global Step: 111130   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:03:59,968-Speed 5984.71 samples/sec   Loss 7.2759   LearningRate 0.1064   Epoch: 10   Global Step: 111140   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:04:06,819-Speed 5979.72 samples/sec   Loss 7.4083   LearningRate 0.1063   Epoch: 10   Global Step: 111150   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:04:13,670-Speed 5979.93 samples/sec   Loss 7.3773   LearningRate 0.1063   Epoch: 10   Global Step: 111160   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:04:20,545-Speed 5958.71 samples/sec   Loss 7.3641   LearningRate 0.1063   Epoch: 10   Global Step: 111170   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:04:27,419-Speed 5959.49 samples/sec   Loss 7.2853   LearningRate 0.1063   Epoch: 10   Global Step: 111180   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:04:34,260-Speed 5989.06 samples/sec   Loss 7.3887   LearningRate 0.1062   Epoch: 10   Global Step: 111190   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:04:41,111-Speed 5981.66 samples/sec   Loss 7.3735   LearningRate 0.1062   Epoch: 10   Global Step: 111200   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:04:47,969-Speed 5974.04 samples/sec   Loss 7.3872   LearningRate 0.1062   Epoch: 10   Global Step: 111210   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:04:54,823-Speed 5976.36 samples/sec   Loss 7.3861   LearningRate 0.1062   Epoch: 10   Global Step: 111220   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:05:01,675-Speed 5978.75 samples/sec   Loss 7.3402   LearningRate 0.1062   Epoch: 10   Global Step: 111230   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:05:08,543-Speed 5965.21 samples/sec   Loss 7.3870   LearningRate 0.1061   Epoch: 10   Global Step: 111240   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:05:15,418-Speed 5959.10 samples/sec   Loss 7.4023   LearningRate 0.1061   Epoch: 10   Global Step: 111250   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:05:22,391-Speed 5875.31 samples/sec   Loss 7.3442   LearningRate 0.1061   Epoch: 10   Global Step: 111260   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:05:29,239-Speed 5982.83 samples/sec   Loss 7.2859   LearningRate 0.1061   Epoch: 10   Global Step: 111270   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:05:36,105-Speed 5966.85 samples/sec   Loss 7.2875   LearningRate 0.1060   Epoch: 10   Global Step: 111280   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:05:43,007-Speed 5937.09 samples/sec   Loss 7.2833   LearningRate 0.1060   Epoch: 10   Global Step: 111290   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:05:49,858-Speed 5979.78 samples/sec   Loss 7.3597   LearningRate 0.1060   Epoch: 10   Global Step: 111300   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:05:56,719-Speed 5971.16 samples/sec   Loss 7.3174   LearningRate 0.1060   Epoch: 10   Global Step: 111310   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:06:03,570-Speed 5979.46 samples/sec   Loss 7.3101   LearningRate 0.1060   Epoch: 10   Global Step: 111320   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:06:10,420-Speed 5981.53 samples/sec   Loss 7.3510   LearningRate 0.1059   Epoch: 10   Global Step: 111330   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:06:17,316-Speed 5940.62 samples/sec   Loss 7.3047   LearningRate 0.1059   Epoch: 10   Global Step: 111340   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:06:24,178-Speed 5970.73 samples/sec   Loss 7.3554   LearningRate 0.1059   Epoch: 10   Global Step: 111350   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:06:31,054-Speed 5958.53 samples/sec   Loss 7.3372   LearningRate 0.1059   Epoch: 10   Global Step: 111360   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:06:37,945-Speed 5944.79 samples/sec   Loss 7.3359   LearningRate 0.1058   Epoch: 10   Global Step: 111370   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:06:44,804-Speed 5972.90 samples/sec   Loss 7.3260   LearningRate 0.1058   Epoch: 10   Global Step: 111380   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:06:51,654-Speed 5980.83 samples/sec   Loss 7.3952   LearningRate 0.1058   Epoch: 10   Global Step: 111390   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:06:58,547-Speed 5943.31 samples/sec   Loss 7.2770   LearningRate 0.1058   Epoch: 10   Global Step: 111400   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:07:05,427-Speed 5954.33 samples/sec   Loss 7.3154   LearningRate 0.1058   Epoch: 10   Global Step: 111410   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:07:12,299-Speed 5963.27 samples/sec   Loss 7.3572   LearningRate 0.1057   Epoch: 10   Global Step: 111420   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:07:19,152-Speed 5977.24 samples/sec   Loss 7.2983   LearningRate 0.1057   Epoch: 10   Global Step: 111430   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:07:25,997-Speed 5985.37 samples/sec   Loss 7.3207   LearningRate 0.1057   Epoch: 10   Global Step: 111440   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:07:32,858-Speed 5971.42 samples/sec   Loss 7.3779   LearningRate 0.1057   Epoch: 10   Global Step: 111450   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:07:39,719-Speed 5970.80 samples/sec   Loss 7.3662   LearningRate 0.1056   Epoch: 10   Global Step: 111460   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:07:46,586-Speed 5966.18 samples/sec   Loss 7.2948   LearningRate 0.1056   Epoch: 10   Global Step: 111470   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:07:53,469-Speed 5952.87 samples/sec   Loss 7.3103   LearningRate 0.1056   Epoch: 10   Global Step: 111480   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:08:00,334-Speed 5967.42 samples/sec   Loss 7.3586   LearningRate 0.1056   Epoch: 10   Global Step: 111490   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:08:07,186-Speed 5980.90 samples/sec   Loss 7.3259   LearningRate 0.1056   Epoch: 10   Global Step: 111500   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:08:14,037-Speed 5980.39 samples/sec   Loss 7.2514   LearningRate 0.1055   Epoch: 10   Global Step: 111510   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:08:20,880-Speed 5986.15 samples/sec   Loss 7.3532   LearningRate 0.1055   Epoch: 10   Global Step: 111520   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:08:27,876-Speed 5856.56 samples/sec   Loss 7.3329   LearningRate 0.1055   Epoch: 10   Global Step: 111530   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:08:34,722-Speed 5983.89 samples/sec   Loss 7.2480   LearningRate 0.1055   Epoch: 10   Global Step: 111540   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:08:41,574-Speed 5978.56 samples/sec   Loss 7.2719   LearningRate 0.1054   Epoch: 10   Global Step: 111550   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:08:48,435-Speed 5971.22 samples/sec   Loss 7.3461   LearningRate 0.1054   Epoch: 10   Global Step: 111560   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:08:55,295-Speed 5972.38 samples/sec   Loss 7.3322   LearningRate 0.1054   Epoch: 10   Global Step: 111570   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-08 18:09:02,184-Speed 5946.98 samples/sec   Loss 7.3201   LearningRate 0.1054   Epoch: 10   Global Step: 111580   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:09:09,043-Speed 5974.61 samples/sec   Loss 7.2448   LearningRate 0.1054   Epoch: 10   Global Step: 111590   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:09:15,894-Speed 5979.40 samples/sec   Loss 7.2878   LearningRate 0.1053   Epoch: 10   Global Step: 111600   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:09:22,753-Speed 5972.63 samples/sec   Loss 7.2932   LearningRate 0.1053   Epoch: 10   Global Step: 111610   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:09:29,607-Speed 5977.66 samples/sec   Loss 7.2817   LearningRate 0.1053   Epoch: 10   Global Step: 111620   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:09:36,476-Speed 5964.41 samples/sec   Loss 7.2902   LearningRate 0.1053   Epoch: 10   Global Step: 111630   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:09:43,337-Speed 5970.55 samples/sec   Loss 7.3288   LearningRate 0.1053   Epoch: 10   Global Step: 111640   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:09:50,193-Speed 5977.77 samples/sec   Loss 7.3314   LearningRate 0.1052   Epoch: 10   Global Step: 111650   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:09:57,056-Speed 5970.06 samples/sec   Loss 7.3978   LearningRate 0.1052   Epoch: 10   Global Step: 111660   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:10:03,917-Speed 5970.89 samples/sec   Loss 7.2882   LearningRate 0.1052   Epoch: 10   Global Step: 111670   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:10:10,760-Speed 5987.10 samples/sec   Loss 7.3444   LearningRate 0.1052   Epoch: 10   Global Step: 111680   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:10:17,628-Speed 5964.92 samples/sec   Loss 7.3053   LearningRate 0.1051   Epoch: 10   Global Step: 111690   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:10:24,498-Speed 5962.37 samples/sec   Loss 7.2128   LearningRate 0.1051   Epoch: 10   Global Step: 111700   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:10:31,371-Speed 5960.88 samples/sec   Loss 7.3240   LearningRate 0.1051   Epoch: 10   Global Step: 111710   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:10:38,230-Speed 5972.98 samples/sec   Loss 7.2942   LearningRate 0.1051   Epoch: 10   Global Step: 111720   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:10:45,094-Speed 5968.60 samples/sec   Loss 7.3026   LearningRate 0.1051   Epoch: 10   Global Step: 111730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:10:51,954-Speed 5972.36 samples/sec   Loss 7.3503   LearningRate 0.1050   Epoch: 10   Global Step: 111740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:10:58,821-Speed 5969.54 samples/sec   Loss 7.3629   LearningRate 0.1050   Epoch: 10   Global Step: 111750   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:11:05,685-Speed 5968.32 samples/sec   Loss 7.2868   LearningRate 0.1050   Epoch: 10   Global Step: 111760   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:11:12,546-Speed 5971.30 samples/sec   Loss 7.3334   LearningRate 0.1050   Epoch: 10   Global Step: 111770   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:11:19,417-Speed 5963.13 samples/sec   Loss 7.3114   LearningRate 0.1049   Epoch: 10   Global Step: 111780   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:11:26,283-Speed 5966.48 samples/sec   Loss 7.3348   LearningRate 0.1049   Epoch: 10   Global Step: 111790   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:11:33,157-Speed 5960.38 samples/sec   Loss 7.3061   LearningRate 0.1049   Epoch: 10   Global Step: 111800   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:11:40,030-Speed 5961.05 samples/sec   Loss 7.2889   LearningRate 0.1049   Epoch: 10   Global Step: 111810   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:11:46,882-Speed 5978.48 samples/sec   Loss 7.2922   LearningRate 0.1049   Epoch: 10   Global Step: 111820   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:11:53,735-Speed 5977.85 samples/sec   Loss 7.3180   LearningRate 0.1048   Epoch: 10   Global Step: 111830   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:12:00,586-Speed 5979.57 samples/sec   Loss 7.3291   LearningRate 0.1048   Epoch: 10   Global Step: 111840   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:12:07,482-Speed 5941.36 samples/sec   Loss 7.2685   LearningRate 0.1048   Epoch: 10   Global Step: 111850   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:12:14,419-Speed 5905.29 samples/sec   Loss 7.2592   LearningRate 0.1048   Epoch: 10   Global Step: 111860   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:12:21,351-Speed 5910.47 samples/sec   Loss 7.2812   LearningRate 0.1047   Epoch: 10   Global Step: 111870   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:12:28,276-Speed 5915.03 samples/sec   Loss 7.3114   LearningRate 0.1047   Epoch: 10   Global Step: 111880   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:12:35,173-Speed 5940.76 samples/sec   Loss 7.2883   LearningRate 0.1047   Epoch: 10   Global Step: 111890   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:12:42,036-Speed 5969.25 samples/sec   Loss 7.3323   LearningRate 0.1047   Epoch: 10   Global Step: 111900   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:12:48,900-Speed 5968.84 samples/sec   Loss 7.3360   LearningRate 0.1047   Epoch: 10   Global Step: 111910   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:12:55,773-Speed 5960.51 samples/sec   Loss 7.3545   LearningRate 0.1046   Epoch: 10   Global Step: 111920   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:13:02,630-Speed 5975.19 samples/sec   Loss 7.2328   LearningRate 0.1046   Epoch: 10   Global Step: 111930   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:13:09,509-Speed 5954.97 samples/sec   Loss 7.3355   LearningRate 0.1046   Epoch: 10   Global Step: 111940   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:13:16,397-Speed 5948.62 samples/sec   Loss 7.2784   LearningRate 0.1046   Epoch: 10   Global Step: 111950   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:13:23,253-Speed 5974.79 samples/sec   Loss 7.3078   LearningRate 0.1045   Epoch: 10   Global Step: 111960   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:13:30,129-Speed 5957.81 samples/sec   Loss 7.2807   LearningRate 0.1045   Epoch: 10   Global Step: 111970   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:13:36,984-Speed 5976.42 samples/sec   Loss 7.2696   LearningRate 0.1045   Epoch: 10   Global Step: 111980   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:13:43,830-Speed 5984.26 samples/sec   Loss 7.2575   LearningRate 0.1045   Epoch: 10   Global Step: 111990   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:13:50,684-Speed 5976.38 samples/sec   Loss 7.2549   LearningRate 0.1045   Epoch: 10   Global Step: 112000   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:13:57,527-Speed 5987.75 samples/sec   Loss 7.3021   LearningRate 0.1044   Epoch: 10   Global Step: 112010   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:14:04,374-Speed 5982.90 samples/sec   Loss 7.2439   LearningRate 0.1044   Epoch: 10   Global Step: 112020   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:14:11,224-Speed 5979.83 samples/sec   Loss 7.2054   LearningRate 0.1044   Epoch: 10   Global Step: 112030   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:14:18,076-Speed 5984.24 samples/sec   Loss 7.3185   LearningRate 0.1044   Epoch: 10   Global Step: 112040   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:14:24,943-Speed 5966.91 samples/sec   Loss 7.2623   LearningRate 0.1044   Epoch: 10   Global Step: 112050   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:14:31,810-Speed 5965.39 samples/sec   Loss 7.2782   LearningRate 0.1043   Epoch: 10   Global Step: 112060   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:14:38,653-Speed 5989.95 samples/sec   Loss 7.2413   LearningRate 0.1043   Epoch: 10   Global Step: 112070   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:14:45,524-Speed 5962.30 samples/sec   Loss 7.3279   LearningRate 0.1043   Epoch: 10   Global Step: 112080   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:14:52,421-Speed 5939.99 samples/sec   Loss 7.2667   LearningRate 0.1043   Epoch: 10   Global Step: 112090   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:14:59,271-Speed 5980.53 samples/sec   Loss 7.2573   LearningRate 0.1042   Epoch: 10   Global Step: 112100   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:15:06,133-Speed 5970.82 samples/sec   Loss 7.2665   LearningRate 0.1042   Epoch: 10   Global Step: 112110   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:15:12,982-Speed 5981.47 samples/sec   Loss 7.2490   LearningRate 0.1042   Epoch: 10   Global Step: 112120   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:15:19,862-Speed 5954.11 samples/sec   Loss 7.2984   LearningRate 0.1042   Epoch: 10   Global Step: 112130   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:15:26,711-Speed 5982.58 samples/sec   Loss 7.3019   LearningRate 0.1042   Epoch: 10   Global Step: 112140   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:15:33,570-Speed 5972.24 samples/sec   Loss 7.3537   LearningRate 0.1041   Epoch: 10   Global Step: 112150   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:15:40,496-Speed 5914.97 samples/sec   Loss 7.3214   LearningRate 0.1041   Epoch: 10   Global Step: 112160   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:15:47,346-Speed 5981.34 samples/sec   Loss 7.2616   LearningRate 0.1041   Epoch: 10   Global Step: 112170   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:15:54,197-Speed 5979.58 samples/sec   Loss 7.1950   LearningRate 0.1041   Epoch: 10   Global Step: 112180   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:16:01,053-Speed 5975.31 samples/sec   Loss 7.2786   LearningRate 0.1040   Epoch: 10   Global Step: 112190   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:16:07,950-Speed 5940.72 samples/sec   Loss 7.2153   LearningRate 0.1040   Epoch: 10   Global Step: 112200   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:16:14,794-Speed 5985.12 samples/sec   Loss 7.2017   LearningRate 0.1040   Epoch: 10   Global Step: 112210   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:16:21,647-Speed 5978.40 samples/sec   Loss 7.2535   LearningRate 0.1040   Epoch: 10   Global Step: 112220   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:16:28,487-Speed 5989.31 samples/sec   Loss 7.2623   LearningRate 0.1040   Epoch: 10   Global Step: 112230   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:16:35,333-Speed 5983.68 samples/sec   Loss 7.2017   LearningRate 0.1039   Epoch: 10   Global Step: 112240   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:16:42,208-Speed 5964.49 samples/sec   Loss 7.2537   LearningRate 0.1039   Epoch: 10   Global Step: 112250   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:16:49,069-Speed 5979.44 samples/sec   Loss 7.3030   LearningRate 0.1039   Epoch: 10   Global Step: 112260   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:16:55,942-Speed 5960.63 samples/sec   Loss 7.3285   LearningRate 0.1039   Epoch: 10   Global Step: 112270   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:17:02,810-Speed 5964.61 samples/sec   Loss 7.2524   LearningRate 0.1038   Epoch: 10   Global Step: 112280   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:17:09,668-Speed 5974.17 samples/sec   Loss 7.2004   LearningRate 0.1038   Epoch: 10   Global Step: 112290   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:17:16,523-Speed 5975.66 samples/sec   Loss 7.3273   LearningRate 0.1038   Epoch: 10   Global Step: 112300   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:17:23,398-Speed 5960.02 samples/sec   Loss 7.2806   LearningRate 0.1038   Epoch: 10   Global Step: 112310   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:17:30,272-Speed 5959.66 samples/sec   Loss 7.3112   LearningRate 0.1038   Epoch: 10   Global Step: 112320   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:17:37,137-Speed 5967.56 samples/sec   Loss 7.3472   LearningRate 0.1037   Epoch: 10   Global Step: 112330   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:17:43,992-Speed 5977.09 samples/sec   Loss 7.2656   LearningRate 0.1037   Epoch: 10   Global Step: 112340   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:17:50,851-Speed 5972.83 samples/sec   Loss 7.2995   LearningRate 0.1037   Epoch: 10   Global Step: 112350   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:17:57,734-Speed 5951.22 samples/sec   Loss 7.2796   LearningRate 0.1037   Epoch: 10   Global Step: 112360   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:18:04,582-Speed 5983.14 samples/sec   Loss 7.2327   LearningRate 0.1037   Epoch: 10   Global Step: 112370   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:18:11,464-Speed 5952.86 samples/sec   Loss 7.2990   LearningRate 0.1036   Epoch: 10   Global Step: 112380   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:18:18,349-Speed 5949.99 samples/sec   Loss 7.1537   LearningRate 0.1036   Epoch: 10   Global Step: 112390   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:18:25,197-Speed 5982.51 samples/sec   Loss 7.2179   LearningRate 0.1036   Epoch: 10   Global Step: 112400   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:18:32,042-Speed 5985.39 samples/sec   Loss 7.2305   LearningRate 0.1036   Epoch: 10   Global Step: 112410   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:18:38,886-Speed 5986.01 samples/sec   Loss 7.3126   LearningRate 0.1035   Epoch: 10   Global Step: 112420   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:18:45,757-Speed 5965.47 samples/sec   Loss 7.2807   LearningRate 0.1035   Epoch: 10   Global Step: 112430   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:18:52,616-Speed 5974.58 samples/sec   Loss 7.3410   LearningRate 0.1035   Epoch: 10   Global Step: 112440   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:18:59,479-Speed 5969.86 samples/sec   Loss 7.2852   LearningRate 0.1035   Epoch: 10   Global Step: 112450   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:19:06,325-Speed 5985.50 samples/sec   Loss 7.2445   LearningRate 0.1035   Epoch: 10   Global Step: 112460   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:19:13,195-Speed 5964.07 samples/sec   Loss 7.2668   LearningRate 0.1034   Epoch: 10   Global Step: 112470   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:19:20,117-Speed 5917.70 samples/sec   Loss 7.1943   LearningRate 0.1034   Epoch: 10   Global Step: 112480   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:19:27,037-Speed 5922.46 samples/sec   Loss 7.2284   LearningRate 0.1034   Epoch: 10   Global Step: 112490   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:19:33,880-Speed 5989.76 samples/sec   Loss 7.1778   LearningRate 0.1034   Epoch: 10   Global Step: 112500   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:19:40,733-Speed 5977.33 samples/sec   Loss 7.1954   LearningRate 0.1033   Epoch: 10   Global Step: 112510   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:19:47,627-Speed 5943.01 samples/sec   Loss 7.2189   LearningRate 0.1033   Epoch: 10   Global Step: 112520   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:19:54,534-Speed 5931.53 samples/sec   Loss 7.2589   LearningRate 0.1033   Epoch: 10   Global Step: 112530   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:20:01,405-Speed 5962.40 samples/sec   Loss 7.1878   LearningRate 0.1033   Epoch: 10   Global Step: 112540   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:20:08,283-Speed 5957.24 samples/sec   Loss 7.2168   LearningRate 0.1033   Epoch: 10   Global Step: 112550   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:20:15,134-Speed 5980.00 samples/sec   Loss 7.2194   LearningRate 0.1032   Epoch: 10   Global Step: 112560   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:20:21,994-Speed 5971.61 samples/sec   Loss 7.1753   LearningRate 0.1032   Epoch: 10   Global Step: 112570   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:20:28,881-Speed 5948.60 samples/sec   Loss 7.2011   LearningRate 0.1032   Epoch: 10   Global Step: 112580   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:20:35,752-Speed 5965.65 samples/sec   Loss 7.2314   LearningRate 0.1032   Epoch: 10   Global Step: 112590   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:20:42,607-Speed 5975.11 samples/sec   Loss 7.2637   LearningRate 0.1032   Epoch: 10   Global Step: 112600   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:20:49,461-Speed 5977.66 samples/sec   Loss 7.2161   LearningRate 0.1031   Epoch: 10   Global Step: 112610   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:20:56,323-Speed 5970.50 samples/sec   Loss 7.1313   LearningRate 0.1031   Epoch: 10   Global Step: 112620   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:21:03,176-Speed 5977.68 samples/sec   Loss 7.2729   LearningRate 0.1031   Epoch: 10   Global Step: 112630   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-08 18:21:10,061-Speed 5950.68 samples/sec   Loss 7.2715   LearningRate 0.1031   Epoch: 10   Global Step: 112640   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-08 18:21:16,930-Speed 5966.44 samples/sec   Loss 7.2268   LearningRate 0.1030   Epoch: 10   Global Step: 112650   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:21:23,767-Speed 5991.92 samples/sec   Loss 7.2290   LearningRate 0.1030   Epoch: 10   Global Step: 112660   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:21:30,626-Speed 5972.91 samples/sec   Loss 7.2724   LearningRate 0.1030   Epoch: 10   Global Step: 112670   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:21:37,485-Speed 5972.44 samples/sec   Loss 7.2390   LearningRate 0.1030   Epoch: 10   Global Step: 112680   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:21:44,335-Speed 5981.05 samples/sec   Loss 7.2659   LearningRate 0.1030   Epoch: 10   Global Step: 112690   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:21:51,279-Speed 5899.48 samples/sec   Loss 7.2054   LearningRate 0.1029   Epoch: 10   Global Step: 112700   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:21:58,130-Speed 5979.67 samples/sec   Loss 7.2414   LearningRate 0.1029   Epoch: 10   Global Step: 112710   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:22:04,975-Speed 5985.40 samples/sec   Loss 7.2101   LearningRate 0.1029   Epoch: 10   Global Step: 112720   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:22:11,833-Speed 5973.30 samples/sec   Loss 7.1830   LearningRate 0.1029   Epoch: 10   Global Step: 112730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:22:18,708-Speed 5959.19 samples/sec   Loss 7.1767   LearningRate 0.1028   Epoch: 10   Global Step: 112740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:22:25,554-Speed 5984.38 samples/sec   Loss 7.2546   LearningRate 0.1028   Epoch: 10   Global Step: 112750   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:22:32,398-Speed 5985.68 samples/sec   Loss 7.2076   LearningRate 0.1028   Epoch: 10   Global Step: 112760   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:22:39,249-Speed 5981.09 samples/sec   Loss 7.2076   LearningRate 0.1028   Epoch: 10   Global Step: 112770   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:22:46,122-Speed 5960.01 samples/sec   Loss 7.2321   LearningRate 0.1028   Epoch: 10   Global Step: 112780   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:22:52,986-Speed 5969.19 samples/sec   Loss 7.2432   LearningRate 0.1027   Epoch: 10   Global Step: 112790   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:22:59,863-Speed 5957.32 samples/sec   Loss 7.2224   LearningRate 0.1027   Epoch: 10   Global Step: 112800   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:23:06,712-Speed 5981.37 samples/sec   Loss 7.2587   LearningRate 0.1027   Epoch: 10   Global Step: 112810   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:23:13,572-Speed 5972.27 samples/sec   Loss 7.2166   LearningRate 0.1027   Epoch: 10   Global Step: 112820   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:23:20,422-Speed 5980.51 samples/sec   Loss 7.1975   LearningRate 0.1027   Epoch: 10   Global Step: 112830   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:23:27,308-Speed 5948.81 samples/sec   Loss 7.1898   LearningRate 0.1026   Epoch: 10   Global Step: 112840   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:23:34,161-Speed 5977.90 samples/sec   Loss 7.1918   LearningRate 0.1026   Epoch: 10   Global Step: 112850   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:23:41,008-Speed 5983.32 samples/sec   Loss 7.2034   LearningRate 0.1026   Epoch: 10   Global Step: 112860   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:23:47,872-Speed 5969.16 samples/sec   Loss 7.2839   LearningRate 0.1026   Epoch: 10   Global Step: 112870   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:23:54,765-Speed 5942.52 samples/sec   Loss 7.2756   LearningRate 0.1025   Epoch: 10   Global Step: 112880   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:24:01,625-Speed 5972.72 samples/sec   Loss 7.1920   LearningRate 0.1025   Epoch: 10   Global Step: 112890   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:24:08,474-Speed 5980.82 samples/sec   Loss 7.2411   LearningRate 0.1025   Epoch: 10   Global Step: 112900   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:24:15,345-Speed 5963.11 samples/sec   Loss 7.1924   LearningRate 0.1025   Epoch: 10   Global Step: 112910   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-01-08 18:24:22,209-Speed 5968.65 samples/sec   Loss 7.2486   LearningRate 0.1025   Epoch: 10   Global Step: 112920   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:24:29,178-Speed 5878.57 samples/sec   Loss 7.1973   LearningRate 0.1024   Epoch: 10   Global Step: 112930   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-01-08 18:24:36,042-Speed 5968.87 samples/sec   Loss 7.2240   LearningRate 0.1024   Epoch: 10   Global Step: 112940   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:24:42,899-Speed 5975.25 samples/sec   Loss 7.2651   LearningRate 0.1024   Epoch: 10   Global Step: 112950   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:24:49,775-Speed 5957.70 samples/sec   Loss 7.2450   LearningRate 0.1024   Epoch: 10   Global Step: 112960   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:24:56,718-Speed 5900.97 samples/sec   Loss 7.2803   LearningRate 0.1023   Epoch: 10   Global Step: 112970   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:25:03,650-Speed 5910.28 samples/sec   Loss 7.2246   LearningRate 0.1023   Epoch: 10   Global Step: 112980   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:25:10,547-Speed 5939.43 samples/sec   Loss 7.1188   LearningRate 0.1023   Epoch: 10   Global Step: 112990   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:25:17,417-Speed 5966.11 samples/sec   Loss 7.2109   LearningRate 0.1023   Epoch: 10   Global Step: 113000   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:25:24,277-Speed 5972.56 samples/sec   Loss 7.2146   LearningRate 0.1023   Epoch: 10   Global Step: 113010   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-01-08 18:25:31,161-Speed 5950.27 samples/sec   Loss 7.2620   LearningRate 0.1022   Epoch: 10   Global Step: 113020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:25:38,019-Speed 5974.85 samples/sec   Loss 7.1922   LearningRate 0.1022   Epoch: 10   Global Step: 113030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:25:44,899-Speed 5954.51 samples/sec   Loss 7.1875   LearningRate 0.1022   Epoch: 10   Global Step: 113040   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:25:51,753-Speed 5976.91 samples/sec   Loss 7.1519   LearningRate 0.1022   Epoch: 10   Global Step: 113050   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:25:58,604-Speed 5980.36 samples/sec   Loss 7.2456   LearningRate 0.1022   Epoch: 10   Global Step: 113060   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:26:05,469-Speed 5968.26 samples/sec   Loss 7.2074   LearningRate 0.1021   Epoch: 10   Global Step: 113070   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:26:12,323-Speed 5976.25 samples/sec   Loss 7.1742   LearningRate 0.1021   Epoch: 10   Global Step: 113080   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:26:19,204-Speed 5953.99 samples/sec   Loss 7.1848   LearningRate 0.1021   Epoch: 10   Global Step: 113090   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:26:26,082-Speed 5957.21 samples/sec   Loss 7.1888   LearningRate 0.1021   Epoch: 10   Global Step: 113100   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:26:32,933-Speed 5979.17 samples/sec   Loss 7.1710   LearningRate 0.1020   Epoch: 10   Global Step: 113110   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:26:39,817-Speed 5951.73 samples/sec   Loss 7.2490   LearningRate 0.1020   Epoch: 10   Global Step: 113120   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:26:46,713-Speed 5941.21 samples/sec   Loss 7.1689   LearningRate 0.1020   Epoch: 10   Global Step: 113130   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:26:54,657-Speed 5156.94 samples/sec   Loss 7.1837   LearningRate 0.1020   Epoch: 10   Global Step: 113140   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:27:01,507-Speed 5981.04 samples/sec   Loss 7.2847   LearningRate 0.1020   Epoch: 10   Global Step: 113150   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:27:08,375-Speed 5965.02 samples/sec   Loss 7.2229   LearningRate 0.1019   Epoch: 10   Global Step: 113160   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:27:15,226-Speed 5982.04 samples/sec   Loss 7.1826   LearningRate 0.1019   Epoch: 10   Global Step: 113170   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:27:22,092-Speed 5969.61 samples/sec   Loss 7.1775   LearningRate 0.1019   Epoch: 10   Global Step: 113180   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:27:28,964-Speed 5962.06 samples/sec   Loss 7.2069   LearningRate 0.1019   Epoch: 10   Global Step: 113190   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:27:35,824-Speed 5971.28 samples/sec   Loss 7.1860   LearningRate 0.1018   Epoch: 10   Global Step: 113200   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:27:42,691-Speed 5965.89 samples/sec   Loss 7.1787   LearningRate 0.1018   Epoch: 10   Global Step: 113210   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:27:49,547-Speed 5975.81 samples/sec   Loss 7.2048   LearningRate 0.1018   Epoch: 10   Global Step: 113220   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:27:56,400-Speed 5977.74 samples/sec   Loss 7.1755   LearningRate 0.1018   Epoch: 10   Global Step: 113230   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:28:03,248-Speed 5982.67 samples/sec   Loss 7.1485   LearningRate 0.1018   Epoch: 10   Global Step: 113240   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:28:10,123-Speed 5958.84 samples/sec   Loss 7.2087   LearningRate 0.1017   Epoch: 10   Global Step: 113250   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:28:16,980-Speed 5974.82 samples/sec   Loss 7.1702   LearningRate 0.1017   Epoch: 10   Global Step: 113260   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:28:23,869-Speed 5946.79 samples/sec   Loss 7.1067   LearningRate 0.1017   Epoch: 10   Global Step: 113270   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:28:30,739-Speed 5963.38 samples/sec   Loss 7.2116   LearningRate 0.1017   Epoch: 10   Global Step: 113280   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:28:37,588-Speed 5981.36 samples/sec   Loss 7.2070   LearningRate 0.1017   Epoch: 10   Global Step: 113290   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:28:44,462-Speed 5971.16 samples/sec   Loss 7.2044   LearningRate 0.1016   Epoch: 10   Global Step: 113300   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:28:51,334-Speed 5961.86 samples/sec   Loss 7.2271   LearningRate 0.1016   Epoch: 10   Global Step: 113310   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:28:58,216-Speed 5952.38 samples/sec   Loss 7.2431   LearningRate 0.1016   Epoch: 10   Global Step: 113320   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:29:05,070-Speed 5977.80 samples/sec   Loss 7.1822   LearningRate 0.1016   Epoch: 10   Global Step: 113330   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:29:11,913-Speed 5986.50 samples/sec   Loss 7.2056   LearningRate 0.1015   Epoch: 10   Global Step: 113340   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:29:18,777-Speed 5967.79 samples/sec   Loss 7.1907   LearningRate 0.1015   Epoch: 10   Global Step: 113350   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:29:25,639-Speed 5970.74 samples/sec   Loss 7.1886   LearningRate 0.1015   Epoch: 10   Global Step: 113360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:29:32,520-Speed 5954.56 samples/sec   Loss 7.1298   LearningRate 0.1015   Epoch: 10   Global Step: 113370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:29:39,405-Speed 5949.50 samples/sec   Loss 7.1739   LearningRate 0.1015   Epoch: 10   Global Step: 113380   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:29:46,290-Speed 5952.83 samples/sec   Loss 7.1994   LearningRate 0.1014   Epoch: 10   Global Step: 113390   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:29:53,152-Speed 5970.07 samples/sec   Loss 7.1712   LearningRate 0.1014   Epoch: 10   Global Step: 113400   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:30:00,009-Speed 5973.97 samples/sec   Loss 7.1979   LearningRate 0.1014   Epoch: 10   Global Step: 113410   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:30:07,701-Speed 5326.51 samples/sec   Loss 7.2410   LearningRate 0.1014   Epoch: 10   Global Step: 113420   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:30:14,552-Speed 5979.25 samples/sec   Loss 7.1886   LearningRate 0.1014   Epoch: 10   Global Step: 113430   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:30:21,400-Speed 5982.74 samples/sec   Loss 7.1593   LearningRate 0.1013   Epoch: 10   Global Step: 113440   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:30:28,360-Speed 5886.35 samples/sec   Loss 7.1435   LearningRate 0.1013   Epoch: 10   Global Step: 113450   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:30:35,208-Speed 5982.09 samples/sec   Loss 7.1546   LearningRate 0.1013   Epoch: 10   Global Step: 113460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:30:42,078-Speed 5962.39 samples/sec   Loss 7.1808   LearningRate 0.1013   Epoch: 10   Global Step: 113470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:30:48,934-Speed 5975.85 samples/sec   Loss 7.0694   LearningRate 0.1012   Epoch: 10   Global Step: 113480   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:30:55,810-Speed 5958.41 samples/sec   Loss 7.1341   LearningRate 0.1012   Epoch: 10   Global Step: 113490   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:31:02,684-Speed 5959.16 samples/sec   Loss 7.2303   LearningRate 0.1012   Epoch: 10   Global Step: 113500   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-08 18:31:09,544-Speed 5972.17 samples/sec   Loss 7.1616   LearningRate 0.1012   Epoch: 10   Global Step: 113510   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:31:16,405-Speed 5971.72 samples/sec   Loss 7.1616   LearningRate 0.1012   Epoch: 10   Global Step: 113520   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:31:23,294-Speed 5946.96 samples/sec   Loss 7.2012   LearningRate 0.1011   Epoch: 10   Global Step: 113530   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:31:30,143-Speed 5981.02 samples/sec   Loss 7.2274   LearningRate 0.1011   Epoch: 10   Global Step: 113540   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:31:36,997-Speed 5977.78 samples/sec   Loss 7.1892   LearningRate 0.1011   Epoch: 10   Global Step: 113550   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:31:43,850-Speed 5977.90 samples/sec   Loss 7.1752   LearningRate 0.1011   Epoch: 10   Global Step: 113560   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:31:50,834-Speed 5867.05 samples/sec   Loss 7.1814   LearningRate 0.1011   Epoch: 10   Global Step: 113570   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:31:57,685-Speed 5980.04 samples/sec   Loss 7.2349   LearningRate 0.1010   Epoch: 10   Global Step: 113580   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:32:04,520-Speed 5993.39 samples/sec   Loss 7.1977   LearningRate 0.1010   Epoch: 10   Global Step: 113590   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:32:11,365-Speed 5985.53 samples/sec   Loss 7.2219   LearningRate 0.1010   Epoch: 10   Global Step: 113600   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:32:18,215-Speed 5980.93 samples/sec   Loss 7.1727   LearningRate 0.1010   Epoch: 10   Global Step: 113610   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:32:25,065-Speed 5980.44 samples/sec   Loss 7.2182   LearningRate 0.1009   Epoch: 10   Global Step: 113620   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:32:31,934-Speed 5963.88 samples/sec   Loss 7.1697   LearningRate 0.1009   Epoch: 10   Global Step: 113630   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:32:38,796-Speed 5970.39 samples/sec   Loss 7.1813   LearningRate 0.1009   Epoch: 10   Global Step: 113640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:32:45,656-Speed 5971.43 samples/sec   Loss 7.1674   LearningRate 0.1009   Epoch: 10   Global Step: 113650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:32:52,552-Speed 5941.87 samples/sec   Loss 7.1447   LearningRate 0.1009   Epoch: 10   Global Step: 113660   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:32:59,538-Speed 5865.48 samples/sec   Loss 7.1718   LearningRate 0.1008   Epoch: 10   Global Step: 113670   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:33:06,408-Speed 5963.17 samples/sec   Loss 7.1760   LearningRate 0.1008   Epoch: 10   Global Step: 113680   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:33:13,392-Speed 5865.84 samples/sec   Loss 7.0758   LearningRate 0.1008   Epoch: 10   Global Step: 113690   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:33:20,385-Speed 5858.84 samples/sec   Loss 7.1581   LearningRate 0.1008   Epoch: 10   Global Step: 113700   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:33:27,278-Speed 5943.67 samples/sec   Loss 7.1561   LearningRate 0.1007   Epoch: 10   Global Step: 113710   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:33:34,147-Speed 5964.28 samples/sec   Loss 7.0866   LearningRate 0.1007   Epoch: 10   Global Step: 113720   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:33:40,997-Speed 5980.98 samples/sec   Loss 7.1593   LearningRate 0.1007   Epoch: 10   Global Step: 113730   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:33:47,858-Speed 5971.15 samples/sec   Loss 7.1557   LearningRate 0.1007   Epoch: 10   Global Step: 113740   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:33:54,712-Speed 5979.33 samples/sec   Loss 7.1513   LearningRate 0.1007   Epoch: 10   Global Step: 113750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:34:01,574-Speed 5969.84 samples/sec   Loss 7.1678   LearningRate 0.1006   Epoch: 10   Global Step: 113760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:34:08,430-Speed 5975.76 samples/sec   Loss 7.1258   LearningRate 0.1006   Epoch: 10   Global Step: 113770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:34:15,272-Speed 5987.28 samples/sec   Loss 7.1220   LearningRate 0.1006   Epoch: 10   Global Step: 113780   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:34:22,131-Speed 5973.49 samples/sec   Loss 7.1091   LearningRate 0.1006   Epoch: 10   Global Step: 113790   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:34:29,003-Speed 5961.54 samples/sec   Loss 7.0986   LearningRate 0.1006   Epoch: 10   Global Step: 113800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:34:35,862-Speed 5972.89 samples/sec   Loss 7.1909   LearningRate 0.1005   Epoch: 10   Global Step: 113810   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:34:42,712-Speed 5981.08 samples/sec   Loss 7.1031   LearningRate 0.1005   Epoch: 10   Global Step: 113820   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:34:49,558-Speed 5983.85 samples/sec   Loss 7.1534   LearningRate 0.1005   Epoch: 10   Global Step: 113830   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:34:56,407-Speed 5982.37 samples/sec   Loss 7.2127   LearningRate 0.1005   Epoch: 10   Global Step: 113840   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:35:03,254-Speed 5984.32 samples/sec   Loss 7.0997   LearningRate 0.1004   Epoch: 10   Global Step: 113850   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:35:10,109-Speed 5976.02 samples/sec   Loss 7.1492   LearningRate 0.1004   Epoch: 10   Global Step: 113860   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:35:16,962-Speed 5978.80 samples/sec   Loss 7.1913   LearningRate 0.1004   Epoch: 10   Global Step: 113870   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:35:23,813-Speed 5979.69 samples/sec   Loss 7.1634   LearningRate 0.1004   Epoch: 10   Global Step: 113880   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:35:30,674-Speed 5971.11 samples/sec   Loss 7.1710   LearningRate 0.1004   Epoch: 10   Global Step: 113890   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:35:37,541-Speed 5965.64 samples/sec   Loss 7.1149   LearningRate 0.1003   Epoch: 10   Global Step: 113900   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:35:44,404-Speed 5969.46 samples/sec   Loss 7.1142   LearningRate 0.1003   Epoch: 10   Global Step: 113910   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:35:51,319-Speed 5924.64 samples/sec   Loss 7.1745   LearningRate 0.1003   Epoch: 10   Global Step: 113920   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:35:58,172-Speed 5977.41 samples/sec   Loss 7.1902   LearningRate 0.1003   Epoch: 10   Global Step: 113930   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:36:05,023-Speed 5980.33 samples/sec   Loss 7.1136   LearningRate 0.1003   Epoch: 10   Global Step: 113940   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:36:11,870-Speed 5982.41 samples/sec   Loss 7.1527   LearningRate 0.1002   Epoch: 10   Global Step: 113950   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:36:18,763-Speed 5943.90 samples/sec   Loss 7.1801   LearningRate 0.1002   Epoch: 10   Global Step: 113960   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:36:25,633-Speed 5963.09 samples/sec   Loss 7.1633   LearningRate 0.1002   Epoch: 10   Global Step: 113970   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:36:32,507-Speed 5959.60 samples/sec   Loss 7.1543   LearningRate 0.1002   Epoch: 10   Global Step: 113980   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:36:39,383-Speed 5959.09 samples/sec   Loss 7.2331   LearningRate 0.1001   Epoch: 10   Global Step: 113990   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:36:46,266-Speed 5952.35 samples/sec   Loss 7.1854   LearningRate 0.1001   Epoch: 10   Global Step: 114000   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:36:53,129-Speed 5969.06 samples/sec   Loss 7.2365   LearningRate 0.1001   Epoch: 10   Global Step: 114010   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:36:59,992-Speed 5969.78 samples/sec   Loss 7.1628   LearningRate 0.1001   Epoch: 10   Global Step: 114020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:37:06,844-Speed 5979.08 samples/sec   Loss 7.1744   LearningRate 0.1001   Epoch: 10   Global Step: 114030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:37:13,686-Speed 5987.17 samples/sec   Loss 7.1399   LearningRate 0.1000   Epoch: 10   Global Step: 114040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:37:20,542-Speed 5977.50 samples/sec   Loss 7.1499   LearningRate 0.1000   Epoch: 10   Global Step: 114050   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:37:27,412-Speed 5963.12 samples/sec   Loss 7.1142   LearningRate 0.1000   Epoch: 10   Global Step: 114060   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:37:51,428-Speed 1705.67 samples/sec   Loss 7.1571   LearningRate 0.1000   Epoch: 11   Global Step: 114070   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:37:58,264-Speed 5993.02 samples/sec   Loss 7.1330   LearningRate 0.1000   Epoch: 11   Global Step: 114080   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:38:05,107-Speed 5987.42 samples/sec   Loss 7.1313   LearningRate 0.0999   Epoch: 11   Global Step: 114090   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:38:11,954-Speed 5983.96 samples/sec   Loss 7.1215   LearningRate 0.0999   Epoch: 11   Global Step: 114100   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:38:18,819-Speed 5967.67 samples/sec   Loss 7.1864   LearningRate 0.0999   Epoch: 11   Global Step: 114110   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:38:25,702-Speed 5951.44 samples/sec   Loss 7.0655   LearningRate 0.0999   Epoch: 11   Global Step: 114120   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:38:32,576-Speed 5960.90 samples/sec   Loss 7.1030   LearningRate 0.0998   Epoch: 11   Global Step: 114130   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:38:39,449-Speed 5962.91 samples/sec   Loss 7.1188   LearningRate 0.0998   Epoch: 11   Global Step: 114140   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:38:46,321-Speed 5961.26 samples/sec   Loss 7.1873   LearningRate 0.0998   Epoch: 11   Global Step: 114150   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:38:53,233-Speed 5927.57 samples/sec   Loss 7.1540   LearningRate 0.0998   Epoch: 11   Global Step: 114160   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:39:00,225-Speed 5859.56 samples/sec   Loss 7.1939   LearningRate 0.0998   Epoch: 11   Global Step: 114170   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:39:07,097-Speed 5961.20 samples/sec   Loss 7.1496   LearningRate 0.0997   Epoch: 11   Global Step: 114180   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:39:13,992-Speed 5941.70 samples/sec   Loss 7.1240   LearningRate 0.0997   Epoch: 11   Global Step: 114190   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:39:20,860-Speed 5965.69 samples/sec   Loss 7.1299   LearningRate 0.0997   Epoch: 11   Global Step: 114200   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:39:27,734-Speed 5959.17 samples/sec   Loss 7.0933   LearningRate 0.0997   Epoch: 11   Global Step: 114210   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:39:34,582-Speed 5982.75 samples/sec   Loss 7.0996   LearningRate 0.0997   Epoch: 11   Global Step: 114220   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:39:41,427-Speed 5985.42 samples/sec   Loss 7.0684   LearningRate 0.0996   Epoch: 11   Global Step: 114230   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:39:48,290-Speed 5969.21 samples/sec   Loss 7.1138   LearningRate 0.0996   Epoch: 11   Global Step: 114240   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:39:55,156-Speed 5966.68 samples/sec   Loss 7.1334   LearningRate 0.0996   Epoch: 11   Global Step: 114250   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:40:02,006-Speed 5981.20 samples/sec   Loss 7.1209   LearningRate 0.0996   Epoch: 11   Global Step: 114260   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:40:08,871-Speed 5967.68 samples/sec   Loss 7.1245   LearningRate 0.0995   Epoch: 11   Global Step: 114270   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:40:15,734-Speed 5969.13 samples/sec   Loss 7.1452   LearningRate 0.0995   Epoch: 11   Global Step: 114280   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:40:22,584-Speed 5981.37 samples/sec   Loss 7.1216   LearningRate 0.0995   Epoch: 11   Global Step: 114290   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:40:29,427-Speed 5986.06 samples/sec   Loss 7.1379   LearningRate 0.0995   Epoch: 11   Global Step: 114300   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:40:36,263-Speed 5993.07 samples/sec   Loss 7.1189   LearningRate 0.0995   Epoch: 11   Global Step: 114310   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:40:43,104-Speed 5988.74 samples/sec   Loss 7.0890   LearningRate 0.0994   Epoch: 11   Global Step: 114320   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:40:49,959-Speed 5976.47 samples/sec   Loss 7.1107   LearningRate 0.0994   Epoch: 11   Global Step: 114330   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:40:56,811-Speed 5978.83 samples/sec   Loss 7.1776   LearningRate 0.0994   Epoch: 11   Global Step: 114340   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:41:03,657-Speed 5985.02 samples/sec   Loss 7.0901   LearningRate 0.0994   Epoch: 11   Global Step: 114350   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:41:10,517-Speed 5972.23 samples/sec   Loss 7.0855   LearningRate 0.0994   Epoch: 11   Global Step: 114360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:41:17,386-Speed 5964.56 samples/sec   Loss 7.1373   LearningRate 0.0993   Epoch: 11   Global Step: 114370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:41:24,271-Speed 5950.72 samples/sec   Loss 7.1223   LearningRate 0.0993   Epoch: 11   Global Step: 114380   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:41:31,118-Speed 5982.93 samples/sec   Loss 7.1376   LearningRate 0.0993   Epoch: 11   Global Step: 114390   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:41:37,975-Speed 5973.88 samples/sec   Loss 7.1358   LearningRate 0.0993   Epoch: 11   Global Step: 114400   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:41:44,828-Speed 5978.29 samples/sec   Loss 7.0785   LearningRate 0.0992   Epoch: 11   Global Step: 114410   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:41:51,706-Speed 5955.76 samples/sec   Loss 7.1515   LearningRate 0.0992   Epoch: 11   Global Step: 114420   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:41:58,573-Speed 5966.82 samples/sec   Loss 7.1426   LearningRate 0.0992   Epoch: 11   Global Step: 114430   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:42:05,455-Speed 5952.61 samples/sec   Loss 7.0530   LearningRate 0.0992   Epoch: 11   Global Step: 114440   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:42:12,318-Speed 5969.34 samples/sec   Loss 7.0851   LearningRate 0.0992   Epoch: 11   Global Step: 114450   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:42:19,201-Speed 5954.61 samples/sec   Loss 7.0548   LearningRate 0.0991   Epoch: 11   Global Step: 114460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:42:26,077-Speed 5959.83 samples/sec   Loss 7.1332   LearningRate 0.0991   Epoch: 11   Global Step: 114470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:42:32,970-Speed 5943.35 samples/sec   Loss 7.1758   LearningRate 0.0991   Epoch: 11   Global Step: 114480   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:42:39,931-Speed 5885.97 samples/sec   Loss 7.0914   LearningRate 0.0991   Epoch: 11   Global Step: 114490   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:42:46,872-Speed 5901.60 samples/sec   Loss 7.1583   LearningRate 0.0991   Epoch: 11   Global Step: 114500   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:42:53,721-Speed 5982.02 samples/sec   Loss 7.1061   LearningRate 0.0990   Epoch: 11   Global Step: 114510   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:43:00,595-Speed 5960.13 samples/sec   Loss 7.0959   LearningRate 0.0990   Epoch: 11   Global Step: 114520   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:43:07,459-Speed 5968.41 samples/sec   Loss 7.0578   LearningRate 0.0990   Epoch: 11   Global Step: 114530   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:43:14,332-Speed 5962.04 samples/sec   Loss 7.0557   LearningRate 0.0990   Epoch: 11   Global Step: 114540   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:43:21,198-Speed 5966.56 samples/sec   Loss 7.1336   LearningRate 0.0990   Epoch: 11   Global Step: 114550   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:43:28,055-Speed 5975.09 samples/sec   Loss 7.1098   LearningRate 0.0989   Epoch: 11   Global Step: 114560   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:43:34,951-Speed 5943.38 samples/sec   Loss 7.0835   LearningRate 0.0989   Epoch: 11   Global Step: 114570   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:43:41,807-Speed 5975.07 samples/sec   Loss 7.0536   LearningRate 0.0989   Epoch: 11   Global Step: 114580   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:43:48,677-Speed 5962.96 samples/sec   Loss 7.1144   LearningRate 0.0989   Epoch: 11   Global Step: 114590   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:43:55,549-Speed 5961.78 samples/sec   Loss 7.1130   LearningRate 0.0988   Epoch: 11   Global Step: 114600   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:44:02,415-Speed 5967.26 samples/sec   Loss 7.1083   LearningRate 0.0988   Epoch: 11   Global Step: 114610   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:44:09,269-Speed 5976.95 samples/sec   Loss 7.0792   LearningRate 0.0988   Epoch: 11   Global Step: 114620   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:44:16,169-Speed 5937.97 samples/sec   Loss 7.0926   LearningRate 0.0988   Epoch: 11   Global Step: 114630   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:44:23,031-Speed 5970.50 samples/sec   Loss 7.1169   LearningRate 0.0988   Epoch: 11   Global Step: 114640   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:44:29,932-Speed 5936.74 samples/sec   Loss 7.1147   LearningRate 0.0987   Epoch: 11   Global Step: 114650   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:44:36,799-Speed 5965.21 samples/sec   Loss 7.1055   LearningRate 0.0987   Epoch: 11   Global Step: 114660   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:44:43,666-Speed 5966.04 samples/sec   Loss 7.1119   LearningRate 0.0987   Epoch: 11   Global Step: 114670   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:44:50,520-Speed 5976.81 samples/sec   Loss 7.1002   LearningRate 0.0987   Epoch: 11   Global Step: 114680   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:44:57,377-Speed 5974.28 samples/sec   Loss 7.0701   LearningRate 0.0987   Epoch: 11   Global Step: 114690   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:45:04,261-Speed 5953.33 samples/sec   Loss 7.0339   LearningRate 0.0986   Epoch: 11   Global Step: 114700   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:45:11,114-Speed 5977.20 samples/sec   Loss 7.0494   LearningRate 0.0986   Epoch: 11   Global Step: 114710   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:45:18,435-Speed 5595.81 samples/sec   Loss 7.0857   LearningRate 0.0986   Epoch: 11   Global Step: 114720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:45:25,336-Speed 5937.25 samples/sec   Loss 7.0828   LearningRate 0.0986   Epoch: 11   Global Step: 114730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:45:32,188-Speed 5978.65 samples/sec   Loss 7.1018   LearningRate 0.0985   Epoch: 11   Global Step: 114740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:45:39,048-Speed 5971.82 samples/sec   Loss 7.1109   LearningRate 0.0985   Epoch: 11   Global Step: 114750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:45:45,896-Speed 5982.06 samples/sec   Loss 7.0668   LearningRate 0.0985   Epoch: 11   Global Step: 114760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:45:52,741-Speed 5985.38 samples/sec   Loss 7.0870   LearningRate 0.0985   Epoch: 11   Global Step: 114770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:45:59,597-Speed 5977.10 samples/sec   Loss 7.1405   LearningRate 0.0985   Epoch: 11   Global Step: 114780   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:46:06,453-Speed 5975.03 samples/sec   Loss 7.0657   LearningRate 0.0984   Epoch: 11   Global Step: 114790   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:46:13,311-Speed 5974.22 samples/sec   Loss 7.0861   LearningRate 0.0984   Epoch: 11   Global Step: 114800   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:46:20,164-Speed 5977.48 samples/sec   Loss 7.1009   LearningRate 0.0984   Epoch: 11   Global Step: 114810   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:46:27,018-Speed 5976.33 samples/sec   Loss 7.0976   LearningRate 0.0984   Epoch: 11   Global Step: 114820   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:46:33,866-Speed 5982.42 samples/sec   Loss 7.1084   LearningRate 0.0984   Epoch: 11   Global Step: 114830   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:46:40,724-Speed 5973.64 samples/sec   Loss 7.1008   LearningRate 0.0983   Epoch: 11   Global Step: 114840   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:46:47,586-Speed 5969.68 samples/sec   Loss 7.0629   LearningRate 0.0983   Epoch: 11   Global Step: 114850   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:46:54,979-Speed 5897.85 samples/sec   Loss 7.0747   LearningRate 0.0983   Epoch: 11   Global Step: 114860   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:47:01,834-Speed 5976.07 samples/sec   Loss 7.0679   LearningRate 0.0983   Epoch: 11   Global Step: 114870   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:47:08,678-Speed 5985.68 samples/sec   Loss 7.0521   LearningRate 0.0982   Epoch: 11   Global Step: 114880   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-08 18:47:15,520-Speed 5987.98 samples/sec   Loss 7.1134   LearningRate 0.0982   Epoch: 11   Global Step: 114890   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:47:22,373-Speed 5977.76 samples/sec   Loss 7.0866   LearningRate 0.0982   Epoch: 11   Global Step: 114900   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:47:29,249-Speed 5958.24 samples/sec   Loss 7.0587   LearningRate 0.0982   Epoch: 11   Global Step: 114910   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:47:36,118-Speed 5968.27 samples/sec   Loss 7.0778   LearningRate 0.0982   Epoch: 11   Global Step: 114920   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:47:42,981-Speed 5969.59 samples/sec   Loss 7.0977   LearningRate 0.0981   Epoch: 11   Global Step: 114930   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:47:49,846-Speed 5967.26 samples/sec   Loss 7.1482   LearningRate 0.0981   Epoch: 11   Global Step: 114940   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:47:56,713-Speed 5966.17 samples/sec   Loss 7.1004   LearningRate 0.0981   Epoch: 11   Global Step: 114950   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:48:03,576-Speed 5969.24 samples/sec   Loss 7.0508   LearningRate 0.0981   Epoch: 11   Global Step: 114960   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:48:10,431-Speed 5975.82 samples/sec   Loss 7.0266   LearningRate 0.0981   Epoch: 11   Global Step: 114970   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:48:17,283-Speed 5982.60 samples/sec   Loss 7.0441   LearningRate 0.0980   Epoch: 11   Global Step: 114980   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:48:24,125-Speed 5987.02 samples/sec   Loss 7.0239   LearningRate 0.0980   Epoch: 11   Global Step: 114990   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:48:30,971-Speed 5984.23 samples/sec   Loss 7.1320   LearningRate 0.0980   Epoch: 11   Global Step: 115000   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:48:58,197-[lfw][115000]XNorm: 24.114920
Training: 2022-01-08 18:48:58,197-[lfw][115000]Accuracy-Flip: 0.99750+-0.00281
Training: 2022-01-08 18:48:58,198-[lfw][115000]Accuracy-Highest: 0.99783
Training: 2022-01-08 18:49:29,614-[cfp_fp][115000]XNorm: 21.190004
Training: 2022-01-08 18:49:29,615-[cfp_fp][115000]Accuracy-Flip: 0.98557+-0.00724
Training: 2022-01-08 18:49:29,616-[cfp_fp][115000]Accuracy-Highest: 0.98557
Training: 2022-01-08 18:49:56,492-[agedb_30][115000]XNorm: 23.470952
Training: 2022-01-08 18:49:56,493-[agedb_30][115000]Accuracy-Flip: 0.97383+-0.00641
Training: 2022-01-08 18:49:56,493-[agedb_30][115000]Accuracy-Highest: 0.97383
Training: 2022-01-08 18:50:03,338-Speed 443.45 samples/sec   Loss 7.0247   LearningRate 0.0980   Epoch: 11   Global Step: 115010   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:50:10,225-Speed 5950.49 samples/sec   Loss 7.1038   LearningRate 0.0980   Epoch: 11   Global Step: 115020   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:50:17,106-Speed 5954.30 samples/sec   Loss 7.0273   LearningRate 0.0979   Epoch: 11   Global Step: 115030   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:50:23,949-Speed 5986.04 samples/sec   Loss 7.0485   LearningRate 0.0979   Epoch: 11   Global Step: 115040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:50:30,806-Speed 5975.06 samples/sec   Loss 7.0687   LearningRate 0.0979   Epoch: 11   Global Step: 115050   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:50:37,663-Speed 5974.36 samples/sec   Loss 7.0549   LearningRate 0.0979   Epoch: 11   Global Step: 115060   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:50:44,529-Speed 5966.60 samples/sec   Loss 7.0303   LearningRate 0.0978   Epoch: 11   Global Step: 115070   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:50:51,394-Speed 5967.18 samples/sec   Loss 7.0611   LearningRate 0.0978   Epoch: 11   Global Step: 115080   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:50:58,281-Speed 5948.50 samples/sec   Loss 7.0104   LearningRate 0.0978   Epoch: 11   Global Step: 115090   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:51:05,176-Speed 5941.86 samples/sec   Loss 7.0891   LearningRate 0.0978   Epoch: 11   Global Step: 115100   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:51:12,085-Speed 5930.42 samples/sec   Loss 7.1442   LearningRate 0.0978   Epoch: 11   Global Step: 115110   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:51:18,950-Speed 5967.42 samples/sec   Loss 7.0969   LearningRate 0.0977   Epoch: 11   Global Step: 115120   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:51:25,834-Speed 5950.57 samples/sec   Loss 7.0071   LearningRate 0.0977   Epoch: 11   Global Step: 115130   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:51:32,695-Speed 5971.51 samples/sec   Loss 7.1090   LearningRate 0.0977   Epoch: 11   Global Step: 115140   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:51:39,590-Speed 5941.16 samples/sec   Loss 7.0864   LearningRate 0.0977   Epoch: 11   Global Step: 115150   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:51:46,445-Speed 5976.89 samples/sec   Loss 7.0796   LearningRate 0.0977   Epoch: 11   Global Step: 115160   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:51:53,300-Speed 5976.09 samples/sec   Loss 7.0159   LearningRate 0.0976   Epoch: 11   Global Step: 115170   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:52:00,162-Speed 5969.88 samples/sec   Loss 7.0835   LearningRate 0.0976   Epoch: 11   Global Step: 115180   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:52:07,024-Speed 5970.46 samples/sec   Loss 7.1043   LearningRate 0.0976   Epoch: 11   Global Step: 115190   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:52:13,879-Speed 5976.01 samples/sec   Loss 7.0629   LearningRate 0.0976   Epoch: 11   Global Step: 115200   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:52:20,748-Speed 5963.85 samples/sec   Loss 6.9985   LearningRate 0.0975   Epoch: 11   Global Step: 115210   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:52:27,630-Speed 5953.20 samples/sec   Loss 7.0555   LearningRate 0.0975   Epoch: 11   Global Step: 115220   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:52:34,484-Speed 5978.17 samples/sec   Loss 7.0842   LearningRate 0.0975   Epoch: 11   Global Step: 115230   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:52:41,337-Speed 5978.13 samples/sec   Loss 7.1003   LearningRate 0.0975   Epoch: 11   Global Step: 115240   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:52:48,195-Speed 5973.48 samples/sec   Loss 7.1084   LearningRate 0.0975   Epoch: 11   Global Step: 115250   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:52:55,056-Speed 5971.14 samples/sec   Loss 7.0708   LearningRate 0.0974   Epoch: 11   Global Step: 115260   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:53:01,910-Speed 5977.03 samples/sec   Loss 7.0381   LearningRate 0.0974   Epoch: 11   Global Step: 115270   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:53:08,779-Speed 5964.14 samples/sec   Loss 7.0324   LearningRate 0.0974   Epoch: 11   Global Step: 115280   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:53:15,654-Speed 5958.66 samples/sec   Loss 7.0629   LearningRate 0.0974   Epoch: 11   Global Step: 115290   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:53:22,518-Speed 5968.96 samples/sec   Loss 7.0724   LearningRate 0.0974   Epoch: 11   Global Step: 115300   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:53:29,397-Speed 5954.79 samples/sec   Loss 7.0012   LearningRate 0.0973   Epoch: 11   Global Step: 115310   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:53:36,286-Speed 5947.63 samples/sec   Loss 7.0419   LearningRate 0.0973   Epoch: 11   Global Step: 115320   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:53:43,135-Speed 5981.16 samples/sec   Loss 7.1391   LearningRate 0.0973   Epoch: 11   Global Step: 115330   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:53:50,004-Speed 5964.10 samples/sec   Loss 7.0232   LearningRate 0.0973   Epoch: 11   Global Step: 115340   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:53:56,861-Speed 5974.61 samples/sec   Loss 7.0873   LearningRate 0.0973   Epoch: 11   Global Step: 115350   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:54:03,721-Speed 5971.98 samples/sec   Loss 6.9932   LearningRate 0.0972   Epoch: 11   Global Step: 115360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:54:10,597-Speed 5957.72 samples/sec   Loss 7.0726   LearningRate 0.0972   Epoch: 11   Global Step: 115370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:54:17,466-Speed 5964.25 samples/sec   Loss 7.0597   LearningRate 0.0972   Epoch: 11   Global Step: 115380   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:54:24,335-Speed 5964.38 samples/sec   Loss 6.9798   LearningRate 0.0972   Epoch: 11   Global Step: 115390   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:54:31,220-Speed 5949.92 samples/sec   Loss 7.0574   LearningRate 0.0971   Epoch: 11   Global Step: 115400   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:54:38,082-Speed 5971.06 samples/sec   Loss 7.0742   LearningRate 0.0971   Epoch: 11   Global Step: 115410   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:54:44,958-Speed 5958.17 samples/sec   Loss 7.0742   LearningRate 0.0971   Epoch: 11   Global Step: 115420   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:54:51,809-Speed 5979.19 samples/sec   Loss 7.0693   LearningRate 0.0971   Epoch: 11   Global Step: 115430   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:54:58,672-Speed 5969.38 samples/sec   Loss 6.9947   LearningRate 0.0971   Epoch: 11   Global Step: 115440   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:55:05,524-Speed 5978.71 samples/sec   Loss 7.0293   LearningRate 0.0970   Epoch: 11   Global Step: 115450   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:55:12,380-Speed 5975.52 samples/sec   Loss 7.0525   LearningRate 0.0970   Epoch: 11   Global Step: 115460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:55:19,234-Speed 5976.66 samples/sec   Loss 6.9817   LearningRate 0.0970   Epoch: 11   Global Step: 115470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:55:26,103-Speed 5964.53 samples/sec   Loss 7.0054   LearningRate 0.0970   Epoch: 11   Global Step: 115480   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:55:33,026-Speed 5917.50 samples/sec   Loss 6.9947   LearningRate 0.0970   Epoch: 11   Global Step: 115490   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:55:39,873-Speed 5982.88 samples/sec   Loss 7.1168   LearningRate 0.0969   Epoch: 11   Global Step: 115500   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:55:46,745-Speed 5961.81 samples/sec   Loss 7.0092   LearningRate 0.0969   Epoch: 11   Global Step: 115510   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:55:53,595-Speed 5980.28 samples/sec   Loss 6.9953   LearningRate 0.0969   Epoch: 11   Global Step: 115520   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:56:00,472-Speed 5957.53 samples/sec   Loss 7.0463   LearningRate 0.0969   Epoch: 11   Global Step: 115530   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:56:07,330-Speed 5973.65 samples/sec   Loss 7.0339   LearningRate 0.0969   Epoch: 11   Global Step: 115540   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:56:14,185-Speed 5975.62 samples/sec   Loss 7.0572   LearningRate 0.0968   Epoch: 11   Global Step: 115550   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:56:21,017-Speed 5996.97 samples/sec   Loss 7.0816   LearningRate 0.0968   Epoch: 11   Global Step: 115560   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:56:27,872-Speed 5975.96 samples/sec   Loss 7.0985   LearningRate 0.0968   Epoch: 11   Global Step: 115570   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:56:34,723-Speed 5979.70 samples/sec   Loss 7.0541   LearningRate 0.0968   Epoch: 11   Global Step: 115580   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:56:41,574-Speed 5981.36 samples/sec   Loss 6.9724   LearningRate 0.0967   Epoch: 11   Global Step: 115590   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:56:48,459-Speed 5951.38 samples/sec   Loss 7.0864   LearningRate 0.0967   Epoch: 11   Global Step: 115600   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:56:55,471-Speed 5842.51 samples/sec   Loss 7.0695   LearningRate 0.0967   Epoch: 11   Global Step: 115610   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:57:02,420-Speed 5895.66 samples/sec   Loss 7.0338   LearningRate 0.0967   Epoch: 11   Global Step: 115620   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:57:09,267-Speed 5983.08 samples/sec   Loss 7.0192   LearningRate 0.0967   Epoch: 11   Global Step: 115630   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:57:16,116-Speed 5981.16 samples/sec   Loss 6.9890   LearningRate 0.0966   Epoch: 11   Global Step: 115640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:57:22,972-Speed 5975.62 samples/sec   Loss 6.9897   LearningRate 0.0966   Epoch: 11   Global Step: 115650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:57:29,829-Speed 5974.85 samples/sec   Loss 7.0510   LearningRate 0.0966   Epoch: 11   Global Step: 115660   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:57:36,683-Speed 5977.25 samples/sec   Loss 7.0229   LearningRate 0.0966   Epoch: 11   Global Step: 115670   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:57:43,543-Speed 5971.94 samples/sec   Loss 7.0008   LearningRate 0.0966   Epoch: 11   Global Step: 115680   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:57:50,406-Speed 5968.27 samples/sec   Loss 6.9475   LearningRate 0.0965   Epoch: 11   Global Step: 115690   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:57:57,277-Speed 5962.44 samples/sec   Loss 7.0172   LearningRate 0.0965   Epoch: 11   Global Step: 115700   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:58:04,151-Speed 5959.75 samples/sec   Loss 7.0125   LearningRate 0.0965   Epoch: 11   Global Step: 115710   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:58:11,038-Speed 5948.18 samples/sec   Loss 6.9871   LearningRate 0.0965   Epoch: 11   Global Step: 115720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:58:17,890-Speed 5979.13 samples/sec   Loss 7.0640   LearningRate 0.0965   Epoch: 11   Global Step: 115730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:58:24,739-Speed 5982.25 samples/sec   Loss 7.0012   LearningRate 0.0964   Epoch: 11   Global Step: 115740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:58:31,598-Speed 5973.20 samples/sec   Loss 7.0124   LearningRate 0.0964   Epoch: 11   Global Step: 115750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:58:38,458-Speed 5971.52 samples/sec   Loss 7.0271   LearningRate 0.0964   Epoch: 11   Global Step: 115760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:58:45,354-Speed 5941.49 samples/sec   Loss 6.9859   LearningRate 0.0964   Epoch: 11   Global Step: 115770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:58:52,228-Speed 5964.83 samples/sec   Loss 6.9743   LearningRate 0.0963   Epoch: 11   Global Step: 115780   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:58:59,094-Speed 5966.72 samples/sec   Loss 7.0240   LearningRate 0.0963   Epoch: 11   Global Step: 115790   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:59:05,953-Speed 5973.23 samples/sec   Loss 7.0053   LearningRate 0.0963   Epoch: 11   Global Step: 115800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:59:12,829-Speed 5958.26 samples/sec   Loss 7.0040   LearningRate 0.0963   Epoch: 11   Global Step: 115810   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 18:59:19,684-Speed 5976.21 samples/sec   Loss 7.0270   LearningRate 0.0963   Epoch: 11   Global Step: 115820   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:59:26,553-Speed 5964.48 samples/sec   Loss 6.9481   LearningRate 0.0962   Epoch: 11   Global Step: 115830   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:59:33,434-Speed 5953.93 samples/sec   Loss 6.9914   LearningRate 0.0962   Epoch: 11   Global Step: 115840   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:59:40,306-Speed 5962.30 samples/sec   Loss 7.0542   LearningRate 0.0962   Epoch: 11   Global Step: 115850   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:59:47,174-Speed 5965.08 samples/sec   Loss 7.0416   LearningRate 0.0962   Epoch: 11   Global Step: 115860   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 18:59:54,025-Speed 5979.62 samples/sec   Loss 6.9916   LearningRate 0.0962   Epoch: 11   Global Step: 115870   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:00:00,884-Speed 5972.70 samples/sec   Loss 7.0080   LearningRate 0.0961   Epoch: 11   Global Step: 115880   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:00:07,770-Speed 5949.36 samples/sec   Loss 6.9529   LearningRate 0.0961   Epoch: 11   Global Step: 115890   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:00:14,625-Speed 5976.65 samples/sec   Loss 7.0552   LearningRate 0.0961   Epoch: 11   Global Step: 115900   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:00:21,501-Speed 5958.92 samples/sec   Loss 7.0038   LearningRate 0.0961   Epoch: 11   Global Step: 115910   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:00:28,356-Speed 5977.62 samples/sec   Loss 6.9503   LearningRate 0.0961   Epoch: 11   Global Step: 115920   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:00:35,212-Speed 5975.87 samples/sec   Loss 6.9655   LearningRate 0.0960   Epoch: 11   Global Step: 115930   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:00:42,082-Speed 5962.92 samples/sec   Loss 7.0182   LearningRate 0.0960   Epoch: 11   Global Step: 115940   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:00:48,975-Speed 5943.85 samples/sec   Loss 6.9515   LearningRate 0.0960   Epoch: 11   Global Step: 115950   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:00:55,831-Speed 5975.27 samples/sec   Loss 6.9896   LearningRate 0.0960   Epoch: 11   Global Step: 115960   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:01:02,677-Speed 5984.45 samples/sec   Loss 7.0506   LearningRate 0.0959   Epoch: 11   Global Step: 115970   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:01:09,554-Speed 5958.17 samples/sec   Loss 6.9859   LearningRate 0.0959   Epoch: 11   Global Step: 115980   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:01:16,413-Speed 5972.15 samples/sec   Loss 6.9682   LearningRate 0.0959   Epoch: 11   Global Step: 115990   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:01:23,272-Speed 5973.58 samples/sec   Loss 6.9543   LearningRate 0.0959   Epoch: 11   Global Step: 116000   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:01:30,135-Speed 5971.09 samples/sec   Loss 7.0342   LearningRate 0.0959   Epoch: 11   Global Step: 116010   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:01:36,990-Speed 5976.63 samples/sec   Loss 7.0212   LearningRate 0.0958   Epoch: 11   Global Step: 116020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:01:43,867-Speed 5956.58 samples/sec   Loss 7.0070   LearningRate 0.0958   Epoch: 11   Global Step: 116030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:01:50,718-Speed 5980.03 samples/sec   Loss 7.0184   LearningRate 0.0958   Epoch: 11   Global Step: 116040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:01:57,590-Speed 5961.51 samples/sec   Loss 7.0000   LearningRate 0.0958   Epoch: 11   Global Step: 116050   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:02:04,438-Speed 5982.54 samples/sec   Loss 7.0352   LearningRate 0.0958   Epoch: 11   Global Step: 116060   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:02:11,311-Speed 5960.40 samples/sec   Loss 6.9139   LearningRate 0.0957   Epoch: 11   Global Step: 116070   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:02:18,159-Speed 5982.24 samples/sec   Loss 6.9709   LearningRate 0.0957   Epoch: 11   Global Step: 116080   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:02:25,017-Speed 5973.68 samples/sec   Loss 7.0174   LearningRate 0.0957   Epoch: 11   Global Step: 116090   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:02:31,892-Speed 5958.82 samples/sec   Loss 7.0045   LearningRate 0.0957   Epoch: 11   Global Step: 116100   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:02:38,756-Speed 5968.53 samples/sec   Loss 7.0530   LearningRate 0.0957   Epoch: 11   Global Step: 116110   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:02:45,610-Speed 5977.29 samples/sec   Loss 7.0616   LearningRate 0.0956   Epoch: 11   Global Step: 116120   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:02:52,472-Speed 5969.95 samples/sec   Loss 7.0039   LearningRate 0.0956   Epoch: 11   Global Step: 116130   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:02:59,328-Speed 5975.21 samples/sec   Loss 7.0023   LearningRate 0.0956   Epoch: 11   Global Step: 116140   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:03:06,227-Speed 5937.76 samples/sec   Loss 7.0209   LearningRate 0.0956   Epoch: 11   Global Step: 116150   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:03:13,143-Speed 5926.68 samples/sec   Loss 6.9869   LearningRate 0.0955   Epoch: 11   Global Step: 116160   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:03:20,047-Speed 5933.88 samples/sec   Loss 7.0002   LearningRate 0.0955   Epoch: 11   Global Step: 116170   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:03:26,948-Speed 5936.07 samples/sec   Loss 6.9702   LearningRate 0.0955   Epoch: 11   Global Step: 116180   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:03:33,829-Speed 5954.27 samples/sec   Loss 6.9650   LearningRate 0.0955   Epoch: 11   Global Step: 116190   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:03:40,670-Speed 5988.34 samples/sec   Loss 6.9772   LearningRate 0.0955   Epoch: 11   Global Step: 116200   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 19:03:47,532-Speed 5973.96 samples/sec   Loss 6.9872   LearningRate 0.0954   Epoch: 11   Global Step: 116210   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 19:03:54,410-Speed 5955.86 samples/sec   Loss 6.9980   LearningRate 0.0954   Epoch: 11   Global Step: 116220   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 19:04:01,288-Speed 5956.54 samples/sec   Loss 6.9476   LearningRate 0.0954   Epoch: 11   Global Step: 116230   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 19:04:08,169-Speed 5953.34 samples/sec   Loss 7.0025   LearningRate 0.0954   Epoch: 11   Global Step: 116240   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 19:04:15,030-Speed 5971.42 samples/sec   Loss 6.9714   LearningRate 0.0954   Epoch: 11   Global Step: 116250   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 19:04:21,884-Speed 5977.09 samples/sec   Loss 7.0064   LearningRate 0.0953   Epoch: 11   Global Step: 116260   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 19:04:28,733-Speed 5980.94 samples/sec   Loss 6.9362   LearningRate 0.0953   Epoch: 11   Global Step: 116270   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 19:04:35,591-Speed 5976.73 samples/sec   Loss 7.0013   LearningRate 0.0953   Epoch: 11   Global Step: 116280   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 19:04:42,463-Speed 5960.92 samples/sec   Loss 6.8973   LearningRate 0.0953   Epoch: 11   Global Step: 116290   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 19:04:49,334-Speed 5962.97 samples/sec   Loss 6.9895   LearningRate 0.0953   Epoch: 11   Global Step: 116300   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:04:56,206-Speed 5962.00 samples/sec   Loss 6.9826   LearningRate 0.0952   Epoch: 11   Global Step: 116310   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:05:03,086-Speed 5954.10 samples/sec   Loss 6.9726   LearningRate 0.0952   Epoch: 11   Global Step: 116320   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:05:10,036-Speed 5895.38 samples/sec   Loss 6.9508   LearningRate 0.0952   Epoch: 11   Global Step: 116330   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:05:16,891-Speed 5976.21 samples/sec   Loss 6.9827   LearningRate 0.0952   Epoch: 11   Global Step: 116340   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:05:23,740-Speed 5982.36 samples/sec   Loss 6.9505   LearningRate 0.0952   Epoch: 11   Global Step: 116350   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:05:30,583-Speed 5985.76 samples/sec   Loss 6.9820   LearningRate 0.0951   Epoch: 11   Global Step: 116360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:05:37,445-Speed 5970.55 samples/sec   Loss 6.9446   LearningRate 0.0951   Epoch: 11   Global Step: 116370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:05:44,302-Speed 5974.38 samples/sec   Loss 6.9941   LearningRate 0.0951   Epoch: 11   Global Step: 116380   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:05:51,169-Speed 5966.22 samples/sec   Loss 7.0254   LearningRate 0.0951   Epoch: 11   Global Step: 116390   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:05:58,053-Speed 5950.63 samples/sec   Loss 6.9536   LearningRate 0.0950   Epoch: 11   Global Step: 116400   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:06:04,928-Speed 5960.02 samples/sec   Loss 6.9594   LearningRate 0.0950   Epoch: 11   Global Step: 116410   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:06:11,793-Speed 5967.12 samples/sec   Loss 7.0157   LearningRate 0.0950   Epoch: 11   Global Step: 116420   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:06:18,701-Speed 5931.07 samples/sec   Loss 6.9303   LearningRate 0.0950   Epoch: 11   Global Step: 116430   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:06:25,567-Speed 5966.14 samples/sec   Loss 6.9537   LearningRate 0.0950   Epoch: 11   Global Step: 116440   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:06:32,426-Speed 5972.91 samples/sec   Loss 6.9794   LearningRate 0.0949   Epoch: 11   Global Step: 116450   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:06:39,282-Speed 5975.38 samples/sec   Loss 7.0050   LearningRate 0.0949   Epoch: 11   Global Step: 116460   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:06:46,136-Speed 5977.73 samples/sec   Loss 6.9428   LearningRate 0.0949   Epoch: 11   Global Step: 116470   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 19:06:53,019-Speed 5953.51 samples/sec   Loss 6.9996   LearningRate 0.0949   Epoch: 11   Global Step: 116480   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 19:06:59,886-Speed 5966.05 samples/sec   Loss 7.0506   LearningRate 0.0949   Epoch: 11   Global Step: 116490   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 19:07:06,757-Speed 5962.11 samples/sec   Loss 6.9937   LearningRate 0.0948   Epoch: 11   Global Step: 116500   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 19:07:13,632-Speed 5959.25 samples/sec   Loss 7.0012   LearningRate 0.0948   Epoch: 11   Global Step: 116510   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 19:07:20,484-Speed 5979.04 samples/sec   Loss 6.9466   LearningRate 0.0948   Epoch: 11   Global Step: 116520   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 19:07:27,379-Speed 5942.05 samples/sec   Loss 6.9976   LearningRate 0.0948   Epoch: 11   Global Step: 116530   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 19:07:34,250-Speed 5961.93 samples/sec   Loss 6.9298   LearningRate 0.0948   Epoch: 11   Global Step: 116540   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 19:07:41,123-Speed 5960.65 samples/sec   Loss 7.0082   LearningRate 0.0947   Epoch: 11   Global Step: 116550   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 19:07:47,969-Speed 5984.22 samples/sec   Loss 6.9315   LearningRate 0.0947   Epoch: 11   Global Step: 116560   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-01-08 19:07:54,815-Speed 5986.80 samples/sec   Loss 7.0041   LearningRate 0.0947   Epoch: 11   Global Step: 116570   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:08:01,686-Speed 5962.33 samples/sec   Loss 6.9184   LearningRate 0.0947   Epoch: 11   Global Step: 116580   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:08:08,562-Speed 5958.29 samples/sec   Loss 6.9446   LearningRate 0.0946   Epoch: 11   Global Step: 116590   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:08:15,435-Speed 5960.67 samples/sec   Loss 6.9489   LearningRate 0.0946   Epoch: 11   Global Step: 116600   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:08:22,282-Speed 5983.33 samples/sec   Loss 6.9993   LearningRate 0.0946   Epoch: 11   Global Step: 116610   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:08:29,127-Speed 5985.76 samples/sec   Loss 6.9893   LearningRate 0.0946   Epoch: 11   Global Step: 116620   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:08:35,977-Speed 5980.25 samples/sec   Loss 6.9012   LearningRate 0.0946   Epoch: 11   Global Step: 116630   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:08:42,844-Speed 5965.63 samples/sec   Loss 6.9075   LearningRate 0.0945   Epoch: 11   Global Step: 116640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:08:49,713-Speed 5964.66 samples/sec   Loss 6.9580   LearningRate 0.0945   Epoch: 11   Global Step: 116650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:08:56,572-Speed 5972.83 samples/sec   Loss 7.0094   LearningRate 0.0945   Epoch: 11   Global Step: 116660   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:09:03,424-Speed 5980.10 samples/sec   Loss 6.9757   LearningRate 0.0945   Epoch: 11   Global Step: 116670   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:09:10,300-Speed 5958.04 samples/sec   Loss 6.9799   LearningRate 0.0945   Epoch: 11   Global Step: 116680   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:09:17,156-Speed 5976.78 samples/sec   Loss 6.9367   LearningRate 0.0944   Epoch: 11   Global Step: 116690   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:09:24,049-Speed 5944.31 samples/sec   Loss 6.9495   LearningRate 0.0944   Epoch: 11   Global Step: 116700   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:09:30,896-Speed 5983.25 samples/sec   Loss 6.9174   LearningRate 0.0944   Epoch: 11   Global Step: 116710   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:09:37,751-Speed 5976.18 samples/sec   Loss 6.9455   LearningRate 0.0944   Epoch: 11   Global Step: 116720   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:09:44,603-Speed 5979.56 samples/sec   Loss 6.9603   LearningRate 0.0944   Epoch: 11   Global Step: 116730   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:09:51,463-Speed 5972.05 samples/sec   Loss 6.9608   LearningRate 0.0943   Epoch: 11   Global Step: 116740   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:09:58,312-Speed 5981.76 samples/sec   Loss 6.9249   LearningRate 0.0943   Epoch: 11   Global Step: 116750   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:10:05,252-Speed 5903.21 samples/sec   Loss 6.9063   LearningRate 0.0943   Epoch: 11   Global Step: 116760   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:10:12,160-Speed 5929.80 samples/sec   Loss 6.9874   LearningRate 0.0943   Epoch: 11   Global Step: 116770   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:10:19,063-Speed 5935.02 samples/sec   Loss 6.9461   LearningRate 0.0943   Epoch: 11   Global Step: 116780   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:10:25,941-Speed 5956.88 samples/sec   Loss 6.9568   LearningRate 0.0942   Epoch: 11   Global Step: 116790   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:10:32,820-Speed 5955.05 samples/sec   Loss 6.9600   LearningRate 0.0942   Epoch: 11   Global Step: 116800   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:10:39,672-Speed 5979.21 samples/sec   Loss 6.9434   LearningRate 0.0942   Epoch: 11   Global Step: 116810   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:10:46,553-Speed 5954.99 samples/sec   Loss 6.8741   LearningRate 0.0942   Epoch: 11   Global Step: 116820   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:10:53,444-Speed 5946.47 samples/sec   Loss 6.9023   LearningRate 0.0941   Epoch: 11   Global Step: 116830   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:11:00,313-Speed 5963.55 samples/sec   Loss 6.9978   LearningRate 0.0941   Epoch: 11   Global Step: 116840   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:11:07,182-Speed 5966.83 samples/sec   Loss 7.0034   LearningRate 0.0941   Epoch: 11   Global Step: 116850   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:11:14,056-Speed 5959.36 samples/sec   Loss 6.9224   LearningRate 0.0941   Epoch: 11   Global Step: 116860   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:11:20,905-Speed 5981.43 samples/sec   Loss 6.9629   LearningRate 0.0941   Epoch: 11   Global Step: 116870   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:11:27,774-Speed 5964.56 samples/sec   Loss 6.9563   LearningRate 0.0940   Epoch: 11   Global Step: 116880   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:11:34,645-Speed 5962.09 samples/sec   Loss 6.9389   LearningRate 0.0940   Epoch: 11   Global Step: 116890   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:11:41,502-Speed 5974.68 samples/sec   Loss 6.9596   LearningRate 0.0940   Epoch: 11   Global Step: 116900   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:11:48,387-Speed 5950.00 samples/sec   Loss 6.9322   LearningRate 0.0940   Epoch: 11   Global Step: 116910   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:11:55,238-Speed 5982.31 samples/sec   Loss 6.9896   LearningRate 0.0940   Epoch: 11   Global Step: 116920   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:12:02,089-Speed 5980.19 samples/sec   Loss 6.9582   LearningRate 0.0939   Epoch: 11   Global Step: 116930   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:12:08,948-Speed 5972.36 samples/sec   Loss 6.9117   LearningRate 0.0939   Epoch: 11   Global Step: 116940   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:12:15,811-Speed 5969.72 samples/sec   Loss 6.9249   LearningRate 0.0939   Epoch: 11   Global Step: 116950   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:12:22,674-Speed 5969.14 samples/sec   Loss 6.8983   LearningRate 0.0939   Epoch: 11   Global Step: 116960   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:12:29,561-Speed 5949.12 samples/sec   Loss 6.9586   LearningRate 0.0939   Epoch: 11   Global Step: 116970   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-08 19:12:36,433-Speed 5961.48 samples/sec   Loss 6.9920   LearningRate 0.0938   Epoch: 11   Global Step: 116980   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:12:43,293-Speed 5972.68 samples/sec   Loss 7.0068   LearningRate 0.0938   Epoch: 11   Global Step: 116990   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:12:50,156-Speed 5969.21 samples/sec   Loss 6.9150   LearningRate 0.0938   Epoch: 11   Global Step: 117000   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:12:57,002-Speed 5984.31 samples/sec   Loss 6.9171   LearningRate 0.0938   Epoch: 11   Global Step: 117010   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:13:03,862-Speed 5971.96 samples/sec   Loss 6.9412   LearningRate 0.0938   Epoch: 11   Global Step: 117020   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:13:10,710-Speed 5982.47 samples/sec   Loss 6.8890   LearningRate 0.0937   Epoch: 11   Global Step: 117030   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:13:17,578-Speed 5964.39 samples/sec   Loss 6.9426   LearningRate 0.0937   Epoch: 11   Global Step: 117040   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:13:24,442-Speed 5970.39 samples/sec   Loss 6.9595   LearningRate 0.0937   Epoch: 11   Global Step: 117050   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:13:31,303-Speed 5971.60 samples/sec   Loss 6.9557   LearningRate 0.0937   Epoch: 11   Global Step: 117060   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:13:38,169-Speed 5966.15 samples/sec   Loss 6.9232   LearningRate 0.0937   Epoch: 11   Global Step: 117070   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:13:45,010-Speed 5988.43 samples/sec   Loss 6.9564   LearningRate 0.0936   Epoch: 11   Global Step: 117080   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:13:51,872-Speed 5971.24 samples/sec   Loss 6.9570   LearningRate 0.0936   Epoch: 11   Global Step: 117090   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:13:58,715-Speed 5987.08 samples/sec   Loss 6.9096   LearningRate 0.0936   Epoch: 11   Global Step: 117100   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:14:05,559-Speed 5985.95 samples/sec   Loss 6.9536   LearningRate 0.0936   Epoch: 11   Global Step: 117110   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:14:12,457-Speed 5938.27 samples/sec   Loss 6.8685   LearningRate 0.0935   Epoch: 11   Global Step: 117120   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:14:19,388-Speed 5910.41 samples/sec   Loss 6.9038   LearningRate 0.0935   Epoch: 11   Global Step: 117130   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:14:26,285-Speed 5940.55 samples/sec   Loss 6.9476   LearningRate 0.0935   Epoch: 11   Global Step: 117140   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:14:33,232-Speed 5896.90 samples/sec   Loss 6.9494   LearningRate 0.0935   Epoch: 11   Global Step: 117150   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:14:40,084-Speed 5978.80 samples/sec   Loss 6.9487   LearningRate 0.0935   Epoch: 11   Global Step: 117160   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:14:46,934-Speed 5980.15 samples/sec   Loss 6.9843   LearningRate 0.0934   Epoch: 11   Global Step: 117170   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:14:53,894-Speed 5885.82 samples/sec   Loss 6.9250   LearningRate 0.0934   Epoch: 11   Global Step: 117180   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:15:00,824-Speed 5912.07 samples/sec   Loss 6.9093   LearningRate 0.0934   Epoch: 11   Global Step: 117190   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:15:07,698-Speed 5959.66 samples/sec   Loss 6.9527   LearningRate 0.0934   Epoch: 11   Global Step: 117200   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:15:14,554-Speed 5977.14 samples/sec   Loss 6.9454   LearningRate 0.0934   Epoch: 11   Global Step: 117210   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:15:21,489-Speed 5907.07 samples/sec   Loss 6.8909   LearningRate 0.0933   Epoch: 11   Global Step: 117220   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:15:28,386-Speed 5940.44 samples/sec   Loss 6.9454   LearningRate 0.0933   Epoch: 11   Global Step: 117230   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:15:35,240-Speed 5976.96 samples/sec   Loss 6.9675   LearningRate 0.0933   Epoch: 11   Global Step: 117240   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:15:42,082-Speed 5988.45 samples/sec   Loss 6.9232   LearningRate 0.0933   Epoch: 11   Global Step: 117250   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:15:48,945-Speed 5969.32 samples/sec   Loss 6.9079   LearningRate 0.0933   Epoch: 11   Global Step: 117260   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:15:55,819-Speed 5960.23 samples/sec   Loss 6.9325   LearningRate 0.0932   Epoch: 11   Global Step: 117270   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:16:02,674-Speed 5976.27 samples/sec   Loss 6.8582   LearningRate 0.0932   Epoch: 11   Global Step: 117280   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:16:09,534-Speed 5971.25 samples/sec   Loss 6.9227   LearningRate 0.0932   Epoch: 11   Global Step: 117290   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-08 19:16:17,714-Speed 5008.12 samples/sec   Loss 6.9527   LearningRate 0.0932   Epoch: 11   Global Step: 117300   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-08 19:16:24,549-Speed 5994.47 samples/sec   Loss 6.8935   LearningRate 0.0932   Epoch: 11   Global Step: 117310   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:16:31,406-Speed 5975.01 samples/sec   Loss 6.8496   LearningRate 0.0931   Epoch: 11   Global Step: 117320   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:16:38,260-Speed 5976.12 samples/sec   Loss 6.9014   LearningRate 0.0931   Epoch: 11   Global Step: 117330   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:16:45,118-Speed 5974.04 samples/sec   Loss 6.9837   LearningRate 0.0931   Epoch: 11   Global Step: 117340   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:16:51,965-Speed 5983.73 samples/sec   Loss 6.8879   LearningRate 0.0931   Epoch: 11   Global Step: 117350   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:16:58,840-Speed 5958.95 samples/sec   Loss 6.8937   LearningRate 0.0931   Epoch: 11   Global Step: 117360   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:17:05,707-Speed 5967.81 samples/sec   Loss 6.8529   LearningRate 0.0930   Epoch: 11   Global Step: 117370   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:17:12,576-Speed 5964.40 samples/sec   Loss 6.8631   LearningRate 0.0930   Epoch: 11   Global Step: 117380   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:17:19,423-Speed 5983.21 samples/sec   Loss 6.9068   LearningRate 0.0930   Epoch: 11   Global Step: 117390   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:17:26,286-Speed 5969.05 samples/sec   Loss 6.9717   LearningRate 0.0930   Epoch: 11   Global Step: 117400   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:17:33,141-Speed 5976.61 samples/sec   Loss 6.9494   LearningRate 0.0929   Epoch: 11   Global Step: 117410   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-08 19:17:39,995-Speed 5977.09 samples/sec   Loss 6.8552   LearningRate 0.0929   Epoch: 11   Global Step: 117420   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-08 19:17:46,847-Speed 5978.64 samples/sec   Loss 6.9318   LearningRate 0.0929   Epoch: 11   Global Step: 117430   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-08 19:17:53,687-Speed 5991.21 samples/sec   Loss 6.8556   LearningRate 0.0929   Epoch: 11   Global Step: 117440   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:18:00,544-Speed 5974.29 samples/sec   Loss 6.9177   LearningRate 0.0929   Epoch: 11   Global Step: 117450   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:18:07,405-Speed 5973.75 samples/sec   Loss 6.9120   LearningRate 0.0928   Epoch: 11   Global Step: 117460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:18:14,259-Speed 5976.46 samples/sec   Loss 6.8830   LearningRate 0.0928   Epoch: 11   Global Step: 117470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:18:21,135-Speed 5959.04 samples/sec   Loss 6.8600   LearningRate 0.0928   Epoch: 11   Global Step: 117480   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:18:28,025-Speed 5946.06 samples/sec   Loss 6.9490   LearningRate 0.0928   Epoch: 11   Global Step: 117490   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:18:34,879-Speed 5976.91 samples/sec   Loss 6.8850   LearningRate 0.0928   Epoch: 11   Global Step: 117500   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:18:41,734-Speed 5976.67 samples/sec   Loss 6.9205   LearningRate 0.0927   Epoch: 11   Global Step: 117510   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:18:48,580-Speed 5984.50 samples/sec   Loss 6.8429   LearningRate 0.0927   Epoch: 11   Global Step: 117520   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:18:55,416-Speed 5993.26 samples/sec   Loss 6.9334   LearningRate 0.0927   Epoch: 11   Global Step: 117530   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:19:02,270-Speed 5978.90 samples/sec   Loss 6.9580   LearningRate 0.0927   Epoch: 11   Global Step: 117540   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:19:09,160-Speed 5951.46 samples/sec   Loss 6.9606   LearningRate 0.0927   Epoch: 11   Global Step: 117550   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:19:16,054-Speed 5943.25 samples/sec   Loss 6.8941   LearningRate 0.0926   Epoch: 11   Global Step: 117560   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:19:22,931-Speed 5957.29 samples/sec   Loss 6.9066   LearningRate 0.0926   Epoch: 11   Global Step: 117570   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:19:29,792-Speed 5971.36 samples/sec   Loss 6.8851   LearningRate 0.0926   Epoch: 11   Global Step: 117580   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:19:36,668-Speed 5957.64 samples/sec   Loss 6.8623   LearningRate 0.0926   Epoch: 11   Global Step: 117590   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:19:43,517-Speed 5981.42 samples/sec   Loss 6.9282   LearningRate 0.0926   Epoch: 11   Global Step: 117600   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:19:50,366-Speed 5981.76 samples/sec   Loss 6.8843   LearningRate 0.0925   Epoch: 11   Global Step: 117610   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:19:57,218-Speed 5979.22 samples/sec   Loss 6.9207   LearningRate 0.0925   Epoch: 11   Global Step: 117620   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:20:04,094-Speed 5958.82 samples/sec   Loss 6.8597   LearningRate 0.0925   Epoch: 11   Global Step: 117630   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:20:10,951-Speed 5974.48 samples/sec   Loss 6.8769   LearningRate 0.0925   Epoch: 11   Global Step: 117640   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:20:17,818-Speed 5965.61 samples/sec   Loss 6.8712   LearningRate 0.0925   Epoch: 11   Global Step: 117650   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:20:24,688-Speed 5963.71 samples/sec   Loss 6.8638   LearningRate 0.0924   Epoch: 11   Global Step: 117660   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:20:31,572-Speed 5951.18 samples/sec   Loss 6.9173   LearningRate 0.0924   Epoch: 11   Global Step: 117670   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:20:38,448-Speed 5958.99 samples/sec   Loss 6.8770   LearningRate 0.0924   Epoch: 11   Global Step: 117680   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:20:45,302-Speed 5976.85 samples/sec   Loss 6.8748   LearningRate 0.0924   Epoch: 11   Global Step: 117690   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:20:52,145-Speed 5987.06 samples/sec   Loss 6.9599   LearningRate 0.0923   Epoch: 11   Global Step: 117700   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:20:58,988-Speed 5986.69 samples/sec   Loss 6.8955   LearningRate 0.0923   Epoch: 11   Global Step: 117710   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:21:05,848-Speed 5971.75 samples/sec   Loss 6.8943   LearningRate 0.0923   Epoch: 11   Global Step: 117720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:21:12,724-Speed 5958.56 samples/sec   Loss 6.9175   LearningRate 0.0923   Epoch: 11   Global Step: 117730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:21:19,573-Speed 5981.04 samples/sec   Loss 6.9023   LearningRate 0.0923   Epoch: 11   Global Step: 117740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:21:26,452-Speed 5957.06 samples/sec   Loss 6.9289   LearningRate 0.0922   Epoch: 11   Global Step: 117750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:21:33,353-Speed 5936.14 samples/sec   Loss 6.8098   LearningRate 0.0922   Epoch: 11   Global Step: 117760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:21:40,225-Speed 5963.38 samples/sec   Loss 6.8401   LearningRate 0.0922   Epoch: 11   Global Step: 117770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:21:47,074-Speed 5981.61 samples/sec   Loss 6.8903   LearningRate 0.0922   Epoch: 11   Global Step: 117780   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:21:53,952-Speed 5956.52 samples/sec   Loss 6.8820   LearningRate 0.0922   Epoch: 11   Global Step: 117790   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:22:00,841-Speed 5946.80 samples/sec   Loss 6.9050   LearningRate 0.0921   Epoch: 11   Global Step: 117800   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:22:07,695-Speed 5977.30 samples/sec   Loss 6.8869   LearningRate 0.0921   Epoch: 11   Global Step: 117810   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:22:14,550-Speed 5976.66 samples/sec   Loss 6.8874   LearningRate 0.0921   Epoch: 11   Global Step: 117820   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:22:21,430-Speed 5954.11 samples/sec   Loss 6.8188   LearningRate 0.0921   Epoch: 11   Global Step: 117830   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:22:28,330-Speed 5937.98 samples/sec   Loss 6.9099   LearningRate 0.0921   Epoch: 11   Global Step: 117840   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:22:35,207-Speed 5957.27 samples/sec   Loss 6.8915   LearningRate 0.0920   Epoch: 11   Global Step: 117850   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:22:42,092-Speed 5950.69 samples/sec   Loss 6.7850   LearningRate 0.0920   Epoch: 11   Global Step: 117860   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:22:48,960-Speed 5964.79 samples/sec   Loss 6.8655   LearningRate 0.0920   Epoch: 11   Global Step: 117870   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:22:55,921-Speed 5885.63 samples/sec   Loss 6.8568   LearningRate 0.0920   Epoch: 11   Global Step: 117880   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-01-08 19:23:02,768-Speed 5982.68 samples/sec   Loss 6.7979   LearningRate 0.0920   Epoch: 11   Global Step: 117890   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:23:09,599-Speed 5997.95 samples/sec   Loss 6.8596   LearningRate 0.0919   Epoch: 11   Global Step: 117900   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:23:16,456-Speed 5978.16 samples/sec   Loss 6.8994   LearningRate 0.0919   Epoch: 11   Global Step: 117910   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:23:23,316-Speed 5971.98 samples/sec   Loss 6.8542   LearningRate 0.0919   Epoch: 11   Global Step: 117920   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:23:30,210-Speed 5942.92 samples/sec   Loss 6.8358   LearningRate 0.0919   Epoch: 11   Global Step: 117930   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:23:37,086-Speed 5960.32 samples/sec   Loss 6.8487   LearningRate 0.0919   Epoch: 11   Global Step: 117940   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:23:43,959-Speed 5962.41 samples/sec   Loss 6.8911   LearningRate 0.0918   Epoch: 11   Global Step: 117950   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:23:50,823-Speed 5968.48 samples/sec   Loss 6.8415   LearningRate 0.0918   Epoch: 11   Global Step: 117960   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:23:57,678-Speed 5975.67 samples/sec   Loss 6.9289   LearningRate 0.0918   Epoch: 11   Global Step: 117970   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:24:04,536-Speed 5973.75 samples/sec   Loss 6.8607   LearningRate 0.0918   Epoch: 11   Global Step: 117980   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:24:11,425-Speed 5947.55 samples/sec   Loss 6.9309   LearningRate 0.0918   Epoch: 11   Global Step: 117990   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:24:18,269-Speed 5986.32 samples/sec   Loss 6.8926   LearningRate 0.0917   Epoch: 11   Global Step: 118000   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:24:25,119-Speed 5980.64 samples/sec   Loss 6.8070   LearningRate 0.0917   Epoch: 11   Global Step: 118010   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:24:31,975-Speed 5975.54 samples/sec   Loss 6.8459   LearningRate 0.0917   Epoch: 11   Global Step: 118020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:24:38,862-Speed 5948.08 samples/sec   Loss 6.9099   LearningRate 0.0917   Epoch: 11   Global Step: 118030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:24:45,713-Speed 5980.39 samples/sec   Loss 6.8954   LearningRate 0.0917   Epoch: 11   Global Step: 118040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:24:52,620-Speed 5931.99 samples/sec   Loss 6.8760   LearningRate 0.0916   Epoch: 11   Global Step: 118050   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:24:59,476-Speed 5975.77 samples/sec   Loss 6.8747   LearningRate 0.0916   Epoch: 11   Global Step: 118060   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:25:06,377-Speed 5936.87 samples/sec   Loss 6.8335   LearningRate 0.0916   Epoch: 11   Global Step: 118070   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:25:14,831-Speed 4845.37 samples/sec   Loss 6.8138   LearningRate 0.0916   Epoch: 11   Global Step: 118080   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:25:21,691-Speed 5972.35 samples/sec   Loss 6.8717   LearningRate 0.0915   Epoch: 11   Global Step: 118090   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-01-08 19:25:28,547-Speed 5975.09 samples/sec   Loss 6.7953   LearningRate 0.0915   Epoch: 11   Global Step: 118100   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:25:35,399-Speed 5979.07 samples/sec   Loss 6.8331   LearningRate 0.0915   Epoch: 11   Global Step: 118110   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:25:42,249-Speed 5980.73 samples/sec   Loss 6.8718   LearningRate 0.0915   Epoch: 11   Global Step: 118120   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-01-08 19:25:49,108-Speed 5974.86 samples/sec   Loss 6.8764   LearningRate 0.0915   Epoch: 11   Global Step: 118130   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:25:55,977-Speed 5964.03 samples/sec   Loss 6.8265   LearningRate 0.0914   Epoch: 11   Global Step: 118140   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:26:02,848-Speed 5961.89 samples/sec   Loss 6.8620   LearningRate 0.0914   Epoch: 11   Global Step: 118150   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:26:09,704-Speed 5975.33 samples/sec   Loss 6.8475   LearningRate 0.0914   Epoch: 11   Global Step: 118160   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:26:16,569-Speed 5968.83 samples/sec   Loss 6.8840   LearningRate 0.0914   Epoch: 11   Global Step: 118170   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:26:23,447-Speed 5956.42 samples/sec   Loss 6.8438   LearningRate 0.0914   Epoch: 11   Global Step: 118180   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:26:30,331-Speed 5951.41 samples/sec   Loss 6.8557   LearningRate 0.0913   Epoch: 11   Global Step: 118190   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:26:37,217-Speed 5952.03 samples/sec   Loss 6.8839   LearningRate 0.0913   Epoch: 11   Global Step: 118200   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-08 19:26:44,073-Speed 5975.53 samples/sec   Loss 6.8206   LearningRate 0.0913   Epoch: 11   Global Step: 118210   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:26:50,925-Speed 5980.00 samples/sec   Loss 6.8161   LearningRate 0.0913   Epoch: 11   Global Step: 118220   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:26:57,784-Speed 5972.93 samples/sec   Loss 6.8289   LearningRate 0.0913   Epoch: 11   Global Step: 118230   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:27:04,641-Speed 5975.14 samples/sec   Loss 6.9115   LearningRate 0.0912   Epoch: 11   Global Step: 118240   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:27:11,521-Speed 5954.85 samples/sec   Loss 6.8436   LearningRate 0.0912   Epoch: 11   Global Step: 118250   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:27:18,366-Speed 5985.31 samples/sec   Loss 6.8798   LearningRate 0.0912   Epoch: 11   Global Step: 118260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:27:25,223-Speed 5974.09 samples/sec   Loss 6.7761   LearningRate 0.0912   Epoch: 11   Global Step: 118270   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:27:32,082-Speed 5972.65 samples/sec   Loss 6.8222   LearningRate 0.0912   Epoch: 11   Global Step: 118280   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:27:38,951-Speed 5964.31 samples/sec   Loss 6.8557   LearningRate 0.0911   Epoch: 11   Global Step: 118290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:27:45,795-Speed 5985.92 samples/sec   Loss 6.8318   LearningRate 0.0911   Epoch: 11   Global Step: 118300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:27:52,640-Speed 5985.00 samples/sec   Loss 6.8161   LearningRate 0.0911   Epoch: 11   Global Step: 118310   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-08 19:27:59,524-Speed 5953.86 samples/sec   Loss 6.8452   LearningRate 0.0911   Epoch: 11   Global Step: 118320   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:28:06,367-Speed 5986.33 samples/sec   Loss 6.8300   LearningRate 0.0911   Epoch: 11   Global Step: 118330   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:28:13,248-Speed 5953.67 samples/sec   Loss 6.8802   LearningRate 0.0910   Epoch: 11   Global Step: 118340   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:28:20,117-Speed 5964.78 samples/sec   Loss 6.8236   LearningRate 0.0910   Epoch: 11   Global Step: 118350   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:28:26,958-Speed 5988.50 samples/sec   Loss 6.8886   LearningRate 0.0910   Epoch: 11   Global Step: 118360   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:28:33,824-Speed 5966.19 samples/sec   Loss 6.9133   LearningRate 0.0910   Epoch: 11   Global Step: 118370   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:28:40,689-Speed 5967.55 samples/sec   Loss 6.8336   LearningRate 0.0910   Epoch: 11   Global Step: 118380   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:28:47,542-Speed 5978.84 samples/sec   Loss 6.8512   LearningRate 0.0909   Epoch: 11   Global Step: 118390   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:28:54,452-Speed 5928.52 samples/sec   Loss 6.8329   LearningRate 0.0909   Epoch: 11   Global Step: 118400   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:29:01,313-Speed 5970.90 samples/sec   Loss 6.8623   LearningRate 0.0909   Epoch: 11   Global Step: 118410   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:29:08,163-Speed 5981.10 samples/sec   Loss 6.7588   LearningRate 0.0909   Epoch: 11   Global Step: 118420   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:29:15,055-Speed 5944.04 samples/sec   Loss 6.8120   LearningRate 0.0909   Epoch: 11   Global Step: 118430   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:29:21,907-Speed 5978.58 samples/sec   Loss 6.8190   LearningRate 0.0908   Epoch: 11   Global Step: 118440   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:29:29,534-Speed 5371.23 samples/sec   Loss 6.8012   LearningRate 0.0908   Epoch: 11   Global Step: 118450   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:29:36,377-Speed 5987.52 samples/sec   Loss 6.8443   LearningRate 0.0908   Epoch: 11   Global Step: 118460   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:29:43,355-Speed 5870.48 samples/sec   Loss 6.7931   LearningRate 0.0908   Epoch: 11   Global Step: 118470   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:29:50,211-Speed 5975.61 samples/sec   Loss 6.8788   LearningRate 0.0907   Epoch: 11   Global Step: 118480   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:29:57,062-Speed 5982.25 samples/sec   Loss 6.7687   LearningRate 0.0907   Epoch: 11   Global Step: 118490   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:30:03,936-Speed 5959.94 samples/sec   Loss 6.8472   LearningRate 0.0907   Epoch: 11   Global Step: 118500   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:30:10,788-Speed 5978.26 samples/sec   Loss 6.8561   LearningRate 0.0907   Epoch: 11   Global Step: 118510   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:30:17,659-Speed 5962.48 samples/sec   Loss 6.8526   LearningRate 0.0907   Epoch: 11   Global Step: 118520   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-08 19:30:24,536-Speed 5958.03 samples/sec   Loss 6.7936   LearningRate 0.0906   Epoch: 11   Global Step: 118530   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-08 19:30:31,389-Speed 5978.08 samples/sec   Loss 6.8426   LearningRate 0.0906   Epoch: 11   Global Step: 118540   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:30:38,250-Speed 5971.00 samples/sec   Loss 6.8068   LearningRate 0.0906   Epoch: 11   Global Step: 118550   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:30:45,117-Speed 5966.07 samples/sec   Loss 6.8824   LearningRate 0.0906   Epoch: 11   Global Step: 118560   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:30:51,972-Speed 5976.31 samples/sec   Loss 6.7889   LearningRate 0.0906   Epoch: 11   Global Step: 118570   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:30:58,921-Speed 5895.04 samples/sec   Loss 6.7941   LearningRate 0.0905   Epoch: 11   Global Step: 118580   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:31:05,769-Speed 5982.93 samples/sec   Loss 6.8029   LearningRate 0.0905   Epoch: 11   Global Step: 118590   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:31:12,620-Speed 5979.83 samples/sec   Loss 6.8218   LearningRate 0.0905   Epoch: 11   Global Step: 118600   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:31:19,480-Speed 5971.59 samples/sec   Loss 6.8708   LearningRate 0.0905   Epoch: 11   Global Step: 118610   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:31:26,370-Speed 5946.76 samples/sec   Loss 6.8354   LearningRate 0.0905   Epoch: 11   Global Step: 118620   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:31:33,255-Speed 5949.70 samples/sec   Loss 6.8199   LearningRate 0.0904   Epoch: 11   Global Step: 118630   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:31:40,104-Speed 5981.51 samples/sec   Loss 6.8518   LearningRate 0.0904   Epoch: 11   Global Step: 118640   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:31:46,957-Speed 5979.62 samples/sec   Loss 6.7927   LearningRate 0.0904   Epoch: 11   Global Step: 118650   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:31:53,807-Speed 5980.50 samples/sec   Loss 6.8136   LearningRate 0.0904   Epoch: 11   Global Step: 118660   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:32:00,666-Speed 5973.07 samples/sec   Loss 6.7857   LearningRate 0.0904   Epoch: 11   Global Step: 118670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:32:07,517-Speed 5980.02 samples/sec   Loss 6.8322   LearningRate 0.0903   Epoch: 11   Global Step: 118680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:32:14,370-Speed 5978.67 samples/sec   Loss 6.8179   LearningRate 0.0903   Epoch: 11   Global Step: 118690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:32:21,240-Speed 5963.20 samples/sec   Loss 6.8386   LearningRate 0.0903   Epoch: 11   Global Step: 118700   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:32:28,100-Speed 5972.75 samples/sec   Loss 6.8314   LearningRate 0.0903   Epoch: 11   Global Step: 118710   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:32:34,960-Speed 5970.86 samples/sec   Loss 6.7755   LearningRate 0.0903   Epoch: 11   Global Step: 118720   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:32:41,840-Speed 5954.70 samples/sec   Loss 6.7722   LearningRate 0.0902   Epoch: 11   Global Step: 118730   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:32:48,709-Speed 5964.83 samples/sec   Loss 6.8346   LearningRate 0.0902   Epoch: 11   Global Step: 118740   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:32:55,574-Speed 5966.46 samples/sec   Loss 6.8269   LearningRate 0.0902   Epoch: 11   Global Step: 118750   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:33:02,425-Speed 5980.08 samples/sec   Loss 6.8184   LearningRate 0.0902   Epoch: 11   Global Step: 118760   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:33:09,275-Speed 5980.63 samples/sec   Loss 6.8038   LearningRate 0.0902   Epoch: 11   Global Step: 118770   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:33:16,127-Speed 5979.27 samples/sec   Loss 6.8020   LearningRate 0.0901   Epoch: 11   Global Step: 118780   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:33:22,985-Speed 5973.80 samples/sec   Loss 6.7996   LearningRate 0.0901   Epoch: 11   Global Step: 118790   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:33:29,833-Speed 5982.01 samples/sec   Loss 6.8054   LearningRate 0.0901   Epoch: 11   Global Step: 118800   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:33:36,678-Speed 5985.06 samples/sec   Loss 6.8275   LearningRate 0.0901   Epoch: 11   Global Step: 118810   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:33:43,569-Speed 5945.12 samples/sec   Loss 6.7502   LearningRate 0.0901   Epoch: 11   Global Step: 118820   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:33:50,461-Speed 5944.34 samples/sec   Loss 6.7717   LearningRate 0.0900   Epoch: 11   Global Step: 118830   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:33:57,332-Speed 5962.18 samples/sec   Loss 6.7989   LearningRate 0.0900   Epoch: 11   Global Step: 118840   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:34:04,210-Speed 5955.87 samples/sec   Loss 6.7961   LearningRate 0.0900   Epoch: 11   Global Step: 118850   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:34:11,050-Speed 5990.11 samples/sec   Loss 6.8424   LearningRate 0.0900   Epoch: 11   Global Step: 118860   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:34:17,913-Speed 5968.45 samples/sec   Loss 6.8347   LearningRate 0.0900   Epoch: 11   Global Step: 118870   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:34:24,773-Speed 5973.04 samples/sec   Loss 6.8247   LearningRate 0.0899   Epoch: 11   Global Step: 118880   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:34:31,639-Speed 5966.74 samples/sec   Loss 6.7623   LearningRate 0.0899   Epoch: 11   Global Step: 118890   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:34:38,487-Speed 5981.47 samples/sec   Loss 6.8490   LearningRate 0.0899   Epoch: 11   Global Step: 118900   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:34:45,360-Speed 5961.19 samples/sec   Loss 6.7916   LearningRate 0.0899   Epoch: 11   Global Step: 118910   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:34:52,224-Speed 5968.55 samples/sec   Loss 6.7693   LearningRate 0.0899   Epoch: 11   Global Step: 118920   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:34:59,083-Speed 5972.73 samples/sec   Loss 6.7931   LearningRate 0.0898   Epoch: 11   Global Step: 118930   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:35:05,931-Speed 5982.08 samples/sec   Loss 6.8093   LearningRate 0.0898   Epoch: 11   Global Step: 118940   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:35:12,785-Speed 5978.03 samples/sec   Loss 6.7594   LearningRate 0.0898   Epoch: 11   Global Step: 118950   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:35:19,649-Speed 5968.49 samples/sec   Loss 6.8600   LearningRate 0.0898   Epoch: 11   Global Step: 118960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:35:26,498-Speed 5981.55 samples/sec   Loss 6.7549   LearningRate 0.0898   Epoch: 11   Global Step: 118970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:35:33,351-Speed 5977.72 samples/sec   Loss 6.8243   LearningRate 0.0897   Epoch: 11   Global Step: 118980   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:35:40,209-Speed 5973.82 samples/sec   Loss 6.8318   LearningRate 0.0897   Epoch: 11   Global Step: 118990   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:35:47,061-Speed 5978.66 samples/sec   Loss 6.7924   LearningRate 0.0897   Epoch: 11   Global Step: 119000   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:35:53,923-Speed 5970.42 samples/sec   Loss 6.7921   LearningRate 0.0897   Epoch: 11   Global Step: 119010   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:36:00,774-Speed 5980.00 samples/sec   Loss 6.7972   LearningRate 0.0897   Epoch: 11   Global Step: 119020   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:36:07,606-Speed 5996.51 samples/sec   Loss 6.7742   LearningRate 0.0896   Epoch: 11   Global Step: 119030   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:36:14,468-Speed 5969.74 samples/sec   Loss 6.8389   LearningRate 0.0896   Epoch: 11   Global Step: 119040   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:36:21,308-Speed 5989.16 samples/sec   Loss 6.8173   LearningRate 0.0896   Epoch: 11   Global Step: 119050   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:36:28,217-Speed 5929.39 samples/sec   Loss 6.8352   LearningRate 0.0896   Epoch: 11   Global Step: 119060   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:36:35,066-Speed 5981.68 samples/sec   Loss 6.7448   LearningRate 0.0895   Epoch: 11   Global Step: 119070   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:36:41,949-Speed 5952.14 samples/sec   Loss 6.7872   LearningRate 0.0895   Epoch: 11   Global Step: 119080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:36:48,824-Speed 5959.30 samples/sec   Loss 6.7858   LearningRate 0.0895   Epoch: 11   Global Step: 119090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:36:55,686-Speed 5970.16 samples/sec   Loss 6.7952   LearningRate 0.0895   Epoch: 11   Global Step: 119100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:37:02,538-Speed 5977.98 samples/sec   Loss 6.8155   LearningRate 0.0895   Epoch: 11   Global Step: 119110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:37:09,404-Speed 5967.11 samples/sec   Loss 6.7943   LearningRate 0.0894   Epoch: 11   Global Step: 119120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:37:16,257-Speed 5981.00 samples/sec   Loss 6.7921   LearningRate 0.0894   Epoch: 11   Global Step: 119130   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:37:23,104-Speed 5982.40 samples/sec   Loss 6.7771   LearningRate 0.0894   Epoch: 11   Global Step: 119140   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:37:29,947-Speed 5986.94 samples/sec   Loss 6.8110   LearningRate 0.0894   Epoch: 11   Global Step: 119150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:37:36,826-Speed 5955.89 samples/sec   Loss 6.8067   LearningRate 0.0894   Epoch: 11   Global Step: 119160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:37:43,682-Speed 5974.88 samples/sec   Loss 6.7567   LearningRate 0.0893   Epoch: 11   Global Step: 119170   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:37:50,532-Speed 5981.13 samples/sec   Loss 6.8185   LearningRate 0.0893   Epoch: 11   Global Step: 119180   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:37:57,421-Speed 5946.50 samples/sec   Loss 6.7343   LearningRate 0.0893   Epoch: 11   Global Step: 119190   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:38:04,273-Speed 5979.27 samples/sec   Loss 6.7698   LearningRate 0.0893   Epoch: 11   Global Step: 119200   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:38:11,129-Speed 5974.51 samples/sec   Loss 6.7911   LearningRate 0.0893   Epoch: 11   Global Step: 119210   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:38:17,984-Speed 5976.80 samples/sec   Loss 6.8090   LearningRate 0.0892   Epoch: 11   Global Step: 119220   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:38:24,831-Speed 5982.19 samples/sec   Loss 6.7853   LearningRate 0.0892   Epoch: 11   Global Step: 119230   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:38:31,691-Speed 5972.87 samples/sec   Loss 6.7614   LearningRate 0.0892   Epoch: 11   Global Step: 119240   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:38:38,543-Speed 5981.15 samples/sec   Loss 6.7356   LearningRate 0.0892   Epoch: 11   Global Step: 119250   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:38:45,393-Speed 5980.39 samples/sec   Loss 6.7145   LearningRate 0.0892   Epoch: 11   Global Step: 119260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:38:52,250-Speed 5974.79 samples/sec   Loss 6.7888   LearningRate 0.0891   Epoch: 11   Global Step: 119270   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:38:59,118-Speed 5967.92 samples/sec   Loss 6.7726   LearningRate 0.0891   Epoch: 11   Global Step: 119280   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:39:05,980-Speed 5970.24 samples/sec   Loss 6.8206   LearningRate 0.0891   Epoch: 11   Global Step: 119290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:39:12,892-Speed 5926.86 samples/sec   Loss 6.7445   LearningRate 0.0891   Epoch: 11   Global Step: 119300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:39:19,767-Speed 5959.31 samples/sec   Loss 6.7731   LearningRate 0.0891   Epoch: 11   Global Step: 119310   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:39:26,633-Speed 5966.80 samples/sec   Loss 6.7948   LearningRate 0.0890   Epoch: 11   Global Step: 119320   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:39:33,499-Speed 5966.23 samples/sec   Loss 6.6995   LearningRate 0.0890   Epoch: 11   Global Step: 119330   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:39:40,346-Speed 5983.46 samples/sec   Loss 6.7364   LearningRate 0.0890   Epoch: 11   Global Step: 119340   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:39:47,335-Speed 5860.97 samples/sec   Loss 6.8238   LearningRate 0.0890   Epoch: 11   Global Step: 119350   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-08 19:39:54,205-Speed 5966.00 samples/sec   Loss 6.7273   LearningRate 0.0890   Epoch: 11   Global Step: 119360   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:40:01,059-Speed 5977.57 samples/sec   Loss 6.7391   LearningRate 0.0889   Epoch: 11   Global Step: 119370   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:40:07,924-Speed 5967.44 samples/sec   Loss 6.7198   LearningRate 0.0889   Epoch: 11   Global Step: 119380   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:40:14,805-Speed 5953.74 samples/sec   Loss 6.7176   LearningRate 0.0889   Epoch: 11   Global Step: 119390   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:40:21,715-Speed 5928.93 samples/sec   Loss 6.7434   LearningRate 0.0889   Epoch: 11   Global Step: 119400   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:40:28,607-Speed 5943.65 samples/sec   Loss 6.7807   LearningRate 0.0889   Epoch: 11   Global Step: 119410   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:40:35,476-Speed 5964.24 samples/sec   Loss 6.7743   LearningRate 0.0888   Epoch: 11   Global Step: 119420   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:40:42,336-Speed 5971.79 samples/sec   Loss 6.8154   LearningRate 0.0888   Epoch: 11   Global Step: 119430   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:40:49,174-Speed 5991.23 samples/sec   Loss 6.7953   LearningRate 0.0888   Epoch: 11   Global Step: 119440   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:40:56,022-Speed 5982.56 samples/sec   Loss 6.7605   LearningRate 0.0888   Epoch: 11   Global Step: 119450   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:41:02,880-Speed 5974.36 samples/sec   Loss 6.7950   LearningRate 0.0888   Epoch: 11   Global Step: 119460   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:41:09,752-Speed 5961.01 samples/sec   Loss 6.8768   LearningRate 0.0887   Epoch: 11   Global Step: 119470   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:41:16,601-Speed 5982.01 samples/sec   Loss 6.7423   LearningRate 0.0887   Epoch: 11   Global Step: 119480   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:41:23,449-Speed 5982.62 samples/sec   Loss 6.7332   LearningRate 0.0887   Epoch: 11   Global Step: 119490   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:41:30,310-Speed 5971.50 samples/sec   Loss 6.7360   LearningRate 0.0887   Epoch: 11   Global Step: 119500   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:41:37,194-Speed 5950.55 samples/sec   Loss 6.8296   LearningRate 0.0887   Epoch: 11   Global Step: 119510   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:41:44,047-Speed 5977.94 samples/sec   Loss 6.7292   LearningRate 0.0886   Epoch: 11   Global Step: 119520   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:41:50,904-Speed 5974.96 samples/sec   Loss 6.7687   LearningRate 0.0886   Epoch: 11   Global Step: 119530   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:41:57,752-Speed 5982.09 samples/sec   Loss 6.7674   LearningRate 0.0886   Epoch: 11   Global Step: 119540   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:42:04,603-Speed 5980.10 samples/sec   Loss 6.8350   LearningRate 0.0886   Epoch: 11   Global Step: 119550   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:42:11,445-Speed 5988.07 samples/sec   Loss 6.7905   LearningRate 0.0886   Epoch: 11   Global Step: 119560   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:42:18,302-Speed 5974.33 samples/sec   Loss 6.7692   LearningRate 0.0885   Epoch: 11   Global Step: 119570   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:42:25,156-Speed 5977.39 samples/sec   Loss 6.7826   LearningRate 0.0885   Epoch: 11   Global Step: 119580   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:42:32,039-Speed 5951.99 samples/sec   Loss 6.7119   LearningRate 0.0885   Epoch: 11   Global Step: 119590   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:42:38,893-Speed 5977.42 samples/sec   Loss 6.7495   LearningRate 0.0885   Epoch: 11   Global Step: 119600   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:42:45,758-Speed 5967.78 samples/sec   Loss 6.7091   LearningRate 0.0885   Epoch: 11   Global Step: 119610   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:42:52,614-Speed 5974.19 samples/sec   Loss 6.7682   LearningRate 0.0884   Epoch: 11   Global Step: 119620   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:42:59,464-Speed 5981.12 samples/sec   Loss 6.7940   LearningRate 0.0884   Epoch: 11   Global Step: 119630   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:43:06,321-Speed 5974.41 samples/sec   Loss 6.7905   LearningRate 0.0884   Epoch: 11   Global Step: 119640   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-08 19:43:13,184-Speed 5969.45 samples/sec   Loss 6.8193   LearningRate 0.0884   Epoch: 11   Global Step: 119650   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:43:20,064-Speed 5954.32 samples/sec   Loss 6.7003   LearningRate 0.0884   Epoch: 11   Global Step: 119660   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:43:26,925-Speed 5971.88 samples/sec   Loss 6.7669   LearningRate 0.0883   Epoch: 11   Global Step: 119670   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:43:33,779-Speed 5976.41 samples/sec   Loss 6.7394   LearningRate 0.0883   Epoch: 11   Global Step: 119680   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:43:40,623-Speed 5986.60 samples/sec   Loss 6.7955   LearningRate 0.0883   Epoch: 11   Global Step: 119690   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:43:47,469-Speed 5984.34 samples/sec   Loss 6.7480   LearningRate 0.0883   Epoch: 11   Global Step: 119700   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:43:54,339-Speed 5963.20 samples/sec   Loss 6.7362   LearningRate 0.0883   Epoch: 11   Global Step: 119710   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:44:01,179-Speed 5989.70 samples/sec   Loss 6.7463   LearningRate 0.0882   Epoch: 11   Global Step: 119720   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:44:08,043-Speed 5970.54 samples/sec   Loss 6.7627   LearningRate 0.0882   Epoch: 11   Global Step: 119730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:44:14,918-Speed 5959.25 samples/sec   Loss 6.7409   LearningRate 0.0882   Epoch: 11   Global Step: 119740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:44:21,780-Speed 5970.77 samples/sec   Loss 6.7754   LearningRate 0.0882   Epoch: 11   Global Step: 119750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:44:28,627-Speed 5982.91 samples/sec   Loss 6.7400   LearningRate 0.0882   Epoch: 11   Global Step: 119760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:44:35,501-Speed 5959.84 samples/sec   Loss 6.7690   LearningRate 0.0881   Epoch: 11   Global Step: 119770   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:44:42,352-Speed 5979.83 samples/sec   Loss 6.7232   LearningRate 0.0881   Epoch: 11   Global Step: 119780   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:44:49,204-Speed 5978.77 samples/sec   Loss 6.7653   LearningRate 0.0881   Epoch: 11   Global Step: 119790   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:44:56,074-Speed 5963.43 samples/sec   Loss 6.7184   LearningRate 0.0881   Epoch: 11   Global Step: 119800   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:45:02,930-Speed 5975.30 samples/sec   Loss 6.7541   LearningRate 0.0881   Epoch: 11   Global Step: 119810   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:45:09,778-Speed 5981.98 samples/sec   Loss 6.7281   LearningRate 0.0880   Epoch: 11   Global Step: 119820   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:45:16,622-Speed 5986.22 samples/sec   Loss 6.7707   LearningRate 0.0880   Epoch: 11   Global Step: 119830   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:45:23,486-Speed 5968.45 samples/sec   Loss 6.7413   LearningRate 0.0880   Epoch: 11   Global Step: 119840   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:45:30,442-Speed 5889.60 samples/sec   Loss 6.7133   LearningRate 0.0880   Epoch: 11   Global Step: 119850   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:45:37,302-Speed 5972.15 samples/sec   Loss 6.7301   LearningRate 0.0880   Epoch: 11   Global Step: 119860   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:45:44,151-Speed 5980.96 samples/sec   Loss 6.6355   LearningRate 0.0879   Epoch: 11   Global Step: 119870   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:45:51,008-Speed 5975.29 samples/sec   Loss 6.7661   LearningRate 0.0879   Epoch: 11   Global Step: 119880   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:45:57,848-Speed 5989.06 samples/sec   Loss 6.7835   LearningRate 0.0879   Epoch: 11   Global Step: 119890   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:46:04,713-Speed 5967.89 samples/sec   Loss 6.6921   LearningRate 0.0879   Epoch: 11   Global Step: 119900   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:46:11,595-Speed 5953.34 samples/sec   Loss 6.6919   LearningRate 0.0879   Epoch: 11   Global Step: 119910   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:46:18,448-Speed 5977.24 samples/sec   Loss 6.6977   LearningRate 0.0878   Epoch: 11   Global Step: 119920   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:46:25,294-Speed 5984.49 samples/sec   Loss 6.6913   LearningRate 0.0878   Epoch: 11   Global Step: 119930   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:46:32,146-Speed 5979.57 samples/sec   Loss 6.7447   LearningRate 0.0878   Epoch: 11   Global Step: 119940   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:46:38,998-Speed 5978.98 samples/sec   Loss 6.7327   LearningRate 0.0878   Epoch: 11   Global Step: 119950   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:46:45,869-Speed 5961.95 samples/sec   Loss 6.7293   LearningRate 0.0878   Epoch: 11   Global Step: 119960   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:46:52,855-Speed 5864.60 samples/sec   Loss 6.6819   LearningRate 0.0877   Epoch: 11   Global Step: 119970   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:46:59,898-Speed 5816.83 samples/sec   Loss 6.6859   LearningRate 0.0877   Epoch: 11   Global Step: 119980   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:47:06,745-Speed 5982.95 samples/sec   Loss 6.7233   LearningRate 0.0877   Epoch: 11   Global Step: 119990   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:47:13,602-Speed 5974.62 samples/sec   Loss 6.7342   LearningRate 0.0877   Epoch: 11   Global Step: 120000   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:47:40,270-[lfw][120000]XNorm: 23.052176
Training: 2022-01-08 19:47:40,270-[lfw][120000]Accuracy-Flip: 0.99633+-0.00314
Training: 2022-01-08 19:47:40,271-[lfw][120000]Accuracy-Highest: 0.99783
Training: 2022-01-08 19:48:11,130-[cfp_fp][120000]XNorm: 20.136159
Training: 2022-01-08 19:48:11,131-[cfp_fp][120000]Accuracy-Flip: 0.98386+-0.00409
Training: 2022-01-08 19:48:11,132-[cfp_fp][120000]Accuracy-Highest: 0.98557
Training: 2022-01-08 19:48:37,853-[agedb_30][120000]XNorm: 22.390298
Training: 2022-01-08 19:48:37,854-[agedb_30][120000]Accuracy-Flip: 0.97383+-0.00606
Training: 2022-01-08 19:48:37,855-[agedb_30][120000]Accuracy-Highest: 0.97383
Training: 2022-01-08 19:48:44,718-Speed 449.54 samples/sec   Loss 6.7477   LearningRate 0.0877   Epoch: 11   Global Step: 120010   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:48:51,559-Speed 5988.36 samples/sec   Loss 6.7198   LearningRate 0.0876   Epoch: 11   Global Step: 120020   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:48:58,394-Speed 5993.71 samples/sec   Loss 6.6936   LearningRate 0.0876   Epoch: 11   Global Step: 120030   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:49:05,252-Speed 5973.75 samples/sec   Loss 6.7214   LearningRate 0.0876   Epoch: 11   Global Step: 120040   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-08 19:49:12,087-Speed 5993.04 samples/sec   Loss 6.7723   LearningRate 0.0876   Epoch: 11   Global Step: 120050   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:49:18,972-Speed 5951.07 samples/sec   Loss 6.7287   LearningRate 0.0876   Epoch: 11   Global Step: 120060   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:49:25,838-Speed 5966.25 samples/sec   Loss 6.6807   LearningRate 0.0875   Epoch: 11   Global Step: 120070   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:49:32,692-Speed 5977.07 samples/sec   Loss 6.7174   LearningRate 0.0875   Epoch: 11   Global Step: 120080   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:49:39,556-Speed 5968.51 samples/sec   Loss 6.7061   LearningRate 0.0875   Epoch: 11   Global Step: 120090   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:49:46,439-Speed 5952.68 samples/sec   Loss 6.7155   LearningRate 0.0875   Epoch: 11   Global Step: 120100   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:49:53,300-Speed 5971.16 samples/sec   Loss 6.7111   LearningRate 0.0875   Epoch: 11   Global Step: 120110   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:50:00,140-Speed 5988.94 samples/sec   Loss 6.7574   LearningRate 0.0874   Epoch: 11   Global Step: 120120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:50:07,009-Speed 5964.45 samples/sec   Loss 6.7152   LearningRate 0.0874   Epoch: 11   Global Step: 120130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:50:13,847-Speed 5991.00 samples/sec   Loss 6.7221   LearningRate 0.0874   Epoch: 11   Global Step: 120140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:50:20,690-Speed 5986.60 samples/sec   Loss 6.7740   LearningRate 0.0874   Epoch: 11   Global Step: 120150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:50:27,554-Speed 5969.17 samples/sec   Loss 6.6609   LearningRate 0.0874   Epoch: 11   Global Step: 120160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:50:34,424-Speed 5962.85 samples/sec   Loss 6.7256   LearningRate 0.0873   Epoch: 11   Global Step: 120170   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:50:41,266-Speed 5987.99 samples/sec   Loss 6.7437   LearningRate 0.0873   Epoch: 11   Global Step: 120180   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:50:48,140-Speed 5959.82 samples/sec   Loss 6.7021   LearningRate 0.0873   Epoch: 11   Global Step: 120190   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:50:55,011-Speed 5962.37 samples/sec   Loss 6.6540   LearningRate 0.0873   Epoch: 11   Global Step: 120200   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:51:01,853-Speed 5987.52 samples/sec   Loss 6.6627   LearningRate 0.0873   Epoch: 11   Global Step: 120210   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:51:08,741-Speed 5947.62 samples/sec   Loss 6.6633   LearningRate 0.0872   Epoch: 11   Global Step: 120220   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:51:15,638-Speed 5940.45 samples/sec   Loss 6.6813   LearningRate 0.0872   Epoch: 11   Global Step: 120230   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:51:22,500-Speed 5972.22 samples/sec   Loss 6.6668   LearningRate 0.0872   Epoch: 11   Global Step: 120240   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:51:29,368-Speed 5966.86 samples/sec   Loss 6.6556   LearningRate 0.0872   Epoch: 11   Global Step: 120250   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:51:36,219-Speed 5979.39 samples/sec   Loss 6.7666   LearningRate 0.0872   Epoch: 11   Global Step: 120260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:51:43,078-Speed 5973.45 samples/sec   Loss 6.7470   LearningRate 0.0871   Epoch: 11   Global Step: 120270   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:51:49,975-Speed 5940.61 samples/sec   Loss 6.6484   LearningRate 0.0871   Epoch: 11   Global Step: 120280   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:51:56,823-Speed 5981.81 samples/sec   Loss 6.6590   LearningRate 0.0871   Epoch: 11   Global Step: 120290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:52:03,678-Speed 5976.76 samples/sec   Loss 6.7353   LearningRate 0.0871   Epoch: 11   Global Step: 120300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:52:10,577-Speed 5940.69 samples/sec   Loss 6.7034   LearningRate 0.0871   Epoch: 11   Global Step: 120310   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:52:17,458-Speed 5953.38 samples/sec   Loss 6.6891   LearningRate 0.0870   Epoch: 11   Global Step: 120320   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-08 19:52:24,278-Speed 6007.37 samples/sec   Loss 6.6950   LearningRate 0.0870   Epoch: 11   Global Step: 120330   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:52:31,156-Speed 5956.26 samples/sec   Loss 6.6587   LearningRate 0.0870   Epoch: 11   Global Step: 120340   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:52:38,041-Speed 5950.36 samples/sec   Loss 6.7242   LearningRate 0.0870   Epoch: 11   Global Step: 120350   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:52:44,884-Speed 5986.61 samples/sec   Loss 6.7417   LearningRate 0.0870   Epoch: 11   Global Step: 120360   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:52:51,764-Speed 5955.11 samples/sec   Loss 6.7091   LearningRate 0.0869   Epoch: 11   Global Step: 120370   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:52:58,609-Speed 5985.47 samples/sec   Loss 6.7518   LearningRate 0.0869   Epoch: 11   Global Step: 120380   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:53:05,450-Speed 5988.19 samples/sec   Loss 6.6885   LearningRate 0.0869   Epoch: 11   Global Step: 120390   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:53:12,305-Speed 5975.71 samples/sec   Loss 6.6814   LearningRate 0.0869   Epoch: 11   Global Step: 120400   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:53:19,147-Speed 5987.68 samples/sec   Loss 6.6817   LearningRate 0.0869   Epoch: 11   Global Step: 120410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:53:26,107-Speed 5887.34 samples/sec   Loss 6.7227   LearningRate 0.0868   Epoch: 11   Global Step: 120420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:53:32,957-Speed 5979.87 samples/sec   Loss 6.7060   LearningRate 0.0868   Epoch: 11   Global Step: 120430   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:53:39,808-Speed 5980.65 samples/sec   Loss 6.7010   LearningRate 0.0868   Epoch: 11   Global Step: 120440   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:53:46,654-Speed 5984.06 samples/sec   Loss 6.7453   LearningRate 0.0868   Epoch: 11   Global Step: 120450   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:53:53,586-Speed 5910.08 samples/sec   Loss 6.7268   LearningRate 0.0868   Epoch: 11   Global Step: 120460   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:54:00,491-Speed 5933.93 samples/sec   Loss 6.7284   LearningRate 0.0867   Epoch: 11   Global Step: 120470   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:54:07,351-Speed 5971.75 samples/sec   Loss 6.7137   LearningRate 0.0867   Epoch: 11   Global Step: 120480   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:54:14,212-Speed 5971.32 samples/sec   Loss 6.7032   LearningRate 0.0867   Epoch: 11   Global Step: 120490   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:54:21,069-Speed 5977.16 samples/sec   Loss 6.7054   LearningRate 0.0867   Epoch: 11   Global Step: 120500   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:54:27,932-Speed 5969.14 samples/sec   Loss 6.7538   LearningRate 0.0867   Epoch: 11   Global Step: 120510   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:54:34,786-Speed 5977.39 samples/sec   Loss 6.7310   LearningRate 0.0866   Epoch: 11   Global Step: 120520   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:54:41,674-Speed 5947.57 samples/sec   Loss 6.6833   LearningRate 0.0866   Epoch: 11   Global Step: 120530   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-08 19:54:48,613-Speed 5903.91 samples/sec   Loss 6.6819   LearningRate 0.0866   Epoch: 11   Global Step: 120540   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-08 19:54:55,460-Speed 5983.12 samples/sec   Loss 6.7270   LearningRate 0.0866   Epoch: 11   Global Step: 120550   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:55:02,310-Speed 5980.85 samples/sec   Loss 6.6558   LearningRate 0.0866   Epoch: 11   Global Step: 120560   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:55:09,162-Speed 5978.74 samples/sec   Loss 6.6935   LearningRate 0.0865   Epoch: 11   Global Step: 120570   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:55:16,007-Speed 5985.80 samples/sec   Loss 6.6785   LearningRate 0.0865   Epoch: 11   Global Step: 120580   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:55:22,859-Speed 5978.16 samples/sec   Loss 6.6520   LearningRate 0.0865   Epoch: 11   Global Step: 120590   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:55:29,703-Speed 5987.35 samples/sec   Loss 6.7126   LearningRate 0.0865   Epoch: 11   Global Step: 120600   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:55:36,556-Speed 5977.84 samples/sec   Loss 6.7114   LearningRate 0.0865   Epoch: 11   Global Step: 120610   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:55:43,410-Speed 5979.34 samples/sec   Loss 6.6960   LearningRate 0.0864   Epoch: 11   Global Step: 120620   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:55:50,269-Speed 5973.11 samples/sec   Loss 6.6778   LearningRate 0.0864   Epoch: 11   Global Step: 120630   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:55:57,127-Speed 5973.66 samples/sec   Loss 6.7045   LearningRate 0.0864   Epoch: 11   Global Step: 120640   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:56:03,984-Speed 5974.83 samples/sec   Loss 6.6808   LearningRate 0.0864   Epoch: 11   Global Step: 120650   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-08 19:56:10,850-Speed 5966.91 samples/sec   Loss 6.6400   LearningRate 0.0864   Epoch: 11   Global Step: 120660   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:56:17,706-Speed 5978.05 samples/sec   Loss 6.7391   LearningRate 0.0863   Epoch: 11   Global Step: 120670   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:56:24,563-Speed 5974.70 samples/sec   Loss 6.6838   LearningRate 0.0863   Epoch: 11   Global Step: 120680   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:56:31,421-Speed 5974.05 samples/sec   Loss 6.7476   LearningRate 0.0863   Epoch: 11   Global Step: 120690   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:56:38,343-Speed 5918.24 samples/sec   Loss 6.6389   LearningRate 0.0863   Epoch: 11   Global Step: 120700   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:56:45,179-Speed 5993.18 samples/sec   Loss 6.6740   LearningRate 0.0863   Epoch: 11   Global Step: 120710   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:56:52,049-Speed 5962.64 samples/sec   Loss 6.6885   LearningRate 0.0862   Epoch: 11   Global Step: 120720   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:56:58,905-Speed 5975.86 samples/sec   Loss 6.6894   LearningRate 0.0862   Epoch: 11   Global Step: 120730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:57:05,769-Speed 5975.58 samples/sec   Loss 6.6483   LearningRate 0.0862   Epoch: 11   Global Step: 120740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:57:13,460-Speed 5971.58 samples/sec   Loss 6.6532   LearningRate 0.0862   Epoch: 11   Global Step: 120750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:57:20,317-Speed 5973.99 samples/sec   Loss 6.7062   LearningRate 0.0862   Epoch: 11   Global Step: 120760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:57:27,163-Speed 5984.31 samples/sec   Loss 6.6750   LearningRate 0.0861   Epoch: 11   Global Step: 120770   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:57:34,008-Speed 5985.27 samples/sec   Loss 6.6803   LearningRate 0.0861   Epoch: 11   Global Step: 120780   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:57:40,853-Speed 5985.94 samples/sec   Loss 6.6596   LearningRate 0.0861   Epoch: 11   Global Step: 120790   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:57:47,761-Speed 5930.24 samples/sec   Loss 6.6816   LearningRate 0.0861   Epoch: 11   Global Step: 120800   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:57:54,625-Speed 5968.24 samples/sec   Loss 6.6662   LearningRate 0.0861   Epoch: 11   Global Step: 120810   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:58:01,524-Speed 5938.61 samples/sec   Loss 6.6759   LearningRate 0.0860   Epoch: 11   Global Step: 120820   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:58:08,390-Speed 5966.87 samples/sec   Loss 6.7045   LearningRate 0.0860   Epoch: 11   Global Step: 120830   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:58:15,238-Speed 5982.05 samples/sec   Loss 6.7171   LearningRate 0.0860   Epoch: 11   Global Step: 120840   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:58:22,093-Speed 5976.30 samples/sec   Loss 6.6551   LearningRate 0.0860   Epoch: 11   Global Step: 120850   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:58:28,946-Speed 5977.94 samples/sec   Loss 6.6687   LearningRate 0.0860   Epoch: 11   Global Step: 120860   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:58:35,810-Speed 5968.45 samples/sec   Loss 6.6657   LearningRate 0.0859   Epoch: 11   Global Step: 120870   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:58:42,660-Speed 5981.39 samples/sec   Loss 6.5967   LearningRate 0.0859   Epoch: 11   Global Step: 120880   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:58:49,526-Speed 5966.20 samples/sec   Loss 6.6016   LearningRate 0.0859   Epoch: 11   Global Step: 120890   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:58:56,406-Speed 5954.54 samples/sec   Loss 6.6980   LearningRate 0.0859   Epoch: 11   Global Step: 120900   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:59:03,254-Speed 5982.35 samples/sec   Loss 6.7825   LearningRate 0.0859   Epoch: 11   Global Step: 120910   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-08 19:59:10,102-Speed 5985.93 samples/sec   Loss 6.6389   LearningRate 0.0858   Epoch: 11   Global Step: 120920   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 19:59:16,946-Speed 5985.71 samples/sec   Loss 6.6099   LearningRate 0.0858   Epoch: 11   Global Step: 120930   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:59:23,815-Speed 5965.63 samples/sec   Loss 6.7182   LearningRate 0.0858   Epoch: 11   Global Step: 120940   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:59:30,674-Speed 5972.97 samples/sec   Loss 6.6285   LearningRate 0.0858   Epoch: 11   Global Step: 120950   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:59:37,522-Speed 5982.20 samples/sec   Loss 6.5977   LearningRate 0.0858   Epoch: 11   Global Step: 120960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:59:44,375-Speed 5978.22 samples/sec   Loss 6.6666   LearningRate 0.0857   Epoch: 11   Global Step: 120970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:59:51,235-Speed 5972.50 samples/sec   Loss 6.6599   LearningRate 0.0857   Epoch: 11   Global Step: 120980   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 19:59:58,111-Speed 5958.00 samples/sec   Loss 6.6285   LearningRate 0.0857   Epoch: 11   Global Step: 120990   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:00:04,968-Speed 5974.72 samples/sec   Loss 6.6410   LearningRate 0.0857   Epoch: 11   Global Step: 121000   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:00:11,817-Speed 5981.09 samples/sec   Loss 6.6132   LearningRate 0.0857   Epoch: 11   Global Step: 121010   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:00:18,699-Speed 5956.08 samples/sec   Loss 6.5942   LearningRate 0.0856   Epoch: 11   Global Step: 121020   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:00:25,568-Speed 5964.24 samples/sec   Loss 6.6770   LearningRate 0.0856   Epoch: 11   Global Step: 121030   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:00:32,431-Speed 5968.99 samples/sec   Loss 6.6402   LearningRate 0.0856   Epoch: 11   Global Step: 121040   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:00:39,296-Speed 5968.57 samples/sec   Loss 6.6708   LearningRate 0.0856   Epoch: 11   Global Step: 121050   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:00:46,157-Speed 5971.66 samples/sec   Loss 6.6455   LearningRate 0.0856   Epoch: 11   Global Step: 121060   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:00:53,021-Speed 5967.76 samples/sec   Loss 6.7324   LearningRate 0.0855   Epoch: 11   Global Step: 121070   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:00:59,886-Speed 5968.12 samples/sec   Loss 6.6840   LearningRate 0.0855   Epoch: 11   Global Step: 121080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:01:06,734-Speed 5981.95 samples/sec   Loss 6.6920   LearningRate 0.0855   Epoch: 11   Global Step: 121090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:01:13,586-Speed 5980.50 samples/sec   Loss 6.6717   LearningRate 0.0855   Epoch: 11   Global Step: 121100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:01:20,447-Speed 5971.78 samples/sec   Loss 6.6951   LearningRate 0.0855   Epoch: 11   Global Step: 121110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:01:27,310-Speed 5968.45 samples/sec   Loss 6.5960   LearningRate 0.0854   Epoch: 11   Global Step: 121120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:01:34,171-Speed 5971.59 samples/sec   Loss 6.6012   LearningRate 0.0854   Epoch: 11   Global Step: 121130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:01:41,064-Speed 5943.34 samples/sec   Loss 6.6535   LearningRate 0.0854   Epoch: 11   Global Step: 121140   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:01:47,969-Speed 5932.88 samples/sec   Loss 6.6521   LearningRate 0.0854   Epoch: 11   Global Step: 121150   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:01:54,829-Speed 5972.33 samples/sec   Loss 6.6589   LearningRate 0.0854   Epoch: 11   Global Step: 121160   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:02:01,690-Speed 5970.76 samples/sec   Loss 6.6193   LearningRate 0.0853   Epoch: 11   Global Step: 121170   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:02:08,577-Speed 5948.16 samples/sec   Loss 6.6637   LearningRate 0.0853   Epoch: 11   Global Step: 121180   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:02:15,436-Speed 5973.17 samples/sec   Loss 6.6804   LearningRate 0.0853   Epoch: 11   Global Step: 121190   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:02:22,302-Speed 5967.28 samples/sec   Loss 6.6272   LearningRate 0.0853   Epoch: 11   Global Step: 121200   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:02:29,177-Speed 5958.49 samples/sec   Loss 6.6324   LearningRate 0.0853   Epoch: 11   Global Step: 121210   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:02:36,028-Speed 5979.75 samples/sec   Loss 6.6032   LearningRate 0.0852   Epoch: 11   Global Step: 121220   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:02:42,898-Speed 5963.20 samples/sec   Loss 6.6461   LearningRate 0.0852   Epoch: 11   Global Step: 121230   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:02:49,780-Speed 5953.22 samples/sec   Loss 6.6495   LearningRate 0.0852   Epoch: 11   Global Step: 121240   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-08 20:02:56,636-Speed 5975.24 samples/sec   Loss 6.6135   LearningRate 0.0852   Epoch: 11   Global Step: 121250   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:03:03,513-Speed 5957.61 samples/sec   Loss 6.6528   LearningRate 0.0852   Epoch: 11   Global Step: 121260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:03:10,366-Speed 5977.46 samples/sec   Loss 6.6153   LearningRate 0.0851   Epoch: 11   Global Step: 121270   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:03:17,215-Speed 5981.80 samples/sec   Loss 6.5736   LearningRate 0.0851   Epoch: 11   Global Step: 121280   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:03:24,070-Speed 5976.78 samples/sec   Loss 6.5665   LearningRate 0.0851   Epoch: 11   Global Step: 121290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:03:30,928-Speed 5972.64 samples/sec   Loss 6.7170   LearningRate 0.0851   Epoch: 11   Global Step: 121300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:03:37,776-Speed 5983.16 samples/sec   Loss 6.5707   LearningRate 0.0851   Epoch: 11   Global Step: 121310   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:03:44,616-Speed 5989.40 samples/sec   Loss 6.5855   LearningRate 0.0850   Epoch: 11   Global Step: 121320   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-08 20:03:51,470-Speed 5977.85 samples/sec   Loss 6.5699   LearningRate 0.0850   Epoch: 11   Global Step: 121330   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-08 20:03:58,326-Speed 5975.18 samples/sec   Loss 6.6195   LearningRate 0.0850   Epoch: 11   Global Step: 121340   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-08 20:04:05,189-Speed 5969.13 samples/sec   Loss 6.6221   LearningRate 0.0850   Epoch: 11   Global Step: 121350   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-08 20:04:12,056-Speed 5966.11 samples/sec   Loss 6.6000   LearningRate 0.0850   Epoch: 11   Global Step: 121360   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-08 20:04:18,903-Speed 5983.17 samples/sec   Loss 6.5891   LearningRate 0.0849   Epoch: 11   Global Step: 121370   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-08 20:04:25,766-Speed 5969.76 samples/sec   Loss 6.7243   LearningRate 0.0849   Epoch: 11   Global Step: 121380   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-08 20:04:32,626-Speed 5972.24 samples/sec   Loss 6.6496   LearningRate 0.0849   Epoch: 11   Global Step: 121390   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-08 20:04:39,482-Speed 5975.51 samples/sec   Loss 6.5538   LearningRate 0.0849   Epoch: 11   Global Step: 121400   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-08 20:04:46,402-Speed 5920.22 samples/sec   Loss 6.5996   LearningRate 0.0849   Epoch: 11   Global Step: 121410   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-08 20:04:53,268-Speed 5966.85 samples/sec   Loss 6.6279   LearningRate 0.0848   Epoch: 11   Global Step: 121420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:05:00,133-Speed 5967.91 samples/sec   Loss 6.6377   LearningRate 0.0848   Epoch: 11   Global Step: 121430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:05:06,998-Speed 5967.19 samples/sec   Loss 6.6246   LearningRate 0.0848   Epoch: 11   Global Step: 121440   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:05:13,854-Speed 5976.55 samples/sec   Loss 6.6262   LearningRate 0.0848   Epoch: 11   Global Step: 121450   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:05:20,746-Speed 5945.08 samples/sec   Loss 6.6829   LearningRate 0.0848   Epoch: 11   Global Step: 121460   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:05:27,604-Speed 5973.79 samples/sec   Loss 6.5525   LearningRate 0.0847   Epoch: 11   Global Step: 121470   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:05:34,481-Speed 5956.83 samples/sec   Loss 6.6598   LearningRate 0.0847   Epoch: 11   Global Step: 121480   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:05:41,344-Speed 5969.29 samples/sec   Loss 6.6336   LearningRate 0.0847   Epoch: 11   Global Step: 121490   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:05:48,204-Speed 5971.96 samples/sec   Loss 6.6136   LearningRate 0.0847   Epoch: 11   Global Step: 121500   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:05:55,080-Speed 5958.07 samples/sec   Loss 6.6424   LearningRate 0.0847   Epoch: 11   Global Step: 121510   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:06:01,930-Speed 5980.93 samples/sec   Loss 6.7175   LearningRate 0.0846   Epoch: 11   Global Step: 121520   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:06:08,792-Speed 5970.22 samples/sec   Loss 6.6458   LearningRate 0.0846   Epoch: 11   Global Step: 121530   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:06:15,674-Speed 5953.50 samples/sec   Loss 6.5883   LearningRate 0.0846   Epoch: 11   Global Step: 121540   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:06:22,541-Speed 5965.91 samples/sec   Loss 6.6254   LearningRate 0.0846   Epoch: 11   Global Step: 121550   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:06:29,409-Speed 5964.74 samples/sec   Loss 6.5895   LearningRate 0.0846   Epoch: 11   Global Step: 121560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:06:36,274-Speed 5967.52 samples/sec   Loss 6.5867   LearningRate 0.0846   Epoch: 11   Global Step: 121570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:06:43,155-Speed 5954.06 samples/sec   Loss 6.6062   LearningRate 0.0845   Epoch: 11   Global Step: 121580   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:06:50,143-Speed 5863.31 samples/sec   Loss 6.6228   LearningRate 0.0845   Epoch: 11   Global Step: 121590   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:06:57,004-Speed 5971.37 samples/sec   Loss 6.6642   LearningRate 0.0845   Epoch: 11   Global Step: 121600   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:07:03,883-Speed 5955.26 samples/sec   Loss 6.6527   LearningRate 0.0845   Epoch: 11   Global Step: 121610   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:07:10,761-Speed 5956.17 samples/sec   Loss 6.6037   LearningRate 0.0845   Epoch: 11   Global Step: 121620   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:07:17,630-Speed 5963.94 samples/sec   Loss 6.6182   LearningRate 0.0844   Epoch: 11   Global Step: 121630   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:07:24,497-Speed 5965.20 samples/sec   Loss 6.5775   LearningRate 0.0844   Epoch: 11   Global Step: 121640   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:07:31,361-Speed 5969.44 samples/sec   Loss 6.5576   LearningRate 0.0844   Epoch: 11   Global Step: 121650   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:07:38,234-Speed 5959.89 samples/sec   Loss 6.5968   LearningRate 0.0844   Epoch: 11   Global Step: 121660   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:07:45,099-Speed 5967.97 samples/sec   Loss 6.6722   LearningRate 0.0844   Epoch: 11   Global Step: 121670   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:07:51,966-Speed 5965.30 samples/sec   Loss 6.6847   LearningRate 0.0843   Epoch: 11   Global Step: 121680   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:07:58,848-Speed 5954.16 samples/sec   Loss 6.6167   LearningRate 0.0843   Epoch: 11   Global Step: 121690   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:08:05,715-Speed 5966.49 samples/sec   Loss 6.5944   LearningRate 0.0843   Epoch: 11   Global Step: 121700   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:08:12,621-Speed 5931.82 samples/sec   Loss 6.6078   LearningRate 0.0843   Epoch: 11   Global Step: 121710   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:08:19,475-Speed 5977.80 samples/sec   Loss 6.6894   LearningRate 0.0843   Epoch: 11   Global Step: 121720   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:08:26,359-Speed 5951.30 samples/sec   Loss 6.5820   LearningRate 0.0842   Epoch: 11   Global Step: 121730   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:08:33,213-Speed 5977.48 samples/sec   Loss 6.5990   LearningRate 0.0842   Epoch: 11   Global Step: 121740   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:08:40,103-Speed 5946.53 samples/sec   Loss 6.5962   LearningRate 0.0842   Epoch: 11   Global Step: 121750   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:08:47,021-Speed 5922.05 samples/sec   Loss 6.6662   LearningRate 0.0842   Epoch: 11   Global Step: 121760   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-01-08 20:08:53,909-Speed 5948.81 samples/sec   Loss 6.6074   LearningRate 0.0842   Epoch: 11   Global Step: 121770   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:09:00,776-Speed 5967.14 samples/sec   Loss 6.5892   LearningRate 0.0841   Epoch: 11   Global Step: 121780   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:09:07,676-Speed 5937.21 samples/sec   Loss 6.6224   LearningRate 0.0841   Epoch: 11   Global Step: 121790   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:09:14,549-Speed 5960.56 samples/sec   Loss 6.6018   LearningRate 0.0841   Epoch: 11   Global Step: 121800   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:09:21,495-Speed 5898.48 samples/sec   Loss 6.6161   LearningRate 0.0841   Epoch: 11   Global Step: 121810   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:09:28,360-Speed 5967.69 samples/sec   Loss 6.5962   LearningRate 0.0841   Epoch: 11   Global Step: 121820   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:09:35,379-Speed 5836.81 samples/sec   Loss 6.5680   LearningRate 0.0840   Epoch: 11   Global Step: 121830   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:09:42,315-Speed 5906.73 samples/sec   Loss 6.6007   LearningRate 0.0840   Epoch: 11   Global Step: 121840   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:09:49,175-Speed 5971.80 samples/sec   Loss 6.5893   LearningRate 0.0840   Epoch: 11   Global Step: 121850   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:09:56,031-Speed 5975.81 samples/sec   Loss 6.5997   LearningRate 0.0840   Epoch: 11   Global Step: 121860   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:10:02,875-Speed 5986.31 samples/sec   Loss 6.6263   LearningRate 0.0840   Epoch: 11   Global Step: 121870   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:10:09,744-Speed 5963.96 samples/sec   Loss 6.5918   LearningRate 0.0839   Epoch: 11   Global Step: 121880   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:10:16,596-Speed 5979.24 samples/sec   Loss 6.5916   LearningRate 0.0839   Epoch: 11   Global Step: 121890   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:10:23,463-Speed 5965.49 samples/sec   Loss 6.5697   LearningRate 0.0839   Epoch: 11   Global Step: 121900   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:10:30,343-Speed 5954.84 samples/sec   Loss 6.6272   LearningRate 0.0839   Epoch: 11   Global Step: 121910   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:10:37,232-Speed 5947.18 samples/sec   Loss 6.6139   LearningRate 0.0839   Epoch: 11   Global Step: 121920   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:10:44,199-Speed 5879.44 samples/sec   Loss 6.6231   LearningRate 0.0838   Epoch: 11   Global Step: 121930   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:10:51,059-Speed 5972.21 samples/sec   Loss 6.6143   LearningRate 0.0838   Epoch: 11   Global Step: 121940   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:10:57,923-Speed 5968.55 samples/sec   Loss 6.5661   LearningRate 0.0838   Epoch: 11   Global Step: 121950   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:11:04,783-Speed 5971.49 samples/sec   Loss 6.6277   LearningRate 0.0838   Epoch: 11   Global Step: 121960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:11:11,631-Speed 5982.67 samples/sec   Loss 6.5563   LearningRate 0.0838   Epoch: 11   Global Step: 121970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:11:18,501-Speed 5963.29 samples/sec   Loss 6.5138   LearningRate 0.0837   Epoch: 11   Global Step: 121980   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:11:25,389-Speed 5947.87 samples/sec   Loss 6.6298   LearningRate 0.0837   Epoch: 11   Global Step: 121990   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:11:32,251-Speed 5972.24 samples/sec   Loss 6.5784   LearningRate 0.0837   Epoch: 11   Global Step: 122000   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:11:39,099-Speed 5983.14 samples/sec   Loss 6.5810   LearningRate 0.0837   Epoch: 11   Global Step: 122010   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:11:45,994-Speed 5941.43 samples/sec   Loss 6.5883   LearningRate 0.0837   Epoch: 11   Global Step: 122020   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:11:52,843-Speed 5981.67 samples/sec   Loss 6.5550   LearningRate 0.0836   Epoch: 11   Global Step: 122030   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:11:59,716-Speed 5960.74 samples/sec   Loss 6.6469   LearningRate 0.0836   Epoch: 11   Global Step: 122040   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:12:06,582-Speed 5968.20 samples/sec   Loss 6.5607   LearningRate 0.0836   Epoch: 11   Global Step: 122050   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:12:13,454-Speed 5961.38 samples/sec   Loss 6.5317   LearningRate 0.0836   Epoch: 11   Global Step: 122060   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:12:20,344-Speed 5946.15 samples/sec   Loss 6.6133   LearningRate 0.0836   Epoch: 11   Global Step: 122070   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:12:27,199-Speed 5975.80 samples/sec   Loss 6.6642   LearningRate 0.0835   Epoch: 11   Global Step: 122080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:12:34,060-Speed 5971.28 samples/sec   Loss 6.5546   LearningRate 0.0835   Epoch: 11   Global Step: 122090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:12:40,922-Speed 5971.18 samples/sec   Loss 6.5833   LearningRate 0.0835   Epoch: 11   Global Step: 122100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:12:47,787-Speed 5967.44 samples/sec   Loss 6.5718   LearningRate 0.0835   Epoch: 11   Global Step: 122110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:12:54,654-Speed 5965.70 samples/sec   Loss 6.5696   LearningRate 0.0835   Epoch: 11   Global Step: 122120   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:13:01,509-Speed 5976.74 samples/sec   Loss 6.6058   LearningRate 0.0835   Epoch: 11   Global Step: 122130   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:13:08,406-Speed 5940.97 samples/sec   Loss 6.6451   LearningRate 0.0834   Epoch: 11   Global Step: 122140   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:13:15,268-Speed 5969.61 samples/sec   Loss 6.5845   LearningRate 0.0834   Epoch: 11   Global Step: 122150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:13:22,139-Speed 5965.00 samples/sec   Loss 6.6039   LearningRate 0.0834   Epoch: 11   Global Step: 122160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:13:29,004-Speed 5967.74 samples/sec   Loss 6.6042   LearningRate 0.0834   Epoch: 11   Global Step: 122170   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:13:35,853-Speed 5981.33 samples/sec   Loss 6.5814   LearningRate 0.0834   Epoch: 11   Global Step: 122180   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:13:42,702-Speed 5981.84 samples/sec   Loss 6.6222   LearningRate 0.0833   Epoch: 11   Global Step: 122190   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:13:49,560-Speed 5975.66 samples/sec   Loss 6.5881   LearningRate 0.0833   Epoch: 11   Global Step: 122200   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:13:56,429-Speed 5964.12 samples/sec   Loss 6.6023   LearningRate 0.0833   Epoch: 11   Global Step: 122210   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:14:03,288-Speed 5973.65 samples/sec   Loss 6.5754   LearningRate 0.0833   Epoch: 11   Global Step: 122220   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:14:10,143-Speed 5976.96 samples/sec   Loss 6.5937   LearningRate 0.0833   Epoch: 11   Global Step: 122230   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:14:17,022-Speed 5954.65 samples/sec   Loss 6.5993   LearningRate 0.0832   Epoch: 11   Global Step: 122240   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:14:23,882-Speed 5972.08 samples/sec   Loss 6.5626   LearningRate 0.0832   Epoch: 11   Global Step: 122250   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:14:30,734-Speed 5979.55 samples/sec   Loss 6.5137   LearningRate 0.0832   Epoch: 11   Global Step: 122260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:14:37,590-Speed 5975.10 samples/sec   Loss 6.5601   LearningRate 0.0832   Epoch: 11   Global Step: 122270   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:14:44,431-Speed 5988.62 samples/sec   Loss 6.5845   LearningRate 0.0832   Epoch: 11   Global Step: 122280   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-08 20:14:51,283-Speed 5979.07 samples/sec   Loss 6.5892   LearningRate 0.0831   Epoch: 11   Global Step: 122290   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-08 20:14:58,160-Speed 5956.93 samples/sec   Loss 6.6158   LearningRate 0.0831   Epoch: 11   Global Step: 122300   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-08 20:15:05,027-Speed 5966.10 samples/sec   Loss 6.5663   LearningRate 0.0831   Epoch: 11   Global Step: 122310   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-08 20:15:11,895-Speed 5965.50 samples/sec   Loss 6.6785   LearningRate 0.0831   Epoch: 11   Global Step: 122320   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-08 20:15:18,787-Speed 5944.06 samples/sec   Loss 6.5624   LearningRate 0.0831   Epoch: 11   Global Step: 122330   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-08 20:15:25,651-Speed 5968.33 samples/sec   Loss 6.5774   LearningRate 0.0830   Epoch: 11   Global Step: 122340   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-08 20:15:32,535-Speed 5951.33 samples/sec   Loss 6.5395   LearningRate 0.0830   Epoch: 11   Global Step: 122350   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-08 20:15:39,405-Speed 5963.24 samples/sec   Loss 6.5181   LearningRate 0.0830   Epoch: 11   Global Step: 122360   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-08 20:15:46,273-Speed 5965.16 samples/sec   Loss 6.5464   LearningRate 0.0830   Epoch: 11   Global Step: 122370   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-01-08 20:15:53,128-Speed 5976.47 samples/sec   Loss 6.5512   LearningRate 0.0830   Epoch: 11   Global Step: 122380   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:15:59,987-Speed 5973.09 samples/sec   Loss 6.5245   LearningRate 0.0829   Epoch: 11   Global Step: 122390   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:16:06,892-Speed 5933.04 samples/sec   Loss 6.5778   LearningRate 0.0829   Epoch: 11   Global Step: 122400   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:16:13,805-Speed 5926.24 samples/sec   Loss 6.6057   LearningRate 0.0829   Epoch: 11   Global Step: 122410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:16:20,725-Speed 5920.18 samples/sec   Loss 6.5737   LearningRate 0.0829   Epoch: 11   Global Step: 122420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:16:27,623-Speed 5939.28 samples/sec   Loss 6.6006   LearningRate 0.0829   Epoch: 11   Global Step: 122430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:16:34,484-Speed 5971.42 samples/sec   Loss 6.4926   LearningRate 0.0828   Epoch: 11   Global Step: 122440   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:16:41,342-Speed 5973.29 samples/sec   Loss 6.5061   LearningRate 0.0828   Epoch: 11   Global Step: 122450   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:16:48,208-Speed 5967.44 samples/sec   Loss 6.5136   LearningRate 0.0828   Epoch: 11   Global Step: 122460   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:16:55,066-Speed 5973.84 samples/sec   Loss 6.5542   LearningRate 0.0828   Epoch: 11   Global Step: 122470   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:17:01,920-Speed 5977.36 samples/sec   Loss 6.5863   LearningRate 0.0828   Epoch: 11   Global Step: 122480   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:17:08,768-Speed 5982.87 samples/sec   Loss 6.5741   LearningRate 0.0827   Epoch: 11   Global Step: 122490   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:17:15,618-Speed 5979.98 samples/sec   Loss 6.4981   LearningRate 0.0827   Epoch: 11   Global Step: 122500   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:17:22,491-Speed 5961.24 samples/sec   Loss 6.5541   LearningRate 0.0827   Epoch: 11   Global Step: 122510   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:17:29,357-Speed 5966.98 samples/sec   Loss 6.5958   LearningRate 0.0827   Epoch: 11   Global Step: 122520   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:17:36,213-Speed 5975.60 samples/sec   Loss 6.5228   LearningRate 0.0827   Epoch: 11   Global Step: 122530   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:17:43,075-Speed 5969.97 samples/sec   Loss 6.5376   LearningRate 0.0826   Epoch: 11   Global Step: 122540   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:17:49,948-Speed 5961.31 samples/sec   Loss 6.5672   LearningRate 0.0826   Epoch: 11   Global Step: 122550   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:17:56,807-Speed 5973.25 samples/sec   Loss 6.5685   LearningRate 0.0826   Epoch: 11   Global Step: 122560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:18:03,651-Speed 5985.72 samples/sec   Loss 6.5780   LearningRate 0.0826   Epoch: 11   Global Step: 122570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:18:10,509-Speed 5973.25 samples/sec   Loss 6.5845   LearningRate 0.0826   Epoch: 11   Global Step: 122580   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:18:17,367-Speed 5974.39 samples/sec   Loss 6.5179   LearningRate 0.0826   Epoch: 11   Global Step: 122590   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:18:24,240-Speed 5960.76 samples/sec   Loss 6.5418   LearningRate 0.0825   Epoch: 11   Global Step: 122600   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:18:31,096-Speed 5975.42 samples/sec   Loss 6.5345   LearningRate 0.0825   Epoch: 11   Global Step: 122610   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:18:37,956-Speed 5971.78 samples/sec   Loss 6.5262   LearningRate 0.0825   Epoch: 11   Global Step: 122620   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:18:44,830-Speed 5959.62 samples/sec   Loss 6.5488   LearningRate 0.0825   Epoch: 11   Global Step: 122630   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:18:52,281-Speed 5499.41 samples/sec   Loss 6.5243   LearningRate 0.0825   Epoch: 11   Global Step: 122640   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:18:59,151-Speed 5963.45 samples/sec   Loss 6.5782   LearningRate 0.0824   Epoch: 11   Global Step: 122650   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:19:06,006-Speed 5976.57 samples/sec   Loss 6.5911   LearningRate 0.0824   Epoch: 11   Global Step: 122660   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:19:12,860-Speed 5977.21 samples/sec   Loss 6.5241   LearningRate 0.0824   Epoch: 11   Global Step: 122670   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:19:19,718-Speed 5973.95 samples/sec   Loss 6.5375   LearningRate 0.0824   Epoch: 11   Global Step: 122680   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:19:26,605-Speed 5950.41 samples/sec   Loss 6.5299   LearningRate 0.0824   Epoch: 11   Global Step: 122690   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:19:33,468-Speed 5969.22 samples/sec   Loss 6.5193   LearningRate 0.0823   Epoch: 11   Global Step: 122700   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:19:40,343-Speed 5958.71 samples/sec   Loss 6.5102   LearningRate 0.0823   Epoch: 11   Global Step: 122710   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:19:47,216-Speed 5960.81 samples/sec   Loss 6.5059   LearningRate 0.0823   Epoch: 11   Global Step: 122720   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:19:54,119-Speed 5934.31 samples/sec   Loss 6.5398   LearningRate 0.0823   Epoch: 11   Global Step: 122730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:20:01,026-Speed 5930.89 samples/sec   Loss 6.5760   LearningRate 0.0823   Epoch: 11   Global Step: 122740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:20:07,937-Speed 5928.20 samples/sec   Loss 6.5290   LearningRate 0.0822   Epoch: 11   Global Step: 122750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:20:14,807-Speed 5963.25 samples/sec   Loss 6.6183   LearningRate 0.0822   Epoch: 11   Global Step: 122760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:20:21,744-Speed 5906.31 samples/sec   Loss 6.4624   LearningRate 0.0822   Epoch: 11   Global Step: 122770   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:20:28,599-Speed 5976.84 samples/sec   Loss 6.4662   LearningRate 0.0822   Epoch: 11   Global Step: 122780   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:20:35,460-Speed 5970.82 samples/sec   Loss 6.5779   LearningRate 0.0822   Epoch: 11   Global Step: 122790   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:20:42,347-Speed 5949.06 samples/sec   Loss 6.5080   LearningRate 0.0821   Epoch: 11   Global Step: 122800   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:20:49,241-Speed 5942.21 samples/sec   Loss 6.5348   LearningRate 0.0821   Epoch: 11   Global Step: 122810   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:20:56,077-Speed 5992.89 samples/sec   Loss 6.5372   LearningRate 0.0821   Epoch: 11   Global Step: 122820   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:21:02,925-Speed 5981.85 samples/sec   Loss 6.5418   LearningRate 0.0821   Epoch: 11   Global Step: 122830   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:21:09,806-Speed 5955.33 samples/sec   Loss 6.5801   LearningRate 0.0821   Epoch: 11   Global Step: 122840   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:21:16,665-Speed 5972.35 samples/sec   Loss 6.5234   LearningRate 0.0820   Epoch: 11   Global Step: 122850   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:21:23,518-Speed 5980.08 samples/sec   Loss 6.5023   LearningRate 0.0820   Epoch: 11   Global Step: 122860   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:21:30,380-Speed 5969.62 samples/sec   Loss 6.4832   LearningRate 0.0820   Epoch: 11   Global Step: 122870   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:21:37,227-Speed 5983.22 samples/sec   Loss 6.4923   LearningRate 0.0820   Epoch: 11   Global Step: 122880   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:21:44,083-Speed 5975.82 samples/sec   Loss 6.5197   LearningRate 0.0820   Epoch: 11   Global Step: 122890   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:21:50,947-Speed 5968.36 samples/sec   Loss 6.5148   LearningRate 0.0820   Epoch: 11   Global Step: 122900   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:21:57,813-Speed 5966.48 samples/sec   Loss 6.5307   LearningRate 0.0819   Epoch: 11   Global Step: 122910   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:22:04,665-Speed 5978.85 samples/sec   Loss 6.4871   LearningRate 0.0819   Epoch: 11   Global Step: 122920   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:22:11,515-Speed 5980.75 samples/sec   Loss 6.4605   LearningRate 0.0819   Epoch: 11   Global Step: 122930   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:22:18,382-Speed 5965.74 samples/sec   Loss 6.5189   LearningRate 0.0819   Epoch: 11   Global Step: 122940   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:22:25,249-Speed 5966.11 samples/sec   Loss 6.5018   LearningRate 0.0819   Epoch: 11   Global Step: 122950   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:22:32,108-Speed 5972.78 samples/sec   Loss 6.5282   LearningRate 0.0818   Epoch: 11   Global Step: 122960   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:22:38,970-Speed 5970.56 samples/sec   Loss 6.5162   LearningRate 0.0818   Epoch: 11   Global Step: 122970   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:22:45,828-Speed 5972.74 samples/sec   Loss 6.4995   LearningRate 0.0818   Epoch: 11   Global Step: 122980   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:22:52,683-Speed 5976.43 samples/sec   Loss 6.5139   LearningRate 0.0818   Epoch: 11   Global Step: 122990   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:22:59,548-Speed 5967.70 samples/sec   Loss 6.5648   LearningRate 0.0818   Epoch: 11   Global Step: 123000   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:23:06,455-Speed 5931.74 samples/sec   Loss 6.5301   LearningRate 0.0817   Epoch: 11   Global Step: 123010   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:23:13,313-Speed 5972.85 samples/sec   Loss 6.5023   LearningRate 0.0817   Epoch: 11   Global Step: 123020   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:23:20,162-Speed 5982.57 samples/sec   Loss 6.5123   LearningRate 0.0817   Epoch: 11   Global Step: 123030   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:23:27,011-Speed 5980.95 samples/sec   Loss 6.5443   LearningRate 0.0817   Epoch: 11   Global Step: 123040   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:23:33,865-Speed 5977.04 samples/sec   Loss 6.4724   LearningRate 0.0817   Epoch: 11   Global Step: 123050   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:23:40,752-Speed 5948.17 samples/sec   Loss 6.5384   LearningRate 0.0816   Epoch: 11   Global Step: 123060   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:23:47,606-Speed 5977.93 samples/sec   Loss 6.5073   LearningRate 0.0816   Epoch: 11   Global Step: 123070   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:23:54,473-Speed 5966.28 samples/sec   Loss 6.5214   LearningRate 0.0816   Epoch: 11   Global Step: 123080   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:24:01,359-Speed 5949.76 samples/sec   Loss 6.4642   LearningRate 0.0816   Epoch: 11   Global Step: 123090   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:24:08,214-Speed 5976.26 samples/sec   Loss 6.4605   LearningRate 0.0816   Epoch: 11   Global Step: 123100   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-01-08 20:24:15,064-Speed 5980.67 samples/sec   Loss 6.5001   LearningRate 0.0815   Epoch: 11   Global Step: 123110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:24:21,924-Speed 5972.05 samples/sec   Loss 6.5338   LearningRate 0.0815   Epoch: 11   Global Step: 123120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:24:28,788-Speed 5967.80 samples/sec   Loss 6.5229   LearningRate 0.0815   Epoch: 11   Global Step: 123130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-01-08 20:24:35,628-Speed 5990.23 samples/sec   Loss 6.5372   LearningRate 0.0815   Epoch: 11   Global Step: 123140   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-01-08 20:24:42,490-Speed 5969.63 samples/sec   Loss 6.5333   LearningRate 0.0815   Epoch: 11   Global Step: 123150   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-01-08 20:24:49,337-Speed 5983.87 samples/sec   Loss 6.5135   LearningRate 0.0814   Epoch: 11   Global Step: 123160   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-01-08 20:24:56,188-Speed 5981.15 samples/sec   Loss 6.4992   LearningRate 0.0814   Epoch: 11   Global Step: 123170   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-01-08 20:25:03,077-Speed 5946.99 samples/sec   Loss 6.4838   LearningRate 0.0814   Epoch: 11   Global Step: 123180   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-01-08 20:25:09,946-Speed 5964.46 samples/sec   Loss 6.4865   LearningRate 0.0814   Epoch: 11   Global Step: 123190   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-01-08 20:25:16,810-Speed 5968.35 samples/sec   Loss 6.4854   LearningRate 0.0814   Epoch: 11   Global Step: 123200   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-01-08 20:25:23,659-Speed 5981.05 samples/sec   Loss 6.5322   LearningRate 0.0813   Epoch: 11   Global Step: 123210   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-01-08 20:25:30,505-Speed 5985.07 samples/sec   Loss 6.5253   LearningRate 0.0813   Epoch: 11   Global Step: 123220   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-01-08 20:25:37,398-Speed 5942.98 samples/sec   Loss 6.5683   LearningRate 0.0813   Epoch: 11   Global Step: 123230   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-01-08 20:25:44,310-Speed 5927.24 samples/sec   Loss 6.5112   LearningRate 0.0813   Epoch: 11   Global Step: 123240   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 20:25:51,182-Speed 5961.47 samples/sec   Loss 6.4498   LearningRate 0.0813   Epoch: 11   Global Step: 123250   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 20:25:58,041-Speed 5972.71 samples/sec   Loss 6.4226   LearningRate 0.0813   Epoch: 11   Global Step: 123260   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 20:26:04,926-Speed 5959.36 samples/sec   Loss 6.5616   LearningRate 0.0812   Epoch: 11   Global Step: 123270   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 20:26:11,787-Speed 5971.67 samples/sec   Loss 6.4837   LearningRate 0.0812   Epoch: 11   Global Step: 123280   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 20:26:18,663-Speed 5958.14 samples/sec   Loss 6.5168   LearningRate 0.0812   Epoch: 11   Global Step: 123290   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 20:26:25,910-Speed 5653.43 samples/sec   Loss 6.5061   LearningRate 0.0812   Epoch: 11   Global Step: 123300   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 20:26:32,772-Speed 5971.07 samples/sec   Loss 6.5066   LearningRate 0.0812   Epoch: 11   Global Step: 123310   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 20:26:39,637-Speed 5967.72 samples/sec   Loss 6.4911   LearningRate 0.0811   Epoch: 11   Global Step: 123320   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 20:26:46,510-Speed 5960.40 samples/sec   Loss 6.4996   LearningRate 0.0811   Epoch: 11   Global Step: 123330   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 20:26:53,381-Speed 5963.01 samples/sec   Loss 6.4891   LearningRate 0.0811   Epoch: 11   Global Step: 123340   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:27:00,242-Speed 5970.89 samples/sec   Loss 6.4884   LearningRate 0.0811   Epoch: 11   Global Step: 123350   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:27:07,098-Speed 5976.19 samples/sec   Loss 6.4747   LearningRate 0.0811   Epoch: 11   Global Step: 123360   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:27:13,981-Speed 5952.26 samples/sec   Loss 6.5592   LearningRate 0.0810   Epoch: 11   Global Step: 123370   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:27:20,844-Speed 5971.27 samples/sec   Loss 6.4917   LearningRate 0.0810   Epoch: 11   Global Step: 123380   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:27:27,709-Speed 5967.21 samples/sec   Loss 6.5502   LearningRate 0.0810   Epoch: 11   Global Step: 123390   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:27:34,554-Speed 5986.87 samples/sec   Loss 6.5543   LearningRate 0.0810   Epoch: 11   Global Step: 123400   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:27:41,399-Speed 5984.65 samples/sec   Loss 6.5272   LearningRate 0.0810   Epoch: 11   Global Step: 123410   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:27:48,243-Speed 5986.06 samples/sec   Loss 6.4718   LearningRate 0.0809   Epoch: 11   Global Step: 123420   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:27:55,119-Speed 5958.07 samples/sec   Loss 6.4466   LearningRate 0.0809   Epoch: 11   Global Step: 123430   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:28:01,988-Speed 5964.99 samples/sec   Loss 6.4316   LearningRate 0.0809   Epoch: 11   Global Step: 123440   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:28:08,857-Speed 5963.91 samples/sec   Loss 6.4572   LearningRate 0.0809   Epoch: 11   Global Step: 123450   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:28:15,712-Speed 5978.15 samples/sec   Loss 6.4679   LearningRate 0.0809   Epoch: 11   Global Step: 123460   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:28:22,573-Speed 5970.91 samples/sec   Loss 6.4984   LearningRate 0.0808   Epoch: 11   Global Step: 123470   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:28:29,427-Speed 5977.32 samples/sec   Loss 6.5109   LearningRate 0.0808   Epoch: 11   Global Step: 123480   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:28:36,281-Speed 5977.31 samples/sec   Loss 6.4643   LearningRate 0.0808   Epoch: 11   Global Step: 123490   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:28:43,149-Speed 5965.91 samples/sec   Loss 6.5155   LearningRate 0.0808   Epoch: 11   Global Step: 123500   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:28:50,001-Speed 5977.77 samples/sec   Loss 6.4670   LearningRate 0.0808   Epoch: 11   Global Step: 123510   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:28:56,855-Speed 5977.37 samples/sec   Loss 6.4995   LearningRate 0.0808   Epoch: 11   Global Step: 123520   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:29:03,708-Speed 5977.85 samples/sec   Loss 6.4560   LearningRate 0.0807   Epoch: 11   Global Step: 123530   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:29:10,571-Speed 5969.74 samples/sec   Loss 6.4542   LearningRate 0.0807   Epoch: 11   Global Step: 123540   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:29:17,430-Speed 5972.63 samples/sec   Loss 6.5073   LearningRate 0.0807   Epoch: 11   Global Step: 123550   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:29:24,286-Speed 5975.90 samples/sec   Loss 6.4787   LearningRate 0.0807   Epoch: 11   Global Step: 123560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:29:31,136-Speed 5980.52 samples/sec   Loss 6.4446   LearningRate 0.0807   Epoch: 11   Global Step: 123570   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:29:37,990-Speed 5976.67 samples/sec   Loss 6.4918   LearningRate 0.0806   Epoch: 11   Global Step: 123580   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:29:44,847-Speed 5975.48 samples/sec   Loss 6.5315   LearningRate 0.0806   Epoch: 11   Global Step: 123590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:29:51,690-Speed 5987.33 samples/sec   Loss 6.5031   LearningRate 0.0806   Epoch: 11   Global Step: 123600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:29:58,553-Speed 5970.39 samples/sec   Loss 6.4611   LearningRate 0.0806   Epoch: 11   Global Step: 123610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:30:05,402-Speed 5980.50 samples/sec   Loss 6.4579   LearningRate 0.0806   Epoch: 11   Global Step: 123620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:30:12,242-Speed 5990.52 samples/sec   Loss 6.4846   LearningRate 0.0805   Epoch: 11   Global Step: 123630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:30:19,101-Speed 5974.60 samples/sec   Loss 6.4845   LearningRate 0.0805   Epoch: 11   Global Step: 123640   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:30:25,964-Speed 5970.12 samples/sec   Loss 6.4988   LearningRate 0.0805   Epoch: 11   Global Step: 123650   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:30:32,809-Speed 5985.02 samples/sec   Loss 6.4795   LearningRate 0.0805   Epoch: 11   Global Step: 123660   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:30:39,684-Speed 5958.55 samples/sec   Loss 6.4057   LearningRate 0.0805   Epoch: 11   Global Step: 123670   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:30:46,567-Speed 5952.37 samples/sec   Loss 6.5160   LearningRate 0.0804   Epoch: 11   Global Step: 123680   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:30:53,414-Speed 5983.50 samples/sec   Loss 6.4910   LearningRate 0.0804   Epoch: 11   Global Step: 123690   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:31:00,268-Speed 5977.46 samples/sec   Loss 6.5079   LearningRate 0.0804   Epoch: 11   Global Step: 123700   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:31:07,129-Speed 5971.79 samples/sec   Loss 6.4727   LearningRate 0.0804   Epoch: 11   Global Step: 123710   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:31:13,970-Speed 5988.76 samples/sec   Loss 6.5018   LearningRate 0.0804   Epoch: 11   Global Step: 123720   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 20:31:20,819-Speed 5981.62 samples/sec   Loss 6.4739   LearningRate 0.0803   Epoch: 11   Global Step: 123730   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 20:31:27,689-Speed 5962.57 samples/sec   Loss 6.5236   LearningRate 0.0803   Epoch: 11   Global Step: 123740   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 20:31:34,532-Speed 5986.61 samples/sec   Loss 6.4769   LearningRate 0.0803   Epoch: 11   Global Step: 123750   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 20:31:41,420-Speed 5947.64 samples/sec   Loss 6.4647   LearningRate 0.0803   Epoch: 11   Global Step: 123760   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 20:31:48,279-Speed 5973.10 samples/sec   Loss 6.4626   LearningRate 0.0803   Epoch: 11   Global Step: 123770   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 20:31:55,124-Speed 5984.73 samples/sec   Loss 6.4455   LearningRate 0.0803   Epoch: 11   Global Step: 123780   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 20:32:01,969-Speed 5984.85 samples/sec   Loss 6.5253   LearningRate 0.0802   Epoch: 11   Global Step: 123790   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 20:32:08,807-Speed 5990.98 samples/sec   Loss 6.4017   LearningRate 0.0802   Epoch: 11   Global Step: 123800   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 20:32:15,669-Speed 5970.76 samples/sec   Loss 6.5033   LearningRate 0.0802   Epoch: 11   Global Step: 123810   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 20:32:22,515-Speed 5984.38 samples/sec   Loss 6.4222   LearningRate 0.0802   Epoch: 11   Global Step: 123820   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:32:29,384-Speed 5963.16 samples/sec   Loss 6.5015   LearningRate 0.0802   Epoch: 11   Global Step: 123830   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:32:36,230-Speed 5984.88 samples/sec   Loss 6.4491   LearningRate 0.0801   Epoch: 11   Global Step: 123840   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:32:43,098-Speed 5965.26 samples/sec   Loss 6.4178   LearningRate 0.0801   Epoch: 11   Global Step: 123850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:32:49,946-Speed 5982.51 samples/sec   Loss 6.4768   LearningRate 0.0801   Epoch: 11   Global Step: 123860   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:32:56,825-Speed 5955.49 samples/sec   Loss 6.4890   LearningRate 0.0801   Epoch: 11   Global Step: 123870   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:33:03,712-Speed 5948.59 samples/sec   Loss 6.4508   LearningRate 0.0801   Epoch: 11   Global Step: 123880   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:33:10,575-Speed 5969.21 samples/sec   Loss 6.4781   LearningRate 0.0800   Epoch: 11   Global Step: 123890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:33:17,445-Speed 5963.76 samples/sec   Loss 6.4569   LearningRate 0.0800   Epoch: 11   Global Step: 123900   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:33:24,309-Speed 5969.02 samples/sec   Loss 6.5000   LearningRate 0.0800   Epoch: 11   Global Step: 123910   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:33:31,158-Speed 5980.87 samples/sec   Loss 6.4776   LearningRate 0.0800   Epoch: 11   Global Step: 123920   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:33:38,013-Speed 5979.73 samples/sec   Loss 6.5089   LearningRate 0.0800   Epoch: 11   Global Step: 123930   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:33:44,892-Speed 5954.89 samples/sec   Loss 6.4916   LearningRate 0.0799   Epoch: 11   Global Step: 123940   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:33:51,769-Speed 5959.12 samples/sec   Loss 6.5089   LearningRate 0.0799   Epoch: 11   Global Step: 123950   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:33:58,621-Speed 5978.96 samples/sec   Loss 6.4810   LearningRate 0.0799   Epoch: 11   Global Step: 123960   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:34:05,476-Speed 5976.10 samples/sec   Loss 6.4130   LearningRate 0.0799   Epoch: 11   Global Step: 123970   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:34:12,316-Speed 5990.33 samples/sec   Loss 6.4674   LearningRate 0.0799   Epoch: 11   Global Step: 123980   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:34:19,180-Speed 5968.89 samples/sec   Loss 6.4768   LearningRate 0.0798   Epoch: 11   Global Step: 123990   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:34:26,027-Speed 5983.09 samples/sec   Loss 6.4278   LearningRate 0.0798   Epoch: 11   Global Step: 124000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:34:32,929-Speed 5935.26 samples/sec   Loss 6.4107   LearningRate 0.0798   Epoch: 11   Global Step: 124010   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:34:39,782-Speed 5978.13 samples/sec   Loss 6.4713   LearningRate 0.0798   Epoch: 11   Global Step: 124020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:34:46,638-Speed 5977.57 samples/sec   Loss 6.4921   LearningRate 0.0798   Epoch: 11   Global Step: 124030   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:34:53,485-Speed 5983.33 samples/sec   Loss 6.4252   LearningRate 0.0798   Epoch: 11   Global Step: 124040   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:35:00,337-Speed 5979.28 samples/sec   Loss 6.4256   LearningRate 0.0797   Epoch: 11   Global Step: 124050   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:35:07,199-Speed 5970.18 samples/sec   Loss 6.4601   LearningRate 0.0797   Epoch: 11   Global Step: 124060   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:35:14,068-Speed 5965.69 samples/sec   Loss 6.4400   LearningRate 0.0797   Epoch: 11   Global Step: 124070   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:35:20,943-Speed 5960.12 samples/sec   Loss 6.4235   LearningRate 0.0797   Epoch: 11   Global Step: 124080   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:35:27,800-Speed 5974.13 samples/sec   Loss 6.3719   LearningRate 0.0797   Epoch: 11   Global Step: 124090   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:35:34,650-Speed 5981.19 samples/sec   Loss 6.4711   LearningRate 0.0796   Epoch: 11   Global Step: 124100   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:35:41,499-Speed 5981.39 samples/sec   Loss 6.4384   LearningRate 0.0796   Epoch: 11   Global Step: 124110   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:35:48,378-Speed 5955.33 samples/sec   Loss 6.4508   LearningRate 0.0796   Epoch: 11   Global Step: 124120   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:35:55,239-Speed 5971.17 samples/sec   Loss 6.4714   LearningRate 0.0796   Epoch: 11   Global Step: 124130   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:36:02,102-Speed 5969.20 samples/sec   Loss 6.4112   LearningRate 0.0796   Epoch: 11   Global Step: 124140   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:36:08,954-Speed 5979.16 samples/sec   Loss 6.4424   LearningRate 0.0795   Epoch: 11   Global Step: 124150   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:36:15,834-Speed 5955.03 samples/sec   Loss 6.4543   LearningRate 0.0795   Epoch: 11   Global Step: 124160   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:36:22,698-Speed 5969.14 samples/sec   Loss 6.4009   LearningRate 0.0795   Epoch: 11   Global Step: 124170   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:36:29,549-Speed 5980.26 samples/sec   Loss 6.4244   LearningRate 0.0795   Epoch: 11   Global Step: 124180   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:36:36,462-Speed 5925.42 samples/sec   Loss 6.4568   LearningRate 0.0795   Epoch: 11   Global Step: 124190   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:36:43,330-Speed 5965.40 samples/sec   Loss 6.3734   LearningRate 0.0794   Epoch: 11   Global Step: 124200   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:36:50,187-Speed 5974.93 samples/sec   Loss 6.4469   LearningRate 0.0794   Epoch: 11   Global Step: 124210   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:36:57,047-Speed 5973.62 samples/sec   Loss 6.4855   LearningRate 0.0794   Epoch: 11   Global Step: 124220   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:37:03,907-Speed 5970.77 samples/sec   Loss 6.4053   LearningRate 0.0794   Epoch: 11   Global Step: 124230   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:37:10,762-Speed 5978.33 samples/sec   Loss 6.3922   LearningRate 0.0794   Epoch: 11   Global Step: 124240   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:37:17,613-Speed 5979.71 samples/sec   Loss 6.4488   LearningRate 0.0794   Epoch: 11   Global Step: 124250   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:37:24,471-Speed 5973.88 samples/sec   Loss 6.5116   LearningRate 0.0793   Epoch: 11   Global Step: 124260   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:37:31,333-Speed 5971.04 samples/sec   Loss 6.4920   LearningRate 0.0793   Epoch: 11   Global Step: 124270   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:37:38,208-Speed 5958.36 samples/sec   Loss 6.4228   LearningRate 0.0793   Epoch: 11   Global Step: 124280   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-08 20:37:45,049-Speed 5988.88 samples/sec   Loss 6.4470   LearningRate 0.0793   Epoch: 11   Global Step: 124290   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:37:51,935-Speed 5949.31 samples/sec   Loss 6.4177   LearningRate 0.0793   Epoch: 11   Global Step: 124300   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:37:58,795-Speed 5974.42 samples/sec   Loss 6.4915   LearningRate 0.0792   Epoch: 11   Global Step: 124310   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:38:05,649-Speed 5976.99 samples/sec   Loss 6.4292   LearningRate 0.0792   Epoch: 11   Global Step: 124320   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:38:12,483-Speed 5994.50 samples/sec   Loss 6.4101   LearningRate 0.0792   Epoch: 11   Global Step: 124330   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:38:19,326-Speed 5987.30 samples/sec   Loss 6.4436   LearningRate 0.0792   Epoch: 11   Global Step: 124340   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:38:26,173-Speed 5983.46 samples/sec   Loss 6.4127   LearningRate 0.0792   Epoch: 11   Global Step: 124350   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:38:33,026-Speed 5978.09 samples/sec   Loss 6.4283   LearningRate 0.0791   Epoch: 11   Global Step: 124360   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:38:39,880-Speed 5976.91 samples/sec   Loss 6.4643   LearningRate 0.0791   Epoch: 11   Global Step: 124370   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:38:46,735-Speed 5975.80 samples/sec   Loss 6.4669   LearningRate 0.0791   Epoch: 11   Global Step: 124380   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:38:53,599-Speed 5969.07 samples/sec   Loss 6.4294   LearningRate 0.0791   Epoch: 11   Global Step: 124390   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:39:00,467-Speed 5965.61 samples/sec   Loss 6.4668   LearningRate 0.0791   Epoch: 11   Global Step: 124400   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:39:07,321-Speed 5977.05 samples/sec   Loss 6.4475   LearningRate 0.0790   Epoch: 11   Global Step: 124410   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:39:14,188-Speed 5965.45 samples/sec   Loss 6.4436   LearningRate 0.0790   Epoch: 11   Global Step: 124420   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:39:21,035-Speed 5982.97 samples/sec   Loss 6.4618   LearningRate 0.0790   Epoch: 11   Global Step: 124430   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:39:45,397-Speed 1681.47 samples/sec   Loss 6.4144   LearningRate 0.0790   Epoch: 12   Global Step: 124440   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:39:52,210-Speed 6013.29 samples/sec   Loss 6.4372   LearningRate 0.0790   Epoch: 12   Global Step: 124450   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:39:59,035-Speed 6005.09 samples/sec   Loss 6.3834   LearningRate 0.0790   Epoch: 12   Global Step: 124460   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:40:05,869-Speed 5994.68 samples/sec   Loss 6.4191   LearningRate 0.0789   Epoch: 12   Global Step: 124470   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:40:12,710-Speed 5988.23 samples/sec   Loss 6.4641   LearningRate 0.0789   Epoch: 12   Global Step: 124480   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:40:19,593-Speed 5974.17 samples/sec   Loss 6.4341   LearningRate 0.0789   Epoch: 12   Global Step: 124490   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:40:26,454-Speed 5974.78 samples/sec   Loss 6.3524   LearningRate 0.0789   Epoch: 12   Global Step: 124500   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:40:33,304-Speed 5980.96 samples/sec   Loss 6.3986   LearningRate 0.0789   Epoch: 12   Global Step: 124510   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:40:40,146-Speed 5987.35 samples/sec   Loss 6.4111   LearningRate 0.0788   Epoch: 12   Global Step: 124520   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:40:47,017-Speed 5976.07 samples/sec   Loss 6.3288   LearningRate 0.0788   Epoch: 12   Global Step: 124530   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:40:53,878-Speed 5971.16 samples/sec   Loss 6.3735   LearningRate 0.0788   Epoch: 12   Global Step: 124540   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:41:00,736-Speed 5973.39 samples/sec   Loss 6.4148   LearningRate 0.0788   Epoch: 12   Global Step: 124550   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:41:07,607-Speed 5962.29 samples/sec   Loss 6.4004   LearningRate 0.0788   Epoch: 12   Global Step: 124560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:41:14,461-Speed 5977.42 samples/sec   Loss 6.4548   LearningRate 0.0787   Epoch: 12   Global Step: 124570   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:41:21,315-Speed 5977.80 samples/sec   Loss 6.3865   LearningRate 0.0787   Epoch: 12   Global Step: 124580   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:41:28,177-Speed 5970.44 samples/sec   Loss 6.4155   LearningRate 0.0787   Epoch: 12   Global Step: 124590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:41:35,024-Speed 5982.66 samples/sec   Loss 6.3741   LearningRate 0.0787   Epoch: 12   Global Step: 124600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:41:41,887-Speed 5970.02 samples/sec   Loss 6.3847   LearningRate 0.0787   Epoch: 12   Global Step: 124610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:41:48,769-Speed 5953.25 samples/sec   Loss 6.3834   LearningRate 0.0786   Epoch: 12   Global Step: 124620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:41:55,651-Speed 5952.91 samples/sec   Loss 6.3818   LearningRate 0.0786   Epoch: 12   Global Step: 124630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:42:02,514-Speed 5969.14 samples/sec   Loss 6.4047   LearningRate 0.0786   Epoch: 12   Global Step: 124640   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:42:09,377-Speed 5969.45 samples/sec   Loss 6.4369   LearningRate 0.0786   Epoch: 12   Global Step: 124650   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:42:16,230-Speed 5978.04 samples/sec   Loss 6.3991   LearningRate 0.0786   Epoch: 12   Global Step: 124660   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:42:23,073-Speed 5986.69 samples/sec   Loss 6.3777   LearningRate 0.0786   Epoch: 12   Global Step: 124670   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:42:29,959-Speed 5950.27 samples/sec   Loss 6.4503   LearningRate 0.0785   Epoch: 12   Global Step: 124680   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:42:38,713-Speed 4679.34 samples/sec   Loss 6.4043   LearningRate 0.0785   Epoch: 12   Global Step: 124690   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:42:45,552-Speed 5990.00 samples/sec   Loss 6.4087   LearningRate 0.0785   Epoch: 12   Global Step: 124700   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:42:52,413-Speed 5971.39 samples/sec   Loss 6.3385   LearningRate 0.0785   Epoch: 12   Global Step: 124710   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:42:59,277-Speed 5968.81 samples/sec   Loss 6.4401   LearningRate 0.0785   Epoch: 12   Global Step: 124720   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:43:06,129-Speed 5979.30 samples/sec   Loss 6.3022   LearningRate 0.0784   Epoch: 12   Global Step: 124730   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:43:12,981-Speed 5977.87 samples/sec   Loss 6.3650   LearningRate 0.0784   Epoch: 12   Global Step: 124740   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:43:19,872-Speed 5945.86 samples/sec   Loss 6.4242   LearningRate 0.0784   Epoch: 12   Global Step: 124750   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:43:26,728-Speed 5975.75 samples/sec   Loss 6.3688   LearningRate 0.0784   Epoch: 12   Global Step: 124760   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:43:33,582-Speed 5976.71 samples/sec   Loss 6.3707   LearningRate 0.0784   Epoch: 12   Global Step: 124770   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:43:40,438-Speed 5975.79 samples/sec   Loss 6.3790   LearningRate 0.0783   Epoch: 12   Global Step: 124780   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:43:47,299-Speed 5970.99 samples/sec   Loss 6.3657   LearningRate 0.0783   Epoch: 12   Global Step: 124790   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:43:54,156-Speed 5974.27 samples/sec   Loss 6.4527   LearningRate 0.0783   Epoch: 12   Global Step: 124800   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:44:01,024-Speed 5965.30 samples/sec   Loss 6.4638   LearningRate 0.0783   Epoch: 12   Global Step: 124810   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:44:07,882-Speed 5974.17 samples/sec   Loss 6.4034   LearningRate 0.0783   Epoch: 12   Global Step: 124820   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:44:14,740-Speed 5973.98 samples/sec   Loss 6.4389   LearningRate 0.0782   Epoch: 12   Global Step: 124830   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:44:21,596-Speed 5976.02 samples/sec   Loss 6.4383   LearningRate 0.0782   Epoch: 12   Global Step: 124840   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:44:28,450-Speed 5977.74 samples/sec   Loss 6.4068   LearningRate 0.0782   Epoch: 12   Global Step: 124850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:44:35,307-Speed 5973.67 samples/sec   Loss 6.3550   LearningRate 0.0782   Epoch: 12   Global Step: 124860   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:44:42,153-Speed 5985.08 samples/sec   Loss 6.3942   LearningRate 0.0782   Epoch: 12   Global Step: 124870   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:44:49,023-Speed 5968.16 samples/sec   Loss 6.3792   LearningRate 0.0782   Epoch: 12   Global Step: 124880   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:44:55,888-Speed 5967.85 samples/sec   Loss 6.4010   LearningRate 0.0781   Epoch: 12   Global Step: 124890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:45:02,738-Speed 5981.35 samples/sec   Loss 6.3912   LearningRate 0.0781   Epoch: 12   Global Step: 124900   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:45:09,599-Speed 5972.68 samples/sec   Loss 6.4212   LearningRate 0.0781   Epoch: 12   Global Step: 124910   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:45:16,457-Speed 5973.59 samples/sec   Loss 6.3101   LearningRate 0.0781   Epoch: 12   Global Step: 124920   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:45:23,305-Speed 5982.61 samples/sec   Loss 6.4492   LearningRate 0.0781   Epoch: 12   Global Step: 124930   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:45:30,153-Speed 5982.92 samples/sec   Loss 6.4590   LearningRate 0.0780   Epoch: 12   Global Step: 124940   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:45:37,007-Speed 5976.13 samples/sec   Loss 6.3979   LearningRate 0.0780   Epoch: 12   Global Step: 124950   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:45:43,852-Speed 5985.27 samples/sec   Loss 6.3832   LearningRate 0.0780   Epoch: 12   Global Step: 124960   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:45:50,702-Speed 5982.30 samples/sec   Loss 6.3757   LearningRate 0.0780   Epoch: 12   Global Step: 124970   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:45:57,551-Speed 5981.30 samples/sec   Loss 6.3777   LearningRate 0.0780   Epoch: 12   Global Step: 124980   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:46:04,432-Speed 5953.92 samples/sec   Loss 6.4095   LearningRate 0.0779   Epoch: 12   Global Step: 124990   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:46:11,285-Speed 5980.93 samples/sec   Loss 6.3581   LearningRate 0.0779   Epoch: 12   Global Step: 125000   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:46:38,272-[lfw][125000]XNorm: 23.898443
Training: 2022-01-08 20:46:38,272-[lfw][125000]Accuracy-Flip: 0.99750+-0.00291
Training: 2022-01-08 20:46:38,273-[lfw][125000]Accuracy-Highest: 0.99783
Training: 2022-01-08 20:47:09,337-[cfp_fp][125000]XNorm: 21.006631
Training: 2022-01-08 20:47:09,338-[cfp_fp][125000]Accuracy-Flip: 0.98571+-0.00599
Training: 2022-01-08 20:47:09,339-[cfp_fp][125000]Accuracy-Highest: 0.98571
Training: 2022-01-08 20:47:36,052-[agedb_30][125000]XNorm: 23.742593
Training: 2022-01-08 20:47:36,053-[agedb_30][125000]Accuracy-Flip: 0.97200+-0.00726
Training: 2022-01-08 20:47:36,053-[agedb_30][125000]Accuracy-Highest: 0.97383
Training: 2022-01-08 20:47:42,915-Speed 447.02 samples/sec   Loss 6.3715   LearningRate 0.0779   Epoch: 12   Global Step: 125010   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:47:49,766-Speed 5979.55 samples/sec   Loss 6.2990   LearningRate 0.0779   Epoch: 12   Global Step: 125020   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:47:56,630-Speed 5968.94 samples/sec   Loss 6.4098   LearningRate 0.0779   Epoch: 12   Global Step: 125030   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-08 20:48:03,484-Speed 5978.20 samples/sec   Loss 6.4046   LearningRate 0.0779   Epoch: 12   Global Step: 125040   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-08 20:48:10,356-Speed 5961.25 samples/sec   Loss 6.4077   LearningRate 0.0778   Epoch: 12   Global Step: 125050   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:48:17,220-Speed 5968.90 samples/sec   Loss 6.3456   LearningRate 0.0778   Epoch: 12   Global Step: 125060   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:48:24,086-Speed 5966.79 samples/sec   Loss 6.3168   LearningRate 0.0778   Epoch: 12   Global Step: 125070   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:48:30,960-Speed 5959.17 samples/sec   Loss 6.3826   LearningRate 0.0778   Epoch: 12   Global Step: 125080   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:48:37,871-Speed 5931.33 samples/sec   Loss 6.3590   LearningRate 0.0778   Epoch: 12   Global Step: 125090   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:48:44,782-Speed 5928.31 samples/sec   Loss 6.4674   LearningRate 0.0777   Epoch: 12   Global Step: 125100   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:48:51,671-Speed 5946.72 samples/sec   Loss 6.3187   LearningRate 0.0777   Epoch: 12   Global Step: 125110   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:48:58,540-Speed 5964.48 samples/sec   Loss 6.3556   LearningRate 0.0777   Epoch: 12   Global Step: 125120   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:49:05,435-Speed 5942.04 samples/sec   Loss 6.3010   LearningRate 0.0777   Epoch: 12   Global Step: 125130   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:49:12,304-Speed 5963.70 samples/sec   Loss 6.3598   LearningRate 0.0777   Epoch: 12   Global Step: 125140   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:49:19,177-Speed 5961.32 samples/sec   Loss 6.3267   LearningRate 0.0776   Epoch: 12   Global Step: 125150   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:49:26,036-Speed 5972.95 samples/sec   Loss 6.3867   LearningRate 0.0776   Epoch: 12   Global Step: 125160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:49:32,891-Speed 5976.16 samples/sec   Loss 6.3754   LearningRate 0.0776   Epoch: 12   Global Step: 125170   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:49:39,749-Speed 5973.92 samples/sec   Loss 6.4115   LearningRate 0.0776   Epoch: 12   Global Step: 125180   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:49:46,618-Speed 5964.06 samples/sec   Loss 6.3549   LearningRate 0.0776   Epoch: 12   Global Step: 125190   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:49:53,476-Speed 5973.70 samples/sec   Loss 6.3190   LearningRate 0.0775   Epoch: 12   Global Step: 125200   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:50:00,347-Speed 5962.52 samples/sec   Loss 6.4004   LearningRate 0.0775   Epoch: 12   Global Step: 125210   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:50:07,222-Speed 5959.62 samples/sec   Loss 6.3466   LearningRate 0.0775   Epoch: 12   Global Step: 125220   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:50:14,073-Speed 5979.69 samples/sec   Loss 6.4148   LearningRate 0.0775   Epoch: 12   Global Step: 125230   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:50:20,938-Speed 5967.75 samples/sec   Loss 6.3940   LearningRate 0.0775   Epoch: 12   Global Step: 125240   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:50:27,802-Speed 5971.02 samples/sec   Loss 6.3507   LearningRate 0.0775   Epoch: 12   Global Step: 125250   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:50:34,674-Speed 5961.98 samples/sec   Loss 6.3789   LearningRate 0.0774   Epoch: 12   Global Step: 125260   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:50:41,526-Speed 5979.04 samples/sec   Loss 6.3667   LearningRate 0.0774   Epoch: 12   Global Step: 125270   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:50:48,375-Speed 5981.47 samples/sec   Loss 6.3881   LearningRate 0.0774   Epoch: 12   Global Step: 125280   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:50:55,232-Speed 5973.48 samples/sec   Loss 6.4077   LearningRate 0.0774   Epoch: 12   Global Step: 125290   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:51:02,082-Speed 5980.74 samples/sec   Loss 6.4146   LearningRate 0.0774   Epoch: 12   Global Step: 125300   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:51:08,959-Speed 5959.85 samples/sec   Loss 6.3408   LearningRate 0.0773   Epoch: 12   Global Step: 125310   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:51:15,817-Speed 5973.15 samples/sec   Loss 6.3598   LearningRate 0.0773   Epoch: 12   Global Step: 125320   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-08 20:51:22,673-Speed 5975.92 samples/sec   Loss 6.3324   LearningRate 0.0773   Epoch: 12   Global Step: 125330   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-08 20:51:29,511-Speed 5990.89 samples/sec   Loss 6.4050   LearningRate 0.0773   Epoch: 12   Global Step: 125340   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:51:36,392-Speed 5954.14 samples/sec   Loss 6.3606   LearningRate 0.0773   Epoch: 12   Global Step: 125350   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:51:43,251-Speed 5973.02 samples/sec   Loss 6.3580   LearningRate 0.0772   Epoch: 12   Global Step: 125360   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:51:50,099-Speed 5982.50 samples/sec   Loss 6.4052   LearningRate 0.0772   Epoch: 12   Global Step: 125370   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:51:56,968-Speed 5964.20 samples/sec   Loss 6.3061   LearningRate 0.0772   Epoch: 12   Global Step: 125380   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:52:03,854-Speed 5951.41 samples/sec   Loss 6.3899   LearningRate 0.0772   Epoch: 12   Global Step: 125390   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:52:10,720-Speed 5966.93 samples/sec   Loss 6.2946   LearningRate 0.0772   Epoch: 12   Global Step: 125400   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:52:17,597-Speed 5957.21 samples/sec   Loss 6.2998   LearningRate 0.0772   Epoch: 12   Global Step: 125410   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:52:24,468-Speed 5962.80 samples/sec   Loss 6.3462   LearningRate 0.0771   Epoch: 12   Global Step: 125420   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:52:31,316-Speed 5983.55 samples/sec   Loss 6.3715   LearningRate 0.0771   Epoch: 12   Global Step: 125430   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:52:38,179-Speed 5969.36 samples/sec   Loss 6.3305   LearningRate 0.0771   Epoch: 12   Global Step: 125440   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:52:45,078-Speed 5938.58 samples/sec   Loss 6.3486   LearningRate 0.0771   Epoch: 12   Global Step: 125450   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:52:51,959-Speed 5955.05 samples/sec   Loss 6.3597   LearningRate 0.0771   Epoch: 12   Global Step: 125460   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:52:58,822-Speed 5969.05 samples/sec   Loss 6.3289   LearningRate 0.0770   Epoch: 12   Global Step: 125470   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:53:05,687-Speed 5967.95 samples/sec   Loss 6.2809   LearningRate 0.0770   Epoch: 12   Global Step: 125480   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:53:12,559-Speed 5962.24 samples/sec   Loss 6.3250   LearningRate 0.0770   Epoch: 12   Global Step: 125490   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:53:19,418-Speed 5972.61 samples/sec   Loss 6.2930   LearningRate 0.0770   Epoch: 12   Global Step: 125500   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:53:26,299-Speed 5964.32 samples/sec   Loss 6.3088   LearningRate 0.0770   Epoch: 12   Global Step: 125510   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:53:33,152-Speed 5978.88 samples/sec   Loss 6.3710   LearningRate 0.0769   Epoch: 12   Global Step: 125520   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:53:39,995-Speed 5985.90 samples/sec   Loss 6.3182   LearningRate 0.0769   Epoch: 12   Global Step: 125530   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:53:46,841-Speed 5984.31 samples/sec   Loss 6.2978   LearningRate 0.0769   Epoch: 12   Global Step: 125540   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:53:53,703-Speed 5971.91 samples/sec   Loss 6.3179   LearningRate 0.0769   Epoch: 12   Global Step: 125550   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:54:00,562-Speed 5972.67 samples/sec   Loss 6.3507   LearningRate 0.0769   Epoch: 12   Global Step: 125560   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:54:07,444-Speed 5952.02 samples/sec   Loss 6.3080   LearningRate 0.0769   Epoch: 12   Global Step: 125570   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:54:14,328-Speed 5951.39 samples/sec   Loss 6.3150   LearningRate 0.0768   Epoch: 12   Global Step: 125580   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:54:21,185-Speed 5974.76 samples/sec   Loss 6.3970   LearningRate 0.0768   Epoch: 12   Global Step: 125590   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:54:28,068-Speed 5952.01 samples/sec   Loss 6.3861   LearningRate 0.0768   Epoch: 12   Global Step: 125600   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:54:34,955-Speed 5949.14 samples/sec   Loss 6.3291   LearningRate 0.0768   Epoch: 12   Global Step: 125610   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:54:41,828-Speed 5959.93 samples/sec   Loss 6.3071   LearningRate 0.0768   Epoch: 12   Global Step: 125620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:54:48,683-Speed 5976.74 samples/sec   Loss 6.2868   LearningRate 0.0767   Epoch: 12   Global Step: 125630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:54:55,556-Speed 5961.57 samples/sec   Loss 6.2972   LearningRate 0.0767   Epoch: 12   Global Step: 125640   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:55:02,435-Speed 5955.47 samples/sec   Loss 6.3491   LearningRate 0.0767   Epoch: 12   Global Step: 125650   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:55:09,285-Speed 5980.56 samples/sec   Loss 6.3203   LearningRate 0.0767   Epoch: 12   Global Step: 125660   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:55:16,140-Speed 5976.87 samples/sec   Loss 6.3398   LearningRate 0.0767   Epoch: 12   Global Step: 125670   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:55:22,996-Speed 5974.67 samples/sec   Loss 6.3419   LearningRate 0.0766   Epoch: 12   Global Step: 125680   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:55:29,851-Speed 5977.08 samples/sec   Loss 6.3342   LearningRate 0.0766   Epoch: 12   Global Step: 125690   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:55:36,696-Speed 5985.06 samples/sec   Loss 6.4079   LearningRate 0.0766   Epoch: 12   Global Step: 125700   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:55:43,552-Speed 5975.57 samples/sec   Loss 6.3352   LearningRate 0.0766   Epoch: 12   Global Step: 125710   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:55:50,404-Speed 5978.79 samples/sec   Loss 6.4026   LearningRate 0.0766   Epoch: 12   Global Step: 125720   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:55:57,292-Speed 5947.79 samples/sec   Loss 6.3488   LearningRate 0.0766   Epoch: 12   Global Step: 125730   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:56:04,150-Speed 5974.27 samples/sec   Loss 6.3197   LearningRate 0.0765   Epoch: 12   Global Step: 125740   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:56:11,021-Speed 5962.71 samples/sec   Loss 6.3265   LearningRate 0.0765   Epoch: 12   Global Step: 125750   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:56:17,877-Speed 5976.27 samples/sec   Loss 6.3503   LearningRate 0.0765   Epoch: 12   Global Step: 125760   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:56:24,725-Speed 5981.60 samples/sec   Loss 6.3608   LearningRate 0.0765   Epoch: 12   Global Step: 125770   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:56:31,589-Speed 5968.49 samples/sec   Loss 6.3438   LearningRate 0.0765   Epoch: 12   Global Step: 125780   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:56:38,441-Speed 5979.12 samples/sec   Loss 6.3287   LearningRate 0.0764   Epoch: 12   Global Step: 125790   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:56:45,298-Speed 5974.30 samples/sec   Loss 6.3495   LearningRate 0.0764   Epoch: 12   Global Step: 125800   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:56:52,149-Speed 5979.88 samples/sec   Loss 6.3478   LearningRate 0.0764   Epoch: 12   Global Step: 125810   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:56:59,017-Speed 5966.04 samples/sec   Loss 6.3276   LearningRate 0.0764   Epoch: 12   Global Step: 125820   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:57:05,874-Speed 5974.03 samples/sec   Loss 6.2987   LearningRate 0.0764   Epoch: 12   Global Step: 125830   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:57:12,721-Speed 5983.93 samples/sec   Loss 6.2942   LearningRate 0.0763   Epoch: 12   Global Step: 125840   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:57:19,585-Speed 5968.06 samples/sec   Loss 6.2978   LearningRate 0.0763   Epoch: 12   Global Step: 125850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:57:26,427-Speed 5988.04 samples/sec   Loss 6.3707   LearningRate 0.0763   Epoch: 12   Global Step: 125860   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:57:33,308-Speed 5953.45 samples/sec   Loss 6.3226   LearningRate 0.0763   Epoch: 12   Global Step: 125870   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:57:40,164-Speed 5976.23 samples/sec   Loss 6.3328   LearningRate 0.0763   Epoch: 12   Global Step: 125880   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:57:47,031-Speed 5965.33 samples/sec   Loss 6.3418   LearningRate 0.0763   Epoch: 12   Global Step: 125890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:57:53,903-Speed 5961.65 samples/sec   Loss 6.2971   LearningRate 0.0762   Epoch: 12   Global Step: 125900   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 20:58:00,749-Speed 5984.72 samples/sec   Loss 6.3406   LearningRate 0.0762   Epoch: 12   Global Step: 125910   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:58:07,632-Speed 5952.45 samples/sec   Loss 6.3347   LearningRate 0.0762   Epoch: 12   Global Step: 125920   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:58:14,488-Speed 5975.30 samples/sec   Loss 6.3454   LearningRate 0.0762   Epoch: 12   Global Step: 125930   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:58:21,341-Speed 5980.49 samples/sec   Loss 6.3682   LearningRate 0.0762   Epoch: 12   Global Step: 125940   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:58:28,217-Speed 5958.00 samples/sec   Loss 6.3142   LearningRate 0.0761   Epoch: 12   Global Step: 125950   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:58:35,069-Speed 5978.76 samples/sec   Loss 6.3039   LearningRate 0.0761   Epoch: 12   Global Step: 125960   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:58:41,943-Speed 5960.02 samples/sec   Loss 6.2982   LearningRate 0.0761   Epoch: 12   Global Step: 125970   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:58:48,802-Speed 5972.51 samples/sec   Loss 6.3525   LearningRate 0.0761   Epoch: 12   Global Step: 125980   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:58:55,664-Speed 5973.64 samples/sec   Loss 6.3517   LearningRate 0.0761   Epoch: 12   Global Step: 125990   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:59:02,530-Speed 5966.46 samples/sec   Loss 6.3095   LearningRate 0.0760   Epoch: 12   Global Step: 126000   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:59:09,370-Speed 5989.09 samples/sec   Loss 6.3344   LearningRate 0.0760   Epoch: 12   Global Step: 126010   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:59:16,227-Speed 5974.16 samples/sec   Loss 6.2632   LearningRate 0.0760   Epoch: 12   Global Step: 126020   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:59:23,091-Speed 5968.89 samples/sec   Loss 6.2690   LearningRate 0.0760   Epoch: 12   Global Step: 126030   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:59:29,943-Speed 5978.09 samples/sec   Loss 6.2819   LearningRate 0.0760   Epoch: 12   Global Step: 126040   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:59:36,843-Speed 5937.10 samples/sec   Loss 6.2682   LearningRate 0.0760   Epoch: 12   Global Step: 126050   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:59:43,704-Speed 5971.84 samples/sec   Loss 6.2774   LearningRate 0.0759   Epoch: 12   Global Step: 126060   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:59:50,561-Speed 5974.60 samples/sec   Loss 6.3646   LearningRate 0.0759   Epoch: 12   Global Step: 126070   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 20:59:57,435-Speed 5960.08 samples/sec   Loss 6.2188   LearningRate 0.0759   Epoch: 12   Global Step: 126080   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:00:04,287-Speed 5979.17 samples/sec   Loss 6.3060   LearningRate 0.0759   Epoch: 12   Global Step: 126090   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:00:11,132-Speed 5984.83 samples/sec   Loss 6.2949   LearningRate 0.0759   Epoch: 12   Global Step: 126100   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:00:18,011-Speed 5955.50 samples/sec   Loss 6.2778   LearningRate 0.0758   Epoch: 12   Global Step: 126110   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:00:24,869-Speed 5974.07 samples/sec   Loss 6.3322   LearningRate 0.0758   Epoch: 12   Global Step: 126120   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:00:31,770-Speed 5936.22 samples/sec   Loss 6.3280   LearningRate 0.0758   Epoch: 12   Global Step: 126130   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:00:38,645-Speed 5958.69 samples/sec   Loss 6.2849   LearningRate 0.0758   Epoch: 12   Global Step: 126140   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:00:45,502-Speed 5977.56 samples/sec   Loss 6.2790   LearningRate 0.0758   Epoch: 12   Global Step: 126150   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:00:52,385-Speed 5951.10 samples/sec   Loss 6.3156   LearningRate 0.0757   Epoch: 12   Global Step: 126160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:00:59,279-Speed 5944.65 samples/sec   Loss 6.2671   LearningRate 0.0757   Epoch: 12   Global Step: 126170   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:01:06,133-Speed 5976.97 samples/sec   Loss 6.3234   LearningRate 0.0757   Epoch: 12   Global Step: 126180   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:01:13,013-Speed 5955.04 samples/sec   Loss 6.2926   LearningRate 0.0757   Epoch: 12   Global Step: 126190   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:01:19,888-Speed 5959.42 samples/sec   Loss 6.2841   LearningRate 0.0757   Epoch: 12   Global Step: 126200   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:01:26,753-Speed 5967.74 samples/sec   Loss 6.2905   LearningRate 0.0757   Epoch: 12   Global Step: 126210   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:01:33,614-Speed 5971.00 samples/sec   Loss 6.3457   LearningRate 0.0756   Epoch: 12   Global Step: 126220   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:01:40,493-Speed 5962.34 samples/sec   Loss 6.2757   LearningRate 0.0756   Epoch: 12   Global Step: 126230   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:01:47,359-Speed 5978.96 samples/sec   Loss 6.3454   LearningRate 0.0756   Epoch: 12   Global Step: 126240   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:01:54,210-Speed 5979.48 samples/sec   Loss 6.2721   LearningRate 0.0756   Epoch: 12   Global Step: 126250   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:02:01,082-Speed 5962.34 samples/sec   Loss 6.2968   LearningRate 0.0756   Epoch: 12   Global Step: 126260   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:02:07,965-Speed 5951.73 samples/sec   Loss 6.2487   LearningRate 0.0755   Epoch: 12   Global Step: 126270   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:02:14,813-Speed 5982.38 samples/sec   Loss 6.3264   LearningRate 0.0755   Epoch: 12   Global Step: 126280   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:02:21,666-Speed 5978.36 samples/sec   Loss 6.2887   LearningRate 0.0755   Epoch: 12   Global Step: 126290   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:02:28,547-Speed 5954.24 samples/sec   Loss 6.3562   LearningRate 0.0755   Epoch: 12   Global Step: 126300   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:02:35,425-Speed 5955.74 samples/sec   Loss 6.2618   LearningRate 0.0755   Epoch: 12   Global Step: 126310   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:02:42,288-Speed 5975.58 samples/sec   Loss 6.2652   LearningRate 0.0754   Epoch: 12   Global Step: 126320   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:02:49,160-Speed 5961.37 samples/sec   Loss 6.2933   LearningRate 0.0754   Epoch: 12   Global Step: 126330   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:02:56,027-Speed 5965.67 samples/sec   Loss 6.3229   LearningRate 0.0754   Epoch: 12   Global Step: 126340   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:03:02,889-Speed 5970.50 samples/sec   Loss 6.2816   LearningRate 0.0754   Epoch: 12   Global Step: 126350   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:03:09,770-Speed 5954.39 samples/sec   Loss 6.2736   LearningRate 0.0754   Epoch: 12   Global Step: 126360   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:03:16,637-Speed 5965.27 samples/sec   Loss 6.2720   LearningRate 0.0754   Epoch: 12   Global Step: 126370   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:03:23,494-Speed 5974.67 samples/sec   Loss 6.2444   LearningRate 0.0753   Epoch: 12   Global Step: 126380   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:03:30,346-Speed 5978.96 samples/sec   Loss 6.3133   LearningRate 0.0753   Epoch: 12   Global Step: 126390   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:03:37,220-Speed 5959.95 samples/sec   Loss 6.2465   LearningRate 0.0753   Epoch: 12   Global Step: 126400   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:03:44,190-Speed 5877.45 samples/sec   Loss 6.3400   LearningRate 0.0753   Epoch: 12   Global Step: 126410   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:03:51,049-Speed 5975.35 samples/sec   Loss 6.3107   LearningRate 0.0753   Epoch: 12   Global Step: 126420   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:03:57,926-Speed 5956.56 samples/sec   Loss 6.3129   LearningRate 0.0752   Epoch: 12   Global Step: 126430   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:04:04,794-Speed 5966.98 samples/sec   Loss 6.3249   LearningRate 0.0752   Epoch: 12   Global Step: 126440   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:04:11,654-Speed 5972.56 samples/sec   Loss 6.2712   LearningRate 0.0752   Epoch: 12   Global Step: 126450   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:04:18,538-Speed 5950.96 samples/sec   Loss 6.2654   LearningRate 0.0752   Epoch: 12   Global Step: 126460   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:04:25,418-Speed 5955.06 samples/sec   Loss 6.2672   LearningRate 0.0752   Epoch: 12   Global Step: 126470   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:04:32,289-Speed 5962.49 samples/sec   Loss 6.2812   LearningRate 0.0752   Epoch: 12   Global Step: 126480   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:04:39,257-Speed 5879.82 samples/sec   Loss 6.2878   LearningRate 0.0751   Epoch: 12   Global Step: 126490   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:04:46,163-Speed 5932.00 samples/sec   Loss 6.3011   LearningRate 0.0751   Epoch: 12   Global Step: 126500   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:04:53,026-Speed 5970.71 samples/sec   Loss 6.3216   LearningRate 0.0751   Epoch: 12   Global Step: 126510   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:04:59,882-Speed 5975.30 samples/sec   Loss 6.2561   LearningRate 0.0751   Epoch: 12   Global Step: 126520   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:05:06,776-Speed 5943.16 samples/sec   Loss 6.2724   LearningRate 0.0751   Epoch: 12   Global Step: 126530   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:05:13,629-Speed 5978.35 samples/sec   Loss 6.3001   LearningRate 0.0750   Epoch: 12   Global Step: 126540   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:05:20,510-Speed 5953.57 samples/sec   Loss 6.2795   LearningRate 0.0750   Epoch: 12   Global Step: 126550   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:05:27,393-Speed 5952.03 samples/sec   Loss 6.2293   LearningRate 0.0750   Epoch: 12   Global Step: 126560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:05:34,256-Speed 5969.34 samples/sec   Loss 6.2386   LearningRate 0.0750   Epoch: 12   Global Step: 126570   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:05:41,107-Speed 5979.45 samples/sec   Loss 6.2178   LearningRate 0.0750   Epoch: 12   Global Step: 126580   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:05:47,963-Speed 5975.70 samples/sec   Loss 6.2163   LearningRate 0.0749   Epoch: 12   Global Step: 126590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:05:54,835-Speed 5961.84 samples/sec   Loss 6.2989   LearningRate 0.0749   Epoch: 12   Global Step: 126600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:06:01,682-Speed 5983.45 samples/sec   Loss 6.2684   LearningRate 0.0749   Epoch: 12   Global Step: 126610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:06:08,542-Speed 5972.04 samples/sec   Loss 6.2996   LearningRate 0.0749   Epoch: 12   Global Step: 126620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:06:15,391-Speed 5981.11 samples/sec   Loss 6.2422   LearningRate 0.0749   Epoch: 12   Global Step: 126630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:06:22,252-Speed 5970.87 samples/sec   Loss 6.3232   LearningRate 0.0749   Epoch: 12   Global Step: 126640   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:06:29,105-Speed 5979.21 samples/sec   Loss 6.3243   LearningRate 0.0748   Epoch: 12   Global Step: 126650   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:06:35,968-Speed 5969.14 samples/sec   Loss 6.2942   LearningRate 0.0748   Epoch: 12   Global Step: 126660   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:06:42,821-Speed 5977.64 samples/sec   Loss 6.2234   LearningRate 0.0748   Epoch: 12   Global Step: 126670   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:06:49,692-Speed 5965.28 samples/sec   Loss 6.2511   LearningRate 0.0748   Epoch: 12   Global Step: 126680   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:06:56,560-Speed 5964.52 samples/sec   Loss 6.2959   LearningRate 0.0748   Epoch: 12   Global Step: 126690   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:07:03,404-Speed 5986.05 samples/sec   Loss 6.2871   LearningRate 0.0747   Epoch: 12   Global Step: 126700   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:07:10,258-Speed 5976.65 samples/sec   Loss 6.2751   LearningRate 0.0747   Epoch: 12   Global Step: 126710   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:07:17,128-Speed 5964.16 samples/sec   Loss 6.2172   LearningRate 0.0747   Epoch: 12   Global Step: 126720   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:07:23,978-Speed 5979.95 samples/sec   Loss 6.2927   LearningRate 0.0747   Epoch: 12   Global Step: 126730   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:07:30,828-Speed 5980.51 samples/sec   Loss 6.2229   LearningRate 0.0747   Epoch: 12   Global Step: 126740   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:07:37,719-Speed 5948.25 samples/sec   Loss 6.2507   LearningRate 0.0747   Epoch: 12   Global Step: 126750   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:07:44,609-Speed 5945.65 samples/sec   Loss 6.2910   LearningRate 0.0746   Epoch: 12   Global Step: 126760   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:07:51,520-Speed 5927.64 samples/sec   Loss 6.1889   LearningRate 0.0746   Epoch: 12   Global Step: 126770   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:07:58,446-Speed 5916.15 samples/sec   Loss 6.2528   LearningRate 0.0746   Epoch: 12   Global Step: 126780   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:08:05,319-Speed 5959.92 samples/sec   Loss 6.2313   LearningRate 0.0746   Epoch: 12   Global Step: 126790   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:08:12,208-Speed 5947.51 samples/sec   Loss 6.2741   LearningRate 0.0746   Epoch: 12   Global Step: 126800   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:08:19,061-Speed 5978.54 samples/sec   Loss 6.2851   LearningRate 0.0745   Epoch: 12   Global Step: 126810   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:08:25,907-Speed 5983.35 samples/sec   Loss 6.2475   LearningRate 0.0745   Epoch: 12   Global Step: 126820   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:08:32,755-Speed 5983.00 samples/sec   Loss 6.3185   LearningRate 0.0745   Epoch: 12   Global Step: 126830   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:08:39,612-Speed 5974.27 samples/sec   Loss 6.2612   LearningRate 0.0745   Epoch: 12   Global Step: 126840   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:08:46,476-Speed 5968.42 samples/sec   Loss 6.1941   LearningRate 0.0745   Epoch: 12   Global Step: 126850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:08:53,336-Speed 5972.51 samples/sec   Loss 6.2103   LearningRate 0.0744   Epoch: 12   Global Step: 126860   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:09:00,179-Speed 5987.11 samples/sec   Loss 6.2663   LearningRate 0.0744   Epoch: 12   Global Step: 126870   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:09:07,061-Speed 5952.46 samples/sec   Loss 6.2482   LearningRate 0.0744   Epoch: 12   Global Step: 126880   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:09:13,924-Speed 5969.05 samples/sec   Loss 6.2677   LearningRate 0.0744   Epoch: 12   Global Step: 126890   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:09:20,778-Speed 5977.07 samples/sec   Loss 6.2041   LearningRate 0.0744   Epoch: 12   Global Step: 126900   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:09:27,652-Speed 5959.61 samples/sec   Loss 6.2785   LearningRate 0.0744   Epoch: 12   Global Step: 126910   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:09:34,528-Speed 5958.47 samples/sec   Loss 6.2330   LearningRate 0.0743   Epoch: 12   Global Step: 126920   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:09:41,385-Speed 5975.09 samples/sec   Loss 6.2076   LearningRate 0.0743   Epoch: 12   Global Step: 126930   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:09:48,251-Speed 5966.76 samples/sec   Loss 6.2441   LearningRate 0.0743   Epoch: 12   Global Step: 126940   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:09:55,108-Speed 5974.65 samples/sec   Loss 6.1986   LearningRate 0.0743   Epoch: 12   Global Step: 126950   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:10:01,974-Speed 5967.02 samples/sec   Loss 6.2326   LearningRate 0.0743   Epoch: 12   Global Step: 126960   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:10:08,835-Speed 5970.68 samples/sec   Loss 6.2195   LearningRate 0.0742   Epoch: 12   Global Step: 126970   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-01-08 21:10:15,684-Speed 5982.05 samples/sec   Loss 6.2442   LearningRate 0.0742   Epoch: 12   Global Step: 126980   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:10:22,546-Speed 5970.99 samples/sec   Loss 6.1881   LearningRate 0.0742   Epoch: 12   Global Step: 126990   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:10:29,403-Speed 5974.05 samples/sec   Loss 6.2995   LearningRate 0.0742   Epoch: 12   Global Step: 127000   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:10:36,244-Speed 5988.54 samples/sec   Loss 6.2999   LearningRate 0.0742   Epoch: 12   Global Step: 127010   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:10:43,095-Speed 5979.77 samples/sec   Loss 6.2570   LearningRate 0.0742   Epoch: 12   Global Step: 127020   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:10:49,948-Speed 5977.62 samples/sec   Loss 6.2109   LearningRate 0.0741   Epoch: 12   Global Step: 127030   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:10:56,840-Speed 5944.26 samples/sec   Loss 6.1847   LearningRate 0.0741   Epoch: 12   Global Step: 127040   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:11:03,684-Speed 5986.05 samples/sec   Loss 6.2111   LearningRate 0.0741   Epoch: 12   Global Step: 127050   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:11:10,536-Speed 5979.03 samples/sec   Loss 6.2594   LearningRate 0.0741   Epoch: 12   Global Step: 127060   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:11:17,384-Speed 5981.92 samples/sec   Loss 6.2855   LearningRate 0.0741   Epoch: 12   Global Step: 127070   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:11:24,227-Speed 5986.97 samples/sec   Loss 6.1712   LearningRate 0.0740   Epoch: 12   Global Step: 127080   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:11:31,087-Speed 5971.73 samples/sec   Loss 6.2888   LearningRate 0.0740   Epoch: 12   Global Step: 127090   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:11:37,937-Speed 5981.56 samples/sec   Loss 6.2335   LearningRate 0.0740   Epoch: 12   Global Step: 127100   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:11:44,790-Speed 5977.98 samples/sec   Loss 6.2844   LearningRate 0.0740   Epoch: 12   Global Step: 127110   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:11:51,655-Speed 5967.80 samples/sec   Loss 6.1742   LearningRate 0.0740   Epoch: 12   Global Step: 127120   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:11:58,521-Speed 5966.28 samples/sec   Loss 6.2440   LearningRate 0.0739   Epoch: 12   Global Step: 127130   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:12:05,382-Speed 5971.71 samples/sec   Loss 6.2002   LearningRate 0.0739   Epoch: 12   Global Step: 127140   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:12:12,232-Speed 5980.91 samples/sec   Loss 6.2053   LearningRate 0.0739   Epoch: 12   Global Step: 127150   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:12:19,071-Speed 5990.18 samples/sec   Loss 6.2066   LearningRate 0.0739   Epoch: 12   Global Step: 127160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:12:25,934-Speed 5971.94 samples/sec   Loss 6.2875   LearningRate 0.0739   Epoch: 12   Global Step: 127170   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:12:32,793-Speed 5972.69 samples/sec   Loss 6.2184   LearningRate 0.0739   Epoch: 12   Global Step: 127180   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:12:39,661-Speed 5965.19 samples/sec   Loss 6.2902   LearningRate 0.0738   Epoch: 12   Global Step: 127190   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:12:46,522-Speed 5971.60 samples/sec   Loss 6.2045   LearningRate 0.0738   Epoch: 12   Global Step: 127200   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:12:53,390-Speed 5964.31 samples/sec   Loss 6.2204   LearningRate 0.0738   Epoch: 12   Global Step: 127210   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:13:00,242-Speed 5979.88 samples/sec   Loss 6.2607   LearningRate 0.0738   Epoch: 12   Global Step: 127220   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:13:07,110-Speed 5964.95 samples/sec   Loss 6.2361   LearningRate 0.0738   Epoch: 12   Global Step: 127230   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:13:13,968-Speed 5973.59 samples/sec   Loss 6.2333   LearningRate 0.0737   Epoch: 12   Global Step: 127240   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:13:20,837-Speed 5964.01 samples/sec   Loss 6.2443   LearningRate 0.0737   Epoch: 12   Global Step: 127250   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:13:27,691-Speed 5977.74 samples/sec   Loss 6.1974   LearningRate 0.0737   Epoch: 12   Global Step: 127260   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:13:34,547-Speed 5975.58 samples/sec   Loss 6.2149   LearningRate 0.0737   Epoch: 12   Global Step: 127270   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:13:41,397-Speed 5980.96 samples/sec   Loss 6.2763   LearningRate 0.0737   Epoch: 12   Global Step: 127280   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:13:48,292-Speed 5942.16 samples/sec   Loss 6.2529   LearningRate 0.0737   Epoch: 12   Global Step: 127290   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:13:55,134-Speed 5987.32 samples/sec   Loss 6.2201   LearningRate 0.0736   Epoch: 12   Global Step: 127300   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:14:02,010-Speed 5958.10 samples/sec   Loss 6.2723   LearningRate 0.0736   Epoch: 12   Global Step: 127310   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:14:08,856-Speed 5984.85 samples/sec   Loss 6.2350   LearningRate 0.0736   Epoch: 12   Global Step: 127320   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:14:15,730-Speed 5960.90 samples/sec   Loss 6.1874   LearningRate 0.0736   Epoch: 12   Global Step: 127330   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:14:22,592-Speed 5970.58 samples/sec   Loss 6.2511   LearningRate 0.0736   Epoch: 12   Global Step: 127340   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:14:29,455-Speed 5969.80 samples/sec   Loss 6.2464   LearningRate 0.0735   Epoch: 12   Global Step: 127350   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:14:36,307-Speed 5978.68 samples/sec   Loss 6.2444   LearningRate 0.0735   Epoch: 12   Global Step: 127360   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:14:43,168-Speed 5971.94 samples/sec   Loss 6.1849   LearningRate 0.0735   Epoch: 12   Global Step: 127370   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:14:50,012-Speed 5985.77 samples/sec   Loss 6.2143   LearningRate 0.0735   Epoch: 12   Global Step: 127380   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:14:56,865-Speed 5977.68 samples/sec   Loss 6.2143   LearningRate 0.0735   Epoch: 12   Global Step: 127390   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:15:03,799-Speed 5908.87 samples/sec   Loss 6.1747   LearningRate 0.0735   Epoch: 12   Global Step: 127400   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:15:10,770-Speed 5877.80 samples/sec   Loss 6.1948   LearningRate 0.0734   Epoch: 12   Global Step: 127410   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:15:17,613-Speed 5986.63 samples/sec   Loss 6.2267   LearningRate 0.0734   Epoch: 12   Global Step: 127420   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:15:24,481-Speed 5965.04 samples/sec   Loss 6.2314   LearningRate 0.0734   Epoch: 12   Global Step: 127430   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:15:31,343-Speed 5971.03 samples/sec   Loss 6.2671   LearningRate 0.0734   Epoch: 12   Global Step: 127440   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:15:38,197-Speed 5976.28 samples/sec   Loss 6.2402   LearningRate 0.0734   Epoch: 12   Global Step: 127450   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:15:45,080-Speed 5952.75 samples/sec   Loss 6.2339   LearningRate 0.0733   Epoch: 12   Global Step: 127460   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:15:51,938-Speed 5973.63 samples/sec   Loss 6.2126   LearningRate 0.0733   Epoch: 12   Global Step: 127470   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:15:58,819-Speed 5953.62 samples/sec   Loss 6.2009   LearningRate 0.0733   Epoch: 12   Global Step: 127480   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:16:05,693-Speed 5960.41 samples/sec   Loss 6.2100   LearningRate 0.0733   Epoch: 12   Global Step: 127490   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:16:12,561-Speed 5964.95 samples/sec   Loss 6.2358   LearningRate 0.0733   Epoch: 12   Global Step: 127500   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:16:19,440-Speed 5955.39 samples/sec   Loss 6.2020   LearningRate 0.0733   Epoch: 12   Global Step: 127510   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:16:26,301-Speed 5971.69 samples/sec   Loss 6.2361   LearningRate 0.0732   Epoch: 12   Global Step: 127520   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:16:33,155-Speed 5976.84 samples/sec   Loss 6.1609   LearningRate 0.0732   Epoch: 12   Global Step: 127530   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:16:40,030-Speed 5959.79 samples/sec   Loss 6.1690   LearningRate 0.0732   Epoch: 12   Global Step: 127540   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:16:46,910-Speed 5955.33 samples/sec   Loss 6.2295   LearningRate 0.0732   Epoch: 12   Global Step: 127550   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:16:53,761-Speed 5980.09 samples/sec   Loss 6.2134   LearningRate 0.0732   Epoch: 12   Global Step: 127560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:17:00,624-Speed 5969.06 samples/sec   Loss 6.2294   LearningRate 0.0731   Epoch: 12   Global Step: 127570   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:17:07,478-Speed 5978.88 samples/sec   Loss 6.2313   LearningRate 0.0731   Epoch: 12   Global Step: 127580   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:17:14,332-Speed 5979.72 samples/sec   Loss 6.2331   LearningRate 0.0731   Epoch: 12   Global Step: 127590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:17:21,208-Speed 5957.99 samples/sec   Loss 6.2398   LearningRate 0.0731   Epoch: 12   Global Step: 127600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:17:28,048-Speed 5989.51 samples/sec   Loss 6.2365   LearningRate 0.0731   Epoch: 12   Global Step: 127610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:17:34,913-Speed 5967.87 samples/sec   Loss 6.1664   LearningRate 0.0730   Epoch: 12   Global Step: 127620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:17:41,763-Speed 5980.72 samples/sec   Loss 6.0810   LearningRate 0.0730   Epoch: 12   Global Step: 127630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:17:48,617-Speed 5976.81 samples/sec   Loss 6.1304   LearningRate 0.0730   Epoch: 12   Global Step: 127640   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:17:55,470-Speed 5978.77 samples/sec   Loss 6.1951   LearningRate 0.0730   Epoch: 12   Global Step: 127650   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:18:02,315-Speed 5984.57 samples/sec   Loss 6.1866   LearningRate 0.0730   Epoch: 12   Global Step: 127660   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:18:09,175-Speed 5972.04 samples/sec   Loss 6.2254   LearningRate 0.0730   Epoch: 12   Global Step: 127670   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:18:16,038-Speed 5969.32 samples/sec   Loss 6.2079   LearningRate 0.0729   Epoch: 12   Global Step: 127680   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:18:22,911-Speed 5961.04 samples/sec   Loss 6.1989   LearningRate 0.0729   Epoch: 12   Global Step: 127690   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:18:29,771-Speed 5971.91 samples/sec   Loss 6.1509   LearningRate 0.0729   Epoch: 12   Global Step: 127700   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:18:36,626-Speed 5976.38 samples/sec   Loss 6.2140   LearningRate 0.0729   Epoch: 12   Global Step: 127710   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:18:43,484-Speed 5973.70 samples/sec   Loss 6.1769   LearningRate 0.0729   Epoch: 12   Global Step: 127720   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:18:50,334-Speed 5981.01 samples/sec   Loss 6.1661   LearningRate 0.0728   Epoch: 12   Global Step: 127730   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:18:57,188-Speed 5980.19 samples/sec   Loss 6.2316   LearningRate 0.0728   Epoch: 12   Global Step: 127740   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:19:04,036-Speed 5982.10 samples/sec   Loss 6.2815   LearningRate 0.0728   Epoch: 12   Global Step: 127750   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:19:10,893-Speed 5974.72 samples/sec   Loss 6.2449   LearningRate 0.0728   Epoch: 12   Global Step: 127760   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:19:17,743-Speed 5981.15 samples/sec   Loss 6.2058   LearningRate 0.0728   Epoch: 12   Global Step: 127770   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:19:24,592-Speed 5981.20 samples/sec   Loss 6.2133   LearningRate 0.0728   Epoch: 12   Global Step: 127780   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:19:31,511-Speed 5921.03 samples/sec   Loss 6.2122   LearningRate 0.0727   Epoch: 12   Global Step: 127790   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:19:38,456-Speed 5899.61 samples/sec   Loss 6.2138   LearningRate 0.0727   Epoch: 12   Global Step: 127800   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:19:45,407-Speed 5893.55 samples/sec   Loss 6.2263   LearningRate 0.0727   Epoch: 12   Global Step: 127810   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:19:52,269-Speed 5970.12 samples/sec   Loss 6.1653   LearningRate 0.0727   Epoch: 12   Global Step: 127820   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:19:59,129-Speed 5972.19 samples/sec   Loss 6.0943   LearningRate 0.0727   Epoch: 12   Global Step: 127830   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:20:05,979-Speed 5980.18 samples/sec   Loss 6.1939   LearningRate 0.0726   Epoch: 12   Global Step: 127840   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:20:12,831-Speed 5978.40 samples/sec   Loss 6.1233   LearningRate 0.0726   Epoch: 12   Global Step: 127850   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:20:19,689-Speed 5974.25 samples/sec   Loss 6.1886   LearningRate 0.0726   Epoch: 12   Global Step: 127860   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:20:26,543-Speed 5977.33 samples/sec   Loss 6.2032   LearningRate 0.0726   Epoch: 12   Global Step: 127870   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:20:33,417-Speed 5960.01 samples/sec   Loss 6.2321   LearningRate 0.0726   Epoch: 12   Global Step: 127880   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:20:40,270-Speed 5977.90 samples/sec   Loss 6.1747   LearningRate 0.0726   Epoch: 12   Global Step: 127890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:20:47,124-Speed 5976.94 samples/sec   Loss 6.1105   LearningRate 0.0725   Epoch: 12   Global Step: 127900   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:20:53,974-Speed 5981.72 samples/sec   Loss 6.1578   LearningRate 0.0725   Epoch: 12   Global Step: 127910   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:21:00,832-Speed 5974.42 samples/sec   Loss 6.1920   LearningRate 0.0725   Epoch: 12   Global Step: 127920   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:21:07,705-Speed 5960.19 samples/sec   Loss 6.2443   LearningRate 0.0725   Epoch: 12   Global Step: 127930   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:21:14,619-Speed 5925.61 samples/sec   Loss 6.1250   LearningRate 0.0725   Epoch: 12   Global Step: 127940   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:21:21,495-Speed 5958.05 samples/sec   Loss 6.1502   LearningRate 0.0724   Epoch: 12   Global Step: 127950   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:21:28,354-Speed 5972.70 samples/sec   Loss 6.1761   LearningRate 0.0724   Epoch: 12   Global Step: 127960   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:21:35,199-Speed 5984.77 samples/sec   Loss 6.1955   LearningRate 0.0724   Epoch: 12   Global Step: 127970   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:21:42,041-Speed 5987.58 samples/sec   Loss 6.2055   LearningRate 0.0724   Epoch: 12   Global Step: 127980   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:21:49,001-Speed 5886.80 samples/sec   Loss 6.1978   LearningRate 0.0724   Epoch: 12   Global Step: 127990   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:21:55,971-Speed 5878.31 samples/sec   Loss 6.1921   LearningRate 0.0724   Epoch: 12   Global Step: 128000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:22:02,839-Speed 5965.23 samples/sec   Loss 6.1564   LearningRate 0.0723   Epoch: 12   Global Step: 128010   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:22:09,756-Speed 5922.14 samples/sec   Loss 6.1816   LearningRate 0.0723   Epoch: 12   Global Step: 128020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:22:16,607-Speed 5980.56 samples/sec   Loss 6.1864   LearningRate 0.0723   Epoch: 12   Global Step: 128030   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:22:23,461-Speed 5979.99 samples/sec   Loss 6.1559   LearningRate 0.0723   Epoch: 12   Global Step: 128040   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:22:30,298-Speed 5990.94 samples/sec   Loss 6.2031   LearningRate 0.0723   Epoch: 12   Global Step: 128050   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:22:37,163-Speed 5967.80 samples/sec   Loss 6.1853   LearningRate 0.0722   Epoch: 12   Global Step: 128060   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:22:44,009-Speed 5984.73 samples/sec   Loss 6.2182   LearningRate 0.0722   Epoch: 12   Global Step: 128070   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:22:50,848-Speed 5989.11 samples/sec   Loss 6.1522   LearningRate 0.0722   Epoch: 12   Global Step: 128080   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:22:57,690-Speed 5988.25 samples/sec   Loss 6.1190   LearningRate 0.0722   Epoch: 12   Global Step: 128090   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:23:04,541-Speed 5980.75 samples/sec   Loss 6.1946   LearningRate 0.0722   Epoch: 12   Global Step: 128100   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-01-08 21:23:11,361-Speed 6006.46 samples/sec   Loss 6.1591   LearningRate 0.0722   Epoch: 12   Global Step: 128110   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-01-08 21:23:18,246-Speed 5950.30 samples/sec   Loss 6.1885   LearningRate 0.0721   Epoch: 12   Global Step: 128120   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-01-08 21:23:25,105-Speed 5973.35 samples/sec   Loss 6.1508   LearningRate 0.0721   Epoch: 12   Global Step: 128130   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-01-08 21:23:32,015-Speed 5928.45 samples/sec   Loss 6.1654   LearningRate 0.0721   Epoch: 12   Global Step: 128140   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-01-08 21:23:38,897-Speed 5953.13 samples/sec   Loss 6.2195   LearningRate 0.0721   Epoch: 12   Global Step: 128150   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-01-08 21:23:45,750-Speed 5978.21 samples/sec   Loss 6.1445   LearningRate 0.0721   Epoch: 12   Global Step: 128160   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-01-08 21:23:52,614-Speed 5968.66 samples/sec   Loss 6.1484   LearningRate 0.0720   Epoch: 12   Global Step: 128170   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-01-08 21:23:59,488-Speed 5960.98 samples/sec   Loss 6.1445   LearningRate 0.0720   Epoch: 12   Global Step: 128180   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-01-08 21:24:06,337-Speed 5981.91 samples/sec   Loss 6.2264   LearningRate 0.0720   Epoch: 12   Global Step: 128190   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-01-08 21:24:13,180-Speed 5986.37 samples/sec   Loss 6.2152   LearningRate 0.0720   Epoch: 12   Global Step: 128200   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-01-08 21:24:20,046-Speed 5966.63 samples/sec   Loss 6.1531   LearningRate 0.0720   Epoch: 12   Global Step: 128210   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 21:24:26,918-Speed 5962.64 samples/sec   Loss 6.0693   LearningRate 0.0720   Epoch: 12   Global Step: 128220   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 21:24:33,780-Speed 5969.72 samples/sec   Loss 6.1058   LearningRate 0.0719   Epoch: 12   Global Step: 128230   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 21:24:40,659-Speed 5955.56 samples/sec   Loss 6.1518   LearningRate 0.0719   Epoch: 12   Global Step: 128240   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 21:24:47,539-Speed 5955.03 samples/sec   Loss 6.1565   LearningRate 0.0719   Epoch: 12   Global Step: 128250   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 21:24:54,395-Speed 5975.63 samples/sec   Loss 6.1306   LearningRate 0.0719   Epoch: 12   Global Step: 128260   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 21:25:01,255-Speed 5971.77 samples/sec   Loss 6.2154   LearningRate 0.0719   Epoch: 12   Global Step: 128270   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 21:25:08,135-Speed 5954.70 samples/sec   Loss 6.1377   LearningRate 0.0718   Epoch: 12   Global Step: 128280   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 21:25:14,980-Speed 5985.49 samples/sec   Loss 6.1008   LearningRate 0.0718   Epoch: 12   Global Step: 128290   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 21:25:21,838-Speed 5973.04 samples/sec   Loss 6.1212   LearningRate 0.0718   Epoch: 12   Global Step: 128300   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-01-08 21:25:28,717-Speed 5956.15 samples/sec   Loss 6.1790   LearningRate 0.0718   Epoch: 12   Global Step: 128310   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:25:35,584-Speed 5965.95 samples/sec   Loss 6.1175   LearningRate 0.0718   Epoch: 12   Global Step: 128320   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-01-08 21:25:42,431-Speed 5983.26 samples/sec   Loss 6.1569   LearningRate 0.0718   Epoch: 12   Global Step: 128330   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:25:49,282-Speed 5981.58 samples/sec   Loss 6.1236   LearningRate 0.0717   Epoch: 12   Global Step: 128340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:25:56,137-Speed 5976.30 samples/sec   Loss 6.1327   LearningRate 0.0717   Epoch: 12   Global Step: 128350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:26:03,007-Speed 5963.84 samples/sec   Loss 6.1838   LearningRate 0.0717   Epoch: 12   Global Step: 128360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:26:09,851-Speed 5986.11 samples/sec   Loss 6.1393   LearningRate 0.0717   Epoch: 12   Global Step: 128370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:26:16,704-Speed 5978.24 samples/sec   Loss 6.0978   LearningRate 0.0717   Epoch: 12   Global Step: 128380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:26:23,558-Speed 5979.61 samples/sec   Loss 6.1526   LearningRate 0.0716   Epoch: 12   Global Step: 128390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:26:30,402-Speed 5986.60 samples/sec   Loss 6.1471   LearningRate 0.0716   Epoch: 12   Global Step: 128400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:26:37,257-Speed 5975.84 samples/sec   Loss 6.1190   LearningRate 0.0716   Epoch: 12   Global Step: 128410   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:26:44,115-Speed 5973.76 samples/sec   Loss 6.1107   LearningRate 0.0716   Epoch: 12   Global Step: 128420   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:26:50,973-Speed 5974.23 samples/sec   Loss 6.1267   LearningRate 0.0716   Epoch: 12   Global Step: 128430   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:26:57,826-Speed 5977.75 samples/sec   Loss 6.1244   LearningRate 0.0716   Epoch: 12   Global Step: 128440   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:27:04,704-Speed 5956.94 samples/sec   Loss 6.1323   LearningRate 0.0715   Epoch: 12   Global Step: 128450   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:27:11,558-Speed 5977.36 samples/sec   Loss 6.1288   LearningRate 0.0715   Epoch: 12   Global Step: 128460   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:27:18,444-Speed 5949.51 samples/sec   Loss 6.1384   LearningRate 0.0715   Epoch: 12   Global Step: 128470   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:27:25,317-Speed 5961.12 samples/sec   Loss 6.1853   LearningRate 0.0715   Epoch: 12   Global Step: 128480   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:27:32,173-Speed 5976.25 samples/sec   Loss 6.1334   LearningRate 0.0715   Epoch: 12   Global Step: 128490   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:27:39,032-Speed 5972.81 samples/sec   Loss 6.1682   LearningRate 0.0714   Epoch: 12   Global Step: 128500   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:27:45,915-Speed 5956.01 samples/sec   Loss 6.1901   LearningRate 0.0714   Epoch: 12   Global Step: 128510   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-08 21:27:52,776-Speed 5973.78 samples/sec   Loss 6.1794   LearningRate 0.0714   Epoch: 12   Global Step: 128520   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-08 21:27:59,613-Speed 5991.40 samples/sec   Loss 6.1977   LearningRate 0.0714   Epoch: 12   Global Step: 128530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:28:06,479-Speed 5967.50 samples/sec   Loss 6.1656   LearningRate 0.0714   Epoch: 12   Global Step: 128540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:28:13,326-Speed 5983.47 samples/sec   Loss 6.1273   LearningRate 0.0714   Epoch: 12   Global Step: 128550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:28:20,182-Speed 5975.08 samples/sec   Loss 6.1267   LearningRate 0.0713   Epoch: 12   Global Step: 128560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:28:27,040-Speed 5974.10 samples/sec   Loss 6.1453   LearningRate 0.0713   Epoch: 12   Global Step: 128570   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:28:33,890-Speed 5981.16 samples/sec   Loss 6.0911   LearningRate 0.0713   Epoch: 12   Global Step: 128580   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:28:40,742-Speed 5978.32 samples/sec   Loss 6.0999   LearningRate 0.0713   Epoch: 12   Global Step: 128590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:28:47,596-Speed 5976.59 samples/sec   Loss 6.1066   LearningRate 0.0713   Epoch: 12   Global Step: 128600   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:28:54,454-Speed 5974.14 samples/sec   Loss 6.1091   LearningRate 0.0712   Epoch: 12   Global Step: 128610   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:29:01,325-Speed 5962.13 samples/sec   Loss 6.0676   LearningRate 0.0712   Epoch: 12   Global Step: 128620   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:29:08,181-Speed 5975.32 samples/sec   Loss 6.0672   LearningRate 0.0712   Epoch: 12   Global Step: 128630   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:29:15,031-Speed 5981.10 samples/sec   Loss 6.0863   LearningRate 0.0712   Epoch: 12   Global Step: 128640   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:29:21,893-Speed 5970.46 samples/sec   Loss 6.1060   LearningRate 0.0712   Epoch: 12   Global Step: 128650   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:29:28,738-Speed 5985.50 samples/sec   Loss 6.1066   LearningRate 0.0712   Epoch: 12   Global Step: 128660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:29:35,587-Speed 5982.18 samples/sec   Loss 6.1524   LearningRate 0.0711   Epoch: 12   Global Step: 128670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:29:42,453-Speed 5966.92 samples/sec   Loss 6.1504   LearningRate 0.0711   Epoch: 12   Global Step: 128680   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:29:49,326-Speed 5960.86 samples/sec   Loss 6.1249   LearningRate 0.0711   Epoch: 12   Global Step: 128690   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:29:56,202-Speed 5957.71 samples/sec   Loss 6.0930   LearningRate 0.0711   Epoch: 12   Global Step: 128700   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:30:03,068-Speed 5966.69 samples/sec   Loss 6.0993   LearningRate 0.0711   Epoch: 12   Global Step: 128710   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:30:09,975-Speed 5931.94 samples/sec   Loss 6.1224   LearningRate 0.0710   Epoch: 12   Global Step: 128720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:30:16,834-Speed 5972.53 samples/sec   Loss 6.1528   LearningRate 0.0710   Epoch: 12   Global Step: 128730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:30:23,732-Speed 5939.20 samples/sec   Loss 6.1507   LearningRate 0.0710   Epoch: 12   Global Step: 128740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:30:30,646-Speed 5925.25 samples/sec   Loss 6.0928   LearningRate 0.0710   Epoch: 12   Global Step: 128750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:30:37,526-Speed 5954.80 samples/sec   Loss 6.1292   LearningRate 0.0710   Epoch: 12   Global Step: 128760   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:30:44,373-Speed 5982.89 samples/sec   Loss 6.1059   LearningRate 0.0710   Epoch: 12   Global Step: 128770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:30:51,246-Speed 5960.37 samples/sec   Loss 6.0829   LearningRate 0.0709   Epoch: 12   Global Step: 128780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:30:58,115-Speed 5964.00 samples/sec   Loss 6.1028   LearningRate 0.0709   Epoch: 12   Global Step: 128790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:31:04,980-Speed 5968.55 samples/sec   Loss 6.0778   LearningRate 0.0709   Epoch: 12   Global Step: 128800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:31:11,834-Speed 5976.88 samples/sec   Loss 6.1595   LearningRate 0.0709   Epoch: 12   Global Step: 128810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:31:18,706-Speed 5962.23 samples/sec   Loss 6.0727   LearningRate 0.0709   Epoch: 12   Global Step: 128820   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:31:25,579-Speed 5960.35 samples/sec   Loss 6.1084   LearningRate 0.0708   Epoch: 12   Global Step: 128830   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:31:32,447-Speed 5965.00 samples/sec   Loss 6.1137   LearningRate 0.0708   Epoch: 12   Global Step: 128840   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:31:39,309-Speed 5970.75 samples/sec   Loss 6.1303   LearningRate 0.0708   Epoch: 12   Global Step: 128850   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:31:46,189-Speed 5954.55 samples/sec   Loss 6.1071   LearningRate 0.0708   Epoch: 12   Global Step: 128860   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:31:53,044-Speed 5976.20 samples/sec   Loss 6.0564   LearningRate 0.0708   Epoch: 12   Global Step: 128870   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:32:00,054-Speed 5846.83 samples/sec   Loss 6.0675   LearningRate 0.0708   Epoch: 12   Global Step: 128880   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:32:06,917-Speed 5969.71 samples/sec   Loss 6.0715   LearningRate 0.0707   Epoch: 12   Global Step: 128890   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:32:13,785-Speed 5965.51 samples/sec   Loss 6.1317   LearningRate 0.0707   Epoch: 12   Global Step: 128900   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:32:20,632-Speed 5982.78 samples/sec   Loss 6.1572   LearningRate 0.0707   Epoch: 12   Global Step: 128910   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:32:27,483-Speed 5980.29 samples/sec   Loss 6.1099   LearningRate 0.0707   Epoch: 12   Global Step: 128920   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:32:34,335-Speed 5978.69 samples/sec   Loss 6.1339   LearningRate 0.0707   Epoch: 12   Global Step: 128930   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:32:41,195-Speed 5971.96 samples/sec   Loss 6.1077   LearningRate 0.0707   Epoch: 12   Global Step: 128940   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:32:48,068-Speed 5961.05 samples/sec   Loss 6.0854   LearningRate 0.0706   Epoch: 12   Global Step: 128950   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:32:54,937-Speed 5963.90 samples/sec   Loss 6.1110   LearningRate 0.0706   Epoch: 12   Global Step: 128960   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:33:01,797-Speed 5972.67 samples/sec   Loss 6.0512   LearningRate 0.0706   Epoch: 12   Global Step: 128970   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-08 21:33:08,638-Speed 5987.89 samples/sec   Loss 6.1124   LearningRate 0.0706   Epoch: 12   Global Step: 128980   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:33:15,505-Speed 5966.14 samples/sec   Loss 6.1048   LearningRate 0.0706   Epoch: 12   Global Step: 128990   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:33:22,366-Speed 5970.52 samples/sec   Loss 6.0978   LearningRate 0.0705   Epoch: 12   Global Step: 129000   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:33:29,248-Speed 5953.23 samples/sec   Loss 6.1100   LearningRate 0.0705   Epoch: 12   Global Step: 129010   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:33:36,126-Speed 5955.92 samples/sec   Loss 6.1383   LearningRate 0.0705   Epoch: 12   Global Step: 129020   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:33:42,993-Speed 5965.47 samples/sec   Loss 6.0854   LearningRate 0.0705   Epoch: 12   Global Step: 129030   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:33:49,857-Speed 5968.71 samples/sec   Loss 6.0907   LearningRate 0.0705   Epoch: 12   Global Step: 129040   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:33:56,727-Speed 5966.36 samples/sec   Loss 6.0982   LearningRate 0.0705   Epoch: 12   Global Step: 129050   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:34:03,593-Speed 5966.60 samples/sec   Loss 6.0606   LearningRate 0.0704   Epoch: 12   Global Step: 129060   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:34:10,446-Speed 5978.57 samples/sec   Loss 6.0712   LearningRate 0.0704   Epoch: 12   Global Step: 129070   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:34:17,282-Speed 5992.59 samples/sec   Loss 6.0482   LearningRate 0.0704   Epoch: 12   Global Step: 129080   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:34:24,139-Speed 5976.81 samples/sec   Loss 6.0879   LearningRate 0.0704   Epoch: 12   Global Step: 129090   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:34:30,995-Speed 5975.45 samples/sec   Loss 6.1098   LearningRate 0.0704   Epoch: 12   Global Step: 129100   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:34:37,854-Speed 5973.00 samples/sec   Loss 6.1398   LearningRate 0.0703   Epoch: 12   Global Step: 129110   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:34:44,711-Speed 5975.20 samples/sec   Loss 6.0929   LearningRate 0.0703   Epoch: 12   Global Step: 129120   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:34:51,574-Speed 5969.51 samples/sec   Loss 6.1194   LearningRate 0.0703   Epoch: 12   Global Step: 129130   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:34:58,491-Speed 5922.61 samples/sec   Loss 6.1249   LearningRate 0.0703   Epoch: 12   Global Step: 129140   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:35:05,357-Speed 5966.66 samples/sec   Loss 6.0830   LearningRate 0.0703   Epoch: 12   Global Step: 129150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:35:12,223-Speed 5966.70 samples/sec   Loss 6.0698   LearningRate 0.0703   Epoch: 12   Global Step: 129160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:35:19,074-Speed 5979.65 samples/sec   Loss 6.0858   LearningRate 0.0702   Epoch: 12   Global Step: 129170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:35:25,919-Speed 5984.88 samples/sec   Loss 6.0708   LearningRate 0.0702   Epoch: 12   Global Step: 129180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:35:32,777-Speed 5973.32 samples/sec   Loss 6.1063   LearningRate 0.0702   Epoch: 12   Global Step: 129190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:35:39,631-Speed 5977.62 samples/sec   Loss 6.0956   LearningRate 0.0702   Epoch: 12   Global Step: 129200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:35:46,504-Speed 5960.90 samples/sec   Loss 6.1393   LearningRate 0.0702   Epoch: 12   Global Step: 129210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:35:53,383-Speed 5955.49 samples/sec   Loss 6.1194   LearningRate 0.0701   Epoch: 12   Global Step: 129220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:36:00,232-Speed 5981.68 samples/sec   Loss 6.0270   LearningRate 0.0701   Epoch: 12   Global Step: 129230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:36:07,100-Speed 5964.74 samples/sec   Loss 6.1048   LearningRate 0.0701   Epoch: 12   Global Step: 129240   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:36:13,955-Speed 5976.97 samples/sec   Loss 6.1399   LearningRate 0.0701   Epoch: 12   Global Step: 129250   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:36:20,803-Speed 5981.61 samples/sec   Loss 6.0960   LearningRate 0.0701   Epoch: 12   Global Step: 129260   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:36:27,708-Speed 5936.73 samples/sec   Loss 6.1137   LearningRate 0.0701   Epoch: 12   Global Step: 129270   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:36:34,570-Speed 5969.55 samples/sec   Loss 6.0962   LearningRate 0.0700   Epoch: 12   Global Step: 129280   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:36:41,436-Speed 5967.05 samples/sec   Loss 6.0370   LearningRate 0.0700   Epoch: 12   Global Step: 129290   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:36:48,300-Speed 5968.48 samples/sec   Loss 6.1261   LearningRate 0.0700   Epoch: 12   Global Step: 129300   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:36:55,153-Speed 5978.06 samples/sec   Loss 6.0938   LearningRate 0.0700   Epoch: 12   Global Step: 129310   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:37:02,010-Speed 5974.32 samples/sec   Loss 6.1048   LearningRate 0.0700   Epoch: 12   Global Step: 129320   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:37:08,862-Speed 5979.59 samples/sec   Loss 6.0601   LearningRate 0.0699   Epoch: 12   Global Step: 129330   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:37:15,716-Speed 5976.64 samples/sec   Loss 6.0820   LearningRate 0.0699   Epoch: 12   Global Step: 129340   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:37:22,563-Speed 5983.31 samples/sec   Loss 6.1288   LearningRate 0.0699   Epoch: 12   Global Step: 129350   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:37:29,432-Speed 5964.65 samples/sec   Loss 6.0947   LearningRate 0.0699   Epoch: 12   Global Step: 129360   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:37:36,304-Speed 5961.63 samples/sec   Loss 6.1195   LearningRate 0.0699   Epoch: 12   Global Step: 129370   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:37:43,179-Speed 5959.26 samples/sec   Loss 6.0971   LearningRate 0.0699   Epoch: 12   Global Step: 129380   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:37:50,043-Speed 5968.80 samples/sec   Loss 6.0689   LearningRate 0.0698   Epoch: 12   Global Step: 129390   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:37:56,888-Speed 5984.76 samples/sec   Loss 6.0332   LearningRate 0.0698   Epoch: 12   Global Step: 129400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:38:03,762-Speed 5960.02 samples/sec   Loss 6.0503   LearningRate 0.0698   Epoch: 12   Global Step: 129410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:38:10,610-Speed 5982.89 samples/sec   Loss 6.0405   LearningRate 0.0698   Epoch: 12   Global Step: 129420   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:38:17,469-Speed 5972.38 samples/sec   Loss 6.0503   LearningRate 0.0698   Epoch: 12   Global Step: 129430   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:38:24,349-Speed 5954.93 samples/sec   Loss 6.0729   LearningRate 0.0698   Epoch: 12   Global Step: 129440   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:38:31,232-Speed 5952.38 samples/sec   Loss 6.0163   LearningRate 0.0697   Epoch: 12   Global Step: 129450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:38:38,083-Speed 5979.35 samples/sec   Loss 6.0810   LearningRate 0.0697   Epoch: 12   Global Step: 129460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:38:44,941-Speed 5976.53 samples/sec   Loss 6.0295   LearningRate 0.0697   Epoch: 12   Global Step: 129470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:38:51,809-Speed 5964.94 samples/sec   Loss 6.0476   LearningRate 0.0697   Epoch: 12   Global Step: 129480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:38:58,674-Speed 5967.04 samples/sec   Loss 6.1066   LearningRate 0.0697   Epoch: 12   Global Step: 129490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:39:05,553-Speed 5956.13 samples/sec   Loss 6.0545   LearningRate 0.0696   Epoch: 12   Global Step: 129500   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:39:12,412-Speed 5972.97 samples/sec   Loss 6.1025   LearningRate 0.0696   Epoch: 12   Global Step: 129510   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:39:19,273-Speed 5971.21 samples/sec   Loss 6.0572   LearningRate 0.0696   Epoch: 12   Global Step: 129520   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:39:26,135-Speed 5970.27 samples/sec   Loss 6.0551   LearningRate 0.0696   Epoch: 12   Global Step: 129530   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:39:33,024-Speed 5947.17 samples/sec   Loss 6.0962   LearningRate 0.0696   Epoch: 12   Global Step: 129540   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:39:39,890-Speed 5966.07 samples/sec   Loss 6.0779   LearningRate 0.0696   Epoch: 12   Global Step: 129550   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:39:46,757-Speed 5966.36 samples/sec   Loss 6.0382   LearningRate 0.0695   Epoch: 12   Global Step: 129560   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:39:53,613-Speed 5975.94 samples/sec   Loss 6.0678   LearningRate 0.0695   Epoch: 12   Global Step: 129570   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:40:00,473-Speed 5972.11 samples/sec   Loss 6.0621   LearningRate 0.0695   Epoch: 12   Global Step: 129580   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:40:07,332-Speed 5972.42 samples/sec   Loss 6.0673   LearningRate 0.0695   Epoch: 12   Global Step: 129590   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:40:14,280-Speed 5897.78 samples/sec   Loss 6.0206   LearningRate 0.0695   Epoch: 12   Global Step: 129600   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:40:21,138-Speed 5973.37 samples/sec   Loss 6.0509   LearningRate 0.0694   Epoch: 12   Global Step: 129610   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:40:28,064-Speed 5915.32 samples/sec   Loss 6.0125   LearningRate 0.0694   Epoch: 12   Global Step: 129620   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:40:34,942-Speed 5958.39 samples/sec   Loss 6.0436   LearningRate 0.0694   Epoch: 12   Global Step: 129630   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:40:41,837-Speed 5941.84 samples/sec   Loss 6.0714   LearningRate 0.0694   Epoch: 12   Global Step: 129640   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:40:48,792-Speed 5890.52 samples/sec   Loss 6.0961   LearningRate 0.0694   Epoch: 12   Global Step: 129650   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:40:55,741-Speed 5895.57 samples/sec   Loss 6.0424   LearningRate 0.0694   Epoch: 12   Global Step: 129660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:41:02,687-Speed 5897.76 samples/sec   Loss 6.0726   LearningRate 0.0693   Epoch: 12   Global Step: 129670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:41:09,590-Speed 5934.65 samples/sec   Loss 6.1072   LearningRate 0.0693   Epoch: 12   Global Step: 129680   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:41:16,462-Speed 5962.39 samples/sec   Loss 6.0653   LearningRate 0.0693   Epoch: 12   Global Step: 129690   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:41:23,329-Speed 5965.45 samples/sec   Loss 6.0508   LearningRate 0.0693   Epoch: 12   Global Step: 129700   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:41:30,204-Speed 5959.27 samples/sec   Loss 6.0293   LearningRate 0.0693   Epoch: 12   Global Step: 129710   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:41:37,076-Speed 5962.51 samples/sec   Loss 6.0545   LearningRate 0.0693   Epoch: 12   Global Step: 129720   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:41:43,958-Speed 5952.50 samples/sec   Loss 5.9972   LearningRate 0.0692   Epoch: 12   Global Step: 129730   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:41:50,820-Speed 5970.56 samples/sec   Loss 6.0764   LearningRate 0.0692   Epoch: 12   Global Step: 129740   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:41:57,690-Speed 5964.40 samples/sec   Loss 6.0606   LearningRate 0.0692   Epoch: 12   Global Step: 129750   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:42:04,530-Speed 5989.43 samples/sec   Loss 6.0010   LearningRate 0.0692   Epoch: 12   Global Step: 129760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:42:11,496-Speed 5881.45 samples/sec   Loss 6.0046   LearningRate 0.0692   Epoch: 12   Global Step: 129770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:42:18,404-Speed 5931.67 samples/sec   Loss 6.0608   LearningRate 0.0691   Epoch: 12   Global Step: 129780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:42:25,249-Speed 5984.46 samples/sec   Loss 6.0560   LearningRate 0.0691   Epoch: 12   Global Step: 129790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:42:32,115-Speed 5967.58 samples/sec   Loss 6.0295   LearningRate 0.0691   Epoch: 12   Global Step: 129800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:42:38,981-Speed 5966.33 samples/sec   Loss 5.9827   LearningRate 0.0691   Epoch: 12   Global Step: 129810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:42:45,839-Speed 5974.14 samples/sec   Loss 6.0544   LearningRate 0.0691   Epoch: 12   Global Step: 129820   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:42:52,689-Speed 5981.31 samples/sec   Loss 6.0420   LearningRate 0.0691   Epoch: 12   Global Step: 129830   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:42:59,577-Speed 5947.39 samples/sec   Loss 6.0539   LearningRate 0.0690   Epoch: 12   Global Step: 129840   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:43:06,542-Speed 5882.08 samples/sec   Loss 6.0673   LearningRate 0.0690   Epoch: 12   Global Step: 129850   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:43:13,402-Speed 5972.78 samples/sec   Loss 6.0489   LearningRate 0.0690   Epoch: 12   Global Step: 129860   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:43:20,271-Speed 5963.59 samples/sec   Loss 6.0150   LearningRate 0.0690   Epoch: 12   Global Step: 129870   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:43:27,124-Speed 5980.36 samples/sec   Loss 6.0652   LearningRate 0.0690   Epoch: 12   Global Step: 129880   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:43:33,994-Speed 5964.49 samples/sec   Loss 6.0646   LearningRate 0.0689   Epoch: 12   Global Step: 129890   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:43:40,957-Speed 5888.17 samples/sec   Loss 6.0563   LearningRate 0.0689   Epoch: 12   Global Step: 129900   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:43:47,833-Speed 5957.73 samples/sec   Loss 6.0397   LearningRate 0.0689   Epoch: 12   Global Step: 129910   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:43:54,685-Speed 5979.37 samples/sec   Loss 6.0926   LearningRate 0.0689   Epoch: 12   Global Step: 129920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:44:01,538-Speed 5978.34 samples/sec   Loss 6.1100   LearningRate 0.0689   Epoch: 12   Global Step: 129930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:44:08,387-Speed 5981.94 samples/sec   Loss 6.0410   LearningRate 0.0689   Epoch: 12   Global Step: 129940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:44:15,289-Speed 5935.17 samples/sec   Loss 6.0336   LearningRate 0.0688   Epoch: 12   Global Step: 129950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:44:22,220-Speed 5911.29 samples/sec   Loss 6.0813   LearningRate 0.0688   Epoch: 12   Global Step: 129960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:44:29,164-Speed 5898.68 samples/sec   Loss 6.0840   LearningRate 0.0688   Epoch: 12   Global Step: 129970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:44:36,092-Speed 5913.94 samples/sec   Loss 6.0218   LearningRate 0.0688   Epoch: 12   Global Step: 129980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:44:43,035-Speed 5900.24 samples/sec   Loss 5.9816   LearningRate 0.0688   Epoch: 12   Global Step: 129990   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:44:49,935-Speed 5937.46 samples/sec   Loss 6.0354   LearningRate 0.0688   Epoch: 12   Global Step: 130000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:45:16,864-[lfw][130000]XNorm: 23.952681
Training: 2022-01-08 21:45:16,865-[lfw][130000]Accuracy-Flip: 0.99767+-0.00300
Training: 2022-01-08 21:45:16,865-[lfw][130000]Accuracy-Highest: 0.99783
Training: 2022-01-08 21:45:48,165-[cfp_fp][130000]XNorm: 20.997199
Training: 2022-01-08 21:45:48,166-[cfp_fp][130000]Accuracy-Flip: 0.98586+-0.00627
Training: 2022-01-08 21:45:48,167-[cfp_fp][130000]Accuracy-Highest: 0.98586
Training: 2022-01-08 21:46:15,228-[agedb_30][130000]XNorm: 23.315981
Training: 2022-01-08 21:46:15,229-[agedb_30][130000]Accuracy-Flip: 0.97550+-0.00517
Training: 2022-01-08 21:46:15,229-[agedb_30][130000]Accuracy-Highest: 0.97550
Training: 2022-01-08 21:46:22,069-Speed 444.58 samples/sec   Loss 6.0551   LearningRate 0.0687   Epoch: 12   Global Step: 130010   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:46:28,919-Speed 5981.68 samples/sec   Loss 6.0376   LearningRate 0.0687   Epoch: 12   Global Step: 130020   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:46:35,771-Speed 5979.41 samples/sec   Loss 6.0375   LearningRate 0.0687   Epoch: 12   Global Step: 130030   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:46:42,721-Speed 5894.75 samples/sec   Loss 6.0127   LearningRate 0.0687   Epoch: 12   Global Step: 130040   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:46:49,591-Speed 5963.44 samples/sec   Loss 6.0564   LearningRate 0.0687   Epoch: 12   Global Step: 130050   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:46:56,460-Speed 5964.45 samples/sec   Loss 6.0611   LearningRate 0.0686   Epoch: 12   Global Step: 130060   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:47:03,352-Speed 5944.50 samples/sec   Loss 5.9903   LearningRate 0.0686   Epoch: 12   Global Step: 130070   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:47:10,204-Speed 5978.57 samples/sec   Loss 6.0034   LearningRate 0.0686   Epoch: 12   Global Step: 130080   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:47:17,060-Speed 5975.91 samples/sec   Loss 6.0720   LearningRate 0.0686   Epoch: 12   Global Step: 130090   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:47:23,914-Speed 5976.57 samples/sec   Loss 6.0584   LearningRate 0.0686   Epoch: 12   Global Step: 130100   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:47:30,765-Speed 5980.35 samples/sec   Loss 6.0045   LearningRate 0.0686   Epoch: 12   Global Step: 130110   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:47:37,619-Speed 5976.56 samples/sec   Loss 6.0416   LearningRate 0.0685   Epoch: 12   Global Step: 130120   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:47:44,476-Speed 5974.58 samples/sec   Loss 6.0649   LearningRate 0.0685   Epoch: 12   Global Step: 130130   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:47:51,358-Speed 5952.70 samples/sec   Loss 6.0520   LearningRate 0.0685   Epoch: 12   Global Step: 130140   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:47:58,199-Speed 5988.25 samples/sec   Loss 6.0615   LearningRate 0.0685   Epoch: 12   Global Step: 130150   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:48:05,057-Speed 5973.90 samples/sec   Loss 5.9790   LearningRate 0.0685   Epoch: 12   Global Step: 130160   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:48:11,920-Speed 5969.74 samples/sec   Loss 5.9935   LearningRate 0.0685   Epoch: 12   Global Step: 130170   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:48:18,756-Speed 5992.40 samples/sec   Loss 6.0491   LearningRate 0.0684   Epoch: 12   Global Step: 130180   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:48:25,611-Speed 5976.64 samples/sec   Loss 6.0684   LearningRate 0.0684   Epoch: 12   Global Step: 130190   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:48:32,454-Speed 5986.74 samples/sec   Loss 5.9730   LearningRate 0.0684   Epoch: 12   Global Step: 130200   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:48:39,297-Speed 5986.34 samples/sec   Loss 5.9669   LearningRate 0.0684   Epoch: 12   Global Step: 130210   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:48:46,167-Speed 5963.94 samples/sec   Loss 6.0286   LearningRate 0.0684   Epoch: 12   Global Step: 130220   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:48:53,017-Speed 5980.31 samples/sec   Loss 5.9758   LearningRate 0.0683   Epoch: 12   Global Step: 130230   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-08 21:48:59,872-Speed 5976.43 samples/sec   Loss 5.9311   LearningRate 0.0683   Epoch: 12   Global Step: 130240   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:49:06,723-Speed 5979.98 samples/sec   Loss 5.9620   LearningRate 0.0683   Epoch: 12   Global Step: 130250   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:49:13,567-Speed 5985.96 samples/sec   Loss 6.0585   LearningRate 0.0683   Epoch: 12   Global Step: 130260   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:49:20,412-Speed 5985.15 samples/sec   Loss 6.0145   LearningRate 0.0683   Epoch: 12   Global Step: 130270   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:49:27,271-Speed 5972.77 samples/sec   Loss 5.9774   LearningRate 0.0683   Epoch: 12   Global Step: 130280   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 21:49:34,140-Speed 5964.36 samples/sec   Loss 6.0215   LearningRate 0.0682   Epoch: 12   Global Step: 130290   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 21:49:40,986-Speed 5984.04 samples/sec   Loss 6.0002   LearningRate 0.0682   Epoch: 12   Global Step: 130300   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 21:49:47,839-Speed 5977.75 samples/sec   Loss 6.0227   LearningRate 0.0682   Epoch: 12   Global Step: 130310   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 21:49:54,682-Speed 5986.79 samples/sec   Loss 6.0180   LearningRate 0.0682   Epoch: 12   Global Step: 130320   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 21:50:01,550-Speed 5965.03 samples/sec   Loss 6.0380   LearningRate 0.0682   Epoch: 12   Global Step: 130330   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 21:50:08,405-Speed 5976.40 samples/sec   Loss 5.9602   LearningRate 0.0682   Epoch: 12   Global Step: 130340   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 21:50:15,267-Speed 5970.20 samples/sec   Loss 6.0890   LearningRate 0.0681   Epoch: 12   Global Step: 130350   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 21:50:22,131-Speed 5969.67 samples/sec   Loss 6.0021   LearningRate 0.0681   Epoch: 12   Global Step: 130360   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 21:50:28,969-Speed 5990.61 samples/sec   Loss 5.9874   LearningRate 0.0681   Epoch: 12   Global Step: 130370   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 21:50:35,815-Speed 5984.03 samples/sec   Loss 6.0047   LearningRate 0.0681   Epoch: 12   Global Step: 130380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:50:42,660-Speed 5985.18 samples/sec   Loss 5.9444   LearningRate 0.0681   Epoch: 12   Global Step: 130390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:50:49,507-Speed 5982.54 samples/sec   Loss 6.0496   LearningRate 0.0680   Epoch: 12   Global Step: 130400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:50:56,384-Speed 5960.05 samples/sec   Loss 6.0535   LearningRate 0.0680   Epoch: 12   Global Step: 130410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:51:03,244-Speed 5973.42 samples/sec   Loss 6.0015   LearningRate 0.0680   Epoch: 12   Global Step: 130420   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:51:10,111-Speed 5965.91 samples/sec   Loss 6.0173   LearningRate 0.0680   Epoch: 12   Global Step: 130430   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:51:16,961-Speed 5980.28 samples/sec   Loss 6.0537   LearningRate 0.0680   Epoch: 12   Global Step: 130440   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:51:23,805-Speed 5986.54 samples/sec   Loss 6.0080   LearningRate 0.0680   Epoch: 12   Global Step: 130450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:51:30,672-Speed 5965.92 samples/sec   Loss 6.0052   LearningRate 0.0679   Epoch: 12   Global Step: 130460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:51:37,525-Speed 5977.44 samples/sec   Loss 5.9776   LearningRate 0.0679   Epoch: 12   Global Step: 130470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:51:44,382-Speed 5974.76 samples/sec   Loss 6.0004   LearningRate 0.0679   Epoch: 12   Global Step: 130480   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:51:51,238-Speed 5978.45 samples/sec   Loss 5.9885   LearningRate 0.0679   Epoch: 12   Global Step: 130490   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:51:58,099-Speed 5970.94 samples/sec   Loss 6.0193   LearningRate 0.0679   Epoch: 12   Global Step: 130500   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:52:04,978-Speed 5956.88 samples/sec   Loss 5.9428   LearningRate 0.0679   Epoch: 12   Global Step: 130510   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:52:11,835-Speed 5978.07 samples/sec   Loss 5.9932   LearningRate 0.0678   Epoch: 12   Global Step: 130520   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:52:18,709-Speed 5959.32 samples/sec   Loss 5.9973   LearningRate 0.0678   Epoch: 12   Global Step: 130530   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:52:25,562-Speed 5978.04 samples/sec   Loss 6.0326   LearningRate 0.0678   Epoch: 12   Global Step: 130540   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:52:32,421-Speed 5973.68 samples/sec   Loss 5.9182   LearningRate 0.0678   Epoch: 12   Global Step: 130550   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:52:39,268-Speed 5983.14 samples/sec   Loss 6.0024   LearningRate 0.0678   Epoch: 12   Global Step: 130560   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:52:46,115-Speed 5982.52 samples/sec   Loss 5.9854   LearningRate 0.0677   Epoch: 12   Global Step: 130570   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:52:52,957-Speed 5988.01 samples/sec   Loss 5.9547   LearningRate 0.0677   Epoch: 12   Global Step: 130580   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:52:59,909-Speed 5893.13 samples/sec   Loss 5.9757   LearningRate 0.0677   Epoch: 12   Global Step: 130590   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:53:06,765-Speed 5976.93 samples/sec   Loss 5.9555   LearningRate 0.0677   Epoch: 12   Global Step: 130600   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:53:13,624-Speed 5973.81 samples/sec   Loss 5.9909   LearningRate 0.0677   Epoch: 12   Global Step: 130610   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:53:20,579-Speed 5889.92 samples/sec   Loss 5.9998   LearningRate 0.0677   Epoch: 12   Global Step: 130620   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:53:27,438-Speed 5972.70 samples/sec   Loss 6.0560   LearningRate 0.0676   Epoch: 12   Global Step: 130630   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:53:34,301-Speed 5972.33 samples/sec   Loss 5.9485   LearningRate 0.0676   Epoch: 12   Global Step: 130640   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:53:41,149-Speed 5982.06 samples/sec   Loss 6.0106   LearningRate 0.0676   Epoch: 12   Global Step: 130650   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:53:47,990-Speed 5991.52 samples/sec   Loss 5.9828   LearningRate 0.0676   Epoch: 12   Global Step: 130660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:53:54,837-Speed 5983.06 samples/sec   Loss 5.9750   LearningRate 0.0676   Epoch: 12   Global Step: 130670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:54:01,676-Speed 5989.41 samples/sec   Loss 5.9685   LearningRate 0.0676   Epoch: 12   Global Step: 130680   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:54:08,522-Speed 5984.75 samples/sec   Loss 5.9087   LearningRate 0.0675   Epoch: 12   Global Step: 130690   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:54:15,423-Speed 5936.73 samples/sec   Loss 6.0106   LearningRate 0.0675   Epoch: 12   Global Step: 130700   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:54:22,326-Speed 5934.65 samples/sec   Loss 5.9615   LearningRate 0.0675   Epoch: 12   Global Step: 130710   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:54:29,224-Speed 5939.14 samples/sec   Loss 5.9938   LearningRate 0.0675   Epoch: 12   Global Step: 130720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:54:36,119-Speed 5941.98 samples/sec   Loss 6.0079   LearningRate 0.0675   Epoch: 12   Global Step: 130730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:54:42,965-Speed 5983.75 samples/sec   Loss 6.0394   LearningRate 0.0674   Epoch: 12   Global Step: 130740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:54:49,823-Speed 5973.46 samples/sec   Loss 6.0111   LearningRate 0.0674   Epoch: 12   Global Step: 130750   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:54:56,708-Speed 5950.73 samples/sec   Loss 5.9673   LearningRate 0.0674   Epoch: 12   Global Step: 130760   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:55:03,560-Speed 5979.14 samples/sec   Loss 5.9937   LearningRate 0.0674   Epoch: 12   Global Step: 130770   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:55:10,410-Speed 5980.62 samples/sec   Loss 6.0163   LearningRate 0.0674   Epoch: 12   Global Step: 130780   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:55:17,279-Speed 5964.89 samples/sec   Loss 6.0111   LearningRate 0.0674   Epoch: 12   Global Step: 130790   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:55:24,133-Speed 5977.01 samples/sec   Loss 6.0240   LearningRate 0.0673   Epoch: 12   Global Step: 130800   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:55:30,987-Speed 5977.53 samples/sec   Loss 5.9972   LearningRate 0.0673   Epoch: 12   Global Step: 130810   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:55:37,838-Speed 5980.01 samples/sec   Loss 5.9850   LearningRate 0.0673   Epoch: 12   Global Step: 130820   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:55:44,688-Speed 5980.42 samples/sec   Loss 5.9443   LearningRate 0.0673   Epoch: 12   Global Step: 130830   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:55:51,527-Speed 5990.19 samples/sec   Loss 5.9771   LearningRate 0.0673   Epoch: 12   Global Step: 130840   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:55:58,381-Speed 5977.10 samples/sec   Loss 5.9778   LearningRate 0.0673   Epoch: 12   Global Step: 130850   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-01-08 21:56:05,238-Speed 5974.56 samples/sec   Loss 5.9539   LearningRate 0.0672   Epoch: 12   Global Step: 130860   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:56:12,092-Speed 5977.79 samples/sec   Loss 5.9347   LearningRate 0.0672   Epoch: 12   Global Step: 130870   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:56:18,938-Speed 5985.10 samples/sec   Loss 5.9639   LearningRate 0.0672   Epoch: 12   Global Step: 130880   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:56:25,807-Speed 5963.69 samples/sec   Loss 5.9084   LearningRate 0.0672   Epoch: 12   Global Step: 130890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:56:32,675-Speed 5965.38 samples/sec   Loss 6.0023   LearningRate 0.0672   Epoch: 12   Global Step: 130900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:56:39,543-Speed 5964.75 samples/sec   Loss 6.0281   LearningRate 0.0671   Epoch: 12   Global Step: 130910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:56:46,414-Speed 5962.25 samples/sec   Loss 5.9485   LearningRate 0.0671   Epoch: 12   Global Step: 130920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:56:53,266-Speed 5979.00 samples/sec   Loss 6.0080   LearningRate 0.0671   Epoch: 12   Global Step: 130930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:57:00,167-Speed 5937.51 samples/sec   Loss 5.9785   LearningRate 0.0671   Epoch: 12   Global Step: 130940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:57:07,011-Speed 5985.37 samples/sec   Loss 5.9502   LearningRate 0.0671   Epoch: 12   Global Step: 130950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:57:13,858-Speed 5983.84 samples/sec   Loss 5.9679   LearningRate 0.0671   Epoch: 12   Global Step: 130960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:57:20,722-Speed 5968.33 samples/sec   Loss 5.9868   LearningRate 0.0670   Epoch: 12   Global Step: 130970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:57:27,564-Speed 5987.42 samples/sec   Loss 5.9334   LearningRate 0.0670   Epoch: 12   Global Step: 130980   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:57:34,411-Speed 5983.60 samples/sec   Loss 5.9560   LearningRate 0.0670   Epoch: 12   Global Step: 130990   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:57:41,267-Speed 5975.35 samples/sec   Loss 5.9532   LearningRate 0.0670   Epoch: 12   Global Step: 131000   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:57:48,134-Speed 5965.11 samples/sec   Loss 6.0117   LearningRate 0.0670   Epoch: 12   Global Step: 131010   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:57:55,000-Speed 5967.22 samples/sec   Loss 5.9958   LearningRate 0.0670   Epoch: 12   Global Step: 131020   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:58:01,869-Speed 5964.89 samples/sec   Loss 5.9851   LearningRate 0.0669   Epoch: 12   Global Step: 131030   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:58:08,719-Speed 5980.50 samples/sec   Loss 6.0032   LearningRate 0.0669   Epoch: 12   Global Step: 131040   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:58:15,587-Speed 5965.43 samples/sec   Loss 5.9566   LearningRate 0.0669   Epoch: 12   Global Step: 131050   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:58:22,456-Speed 5963.97 samples/sec   Loss 5.9031   LearningRate 0.0669   Epoch: 12   Global Step: 131060   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:58:29,303-Speed 5983.07 samples/sec   Loss 5.9718   LearningRate 0.0669   Epoch: 12   Global Step: 131070   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:58:36,171-Speed 5965.57 samples/sec   Loss 6.0043   LearningRate 0.0668   Epoch: 12   Global Step: 131080   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:58:43,031-Speed 5972.30 samples/sec   Loss 5.9564   LearningRate 0.0668   Epoch: 12   Global Step: 131090   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:58:49,896-Speed 5967.72 samples/sec   Loss 5.9278   LearningRate 0.0668   Epoch: 12   Global Step: 131100   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:58:56,762-Speed 5966.52 samples/sec   Loss 5.9626   LearningRate 0.0668   Epoch: 12   Global Step: 131110   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:59:03,621-Speed 5973.29 samples/sec   Loss 5.9646   LearningRate 0.0668   Epoch: 12   Global Step: 131120   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:59:10,475-Speed 5976.81 samples/sec   Loss 5.9246   LearningRate 0.0668   Epoch: 12   Global Step: 131130   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:59:17,334-Speed 5974.98 samples/sec   Loss 5.8884   LearningRate 0.0667   Epoch: 12   Global Step: 131140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:59:24,201-Speed 5968.35 samples/sec   Loss 5.9672   LearningRate 0.0667   Epoch: 12   Global Step: 131150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:59:31,096-Speed 5941.29 samples/sec   Loss 5.9962   LearningRate 0.0667   Epoch: 12   Global Step: 131160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:59:37,940-Speed 5985.87 samples/sec   Loss 5.9475   LearningRate 0.0667   Epoch: 12   Global Step: 131170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 21:59:44,816-Speed 5958.79 samples/sec   Loss 5.9804   LearningRate 0.0667   Epoch: 12   Global Step: 131180   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:59:51,679-Speed 5968.49 samples/sec   Loss 5.9748   LearningRate 0.0667   Epoch: 12   Global Step: 131190   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 21:59:58,516-Speed 5993.13 samples/sec   Loss 5.9891   LearningRate 0.0666   Epoch: 12   Global Step: 131200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:00:05,382-Speed 5965.96 samples/sec   Loss 5.9255   LearningRate 0.0666   Epoch: 12   Global Step: 131210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:00:12,235-Speed 5978.03 samples/sec   Loss 5.9479   LearningRate 0.0666   Epoch: 12   Global Step: 131220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:00:19,092-Speed 5974.28 samples/sec   Loss 5.9145   LearningRate 0.0666   Epoch: 12   Global Step: 131230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:00:25,943-Speed 5980.62 samples/sec   Loss 6.0010   LearningRate 0.0666   Epoch: 12   Global Step: 131240   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:00:32,802-Speed 5972.32 samples/sec   Loss 5.9777   LearningRate 0.0666   Epoch: 12   Global Step: 131250   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:00:39,679-Speed 5957.21 samples/sec   Loss 5.9275   LearningRate 0.0665   Epoch: 12   Global Step: 131260   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:00:46,530-Speed 5981.81 samples/sec   Loss 5.9785   LearningRate 0.0665   Epoch: 12   Global Step: 131270   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:00:53,403-Speed 5960.85 samples/sec   Loss 5.9607   LearningRate 0.0665   Epoch: 12   Global Step: 131280   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:01:00,253-Speed 5980.80 samples/sec   Loss 5.9404   LearningRate 0.0665   Epoch: 12   Global Step: 131290   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:01:07,102-Speed 5981.06 samples/sec   Loss 5.9370   LearningRate 0.0665   Epoch: 12   Global Step: 131300   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:01:13,984-Speed 5952.85 samples/sec   Loss 5.9683   LearningRate 0.0664   Epoch: 12   Global Step: 131310   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:01:20,852-Speed 5965.05 samples/sec   Loss 5.9243   LearningRate 0.0664   Epoch: 12   Global Step: 131320   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:01:27,706-Speed 5979.08 samples/sec   Loss 5.9623   LearningRate 0.0664   Epoch: 12   Global Step: 131330   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:01:34,552-Speed 5983.96 samples/sec   Loss 5.9803   LearningRate 0.0664   Epoch: 12   Global Step: 131340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:01:41,398-Speed 5984.25 samples/sec   Loss 5.9204   LearningRate 0.0664   Epoch: 12   Global Step: 131350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:01:48,251-Speed 5977.84 samples/sec   Loss 5.9091   LearningRate 0.0664   Epoch: 12   Global Step: 131360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:01:55,107-Speed 5975.05 samples/sec   Loss 5.9559   LearningRate 0.0663   Epoch: 12   Global Step: 131370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:02:01,978-Speed 5962.33 samples/sec   Loss 5.9415   LearningRate 0.0663   Epoch: 12   Global Step: 131380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:02:08,830-Speed 5979.40 samples/sec   Loss 5.9463   LearningRate 0.0663   Epoch: 12   Global Step: 131390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:02:15,700-Speed 5962.76 samples/sec   Loss 5.9212   LearningRate 0.0663   Epoch: 12   Global Step: 131400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:02:22,568-Speed 5964.72 samples/sec   Loss 5.9871   LearningRate 0.0663   Epoch: 12   Global Step: 131410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:02:29,435-Speed 5966.62 samples/sec   Loss 5.9488   LearningRate 0.0663   Epoch: 12   Global Step: 131420   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 22:02:36,309-Speed 5959.72 samples/sec   Loss 5.9240   LearningRate 0.0662   Epoch: 12   Global Step: 131430   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 22:02:43,179-Speed 5963.71 samples/sec   Loss 5.9170   LearningRate 0.0662   Epoch: 12   Global Step: 131440   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 22:02:50,057-Speed 5956.42 samples/sec   Loss 5.9160   LearningRate 0.0662   Epoch: 12   Global Step: 131450   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 22:02:56,907-Speed 5980.33 samples/sec   Loss 5.9469   LearningRate 0.0662   Epoch: 12   Global Step: 131460   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 22:03:03,810-Speed 5935.02 samples/sec   Loss 5.9348   LearningRate 0.0662   Epoch: 12   Global Step: 131470   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 22:03:10,763-Speed 5893.27 samples/sec   Loss 5.9038   LearningRate 0.0661   Epoch: 12   Global Step: 131480   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 22:03:17,611-Speed 5982.69 samples/sec   Loss 5.9618   LearningRate 0.0661   Epoch: 12   Global Step: 131490   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 22:03:24,449-Speed 5990.81 samples/sec   Loss 5.9170   LearningRate 0.0661   Epoch: 12   Global Step: 131500   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 22:03:31,297-Speed 5982.95 samples/sec   Loss 5.9791   LearningRate 0.0661   Epoch: 12   Global Step: 131510   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 22:03:38,168-Speed 5961.62 samples/sec   Loss 5.8598   LearningRate 0.0661   Epoch: 12   Global Step: 131520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:03:45,013-Speed 5985.29 samples/sec   Loss 5.9357   LearningRate 0.0661   Epoch: 12   Global Step: 131530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:03:51,880-Speed 5969.51 samples/sec   Loss 5.9062   LearningRate 0.0660   Epoch: 12   Global Step: 131540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:03:58,719-Speed 5989.78 samples/sec   Loss 5.9242   LearningRate 0.0660   Epoch: 12   Global Step: 131550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:04:05,594-Speed 5959.34 samples/sec   Loss 5.8832   LearningRate 0.0660   Epoch: 12   Global Step: 131560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:04:12,449-Speed 5976.65 samples/sec   Loss 5.9246   LearningRate 0.0660   Epoch: 12   Global Step: 131570   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:04:19,291-Speed 5987.46 samples/sec   Loss 5.9682   LearningRate 0.0660   Epoch: 12   Global Step: 131580   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:04:26,138-Speed 5982.76 samples/sec   Loss 5.8882   LearningRate 0.0660   Epoch: 12   Global Step: 131590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:04:33,008-Speed 5963.38 samples/sec   Loss 5.9506   LearningRate 0.0659   Epoch: 12   Global Step: 131600   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:04:39,867-Speed 5973.02 samples/sec   Loss 5.9420   LearningRate 0.0659   Epoch: 12   Global Step: 131610   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:04:46,728-Speed 5971.17 samples/sec   Loss 5.9007   LearningRate 0.0659   Epoch: 12   Global Step: 131620   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:04:53,573-Speed 5985.68 samples/sec   Loss 5.9292   LearningRate 0.0659   Epoch: 12   Global Step: 131630   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:05:00,430-Speed 5973.86 samples/sec   Loss 5.8682   LearningRate 0.0659   Epoch: 12   Global Step: 131640   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:05:07,296-Speed 5967.40 samples/sec   Loss 5.9244   LearningRate 0.0659   Epoch: 12   Global Step: 131650   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:05:14,167-Speed 5964.80 samples/sec   Loss 5.9434   LearningRate 0.0658   Epoch: 12   Global Step: 131660   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:05:21,023-Speed 5974.94 samples/sec   Loss 5.9710   LearningRate 0.0658   Epoch: 12   Global Step: 131670   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:05:27,935-Speed 5927.74 samples/sec   Loss 5.9003   LearningRate 0.0658   Epoch: 12   Global Step: 131680   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:05:34,804-Speed 5965.40 samples/sec   Loss 5.9196   LearningRate 0.0658   Epoch: 12   Global Step: 131690   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:05:41,673-Speed 5964.04 samples/sec   Loss 5.9327   LearningRate 0.0658   Epoch: 12   Global Step: 131700   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:05:48,541-Speed 5964.65 samples/sec   Loss 5.8850   LearningRate 0.0657   Epoch: 12   Global Step: 131710   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:05:55,400-Speed 5973.68 samples/sec   Loss 5.9212   LearningRate 0.0657   Epoch: 12   Global Step: 131720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:06:02,261-Speed 5971.18 samples/sec   Loss 5.8642   LearningRate 0.0657   Epoch: 12   Global Step: 131730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:06:09,098-Speed 5992.02 samples/sec   Loss 5.8763   LearningRate 0.0657   Epoch: 12   Global Step: 131740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:06:15,946-Speed 5982.76 samples/sec   Loss 5.9167   LearningRate 0.0657   Epoch: 12   Global Step: 131750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:06:22,796-Speed 5982.08 samples/sec   Loss 5.8679   LearningRate 0.0657   Epoch: 12   Global Step: 131760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:06:29,641-Speed 5985.35 samples/sec   Loss 5.9016   LearningRate 0.0656   Epoch: 12   Global Step: 131770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:06:36,495-Speed 5977.88 samples/sec   Loss 5.8542   LearningRate 0.0656   Epoch: 12   Global Step: 131780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:06:43,352-Speed 5974.87 samples/sec   Loss 5.9196   LearningRate 0.0656   Epoch: 12   Global Step: 131790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:06:50,215-Speed 5969.62 samples/sec   Loss 5.9184   LearningRate 0.0656   Epoch: 12   Global Step: 131800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:06:57,074-Speed 5973.17 samples/sec   Loss 5.8754   LearningRate 0.0656   Epoch: 12   Global Step: 131810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:07:03,918-Speed 5986.11 samples/sec   Loss 5.9256   LearningRate 0.0656   Epoch: 12   Global Step: 131820   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:07:10,778-Speed 5972.22 samples/sec   Loss 5.9092   LearningRate 0.0655   Epoch: 12   Global Step: 131830   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:07:17,658-Speed 5956.80 samples/sec   Loss 5.9049   LearningRate 0.0655   Epoch: 12   Global Step: 131840   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:07:24,502-Speed 5985.86 samples/sec   Loss 5.8698   LearningRate 0.0655   Epoch: 12   Global Step: 131850   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:07:31,347-Speed 5985.02 samples/sec   Loss 5.9043   LearningRate 0.0655   Epoch: 12   Global Step: 131860   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:07:38,216-Speed 5964.29 samples/sec   Loss 5.9176   LearningRate 0.0655   Epoch: 12   Global Step: 131870   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:07:45,062-Speed 5984.05 samples/sec   Loss 5.9477   LearningRate 0.0655   Epoch: 12   Global Step: 131880   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:07:51,916-Speed 5977.40 samples/sec   Loss 5.9495   LearningRate 0.0654   Epoch: 12   Global Step: 131890   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:07:58,761-Speed 5985.15 samples/sec   Loss 5.9263   LearningRate 0.0654   Epoch: 12   Global Step: 131900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:08:05,692-Speed 5910.23 samples/sec   Loss 5.9185   LearningRate 0.0654   Epoch: 12   Global Step: 131910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:08:12,556-Speed 5968.97 samples/sec   Loss 5.8916   LearningRate 0.0654   Epoch: 12   Global Step: 131920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:08:19,398-Speed 5988.17 samples/sec   Loss 5.9206   LearningRate 0.0654   Epoch: 12   Global Step: 131930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:08:26,250-Speed 5979.23 samples/sec   Loss 5.9001   LearningRate 0.0653   Epoch: 12   Global Step: 131940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:08:33,125-Speed 5958.94 samples/sec   Loss 5.8803   LearningRate 0.0653   Epoch: 12   Global Step: 131950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:08:39,982-Speed 5975.28 samples/sec   Loss 5.9107   LearningRate 0.0653   Epoch: 12   Global Step: 131960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:08:46,846-Speed 5968.08 samples/sec   Loss 5.9134   LearningRate 0.0653   Epoch: 12   Global Step: 131970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:08:53,824-Speed 5871.06 samples/sec   Loss 5.9009   LearningRate 0.0653   Epoch: 12   Global Step: 131980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:09:00,783-Speed 5889.02 samples/sec   Loss 5.8862   LearningRate 0.0653   Epoch: 12   Global Step: 131990   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:09:07,750-Speed 5879.89 samples/sec   Loss 5.9351   LearningRate 0.0652   Epoch: 12   Global Step: 132000   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:09:14,596-Speed 5985.84 samples/sec   Loss 5.8696   LearningRate 0.0652   Epoch: 12   Global Step: 132010   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:09:21,582-Speed 5864.48 samples/sec   Loss 5.8534   LearningRate 0.0652   Epoch: 12   Global Step: 132020   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:09:28,438-Speed 5975.87 samples/sec   Loss 5.9759   LearningRate 0.0652   Epoch: 12   Global Step: 132030   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:09:35,294-Speed 5975.87 samples/sec   Loss 5.9000   LearningRate 0.0652   Epoch: 12   Global Step: 132040   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:09:42,133-Speed 5990.35 samples/sec   Loss 5.9249   LearningRate 0.0652   Epoch: 12   Global Step: 132050   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:09:49,043-Speed 5928.17 samples/sec   Loss 5.8281   LearningRate 0.0651   Epoch: 12   Global Step: 132060   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:09:55,904-Speed 5970.99 samples/sec   Loss 5.9021   LearningRate 0.0651   Epoch: 12   Global Step: 132070   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:10:02,754-Speed 5980.79 samples/sec   Loss 5.8615   LearningRate 0.0651   Epoch: 12   Global Step: 132080   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:10:09,611-Speed 5975.69 samples/sec   Loss 5.9055   LearningRate 0.0651   Epoch: 12   Global Step: 132090   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:10:16,464-Speed 5978.70 samples/sec   Loss 5.8334   LearningRate 0.0651   Epoch: 12   Global Step: 132100   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:10:23,308-Speed 5985.98 samples/sec   Loss 5.8867   LearningRate 0.0651   Epoch: 12   Global Step: 132110   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:10:30,169-Speed 5970.30 samples/sec   Loss 5.8421   LearningRate 0.0650   Epoch: 12   Global Step: 132120   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:10:37,022-Speed 5978.76 samples/sec   Loss 5.8768   LearningRate 0.0650   Epoch: 12   Global Step: 132130   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:10:43,912-Speed 5945.70 samples/sec   Loss 5.9397   LearningRate 0.0650   Epoch: 12   Global Step: 132140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:10:50,759-Speed 5982.86 samples/sec   Loss 5.8454   LearningRate 0.0650   Epoch: 12   Global Step: 132150   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:10:57,641-Speed 5953.30 samples/sec   Loss 5.8704   LearningRate 0.0650   Epoch: 12   Global Step: 132160   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:11:04,496-Speed 5977.30 samples/sec   Loss 5.8850   LearningRate 0.0650   Epoch: 12   Global Step: 132170   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:11:11,338-Speed 5986.59 samples/sec   Loss 5.8851   LearningRate 0.0649   Epoch: 12   Global Step: 132180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:11:18,196-Speed 5974.69 samples/sec   Loss 5.9112   LearningRate 0.0649   Epoch: 12   Global Step: 132190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:11:25,042-Speed 5983.56 samples/sec   Loss 5.9128   LearningRate 0.0649   Epoch: 12   Global Step: 132200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:11:31,897-Speed 5976.61 samples/sec   Loss 5.9150   LearningRate 0.0649   Epoch: 12   Global Step: 132210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:11:38,907-Speed 5847.11 samples/sec   Loss 5.9222   LearningRate 0.0649   Epoch: 12   Global Step: 132220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:11:45,756-Speed 5981.89 samples/sec   Loss 5.8842   LearningRate 0.0648   Epoch: 12   Global Step: 132230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:11:52,602-Speed 5984.06 samples/sec   Loss 5.8664   LearningRate 0.0648   Epoch: 12   Global Step: 132240   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:11:59,453-Speed 5980.27 samples/sec   Loss 5.8863   LearningRate 0.0648   Epoch: 12   Global Step: 132250   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:12:06,316-Speed 5969.60 samples/sec   Loss 5.8526   LearningRate 0.0648   Epoch: 12   Global Step: 132260   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:12:13,194-Speed 5956.65 samples/sec   Loss 5.8913   LearningRate 0.0648   Epoch: 12   Global Step: 132270   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:12:20,050-Speed 5975.69 samples/sec   Loss 5.8430   LearningRate 0.0648   Epoch: 12   Global Step: 132280   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:12:26,922-Speed 5962.24 samples/sec   Loss 5.8501   LearningRate 0.0647   Epoch: 12   Global Step: 132290   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:12:33,786-Speed 5968.87 samples/sec   Loss 5.9265   LearningRate 0.0647   Epoch: 12   Global Step: 132300   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:12:40,656-Speed 5963.69 samples/sec   Loss 5.8449   LearningRate 0.0647   Epoch: 12   Global Step: 132310   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:12:47,527-Speed 5962.08 samples/sec   Loss 5.8737   LearningRate 0.0647   Epoch: 12   Global Step: 132320   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:12:54,381-Speed 5976.95 samples/sec   Loss 5.8614   LearningRate 0.0647   Epoch: 12   Global Step: 132330   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:13:01,261-Speed 5955.36 samples/sec   Loss 5.8248   LearningRate 0.0647   Epoch: 12   Global Step: 132340   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:13:08,141-Speed 5954.40 samples/sec   Loss 5.8089   LearningRate 0.0646   Epoch: 12   Global Step: 132350   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:13:15,044-Speed 5934.20 samples/sec   Loss 5.8354   LearningRate 0.0646   Epoch: 12   Global Step: 132360   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:13:21,945-Speed 5937.21 samples/sec   Loss 5.8894   LearningRate 0.0646   Epoch: 12   Global Step: 132370   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:13:28,792-Speed 5982.86 samples/sec   Loss 5.9118   LearningRate 0.0646   Epoch: 12   Global Step: 132380   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:13:35,642-Speed 5980.32 samples/sec   Loss 5.8615   LearningRate 0.0646   Epoch: 12   Global Step: 132390   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:13:42,548-Speed 5931.45 samples/sec   Loss 5.9112   LearningRate 0.0646   Epoch: 12   Global Step: 132400   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:13:49,403-Speed 5976.35 samples/sec   Loss 5.8688   LearningRate 0.0645   Epoch: 12   Global Step: 132410   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:13:56,255-Speed 5978.95 samples/sec   Loss 5.9046   LearningRate 0.0645   Epoch: 12   Global Step: 132420   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:14:03,122-Speed 5966.10 samples/sec   Loss 5.8705   LearningRate 0.0645   Epoch: 12   Global Step: 132430   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:14:09,973-Speed 5983.38 samples/sec   Loss 5.9004   LearningRate 0.0645   Epoch: 12   Global Step: 132440   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:14:16,824-Speed 5979.36 samples/sec   Loss 5.8529   LearningRate 0.0645   Epoch: 12   Global Step: 132450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:14:23,688-Speed 5969.16 samples/sec   Loss 5.8998   LearningRate 0.0645   Epoch: 12   Global Step: 132460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:14:30,530-Speed 5989.72 samples/sec   Loss 5.8579   LearningRate 0.0644   Epoch: 12   Global Step: 132470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:14:37,391-Speed 5970.35 samples/sec   Loss 5.8614   LearningRate 0.0644   Epoch: 12   Global Step: 132480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:14:44,278-Speed 5948.65 samples/sec   Loss 5.9066   LearningRate 0.0644   Epoch: 12   Global Step: 132490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:14:51,124-Speed 5983.94 samples/sec   Loss 5.8542   LearningRate 0.0644   Epoch: 12   Global Step: 132500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:14:57,994-Speed 5965.93 samples/sec   Loss 5.8228   LearningRate 0.0644   Epoch: 12   Global Step: 132510   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:15:04,863-Speed 5964.42 samples/sec   Loss 5.8905   LearningRate 0.0643   Epoch: 12   Global Step: 132520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:15:11,731-Speed 5964.90 samples/sec   Loss 5.8556   LearningRate 0.0643   Epoch: 12   Global Step: 132530   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:15:18,604-Speed 5961.15 samples/sec   Loss 5.8720   LearningRate 0.0643   Epoch: 12   Global Step: 132540   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:15:25,453-Speed 5981.02 samples/sec   Loss 5.9147   LearningRate 0.0643   Epoch: 12   Global Step: 132550   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:15:32,312-Speed 5972.96 samples/sec   Loss 5.9011   LearningRate 0.0643   Epoch: 12   Global Step: 132560   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:15:39,168-Speed 5975.52 samples/sec   Loss 5.8844   LearningRate 0.0643   Epoch: 12   Global Step: 132570   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:15:46,035-Speed 5966.08 samples/sec   Loss 5.8775   LearningRate 0.0642   Epoch: 12   Global Step: 132580   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:15:52,898-Speed 5968.94 samples/sec   Loss 5.8602   LearningRate 0.0642   Epoch: 12   Global Step: 132590   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:15:59,741-Speed 5986.92 samples/sec   Loss 5.9297   LearningRate 0.0642   Epoch: 12   Global Step: 132600   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:16:06,594-Speed 5978.68 samples/sec   Loss 5.8518   LearningRate 0.0642   Epoch: 12   Global Step: 132610   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:16:13,473-Speed 5957.53 samples/sec   Loss 5.8896   LearningRate 0.0642   Epoch: 12   Global Step: 132620   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:16:20,346-Speed 5961.10 samples/sec   Loss 5.8127   LearningRate 0.0642   Epoch: 12   Global Step: 132630   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:16:27,204-Speed 5974.01 samples/sec   Loss 5.7949   LearningRate 0.0641   Epoch: 12   Global Step: 132640   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:16:34,063-Speed 5972.56 samples/sec   Loss 5.8412   LearningRate 0.0641   Epoch: 12   Global Step: 132650   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:16:40,939-Speed 5958.26 samples/sec   Loss 5.8232   LearningRate 0.0641   Epoch: 12   Global Step: 132660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:16:47,804-Speed 5967.77 samples/sec   Loss 5.8279   LearningRate 0.0641   Epoch: 12   Global Step: 132670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:16:54,692-Speed 5948.87 samples/sec   Loss 5.8769   LearningRate 0.0641   Epoch: 12   Global Step: 132680   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:17:01,588-Speed 5941.25 samples/sec   Loss 5.8385   LearningRate 0.0641   Epoch: 12   Global Step: 132690   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:17:08,473-Speed 5950.09 samples/sec   Loss 5.8283   LearningRate 0.0640   Epoch: 12   Global Step: 132700   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:17:15,356-Speed 5952.54 samples/sec   Loss 5.8899   LearningRate 0.0640   Epoch: 12   Global Step: 132710   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:17:22,212-Speed 5974.31 samples/sec   Loss 5.8551   LearningRate 0.0640   Epoch: 12   Global Step: 132720   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:17:29,133-Speed 5919.58 samples/sec   Loss 5.8115   LearningRate 0.0640   Epoch: 12   Global Step: 132730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:17:35,986-Speed 5978.78 samples/sec   Loss 5.8335   LearningRate 0.0640   Epoch: 12   Global Step: 132740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:17:42,859-Speed 5960.56 samples/sec   Loss 5.8580   LearningRate 0.0640   Epoch: 12   Global Step: 132750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:17:49,721-Speed 5970.46 samples/sec   Loss 5.7859   LearningRate 0.0639   Epoch: 12   Global Step: 132760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:17:56,574-Speed 5978.11 samples/sec   Loss 5.9030   LearningRate 0.0639   Epoch: 12   Global Step: 132770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:18:03,437-Speed 5968.91 samples/sec   Loss 5.8146   LearningRate 0.0639   Epoch: 12   Global Step: 132780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:18:10,289-Speed 5979.34 samples/sec   Loss 5.8470   LearningRate 0.0639   Epoch: 12   Global Step: 132790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:18:17,142-Speed 5978.75 samples/sec   Loss 5.8425   LearningRate 0.0639   Epoch: 12   Global Step: 132800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:18:24,004-Speed 5970.10 samples/sec   Loss 5.8104   LearningRate 0.0639   Epoch: 12   Global Step: 132810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:18:30,871-Speed 5969.00 samples/sec   Loss 5.7914   LearningRate 0.0638   Epoch: 12   Global Step: 132820   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:18:37,726-Speed 5976.05 samples/sec   Loss 5.8847   LearningRate 0.0638   Epoch: 12   Global Step: 132830   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:18:44,580-Speed 5977.49 samples/sec   Loss 5.8466   LearningRate 0.0638   Epoch: 12   Global Step: 132840   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:18:51,442-Speed 5973.03 samples/sec   Loss 5.8478   LearningRate 0.0638   Epoch: 12   Global Step: 132850   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:18:58,298-Speed 5976.35 samples/sec   Loss 5.8468   LearningRate 0.0638   Epoch: 12   Global Step: 132860   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:19:05,218-Speed 5920.67 samples/sec   Loss 5.8148   LearningRate 0.0637   Epoch: 12   Global Step: 132870   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:19:12,080-Speed 5970.06 samples/sec   Loss 5.8491   LearningRate 0.0637   Epoch: 12   Global Step: 132880   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:19:18,982-Speed 5935.88 samples/sec   Loss 5.8454   LearningRate 0.0637   Epoch: 12   Global Step: 132890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:19:25,910-Speed 5913.36 samples/sec   Loss 5.8181   LearningRate 0.0637   Epoch: 12   Global Step: 132900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:19:32,826-Speed 5923.52 samples/sec   Loss 5.8461   LearningRate 0.0637   Epoch: 12   Global Step: 132910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:19:39,751-Speed 5916.55 samples/sec   Loss 5.7826   LearningRate 0.0637   Epoch: 12   Global Step: 132920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:19:46,670-Speed 5921.26 samples/sec   Loss 5.8657   LearningRate 0.0636   Epoch: 12   Global Step: 132930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:19:53,572-Speed 5935.31 samples/sec   Loss 5.8233   LearningRate 0.0636   Epoch: 12   Global Step: 132940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:20:00,483-Speed 5928.38 samples/sec   Loss 5.8009   LearningRate 0.0636   Epoch: 12   Global Step: 132950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:20:07,393-Speed 5928.14 samples/sec   Loss 5.8264   LearningRate 0.0636   Epoch: 12   Global Step: 132960   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:20:14,287-Speed 5942.19 samples/sec   Loss 5.8153   LearningRate 0.0636   Epoch: 12   Global Step: 132970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:20:21,191-Speed 5934.20 samples/sec   Loss 5.7953   LearningRate 0.0636   Epoch: 12   Global Step: 132980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:20:28,103-Speed 5927.08 samples/sec   Loss 5.8882   LearningRate 0.0635   Epoch: 12   Global Step: 132990   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:20:35,000-Speed 5940.09 samples/sec   Loss 5.8393   LearningRate 0.0635   Epoch: 12   Global Step: 133000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:20:41,855-Speed 5978.10 samples/sec   Loss 5.7937   LearningRate 0.0635   Epoch: 12   Global Step: 133010   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:20:48,726-Speed 5962.26 samples/sec   Loss 5.8395   LearningRate 0.0635   Epoch: 12   Global Step: 133020   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:20:55,566-Speed 5988.92 samples/sec   Loss 5.7850   LearningRate 0.0635   Epoch: 12   Global Step: 133030   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:21:02,440-Speed 5960.38 samples/sec   Loss 5.9088   LearningRate 0.0635   Epoch: 12   Global Step: 133040   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:21:09,303-Speed 5969.36 samples/sec   Loss 5.8287   LearningRate 0.0634   Epoch: 12   Global Step: 133050   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:21:16,147-Speed 5986.45 samples/sec   Loss 5.8318   LearningRate 0.0634   Epoch: 12   Global Step: 133060   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 22:21:23,027-Speed 5954.51 samples/sec   Loss 5.8448   LearningRate 0.0634   Epoch: 12   Global Step: 133070   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 22:21:29,902-Speed 5958.43 samples/sec   Loss 5.8007   LearningRate 0.0634   Epoch: 12   Global Step: 133080   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 22:21:36,771-Speed 5966.63 samples/sec   Loss 5.8252   LearningRate 0.0634   Epoch: 12   Global Step: 133090   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 22:21:43,627-Speed 5975.54 samples/sec   Loss 5.8090   LearningRate 0.0634   Epoch: 12   Global Step: 133100   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 22:21:50,482-Speed 5975.63 samples/sec   Loss 5.7774   LearningRate 0.0633   Epoch: 12   Global Step: 133110   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 22:21:57,340-Speed 5974.21 samples/sec   Loss 5.7974   LearningRate 0.0633   Epoch: 12   Global Step: 133120   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 22:22:04,182-Speed 5987.26 samples/sec   Loss 5.8221   LearningRate 0.0633   Epoch: 12   Global Step: 133130   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 22:22:11,036-Speed 5977.48 samples/sec   Loss 5.8278   LearningRate 0.0633   Epoch: 12   Global Step: 133140   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 22:22:17,880-Speed 5985.64 samples/sec   Loss 5.8221   LearningRate 0.0633   Epoch: 12   Global Step: 133150   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-01-08 22:22:24,731-Speed 5979.97 samples/sec   Loss 5.8433   LearningRate 0.0633   Epoch: 12   Global Step: 133160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:22:31,592-Speed 5971.10 samples/sec   Loss 5.7472   LearningRate 0.0632   Epoch: 12   Global Step: 133170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:22:38,462-Speed 5963.04 samples/sec   Loss 5.7815   LearningRate 0.0632   Epoch: 12   Global Step: 133180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:22:45,333-Speed 5962.98 samples/sec   Loss 5.8108   LearningRate 0.0632   Epoch: 12   Global Step: 133190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:22:52,194-Speed 5970.60 samples/sec   Loss 5.8043   LearningRate 0.0632   Epoch: 12   Global Step: 133200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:22:59,054-Speed 5972.41 samples/sec   Loss 5.8254   LearningRate 0.0632   Epoch: 12   Global Step: 133210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:23:05,903-Speed 5981.32 samples/sec   Loss 5.7882   LearningRate 0.0632   Epoch: 12   Global Step: 133220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:23:12,770-Speed 5965.90 samples/sec   Loss 5.8948   LearningRate 0.0631   Epoch: 12   Global Step: 133230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:23:19,628-Speed 5973.77 samples/sec   Loss 5.8036   LearningRate 0.0631   Epoch: 12   Global Step: 133240   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:23:26,500-Speed 5962.31 samples/sec   Loss 5.7430   LearningRate 0.0631   Epoch: 12   Global Step: 133250   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:23:33,402-Speed 5935.61 samples/sec   Loss 5.8066   LearningRate 0.0631   Epoch: 12   Global Step: 133260   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:23:40,256-Speed 5977.99 samples/sec   Loss 5.7419   LearningRate 0.0631   Epoch: 12   Global Step: 133270   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:23:47,159-Speed 5934.35 samples/sec   Loss 5.8120   LearningRate 0.0630   Epoch: 12   Global Step: 133280   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:23:54,062-Speed 5935.52 samples/sec   Loss 5.8181   LearningRate 0.0630   Epoch: 12   Global Step: 133290   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:24:00,965-Speed 5934.52 samples/sec   Loss 5.8225   LearningRate 0.0630   Epoch: 12   Global Step: 133300   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:24:07,884-Speed 5921.31 samples/sec   Loss 5.7920   LearningRate 0.0630   Epoch: 12   Global Step: 133310   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:24:14,784-Speed 5936.60 samples/sec   Loss 5.8128   LearningRate 0.0630   Epoch: 12   Global Step: 133320   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:24:21,653-Speed 5964.66 samples/sec   Loss 5.7510   LearningRate 0.0630   Epoch: 12   Global Step: 133330   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:24:28,520-Speed 5965.85 samples/sec   Loss 5.7811   LearningRate 0.0629   Epoch: 12   Global Step: 133340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:24:35,402-Speed 5952.49 samples/sec   Loss 5.7191   LearningRate 0.0629   Epoch: 12   Global Step: 133350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:24:42,271-Speed 5964.83 samples/sec   Loss 5.8054   LearningRate 0.0629   Epoch: 12   Global Step: 133360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:24:49,130-Speed 5972.19 samples/sec   Loss 5.7708   LearningRate 0.0629   Epoch: 12   Global Step: 133370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:24:55,995-Speed 5967.21 samples/sec   Loss 5.8827   LearningRate 0.0629   Epoch: 12   Global Step: 133380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-01-08 22:25:02,858-Speed 5972.66 samples/sec   Loss 5.7823   LearningRate 0.0629   Epoch: 12   Global Step: 133390   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:25:09,719-Speed 5972.41 samples/sec   Loss 5.8018   LearningRate 0.0628   Epoch: 12   Global Step: 133400   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:25:16,577-Speed 5973.70 samples/sec   Loss 5.7512   LearningRate 0.0628   Epoch: 12   Global Step: 133410   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-01-08 22:25:23,430-Speed 5978.22 samples/sec   Loss 5.7929   LearningRate 0.0628   Epoch: 12   Global Step: 133420   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:25:30,327-Speed 5944.94 samples/sec   Loss 5.8413   LearningRate 0.0628   Epoch: 12   Global Step: 133430   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:25:37,188-Speed 5970.99 samples/sec   Loss 5.7804   LearningRate 0.0628   Epoch: 12   Global Step: 133440   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:25:44,040-Speed 5978.78 samples/sec   Loss 5.7331   LearningRate 0.0628   Epoch: 12   Global Step: 133450   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:25:50,920-Speed 5956.81 samples/sec   Loss 5.7974   LearningRate 0.0627   Epoch: 12   Global Step: 133460   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:25:57,774-Speed 5976.87 samples/sec   Loss 5.8262   LearningRate 0.0627   Epoch: 12   Global Step: 133470   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:26:04,652-Speed 5956.77 samples/sec   Loss 5.8303   LearningRate 0.0627   Epoch: 12   Global Step: 133480   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:26:11,500-Speed 5982.22 samples/sec   Loss 5.7864   LearningRate 0.0627   Epoch: 12   Global Step: 133490   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:26:18,361-Speed 5971.48 samples/sec   Loss 5.8006   LearningRate 0.0627   Epoch: 12   Global Step: 133500   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:26:25,225-Speed 5968.31 samples/sec   Loss 5.8115   LearningRate 0.0627   Epoch: 12   Global Step: 133510   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:26:32,088-Speed 5978.21 samples/sec   Loss 5.7957   LearningRate 0.0626   Epoch: 12   Global Step: 133520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:26:38,964-Speed 5957.70 samples/sec   Loss 5.7792   LearningRate 0.0626   Epoch: 12   Global Step: 133530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:26:45,836-Speed 5961.48 samples/sec   Loss 5.8092   LearningRate 0.0626   Epoch: 12   Global Step: 133540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:26:52,716-Speed 5955.66 samples/sec   Loss 5.7443   LearningRate 0.0626   Epoch: 12   Global Step: 133550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:26:59,576-Speed 5972.10 samples/sec   Loss 5.7438   LearningRate 0.0626   Epoch: 12   Global Step: 133560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:27:06,436-Speed 5971.47 samples/sec   Loss 5.7660   LearningRate 0.0626   Epoch: 12   Global Step: 133570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:27:13,305-Speed 5969.53 samples/sec   Loss 5.7555   LearningRate 0.0625   Epoch: 12   Global Step: 133580   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:27:20,228-Speed 5917.89 samples/sec   Loss 5.7716   LearningRate 0.0625   Epoch: 12   Global Step: 133590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:27:27,089-Speed 5970.68 samples/sec   Loss 5.7575   LearningRate 0.0625   Epoch: 12   Global Step: 133600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:27:33,943-Speed 5977.41 samples/sec   Loss 5.7853   LearningRate 0.0625   Epoch: 12   Global Step: 133610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:27:40,791-Speed 5982.92 samples/sec   Loss 5.8169   LearningRate 0.0625   Epoch: 12   Global Step: 133620   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:27:47,643-Speed 5978.99 samples/sec   Loss 5.8307   LearningRate 0.0625   Epoch: 12   Global Step: 133630   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:27:54,498-Speed 5978.58 samples/sec   Loss 5.7664   LearningRate 0.0624   Epoch: 12   Global Step: 133640   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:28:01,361-Speed 5969.14 samples/sec   Loss 5.8118   LearningRate 0.0624   Epoch: 12   Global Step: 133650   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:28:08,224-Speed 5969.73 samples/sec   Loss 5.7787   LearningRate 0.0624   Epoch: 12   Global Step: 133660   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:28:15,079-Speed 5976.24 samples/sec   Loss 5.7442   LearningRate 0.0624   Epoch: 12   Global Step: 133670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:28:21,929-Speed 5980.94 samples/sec   Loss 5.7563   LearningRate 0.0624   Epoch: 12   Global Step: 133680   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:28:28,789-Speed 5971.80 samples/sec   Loss 5.7803   LearningRate 0.0624   Epoch: 12   Global Step: 133690   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:28:35,653-Speed 5970.24 samples/sec   Loss 5.7767   LearningRate 0.0623   Epoch: 12   Global Step: 133700   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:28:42,510-Speed 5974.44 samples/sec   Loss 5.7434   LearningRate 0.0623   Epoch: 12   Global Step: 133710   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:28:49,373-Speed 5969.30 samples/sec   Loss 5.7679   LearningRate 0.0623   Epoch: 12   Global Step: 133720   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:28:56,223-Speed 5981.15 samples/sec   Loss 5.8332   LearningRate 0.0623   Epoch: 12   Global Step: 133730   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:29:03,070-Speed 5982.40 samples/sec   Loss 5.7617   LearningRate 0.0623   Epoch: 12   Global Step: 133740   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:29:09,933-Speed 5969.49 samples/sec   Loss 5.7544   LearningRate 0.0623   Epoch: 12   Global Step: 133750   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:29:16,810-Speed 5957.46 samples/sec   Loss 5.8081   LearningRate 0.0622   Epoch: 12   Global Step: 133760   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:29:23,676-Speed 5966.84 samples/sec   Loss 5.7996   LearningRate 0.0622   Epoch: 12   Global Step: 133770   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:29:30,512-Speed 5993.06 samples/sec   Loss 5.7854   LearningRate 0.0622   Epoch: 12   Global Step: 133780   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:29:37,369-Speed 5974.09 samples/sec   Loss 5.7662   LearningRate 0.0622   Epoch: 12   Global Step: 133790   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:29:44,221-Speed 5978.82 samples/sec   Loss 5.7784   LearningRate 0.0622   Epoch: 12   Global Step: 133800   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:29:51,084-Speed 5969.79 samples/sec   Loss 5.7752   LearningRate 0.0622   Epoch: 12   Global Step: 133810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:29:58,001-Speed 5922.86 samples/sec   Loss 5.8072   LearningRate 0.0621   Epoch: 12   Global Step: 133820   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:30:04,917-Speed 5923.31 samples/sec   Loss 5.7149   LearningRate 0.0621   Epoch: 12   Global Step: 133830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:30:11,879-Speed 5884.52 samples/sec   Loss 5.7191   LearningRate 0.0621   Epoch: 12   Global Step: 133840   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:30:18,866-Speed 5864.08 samples/sec   Loss 5.7765   LearningRate 0.0621   Epoch: 12   Global Step: 133850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:30:25,816-Speed 5894.39 samples/sec   Loss 5.7401   LearningRate 0.0621   Epoch: 12   Global Step: 133860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:30:32,768-Speed 5893.12 samples/sec   Loss 5.7624   LearningRate 0.0620   Epoch: 12   Global Step: 133870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:30:39,623-Speed 5976.94 samples/sec   Loss 5.7726   LearningRate 0.0620   Epoch: 12   Global Step: 133880   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:30:46,497-Speed 5959.80 samples/sec   Loss 5.7681   LearningRate 0.0620   Epoch: 12   Global Step: 133890   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:30:53,347-Speed 5981.08 samples/sec   Loss 5.7700   LearningRate 0.0620   Epoch: 12   Global Step: 133900   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:31:00,186-Speed 5990.53 samples/sec   Loss 5.7645   LearningRate 0.0620   Epoch: 12   Global Step: 133910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:31:07,555-Speed 5558.84 samples/sec   Loss 5.7512   LearningRate 0.0620   Epoch: 12   Global Step: 133920   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:31:14,408-Speed 5978.20 samples/sec   Loss 5.7260   LearningRate 0.0619   Epoch: 12   Global Step: 133930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:31:21,273-Speed 5967.84 samples/sec   Loss 5.8201   LearningRate 0.0619   Epoch: 12   Global Step: 133940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:31:28,151-Speed 5956.25 samples/sec   Loss 5.7931   LearningRate 0.0619   Epoch: 12   Global Step: 133950   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:31:35,017-Speed 5966.74 samples/sec   Loss 5.7401   LearningRate 0.0619   Epoch: 12   Global Step: 133960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:31:41,865-Speed 5982.73 samples/sec   Loss 5.7687   LearningRate 0.0619   Epoch: 12   Global Step: 133970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:31:48,721-Speed 5975.24 samples/sec   Loss 5.7459   LearningRate 0.0619   Epoch: 12   Global Step: 133980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:31:55,579-Speed 5973.88 samples/sec   Loss 5.7094   LearningRate 0.0618   Epoch: 12   Global Step: 133990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:32:02,425-Speed 5983.99 samples/sec   Loss 5.7636   LearningRate 0.0618   Epoch: 12   Global Step: 134000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:32:09,281-Speed 5975.91 samples/sec   Loss 5.7233   LearningRate 0.0618   Epoch: 12   Global Step: 134010   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:32:16,147-Speed 5966.38 samples/sec   Loss 5.7556   LearningRate 0.0618   Epoch: 12   Global Step: 134020   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:32:23,004-Speed 5977.65 samples/sec   Loss 5.8007   LearningRate 0.0618   Epoch: 12   Global Step: 134030   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:32:29,874-Speed 5963.37 samples/sec   Loss 5.7314   LearningRate 0.0618   Epoch: 12   Global Step: 134040   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:32:36,735-Speed 5971.23 samples/sec   Loss 5.7930   LearningRate 0.0617   Epoch: 12   Global Step: 134050   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:32:43,585-Speed 5983.99 samples/sec   Loss 5.7988   LearningRate 0.0617   Epoch: 12   Global Step: 134060   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:32:50,441-Speed 5975.57 samples/sec   Loss 5.7223   LearningRate 0.0617   Epoch: 12   Global Step: 134070   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:32:57,329-Speed 5947.82 samples/sec   Loss 5.7437   LearningRate 0.0617   Epoch: 12   Global Step: 134080   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:33:04,184-Speed 5975.93 samples/sec   Loss 5.7553   LearningRate 0.0617   Epoch: 12   Global Step: 134090   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:33:11,035-Speed 5979.42 samples/sec   Loss 5.7734   LearningRate 0.0617   Epoch: 12   Global Step: 134100   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:33:17,904-Speed 5964.98 samples/sec   Loss 5.7893   LearningRate 0.0616   Epoch: 12   Global Step: 134110   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-08 22:33:24,756-Speed 5978.76 samples/sec   Loss 5.7095   LearningRate 0.0616   Epoch: 12   Global Step: 134120   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:33:31,609-Speed 5978.47 samples/sec   Loss 5.7842   LearningRate 0.0616   Epoch: 12   Global Step: 134130   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:33:38,497-Speed 5947.12 samples/sec   Loss 5.7242   LearningRate 0.0616   Epoch: 12   Global Step: 134140   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:33:45,354-Speed 5978.71 samples/sec   Loss 5.7126   LearningRate 0.0616   Epoch: 12   Global Step: 134150   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:33:52,207-Speed 5978.06 samples/sec   Loss 5.7404   LearningRate 0.0616   Epoch: 12   Global Step: 134160   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:33:59,054-Speed 5983.04 samples/sec   Loss 5.6986   LearningRate 0.0615   Epoch: 12   Global Step: 134170   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:34:05,913-Speed 5972.92 samples/sec   Loss 5.7564   LearningRate 0.0615   Epoch: 12   Global Step: 134180   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:34:12,775-Speed 5969.96 samples/sec   Loss 5.7372   LearningRate 0.0615   Epoch: 12   Global Step: 134190   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:34:19,621-Speed 5984.37 samples/sec   Loss 5.7309   LearningRate 0.0615   Epoch: 12   Global Step: 134200   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:34:26,451-Speed 5998.02 samples/sec   Loss 5.7592   LearningRate 0.0615   Epoch: 12   Global Step: 134210   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:34:33,343-Speed 5943.92 samples/sec   Loss 5.7268   LearningRate 0.0615   Epoch: 12   Global Step: 134220   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:34:40,222-Speed 5955.34 samples/sec   Loss 5.6824   LearningRate 0.0614   Epoch: 12   Global Step: 134230   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:34:47,092-Speed 5964.37 samples/sec   Loss 5.7354   LearningRate 0.0614   Epoch: 12   Global Step: 134240   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:34:53,948-Speed 5974.88 samples/sec   Loss 5.7616   LearningRate 0.0614   Epoch: 12   Global Step: 134250   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:35:00,798-Speed 5980.21 samples/sec   Loss 5.7825   LearningRate 0.0614   Epoch: 12   Global Step: 134260   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:35:07,660-Speed 5971.21 samples/sec   Loss 5.7011   LearningRate 0.0614   Epoch: 12   Global Step: 134270   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:35:14,513-Speed 5976.91 samples/sec   Loss 5.7727   LearningRate 0.0614   Epoch: 12   Global Step: 134280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:35:21,370-Speed 5975.27 samples/sec   Loss 5.8049   LearningRate 0.0613   Epoch: 12   Global Step: 134290   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:35:28,213-Speed 5986.84 samples/sec   Loss 5.7688   LearningRate 0.0613   Epoch: 12   Global Step: 134300   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:35:35,071-Speed 5973.51 samples/sec   Loss 5.7363   LearningRate 0.0613   Epoch: 12   Global Step: 134310   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:35:41,918-Speed 5983.87 samples/sec   Loss 5.7587   LearningRate 0.0613   Epoch: 12   Global Step: 134320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:35:48,777-Speed 5972.96 samples/sec   Loss 5.7552   LearningRate 0.0613   Epoch: 12   Global Step: 134330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:35:55,678-Speed 5936.51 samples/sec   Loss 5.7710   LearningRate 0.0613   Epoch: 12   Global Step: 134340   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:36:02,530-Speed 5979.20 samples/sec   Loss 5.7572   LearningRate 0.0612   Epoch: 12   Global Step: 134350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:36:09,408-Speed 5956.46 samples/sec   Loss 5.7052   LearningRate 0.0612   Epoch: 12   Global Step: 134360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:36:16,265-Speed 5974.40 samples/sec   Loss 5.7366   LearningRate 0.0612   Epoch: 12   Global Step: 134370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:36:23,130-Speed 5967.93 samples/sec   Loss 5.6962   LearningRate 0.0612   Epoch: 12   Global Step: 134380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:36:30,008-Speed 5956.32 samples/sec   Loss 5.7131   LearningRate 0.0612   Epoch: 12   Global Step: 134390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:36:36,867-Speed 5972.63 samples/sec   Loss 5.7171   LearningRate 0.0612   Epoch: 12   Global Step: 134400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:36:43,721-Speed 5977.62 samples/sec   Loss 5.6951   LearningRate 0.0611   Epoch: 12   Global Step: 134410   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:36:50,602-Speed 5953.41 samples/sec   Loss 5.7303   LearningRate 0.0611   Epoch: 12   Global Step: 134420   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:36:57,460-Speed 5973.82 samples/sec   Loss 5.7374   LearningRate 0.0611   Epoch: 12   Global Step: 134430   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:37:04,314-Speed 5977.71 samples/sec   Loss 5.6986   LearningRate 0.0611   Epoch: 12   Global Step: 134440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:37:11,169-Speed 5979.03 samples/sec   Loss 5.7254   LearningRate 0.0611   Epoch: 12   Global Step: 134450   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:37:18,023-Speed 5977.33 samples/sec   Loss 5.8017   LearningRate 0.0611   Epoch: 12   Global Step: 134460   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:37:24,885-Speed 5970.68 samples/sec   Loss 5.7479   LearningRate 0.0610   Epoch: 12   Global Step: 134470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:37:31,732-Speed 5983.61 samples/sec   Loss 5.7126   LearningRate 0.0610   Epoch: 12   Global Step: 134480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:37:38,593-Speed 5970.34 samples/sec   Loss 5.6425   LearningRate 0.0610   Epoch: 12   Global Step: 134490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:37:45,449-Speed 5975.60 samples/sec   Loss 5.7384   LearningRate 0.0610   Epoch: 12   Global Step: 134500   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:37:52,305-Speed 5976.13 samples/sec   Loss 5.7153   LearningRate 0.0610   Epoch: 12   Global Step: 134510   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:37:59,171-Speed 5966.77 samples/sec   Loss 5.6853   LearningRate 0.0610   Epoch: 12   Global Step: 134520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:38:06,026-Speed 5976.50 samples/sec   Loss 5.7355   LearningRate 0.0609   Epoch: 12   Global Step: 134530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:38:12,875-Speed 5981.97 samples/sec   Loss 5.7383   LearningRate 0.0609   Epoch: 12   Global Step: 134540   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:38:19,726-Speed 5979.88 samples/sec   Loss 5.6739   LearningRate 0.0609   Epoch: 12   Global Step: 134550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:38:26,594-Speed 5965.35 samples/sec   Loss 5.7183   LearningRate 0.0609   Epoch: 12   Global Step: 134560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:38:33,452-Speed 5973.30 samples/sec   Loss 5.7153   LearningRate 0.0609   Epoch: 12   Global Step: 134570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:38:40,299-Speed 5982.92 samples/sec   Loss 5.7532   LearningRate 0.0609   Epoch: 12   Global Step: 134580   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:38:47,187-Speed 5948.41 samples/sec   Loss 5.7054   LearningRate 0.0608   Epoch: 12   Global Step: 134590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:38:54,037-Speed 5982.47 samples/sec   Loss 5.6911   LearningRate 0.0608   Epoch: 12   Global Step: 134600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:39:00,904-Speed 5965.76 samples/sec   Loss 5.6927   LearningRate 0.0608   Epoch: 12   Global Step: 134610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:39:07,762-Speed 5974.09 samples/sec   Loss 5.6954   LearningRate 0.0608   Epoch: 12   Global Step: 134620   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:39:14,607-Speed 5985.14 samples/sec   Loss 5.7023   LearningRate 0.0608   Epoch: 12   Global Step: 134630   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:39:21,452-Speed 5985.18 samples/sec   Loss 5.7286   LearningRate 0.0608   Epoch: 12   Global Step: 134640   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:39:28,303-Speed 5979.96 samples/sec   Loss 5.7319   LearningRate 0.0607   Epoch: 12   Global Step: 134650   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:39:35,182-Speed 5955.02 samples/sec   Loss 5.7767   LearningRate 0.0607   Epoch: 12   Global Step: 134660   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:39:42,038-Speed 5975.43 samples/sec   Loss 5.7310   LearningRate 0.0607   Epoch: 12   Global Step: 134670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:39:48,908-Speed 5962.98 samples/sec   Loss 5.7510   LearningRate 0.0607   Epoch: 12   Global Step: 134680   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:39:55,762-Speed 5977.80 samples/sec   Loss 5.7093   LearningRate 0.0607   Epoch: 12   Global Step: 134690   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:40:02,610-Speed 5981.59 samples/sec   Loss 5.6746   LearningRate 0.0607   Epoch: 12   Global Step: 134700   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:40:09,461-Speed 5982.06 samples/sec   Loss 5.7300   LearningRate 0.0606   Epoch: 12   Global Step: 134710   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:40:16,312-Speed 5979.98 samples/sec   Loss 5.8128   LearningRate 0.0606   Epoch: 12   Global Step: 134720   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:40:23,188-Speed 5958.20 samples/sec   Loss 5.6818   LearningRate 0.0606   Epoch: 12   Global Step: 134730   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:40:30,036-Speed 5983.60 samples/sec   Loss 5.6725   LearningRate 0.0606   Epoch: 12   Global Step: 134740   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:40:36,889-Speed 5981.45 samples/sec   Loss 5.7669   LearningRate 0.0606   Epoch: 12   Global Step: 134750   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:40:43,749-Speed 5971.34 samples/sec   Loss 5.7309   LearningRate 0.0606   Epoch: 12   Global Step: 134760   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:40:50,602-Speed 5978.84 samples/sec   Loss 5.6914   LearningRate 0.0605   Epoch: 12   Global Step: 134770   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:40:57,451-Speed 5981.17 samples/sec   Loss 5.7057   LearningRate 0.0605   Epoch: 12   Global Step: 134780   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:41:04,316-Speed 5967.03 samples/sec   Loss 5.7724   LearningRate 0.0605   Epoch: 12   Global Step: 134790   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:41:11,173-Speed 5975.36 samples/sec   Loss 5.7283   LearningRate 0.0605   Epoch: 12   Global Step: 134800   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:41:34,406-Speed 1763.10 samples/sec   Loss 5.7144   LearningRate 0.0605   Epoch: 13   Global Step: 134810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:41:41,258-Speed 5980.42 samples/sec   Loss 5.7026   LearningRate 0.0605   Epoch: 13   Global Step: 134820   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:41:48,116-Speed 5973.58 samples/sec   Loss 5.7033   LearningRate 0.0604   Epoch: 13   Global Step: 134830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:41:54,972-Speed 5975.57 samples/sec   Loss 5.7075   LearningRate 0.0604   Epoch: 13   Global Step: 134840   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:42:01,806-Speed 5994.91 samples/sec   Loss 5.7356   LearningRate 0.0604   Epoch: 13   Global Step: 134850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:42:08,645-Speed 5990.22 samples/sec   Loss 5.6948   LearningRate 0.0604   Epoch: 13   Global Step: 134860   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:42:15,506-Speed 5971.12 samples/sec   Loss 5.7006   LearningRate 0.0604   Epoch: 13   Global Step: 134870   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:42:22,364-Speed 5974.29 samples/sec   Loss 5.7180   LearningRate 0.0604   Epoch: 13   Global Step: 134880   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:42:29,205-Speed 5988.67 samples/sec   Loss 5.7161   LearningRate 0.0603   Epoch: 13   Global Step: 134890   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:42:36,063-Speed 5973.00 samples/sec   Loss 5.7030   LearningRate 0.0603   Epoch: 13   Global Step: 134900   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:42:42,931-Speed 5965.47 samples/sec   Loss 5.6743   LearningRate 0.0603   Epoch: 13   Global Step: 134910   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:42:49,786-Speed 5977.12 samples/sec   Loss 5.6771   LearningRate 0.0603   Epoch: 13   Global Step: 134920   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:42:56,645-Speed 5971.75 samples/sec   Loss 5.7477   LearningRate 0.0603   Epoch: 13   Global Step: 134930   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:43:03,494-Speed 5982.42 samples/sec   Loss 5.6415   LearningRate 0.0603   Epoch: 13   Global Step: 134940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:43:10,352-Speed 5976.48 samples/sec   Loss 5.6761   LearningRate 0.0602   Epoch: 13   Global Step: 134950   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:43:17,219-Speed 5968.72 samples/sec   Loss 5.6624   LearningRate 0.0602   Epoch: 13   Global Step: 134960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:43:24,074-Speed 5978.48 samples/sec   Loss 5.6336   LearningRate 0.0602   Epoch: 13   Global Step: 134970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:43:30,952-Speed 5956.63 samples/sec   Loss 5.6703   LearningRate 0.0602   Epoch: 13   Global Step: 134980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:43:37,847-Speed 5941.35 samples/sec   Loss 5.6992   LearningRate 0.0602   Epoch: 13   Global Step: 134990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:43:44,696-Speed 5982.39 samples/sec   Loss 5.6893   LearningRate 0.0602   Epoch: 13   Global Step: 135000   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-01-08 22:44:11,505-[lfw][135000]XNorm: 24.448964
Training: 2022-01-08 22:44:11,506-[lfw][135000]Accuracy-Flip: 0.99783+-0.00308
Training: 2022-01-08 22:44:11,506-[lfw][135000]Accuracy-Highest: 0.99783
Training: 2022-01-08 22:44:42,577-[cfp_fp][135000]XNorm: 21.364589
Training: 2022-01-08 22:44:42,578-[cfp_fp][135000]Accuracy-Flip: 0.98586+-0.00559
Training: 2022-01-08 22:44:42,579-[cfp_fp][135000]Accuracy-Highest: 0.98586
Training: 2022-01-08 22:45:09,282-[agedb_30][135000]XNorm: 23.805966
Training: 2022-01-08 22:45:09,283-[agedb_30][135000]Accuracy-Flip: 0.97667+-0.00830
Training: 2022-01-08 22:45:09,284-[agedb_30][135000]Accuracy-Highest: 0.97667
Training: 2022-01-08 22:45:16,122-Speed 448.02 samples/sec   Loss 5.6575   LearningRate 0.0601   Epoch: 13   Global Step: 135010   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-01-08 22:45:22,956-Speed 5994.20 samples/sec   Loss 5.6415   LearningRate 0.0601   Epoch: 13   Global Step: 135020   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-01-08 22:45:29,791-Speed 5993.14 samples/sec   Loss 5.7195   LearningRate 0.0601   Epoch: 13   Global Step: 135030   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-01-08 22:45:36,644-Speed 5978.34 samples/sec   Loss 5.6952   LearningRate 0.0601   Epoch: 13   Global Step: 135040   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-01-08 22:45:43,493-Speed 5982.66 samples/sec   Loss 5.6696   LearningRate 0.0601   Epoch: 13   Global Step: 135050   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-01-08 22:45:50,344-Speed 5980.88 samples/sec   Loss 5.6247   LearningRate 0.0601   Epoch: 13   Global Step: 135060   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-01-08 22:45:57,225-Speed 5956.21 samples/sec   Loss 5.7123   LearningRate 0.0600   Epoch: 13   Global Step: 135070   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-01-08 22:46:04,094-Speed 5963.55 samples/sec   Loss 5.6249   LearningRate 0.0600   Epoch: 13   Global Step: 135080   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-01-08 22:46:10,961-Speed 5966.00 samples/sec   Loss 5.6513   LearningRate 0.0600   Epoch: 13   Global Step: 135090   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-01-08 22:46:17,841-Speed 5954.99 samples/sec   Loss 5.6624   LearningRate 0.0600   Epoch: 13   Global Step: 135100   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:46:24,706-Speed 5967.71 samples/sec   Loss 5.6310   LearningRate 0.0600   Epoch: 13   Global Step: 135110   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:46:31,575-Speed 5963.90 samples/sec   Loss 5.6532   LearningRate 0.0600   Epoch: 13   Global Step: 135120   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:46:38,432-Speed 5974.62 samples/sec   Loss 5.7127   LearningRate 0.0599   Epoch: 13   Global Step: 135130   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:46:45,294-Speed 5969.57 samples/sec   Loss 5.6973   LearningRate 0.0599   Epoch: 13   Global Step: 135140   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:46:52,159-Speed 5968.96 samples/sec   Loss 5.6780   LearningRate 0.0599   Epoch: 13   Global Step: 135150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:46:59,038-Speed 5955.90 samples/sec   Loss 5.6775   LearningRate 0.0599   Epoch: 13   Global Step: 135160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:47:05,919-Speed 5953.43 samples/sec   Loss 5.7000   LearningRate 0.0599   Epoch: 13   Global Step: 135170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:47:12,815-Speed 5941.26 samples/sec   Loss 5.6770   LearningRate 0.0599   Epoch: 13   Global Step: 135180   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:47:19,687-Speed 5961.86 samples/sec   Loss 5.6150   LearningRate 0.0598   Epoch: 13   Global Step: 135190   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:47:26,555-Speed 5964.60 samples/sec   Loss 5.7273   LearningRate 0.0598   Epoch: 13   Global Step: 135200   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:47:33,426-Speed 5962.83 samples/sec   Loss 5.6374   LearningRate 0.0598   Epoch: 13   Global Step: 135210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:47:40,297-Speed 5964.23 samples/sec   Loss 5.6570   LearningRate 0.0598   Epoch: 13   Global Step: 135220   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:47:47,150-Speed 5977.88 samples/sec   Loss 5.6843   LearningRate 0.0598   Epoch: 13   Global Step: 135230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:47:54,015-Speed 5967.45 samples/sec   Loss 5.7047   LearningRate 0.0598   Epoch: 13   Global Step: 135240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:48:00,889-Speed 5963.26 samples/sec   Loss 5.7114   LearningRate 0.0597   Epoch: 13   Global Step: 135250   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:48:07,769-Speed 5953.88 samples/sec   Loss 5.7057   LearningRate 0.0597   Epoch: 13   Global Step: 135260   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:48:14,686-Speed 5923.31 samples/sec   Loss 5.7017   LearningRate 0.0597   Epoch: 13   Global Step: 135270   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:48:21,549-Speed 5972.68 samples/sec   Loss 5.6981   LearningRate 0.0597   Epoch: 13   Global Step: 135280   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:48:28,406-Speed 5974.29 samples/sec   Loss 5.6088   LearningRate 0.0597   Epoch: 13   Global Step: 135290   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:48:35,299-Speed 5943.14 samples/sec   Loss 5.6639   LearningRate 0.0597   Epoch: 13   Global Step: 135300   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:48:42,186-Speed 5949.10 samples/sec   Loss 5.6906   LearningRate 0.0596   Epoch: 13   Global Step: 135310   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:48:49,063-Speed 5957.54 samples/sec   Loss 5.6336   LearningRate 0.0596   Epoch: 13   Global Step: 135320   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:48:55,948-Speed 5950.51 samples/sec   Loss 5.7121   LearningRate 0.0596   Epoch: 13   Global Step: 135330   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:49:02,848-Speed 5937.89 samples/sec   Loss 5.6353   LearningRate 0.0596   Epoch: 13   Global Step: 135340   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:49:09,721-Speed 5961.97 samples/sec   Loss 5.6674   LearningRate 0.0596   Epoch: 13   Global Step: 135350   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:49:16,577-Speed 5975.66 samples/sec   Loss 5.6770   LearningRate 0.0596   Epoch: 13   Global Step: 135360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:49:23,443-Speed 5967.05 samples/sec   Loss 5.6313   LearningRate 0.0595   Epoch: 13   Global Step: 135370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:49:30,302-Speed 5973.00 samples/sec   Loss 5.5875   LearningRate 0.0595   Epoch: 13   Global Step: 135380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:49:37,163-Speed 5971.11 samples/sec   Loss 5.6620   LearningRate 0.0595   Epoch: 13   Global Step: 135390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:49:44,037-Speed 5960.45 samples/sec   Loss 5.6108   LearningRate 0.0595   Epoch: 13   Global Step: 135400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:49:50,896-Speed 5972.24 samples/sec   Loss 5.6711   LearningRate 0.0595   Epoch: 13   Global Step: 135410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:49:57,765-Speed 5964.75 samples/sec   Loss 5.6687   LearningRate 0.0595   Epoch: 13   Global Step: 135420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:50:04,626-Speed 5972.62 samples/sec   Loss 5.6392   LearningRate 0.0594   Epoch: 13   Global Step: 135430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:50:11,489-Speed 5969.25 samples/sec   Loss 5.6477   LearningRate 0.0594   Epoch: 13   Global Step: 135440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:50:18,341-Speed 5979.05 samples/sec   Loss 5.6997   LearningRate 0.0594   Epoch: 13   Global Step: 135450   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:50:25,321-Speed 5869.66 samples/sec   Loss 5.6881   LearningRate 0.0594   Epoch: 13   Global Step: 135460   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:50:32,275-Speed 5891.13 samples/sec   Loss 5.6696   LearningRate 0.0594   Epoch: 13   Global Step: 135470   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:50:39,246-Speed 5876.75 samples/sec   Loss 5.6698   LearningRate 0.0594   Epoch: 13   Global Step: 135480   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:50:46,103-Speed 5975.45 samples/sec   Loss 5.6353   LearningRate 0.0593   Epoch: 13   Global Step: 135490   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:50:52,967-Speed 5967.38 samples/sec   Loss 5.6278   LearningRate 0.0593   Epoch: 13   Global Step: 135500   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:50:59,826-Speed 5973.31 samples/sec   Loss 5.6212   LearningRate 0.0593   Epoch: 13   Global Step: 135510   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:51:06,719-Speed 5942.95 samples/sec   Loss 5.5986   LearningRate 0.0593   Epoch: 13   Global Step: 135520   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:51:13,577-Speed 5973.68 samples/sec   Loss 5.6410   LearningRate 0.0593   Epoch: 13   Global Step: 135530   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:51:20,431-Speed 5976.90 samples/sec   Loss 5.6920   LearningRate 0.0593   Epoch: 13   Global Step: 135540   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:51:27,283-Speed 5979.03 samples/sec   Loss 5.6399   LearningRate 0.0592   Epoch: 13   Global Step: 135550   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:51:34,156-Speed 5961.27 samples/sec   Loss 5.6655   LearningRate 0.0592   Epoch: 13   Global Step: 135560   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:51:41,016-Speed 5972.02 samples/sec   Loss 5.6382   LearningRate 0.0592   Epoch: 13   Global Step: 135570   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:51:47,878-Speed 5970.68 samples/sec   Loss 5.6314   LearningRate 0.0592   Epoch: 13   Global Step: 135580   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:51:54,737-Speed 5972.34 samples/sec   Loss 5.6610   LearningRate 0.0592   Epoch: 13   Global Step: 135590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:52:01,599-Speed 5970.59 samples/sec   Loss 5.6369   LearningRate 0.0592   Epoch: 13   Global Step: 135600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:52:08,481-Speed 5953.41 samples/sec   Loss 5.6029   LearningRate 0.0591   Epoch: 13   Global Step: 135610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:52:15,334-Speed 5977.16 samples/sec   Loss 5.6998   LearningRate 0.0591   Epoch: 13   Global Step: 135620   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:52:22,193-Speed 5972.84 samples/sec   Loss 5.7089   LearningRate 0.0591   Epoch: 13   Global Step: 135630   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:52:29,050-Speed 5975.19 samples/sec   Loss 5.7110   LearningRate 0.0591   Epoch: 13   Global Step: 135640   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:52:36,003-Speed 5892.29 samples/sec   Loss 5.6149   LearningRate 0.0591   Epoch: 13   Global Step: 135650   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:52:42,869-Speed 5966.24 samples/sec   Loss 5.6845   LearningRate 0.0591   Epoch: 13   Global Step: 135660   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:52:49,720-Speed 5980.15 samples/sec   Loss 5.6599   LearningRate 0.0590   Epoch: 13   Global Step: 135670   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:52:56,571-Speed 5980.10 samples/sec   Loss 5.6459   LearningRate 0.0590   Epoch: 13   Global Step: 135680   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:53:03,460-Speed 5946.76 samples/sec   Loss 5.6309   LearningRate 0.0590   Epoch: 13   Global Step: 135690   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:53:10,308-Speed 5982.08 samples/sec   Loss 5.6787   LearningRate 0.0590   Epoch: 13   Global Step: 135700   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:53:17,164-Speed 5975.15 samples/sec   Loss 5.6534   LearningRate 0.0590   Epoch: 13   Global Step: 135710   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 22:53:24,015-Speed 5980.59 samples/sec   Loss 5.6758   LearningRate 0.0590   Epoch: 13   Global Step: 135720   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:53:30,874-Speed 5973.40 samples/sec   Loss 5.6713   LearningRate 0.0589   Epoch: 13   Global Step: 135730   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:53:37,720-Speed 5984.06 samples/sec   Loss 5.6079   LearningRate 0.0589   Epoch: 13   Global Step: 135740   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:53:44,653-Speed 5909.33 samples/sec   Loss 5.5948   LearningRate 0.0589   Epoch: 13   Global Step: 135750   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:53:51,522-Speed 5964.45 samples/sec   Loss 5.5960   LearningRate 0.0589   Epoch: 13   Global Step: 135760   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:53:58,377-Speed 5976.15 samples/sec   Loss 5.5772   LearningRate 0.0589   Epoch: 13   Global Step: 135770   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:54:05,228-Speed 5979.33 samples/sec   Loss 5.6444   LearningRate 0.0589   Epoch: 13   Global Step: 135780   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:54:12,083-Speed 5976.52 samples/sec   Loss 5.5707   LearningRate 0.0588   Epoch: 13   Global Step: 135790   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:54:18,967-Speed 5951.38 samples/sec   Loss 5.6518   LearningRate 0.0588   Epoch: 13   Global Step: 135800   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:54:25,825-Speed 5975.81 samples/sec   Loss 5.6081   LearningRate 0.0588   Epoch: 13   Global Step: 135810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:54:32,666-Speed 5988.20 samples/sec   Loss 5.6648   LearningRate 0.0588   Epoch: 13   Global Step: 135820   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:54:39,534-Speed 5964.97 samples/sec   Loss 5.6176   LearningRate 0.0588   Epoch: 13   Global Step: 135830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:54:46,385-Speed 5980.69 samples/sec   Loss 5.6304   LearningRate 0.0588   Epoch: 13   Global Step: 135840   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:54:53,278-Speed 5945.58 samples/sec   Loss 5.6132   LearningRate 0.0588   Epoch: 13   Global Step: 135850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:55:00,141-Speed 5969.02 samples/sec   Loss 5.5922   LearningRate 0.0587   Epoch: 13   Global Step: 135860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:55:06,995-Speed 5977.09 samples/sec   Loss 5.6442   LearningRate 0.0587   Epoch: 13   Global Step: 135870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:55:13,880-Speed 5950.68 samples/sec   Loss 5.6521   LearningRate 0.0587   Epoch: 13   Global Step: 135880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:55:20,759-Speed 5955.01 samples/sec   Loss 5.6494   LearningRate 0.0587   Epoch: 13   Global Step: 135890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:55:27,633-Speed 5960.36 samples/sec   Loss 5.5925   LearningRate 0.0587   Epoch: 13   Global Step: 135900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:55:34,499-Speed 5966.28 samples/sec   Loss 5.6523   LearningRate 0.0587   Epoch: 13   Global Step: 135910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:55:41,349-Speed 5981.10 samples/sec   Loss 5.6603   LearningRate 0.0586   Epoch: 13   Global Step: 135920   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:55:48,219-Speed 5963.95 samples/sec   Loss 5.6488   LearningRate 0.0586   Epoch: 13   Global Step: 135930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:55:55,077-Speed 5973.72 samples/sec   Loss 5.6407   LearningRate 0.0586   Epoch: 13   Global Step: 135940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:56:01,937-Speed 5972.00 samples/sec   Loss 5.6407   LearningRate 0.0586   Epoch: 13   Global Step: 135950   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:56:08,823-Speed 5950.05 samples/sec   Loss 5.6215   LearningRate 0.0586   Epoch: 13   Global Step: 135960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:56:15,678-Speed 5976.56 samples/sec   Loss 5.6102   LearningRate 0.0586   Epoch: 13   Global Step: 135970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:56:22,539-Speed 5970.65 samples/sec   Loss 5.6229   LearningRate 0.0585   Epoch: 13   Global Step: 135980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:56:29,419-Speed 5954.31 samples/sec   Loss 5.6038   LearningRate 0.0585   Epoch: 13   Global Step: 135990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:56:36,340-Speed 5919.56 samples/sec   Loss 5.6086   LearningRate 0.0585   Epoch: 13   Global Step: 136000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:56:43,207-Speed 5968.56 samples/sec   Loss 5.6200   LearningRate 0.0585   Epoch: 13   Global Step: 136010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:56:50,074-Speed 5965.30 samples/sec   Loss 5.5646   LearningRate 0.0585   Epoch: 13   Global Step: 136020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:56:56,941-Speed 5966.40 samples/sec   Loss 5.5904   LearningRate 0.0585   Epoch: 13   Global Step: 136030   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:57:03,807-Speed 5966.33 samples/sec   Loss 5.5816   LearningRate 0.0584   Epoch: 13   Global Step: 136040   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:57:10,671-Speed 5968.56 samples/sec   Loss 5.5698   LearningRate 0.0584   Epoch: 13   Global Step: 136050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:57:17,548-Speed 5957.48 samples/sec   Loss 5.6296   LearningRate 0.0584   Epoch: 13   Global Step: 136060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:57:24,405-Speed 5975.27 samples/sec   Loss 5.6168   LearningRate 0.0584   Epoch: 13   Global Step: 136070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:57:31,281-Speed 5958.96 samples/sec   Loss 5.5679   LearningRate 0.0584   Epoch: 13   Global Step: 136080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:57:38,162-Speed 5954.56 samples/sec   Loss 5.6235   LearningRate 0.0584   Epoch: 13   Global Step: 136090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:57:45,018-Speed 5975.26 samples/sec   Loss 5.6297   LearningRate 0.0583   Epoch: 13   Global Step: 136100   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:57:51,859-Speed 5988.32 samples/sec   Loss 5.6176   LearningRate 0.0583   Epoch: 13   Global Step: 136110   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:57:58,742-Speed 5955.15 samples/sec   Loss 5.5682   LearningRate 0.0583   Epoch: 13   Global Step: 136120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:58:05,616-Speed 5959.30 samples/sec   Loss 5.5670   LearningRate 0.0583   Epoch: 13   Global Step: 136130   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:58:12,469-Speed 5978.54 samples/sec   Loss 5.5769   LearningRate 0.0583   Epoch: 13   Global Step: 136140   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:58:19,324-Speed 5976.62 samples/sec   Loss 5.5691   LearningRate 0.0583   Epoch: 13   Global Step: 136150   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:58:26,180-Speed 5975.84 samples/sec   Loss 5.6237   LearningRate 0.0582   Epoch: 13   Global Step: 136160   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:58:33,055-Speed 5958.59 samples/sec   Loss 5.6060   LearningRate 0.0582   Epoch: 13   Global Step: 136170   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:58:39,909-Speed 5977.84 samples/sec   Loss 5.6810   LearningRate 0.0582   Epoch: 13   Global Step: 136180   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 22:58:46,769-Speed 5971.50 samples/sec   Loss 5.6305   LearningRate 0.0582   Epoch: 13   Global Step: 136190   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:58:53,626-Speed 5974.85 samples/sec   Loss 5.5471   LearningRate 0.0582   Epoch: 13   Global Step: 136200   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:59:00,510-Speed 5952.20 samples/sec   Loss 5.6316   LearningRate 0.0582   Epoch: 13   Global Step: 136210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:59:07,377-Speed 5965.38 samples/sec   Loss 5.5869   LearningRate 0.0581   Epoch: 13   Global Step: 136220   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:59:14,238-Speed 5971.64 samples/sec   Loss 5.5925   LearningRate 0.0581   Epoch: 13   Global Step: 136230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:59:21,109-Speed 5962.15 samples/sec   Loss 5.6287   LearningRate 0.0581   Epoch: 13   Global Step: 136240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:59:27,956-Speed 5983.24 samples/sec   Loss 5.6227   LearningRate 0.0581   Epoch: 13   Global Step: 136250   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:59:34,814-Speed 5975.01 samples/sec   Loss 5.5849   LearningRate 0.0581   Epoch: 13   Global Step: 136260   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:59:41,820-Speed 5847.86 samples/sec   Loss 5.6347   LearningRate 0.0581   Epoch: 13   Global Step: 136270   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:59:48,783-Speed 5884.13 samples/sec   Loss 5.6026   LearningRate 0.0580   Epoch: 13   Global Step: 136280   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 22:59:55,644-Speed 5971.38 samples/sec   Loss 5.6286   LearningRate 0.0580   Epoch: 13   Global Step: 136290   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:00:02,488-Speed 5986.10 samples/sec   Loss 5.6120   LearningRate 0.0580   Epoch: 13   Global Step: 136300   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:00:19,698-Speed 2380.26 samples/sec   Loss 5.5968   LearningRate 0.0580   Epoch: 13   Global Step: 136310   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:00:26,537-Speed 5989.70 samples/sec   Loss 5.5845   LearningRate 0.0580   Epoch: 13   Global Step: 136320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:00:33,369-Speed 5996.43 samples/sec   Loss 5.5745   LearningRate 0.0580   Epoch: 13   Global Step: 136330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:00:40,218-Speed 5982.57 samples/sec   Loss 5.6614   LearningRate 0.0579   Epoch: 13   Global Step: 136340   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:00:47,075-Speed 5974.36 samples/sec   Loss 5.6228   LearningRate 0.0579   Epoch: 13   Global Step: 136350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:00:53,951-Speed 5958.59 samples/sec   Loss 5.5622   LearningRate 0.0579   Epoch: 13   Global Step: 136360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:01:00,804-Speed 5977.91 samples/sec   Loss 5.6420   LearningRate 0.0579   Epoch: 13   Global Step: 136370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:01:07,786-Speed 5869.66 samples/sec   Loss 5.5290   LearningRate 0.0579   Epoch: 13   Global Step: 136380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:01:14,663-Speed 5957.46 samples/sec   Loss 5.5618   LearningRate 0.0579   Epoch: 13   Global Step: 136390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:01:21,522-Speed 5972.68 samples/sec   Loss 5.5815   LearningRate 0.0579   Epoch: 13   Global Step: 136400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:01:28,378-Speed 5974.99 samples/sec   Loss 5.5835   LearningRate 0.0578   Epoch: 13   Global Step: 136410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:01:35,263-Speed 5950.59 samples/sec   Loss 5.6203   LearningRate 0.0578   Epoch: 13   Global Step: 136420   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:01:42,118-Speed 5976.00 samples/sec   Loss 5.5805   LearningRate 0.0578   Epoch: 13   Global Step: 136430   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:01:48,987-Speed 5964.92 samples/sec   Loss 5.5425   LearningRate 0.0578   Epoch: 13   Global Step: 136440   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:01:55,842-Speed 5975.73 samples/sec   Loss 5.6040   LearningRate 0.0578   Epoch: 13   Global Step: 136450   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:02:02,739-Speed 5940.07 samples/sec   Loss 5.5734   LearningRate 0.0578   Epoch: 13   Global Step: 136460   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:02:09,615-Speed 5970.03 samples/sec   Loss 5.5462   LearningRate 0.0577   Epoch: 13   Global Step: 136470   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:02:16,465-Speed 5980.14 samples/sec   Loss 5.5587   LearningRate 0.0577   Epoch: 13   Global Step: 136480   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:02:23,305-Speed 5991.52 samples/sec   Loss 5.5586   LearningRate 0.0577   Epoch: 13   Global Step: 136490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:02:30,173-Speed 5966.39 samples/sec   Loss 5.6011   LearningRate 0.0577   Epoch: 13   Global Step: 136500   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:02:37,042-Speed 5963.82 samples/sec   Loss 5.5778   LearningRate 0.0577   Epoch: 13   Global Step: 136510   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:02:43,906-Speed 5970.72 samples/sec   Loss 5.5610   LearningRate 0.0577   Epoch: 13   Global Step: 136520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:02:50,760-Speed 5977.40 samples/sec   Loss 5.5752   LearningRate 0.0576   Epoch: 13   Global Step: 136530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:02:57,608-Speed 5982.05 samples/sec   Loss 5.5444   LearningRate 0.0576   Epoch: 13   Global Step: 136540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:03:04,521-Speed 5926.45 samples/sec   Loss 5.5484   LearningRate 0.0576   Epoch: 13   Global Step: 136550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:03:11,452-Speed 5910.94 samples/sec   Loss 5.5395   LearningRate 0.0576   Epoch: 13   Global Step: 136560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:03:18,310-Speed 5972.90 samples/sec   Loss 5.5666   LearningRate 0.0576   Epoch: 13   Global Step: 136570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:03:25,162-Speed 5979.98 samples/sec   Loss 5.5447   LearningRate 0.0576   Epoch: 13   Global Step: 136580   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:03:32,033-Speed 5962.14 samples/sec   Loss 5.5420   LearningRate 0.0575   Epoch: 13   Global Step: 136590   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:03:38,888-Speed 5976.31 samples/sec   Loss 5.5921   LearningRate 0.0575   Epoch: 13   Global Step: 136600   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:03:45,755-Speed 5966.09 samples/sec   Loss 5.5881   LearningRate 0.0575   Epoch: 13   Global Step: 136610   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:03:52,631-Speed 5958.25 samples/sec   Loss 5.5481   LearningRate 0.0575   Epoch: 13   Global Step: 136620   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:03:59,461-Speed 5997.58 samples/sec   Loss 5.5755   LearningRate 0.0575   Epoch: 13   Global Step: 136630   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:04:06,311-Speed 5980.71 samples/sec   Loss 5.5998   LearningRate 0.0575   Epoch: 13   Global Step: 136640   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:04:13,163-Speed 5979.03 samples/sec   Loss 5.5600   LearningRate 0.0574   Epoch: 13   Global Step: 136650   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:04:20,024-Speed 5971.52 samples/sec   Loss 5.5624   LearningRate 0.0574   Epoch: 13   Global Step: 136660   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:04:26,870-Speed 5985.82 samples/sec   Loss 5.6042   LearningRate 0.0574   Epoch: 13   Global Step: 136670   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:04:33,733-Speed 5970.80 samples/sec   Loss 5.5722   LearningRate 0.0574   Epoch: 13   Global Step: 136680   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:04:40,580-Speed 5983.80 samples/sec   Loss 5.5413   LearningRate 0.0574   Epoch: 13   Global Step: 136690   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:04:47,445-Speed 5967.61 samples/sec   Loss 5.5819   LearningRate 0.0574   Epoch: 13   Global Step: 136700   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:04:54,301-Speed 5975.64 samples/sec   Loss 5.5339   LearningRate 0.0573   Epoch: 13   Global Step: 136710   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:05:01,203-Speed 5935.21 samples/sec   Loss 5.5092   LearningRate 0.0573   Epoch: 13   Global Step: 136720   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:05:08,066-Speed 5969.39 samples/sec   Loss 5.5475   LearningRate 0.0573   Epoch: 13   Global Step: 136730   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:05:14,950-Speed 5951.68 samples/sec   Loss 5.6369   LearningRate 0.0573   Epoch: 13   Global Step: 136740   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:05:21,832-Speed 5952.95 samples/sec   Loss 5.5436   LearningRate 0.0573   Epoch: 13   Global Step: 136750   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:05:28,688-Speed 5975.23 samples/sec   Loss 5.5552   LearningRate 0.0573   Epoch: 13   Global Step: 136760   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:05:35,559-Speed 5963.08 samples/sec   Loss 5.5570   LearningRate 0.0572   Epoch: 13   Global Step: 136770   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:05:42,408-Speed 5980.99 samples/sec   Loss 5.4925   LearningRate 0.0572   Epoch: 13   Global Step: 136780   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:05:49,303-Speed 5943.27 samples/sec   Loss 5.5485   LearningRate 0.0572   Epoch: 13   Global Step: 136790   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:05:56,145-Speed 5989.88 samples/sec   Loss 5.5809   LearningRate 0.0572   Epoch: 13   Global Step: 136800   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:06:03,006-Speed 5971.01 samples/sec   Loss 5.5743   LearningRate 0.0572   Epoch: 13   Global Step: 136810   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:06:09,850-Speed 5986.68 samples/sec   Loss 5.5532   LearningRate 0.0572   Epoch: 13   Global Step: 136820   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:06:16,695-Speed 5985.11 samples/sec   Loss 5.6157   LearningRate 0.0572   Epoch: 13   Global Step: 136830   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:06:23,540-Speed 5984.68 samples/sec   Loss 5.5492   LearningRate 0.0571   Epoch: 13   Global Step: 136840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:06:30,387-Speed 5985.38 samples/sec   Loss 5.5474   LearningRate 0.0571   Epoch: 13   Global Step: 136850   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:06:37,236-Speed 5982.08 samples/sec   Loss 5.5476   LearningRate 0.0571   Epoch: 13   Global Step: 136860   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:06:44,075-Speed 5989.49 samples/sec   Loss 5.5439   LearningRate 0.0571   Epoch: 13   Global Step: 136870   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:06:50,914-Speed 5989.99 samples/sec   Loss 5.5509   LearningRate 0.0571   Epoch: 13   Global Step: 136880   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:06:57,788-Speed 5962.45 samples/sec   Loss 5.5594   LearningRate 0.0571   Epoch: 13   Global Step: 136890   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:07:04,634-Speed 5983.27 samples/sec   Loss 5.5291   LearningRate 0.0570   Epoch: 13   Global Step: 136900   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:07:11,493-Speed 5972.97 samples/sec   Loss 5.5137   LearningRate 0.0570   Epoch: 13   Global Step: 136910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:07:18,346-Speed 5977.99 samples/sec   Loss 5.5585   LearningRate 0.0570   Epoch: 13   Global Step: 136920   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:07:25,218-Speed 5961.69 samples/sec   Loss 5.6094   LearningRate 0.0570   Epoch: 13   Global Step: 136930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:07:32,100-Speed 5953.30 samples/sec   Loss 5.5387   LearningRate 0.0570   Epoch: 13   Global Step: 136940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:07:38,973-Speed 5961.68 samples/sec   Loss 5.5292   LearningRate 0.0570   Epoch: 13   Global Step: 136950   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:07:45,825-Speed 5978.52 samples/sec   Loss 5.5167   LearningRate 0.0569   Epoch: 13   Global Step: 136960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:07:52,675-Speed 5981.04 samples/sec   Loss 5.5527   LearningRate 0.0569   Epoch: 13   Global Step: 136970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:07:59,527-Speed 5979.89 samples/sec   Loss 5.5871   LearningRate 0.0569   Epoch: 13   Global Step: 136980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:08:06,383-Speed 5975.37 samples/sec   Loss 5.5555   LearningRate 0.0569   Epoch: 13   Global Step: 136990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:08:13,263-Speed 5954.49 samples/sec   Loss 5.5194   LearningRate 0.0569   Epoch: 13   Global Step: 137000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:08:20,131-Speed 5965.79 samples/sec   Loss 5.5547   LearningRate 0.0569   Epoch: 13   Global Step: 137010   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:08:27,006-Speed 5958.28 samples/sec   Loss 5.5123   LearningRate 0.0568   Epoch: 13   Global Step: 137020   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:08:33,843-Speed 5991.78 samples/sec   Loss 5.5602   LearningRate 0.0568   Epoch: 13   Global Step: 137030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:08:40,701-Speed 5974.30 samples/sec   Loss 5.5238   LearningRate 0.0568   Epoch: 13   Global Step: 137040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:08:47,660-Speed 5886.69 samples/sec   Loss 5.5223   LearningRate 0.0568   Epoch: 13   Global Step: 137050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:08:54,622-Speed 5885.75 samples/sec   Loss 5.5631   LearningRate 0.0568   Epoch: 13   Global Step: 137060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:09:01,585-Speed 5883.69 samples/sec   Loss 5.5270   LearningRate 0.0568   Epoch: 13   Global Step: 137070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:09:08,443-Speed 5973.87 samples/sec   Loss 5.5499   LearningRate 0.0567   Epoch: 13   Global Step: 137080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:09:15,304-Speed 5971.64 samples/sec   Loss 5.5644   LearningRate 0.0567   Epoch: 13   Global Step: 137090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:09:22,182-Speed 5957.32 samples/sec   Loss 5.5186   LearningRate 0.0567   Epoch: 13   Global Step: 137100   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:09:29,041-Speed 5972.63 samples/sec   Loss 5.5393   LearningRate 0.0567   Epoch: 13   Global Step: 137110   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:09:35,905-Speed 5968.78 samples/sec   Loss 5.5231   LearningRate 0.0567   Epoch: 13   Global Step: 137120   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:09:42,749-Speed 5986.72 samples/sec   Loss 5.5456   LearningRate 0.0567   Epoch: 13   Global Step: 137130   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:09:49,592-Speed 5986.57 samples/sec   Loss 5.5238   LearningRate 0.0567   Epoch: 13   Global Step: 137140   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:09:56,447-Speed 5976.77 samples/sec   Loss 5.4931   LearningRate 0.0566   Epoch: 13   Global Step: 137150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:10:03,294-Speed 5982.86 samples/sec   Loss 5.5484   LearningRate 0.0566   Epoch: 13   Global Step: 137160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:10:10,275-Speed 5868.92 samples/sec   Loss 5.5251   LearningRate 0.0566   Epoch: 13   Global Step: 137170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:10:17,146-Speed 5962.59 samples/sec   Loss 5.5875   LearningRate 0.0566   Epoch: 13   Global Step: 137180   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:10:24,000-Speed 5977.88 samples/sec   Loss 5.5498   LearningRate 0.0566   Epoch: 13   Global Step: 137190   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:10:30,863-Speed 5968.98 samples/sec   Loss 5.5215   LearningRate 0.0566   Epoch: 13   Global Step: 137200   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-01-08 23:10:37,715-Speed 5978.88 samples/sec   Loss 5.5746   LearningRate 0.0565   Epoch: 13   Global Step: 137210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:10:44,581-Speed 5966.85 samples/sec   Loss 5.4923   LearningRate 0.0565   Epoch: 13   Global Step: 137220   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:10:51,421-Speed 5989.16 samples/sec   Loss 5.5632   LearningRate 0.0565   Epoch: 13   Global Step: 137230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:10:58,262-Speed 5988.07 samples/sec   Loss 5.4803   LearningRate 0.0565   Epoch: 13   Global Step: 137240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:11:05,111-Speed 5982.33 samples/sec   Loss 5.5025   LearningRate 0.0565   Epoch: 13   Global Step: 137250   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:11:11,972-Speed 5971.02 samples/sec   Loss 5.4811   LearningRate 0.0565   Epoch: 13   Global Step: 137260   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:11:18,825-Speed 5977.89 samples/sec   Loss 5.5073   LearningRate 0.0564   Epoch: 13   Global Step: 137270   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:11:25,685-Speed 5972.32 samples/sec   Loss 5.4871   LearningRate 0.0564   Epoch: 13   Global Step: 137280   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:11:32,541-Speed 5975.13 samples/sec   Loss 5.5188   LearningRate 0.0564   Epoch: 13   Global Step: 137290   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:11:39,463-Speed 5918.38 samples/sec   Loss 5.5042   LearningRate 0.0564   Epoch: 13   Global Step: 137300   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:11:46,315-Speed 5979.07 samples/sec   Loss 5.5517   LearningRate 0.0564   Epoch: 13   Global Step: 137310   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:11:53,160-Speed 5984.54 samples/sec   Loss 5.5431   LearningRate 0.0564   Epoch: 13   Global Step: 137320   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:12:00,021-Speed 5973.02 samples/sec   Loss 5.5422   LearningRate 0.0563   Epoch: 13   Global Step: 137330   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:12:06,896-Speed 5959.09 samples/sec   Loss 5.5113   LearningRate 0.0563   Epoch: 13   Global Step: 137340   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:12:13,742-Speed 5986.63 samples/sec   Loss 5.4762   LearningRate 0.0563   Epoch: 13   Global Step: 137350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:12:20,612-Speed 5962.96 samples/sec   Loss 5.5565   LearningRate 0.0563   Epoch: 13   Global Step: 137360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:12:27,469-Speed 5975.71 samples/sec   Loss 5.5299   LearningRate 0.0563   Epoch: 13   Global Step: 137370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:12:34,316-Speed 5982.72 samples/sec   Loss 5.5296   LearningRate 0.0563   Epoch: 13   Global Step: 137380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:12:41,176-Speed 5971.49 samples/sec   Loss 5.5649   LearningRate 0.0562   Epoch: 13   Global Step: 137390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:12:48,032-Speed 5976.21 samples/sec   Loss 5.5123   LearningRate 0.0562   Epoch: 13   Global Step: 137400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:12:54,896-Speed 5967.77 samples/sec   Loss 5.5040   LearningRate 0.0562   Epoch: 13   Global Step: 137410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:13:01,770-Speed 5960.32 samples/sec   Loss 5.4954   LearningRate 0.0562   Epoch: 13   Global Step: 137420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:13:08,623-Speed 5981.20 samples/sec   Loss 5.5346   LearningRate 0.0562   Epoch: 13   Global Step: 137430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:13:15,547-Speed 5916.15 samples/sec   Loss 5.5162   LearningRate 0.0562   Epoch: 13   Global Step: 137440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:13:22,422-Speed 5959.34 samples/sec   Loss 5.5684   LearningRate 0.0562   Epoch: 13   Global Step: 137450   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:13:29,287-Speed 5970.25 samples/sec   Loss 5.4936   LearningRate 0.0561   Epoch: 13   Global Step: 137460   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:13:36,142-Speed 5976.11 samples/sec   Loss 5.5416   LearningRate 0.0561   Epoch: 13   Global Step: 137470   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:13:43,019-Speed 5957.93 samples/sec   Loss 5.5496   LearningRate 0.0561   Epoch: 13   Global Step: 137480   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:13:49,875-Speed 5975.22 samples/sec   Loss 5.4918   LearningRate 0.0561   Epoch: 13   Global Step: 137490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:13:56,826-Speed 5894.44 samples/sec   Loss 5.4794   LearningRate 0.0561   Epoch: 13   Global Step: 137500   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:14:03,782-Speed 5890.11 samples/sec   Loss 5.5031   LearningRate 0.0561   Epoch: 13   Global Step: 137510   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:14:10,647-Speed 5967.27 samples/sec   Loss 5.5377   LearningRate 0.0560   Epoch: 13   Global Step: 137520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:14:17,500-Speed 5978.40 samples/sec   Loss 5.5508   LearningRate 0.0560   Epoch: 13   Global Step: 137530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:14:24,417-Speed 5923.50 samples/sec   Loss 5.4800   LearningRate 0.0560   Epoch: 13   Global Step: 137540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:14:31,267-Speed 5980.40 samples/sec   Loss 5.4885   LearningRate 0.0560   Epoch: 13   Global Step: 137550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:14:38,129-Speed 5970.60 samples/sec   Loss 5.5407   LearningRate 0.0560   Epoch: 13   Global Step: 137560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:14:44,995-Speed 5966.97 samples/sec   Loss 5.4918   LearningRate 0.0560   Epoch: 13   Global Step: 137570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:14:51,850-Speed 5975.96 samples/sec   Loss 5.5021   LearningRate 0.0559   Epoch: 13   Global Step: 137580   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:14:58,728-Speed 5956.00 samples/sec   Loss 5.5324   LearningRate 0.0559   Epoch: 13   Global Step: 137590   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:15:05,578-Speed 5981.42 samples/sec   Loss 5.4798   LearningRate 0.0559   Epoch: 13   Global Step: 137600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:15:12,454-Speed 5958.88 samples/sec   Loss 5.5202   LearningRate 0.0559   Epoch: 13   Global Step: 137610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:15:19,329-Speed 5958.19 samples/sec   Loss 5.5225   LearningRate 0.0559   Epoch: 13   Global Step: 137620   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:15:26,191-Speed 5970.46 samples/sec   Loss 5.5147   LearningRate 0.0559   Epoch: 13   Global Step: 137630   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:15:33,059-Speed 5966.53 samples/sec   Loss 5.4614   LearningRate 0.0558   Epoch: 13   Global Step: 137640   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:15:39,921-Speed 5969.90 samples/sec   Loss 5.5659   LearningRate 0.0558   Epoch: 13   Global Step: 137650   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:15:46,803-Speed 5952.85 samples/sec   Loss 5.4834   LearningRate 0.0558   Epoch: 13   Global Step: 137660   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:15:53,660-Speed 5975.09 samples/sec   Loss 5.5087   LearningRate 0.0558   Epoch: 13   Global Step: 137670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:16:00,523-Speed 5969.31 samples/sec   Loss 5.5114   LearningRate 0.0558   Epoch: 13   Global Step: 137680   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:16:07,382-Speed 5972.87 samples/sec   Loss 5.5152   LearningRate 0.0558   Epoch: 13   Global Step: 137690   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:16:14,264-Speed 5955.55 samples/sec   Loss 5.5361   LearningRate 0.0558   Epoch: 13   Global Step: 137700   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:16:21,165-Speed 5936.69 samples/sec   Loss 5.5111   LearningRate 0.0557   Epoch: 13   Global Step: 137710   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:16:28,056-Speed 5945.06 samples/sec   Loss 5.5301   LearningRate 0.0557   Epoch: 13   Global Step: 137720   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:16:35,024-Speed 5879.47 samples/sec   Loss 5.4774   LearningRate 0.0557   Epoch: 13   Global Step: 137730   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:16:41,992-Speed 5879.81 samples/sec   Loss 5.5047   LearningRate 0.0557   Epoch: 13   Global Step: 137740   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:16:48,881-Speed 5947.17 samples/sec   Loss 5.5611   LearningRate 0.0557   Epoch: 13   Global Step: 137750   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:16:55,768-Speed 5948.21 samples/sec   Loss 5.4357   LearningRate 0.0557   Epoch: 13   Global Step: 137760   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:17:02,624-Speed 5975.50 samples/sec   Loss 5.4850   LearningRate 0.0556   Epoch: 13   Global Step: 137770   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:17:09,504-Speed 5954.98 samples/sec   Loss 5.5293   LearningRate 0.0556   Epoch: 13   Global Step: 137780   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:17:16,423-Speed 5922.01 samples/sec   Loss 5.5528   LearningRate 0.0556   Epoch: 13   Global Step: 137790   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:17:23,411-Speed 5863.39 samples/sec   Loss 5.4940   LearningRate 0.0556   Epoch: 13   Global Step: 137800   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-08 23:17:30,322-Speed 5927.63 samples/sec   Loss 5.5057   LearningRate 0.0556   Epoch: 13   Global Step: 137810   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:17:37,173-Speed 5980.49 samples/sec   Loss 5.4830   LearningRate 0.0556   Epoch: 13   Global Step: 137820   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:17:44,014-Speed 5988.90 samples/sec   Loss 5.4952   LearningRate 0.0555   Epoch: 13   Global Step: 137830   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:17:50,866-Speed 5980.67 samples/sec   Loss 5.4860   LearningRate 0.0555   Epoch: 13   Global Step: 137840   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:17:57,703-Speed 5991.36 samples/sec   Loss 5.4950   LearningRate 0.0555   Epoch: 13   Global Step: 137850   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:18:04,562-Speed 5972.75 samples/sec   Loss 5.4666   LearningRate 0.0555   Epoch: 13   Global Step: 137860   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:18:11,418-Speed 5976.16 samples/sec   Loss 5.5241   LearningRate 0.0555   Epoch: 13   Global Step: 137870   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:18:18,289-Speed 5964.68 samples/sec   Loss 5.5104   LearningRate 0.0555   Epoch: 13   Global Step: 137880   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:18:25,146-Speed 5974.27 samples/sec   Loss 5.4884   LearningRate 0.0554   Epoch: 13   Global Step: 137890   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:18:31,992-Speed 5983.90 samples/sec   Loss 5.5116   LearningRate 0.0554   Epoch: 13   Global Step: 137900   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:18:38,822-Speed 5998.26 samples/sec   Loss 5.4465   LearningRate 0.0554   Epoch: 13   Global Step: 137910   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:18:45,671-Speed 5981.89 samples/sec   Loss 5.5183   LearningRate 0.0554   Epoch: 13   Global Step: 137920   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:18:52,517-Speed 5984.71 samples/sec   Loss 5.5000   LearningRate 0.0554   Epoch: 13   Global Step: 137930   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:18:59,436-Speed 5921.85 samples/sec   Loss 5.4916   LearningRate 0.0554   Epoch: 13   Global Step: 137940   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:19:06,285-Speed 5981.05 samples/sec   Loss 5.4585   LearningRate 0.0554   Epoch: 13   Global Step: 137950   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:19:13,149-Speed 5968.87 samples/sec   Loss 5.4696   LearningRate 0.0553   Epoch: 13   Global Step: 137960   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:19:19,993-Speed 5986.19 samples/sec   Loss 5.4968   LearningRate 0.0553   Epoch: 13   Global Step: 137970   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:19:26,849-Speed 5974.92 samples/sec   Loss 5.4866   LearningRate 0.0553   Epoch: 13   Global Step: 137980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:19:33,707-Speed 5973.45 samples/sec   Loss 5.4958   LearningRate 0.0553   Epoch: 13   Global Step: 137990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:19:40,562-Speed 5976.21 samples/sec   Loss 5.4567   LearningRate 0.0553   Epoch: 13   Global Step: 138000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:19:47,413-Speed 5979.38 samples/sec   Loss 5.4752   LearningRate 0.0553   Epoch: 13   Global Step: 138010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:19:54,266-Speed 5977.96 samples/sec   Loss 5.4932   LearningRate 0.0552   Epoch: 13   Global Step: 138020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:20:01,126-Speed 5971.80 samples/sec   Loss 5.4376   LearningRate 0.0552   Epoch: 13   Global Step: 138030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:20:07,974-Speed 5982.56 samples/sec   Loss 5.4333   LearningRate 0.0552   Epoch: 13   Global Step: 138040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:20:14,822-Speed 5984.62 samples/sec   Loss 5.4673   LearningRate 0.0552   Epoch: 13   Global Step: 138050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:20:21,678-Speed 5975.38 samples/sec   Loss 5.4663   LearningRate 0.0552   Epoch: 13   Global Step: 138060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:20:28,529-Speed 5979.20 samples/sec   Loss 5.4977   LearningRate 0.0552   Epoch: 13   Global Step: 138070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:20:35,400-Speed 5963.42 samples/sec   Loss 5.4950   LearningRate 0.0551   Epoch: 13   Global Step: 138080   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:20:42,278-Speed 5956.10 samples/sec   Loss 5.4801   LearningRate 0.0551   Epoch: 13   Global Step: 138090   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:20:49,129-Speed 5979.56 samples/sec   Loss 5.4129   LearningRate 0.0551   Epoch: 13   Global Step: 138100   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:20:56,033-Speed 5934.38 samples/sec   Loss 5.4875   LearningRate 0.0551   Epoch: 13   Global Step: 138110   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:21:02,885-Speed 5986.90 samples/sec   Loss 5.4537   LearningRate 0.0551   Epoch: 13   Global Step: 138120   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:21:09,744-Speed 5972.12 samples/sec   Loss 5.4789   LearningRate 0.0551   Epoch: 13   Global Step: 138130   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:21:16,608-Speed 5969.25 samples/sec   Loss 5.5332   LearningRate 0.0550   Epoch: 13   Global Step: 138140   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:21:23,474-Speed 5966.58 samples/sec   Loss 5.4364   LearningRate 0.0550   Epoch: 13   Global Step: 138150   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:21:30,318-Speed 5985.72 samples/sec   Loss 5.4885   LearningRate 0.0550   Epoch: 13   Global Step: 138160   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:21:37,186-Speed 5965.01 samples/sec   Loss 5.4694   LearningRate 0.0550   Epoch: 13   Global Step: 138170   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:21:44,046-Speed 5972.61 samples/sec   Loss 5.4963   LearningRate 0.0550   Epoch: 13   Global Step: 138180   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:21:50,919-Speed 5960.98 samples/sec   Loss 5.5063   LearningRate 0.0550   Epoch: 13   Global Step: 138190   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:21:57,772-Speed 5978.24 samples/sec   Loss 5.4620   LearningRate 0.0550   Epoch: 13   Global Step: 138200   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:22:04,619-Speed 5985.46 samples/sec   Loss 5.4495   LearningRate 0.0549   Epoch: 13   Global Step: 138210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:22:11,472-Speed 5977.97 samples/sec   Loss 5.4440   LearningRate 0.0549   Epoch: 13   Global Step: 138220   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:22:18,351-Speed 5956.12 samples/sec   Loss 5.4477   LearningRate 0.0549   Epoch: 13   Global Step: 138230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:22:25,225-Speed 5959.97 samples/sec   Loss 5.4333   LearningRate 0.0549   Epoch: 13   Global Step: 138240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:22:32,089-Speed 5968.25 samples/sec   Loss 5.4839   LearningRate 0.0549   Epoch: 13   Global Step: 138250   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:22:38,936-Speed 5983.80 samples/sec   Loss 5.4089   LearningRate 0.0549   Epoch: 13   Global Step: 138260   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:22:45,800-Speed 5971.00 samples/sec   Loss 5.4749   LearningRate 0.0548   Epoch: 13   Global Step: 138270   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:22:52,657-Speed 5974.81 samples/sec   Loss 5.4368   LearningRate 0.0548   Epoch: 13   Global Step: 138280   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:22:59,522-Speed 5967.55 samples/sec   Loss 5.4413   LearningRate 0.0548   Epoch: 13   Global Step: 138290   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:23:06,390-Speed 5965.28 samples/sec   Loss 5.5109   LearningRate 0.0548   Epoch: 13   Global Step: 138300   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:23:13,283-Speed 5943.43 samples/sec   Loss 5.4908   LearningRate 0.0548   Epoch: 13   Global Step: 138310   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:23:20,128-Speed 5985.23 samples/sec   Loss 5.4555   LearningRate 0.0548   Epoch: 13   Global Step: 138320   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:23:26,998-Speed 5963.71 samples/sec   Loss 5.4425   LearningRate 0.0547   Epoch: 13   Global Step: 138330   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:23:33,850-Speed 5978.84 samples/sec   Loss 5.4310   LearningRate 0.0547   Epoch: 13   Global Step: 138340   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:23:40,703-Speed 5977.68 samples/sec   Loss 5.4339   LearningRate 0.0547   Epoch: 13   Global Step: 138350   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-01-08 23:23:47,567-Speed 5969.28 samples/sec   Loss 5.4607   LearningRate 0.0547   Epoch: 13   Global Step: 138360   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-01-08 23:23:54,426-Speed 5972.12 samples/sec   Loss 5.3832   LearningRate 0.0547   Epoch: 13   Global Step: 138370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:24:01,307-Speed 5954.36 samples/sec   Loss 5.4194   LearningRate 0.0547   Epoch: 13   Global Step: 138380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:24:08,167-Speed 5971.57 samples/sec   Loss 5.4502   LearningRate 0.0547   Epoch: 13   Global Step: 138390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:24:15,049-Speed 5952.88 samples/sec   Loss 5.4836   LearningRate 0.0546   Epoch: 13   Global Step: 138400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:24:21,907-Speed 5974.07 samples/sec   Loss 5.4378   LearningRate 0.0546   Epoch: 13   Global Step: 138410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:24:28,758-Speed 5980.09 samples/sec   Loss 5.5087   LearningRate 0.0546   Epoch: 13   Global Step: 138420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:24:35,659-Speed 5936.27 samples/sec   Loss 5.5057   LearningRate 0.0546   Epoch: 13   Global Step: 138430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:24:42,511-Speed 5979.43 samples/sec   Loss 5.4530   LearningRate 0.0546   Epoch: 13   Global Step: 138440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:24:49,364-Speed 5980.57 samples/sec   Loss 5.4651   LearningRate 0.0546   Epoch: 13   Global Step: 138450   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:24:56,228-Speed 5968.74 samples/sec   Loss 5.4297   LearningRate 0.0545   Epoch: 13   Global Step: 138460   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:25:03,096-Speed 5965.19 samples/sec   Loss 5.4420   LearningRate 0.0545   Epoch: 13   Global Step: 138470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:25:09,944-Speed 5982.05 samples/sec   Loss 5.4637   LearningRate 0.0545   Epoch: 13   Global Step: 138480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:25:16,795-Speed 5980.02 samples/sec   Loss 5.4873   LearningRate 0.0545   Epoch: 13   Global Step: 138490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:25:23,666-Speed 5962.54 samples/sec   Loss 5.4351   LearningRate 0.0545   Epoch: 13   Global Step: 138500   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:25:30,516-Speed 5980.57 samples/sec   Loss 5.4586   LearningRate 0.0545   Epoch: 13   Global Step: 138510   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:25:37,393-Speed 5957.69 samples/sec   Loss 5.4820   LearningRate 0.0544   Epoch: 13   Global Step: 138520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-01-08 23:25:44,270-Speed 5957.26 samples/sec   Loss 5.4763   LearningRate 0.0544   Epoch: 13   Global Step: 138530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:25:51,123-Speed 5978.15 samples/sec   Loss 5.4608   LearningRate 0.0544   Epoch: 13   Global Step: 138540   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:25:57,993-Speed 5963.15 samples/sec   Loss 5.4547   LearningRate 0.0544   Epoch: 13   Global Step: 138550   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:26:04,856-Speed 5971.57 samples/sec   Loss 5.4248   LearningRate 0.0544   Epoch: 13   Global Step: 138560   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:26:11,727-Speed 5962.17 samples/sec   Loss 5.4255   LearningRate 0.0544   Epoch: 13   Global Step: 138570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:26:18,599-Speed 5961.48 samples/sec   Loss 5.4558   LearningRate 0.0544   Epoch: 13   Global Step: 138580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:26:25,459-Speed 5971.87 samples/sec   Loss 5.4412   LearningRate 0.0543   Epoch: 13   Global Step: 138590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:26:32,346-Speed 5949.23 samples/sec   Loss 5.4539   LearningRate 0.0543   Epoch: 13   Global Step: 138600   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:26:39,208-Speed 5969.25 samples/sec   Loss 5.3861   LearningRate 0.0543   Epoch: 13   Global Step: 138610   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:26:46,061-Speed 5978.73 samples/sec   Loss 5.3963   LearningRate 0.0543   Epoch: 13   Global Step: 138620   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:26:52,921-Speed 5972.06 samples/sec   Loss 5.4729   LearningRate 0.0543   Epoch: 13   Global Step: 138630   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:26:59,784-Speed 5969.11 samples/sec   Loss 5.4661   LearningRate 0.0543   Epoch: 13   Global Step: 138640   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:27:06,635-Speed 5979.35 samples/sec   Loss 5.4001   LearningRate 0.0542   Epoch: 13   Global Step: 138650   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:27:13,509-Speed 5962.58 samples/sec   Loss 5.4401   LearningRate 0.0542   Epoch: 13   Global Step: 138660   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:27:20,389-Speed 5954.48 samples/sec   Loss 5.4371   LearningRate 0.0542   Epoch: 13   Global Step: 138670   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-08 23:27:27,232-Speed 5986.82 samples/sec   Loss 5.4392   LearningRate 0.0542   Epoch: 13   Global Step: 138680   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:27:34,080-Speed 5985.04 samples/sec   Loss 5.3717   LearningRate 0.0542   Epoch: 13   Global Step: 138690   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:27:40,933-Speed 5977.69 samples/sec   Loss 5.3938   LearningRate 0.0542   Epoch: 13   Global Step: 138700   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:27:47,807-Speed 5959.85 samples/sec   Loss 5.4032   LearningRate 0.0541   Epoch: 13   Global Step: 138710   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:27:54,664-Speed 5974.28 samples/sec   Loss 5.3673   LearningRate 0.0541   Epoch: 13   Global Step: 138720   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:28:01,522-Speed 5973.80 samples/sec   Loss 5.3725   LearningRate 0.0541   Epoch: 13   Global Step: 138730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:28:08,397-Speed 5959.18 samples/sec   Loss 5.4145   LearningRate 0.0541   Epoch: 13   Global Step: 138740   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:28:15,249-Speed 5979.20 samples/sec   Loss 5.4473   LearningRate 0.0541   Epoch: 13   Global Step: 138750   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:28:22,126-Speed 5956.97 samples/sec   Loss 5.4094   LearningRate 0.0541   Epoch: 13   Global Step: 138760   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:28:28,988-Speed 5971.07 samples/sec   Loss 5.4523   LearningRate 0.0541   Epoch: 13   Global Step: 138770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:28:35,849-Speed 5971.20 samples/sec   Loss 5.4367   LearningRate 0.0540   Epoch: 13   Global Step: 138780   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:28:42,735-Speed 5949.35 samples/sec   Loss 5.4092   LearningRate 0.0540   Epoch: 13   Global Step: 138790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:28:49,615-Speed 5954.55 samples/sec   Loss 5.3779   LearningRate 0.0540   Epoch: 13   Global Step: 138800   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:28:56,473-Speed 5974.41 samples/sec   Loss 5.4365   LearningRate 0.0540   Epoch: 13   Global Step: 138810   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:29:03,334-Speed 5970.75 samples/sec   Loss 5.4265   LearningRate 0.0540   Epoch: 13   Global Step: 138820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:29:10,190-Speed 5975.90 samples/sec   Loss 5.4517   LearningRate 0.0540   Epoch: 13   Global Step: 138830   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:29:17,033-Speed 5986.09 samples/sec   Loss 5.4631   LearningRate 0.0539   Epoch: 13   Global Step: 138840   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:29:23,885-Speed 5978.81 samples/sec   Loss 5.4402   LearningRate 0.0539   Epoch: 13   Global Step: 138850   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:29:30,765-Speed 5955.65 samples/sec   Loss 5.3863   LearningRate 0.0539   Epoch: 13   Global Step: 138860   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:29:37,641-Speed 5958.58 samples/sec   Loss 5.4367   LearningRate 0.0539   Epoch: 13   Global Step: 138870   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:29:44,503-Speed 5970.35 samples/sec   Loss 5.4321   LearningRate 0.0539   Epoch: 13   Global Step: 138880   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:29:51,361-Speed 5973.28 samples/sec   Loss 5.4301   LearningRate 0.0539   Epoch: 13   Global Step: 138890   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:29:58,217-Speed 5977.11 samples/sec   Loss 5.3878   LearningRate 0.0538   Epoch: 13   Global Step: 138900   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:30:05,061-Speed 5986.01 samples/sec   Loss 5.4295   LearningRate 0.0538   Epoch: 13   Global Step: 138910   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:30:11,908-Speed 5983.58 samples/sec   Loss 5.4338   LearningRate 0.0538   Epoch: 13   Global Step: 138920   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:30:18,758-Speed 5980.36 samples/sec   Loss 5.4407   LearningRate 0.0538   Epoch: 13   Global Step: 138930   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:30:25,614-Speed 5975.83 samples/sec   Loss 5.4108   LearningRate 0.0538   Epoch: 13   Global Step: 138940   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:30:32,476-Speed 5969.81 samples/sec   Loss 5.3812   LearningRate 0.0538   Epoch: 13   Global Step: 138950   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:30:39,346-Speed 5963.27 samples/sec   Loss 5.4174   LearningRate 0.0538   Epoch: 13   Global Step: 138960   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:30:46,209-Speed 5969.32 samples/sec   Loss 5.3994   LearningRate 0.0537   Epoch: 13   Global Step: 138970   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:30:53,073-Speed 5970.09 samples/sec   Loss 5.4352   LearningRate 0.0537   Epoch: 13   Global Step: 138980   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:30:59,932-Speed 5972.33 samples/sec   Loss 5.3541   LearningRate 0.0537   Epoch: 13   Global Step: 138990   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:31:06,792-Speed 5972.21 samples/sec   Loss 5.4022   LearningRate 0.0537   Epoch: 13   Global Step: 139000   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:31:13,654-Speed 5970.07 samples/sec   Loss 5.4101   LearningRate 0.0537   Epoch: 13   Global Step: 139010   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:31:20,516-Speed 5971.24 samples/sec   Loss 5.3809   LearningRate 0.0537   Epoch: 13   Global Step: 139020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:31:27,375-Speed 5973.18 samples/sec   Loss 5.4326   LearningRate 0.0536   Epoch: 13   Global Step: 139030   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-08 23:31:34,229-Speed 5979.05 samples/sec   Loss 5.3650   LearningRate 0.0536   Epoch: 13   Global Step: 139040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:31:41,088-Speed 5973.19 samples/sec   Loss 5.4257   LearningRate 0.0536   Epoch: 13   Global Step: 139050   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:31:47,953-Speed 5967.41 samples/sec   Loss 5.3792   LearningRate 0.0536   Epoch: 13   Global Step: 139060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:31:54,799-Speed 5983.46 samples/sec   Loss 5.4162   LearningRate 0.0536   Epoch: 13   Global Step: 139070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:32:01,701-Speed 5935.85 samples/sec   Loss 5.4407   LearningRate 0.0536   Epoch: 13   Global Step: 139080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:32:08,567-Speed 5967.01 samples/sec   Loss 5.3977   LearningRate 0.0535   Epoch: 13   Global Step: 139090   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:32:15,415-Speed 5982.97 samples/sec   Loss 5.4271   LearningRate 0.0535   Epoch: 13   Global Step: 139100   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:32:22,266-Speed 5980.27 samples/sec   Loss 5.4300   LearningRate 0.0535   Epoch: 13   Global Step: 139110   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:32:29,115-Speed 5983.38 samples/sec   Loss 5.3544   LearningRate 0.0535   Epoch: 13   Global Step: 139120   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:32:35,980-Speed 5967.49 samples/sec   Loss 5.4502   LearningRate 0.0535   Epoch: 13   Global Step: 139130   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:32:42,865-Speed 5954.36 samples/sec   Loss 5.4050   LearningRate 0.0535   Epoch: 13   Global Step: 139140   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:32:49,716-Speed 5979.61 samples/sec   Loss 5.4014   LearningRate 0.0535   Epoch: 13   Global Step: 139150   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:32:56,608-Speed 5944.53 samples/sec   Loss 5.3703   LearningRate 0.0534   Epoch: 13   Global Step: 139160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:33:03,465-Speed 5974.81 samples/sec   Loss 5.3702   LearningRate 0.0534   Epoch: 13   Global Step: 139170   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:33:10,337-Speed 5961.22 samples/sec   Loss 5.3325   LearningRate 0.0534   Epoch: 13   Global Step: 139180   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:33:17,203-Speed 5967.68 samples/sec   Loss 5.3706   LearningRate 0.0534   Epoch: 13   Global Step: 139190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:33:24,158-Speed 5890.76 samples/sec   Loss 5.3642   LearningRate 0.0534   Epoch: 13   Global Step: 139200   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:33:31,077-Speed 5921.45 samples/sec   Loss 5.4298   LearningRate 0.0534   Epoch: 13   Global Step: 139210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:33:37,954-Speed 5957.39 samples/sec   Loss 5.4297   LearningRate 0.0533   Epoch: 13   Global Step: 139220   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:33:44,885-Speed 5911.23 samples/sec   Loss 5.4391   LearningRate 0.0533   Epoch: 13   Global Step: 139230   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:33:51,828-Speed 5900.44 samples/sec   Loss 5.3757   LearningRate 0.0533   Epoch: 13   Global Step: 139240   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:33:58,765-Speed 5905.47 samples/sec   Loss 5.4109   LearningRate 0.0533   Epoch: 13   Global Step: 139250   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:34:05,653-Speed 5947.72 samples/sec   Loss 5.3743   LearningRate 0.0533   Epoch: 13   Global Step: 139260   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:34:12,501-Speed 5982.33 samples/sec   Loss 5.4015   LearningRate 0.0533   Epoch: 13   Global Step: 139270   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:34:19,346-Speed 5986.32 samples/sec   Loss 5.4091   LearningRate 0.0533   Epoch: 13   Global Step: 139280   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:34:26,192-Speed 5984.83 samples/sec   Loss 5.3568   LearningRate 0.0532   Epoch: 13   Global Step: 139290   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:34:33,055-Speed 5968.89 samples/sec   Loss 5.3670   LearningRate 0.0532   Epoch: 13   Global Step: 139300   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:34:39,929-Speed 5960.71 samples/sec   Loss 5.4473   LearningRate 0.0532   Epoch: 13   Global Step: 139310   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:34:46,771-Speed 5987.04 samples/sec   Loss 5.3275   LearningRate 0.0532   Epoch: 13   Global Step: 139320   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:34:53,616-Speed 5985.02 samples/sec   Loss 5.3281   LearningRate 0.0532   Epoch: 13   Global Step: 139330   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:35:00,461-Speed 5984.71 samples/sec   Loss 5.4140   LearningRate 0.0532   Epoch: 13   Global Step: 139340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:35:07,333-Speed 5961.49 samples/sec   Loss 5.3832   LearningRate 0.0531   Epoch: 13   Global Step: 139350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:35:14,188-Speed 5976.83 samples/sec   Loss 5.3734   LearningRate 0.0531   Epoch: 13   Global Step: 139360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:35:21,135-Speed 5896.57 samples/sec   Loss 5.4106   LearningRate 0.0531   Epoch: 13   Global Step: 139370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:35:28,099-Speed 5883.17 samples/sec   Loss 5.3587   LearningRate 0.0531   Epoch: 13   Global Step: 139380   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:35:34,957-Speed 5973.81 samples/sec   Loss 5.3545   LearningRate 0.0531   Epoch: 13   Global Step: 139390   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:35:41,813-Speed 5975.28 samples/sec   Loss 5.4270   LearningRate 0.0531   Epoch: 13   Global Step: 139400   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:35:48,683-Speed 5966.21 samples/sec   Loss 5.3506   LearningRate 0.0530   Epoch: 13   Global Step: 139410   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:35:55,537-Speed 5977.51 samples/sec   Loss 5.3212   LearningRate 0.0530   Epoch: 13   Global Step: 139420   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:36:02,387-Speed 5981.03 samples/sec   Loss 5.3608   LearningRate 0.0530   Epoch: 13   Global Step: 139430   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:36:09,244-Speed 5975.02 samples/sec   Loss 5.3918   LearningRate 0.0530   Epoch: 13   Global Step: 139440   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:36:16,118-Speed 5959.40 samples/sec   Loss 5.3384   LearningRate 0.0530   Epoch: 13   Global Step: 139450   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:36:22,954-Speed 5993.17 samples/sec   Loss 5.4349   LearningRate 0.0530   Epoch: 13   Global Step: 139460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:36:29,804-Speed 5980.86 samples/sec   Loss 5.3388   LearningRate 0.0530   Epoch: 13   Global Step: 139470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:36:36,659-Speed 5976.23 samples/sec   Loss 5.4588   LearningRate 0.0529   Epoch: 13   Global Step: 139480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:36:43,554-Speed 5942.15 samples/sec   Loss 5.3177   LearningRate 0.0529   Epoch: 13   Global Step: 139490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:36:50,442-Speed 5949.43 samples/sec   Loss 5.3142   LearningRate 0.0529   Epoch: 13   Global Step: 139500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:36:57,291-Speed 5981.18 samples/sec   Loss 5.4157   LearningRate 0.0529   Epoch: 13   Global Step: 139510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:37:04,146-Speed 5977.75 samples/sec   Loss 5.3368   LearningRate 0.0529   Epoch: 13   Global Step: 139520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:37:11,015-Speed 5968.92 samples/sec   Loss 5.3419   LearningRate 0.0529   Epoch: 13   Global Step: 139530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:37:17,880-Speed 5967.71 samples/sec   Loss 5.3331   LearningRate 0.0528   Epoch: 13   Global Step: 139540   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:37:24,755-Speed 5959.29 samples/sec   Loss 5.3234   LearningRate 0.0528   Epoch: 13   Global Step: 139550   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:37:31,618-Speed 5971.02 samples/sec   Loss 5.3975   LearningRate 0.0528   Epoch: 13   Global Step: 139560   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:37:38,481-Speed 5969.28 samples/sec   Loss 5.3791   LearningRate 0.0528   Epoch: 13   Global Step: 139570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:37:45,336-Speed 5976.64 samples/sec   Loss 5.3573   LearningRate 0.0528   Epoch: 13   Global Step: 139580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:37:52,199-Speed 5971.77 samples/sec   Loss 5.3322   LearningRate 0.0528   Epoch: 13   Global Step: 139590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:37:59,054-Speed 5975.73 samples/sec   Loss 5.3910   LearningRate 0.0528   Epoch: 13   Global Step: 139600   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:38:05,901-Speed 5983.51 samples/sec   Loss 5.3900   LearningRate 0.0527   Epoch: 13   Global Step: 139610   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:38:12,776-Speed 5960.54 samples/sec   Loss 5.3615   LearningRate 0.0527   Epoch: 13   Global Step: 139620   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:38:19,625-Speed 5981.08 samples/sec   Loss 5.3294   LearningRate 0.0527   Epoch: 13   Global Step: 139630   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:38:26,479-Speed 5977.53 samples/sec   Loss 5.3832   LearningRate 0.0527   Epoch: 13   Global Step: 139640   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:38:33,327-Speed 5982.71 samples/sec   Loss 5.4077   LearningRate 0.0527   Epoch: 13   Global Step: 139650   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:38:40,180-Speed 5977.53 samples/sec   Loss 5.3803   LearningRate 0.0527   Epoch: 13   Global Step: 139660   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:38:47,074-Speed 5942.54 samples/sec   Loss 5.3200   LearningRate 0.0526   Epoch: 13   Global Step: 139670   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:38:53,932-Speed 5974.23 samples/sec   Loss 5.3792   LearningRate 0.0526   Epoch: 13   Global Step: 139680   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:39:00,804-Speed 5961.52 samples/sec   Loss 5.3463   LearningRate 0.0526   Epoch: 13   Global Step: 139690   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:39:07,665-Speed 5971.57 samples/sec   Loss 5.3646   LearningRate 0.0526   Epoch: 13   Global Step: 139700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:39:14,523-Speed 5973.69 samples/sec   Loss 5.3677   LearningRate 0.0526   Epoch: 13   Global Step: 139710   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:39:21,376-Speed 5978.19 samples/sec   Loss 5.3996   LearningRate 0.0526   Epoch: 13   Global Step: 139720   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:39:28,256-Speed 5954.80 samples/sec   Loss 5.3847   LearningRate 0.0526   Epoch: 13   Global Step: 139730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:39:35,116-Speed 5971.95 samples/sec   Loss 5.3318   LearningRate 0.0525   Epoch: 13   Global Step: 139740   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:39:41,963-Speed 5983.07 samples/sec   Loss 5.3876   LearningRate 0.0525   Epoch: 13   Global Step: 139750   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:39:48,899-Speed 5907.12 samples/sec   Loss 5.3668   LearningRate 0.0525   Epoch: 13   Global Step: 139760   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:39:55,798-Speed 5937.24 samples/sec   Loss 5.3542   LearningRate 0.0525   Epoch: 13   Global Step: 139770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:40:02,650-Speed 5979.47 samples/sec   Loss 5.4074   LearningRate 0.0525   Epoch: 13   Global Step: 139780   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:40:09,494-Speed 5985.96 samples/sec   Loss 5.3384   LearningRate 0.0525   Epoch: 13   Global Step: 139790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:40:16,370-Speed 5958.37 samples/sec   Loss 5.3522   LearningRate 0.0524   Epoch: 13   Global Step: 139800   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:40:23,228-Speed 5973.55 samples/sec   Loss 5.3781   LearningRate 0.0524   Epoch: 13   Global Step: 139810   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:40:30,084-Speed 5975.83 samples/sec   Loss 5.3198   LearningRate 0.0524   Epoch: 13   Global Step: 139820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:40:36,930-Speed 5984.62 samples/sec   Loss 5.3434   LearningRate 0.0524   Epoch: 13   Global Step: 139830   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:40:43,791-Speed 5970.82 samples/sec   Loss 5.3157   LearningRate 0.0524   Epoch: 13   Global Step: 139840   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:40:50,633-Speed 5987.58 samples/sec   Loss 5.3077   LearningRate 0.0524   Epoch: 13   Global Step: 139850   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:40:57,469-Speed 5992.60 samples/sec   Loss 5.3886   LearningRate 0.0523   Epoch: 13   Global Step: 139860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:41:04,339-Speed 5963.20 samples/sec   Loss 5.3801   LearningRate 0.0523   Epoch: 13   Global Step: 139870   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:41:11,192-Speed 5978.42 samples/sec   Loss 5.3204   LearningRate 0.0523   Epoch: 13   Global Step: 139880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:41:18,064-Speed 5961.66 samples/sec   Loss 5.3717   LearningRate 0.0523   Epoch: 13   Global Step: 139890   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:41:24,927-Speed 5969.20 samples/sec   Loss 5.3243   LearningRate 0.0523   Epoch: 13   Global Step: 139900   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:41:31,775-Speed 5983.09 samples/sec   Loss 5.3712   LearningRate 0.0523   Epoch: 13   Global Step: 139910   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:41:38,631-Speed 5977.08 samples/sec   Loss 5.2880   LearningRate 0.0523   Epoch: 13   Global Step: 139920   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:41:45,483-Speed 5977.98 samples/sec   Loss 5.3552   LearningRate 0.0522   Epoch: 13   Global Step: 139930   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:41:52,344-Speed 5971.50 samples/sec   Loss 5.3437   LearningRate 0.0522   Epoch: 13   Global Step: 139940   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:41:59,190-Speed 5984.10 samples/sec   Loss 5.4130   LearningRate 0.0522   Epoch: 13   Global Step: 139950   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:42:06,039-Speed 5981.07 samples/sec   Loss 5.3555   LearningRate 0.0522   Epoch: 13   Global Step: 139960   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:42:12,890-Speed 5979.86 samples/sec   Loss 5.3211   LearningRate 0.0522   Epoch: 13   Global Step: 139970   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:42:19,742-Speed 5979.02 samples/sec   Loss 5.3202   LearningRate 0.0522   Epoch: 13   Global Step: 139980   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:42:26,610-Speed 5965.11 samples/sec   Loss 5.3361   LearningRate 0.0521   Epoch: 13   Global Step: 139990   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:42:33,486-Speed 5958.47 samples/sec   Loss 5.3189   LearningRate 0.0521   Epoch: 13   Global Step: 140000   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:43:00,327-[lfw][140000]XNorm: 22.998499
Training: 2022-01-08 23:43:00,328-[lfw][140000]Accuracy-Flip: 0.99767+-0.00300
Training: 2022-01-08 23:43:00,328-[lfw][140000]Accuracy-Highest: 0.99783
Training: 2022-01-08 23:43:31,438-[cfp_fp][140000]XNorm: 20.309371
Training: 2022-01-08 23:43:31,439-[cfp_fp][140000]Accuracy-Flip: 0.98714+-0.00723
Training: 2022-01-08 23:43:31,440-[cfp_fp][140000]Accuracy-Highest: 0.98714
Training: 2022-01-08 23:43:58,303-[agedb_30][140000]XNorm: 22.815771
Training: 2022-01-08 23:43:58,304-[agedb_30][140000]Accuracy-Flip: 0.97617+-0.00619
Training: 2022-01-08 23:43:58,304-[agedb_30][140000]Accuracy-Highest: 0.97667
Training: 2022-01-08 23:44:05,167-Speed 446.77 samples/sec   Loss 5.3102   LearningRate 0.0521   Epoch: 13   Global Step: 140010   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:44:12,006-Speed 5990.44 samples/sec   Loss 5.3503   LearningRate 0.0521   Epoch: 13   Global Step: 140020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:44:18,831-Speed 6002.38 samples/sec   Loss 5.3614   LearningRate 0.0521   Epoch: 13   Global Step: 140030   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:44:25,681-Speed 5981.88 samples/sec   Loss 5.3616   LearningRate 0.0521   Epoch: 13   Global Step: 140040   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:44:32,532-Speed 5979.83 samples/sec   Loss 5.3080   LearningRate 0.0521   Epoch: 13   Global Step: 140050   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:44:39,400-Speed 5965.47 samples/sec   Loss 5.2886   LearningRate 0.0520   Epoch: 13   Global Step: 140060   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:44:46,255-Speed 5976.09 samples/sec   Loss 5.3346   LearningRate 0.0520   Epoch: 13   Global Step: 140070   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:44:53,122-Speed 5965.46 samples/sec   Loss 5.2703   LearningRate 0.0520   Epoch: 13   Global Step: 140080   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:44:59,994-Speed 5964.15 samples/sec   Loss 5.3270   LearningRate 0.0520   Epoch: 13   Global Step: 140090   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:45:06,846-Speed 5980.18 samples/sec   Loss 5.3386   LearningRate 0.0520   Epoch: 13   Global Step: 140100   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:45:13,690-Speed 5985.05 samples/sec   Loss 5.3414   LearningRate 0.0520   Epoch: 13   Global Step: 140110   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:45:20,528-Speed 5991.16 samples/sec   Loss 5.3389   LearningRate 0.0519   Epoch: 13   Global Step: 140120   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:45:27,373-Speed 5985.30 samples/sec   Loss 5.3514   LearningRate 0.0519   Epoch: 13   Global Step: 140130   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:45:34,213-Speed 5989.34 samples/sec   Loss 5.3318   LearningRate 0.0519   Epoch: 13   Global Step: 140140   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:45:41,055-Speed 5988.17 samples/sec   Loss 5.3166   LearningRate 0.0519   Epoch: 13   Global Step: 140150   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:45:47,895-Speed 5989.45 samples/sec   Loss 5.3065   LearningRate 0.0519   Epoch: 13   Global Step: 140160   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:45:54,761-Speed 5966.84 samples/sec   Loss 5.3466   LearningRate 0.0519   Epoch: 13   Global Step: 140170   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:46:01,597-Speed 5993.18 samples/sec   Loss 5.2923   LearningRate 0.0519   Epoch: 13   Global Step: 140180   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:46:08,440-Speed 5987.66 samples/sec   Loss 5.3818   LearningRate 0.0518   Epoch: 13   Global Step: 140190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:46:15,290-Speed 5980.21 samples/sec   Loss 5.2748   LearningRate 0.0518   Epoch: 13   Global Step: 140200   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:46:22,158-Speed 5965.94 samples/sec   Loss 5.3494   LearningRate 0.0518   Epoch: 13   Global Step: 140210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:46:29,042-Speed 5951.93 samples/sec   Loss 5.3668   LearningRate 0.0518   Epoch: 13   Global Step: 140220   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:46:35,893-Speed 5979.97 samples/sec   Loss 5.3005   LearningRate 0.0518   Epoch: 13   Global Step: 140230   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:46:42,789-Speed 5940.46 samples/sec   Loss 5.3458   LearningRate 0.0518   Epoch: 13   Global Step: 140240   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:46:49,636-Speed 5984.30 samples/sec   Loss 5.3346   LearningRate 0.0517   Epoch: 13   Global Step: 140250   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:46:56,499-Speed 5969.29 samples/sec   Loss 5.3142   LearningRate 0.0517   Epoch: 13   Global Step: 140260   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:47:03,371-Speed 5964.15 samples/sec   Loss 5.3459   LearningRate 0.0517   Epoch: 13   Global Step: 140270   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:47:10,220-Speed 5981.34 samples/sec   Loss 5.3535   LearningRate 0.0517   Epoch: 13   Global Step: 140280   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:47:17,081-Speed 5971.01 samples/sec   Loss 5.3400   LearningRate 0.0517   Epoch: 13   Global Step: 140290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:47:23,938-Speed 5974.68 samples/sec   Loss 5.3285   LearningRate 0.0517   Epoch: 13   Global Step: 140300   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:47:30,774-Speed 5992.76 samples/sec   Loss 5.4060   LearningRate 0.0517   Epoch: 13   Global Step: 140310   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:47:37,612-Speed 5991.35 samples/sec   Loss 5.3341   LearningRate 0.0516   Epoch: 13   Global Step: 140320   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:47:44,481-Speed 5964.43 samples/sec   Loss 5.3660   LearningRate 0.0516   Epoch: 13   Global Step: 140330   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:47:51,336-Speed 5976.82 samples/sec   Loss 5.2959   LearningRate 0.0516   Epoch: 13   Global Step: 140340   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:47:58,186-Speed 5980.66 samples/sec   Loss 5.3496   LearningRate 0.0516   Epoch: 13   Global Step: 140350   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:48:05,029-Speed 5986.83 samples/sec   Loss 5.3271   LearningRate 0.0516   Epoch: 13   Global Step: 140360   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:48:11,917-Speed 5948.25 samples/sec   Loss 5.2906   LearningRate 0.0516   Epoch: 13   Global Step: 140370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:48:18,764-Speed 5983.06 samples/sec   Loss 5.3055   LearningRate 0.0515   Epoch: 13   Global Step: 140380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:48:25,611-Speed 5983.10 samples/sec   Loss 5.3511   LearningRate 0.0515   Epoch: 13   Global Step: 140390   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:48:32,477-Speed 5966.97 samples/sec   Loss 5.2542   LearningRate 0.0515   Epoch: 13   Global Step: 140400   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:48:39,339-Speed 5970.43 samples/sec   Loss 5.3353   LearningRate 0.0515   Epoch: 13   Global Step: 140410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:48:46,201-Speed 5970.61 samples/sec   Loss 5.3594   LearningRate 0.0515   Epoch: 13   Global Step: 140420   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:48:53,050-Speed 5981.15 samples/sec   Loss 5.3448   LearningRate 0.0515   Epoch: 13   Global Step: 140430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:48:59,914-Speed 5968.46 samples/sec   Loss 5.3314   LearningRate 0.0515   Epoch: 13   Global Step: 140440   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-08 23:49:06,777-Speed 5969.91 samples/sec   Loss 5.3631   LearningRate 0.0514   Epoch: 13   Global Step: 140450   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-08 23:49:13,626-Speed 5981.86 samples/sec   Loss 5.2946   LearningRate 0.0514   Epoch: 13   Global Step: 140460   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-08 23:49:20,481-Speed 5975.89 samples/sec   Loss 5.3450   LearningRate 0.0514   Epoch: 13   Global Step: 140470   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-08 23:49:27,359-Speed 5958.81 samples/sec   Loss 5.3056   LearningRate 0.0514   Epoch: 13   Global Step: 140480   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-08 23:49:34,208-Speed 5981.71 samples/sec   Loss 5.3392   LearningRate 0.0514   Epoch: 13   Global Step: 140490   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-08 23:49:41,059-Speed 5979.47 samples/sec   Loss 5.2829   LearningRate 0.0514   Epoch: 13   Global Step: 140500   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-08 23:49:47,922-Speed 5971.46 samples/sec   Loss 5.3030   LearningRate 0.0513   Epoch: 13   Global Step: 140510   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-08 23:49:54,791-Speed 5963.72 samples/sec   Loss 5.2818   LearningRate 0.0513   Epoch: 13   Global Step: 140520   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-08 23:50:01,641-Speed 5980.74 samples/sec   Loss 5.3018   LearningRate 0.0513   Epoch: 13   Global Step: 140530   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-08 23:50:08,492-Speed 5980.45 samples/sec   Loss 5.2646   LearningRate 0.0513   Epoch: 13   Global Step: 140540   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:50:15,345-Speed 5978.18 samples/sec   Loss 5.2615   LearningRate 0.0513   Epoch: 13   Global Step: 140550   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:50:22,186-Speed 5988.61 samples/sec   Loss 5.2705   LearningRate 0.0513   Epoch: 13   Global Step: 140560   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:50:29,035-Speed 5981.02 samples/sec   Loss 5.2738   LearningRate 0.0513   Epoch: 13   Global Step: 140570   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:50:35,884-Speed 5982.20 samples/sec   Loss 5.3542   LearningRate 0.0512   Epoch: 13   Global Step: 140580   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:50:42,743-Speed 5972.76 samples/sec   Loss 5.3065   LearningRate 0.0512   Epoch: 13   Global Step: 140590   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:50:49,598-Speed 5976.49 samples/sec   Loss 5.2955   LearningRate 0.0512   Epoch: 13   Global Step: 140600   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:50:56,455-Speed 5974.90 samples/sec   Loss 5.2737   LearningRate 0.0512   Epoch: 13   Global Step: 140610   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:51:03,304-Speed 5981.04 samples/sec   Loss 5.3223   LearningRate 0.0512   Epoch: 13   Global Step: 140620   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:51:10,151-Speed 5984.00 samples/sec   Loss 5.2982   LearningRate 0.0512   Epoch: 13   Global Step: 140630   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:51:16,997-Speed 5984.21 samples/sec   Loss 5.2935   LearningRate 0.0511   Epoch: 13   Global Step: 140640   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:51:23,865-Speed 5964.77 samples/sec   Loss 5.2896   LearningRate 0.0511   Epoch: 13   Global Step: 140650   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:51:30,725-Speed 5971.43 samples/sec   Loss 5.2731   LearningRate 0.0511   Epoch: 13   Global Step: 140660   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:51:37,603-Speed 5959.82 samples/sec   Loss 5.2702   LearningRate 0.0511   Epoch: 13   Global Step: 140670   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:51:44,474-Speed 5961.99 samples/sec   Loss 5.3142   LearningRate 0.0511   Epoch: 13   Global Step: 140680   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:51:51,383-Speed 5929.60 samples/sec   Loss 5.2789   LearningRate 0.0511   Epoch: 13   Global Step: 140690   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:51:58,251-Speed 5965.69 samples/sec   Loss 5.2793   LearningRate 0.0511   Epoch: 13   Global Step: 140700   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:52:05,110-Speed 5972.60 samples/sec   Loss 5.2551   LearningRate 0.0510   Epoch: 13   Global Step: 140710   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:52:11,964-Speed 5976.88 samples/sec   Loss 5.2220   LearningRate 0.0510   Epoch: 13   Global Step: 140720   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:52:18,835-Speed 5962.86 samples/sec   Loss 5.2684   LearningRate 0.0510   Epoch: 13   Global Step: 140730   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:52:25,711-Speed 5957.96 samples/sec   Loss 5.2904   LearningRate 0.0510   Epoch: 13   Global Step: 140740   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:52:32,582-Speed 5962.40 samples/sec   Loss 5.3000   LearningRate 0.0510   Epoch: 13   Global Step: 140750   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:52:39,451-Speed 5964.69 samples/sec   Loss 5.2840   LearningRate 0.0510   Epoch: 13   Global Step: 140760   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:52:46,299-Speed 5981.86 samples/sec   Loss 5.2194   LearningRate 0.0509   Epoch: 13   Global Step: 140770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:52:53,159-Speed 5972.26 samples/sec   Loss 5.3279   LearningRate 0.0509   Epoch: 13   Global Step: 140780   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:53:00,059-Speed 5937.58 samples/sec   Loss 5.2789   LearningRate 0.0509   Epoch: 13   Global Step: 140790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:53:06,906-Speed 5983.73 samples/sec   Loss 5.2908   LearningRate 0.0509   Epoch: 13   Global Step: 140800   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:53:13,762-Speed 5974.91 samples/sec   Loss 5.3056   LearningRate 0.0509   Epoch: 13   Global Step: 140810   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:53:20,625-Speed 5969.99 samples/sec   Loss 5.2926   LearningRate 0.0509   Epoch: 13   Global Step: 140820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:53:27,480-Speed 5975.59 samples/sec   Loss 5.2335   LearningRate 0.0509   Epoch: 13   Global Step: 140830   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:53:34,332-Speed 5978.94 samples/sec   Loss 5.1916   LearningRate 0.0508   Epoch: 13   Global Step: 140840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:53:41,193-Speed 5970.97 samples/sec   Loss 5.2631   LearningRate 0.0508   Epoch: 13   Global Step: 140850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:53:48,057-Speed 5968.51 samples/sec   Loss 5.2574   LearningRate 0.0508   Epoch: 13   Global Step: 140860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:53:54,913-Speed 5976.27 samples/sec   Loss 5.2583   LearningRate 0.0508   Epoch: 13   Global Step: 140870   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:54:01,766-Speed 5977.93 samples/sec   Loss 5.2988   LearningRate 0.0508   Epoch: 13   Global Step: 140880   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:54:08,661-Speed 5941.78 samples/sec   Loss 5.2815   LearningRate 0.0508   Epoch: 13   Global Step: 140890   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:54:15,531-Speed 5963.46 samples/sec   Loss 5.2625   LearningRate 0.0507   Epoch: 13   Global Step: 140900   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:54:22,386-Speed 5976.35 samples/sec   Loss 5.3321   LearningRate 0.0507   Epoch: 13   Global Step: 140910   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:54:29,282-Speed 5941.10 samples/sec   Loss 5.3163   LearningRate 0.0507   Epoch: 13   Global Step: 140920   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:54:36,138-Speed 5974.85 samples/sec   Loss 5.3076   LearningRate 0.0507   Epoch: 13   Global Step: 140930   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:54:42,994-Speed 5975.98 samples/sec   Loss 5.2777   LearningRate 0.0507   Epoch: 13   Global Step: 140940   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:54:49,944-Speed 5894.18 samples/sec   Loss 5.2566   LearningRate 0.0507   Epoch: 13   Global Step: 140950   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:54:56,796-Speed 5978.77 samples/sec   Loss 5.2275   LearningRate 0.0507   Epoch: 13   Global Step: 140960   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:55:03,671-Speed 5959.49 samples/sec   Loss 5.2508   LearningRate 0.0506   Epoch: 13   Global Step: 140970   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-08 23:55:10,518-Speed 5983.45 samples/sec   Loss 5.2761   LearningRate 0.0506   Epoch: 13   Global Step: 140980   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:55:17,385-Speed 5965.68 samples/sec   Loss 5.2871   LearningRate 0.0506   Epoch: 13   Global Step: 140990   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:55:24,251-Speed 5966.76 samples/sec   Loss 5.2794   LearningRate 0.0506   Epoch: 13   Global Step: 141000   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:55:31,107-Speed 5975.59 samples/sec   Loss 5.2565   LearningRate 0.0506   Epoch: 13   Global Step: 141010   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:55:37,964-Speed 5975.27 samples/sec   Loss 5.2778   LearningRate 0.0506   Epoch: 13   Global Step: 141020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:55:44,832-Speed 5965.42 samples/sec   Loss 5.2239   LearningRate 0.0506   Epoch: 13   Global Step: 141030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:55:51,710-Speed 5956.14 samples/sec   Loss 5.2176   LearningRate 0.0505   Epoch: 13   Global Step: 141040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:55:58,588-Speed 5956.18 samples/sec   Loss 5.2793   LearningRate 0.0505   Epoch: 13   Global Step: 141050   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:56:05,440-Speed 5978.70 samples/sec   Loss 5.2512   LearningRate 0.0505   Epoch: 13   Global Step: 141060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:56:12,298-Speed 5973.99 samples/sec   Loss 5.2430   LearningRate 0.0505   Epoch: 13   Global Step: 141070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:56:19,157-Speed 5971.80 samples/sec   Loss 5.2519   LearningRate 0.0505   Epoch: 13   Global Step: 141080   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-08 23:56:26,028-Speed 5963.69 samples/sec   Loss 5.2518   LearningRate 0.0505   Epoch: 13   Global Step: 141090   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-08 23:56:32,893-Speed 5967.56 samples/sec   Loss 5.2718   LearningRate 0.0504   Epoch: 13   Global Step: 141100   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-08 23:56:39,736-Speed 5987.13 samples/sec   Loss 5.2220   LearningRate 0.0504   Epoch: 13   Global Step: 141110   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:56:46,596-Speed 5971.91 samples/sec   Loss 5.2406   LearningRate 0.0504   Epoch: 13   Global Step: 141120   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:56:53,458-Speed 5969.71 samples/sec   Loss 5.2650   LearningRate 0.0504   Epoch: 13   Global Step: 141130   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:57:00,318-Speed 5972.34 samples/sec   Loss 5.3432   LearningRate 0.0504   Epoch: 13   Global Step: 141140   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:57:07,176-Speed 5974.12 samples/sec   Loss 5.2869   LearningRate 0.0504   Epoch: 13   Global Step: 141150   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:57:14,030-Speed 5976.41 samples/sec   Loss 5.2305   LearningRate 0.0504   Epoch: 13   Global Step: 141160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:57:20,901-Speed 5962.79 samples/sec   Loss 5.2621   LearningRate 0.0503   Epoch: 13   Global Step: 141170   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:57:27,775-Speed 5962.44 samples/sec   Loss 5.3123   LearningRate 0.0503   Epoch: 13   Global Step: 141180   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:57:34,650-Speed 5958.19 samples/sec   Loss 5.2036   LearningRate 0.0503   Epoch: 13   Global Step: 141190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:57:41,504-Speed 5977.23 samples/sec   Loss 5.2471   LearningRate 0.0503   Epoch: 13   Global Step: 141200   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:57:48,369-Speed 5968.47 samples/sec   Loss 5.2750   LearningRate 0.0503   Epoch: 13   Global Step: 141210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:57:55,230-Speed 5970.69 samples/sec   Loss 5.2633   LearningRate 0.0503   Epoch: 13   Global Step: 141220   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:58:02,091-Speed 5971.52 samples/sec   Loss 5.2882   LearningRate 0.0502   Epoch: 13   Global Step: 141230   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-08 23:58:08,984-Speed 5943.75 samples/sec   Loss 5.2538   LearningRate 0.0502   Epoch: 13   Global Step: 141240   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:58:15,839-Speed 5975.86 samples/sec   Loss 5.2974   LearningRate 0.0502   Epoch: 13   Global Step: 141250   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:58:22,723-Speed 5951.73 samples/sec   Loss 5.2728   LearningRate 0.0502   Epoch: 13   Global Step: 141260   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:58:29,592-Speed 5965.56 samples/sec   Loss 5.2458   LearningRate 0.0502   Epoch: 13   Global Step: 141270   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:58:36,472-Speed 5954.56 samples/sec   Loss 5.2547   LearningRate 0.0502   Epoch: 13   Global Step: 141280   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:58:43,337-Speed 5968.76 samples/sec   Loss 5.2766   LearningRate 0.0502   Epoch: 13   Global Step: 141290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:58:50,223-Speed 5950.29 samples/sec   Loss 5.2292   LearningRate 0.0501   Epoch: 13   Global Step: 141300   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:58:57,091-Speed 5964.55 samples/sec   Loss 5.2340   LearningRate 0.0501   Epoch: 13   Global Step: 141310   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:59:03,946-Speed 5976.27 samples/sec   Loss 5.3079   LearningRate 0.0501   Epoch: 13   Global Step: 141320   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:59:10,823-Speed 5957.70 samples/sec   Loss 5.2585   LearningRate 0.0501   Epoch: 13   Global Step: 141330   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:59:17,684-Speed 5971.84 samples/sec   Loss 5.2514   LearningRate 0.0501   Epoch: 13   Global Step: 141340   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:59:24,641-Speed 5888.51 samples/sec   Loss 5.2961   LearningRate 0.0501   Epoch: 13   Global Step: 141350   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:59:31,626-Speed 5865.74 samples/sec   Loss 5.1953   LearningRate 0.0500   Epoch: 13   Global Step: 141360   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:59:38,542-Speed 5924.17 samples/sec   Loss 5.2300   LearningRate 0.0500   Epoch: 13   Global Step: 141370   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:59:45,405-Speed 5969.14 samples/sec   Loss 5.2094   LearningRate 0.0500   Epoch: 13   Global Step: 141380   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:59:52,262-Speed 5974.91 samples/sec   Loss 5.2897   LearningRate 0.0500   Epoch: 13   Global Step: 141390   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-08 23:59:59,124-Speed 5970.12 samples/sec   Loss 5.2495   LearningRate 0.0500   Epoch: 13   Global Step: 141400   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:00:05,996-Speed 5961.64 samples/sec   Loss 5.2824   LearningRate 0.0500   Epoch: 13   Global Step: 141410   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:00:12,871-Speed 5958.85 samples/sec   Loss 5.2707   LearningRate 0.0500   Epoch: 13   Global Step: 141420   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:00:19,725-Speed 5977.02 samples/sec   Loss 5.2677   LearningRate 0.0499   Epoch: 13   Global Step: 141430   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:00:26,576-Speed 5979.61 samples/sec   Loss 5.2034   LearningRate 0.0499   Epoch: 13   Global Step: 141440   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-09 00:00:33,435-Speed 5973.00 samples/sec   Loss 5.2932   LearningRate 0.0499   Epoch: 13   Global Step: 141450   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:00:40,294-Speed 5971.87 samples/sec   Loss 5.2433   LearningRate 0.0499   Epoch: 13   Global Step: 141460   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:00:47,178-Speed 5953.77 samples/sec   Loss 5.2318   LearningRate 0.0499   Epoch: 13   Global Step: 141470   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:00:54,037-Speed 5972.88 samples/sec   Loss 5.2333   LearningRate 0.0499   Epoch: 13   Global Step: 141480   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:01:00,899-Speed 5969.51 samples/sec   Loss 5.2278   LearningRate 0.0499   Epoch: 13   Global Step: 141490   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:01:07,759-Speed 5971.92 samples/sec   Loss 5.2497   LearningRate 0.0498   Epoch: 13   Global Step: 141500   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:01:14,614-Speed 5977.43 samples/sec   Loss 5.1935   LearningRate 0.0498   Epoch: 13   Global Step: 141510   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:01:21,475-Speed 5970.58 samples/sec   Loss 5.2167   LearningRate 0.0498   Epoch: 13   Global Step: 141520   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:01:28,383-Speed 5930.64 samples/sec   Loss 5.2549   LearningRate 0.0498   Epoch: 13   Global Step: 141530   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:01:35,234-Speed 5979.46 samples/sec   Loss 5.2235   LearningRate 0.0498   Epoch: 13   Global Step: 141540   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:01:42,088-Speed 5977.53 samples/sec   Loss 5.2292   LearningRate 0.0498   Epoch: 13   Global Step: 141550   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:01:48,958-Speed 5962.99 samples/sec   Loss 5.2009   LearningRate 0.0497   Epoch: 13   Global Step: 141560   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:01:55,836-Speed 5956.33 samples/sec   Loss 5.2181   LearningRate 0.0497   Epoch: 13   Global Step: 141570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:02:02,707-Speed 5962.34 samples/sec   Loss 5.1966   LearningRate 0.0497   Epoch: 13   Global Step: 141580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:02:09,593-Speed 5949.37 samples/sec   Loss 5.2426   LearningRate 0.0497   Epoch: 13   Global Step: 141590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:02:16,446-Speed 5978.42 samples/sec   Loss 5.3107   LearningRate 0.0497   Epoch: 13   Global Step: 141600   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:02:23,296-Speed 5981.00 samples/sec   Loss 5.2253   LearningRate 0.0497   Epoch: 13   Global Step: 141610   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:02:30,178-Speed 5952.55 samples/sec   Loss 5.2015   LearningRate 0.0497   Epoch: 13   Global Step: 141620   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:02:37,064-Speed 5950.02 samples/sec   Loss 5.2053   LearningRate 0.0496   Epoch: 13   Global Step: 141630   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:02:43,926-Speed 5970.40 samples/sec   Loss 5.2309   LearningRate 0.0496   Epoch: 13   Global Step: 141640   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:02:50,816-Speed 5947.52 samples/sec   Loss 5.2460   LearningRate 0.0496   Epoch: 13   Global Step: 141650   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:02:57,680-Speed 5968.80 samples/sec   Loss 5.2833   LearningRate 0.0496   Epoch: 13   Global Step: 141660   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:03:04,576-Speed 5940.78 samples/sec   Loss 5.2006   LearningRate 0.0496   Epoch: 13   Global Step: 141670   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:03:11,450-Speed 5959.55 samples/sec   Loss 5.2217   LearningRate 0.0496   Epoch: 13   Global Step: 141680   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:03:18,313-Speed 5970.08 samples/sec   Loss 5.2058   LearningRate 0.0495   Epoch: 13   Global Step: 141690   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:03:25,164-Speed 5979.30 samples/sec   Loss 5.2405   LearningRate 0.0495   Epoch: 13   Global Step: 141700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:03:32,119-Speed 5891.53 samples/sec   Loss 5.2325   LearningRate 0.0495   Epoch: 13   Global Step: 141710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:03:39,086-Speed 5879.71 samples/sec   Loss 5.2103   LearningRate 0.0495   Epoch: 13   Global Step: 141720   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:03:46,042-Speed 5889.93 samples/sec   Loss 5.1875   LearningRate 0.0495   Epoch: 13   Global Step: 141730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:03:52,999-Speed 5889.13 samples/sec   Loss 5.2424   LearningRate 0.0495   Epoch: 13   Global Step: 141740   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:03:59,874-Speed 5959.36 samples/sec   Loss 5.2147   LearningRate 0.0495   Epoch: 13   Global Step: 141750   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:04:06,768-Speed 5941.79 samples/sec   Loss 5.2162   LearningRate 0.0494   Epoch: 13   Global Step: 141760   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:04:13,637-Speed 5964.06 samples/sec   Loss 5.2237   LearningRate 0.0494   Epoch: 13   Global Step: 141770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:04:20,510-Speed 5961.01 samples/sec   Loss 5.2159   LearningRate 0.0494   Epoch: 13   Global Step: 141780   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:04:27,363-Speed 5977.83 samples/sec   Loss 5.2605   LearningRate 0.0494   Epoch: 13   Global Step: 141790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:04:34,219-Speed 5974.57 samples/sec   Loss 5.2199   LearningRate 0.0494   Epoch: 13   Global Step: 141800   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:04:41,077-Speed 5973.88 samples/sec   Loss 5.1932   LearningRate 0.0494   Epoch: 13   Global Step: 141810   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:04:47,951-Speed 5959.66 samples/sec   Loss 5.2810   LearningRate 0.0494   Epoch: 13   Global Step: 141820   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:04:54,800-Speed 5981.94 samples/sec   Loss 5.2504   LearningRate 0.0493   Epoch: 13   Global Step: 141830   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:05:01,658-Speed 5973.81 samples/sec   Loss 5.2195   LearningRate 0.0493   Epoch: 13   Global Step: 141840   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:05:08,514-Speed 5974.40 samples/sec   Loss 5.1869   LearningRate 0.0493   Epoch: 13   Global Step: 141850   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:05:15,380-Speed 5967.50 samples/sec   Loss 5.2641   LearningRate 0.0493   Epoch: 13   Global Step: 141860   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:05:22,281-Speed 5936.61 samples/sec   Loss 5.2022   LearningRate 0.0493   Epoch: 13   Global Step: 141870   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:05:29,129-Speed 5982.60 samples/sec   Loss 5.2580   LearningRate 0.0493   Epoch: 13   Global Step: 141880   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:05:35,995-Speed 5967.28 samples/sec   Loss 5.2345   LearningRate 0.0492   Epoch: 13   Global Step: 141890   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:05:42,851-Speed 5975.30 samples/sec   Loss 5.2285   LearningRate 0.0492   Epoch: 13   Global Step: 141900   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:05:49,728-Speed 5957.41 samples/sec   Loss 5.2266   LearningRate 0.0492   Epoch: 13   Global Step: 141910   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:05:56,594-Speed 5968.97 samples/sec   Loss 5.2174   LearningRate 0.0492   Epoch: 13   Global Step: 141920   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:06:03,442-Speed 5981.92 samples/sec   Loss 5.2038   LearningRate 0.0492   Epoch: 13   Global Step: 141930   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:06:10,302-Speed 5972.17 samples/sec   Loss 5.2119   LearningRate 0.0492   Epoch: 13   Global Step: 141940   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:06:17,161-Speed 5972.98 samples/sec   Loss 5.0970   LearningRate 0.0492   Epoch: 13   Global Step: 141950   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:06:24,026-Speed 5968.58 samples/sec   Loss 5.1620   LearningRate 0.0491   Epoch: 13   Global Step: 141960   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:06:30,876-Speed 5980.39 samples/sec   Loss 5.1613   LearningRate 0.0491   Epoch: 13   Global Step: 141970   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:06:37,726-Speed 5981.42 samples/sec   Loss 5.1390   LearningRate 0.0491   Epoch: 13   Global Step: 141980   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:06:44,691-Speed 5881.61 samples/sec   Loss 5.2138   LearningRate 0.0491   Epoch: 13   Global Step: 141990   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:06:51,558-Speed 5966.41 samples/sec   Loss 5.1495   LearningRate 0.0491   Epoch: 13   Global Step: 142000   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:06:58,409-Speed 5979.80 samples/sec   Loss 5.1770   LearningRate 0.0491   Epoch: 13   Global Step: 142010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:07:05,267-Speed 5973.14 samples/sec   Loss 5.2081   LearningRate 0.0491   Epoch: 13   Global Step: 142020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:07:12,155-Speed 5947.48 samples/sec   Loss 5.1895   LearningRate 0.0490   Epoch: 13   Global Step: 142030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:07:19,030-Speed 5959.55 samples/sec   Loss 5.1704   LearningRate 0.0490   Epoch: 13   Global Step: 142040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:07:25,894-Speed 5968.32 samples/sec   Loss 5.2129   LearningRate 0.0490   Epoch: 13   Global Step: 142050   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:07:32,750-Speed 5975.76 samples/sec   Loss 5.2707   LearningRate 0.0490   Epoch: 13   Global Step: 142060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:07:39,596-Speed 5984.92 samples/sec   Loss 5.1817   LearningRate 0.0490   Epoch: 13   Global Step: 142070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:07:46,481-Speed 5952.66 samples/sec   Loss 5.2333   LearningRate 0.0490   Epoch: 13   Global Step: 142080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:07:53,357-Speed 5958.36 samples/sec   Loss 5.1461   LearningRate 0.0489   Epoch: 13   Global Step: 142090   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:08:00,206-Speed 5981.50 samples/sec   Loss 5.1558   LearningRate 0.0489   Epoch: 13   Global Step: 142100   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:08:07,046-Speed 5989.25 samples/sec   Loss 5.1896   LearningRate 0.0489   Epoch: 13   Global Step: 142110   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:08:13,893-Speed 5982.65 samples/sec   Loss 5.2155   LearningRate 0.0489   Epoch: 13   Global Step: 142120   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:08:20,759-Speed 5966.36 samples/sec   Loss 5.1874   LearningRate 0.0489   Epoch: 13   Global Step: 142130   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:08:27,616-Speed 5975.06 samples/sec   Loss 5.2177   LearningRate 0.0489   Epoch: 13   Global Step: 142140   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:08:34,480-Speed 5968.93 samples/sec   Loss 5.2145   LearningRate 0.0489   Epoch: 13   Global Step: 142150   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:08:41,349-Speed 5963.66 samples/sec   Loss 5.1508   LearningRate 0.0488   Epoch: 13   Global Step: 142160   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:08:48,201-Speed 5982.68 samples/sec   Loss 5.1673   LearningRate 0.0488   Epoch: 13   Global Step: 142170   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:08:55,061-Speed 5971.08 samples/sec   Loss 5.1950   LearningRate 0.0488   Epoch: 13   Global Step: 142180   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:09:01,927-Speed 5967.64 samples/sec   Loss 5.2002   LearningRate 0.0488   Epoch: 13   Global Step: 142190   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:09:08,790-Speed 5969.08 samples/sec   Loss 5.2030   LearningRate 0.0488   Epoch: 13   Global Step: 142200   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:09:15,651-Speed 5971.02 samples/sec   Loss 5.1982   LearningRate 0.0488   Epoch: 13   Global Step: 142210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:09:22,541-Speed 5947.75 samples/sec   Loss 5.1844   LearningRate 0.0488   Epoch: 13   Global Step: 142220   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:09:29,393-Speed 5980.22 samples/sec   Loss 5.2162   LearningRate 0.0487   Epoch: 13   Global Step: 142230   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:09:36,248-Speed 5976.18 samples/sec   Loss 5.1206   LearningRate 0.0487   Epoch: 13   Global Step: 142240   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:09:43,111-Speed 5969.52 samples/sec   Loss 5.2159   LearningRate 0.0487   Epoch: 13   Global Step: 142250   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:09:49,971-Speed 5971.85 samples/sec   Loss 5.1812   LearningRate 0.0487   Epoch: 13   Global Step: 142260   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:09:56,825-Speed 5976.37 samples/sec   Loss 5.2055   LearningRate 0.0487   Epoch: 13   Global Step: 142270   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:10:03,686-Speed 5971.65 samples/sec   Loss 5.2197   LearningRate 0.0487   Epoch: 13   Global Step: 142280   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:10:10,542-Speed 5975.32 samples/sec   Loss 5.1388   LearningRate 0.0486   Epoch: 13   Global Step: 142290   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:10:17,422-Speed 5957.53 samples/sec   Loss 5.1633   LearningRate 0.0486   Epoch: 13   Global Step: 142300   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:10:24,307-Speed 5950.77 samples/sec   Loss 5.2419   LearningRate 0.0486   Epoch: 13   Global Step: 142310   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:10:31,156-Speed 5981.41 samples/sec   Loss 5.1573   LearningRate 0.0486   Epoch: 13   Global Step: 142320   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:10:38,009-Speed 5978.28 samples/sec   Loss 5.1763   LearningRate 0.0486   Epoch: 13   Global Step: 142330   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:10:44,900-Speed 5944.39 samples/sec   Loss 5.1490   LearningRate 0.0486   Epoch: 13   Global Step: 142340   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:10:51,768-Speed 5965.75 samples/sec   Loss 5.1823   LearningRate 0.0486   Epoch: 13   Global Step: 142350   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:10:58,623-Speed 5975.51 samples/sec   Loss 5.1730   LearningRate 0.0485   Epoch: 13   Global Step: 142360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:11:05,474-Speed 5980.15 samples/sec   Loss 5.1581   LearningRate 0.0485   Epoch: 13   Global Step: 142370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:11:12,335-Speed 5970.42 samples/sec   Loss 5.1604   LearningRate 0.0485   Epoch: 13   Global Step: 142380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:11:19,186-Speed 5980.46 samples/sec   Loss 5.1722   LearningRate 0.0485   Epoch: 13   Global Step: 142390   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:11:26,043-Speed 5973.96 samples/sec   Loss 5.2063   LearningRate 0.0485   Epoch: 13   Global Step: 142400   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:11:32,919-Speed 5959.03 samples/sec   Loss 5.0939   LearningRate 0.0485   Epoch: 13   Global Step: 142410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:11:39,787-Speed 5964.75 samples/sec   Loss 5.1841   LearningRate 0.0485   Epoch: 13   Global Step: 142420   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:11:46,650-Speed 5969.64 samples/sec   Loss 5.1828   LearningRate 0.0484   Epoch: 13   Global Step: 142430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:11:53,508-Speed 5973.16 samples/sec   Loss 5.1870   LearningRate 0.0484   Epoch: 13   Global Step: 142440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:12:00,364-Speed 5975.28 samples/sec   Loss 5.2308   LearningRate 0.0484   Epoch: 13   Global Step: 142450   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:12:07,212-Speed 5982.69 samples/sec   Loss 5.1779   LearningRate 0.0484   Epoch: 13   Global Step: 142460   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:12:14,055-Speed 5987.05 samples/sec   Loss 5.1158   LearningRate 0.0484   Epoch: 13   Global Step: 142470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:12:20,903-Speed 5981.59 samples/sec   Loss 5.1842   LearningRate 0.0484   Epoch: 13   Global Step: 142480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:12:27,771-Speed 5965.21 samples/sec   Loss 5.1392   LearningRate 0.0484   Epoch: 13   Global Step: 142490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:12:34,626-Speed 5976.88 samples/sec   Loss 5.1781   LearningRate 0.0483   Epoch: 13   Global Step: 142500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:12:41,484-Speed 5976.92 samples/sec   Loss 5.1388   LearningRate 0.0483   Epoch: 13   Global Step: 142510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:12:48,353-Speed 5964.17 samples/sec   Loss 5.1182   LearningRate 0.0483   Epoch: 13   Global Step: 142520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:12:55,232-Speed 5956.28 samples/sec   Loss 5.1756   LearningRate 0.0483   Epoch: 13   Global Step: 142530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:13:02,083-Speed 5978.60 samples/sec   Loss 5.1738   LearningRate 0.0483   Epoch: 13   Global Step: 142540   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:13:08,939-Speed 5976.24 samples/sec   Loss 5.1742   LearningRate 0.0483   Epoch: 13   Global Step: 142550   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:13:15,779-Speed 5989.76 samples/sec   Loss 5.1076   LearningRate 0.0482   Epoch: 13   Global Step: 142560   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:13:22,626-Speed 5982.86 samples/sec   Loss 5.2084   LearningRate 0.0482   Epoch: 13   Global Step: 142570   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:13:29,480-Speed 5976.99 samples/sec   Loss 5.1321   LearningRate 0.0482   Epoch: 13   Global Step: 142580   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:13:36,319-Speed 5990.22 samples/sec   Loss 5.0968   LearningRate 0.0482   Epoch: 13   Global Step: 142590   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:13:43,170-Speed 5980.44 samples/sec   Loss 5.1884   LearningRate 0.0482   Epoch: 13   Global Step: 142600   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:13:50,021-Speed 5979.72 samples/sec   Loss 5.1394   LearningRate 0.0482   Epoch: 13   Global Step: 142610   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:13:56,889-Speed 5965.13 samples/sec   Loss 5.1875   LearningRate 0.0482   Epoch: 13   Global Step: 142620   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:14:03,777-Speed 5947.51 samples/sec   Loss 5.1681   LearningRate 0.0481   Epoch: 13   Global Step: 142630   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:14:10,646-Speed 5964.59 samples/sec   Loss 5.2056   LearningRate 0.0481   Epoch: 13   Global Step: 142640   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-01-09 00:14:17,501-Speed 5976.68 samples/sec   Loss 5.1441   LearningRate 0.0481   Epoch: 13   Global Step: 142650   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:14:24,352-Speed 5979.61 samples/sec   Loss 5.1800   LearningRate 0.0481   Epoch: 13   Global Step: 142660   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:14:31,211-Speed 5972.91 samples/sec   Loss 5.1740   LearningRate 0.0481   Epoch: 13   Global Step: 142670   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:14:38,117-Speed 5932.29 samples/sec   Loss 5.1460   LearningRate 0.0481   Epoch: 13   Global Step: 142680   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:14:44,982-Speed 5967.41 samples/sec   Loss 5.1295   LearningRate 0.0481   Epoch: 13   Global Step: 142690   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:14:51,845-Speed 5969.60 samples/sec   Loss 5.1482   LearningRate 0.0480   Epoch: 13   Global Step: 142700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:14:58,692-Speed 5983.77 samples/sec   Loss 5.1089   LearningRate 0.0480   Epoch: 13   Global Step: 142710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:15:05,550-Speed 5973.15 samples/sec   Loss 5.0913   LearningRate 0.0480   Epoch: 13   Global Step: 142720   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:15:12,415-Speed 5968.28 samples/sec   Loss 5.1711   LearningRate 0.0480   Epoch: 13   Global Step: 142730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:15:19,291-Speed 5958.29 samples/sec   Loss 5.1623   LearningRate 0.0480   Epoch: 13   Global Step: 142740   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:15:26,175-Speed 5951.27 samples/sec   Loss 5.1782   LearningRate 0.0480   Epoch: 13   Global Step: 142750   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:15:33,063-Speed 5947.89 samples/sec   Loss 5.1978   LearningRate 0.0479   Epoch: 13   Global Step: 142760   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:15:39,911-Speed 5982.33 samples/sec   Loss 5.2056   LearningRate 0.0479   Epoch: 13   Global Step: 142770   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:15:46,762-Speed 5979.54 samples/sec   Loss 5.1838   LearningRate 0.0479   Epoch: 13   Global Step: 142780   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:15:53,605-Speed 5987.16 samples/sec   Loss 5.1109   LearningRate 0.0479   Epoch: 13   Global Step: 142790   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:16:00,470-Speed 5967.76 samples/sec   Loss 5.1121   LearningRate 0.0479   Epoch: 13   Global Step: 142800   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:16:07,315-Speed 5984.23 samples/sec   Loss 5.1189   LearningRate 0.0479   Epoch: 13   Global Step: 142810   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:16:14,155-Speed 5989.39 samples/sec   Loss 5.2065   LearningRate 0.0479   Epoch: 13   Global Step: 142820   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:16:21,020-Speed 5967.93 samples/sec   Loss 5.1691   LearningRate 0.0478   Epoch: 13   Global Step: 142830   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:16:27,915-Speed 5942.28 samples/sec   Loss 5.1724   LearningRate 0.0478   Epoch: 13   Global Step: 142840   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:16:34,777-Speed 5970.37 samples/sec   Loss 5.1517   LearningRate 0.0478   Epoch: 13   Global Step: 142850   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-09 00:16:41,625-Speed 5981.82 samples/sec   Loss 5.1748   LearningRate 0.0478   Epoch: 13   Global Step: 142860   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:16:48,476-Speed 5979.88 samples/sec   Loss 5.1639   LearningRate 0.0478   Epoch: 13   Global Step: 142870   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:16:55,329-Speed 5978.32 samples/sec   Loss 5.1548   LearningRate 0.0478   Epoch: 13   Global Step: 142880   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:17:02,186-Speed 5977.01 samples/sec   Loss 5.1228   LearningRate 0.0478   Epoch: 13   Global Step: 142890   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:17:09,040-Speed 5977.51 samples/sec   Loss 5.1076   LearningRate 0.0477   Epoch: 13   Global Step: 142900   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:17:15,897-Speed 5974.78 samples/sec   Loss 5.1497   LearningRate 0.0477   Epoch: 13   Global Step: 142910   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:17:22,756-Speed 5972.87 samples/sec   Loss 5.1413   LearningRate 0.0477   Epoch: 13   Global Step: 142920   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:17:29,612-Speed 5975.14 samples/sec   Loss 5.1643   LearningRate 0.0477   Epoch: 13   Global Step: 142930   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:17:36,480-Speed 5965.14 samples/sec   Loss 5.1497   LearningRate 0.0477   Epoch: 13   Global Step: 142940   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:17:43,349-Speed 5964.58 samples/sec   Loss 5.1286   LearningRate 0.0477   Epoch: 13   Global Step: 142950   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:17:50,217-Speed 5964.51 samples/sec   Loss 5.1652   LearningRate 0.0477   Epoch: 13   Global Step: 142960   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-01-09 00:17:57,086-Speed 5964.67 samples/sec   Loss 5.1626   LearningRate 0.0476   Epoch: 13   Global Step: 142970   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:18:03,942-Speed 5975.19 samples/sec   Loss 5.1327   LearningRate 0.0476   Epoch: 13   Global Step: 142980   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:18:10,807-Speed 5967.78 samples/sec   Loss 5.1719   LearningRate 0.0476   Epoch: 13   Global Step: 142990   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:18:17,677-Speed 5963.35 samples/sec   Loss 5.0994   LearningRate 0.0476   Epoch: 13   Global Step: 143000   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:18:24,611-Speed 5909.21 samples/sec   Loss 5.1266   LearningRate 0.0476   Epoch: 13   Global Step: 143010   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:18:31,477-Speed 5966.32 samples/sec   Loss 5.1343   LearningRate 0.0476   Epoch: 13   Global Step: 143020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:18:38,340-Speed 5969.81 samples/sec   Loss 5.1057   LearningRate 0.0475   Epoch: 13   Global Step: 143030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:18:45,220-Speed 5954.44 samples/sec   Loss 5.1197   LearningRate 0.0475   Epoch: 13   Global Step: 143040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:18:52,085-Speed 5967.86 samples/sec   Loss 5.0874   LearningRate 0.0475   Epoch: 13   Global Step: 143050   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:18:58,963-Speed 5958.47 samples/sec   Loss 5.1138   LearningRate 0.0475   Epoch: 13   Global Step: 143060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:19:05,813-Speed 5980.76 samples/sec   Loss 5.1740   LearningRate 0.0475   Epoch: 13   Global Step: 143070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:19:12,663-Speed 5980.39 samples/sec   Loss 5.1257   LearningRate 0.0475   Epoch: 13   Global Step: 143080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:19:19,532-Speed 5964.57 samples/sec   Loss 5.1001   LearningRate 0.0475   Epoch: 13   Global Step: 143090   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:19:26,395-Speed 5969.79 samples/sec   Loss 5.1148   LearningRate 0.0474   Epoch: 13   Global Step: 143100   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:19:33,237-Speed 5987.81 samples/sec   Loss 5.0989   LearningRate 0.0474   Epoch: 13   Global Step: 143110   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:19:40,089-Speed 5978.99 samples/sec   Loss 5.1261   LearningRate 0.0474   Epoch: 13   Global Step: 143120   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:19:46,966-Speed 5957.91 samples/sec   Loss 5.1512   LearningRate 0.0474   Epoch: 13   Global Step: 143130   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:19:53,815-Speed 5980.44 samples/sec   Loss 5.0908   LearningRate 0.0474   Epoch: 13   Global Step: 143140   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:20:00,677-Speed 5970.79 samples/sec   Loss 5.1768   LearningRate 0.0474   Epoch: 13   Global Step: 143150   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:20:07,529-Speed 5979.42 samples/sec   Loss 5.1068   LearningRate 0.0474   Epoch: 13   Global Step: 143160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:20:14,465-Speed 5905.83 samples/sec   Loss 5.1043   LearningRate 0.0473   Epoch: 13   Global Step: 143170   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:20:21,323-Speed 5974.69 samples/sec   Loss 5.1075   LearningRate 0.0473   Epoch: 13   Global Step: 143180   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:20:28,178-Speed 5975.90 samples/sec   Loss 5.1272   LearningRate 0.0473   Epoch: 13   Global Step: 143190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:20:35,019-Speed 5988.15 samples/sec   Loss 5.1110   LearningRate 0.0473   Epoch: 13   Global Step: 143200   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:20:41,876-Speed 5975.51 samples/sec   Loss 5.1020   LearningRate 0.0473   Epoch: 13   Global Step: 143210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:20:48,732-Speed 5975.53 samples/sec   Loss 5.1223   LearningRate 0.0473   Epoch: 13   Global Step: 143220   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:20:55,635-Speed 5934.53 samples/sec   Loss 5.1181   LearningRate 0.0473   Epoch: 13   Global Step: 143230   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:21:02,500-Speed 5967.87 samples/sec   Loss 5.1100   LearningRate 0.0472   Epoch: 13   Global Step: 143240   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:21:09,352-Speed 5978.70 samples/sec   Loss 5.0680   LearningRate 0.0472   Epoch: 13   Global Step: 143250   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:21:16,213-Speed 5970.98 samples/sec   Loss 5.1328   LearningRate 0.0472   Epoch: 13   Global Step: 143260   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:21:23,059-Speed 5984.14 samples/sec   Loss 5.0852   LearningRate 0.0472   Epoch: 13   Global Step: 143270   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:21:29,917-Speed 5974.31 samples/sec   Loss 5.0873   LearningRate 0.0472   Epoch: 13   Global Step: 143280   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:21:36,767-Speed 5979.52 samples/sec   Loss 5.1010   LearningRate 0.0472   Epoch: 13   Global Step: 143290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:21:43,644-Speed 5957.89 samples/sec   Loss 5.1977   LearningRate 0.0472   Epoch: 13   Global Step: 143300   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:21:50,512-Speed 5965.79 samples/sec   Loss 5.0846   LearningRate 0.0471   Epoch: 13   Global Step: 143310   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:21:57,381-Speed 5964.06 samples/sec   Loss 5.0713   LearningRate 0.0471   Epoch: 13   Global Step: 143320   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:22:04,243-Speed 5970.55 samples/sec   Loss 5.0892   LearningRate 0.0471   Epoch: 13   Global Step: 143330   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:22:11,085-Speed 5987.33 samples/sec   Loss 5.0694   LearningRate 0.0471   Epoch: 13   Global Step: 143340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:22:17,937-Speed 5979.25 samples/sec   Loss 5.1207   LearningRate 0.0471   Epoch: 13   Global Step: 143350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:22:24,777-Speed 5989.16 samples/sec   Loss 5.0831   LearningRate 0.0471   Epoch: 13   Global Step: 143360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:22:31,630-Speed 5977.94 samples/sec   Loss 5.1051   LearningRate 0.0470   Epoch: 13   Global Step: 143370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:22:38,485-Speed 5976.51 samples/sec   Loss 5.1561   LearningRate 0.0470   Epoch: 13   Global Step: 143380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:22:45,349-Speed 5968.61 samples/sec   Loss 5.0759   LearningRate 0.0470   Epoch: 13   Global Step: 143390   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:22:52,240-Speed 5946.62 samples/sec   Loss 5.1527   LearningRate 0.0470   Epoch: 13   Global Step: 143400   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:22:59,091-Speed 5979.15 samples/sec   Loss 5.1274   LearningRate 0.0470   Epoch: 13   Global Step: 143410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:23:05,960-Speed 5963.95 samples/sec   Loss 5.1643   LearningRate 0.0470   Epoch: 13   Global Step: 143420   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:23:12,816-Speed 5975.66 samples/sec   Loss 5.0789   LearningRate 0.0470   Epoch: 13   Global Step: 143430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:23:19,678-Speed 5970.13 samples/sec   Loss 5.1087   LearningRate 0.0469   Epoch: 13   Global Step: 143440   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:23:26,533-Speed 5977.27 samples/sec   Loss 5.0902   LearningRate 0.0469   Epoch: 13   Global Step: 143450   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:23:33,391-Speed 5973.32 samples/sec   Loss 5.1453   LearningRate 0.0469   Epoch: 13   Global Step: 143460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:23:40,246-Speed 5976.20 samples/sec   Loss 5.0868   LearningRate 0.0469   Epoch: 13   Global Step: 143470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:23:47,103-Speed 5974.60 samples/sec   Loss 5.1397   LearningRate 0.0469   Epoch: 13   Global Step: 143480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:23:53,985-Speed 5952.82 samples/sec   Loss 5.1168   LearningRate 0.0469   Epoch: 13   Global Step: 143490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:24:00,839-Speed 5977.33 samples/sec   Loss 5.0587   LearningRate 0.0469   Epoch: 13   Global Step: 143500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:24:07,707-Speed 5965.05 samples/sec   Loss 5.0714   LearningRate 0.0468   Epoch: 13   Global Step: 143510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:24:14,563-Speed 5977.03 samples/sec   Loss 5.0916   LearningRate 0.0468   Epoch: 13   Global Step: 143520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:24:21,420-Speed 5973.45 samples/sec   Loss 5.0558   LearningRate 0.0468   Epoch: 13   Global Step: 143530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:24:28,296-Speed 5958.43 samples/sec   Loss 5.1004   LearningRate 0.0468   Epoch: 13   Global Step: 143540   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:24:35,145-Speed 5981.77 samples/sec   Loss 5.0779   LearningRate 0.0468   Epoch: 13   Global Step: 143550   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-01-09 00:24:41,995-Speed 5982.31 samples/sec   Loss 5.0706   LearningRate 0.0468   Epoch: 13   Global Step: 143560   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:24:48,858-Speed 5972.17 samples/sec   Loss 5.0798   LearningRate 0.0468   Epoch: 13   Global Step: 143570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:24:55,740-Speed 5952.63 samples/sec   Loss 5.0933   LearningRate 0.0467   Epoch: 13   Global Step: 143580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:25:02,611-Speed 5962.83 samples/sec   Loss 5.0944   LearningRate 0.0467   Epoch: 13   Global Step: 143590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:25:09,476-Speed 5967.86 samples/sec   Loss 5.0749   LearningRate 0.0467   Epoch: 13   Global Step: 143600   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:25:16,367-Speed 5945.05 samples/sec   Loss 5.1333   LearningRate 0.0467   Epoch: 13   Global Step: 143610   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-01-09 00:25:23,258-Speed 5945.33 samples/sec   Loss 5.0475   LearningRate 0.0467   Epoch: 13   Global Step: 143620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:25:30,111-Speed 5978.23 samples/sec   Loss 5.0785   LearningRate 0.0467   Epoch: 13   Global Step: 143630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:25:36,992-Speed 5953.28 samples/sec   Loss 5.0842   LearningRate 0.0467   Epoch: 13   Global Step: 143640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:25:43,838-Speed 5984.42 samples/sec   Loss 5.0637   LearningRate 0.0466   Epoch: 13   Global Step: 143650   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:25:50,706-Speed 5964.90 samples/sec   Loss 5.0691   LearningRate 0.0466   Epoch: 13   Global Step: 143660   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:25:57,577-Speed 5962.51 samples/sec   Loss 5.1216   LearningRate 0.0466   Epoch: 13   Global Step: 143670   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:26:04,427-Speed 5981.17 samples/sec   Loss 5.0756   LearningRate 0.0466   Epoch: 13   Global Step: 143680   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:26:11,279-Speed 5978.87 samples/sec   Loss 5.0604   LearningRate 0.0466   Epoch: 13   Global Step: 143690   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:26:18,129-Speed 5980.82 samples/sec   Loss 5.0679   LearningRate 0.0466   Epoch: 13   Global Step: 143700   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:26:24,993-Speed 5968.13 samples/sec   Loss 5.0887   LearningRate 0.0465   Epoch: 13   Global Step: 143710   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:26:31,855-Speed 5970.42 samples/sec   Loss 5.0929   LearningRate 0.0465   Epoch: 13   Global Step: 143720   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:26:38,730-Speed 5961.41 samples/sec   Loss 5.0645   LearningRate 0.0465   Epoch: 13   Global Step: 143730   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:26:45,580-Speed 5980.67 samples/sec   Loss 5.0862   LearningRate 0.0465   Epoch: 13   Global Step: 143740   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:26:52,433-Speed 5978.39 samples/sec   Loss 5.0749   LearningRate 0.0465   Epoch: 13   Global Step: 143750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:26:59,293-Speed 5972.13 samples/sec   Loss 5.1014   LearningRate 0.0465   Epoch: 13   Global Step: 143760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:27:06,166-Speed 5960.28 samples/sec   Loss 5.1009   LearningRate 0.0465   Epoch: 13   Global Step: 143770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:27:13,044-Speed 5956.35 samples/sec   Loss 5.0499   LearningRate 0.0464   Epoch: 13   Global Step: 143780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:27:19,898-Speed 5977.46 samples/sec   Loss 5.0544   LearningRate 0.0464   Epoch: 13   Global Step: 143790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:27:26,773-Speed 5958.81 samples/sec   Loss 5.0450   LearningRate 0.0464   Epoch: 13   Global Step: 143800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:27:33,646-Speed 5960.64 samples/sec   Loss 5.0598   LearningRate 0.0464   Epoch: 13   Global Step: 143810   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:27:40,514-Speed 5965.73 samples/sec   Loss 5.0375   LearningRate 0.0464   Epoch: 13   Global Step: 143820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:27:47,364-Speed 5980.25 samples/sec   Loss 5.0828   LearningRate 0.0464   Epoch: 13   Global Step: 143830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:27:54,238-Speed 5959.28 samples/sec   Loss 5.0504   LearningRate 0.0464   Epoch: 13   Global Step: 143840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:28:01,106-Speed 5965.95 samples/sec   Loss 5.0272   LearningRate 0.0463   Epoch: 13   Global Step: 143850   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-01-09 00:28:07,957-Speed 5979.13 samples/sec   Loss 5.1123   LearningRate 0.0463   Epoch: 13   Global Step: 143860   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:28:14,813-Speed 5976.14 samples/sec   Loss 4.9970   LearningRate 0.0463   Epoch: 13   Global Step: 143870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:28:21,664-Speed 5980.14 samples/sec   Loss 5.1186   LearningRate 0.0463   Epoch: 13   Global Step: 143880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:28:28,523-Speed 5972.56 samples/sec   Loss 5.0488   LearningRate 0.0463   Epoch: 13   Global Step: 143890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:28:35,409-Speed 5950.33 samples/sec   Loss 5.0838   LearningRate 0.0463   Epoch: 13   Global Step: 143900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:28:42,274-Speed 5968.24 samples/sec   Loss 5.0167   LearningRate 0.0463   Epoch: 13   Global Step: 143910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:28:49,134-Speed 5971.58 samples/sec   Loss 5.0800   LearningRate 0.0462   Epoch: 13   Global Step: 143920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:28:55,986-Speed 5979.48 samples/sec   Loss 5.0552   LearningRate 0.0462   Epoch: 13   Global Step: 143930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:29:02,837-Speed 5979.07 samples/sec   Loss 5.0618   LearningRate 0.0462   Epoch: 13   Global Step: 143940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:29:09,690-Speed 5978.59 samples/sec   Loss 5.0723   LearningRate 0.0462   Epoch: 13   Global Step: 143950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:29:16,548-Speed 5974.01 samples/sec   Loss 5.0215   LearningRate 0.0462   Epoch: 13   Global Step: 143960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:29:23,418-Speed 5963.19 samples/sec   Loss 5.0540   LearningRate 0.0462   Epoch: 13   Global Step: 143970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:29:30,277-Speed 5972.17 samples/sec   Loss 5.1210   LearningRate 0.0462   Epoch: 13   Global Step: 143980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:29:37,182-Speed 5935.40 samples/sec   Loss 5.0617   LearningRate 0.0461   Epoch: 13   Global Step: 143990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:29:44,070-Speed 5947.03 samples/sec   Loss 5.0952   LearningRate 0.0461   Epoch: 13   Global Step: 144000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:29:50,978-Speed 5929.86 samples/sec   Loss 5.0171   LearningRate 0.0461   Epoch: 13   Global Step: 144010   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:29:57,864-Speed 5950.42 samples/sec   Loss 5.1673   LearningRate 0.0461   Epoch: 13   Global Step: 144020   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:30:04,721-Speed 5974.57 samples/sec   Loss 5.0658   LearningRate 0.0461   Epoch: 13   Global Step: 144030   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:30:11,575-Speed 5977.32 samples/sec   Loss 5.0733   LearningRate 0.0461   Epoch: 13   Global Step: 144040   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:30:18,474-Speed 5938.38 samples/sec   Loss 5.1121   LearningRate 0.0461   Epoch: 13   Global Step: 144050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:30:25,332-Speed 5973.68 samples/sec   Loss 5.0318   LearningRate 0.0460   Epoch: 13   Global Step: 144060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:30:32,206-Speed 5960.30 samples/sec   Loss 5.0753   LearningRate 0.0460   Epoch: 13   Global Step: 144070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:30:39,063-Speed 5974.54 samples/sec   Loss 5.0557   LearningRate 0.0460   Epoch: 13   Global Step: 144080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:30:45,936-Speed 5960.51 samples/sec   Loss 5.0475   LearningRate 0.0460   Epoch: 13   Global Step: 144090   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:30:52,804-Speed 5964.42 samples/sec   Loss 5.0650   LearningRate 0.0460   Epoch: 13   Global Step: 144100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:30:59,693-Speed 5948.17 samples/sec   Loss 5.0595   LearningRate 0.0460   Epoch: 13   Global Step: 144110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:31:06,543-Speed 5981.83 samples/sec   Loss 5.0345   LearningRate 0.0460   Epoch: 13   Global Step: 144120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:31:13,409-Speed 5965.80 samples/sec   Loss 5.0524   LearningRate 0.0459   Epoch: 13   Global Step: 144130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:31:20,256-Speed 5983.63 samples/sec   Loss 5.0759   LearningRate 0.0459   Epoch: 13   Global Step: 144140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:31:27,114-Speed 5973.47 samples/sec   Loss 5.0225   LearningRate 0.0459   Epoch: 13   Global Step: 144150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:31:33,998-Speed 5953.62 samples/sec   Loss 5.0692   LearningRate 0.0459   Epoch: 13   Global Step: 144160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:31:40,882-Speed 5950.88 samples/sec   Loss 5.0080   LearningRate 0.0459   Epoch: 13   Global Step: 144170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:31:47,741-Speed 5973.44 samples/sec   Loss 5.0686   LearningRate 0.0459   Epoch: 13   Global Step: 144180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:31:54,608-Speed 5966.03 samples/sec   Loss 5.0187   LearningRate 0.0458   Epoch: 13   Global Step: 144190   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:32:01,489-Speed 5953.47 samples/sec   Loss 5.1009   LearningRate 0.0458   Epoch: 13   Global Step: 144200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:32:08,354-Speed 5970.89 samples/sec   Loss 5.0916   LearningRate 0.0458   Epoch: 13   Global Step: 144210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:32:15,251-Speed 5939.36 samples/sec   Loss 5.0056   LearningRate 0.0458   Epoch: 13   Global Step: 144220   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:32:22,139-Speed 5947.94 samples/sec   Loss 4.9850   LearningRate 0.0458   Epoch: 13   Global Step: 144230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:32:29,062-Speed 5921.28 samples/sec   Loss 5.0395   LearningRate 0.0458   Epoch: 13   Global Step: 144240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:32:35,944-Speed 5952.93 samples/sec   Loss 5.0796   LearningRate 0.0458   Epoch: 13   Global Step: 144250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:32:42,803-Speed 5975.05 samples/sec   Loss 5.0337   LearningRate 0.0457   Epoch: 13   Global Step: 144260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:32:49,673-Speed 5964.29 samples/sec   Loss 5.0299   LearningRate 0.0457   Epoch: 13   Global Step: 144270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:32:56,537-Speed 5968.36 samples/sec   Loss 5.0359   LearningRate 0.0457   Epoch: 13   Global Step: 144280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:33:03,389-Speed 5978.58 samples/sec   Loss 5.0107   LearningRate 0.0457   Epoch: 13   Global Step: 144290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:33:10,304-Speed 5924.87 samples/sec   Loss 5.0389   LearningRate 0.0457   Epoch: 13   Global Step: 144300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:33:17,194-Speed 5945.53 samples/sec   Loss 5.0806   LearningRate 0.0457   Epoch: 13   Global Step: 144310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:33:24,062-Speed 5965.05 samples/sec   Loss 5.0770   LearningRate 0.0457   Epoch: 13   Global Step: 144320   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:33:30,940-Speed 5956.95 samples/sec   Loss 5.0422   LearningRate 0.0456   Epoch: 13   Global Step: 144330   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:33:37,820-Speed 5954.71 samples/sec   Loss 4.9995   LearningRate 0.0456   Epoch: 13   Global Step: 144340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:33:44,742-Speed 5918.62 samples/sec   Loss 5.0233   LearningRate 0.0456   Epoch: 13   Global Step: 144350   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:33:51,653-Speed 5928.21 samples/sec   Loss 5.0290   LearningRate 0.0456   Epoch: 13   Global Step: 144360   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:33:58,548-Speed 5941.40 samples/sec   Loss 5.0596   LearningRate 0.0456   Epoch: 13   Global Step: 144370   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:34:05,414-Speed 5967.86 samples/sec   Loss 5.0296   LearningRate 0.0456   Epoch: 13   Global Step: 144380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:34:12,274-Speed 5972.33 samples/sec   Loss 5.0659   LearningRate 0.0456   Epoch: 13   Global Step: 144390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:34:19,136-Speed 5969.92 samples/sec   Loss 4.9963   LearningRate 0.0455   Epoch: 13   Global Step: 144400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:34:25,996-Speed 5976.36 samples/sec   Loss 5.0375   LearningRate 0.0455   Epoch: 13   Global Step: 144410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:34:32,861-Speed 5968.03 samples/sec   Loss 5.0184   LearningRate 0.0455   Epoch: 13   Global Step: 144420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:34:39,714-Speed 5977.84 samples/sec   Loss 5.0365   LearningRate 0.0455   Epoch: 13   Global Step: 144430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:34:46,560-Speed 5983.99 samples/sec   Loss 5.0722   LearningRate 0.0455   Epoch: 13   Global Step: 144440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:34:53,401-Speed 5988.67 samples/sec   Loss 5.0516   LearningRate 0.0455   Epoch: 13   Global Step: 144450   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:35:00,257-Speed 5975.30 samples/sec   Loss 5.0433   LearningRate 0.0455   Epoch: 13   Global Step: 144460   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:35:07,114-Speed 5974.13 samples/sec   Loss 5.0525   LearningRate 0.0454   Epoch: 13   Global Step: 144470   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:35:13,986-Speed 5962.02 samples/sec   Loss 5.0519   LearningRate 0.0454   Epoch: 13   Global Step: 144480   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:35:20,844-Speed 5972.57 samples/sec   Loss 5.0352   LearningRate 0.0454   Epoch: 13   Global Step: 144490   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:35:27,717-Speed 5961.13 samples/sec   Loss 5.0352   LearningRate 0.0454   Epoch: 13   Global Step: 144500   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:35:34,576-Speed 5975.24 samples/sec   Loss 4.9680   LearningRate 0.0454   Epoch: 13   Global Step: 144510   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:35:41,435-Speed 5971.53 samples/sec   Loss 5.0763   LearningRate 0.0454   Epoch: 13   Global Step: 144520   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:35:48,405-Speed 5878.38 samples/sec   Loss 5.0063   LearningRate 0.0454   Epoch: 13   Global Step: 144530   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:35:55,259-Speed 5977.34 samples/sec   Loss 4.9791   LearningRate 0.0453   Epoch: 13   Global Step: 144540   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:36:02,130-Speed 5963.01 samples/sec   Loss 5.0134   LearningRate 0.0453   Epoch: 13   Global Step: 144550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:36:09,014-Speed 5951.44 samples/sec   Loss 4.9710   LearningRate 0.0453   Epoch: 13   Global Step: 144560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:36:15,878-Speed 5969.33 samples/sec   Loss 5.0623   LearningRate 0.0453   Epoch: 13   Global Step: 144570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:36:22,754-Speed 5957.74 samples/sec   Loss 5.0623   LearningRate 0.0453   Epoch: 13   Global Step: 144580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:36:29,614-Speed 5972.49 samples/sec   Loss 5.0470   LearningRate 0.0453   Epoch: 13   Global Step: 144590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:36:36,454-Speed 5989.11 samples/sec   Loss 5.0487   LearningRate 0.0453   Epoch: 13   Global Step: 144600   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:36:43,311-Speed 5974.49 samples/sec   Loss 5.0019   LearningRate 0.0452   Epoch: 13   Global Step: 144610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:36:50,157-Speed 5984.10 samples/sec   Loss 5.0195   LearningRate 0.0452   Epoch: 13   Global Step: 144620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:36:57,007-Speed 5983.40 samples/sec   Loss 5.0451   LearningRate 0.0452   Epoch: 13   Global Step: 144630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:37:03,907-Speed 5937.47 samples/sec   Loss 5.0437   LearningRate 0.0452   Epoch: 13   Global Step: 144640   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:37:10,781-Speed 5960.23 samples/sec   Loss 5.0599   LearningRate 0.0452   Epoch: 13   Global Step: 144650   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:37:17,680-Speed 5938.64 samples/sec   Loss 5.0148   LearningRate 0.0452   Epoch: 13   Global Step: 144660   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:37:24,527-Speed 5982.84 samples/sec   Loss 5.0456   LearningRate 0.0452   Epoch: 13   Global Step: 144670   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:37:31,390-Speed 5971.39 samples/sec   Loss 4.9822   LearningRate 0.0451   Epoch: 13   Global Step: 144680   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:37:38,249-Speed 5973.18 samples/sec   Loss 5.0040   LearningRate 0.0451   Epoch: 13   Global Step: 144690   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:37:45,110-Speed 5970.39 samples/sec   Loss 4.9982   LearningRate 0.0451   Epoch: 13   Global Step: 144700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:37:51,968-Speed 5974.19 samples/sec   Loss 5.0046   LearningRate 0.0451   Epoch: 13   Global Step: 144710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:37:58,833-Speed 5967.92 samples/sec   Loss 5.0377   LearningRate 0.0451   Epoch: 13   Global Step: 144720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:38:05,687-Speed 5977.33 samples/sec   Loss 5.0074   LearningRate 0.0451   Epoch: 13   Global Step: 144730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:38:12,542-Speed 5975.87 samples/sec   Loss 5.0064   LearningRate 0.0451   Epoch: 13   Global Step: 144740   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:38:19,404-Speed 5971.12 samples/sec   Loss 4.9829   LearningRate 0.0450   Epoch: 13   Global Step: 144750   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:38:26,284-Speed 5954.59 samples/sec   Loss 5.0221   LearningRate 0.0450   Epoch: 13   Global Step: 144760   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:38:33,134-Speed 5981.35 samples/sec   Loss 5.0066   LearningRate 0.0450   Epoch: 13   Global Step: 144770   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:38:40,017-Speed 5951.56 samples/sec   Loss 4.9995   LearningRate 0.0450   Epoch: 13   Global Step: 144780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:38:46,872-Speed 5976.23 samples/sec   Loss 4.9948   LearningRate 0.0450   Epoch: 13   Global Step: 144790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:38:53,745-Speed 5963.28 samples/sec   Loss 4.9654   LearningRate 0.0450   Epoch: 13   Global Step: 144800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:39:00,610-Speed 5973.62 samples/sec   Loss 5.0009   LearningRate 0.0450   Epoch: 13   Global Step: 144810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:39:07,457-Speed 5983.04 samples/sec   Loss 5.0486   LearningRate 0.0449   Epoch: 13   Global Step: 144820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:39:14,330-Speed 5960.91 samples/sec   Loss 5.0678   LearningRate 0.0449   Epoch: 13   Global Step: 144830   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:39:21,220-Speed 5946.45 samples/sec   Loss 4.9928   LearningRate 0.0449   Epoch: 13   Global Step: 144840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:39:28,116-Speed 5940.73 samples/sec   Loss 4.9822   LearningRate 0.0449   Epoch: 13   Global Step: 144850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:39:34,983-Speed 5965.29 samples/sec   Loss 4.9908   LearningRate 0.0449   Epoch: 13   Global Step: 144860   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:39:41,847-Speed 5968.91 samples/sec   Loss 4.9651   LearningRate 0.0449   Epoch: 13   Global Step: 144870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:39:48,704-Speed 5974.36 samples/sec   Loss 4.9845   LearningRate 0.0449   Epoch: 13   Global Step: 144880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:39:55,599-Speed 5942.01 samples/sec   Loss 4.9820   LearningRate 0.0448   Epoch: 13   Global Step: 144890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:40:02,454-Speed 5976.63 samples/sec   Loss 4.9604   LearningRate 0.0448   Epoch: 13   Global Step: 144900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:40:09,301-Speed 5982.94 samples/sec   Loss 5.0151   LearningRate 0.0448   Epoch: 13   Global Step: 144910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:40:16,177-Speed 5958.37 samples/sec   Loss 4.9628   LearningRate 0.0448   Epoch: 13   Global Step: 144920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:40:23,025-Speed 5982.47 samples/sec   Loss 5.0177   LearningRate 0.0448   Epoch: 13   Global Step: 144930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:40:29,877-Speed 5978.12 samples/sec   Loss 4.9713   LearningRate 0.0448   Epoch: 13   Global Step: 144940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:40:36,725-Speed 5982.95 samples/sec   Loss 4.9759   LearningRate 0.0448   Epoch: 13   Global Step: 144950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:40:43,583-Speed 5974.18 samples/sec   Loss 4.9644   LearningRate 0.0447   Epoch: 13   Global Step: 144960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:40:50,441-Speed 5972.93 samples/sec   Loss 5.0201   LearningRate 0.0447   Epoch: 13   Global Step: 144970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:40:57,292-Speed 5979.94 samples/sec   Loss 4.9771   LearningRate 0.0447   Epoch: 13   Global Step: 144980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:41:04,162-Speed 5962.83 samples/sec   Loss 4.9433   LearningRate 0.0447   Epoch: 13   Global Step: 144990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:41:11,013-Speed 5979.82 samples/sec   Loss 5.0421   LearningRate 0.0447   Epoch: 13   Global Step: 145000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:41:38,083-[lfw][145000]XNorm: 24.053263
Training: 2022-01-09 00:41:38,084-[lfw][145000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-01-09 00:41:38,084-[lfw][145000]Accuracy-Highest: 0.99800
Training: 2022-01-09 00:42:09,428-[cfp_fp][145000]XNorm: 21.189103
Training: 2022-01-09 00:42:09,429-[cfp_fp][145000]Accuracy-Flip: 0.98700+-0.00375
Training: 2022-01-09 00:42:09,429-[cfp_fp][145000]Accuracy-Highest: 0.98714
Training: 2022-01-09 00:42:36,229-[agedb_30][145000]XNorm: 23.486906
Training: 2022-01-09 00:42:36,230-[agedb_30][145000]Accuracy-Flip: 0.97800+-0.00694
Training: 2022-01-09 00:42:36,231-[agedb_30][145000]Accuracy-Highest: 0.97800
Training: 2022-01-09 00:42:43,073-Speed 444.94 samples/sec   Loss 5.0248   LearningRate 0.0447   Epoch: 13   Global Step: 145010   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:42:49,907-Speed 5994.10 samples/sec   Loss 4.9600   LearningRate 0.0447   Epoch: 13   Global Step: 145020   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:42:57,222-Speed 5600.66 samples/sec   Loss 5.0203   LearningRate 0.0446   Epoch: 13   Global Step: 145030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:43:04,060-Speed 5991.06 samples/sec   Loss 5.0115   LearningRate 0.0446   Epoch: 13   Global Step: 145040   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:43:10,922-Speed 5970.15 samples/sec   Loss 5.0270   LearningRate 0.0446   Epoch: 13   Global Step: 145050   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:43:17,774-Speed 5980.34 samples/sec   Loss 4.9879   LearningRate 0.0446   Epoch: 13   Global Step: 145060   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:43:24,632-Speed 5973.08 samples/sec   Loss 5.0493   LearningRate 0.0446   Epoch: 13   Global Step: 145070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:43:31,483-Speed 5979.76 samples/sec   Loss 5.0128   LearningRate 0.0446   Epoch: 13   Global Step: 145080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:43:38,332-Speed 5982.47 samples/sec   Loss 4.9777   LearningRate 0.0446   Epoch: 13   Global Step: 145090   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:43:45,183-Speed 5979.58 samples/sec   Loss 5.0321   LearningRate 0.0445   Epoch: 13   Global Step: 145100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:43:52,039-Speed 5974.96 samples/sec   Loss 5.0043   LearningRate 0.0445   Epoch: 13   Global Step: 145110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:43:58,910-Speed 5963.02 samples/sec   Loss 4.9710   LearningRate 0.0445   Epoch: 13   Global Step: 145120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:44:05,815-Speed 5932.68 samples/sec   Loss 5.0164   LearningRate 0.0445   Epoch: 13   Global Step: 145130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:44:12,684-Speed 5963.71 samples/sec   Loss 5.0057   LearningRate 0.0445   Epoch: 13   Global Step: 145140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:44:19,556-Speed 5962.13 samples/sec   Loss 5.0190   LearningRate 0.0445   Epoch: 13   Global Step: 145150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:44:26,415-Speed 5973.13 samples/sec   Loss 5.0112   LearningRate 0.0445   Epoch: 13   Global Step: 145160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:44:33,278-Speed 5968.89 samples/sec   Loss 4.9820   LearningRate 0.0444   Epoch: 13   Global Step: 145170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:44:56,506-Speed 1763.51 samples/sec   Loss 4.9649   LearningRate 0.0444   Epoch: 14   Global Step: 145180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:45:03,349-Speed 5987.05 samples/sec   Loss 4.9908   LearningRate 0.0444   Epoch: 14   Global Step: 145190   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:45:10,183-Speed 5994.46 samples/sec   Loss 4.9693   LearningRate 0.0444   Epoch: 14   Global Step: 145200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:45:17,038-Speed 5976.31 samples/sec   Loss 4.9938   LearningRate 0.0444   Epoch: 14   Global Step: 145210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:45:23,900-Speed 5970.61 samples/sec   Loss 4.9825   LearningRate 0.0444   Epoch: 14   Global Step: 145220   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:45:30,773-Speed 5959.94 samples/sec   Loss 5.0015   LearningRate 0.0444   Epoch: 14   Global Step: 145230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:45:37,626-Speed 5981.68 samples/sec   Loss 4.9230   LearningRate 0.0443   Epoch: 14   Global Step: 145240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:45:44,523-Speed 5939.24 samples/sec   Loss 4.9897   LearningRate 0.0443   Epoch: 14   Global Step: 145250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:45:51,428-Speed 5934.78 samples/sec   Loss 4.9355   LearningRate 0.0443   Epoch: 14   Global Step: 145260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:45:58,351-Speed 5917.21 samples/sec   Loss 4.9148   LearningRate 0.0443   Epoch: 14   Global Step: 145270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:46:05,228-Speed 5957.25 samples/sec   Loss 4.9068   LearningRate 0.0443   Epoch: 14   Global Step: 145280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:46:12,124-Speed 5943.20 samples/sec   Loss 4.9823   LearningRate 0.0443   Epoch: 14   Global Step: 145290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:46:19,059-Speed 5907.47 samples/sec   Loss 4.9418   LearningRate 0.0443   Epoch: 14   Global Step: 145300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:46:25,970-Speed 5927.83 samples/sec   Loss 4.9901   LearningRate 0.0442   Epoch: 14   Global Step: 145310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:46:32,907-Speed 5905.90 samples/sec   Loss 4.9694   LearningRate 0.0442   Epoch: 14   Global Step: 145320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:46:39,791-Speed 5951.87 samples/sec   Loss 4.9450   LearningRate 0.0442   Epoch: 14   Global Step: 145330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:46:46,647-Speed 5975.24 samples/sec   Loss 4.9397   LearningRate 0.0442   Epoch: 14   Global Step: 145340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:46:53,521-Speed 5963.23 samples/sec   Loss 4.9626   LearningRate 0.0442   Epoch: 14   Global Step: 145350   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:47:00,407-Speed 5949.20 samples/sec   Loss 4.9649   LearningRate 0.0442   Epoch: 14   Global Step: 145360   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:47:07,267-Speed 5972.39 samples/sec   Loss 4.8966   LearningRate 0.0442   Epoch: 14   Global Step: 145370   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:47:14,138-Speed 5962.36 samples/sec   Loss 4.9541   LearningRate 0.0441   Epoch: 14   Global Step: 145380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:47:20,996-Speed 5973.34 samples/sec   Loss 5.0132   LearningRate 0.0441   Epoch: 14   Global Step: 145390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:47:27,862-Speed 5968.28 samples/sec   Loss 4.9942   LearningRate 0.0441   Epoch: 14   Global Step: 145400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:47:34,724-Speed 5970.36 samples/sec   Loss 4.9497   LearningRate 0.0441   Epoch: 14   Global Step: 145410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:47:41,883-Speed 5722.14 samples/sec   Loss 4.9789   LearningRate 0.0441   Epoch: 14   Global Step: 145420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:47:48,781-Speed 5939.10 samples/sec   Loss 4.9335   LearningRate 0.0441   Epoch: 14   Global Step: 145430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:47:55,644-Speed 5971.49 samples/sec   Loss 4.9500   LearningRate 0.0441   Epoch: 14   Global Step: 145440   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 00:48:02,515-Speed 5962.50 samples/sec   Loss 4.9816   LearningRate 0.0440   Epoch: 14   Global Step: 145450   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 00:48:09,362-Speed 5983.37 samples/sec   Loss 4.9588   LearningRate 0.0440   Epoch: 14   Global Step: 145460   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 00:48:16,214-Speed 5980.53 samples/sec   Loss 4.9760   LearningRate 0.0440   Epoch: 14   Global Step: 145470   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 00:48:23,079-Speed 5967.60 samples/sec   Loss 4.9213   LearningRate 0.0440   Epoch: 14   Global Step: 145480   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 00:48:29,949-Speed 5963.66 samples/sec   Loss 4.9334   LearningRate 0.0440   Epoch: 14   Global Step: 145490   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 00:48:36,802-Speed 5978.06 samples/sec   Loss 4.9700   LearningRate 0.0440   Epoch: 14   Global Step: 145500   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 00:48:43,658-Speed 5975.23 samples/sec   Loss 4.9485   LearningRate 0.0440   Epoch: 14   Global Step: 145510   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 00:48:50,551-Speed 5943.13 samples/sec   Loss 4.9661   LearningRate 0.0439   Epoch: 14   Global Step: 145520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 00:48:57,411-Speed 5972.17 samples/sec   Loss 4.9294   LearningRate 0.0439   Epoch: 14   Global Step: 145530   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 00:49:04,297-Speed 5948.90 samples/sec   Loss 4.9141   LearningRate 0.0439   Epoch: 14   Global Step: 145540   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:49:11,158-Speed 5971.27 samples/sec   Loss 4.9354   LearningRate 0.0439   Epoch: 14   Global Step: 145550   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:49:18,021-Speed 5968.89 samples/sec   Loss 4.9966   LearningRate 0.0439   Epoch: 14   Global Step: 145560   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:49:24,878-Speed 5974.79 samples/sec   Loss 4.9618   LearningRate 0.0439   Epoch: 14   Global Step: 145570   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:49:31,736-Speed 5973.35 samples/sec   Loss 4.9416   LearningRate 0.0439   Epoch: 14   Global Step: 145580   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:49:38,627-Speed 5945.46 samples/sec   Loss 4.9950   LearningRate 0.0438   Epoch: 14   Global Step: 145590   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:49:45,498-Speed 5962.21 samples/sec   Loss 4.9435   LearningRate 0.0438   Epoch: 14   Global Step: 145600   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:49:52,378-Speed 5954.50 samples/sec   Loss 5.0335   LearningRate 0.0438   Epoch: 14   Global Step: 145610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:49:59,270-Speed 5944.45 samples/sec   Loss 4.9456   LearningRate 0.0438   Epoch: 14   Global Step: 145620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:50:06,141-Speed 5962.09 samples/sec   Loss 4.9656   LearningRate 0.0438   Epoch: 14   Global Step: 145630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:50:13,002-Speed 5970.31 samples/sec   Loss 4.8846   LearningRate 0.0438   Epoch: 14   Global Step: 145640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:50:19,866-Speed 5969.05 samples/sec   Loss 4.9608   LearningRate 0.0438   Epoch: 14   Global Step: 145650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:50:26,721-Speed 5977.21 samples/sec   Loss 4.9943   LearningRate 0.0437   Epoch: 14   Global Step: 145660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:50:33,610-Speed 5946.78 samples/sec   Loss 4.9423   LearningRate 0.0437   Epoch: 14   Global Step: 145670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:50:40,485-Speed 5959.13 samples/sec   Loss 4.9645   LearningRate 0.0437   Epoch: 14   Global Step: 145680   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:50:47,359-Speed 5962.39 samples/sec   Loss 4.9666   LearningRate 0.0437   Epoch: 14   Global Step: 145690   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:50:54,210-Speed 5979.57 samples/sec   Loss 4.9483   LearningRate 0.0437   Epoch: 14   Global Step: 145700   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:51:01,070-Speed 5972.38 samples/sec   Loss 4.9072   LearningRate 0.0437   Epoch: 14   Global Step: 145710   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:51:07,933-Speed 5972.46 samples/sec   Loss 4.9034   LearningRate 0.0437   Epoch: 14   Global Step: 145720   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:51:14,824-Speed 5945.20 samples/sec   Loss 4.9216   LearningRate 0.0436   Epoch: 14   Global Step: 145730   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:51:21,689-Speed 5967.53 samples/sec   Loss 4.9437   LearningRate 0.0436   Epoch: 14   Global Step: 145740   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:51:28,552-Speed 5968.94 samples/sec   Loss 4.9018   LearningRate 0.0436   Epoch: 14   Global Step: 145750   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:51:35,416-Speed 5968.57 samples/sec   Loss 4.9306   LearningRate 0.0436   Epoch: 14   Global Step: 145760   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:51:42,288-Speed 5961.48 samples/sec   Loss 4.9573   LearningRate 0.0436   Epoch: 14   Global Step: 145770   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:51:49,158-Speed 5964.44 samples/sec   Loss 4.9127   LearningRate 0.0436   Epoch: 14   Global Step: 145780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:51:56,032-Speed 5959.67 samples/sec   Loss 4.9689   LearningRate 0.0436   Epoch: 14   Global Step: 145790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:52:02,889-Speed 5974.21 samples/sec   Loss 4.9234   LearningRate 0.0435   Epoch: 14   Global Step: 145800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:52:09,770-Speed 5954.12 samples/sec   Loss 4.9175   LearningRate 0.0435   Epoch: 14   Global Step: 145810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:52:16,645-Speed 5958.28 samples/sec   Loss 4.9553   LearningRate 0.0435   Epoch: 14   Global Step: 145820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:52:23,511-Speed 5966.86 samples/sec   Loss 4.9362   LearningRate 0.0435   Epoch: 14   Global Step: 145830   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:52:30,367-Speed 5976.13 samples/sec   Loss 4.9382   LearningRate 0.0435   Epoch: 14   Global Step: 145840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:52:37,216-Speed 5981.16 samples/sec   Loss 4.9514   LearningRate 0.0435   Epoch: 14   Global Step: 145850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:52:44,078-Speed 5970.41 samples/sec   Loss 4.9163   LearningRate 0.0435   Epoch: 14   Global Step: 145860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:52:50,937-Speed 5972.57 samples/sec   Loss 4.9411   LearningRate 0.0434   Epoch: 14   Global Step: 145870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:52:57,809-Speed 5961.76 samples/sec   Loss 4.9036   LearningRate 0.0434   Epoch: 14   Global Step: 145880   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:53:04,661-Speed 5980.62 samples/sec   Loss 4.9750   LearningRate 0.0434   Epoch: 14   Global Step: 145890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:53:11,533-Speed 5961.68 samples/sec   Loss 4.9256   LearningRate 0.0434   Epoch: 14   Global Step: 145900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:53:18,421-Speed 5947.44 samples/sec   Loss 4.9192   LearningRate 0.0434   Epoch: 14   Global Step: 145910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:53:25,264-Speed 5987.17 samples/sec   Loss 4.8719   LearningRate 0.0434   Epoch: 14   Global Step: 145920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:53:32,128-Speed 5968.65 samples/sec   Loss 4.9108   LearningRate 0.0434   Epoch: 14   Global Step: 145930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:53:38,979-Speed 5979.26 samples/sec   Loss 4.9117   LearningRate 0.0433   Epoch: 14   Global Step: 145940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:53:45,843-Speed 5968.09 samples/sec   Loss 4.9025   LearningRate 0.0433   Epoch: 14   Global Step: 145950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:53:52,707-Speed 5967.85 samples/sec   Loss 4.9098   LearningRate 0.0433   Epoch: 14   Global Step: 145960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:53:59,567-Speed 5972.43 samples/sec   Loss 4.9211   LearningRate 0.0433   Epoch: 14   Global Step: 145970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:54:06,453-Speed 5949.47 samples/sec   Loss 4.9158   LearningRate 0.0433   Epoch: 14   Global Step: 145980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:54:13,327-Speed 5960.22 samples/sec   Loss 4.9059   LearningRate 0.0433   Epoch: 14   Global Step: 145990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:54:20,200-Speed 5960.00 samples/sec   Loss 4.9459   LearningRate 0.0433   Epoch: 14   Global Step: 146000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:54:27,078-Speed 5957.49 samples/sec   Loss 4.9269   LearningRate 0.0432   Epoch: 14   Global Step: 146010   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:54:33,956-Speed 5955.64 samples/sec   Loss 4.9174   LearningRate 0.0432   Epoch: 14   Global Step: 146020   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:54:40,849-Speed 5943.67 samples/sec   Loss 4.9208   LearningRate 0.0432   Epoch: 14   Global Step: 146030   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:54:47,711-Speed 5971.25 samples/sec   Loss 4.9193   LearningRate 0.0432   Epoch: 14   Global Step: 146040   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:54:54,591-Speed 5955.23 samples/sec   Loss 4.9302   LearningRate 0.0432   Epoch: 14   Global Step: 146050   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:55:01,459-Speed 5964.88 samples/sec   Loss 4.9046   LearningRate 0.0432   Epoch: 14   Global Step: 146060   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:55:08,341-Speed 5952.98 samples/sec   Loss 4.8897   LearningRate 0.0432   Epoch: 14   Global Step: 146070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:55:15,220-Speed 5955.44 samples/sec   Loss 4.9337   LearningRate 0.0431   Epoch: 14   Global Step: 146080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:55:22,071-Speed 5979.95 samples/sec   Loss 4.9624   LearningRate 0.0431   Epoch: 14   Global Step: 146090   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:55:28,937-Speed 5966.75 samples/sec   Loss 4.9175   LearningRate 0.0431   Epoch: 14   Global Step: 146100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:55:35,808-Speed 5964.28 samples/sec   Loss 4.8889   LearningRate 0.0431   Epoch: 14   Global Step: 146110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:55:42,686-Speed 5955.42 samples/sec   Loss 4.9763   LearningRate 0.0431   Epoch: 14   Global Step: 146120   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:55:49,546-Speed 5972.76 samples/sec   Loss 4.8613   LearningRate 0.0431   Epoch: 14   Global Step: 146130   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:55:56,421-Speed 5959.05 samples/sec   Loss 4.9315   LearningRate 0.0431   Epoch: 14   Global Step: 146140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:56:03,304-Speed 5951.96 samples/sec   Loss 4.9062   LearningRate 0.0430   Epoch: 14   Global Step: 146150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:56:10,202-Speed 5940.88 samples/sec   Loss 4.8953   LearningRate 0.0430   Epoch: 14   Global Step: 146160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:56:17,064-Speed 5969.58 samples/sec   Loss 4.9149   LearningRate 0.0430   Epoch: 14   Global Step: 146170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:56:23,946-Speed 5952.95 samples/sec   Loss 4.8769   LearningRate 0.0430   Epoch: 14   Global Step: 146180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:56:30,808-Speed 5972.61 samples/sec   Loss 4.9467   LearningRate 0.0430   Epoch: 14   Global Step: 146190   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:56:37,676-Speed 5964.98 samples/sec   Loss 4.9418   LearningRate 0.0430   Epoch: 14   Global Step: 146200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:56:44,541-Speed 5967.75 samples/sec   Loss 4.8946   LearningRate 0.0430   Epoch: 14   Global Step: 146210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:56:51,437-Speed 5940.91 samples/sec   Loss 4.8920   LearningRate 0.0430   Epoch: 14   Global Step: 146220   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:56:58,324-Speed 5948.55 samples/sec   Loss 4.9129   LearningRate 0.0429   Epoch: 14   Global Step: 146230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:57:05,190-Speed 5966.73 samples/sec   Loss 4.9298   LearningRate 0.0429   Epoch: 14   Global Step: 146240   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:57:12,035-Speed 5985.63 samples/sec   Loss 4.9096   LearningRate 0.0429   Epoch: 14   Global Step: 146250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:57:18,913-Speed 5958.14 samples/sec   Loss 4.9140   LearningRate 0.0429   Epoch: 14   Global Step: 146260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:57:25,814-Speed 5936.21 samples/sec   Loss 4.8870   LearningRate 0.0429   Epoch: 14   Global Step: 146270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:57:32,676-Speed 5972.45 samples/sec   Loss 4.8573   LearningRate 0.0429   Epoch: 14   Global Step: 146280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:57:39,561-Speed 5952.90 samples/sec   Loss 4.9064   LearningRate 0.0429   Epoch: 14   Global Step: 146290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:57:46,455-Speed 5942.67 samples/sec   Loss 4.8816   LearningRate 0.0428   Epoch: 14   Global Step: 146300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:57:53,342-Speed 5948.96 samples/sec   Loss 4.8491   LearningRate 0.0428   Epoch: 14   Global Step: 146310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:58:00,341-Speed 5854.27 samples/sec   Loss 4.9135   LearningRate 0.0428   Epoch: 14   Global Step: 146320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:58:07,198-Speed 5974.16 samples/sec   Loss 4.8675   LearningRate 0.0428   Epoch: 14   Global Step: 146330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:58:14,069-Speed 5962.89 samples/sec   Loss 4.9147   LearningRate 0.0428   Epoch: 14   Global Step: 146340   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:58:21,043-Speed 5874.78 samples/sec   Loss 4.8797   LearningRate 0.0428   Epoch: 14   Global Step: 146350   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:58:28,027-Speed 5866.38 samples/sec   Loss 4.8909   LearningRate 0.0428   Epoch: 14   Global Step: 146360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:58:34,997-Speed 5877.61 samples/sec   Loss 4.9296   LearningRate 0.0427   Epoch: 14   Global Step: 146370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:58:41,925-Speed 5914.38 samples/sec   Loss 4.9039   LearningRate 0.0427   Epoch: 14   Global Step: 146380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:58:48,854-Speed 5911.60 samples/sec   Loss 4.9055   LearningRate 0.0427   Epoch: 14   Global Step: 146390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:58:55,777-Speed 5920.49 samples/sec   Loss 4.9236   LearningRate 0.0427   Epoch: 14   Global Step: 146400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:59:02,702-Speed 5916.74 samples/sec   Loss 4.9332   LearningRate 0.0427   Epoch: 14   Global Step: 146410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:59:09,608-Speed 5931.04 samples/sec   Loss 4.8962   LearningRate 0.0427   Epoch: 14   Global Step: 146420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:59:16,526-Speed 5923.48 samples/sec   Loss 4.8999   LearningRate 0.0427   Epoch: 14   Global Step: 146430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:59:23,390-Speed 5968.69 samples/sec   Loss 4.8360   LearningRate 0.0426   Epoch: 14   Global Step: 146440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:59:30,253-Speed 5968.68 samples/sec   Loss 4.8652   LearningRate 0.0426   Epoch: 14   Global Step: 146450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 00:59:37,102-Speed 5984.36 samples/sec   Loss 4.9092   LearningRate 0.0426   Epoch: 14   Global Step: 146460   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:59:43,991-Speed 5946.69 samples/sec   Loss 4.8604   LearningRate 0.0426   Epoch: 14   Global Step: 146470   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:59:50,926-Speed 5907.33 samples/sec   Loss 4.8464   LearningRate 0.0426   Epoch: 14   Global Step: 146480   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 00:59:57,866-Speed 5906.14 samples/sec   Loss 4.8901   LearningRate 0.0426   Epoch: 14   Global Step: 146490   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:00:04,802-Speed 5909.37 samples/sec   Loss 4.8759   LearningRate 0.0426   Epoch: 14   Global Step: 146500   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:00:11,702-Speed 5937.59 samples/sec   Loss 4.9150   LearningRate 0.0425   Epoch: 14   Global Step: 146510   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:00:18,561-Speed 5973.18 samples/sec   Loss 4.8660   LearningRate 0.0425   Epoch: 14   Global Step: 146520   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:00:25,438-Speed 5956.83 samples/sec   Loss 4.8767   LearningRate 0.0425   Epoch: 14   Global Step: 146530   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:00:32,281-Speed 5986.79 samples/sec   Loss 4.9177   LearningRate 0.0425   Epoch: 14   Global Step: 146540   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:00:39,143-Speed 5969.75 samples/sec   Loss 4.9008   LearningRate 0.0425   Epoch: 14   Global Step: 146550   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:00:46,010-Speed 5966.67 samples/sec   Loss 4.8330   LearningRate 0.0425   Epoch: 14   Global Step: 146560   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:00:52,918-Speed 5930.41 samples/sec   Loss 4.9106   LearningRate 0.0425   Epoch: 14   Global Step: 146570   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:00:59,819-Speed 5936.48 samples/sec   Loss 4.9100   LearningRate 0.0424   Epoch: 14   Global Step: 146580   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:01:06,740-Speed 5920.02 samples/sec   Loss 4.9143   LearningRate 0.0424   Epoch: 14   Global Step: 146590   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:01:13,614-Speed 5959.60 samples/sec   Loss 4.9041   LearningRate 0.0424   Epoch: 14   Global Step: 146600   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:01:20,490-Speed 5957.97 samples/sec   Loss 4.8550   LearningRate 0.0424   Epoch: 14   Global Step: 146610   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:01:27,384-Speed 5943.33 samples/sec   Loss 4.8394   LearningRate 0.0424   Epoch: 14   Global Step: 146620   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:01:34,253-Speed 5964.08 samples/sec   Loss 4.8737   LearningRate 0.0424   Epoch: 14   Global Step: 146630   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:01:41,109-Speed 5975.17 samples/sec   Loss 4.8375   LearningRate 0.0424   Epoch: 14   Global Step: 146640   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:01:47,983-Speed 5960.73 samples/sec   Loss 4.8843   LearningRate 0.0423   Epoch: 14   Global Step: 146650   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:01:54,831-Speed 5982.24 samples/sec   Loss 4.8868   LearningRate 0.0423   Epoch: 14   Global Step: 146660   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:02:01,697-Speed 5966.30 samples/sec   Loss 4.8899   LearningRate 0.0423   Epoch: 14   Global Step: 146670   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:02:08,564-Speed 5966.69 samples/sec   Loss 4.8469   LearningRate 0.0423   Epoch: 14   Global Step: 146680   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:02:15,433-Speed 5963.73 samples/sec   Loss 4.8751   LearningRate 0.0423   Epoch: 14   Global Step: 146690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:02:22,310-Speed 5959.68 samples/sec   Loss 4.8913   LearningRate 0.0423   Epoch: 14   Global Step: 146700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:02:29,167-Speed 5974.72 samples/sec   Loss 4.8357   LearningRate 0.0423   Epoch: 14   Global Step: 146710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:02:36,049-Speed 5952.66 samples/sec   Loss 4.9117   LearningRate 0.0423   Epoch: 14   Global Step: 146720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:02:42,919-Speed 5966.45 samples/sec   Loss 4.8321   LearningRate 0.0422   Epoch: 14   Global Step: 146730   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:02:49,802-Speed 5952.49 samples/sec   Loss 4.8959   LearningRate 0.0422   Epoch: 14   Global Step: 146740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:02:56,669-Speed 5965.68 samples/sec   Loss 4.9113   LearningRate 0.0422   Epoch: 14   Global Step: 146750   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:03:03,532-Speed 5968.84 samples/sec   Loss 4.8742   LearningRate 0.0422   Epoch: 14   Global Step: 146760   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:03:10,395-Speed 5970.45 samples/sec   Loss 4.8275   LearningRate 0.0422   Epoch: 14   Global Step: 146770   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:03:17,260-Speed 5970.73 samples/sec   Loss 4.8980   LearningRate 0.0422   Epoch: 14   Global Step: 146780   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:03:24,135-Speed 5959.44 samples/sec   Loss 4.8475   LearningRate 0.0422   Epoch: 14   Global Step: 146790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:03:31,003-Speed 5965.40 samples/sec   Loss 4.8847   LearningRate 0.0421   Epoch: 14   Global Step: 146800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:03:37,908-Speed 5932.85 samples/sec   Loss 4.8370   LearningRate 0.0421   Epoch: 14   Global Step: 146810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:03:44,822-Speed 5925.79 samples/sec   Loss 4.8173   LearningRate 0.0421   Epoch: 14   Global Step: 146820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:03:51,703-Speed 5954.02 samples/sec   Loss 4.8893   LearningRate 0.0421   Epoch: 14   Global Step: 146830   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:03:58,577-Speed 5959.08 samples/sec   Loss 4.8383   LearningRate 0.0421   Epoch: 14   Global Step: 146840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:04:05,450-Speed 5963.90 samples/sec   Loss 4.8478   LearningRate 0.0421   Epoch: 14   Global Step: 146850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:04:12,305-Speed 5975.75 samples/sec   Loss 4.8095   LearningRate 0.0421   Epoch: 14   Global Step: 146860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:04:19,160-Speed 5976.07 samples/sec   Loss 4.8500   LearningRate 0.0420   Epoch: 14   Global Step: 146870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:04:26,036-Speed 5958.83 samples/sec   Loss 4.8413   LearningRate 0.0420   Epoch: 14   Global Step: 146880   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:04:32,907-Speed 5963.76 samples/sec   Loss 4.8816   LearningRate 0.0420   Epoch: 14   Global Step: 146890   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:04:39,758-Speed 5979.77 samples/sec   Loss 4.8156   LearningRate 0.0420   Epoch: 14   Global Step: 146900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:04:46,631-Speed 5960.71 samples/sec   Loss 4.8413   LearningRate 0.0420   Epoch: 14   Global Step: 146910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:04:53,508-Speed 5957.37 samples/sec   Loss 4.8170   LearningRate 0.0420   Epoch: 14   Global Step: 146920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:05:00,384-Speed 5958.50 samples/sec   Loss 4.8531   LearningRate 0.0420   Epoch: 14   Global Step: 146930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:05:07,272-Speed 5947.59 samples/sec   Loss 4.8667   LearningRate 0.0419   Epoch: 14   Global Step: 146940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:05:14,127-Speed 5976.35 samples/sec   Loss 4.8498   LearningRate 0.0419   Epoch: 14   Global Step: 146950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:05:20,983-Speed 5975.17 samples/sec   Loss 4.9177   LearningRate 0.0419   Epoch: 14   Global Step: 146960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:05:27,833-Speed 5981.10 samples/sec   Loss 4.8594   LearningRate 0.0419   Epoch: 14   Global Step: 146970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:05:34,827-Speed 5857.50 samples/sec   Loss 4.8635   LearningRate 0.0419   Epoch: 14   Global Step: 146980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:05:41,685-Speed 5975.01 samples/sec   Loss 4.9038   LearningRate 0.0419   Epoch: 14   Global Step: 146990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:05:48,546-Speed 5971.47 samples/sec   Loss 4.8351   LearningRate 0.0419   Epoch: 14   Global Step: 147000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:05:55,415-Speed 5964.00 samples/sec   Loss 4.8377   LearningRate 0.0418   Epoch: 14   Global Step: 147010   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:06:02,288-Speed 5961.21 samples/sec   Loss 4.8430   LearningRate 0.0418   Epoch: 14   Global Step: 147020   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:06:09,158-Speed 5962.82 samples/sec   Loss 4.8174   LearningRate 0.0418   Epoch: 14   Global Step: 147030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:06:16,041-Speed 5953.00 samples/sec   Loss 4.8688   LearningRate 0.0418   Epoch: 14   Global Step: 147040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:06:22,915-Speed 5959.41 samples/sec   Loss 4.8721   LearningRate 0.0418   Epoch: 14   Global Step: 147050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:06:29,774-Speed 5972.63 samples/sec   Loss 4.8482   LearningRate 0.0418   Epoch: 14   Global Step: 147060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:06:36,619-Speed 5986.08 samples/sec   Loss 4.8070   LearningRate 0.0418   Epoch: 14   Global Step: 147070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:06:43,483-Speed 5967.35 samples/sec   Loss 4.8744   LearningRate 0.0418   Epoch: 14   Global Step: 147080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:06:50,347-Speed 5968.95 samples/sec   Loss 4.8098   LearningRate 0.0417   Epoch: 14   Global Step: 147090   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:06:57,204-Speed 5974.89 samples/sec   Loss 4.7702   LearningRate 0.0417   Epoch: 14   Global Step: 147100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:07:04,045-Speed 5987.65 samples/sec   Loss 4.8260   LearningRate 0.0417   Epoch: 14   Global Step: 147110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:07:10,918-Speed 5961.10 samples/sec   Loss 4.8192   LearningRate 0.0417   Epoch: 14   Global Step: 147120   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:07:17,779-Speed 5971.00 samples/sec   Loss 4.8612   LearningRate 0.0417   Epoch: 14   Global Step: 147130   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:07:24,644-Speed 5968.09 samples/sec   Loss 4.9038   LearningRate 0.0417   Epoch: 14   Global Step: 147140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:07:31,531-Speed 5948.80 samples/sec   Loss 4.8376   LearningRate 0.0417   Epoch: 14   Global Step: 147150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:07:38,405-Speed 5960.07 samples/sec   Loss 4.8250   LearningRate 0.0416   Epoch: 14   Global Step: 147160   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:07:45,254-Speed 5980.52 samples/sec   Loss 4.8133   LearningRate 0.0416   Epoch: 14   Global Step: 147170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:07:52,122-Speed 5964.77 samples/sec   Loss 4.8442   LearningRate 0.0416   Epoch: 14   Global Step: 147180   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:07:58,984-Speed 5976.67 samples/sec   Loss 4.8494   LearningRate 0.0416   Epoch: 14   Global Step: 147190   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:08:05,838-Speed 5977.49 samples/sec   Loss 4.8436   LearningRate 0.0416   Epoch: 14   Global Step: 147200   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:08:12,706-Speed 5964.94 samples/sec   Loss 4.8725   LearningRate 0.0416   Epoch: 14   Global Step: 147210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:08:19,648-Speed 5901.66 samples/sec   Loss 4.8622   LearningRate 0.0416   Epoch: 14   Global Step: 147220   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:08:26,504-Speed 5974.88 samples/sec   Loss 4.8132   LearningRate 0.0415   Epoch: 14   Global Step: 147230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:08:33,371-Speed 5966.89 samples/sec   Loss 4.8039   LearningRate 0.0415   Epoch: 14   Global Step: 147240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:08:40,272-Speed 5936.54 samples/sec   Loss 4.8405   LearningRate 0.0415   Epoch: 14   Global Step: 147250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:08:47,139-Speed 5965.93 samples/sec   Loss 4.7996   LearningRate 0.0415   Epoch: 14   Global Step: 147260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:08:54,004-Speed 5967.59 samples/sec   Loss 4.8592   LearningRate 0.0415   Epoch: 14   Global Step: 147270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:09:00,868-Speed 5968.46 samples/sec   Loss 4.8320   LearningRate 0.0415   Epoch: 14   Global Step: 147280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:09:07,721-Speed 5978.20 samples/sec   Loss 4.7429   LearningRate 0.0415   Epoch: 14   Global Step: 147290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:09:14,595-Speed 5959.62 samples/sec   Loss 4.8202   LearningRate 0.0414   Epoch: 14   Global Step: 147300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:09:21,484-Speed 5947.45 samples/sec   Loss 4.8327   LearningRate 0.0414   Epoch: 14   Global Step: 147310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:09:28,360-Speed 5958.02 samples/sec   Loss 4.8413   LearningRate 0.0414   Epoch: 14   Global Step: 147320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:09:35,231-Speed 5962.40 samples/sec   Loss 4.8459   LearningRate 0.0414   Epoch: 14   Global Step: 147330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:09:42,085-Speed 5978.06 samples/sec   Loss 4.8747   LearningRate 0.0414   Epoch: 14   Global Step: 147340   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:09:48,942-Speed 5974.46 samples/sec   Loss 4.8048   LearningRate 0.0414   Epoch: 14   Global Step: 147350   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:09:55,794-Speed 5979.47 samples/sec   Loss 4.8309   LearningRate 0.0414   Epoch: 14   Global Step: 147360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:10:02,652-Speed 5973.95 samples/sec   Loss 4.8191   LearningRate 0.0414   Epoch: 14   Global Step: 147370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:10:09,528-Speed 5958.08 samples/sec   Loss 4.8366   LearningRate 0.0413   Epoch: 14   Global Step: 147380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:10:16,401-Speed 5961.52 samples/sec   Loss 4.8323   LearningRate 0.0413   Epoch: 14   Global Step: 147390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:10:23,277-Speed 5960.63 samples/sec   Loss 4.7764   LearningRate 0.0413   Epoch: 14   Global Step: 147400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:10:30,144-Speed 5965.55 samples/sec   Loss 4.7774   LearningRate 0.0413   Epoch: 14   Global Step: 147410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:10:37,016-Speed 5962.74 samples/sec   Loss 4.7749   LearningRate 0.0413   Epoch: 14   Global Step: 147420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:10:43,908-Speed 5944.64 samples/sec   Loss 4.8186   LearningRate 0.0413   Epoch: 14   Global Step: 147430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:10:50,763-Speed 5976.34 samples/sec   Loss 4.8163   LearningRate 0.0413   Epoch: 14   Global Step: 147440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:10:57,636-Speed 5962.11 samples/sec   Loss 4.7792   LearningRate 0.0412   Epoch: 14   Global Step: 147450   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:11:04,503-Speed 5968.52 samples/sec   Loss 4.7924   LearningRate 0.0412   Epoch: 14   Global Step: 147460   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:11:11,386-Speed 5953.26 samples/sec   Loss 4.8083   LearningRate 0.0412   Epoch: 14   Global Step: 147470   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:11:18,269-Speed 5952.60 samples/sec   Loss 4.8412   LearningRate 0.0412   Epoch: 14   Global Step: 147480   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:11:25,130-Speed 5971.27 samples/sec   Loss 4.8166   LearningRate 0.0412   Epoch: 14   Global Step: 147490   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:11:32,017-Speed 5948.53 samples/sec   Loss 4.8019   LearningRate 0.0412   Epoch: 14   Global Step: 147500   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:11:38,872-Speed 5977.10 samples/sec   Loss 4.8075   LearningRate 0.0412   Epoch: 14   Global Step: 147510   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:11:45,720-Speed 5981.83 samples/sec   Loss 4.8049   LearningRate 0.0411   Epoch: 14   Global Step: 147520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:11:52,592-Speed 5962.04 samples/sec   Loss 4.8350   LearningRate 0.0411   Epoch: 14   Global Step: 147530   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:11:59,447-Speed 5976.74 samples/sec   Loss 4.8247   LearningRate 0.0411   Epoch: 14   Global Step: 147540   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:12:06,300-Speed 5977.56 samples/sec   Loss 4.8156   LearningRate 0.0411   Epoch: 14   Global Step: 147550   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:12:13,164-Speed 5968.61 samples/sec   Loss 4.8320   LearningRate 0.0411   Epoch: 14   Global Step: 147560   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:12:20,036-Speed 5964.04 samples/sec   Loss 4.8239   LearningRate 0.0411   Epoch: 14   Global Step: 147570   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:12:26,899-Speed 5969.23 samples/sec   Loss 4.7861   LearningRate 0.0411   Epoch: 14   Global Step: 147580   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:12:33,760-Speed 5970.36 samples/sec   Loss 4.8164   LearningRate 0.0410   Epoch: 14   Global Step: 147590   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:12:40,625-Speed 5969.70 samples/sec   Loss 4.7774   LearningRate 0.0410   Epoch: 14   Global Step: 147600   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:12:47,478-Speed 5977.95 samples/sec   Loss 4.8220   LearningRate 0.0410   Epoch: 14   Global Step: 147610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:12:54,338-Speed 5972.35 samples/sec   Loss 4.7795   LearningRate 0.0410   Epoch: 14   Global Step: 147620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:13:01,228-Speed 5945.81 samples/sec   Loss 4.8160   LearningRate 0.0410   Epoch: 14   Global Step: 147630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:13:08,135-Speed 5934.48 samples/sec   Loss 4.8323   LearningRate 0.0410   Epoch: 14   Global Step: 147640   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:13:14,997-Speed 5973.65 samples/sec   Loss 4.7803   LearningRate 0.0410   Epoch: 14   Global Step: 147650   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:13:21,871-Speed 5960.25 samples/sec   Loss 4.8390   LearningRate 0.0410   Epoch: 14   Global Step: 147660   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:13:28,752-Speed 5953.53 samples/sec   Loss 4.8035   LearningRate 0.0409   Epoch: 14   Global Step: 147670   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:13:35,600-Speed 5981.65 samples/sec   Loss 4.8191   LearningRate 0.0409   Epoch: 14   Global Step: 147680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:13:42,463-Speed 5969.67 samples/sec   Loss 4.7431   LearningRate 0.0409   Epoch: 14   Global Step: 147690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:13:49,325-Speed 5970.30 samples/sec   Loss 4.8236   LearningRate 0.0409   Epoch: 14   Global Step: 147700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:13:56,199-Speed 5959.72 samples/sec   Loss 4.7835   LearningRate 0.0409   Epoch: 14   Global Step: 147710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:14:03,068-Speed 5964.50 samples/sec   Loss 4.7386   LearningRate 0.0409   Epoch: 14   Global Step: 147720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:14:09,930-Speed 5970.49 samples/sec   Loss 4.8074   LearningRate 0.0409   Epoch: 14   Global Step: 147730   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:14:16,789-Speed 5972.86 samples/sec   Loss 4.8677   LearningRate 0.0408   Epoch: 14   Global Step: 147740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:14:23,649-Speed 5971.78 samples/sec   Loss 4.7850   LearningRate 0.0408   Epoch: 14   Global Step: 147750   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:14:30,523-Speed 5960.11 samples/sec   Loss 4.7614   LearningRate 0.0408   Epoch: 14   Global Step: 147760   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:14:37,373-Speed 5980.21 samples/sec   Loss 4.8189   LearningRate 0.0408   Epoch: 14   Global Step: 147770   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:14:44,234-Speed 5970.96 samples/sec   Loss 4.8309   LearningRate 0.0408   Epoch: 14   Global Step: 147780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:14:51,104-Speed 5966.87 samples/sec   Loss 4.8356   LearningRate 0.0408   Epoch: 14   Global Step: 147790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:14:57,960-Speed 5975.27 samples/sec   Loss 4.8047   LearningRate 0.0408   Epoch: 14   Global Step: 147800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:15:04,812-Speed 5979.66 samples/sec   Loss 4.7671   LearningRate 0.0407   Epoch: 14   Global Step: 147810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:15:11,671-Speed 5973.01 samples/sec   Loss 4.7617   LearningRate 0.0407   Epoch: 14   Global Step: 147820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:15:18,534-Speed 5969.30 samples/sec   Loss 4.7907   LearningRate 0.0407   Epoch: 14   Global Step: 147830   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:15:25,399-Speed 5967.48 samples/sec   Loss 4.8483   LearningRate 0.0407   Epoch: 14   Global Step: 147840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:15:32,270-Speed 5962.62 samples/sec   Loss 4.7995   LearningRate 0.0407   Epoch: 14   Global Step: 147850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:15:39,140-Speed 5963.00 samples/sec   Loss 4.7868   LearningRate 0.0407   Epoch: 14   Global Step: 147860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:15:45,995-Speed 5976.32 samples/sec   Loss 4.7692   LearningRate 0.0407   Epoch: 14   Global Step: 147870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:15:52,856-Speed 5970.88 samples/sec   Loss 4.8038   LearningRate 0.0407   Epoch: 14   Global Step: 147880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:15:59,725-Speed 5964.03 samples/sec   Loss 4.7749   LearningRate 0.0406   Epoch: 14   Global Step: 147890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:16:06,586-Speed 5971.62 samples/sec   Loss 4.7410   LearningRate 0.0406   Epoch: 14   Global Step: 147900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:16:13,451-Speed 5970.01 samples/sec   Loss 4.7923   LearningRate 0.0406   Epoch: 14   Global Step: 147910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:16:20,306-Speed 5975.77 samples/sec   Loss 4.7842   LearningRate 0.0406   Epoch: 14   Global Step: 147920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:16:27,186-Speed 5957.90 samples/sec   Loss 4.7679   LearningRate 0.0406   Epoch: 14   Global Step: 147930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:16:34,052-Speed 5967.37 samples/sec   Loss 4.7683   LearningRate 0.0406   Epoch: 14   Global Step: 147940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:16:40,914-Speed 5969.39 samples/sec   Loss 4.8096   LearningRate 0.0406   Epoch: 14   Global Step: 147950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:16:47,797-Speed 5952.96 samples/sec   Loss 4.7681   LearningRate 0.0405   Epoch: 14   Global Step: 147960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:16:54,665-Speed 5965.14 samples/sec   Loss 4.7933   LearningRate 0.0405   Epoch: 14   Global Step: 147970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:17:01,520-Speed 5976.18 samples/sec   Loss 4.7502   LearningRate 0.0405   Epoch: 14   Global Step: 147980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:17:08,389-Speed 5964.05 samples/sec   Loss 4.7488   LearningRate 0.0405   Epoch: 14   Global Step: 147990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:17:15,258-Speed 5964.58 samples/sec   Loss 4.7789   LearningRate 0.0405   Epoch: 14   Global Step: 148000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:17:22,115-Speed 5974.64 samples/sec   Loss 4.7844   LearningRate 0.0405   Epoch: 14   Global Step: 148010   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:17:28,980-Speed 5968.15 samples/sec   Loss 4.7901   LearningRate 0.0405   Epoch: 14   Global Step: 148020   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:17:35,833-Speed 5978.82 samples/sec   Loss 4.8609   LearningRate 0.0404   Epoch: 14   Global Step: 148030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:17:42,695-Speed 5969.55 samples/sec   Loss 4.7580   LearningRate 0.0404   Epoch: 14   Global Step: 148040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:17:49,568-Speed 5961.03 samples/sec   Loss 4.7639   LearningRate 0.0404   Epoch: 14   Global Step: 148050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:17:56,461-Speed 5944.10 samples/sec   Loss 4.7558   LearningRate 0.0404   Epoch: 14   Global Step: 148060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:18:03,333-Speed 5961.57 samples/sec   Loss 4.7903   LearningRate 0.0404   Epoch: 14   Global Step: 148070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:18:10,193-Speed 5972.51 samples/sec   Loss 4.7885   LearningRate 0.0404   Epoch: 14   Global Step: 148080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:18:17,072-Speed 5955.21 samples/sec   Loss 4.8009   LearningRate 0.0404   Epoch: 14   Global Step: 148090   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:18:23,939-Speed 5966.01 samples/sec   Loss 4.8083   LearningRate 0.0404   Epoch: 14   Global Step: 148100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:18:30,822-Speed 5952.91 samples/sec   Loss 4.7722   LearningRate 0.0403   Epoch: 14   Global Step: 148110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:18:37,687-Speed 5968.86 samples/sec   Loss 4.7510   LearningRate 0.0403   Epoch: 14   Global Step: 148120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:18:44,549-Speed 5970.03 samples/sec   Loss 4.7543   LearningRate 0.0403   Epoch: 14   Global Step: 148130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:18:51,422-Speed 5961.41 samples/sec   Loss 4.7813   LearningRate 0.0403   Epoch: 14   Global Step: 148140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:18:58,313-Speed 5945.46 samples/sec   Loss 4.7696   LearningRate 0.0403   Epoch: 14   Global Step: 148150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:19:05,173-Speed 5971.74 samples/sec   Loss 4.7816   LearningRate 0.0403   Epoch: 14   Global Step: 148160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:19:12,026-Speed 5978.32 samples/sec   Loss 4.7893   LearningRate 0.0403   Epoch: 14   Global Step: 148170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:19:18,884-Speed 5973.79 samples/sec   Loss 4.7816   LearningRate 0.0402   Epoch: 14   Global Step: 148180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:19:25,757-Speed 5960.62 samples/sec   Loss 4.7739   LearningRate 0.0402   Epoch: 14   Global Step: 148190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:19:32,611-Speed 5977.05 samples/sec   Loss 4.7967   LearningRate 0.0402   Epoch: 14   Global Step: 148200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:19:39,480-Speed 5964.06 samples/sec   Loss 4.7985   LearningRate 0.0402   Epoch: 14   Global Step: 148210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:19:46,334-Speed 5976.80 samples/sec   Loss 4.7899   LearningRate 0.0402   Epoch: 14   Global Step: 148220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:19:53,177-Speed 5986.71 samples/sec   Loss 4.7795   LearningRate 0.0402   Epoch: 14   Global Step: 148230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:20:00,035-Speed 5973.20 samples/sec   Loss 4.7265   LearningRate 0.0402   Epoch: 14   Global Step: 148240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:20:06,942-Speed 5931.86 samples/sec   Loss 4.7496   LearningRate 0.0401   Epoch: 14   Global Step: 148250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:20:13,801-Speed 5973.40 samples/sec   Loss 4.7801   LearningRate 0.0401   Epoch: 14   Global Step: 148260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:20:20,664-Speed 5970.09 samples/sec   Loss 4.7537   LearningRate 0.0401   Epoch: 14   Global Step: 148270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:20:27,535-Speed 5961.94 samples/sec   Loss 4.7516   LearningRate 0.0401   Epoch: 14   Global Step: 148280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:20:34,405-Speed 5963.26 samples/sec   Loss 4.7555   LearningRate 0.0401   Epoch: 14   Global Step: 148290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:20:41,269-Speed 5969.29 samples/sec   Loss 4.7631   LearningRate 0.0401   Epoch: 14   Global Step: 148300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:20:48,138-Speed 5964.34 samples/sec   Loss 4.7299   LearningRate 0.0401   Epoch: 14   Global Step: 148310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:20:55,005-Speed 5965.78 samples/sec   Loss 4.7576   LearningRate 0.0401   Epoch: 14   Global Step: 148320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:21:01,869-Speed 5968.98 samples/sec   Loss 4.7478   LearningRate 0.0400   Epoch: 14   Global Step: 148330   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:21:08,743-Speed 5959.48 samples/sec   Loss 4.7484   LearningRate 0.0400   Epoch: 14   Global Step: 148340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:21:15,622-Speed 5955.42 samples/sec   Loss 4.7332   LearningRate 0.0400   Epoch: 14   Global Step: 148350   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:21:22,482-Speed 5972.63 samples/sec   Loss 4.7646   LearningRate 0.0400   Epoch: 14   Global Step: 148360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:21:29,342-Speed 5971.69 samples/sec   Loss 4.7808   LearningRate 0.0400   Epoch: 14   Global Step: 148370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:21:36,215-Speed 5962.06 samples/sec   Loss 4.7345   LearningRate 0.0400   Epoch: 14   Global Step: 148380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:21:43,089-Speed 5959.70 samples/sec   Loss 4.7595   LearningRate 0.0400   Epoch: 14   Global Step: 148390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:21:49,946-Speed 5974.86 samples/sec   Loss 4.7154   LearningRate 0.0399   Epoch: 14   Global Step: 148400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:21:56,800-Speed 5978.89 samples/sec   Loss 4.7370   LearningRate 0.0399   Epoch: 14   Global Step: 148410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:22:03,664-Speed 5968.58 samples/sec   Loss 4.7511   LearningRate 0.0399   Epoch: 14   Global Step: 148420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:22:10,527-Speed 5969.99 samples/sec   Loss 4.7408   LearningRate 0.0399   Epoch: 14   Global Step: 148430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:22:17,399-Speed 5961.27 samples/sec   Loss 4.7477   LearningRate 0.0399   Epoch: 14   Global Step: 148440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:22:24,260-Speed 5971.71 samples/sec   Loss 4.7388   LearningRate 0.0399   Epoch: 14   Global Step: 148450   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:22:31,115-Speed 5975.95 samples/sec   Loss 4.7226   LearningRate 0.0399   Epoch: 14   Global Step: 148460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:22:37,958-Speed 5986.20 samples/sec   Loss 4.7936   LearningRate 0.0398   Epoch: 14   Global Step: 148470   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:22:44,823-Speed 5967.79 samples/sec   Loss 4.7573   LearningRate 0.0398   Epoch: 14   Global Step: 148480   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:22:51,693-Speed 5962.66 samples/sec   Loss 4.7317   LearningRate 0.0398   Epoch: 14   Global Step: 148490   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:22:58,553-Speed 5973.52 samples/sec   Loss 4.7040   LearningRate 0.0398   Epoch: 14   Global Step: 148500   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:23:05,410-Speed 5977.50 samples/sec   Loss 4.7527   LearningRate 0.0398   Epoch: 14   Global Step: 148510   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:23:12,275-Speed 5966.75 samples/sec   Loss 4.7847   LearningRate 0.0398   Epoch: 14   Global Step: 148520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:23:19,142-Speed 5966.85 samples/sec   Loss 4.7476   LearningRate 0.0398   Epoch: 14   Global Step: 148530   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:23:26,012-Speed 5963.14 samples/sec   Loss 4.7310   LearningRate 0.0398   Epoch: 14   Global Step: 148540   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:23:32,892-Speed 5954.13 samples/sec   Loss 4.7278   LearningRate 0.0397   Epoch: 14   Global Step: 148550   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:23:39,757-Speed 5967.49 samples/sec   Loss 4.7167   LearningRate 0.0397   Epoch: 14   Global Step: 148560   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-01-09 01:23:46,643-Speed 5951.46 samples/sec   Loss 4.7660   LearningRate 0.0397   Epoch: 14   Global Step: 148570   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:23:53,502-Speed 5971.98 samples/sec   Loss 4.7511   LearningRate 0.0397   Epoch: 14   Global Step: 148580   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:24:00,353-Speed 5980.57 samples/sec   Loss 4.7016   LearningRate 0.0397   Epoch: 14   Global Step: 148590   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:24:07,224-Speed 5962.13 samples/sec   Loss 4.7403   LearningRate 0.0397   Epoch: 14   Global Step: 148600   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:24:14,110-Speed 5949.58 samples/sec   Loss 4.7164   LearningRate 0.0397   Epoch: 14   Global Step: 148610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:24:20,992-Speed 5968.45 samples/sec   Loss 4.7168   LearningRate 0.0396   Epoch: 14   Global Step: 148620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:24:29,037-Speed 5955.02 samples/sec   Loss 4.7427   LearningRate 0.0396   Epoch: 14   Global Step: 148630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:24:35,962-Speed 5915.79 samples/sec   Loss 4.7100   LearningRate 0.0396   Epoch: 14   Global Step: 148640   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:24:42,843-Speed 5953.45 samples/sec   Loss 4.7464   LearningRate 0.0396   Epoch: 14   Global Step: 148650   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:24:49,697-Speed 5977.97 samples/sec   Loss 4.7392   LearningRate 0.0396   Epoch: 14   Global Step: 148660   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:24:56,556-Speed 5972.88 samples/sec   Loss 4.7471   LearningRate 0.0396   Epoch: 14   Global Step: 148670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:25:03,497-Speed 5902.43 samples/sec   Loss 4.7617   LearningRate 0.0396   Epoch: 14   Global Step: 148680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:25:10,487-Speed 5861.22 samples/sec   Loss 4.7647   LearningRate 0.0396   Epoch: 14   Global Step: 148690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-01-09 01:25:17,333-Speed 5984.26 samples/sec   Loss 4.7091   LearningRate 0.0395   Epoch: 14   Global Step: 148700   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:25:24,215-Speed 5953.28 samples/sec   Loss 4.6971   LearningRate 0.0395   Epoch: 14   Global Step: 148710   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:25:31,061-Speed 5983.93 samples/sec   Loss 4.7276   LearningRate 0.0395   Epoch: 14   Global Step: 148720   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-01-09 01:25:37,938-Speed 5956.88 samples/sec   Loss 4.7490   LearningRate 0.0395   Epoch: 14   Global Step: 148730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:25:44,848-Speed 5929.09 samples/sec   Loss 4.7289   LearningRate 0.0395   Epoch: 14   Global Step: 148740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:25:51,712-Speed 5969.20 samples/sec   Loss 4.7001   LearningRate 0.0395   Epoch: 14   Global Step: 148750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:25:58,571-Speed 5972.77 samples/sec   Loss 4.7360   LearningRate 0.0395   Epoch: 14   Global Step: 148760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:26:05,434-Speed 5969.22 samples/sec   Loss 4.7006   LearningRate 0.0394   Epoch: 14   Global Step: 148770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:26:12,314-Speed 5954.66 samples/sec   Loss 4.6988   LearningRate 0.0394   Epoch: 14   Global Step: 148780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:26:19,181-Speed 5966.41 samples/sec   Loss 4.7941   LearningRate 0.0394   Epoch: 14   Global Step: 148790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:26:26,036-Speed 5976.67 samples/sec   Loss 4.7477   LearningRate 0.0394   Epoch: 14   Global Step: 148800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:26:32,893-Speed 5976.41 samples/sec   Loss 4.7549   LearningRate 0.0394   Epoch: 14   Global Step: 148810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:26:39,757-Speed 5968.50 samples/sec   Loss 4.7317   LearningRate 0.0394   Epoch: 14   Global Step: 148820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:26:46,629-Speed 5961.29 samples/sec   Loss 4.6833   LearningRate 0.0394   Epoch: 14   Global Step: 148830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:26:53,479-Speed 5980.31 samples/sec   Loss 4.7723   LearningRate 0.0394   Epoch: 14   Global Step: 148840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:27:00,332-Speed 5977.72 samples/sec   Loss 4.7345   LearningRate 0.0393   Epoch: 14   Global Step: 148850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:27:07,187-Speed 5976.78 samples/sec   Loss 4.7522   LearningRate 0.0393   Epoch: 14   Global Step: 148860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:27:14,051-Speed 5969.97 samples/sec   Loss 4.7472   LearningRate 0.0393   Epoch: 14   Global Step: 148870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:27:20,912-Speed 5971.15 samples/sec   Loss 4.6954   LearningRate 0.0393   Epoch: 14   Global Step: 148880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:27:27,775-Speed 5969.12 samples/sec   Loss 4.7016   LearningRate 0.0393   Epoch: 14   Global Step: 148890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:27:34,653-Speed 5956.98 samples/sec   Loss 4.7084   LearningRate 0.0393   Epoch: 14   Global Step: 148900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:27:41,539-Speed 5949.18 samples/sec   Loss 4.6813   LearningRate 0.0393   Epoch: 14   Global Step: 148910   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:27:48,403-Speed 5970.81 samples/sec   Loss 4.7512   LearningRate 0.0392   Epoch: 14   Global Step: 148920   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:27:55,307-Speed 5933.59 samples/sec   Loss 4.7498   LearningRate 0.0392   Epoch: 14   Global Step: 148930   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:28:02,167-Speed 5971.50 samples/sec   Loss 4.6929   LearningRate 0.0392   Epoch: 14   Global Step: 148940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:28:09,038-Speed 5964.67 samples/sec   Loss 4.6966   LearningRate 0.0392   Epoch: 14   Global Step: 148950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:28:15,902-Speed 5968.70 samples/sec   Loss 4.7157   LearningRate 0.0392   Epoch: 14   Global Step: 148960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:28:22,747-Speed 5985.03 samples/sec   Loss 4.7168   LearningRate 0.0392   Epoch: 14   Global Step: 148970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:28:29,635-Speed 5947.95 samples/sec   Loss 4.7377   LearningRate 0.0392   Epoch: 14   Global Step: 148980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:28:36,507-Speed 5961.78 samples/sec   Loss 4.6846   LearningRate 0.0391   Epoch: 14   Global Step: 148990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:28:43,358-Speed 5979.63 samples/sec   Loss 4.6983   LearningRate 0.0391   Epoch: 14   Global Step: 149000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:28:50,232-Speed 5959.50 samples/sec   Loss 4.7289   LearningRate 0.0391   Epoch: 14   Global Step: 149010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:28:57,105-Speed 5960.82 samples/sec   Loss 4.6933   LearningRate 0.0391   Epoch: 14   Global Step: 149020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:29:03,997-Speed 5944.83 samples/sec   Loss 4.6879   LearningRate 0.0391   Epoch: 14   Global Step: 149030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:29:10,878-Speed 5953.41 samples/sec   Loss 4.7212   LearningRate 0.0391   Epoch: 14   Global Step: 149040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:29:17,746-Speed 5965.06 samples/sec   Loss 4.7042   LearningRate 0.0391   Epoch: 14   Global Step: 149050   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:29:24,606-Speed 5971.88 samples/sec   Loss 4.6860   LearningRate 0.0391   Epoch: 14   Global Step: 149060   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:29:31,465-Speed 5974.65 samples/sec   Loss 4.7251   LearningRate 0.0390   Epoch: 14   Global Step: 149070   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:29:38,321-Speed 5975.19 samples/sec   Loss 4.7119   LearningRate 0.0390   Epoch: 14   Global Step: 149080   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:29:45,252-Speed 5910.44 samples/sec   Loss 4.7100   LearningRate 0.0390   Epoch: 14   Global Step: 149090   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:29:52,133-Speed 5954.14 samples/sec   Loss 4.6738   LearningRate 0.0390   Epoch: 14   Global Step: 149100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:29:58,989-Speed 5977.75 samples/sec   Loss 4.7112   LearningRate 0.0390   Epoch: 14   Global Step: 149110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:30:05,859-Speed 5962.71 samples/sec   Loss 4.6983   LearningRate 0.0390   Epoch: 14   Global Step: 149120   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:30:12,727-Speed 5966.88 samples/sec   Loss 4.6756   LearningRate 0.0390   Epoch: 14   Global Step: 149130   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:30:19,583-Speed 5975.97 samples/sec   Loss 4.7130   LearningRate 0.0389   Epoch: 14   Global Step: 149140   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:30:26,470-Speed 5948.39 samples/sec   Loss 4.6889   LearningRate 0.0389   Epoch: 14   Global Step: 149150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:30:33,359-Speed 5947.40 samples/sec   Loss 4.7325   LearningRate 0.0389   Epoch: 14   Global Step: 149160   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:30:40,237-Speed 5956.57 samples/sec   Loss 4.6701   LearningRate 0.0389   Epoch: 14   Global Step: 149170   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:30:47,099-Speed 5969.99 samples/sec   Loss 4.6668   LearningRate 0.0389   Epoch: 14   Global Step: 149180   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:30:54,804-Speed 5317.72 samples/sec   Loss 4.6643   LearningRate 0.0389   Epoch: 14   Global Step: 149190   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:31:01,685-Speed 5953.91 samples/sec   Loss 4.6845   LearningRate 0.0389   Epoch: 14   Global Step: 149200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:31:08,544-Speed 5972.69 samples/sec   Loss 4.6976   LearningRate 0.0389   Epoch: 14   Global Step: 149210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:31:15,417-Speed 5961.09 samples/sec   Loss 4.6565   LearningRate 0.0388   Epoch: 14   Global Step: 149220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:31:22,274-Speed 5974.51 samples/sec   Loss 4.6704   LearningRate 0.0388   Epoch: 14   Global Step: 149230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:31:29,135-Speed 5971.70 samples/sec   Loss 4.7259   LearningRate 0.0388   Epoch: 14   Global Step: 149240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:31:35,996-Speed 5971.08 samples/sec   Loss 4.6777   LearningRate 0.0388   Epoch: 14   Global Step: 149250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:31:42,868-Speed 5962.20 samples/sec   Loss 4.7076   LearningRate 0.0388   Epoch: 14   Global Step: 149260   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:31:49,714-Speed 5983.94 samples/sec   Loss 4.6757   LearningRate 0.0388   Epoch: 14   Global Step: 149270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:31:56,585-Speed 5964.28 samples/sec   Loss 4.7429   LearningRate 0.0388   Epoch: 14   Global Step: 149280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:32:03,453-Speed 5965.69 samples/sec   Loss 4.7084   LearningRate 0.0387   Epoch: 14   Global Step: 149290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:32:10,314-Speed 5970.55 samples/sec   Loss 4.7252   LearningRate 0.0387   Epoch: 14   Global Step: 149300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:32:17,167-Speed 5980.31 samples/sec   Loss 4.7513   LearningRate 0.0387   Epoch: 14   Global Step: 149310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:32:24,047-Speed 5955.20 samples/sec   Loss 4.6750   LearningRate 0.0387   Epoch: 14   Global Step: 149320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:32:30,912-Speed 5967.01 samples/sec   Loss 4.6492   LearningRate 0.0387   Epoch: 14   Global Step: 149330   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:32:37,771-Speed 5972.76 samples/sec   Loss 4.6436   LearningRate 0.0387   Epoch: 14   Global Step: 149340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:32:44,654-Speed 5952.84 samples/sec   Loss 4.6629   LearningRate 0.0387   Epoch: 14   Global Step: 149350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:32:51,514-Speed 5971.45 samples/sec   Loss 4.6939   LearningRate 0.0387   Epoch: 14   Global Step: 149360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:32:58,371-Speed 5977.40 samples/sec   Loss 4.6432   LearningRate 0.0386   Epoch: 14   Global Step: 149370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:33:05,230-Speed 5975.77 samples/sec   Loss 4.6688   LearningRate 0.0386   Epoch: 14   Global Step: 149380   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:33:12,105-Speed 5959.16 samples/sec   Loss 4.7061   LearningRate 0.0386   Epoch: 14   Global Step: 149390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:33:18,979-Speed 5960.40 samples/sec   Loss 4.6691   LearningRate 0.0386   Epoch: 14   Global Step: 149400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:33:25,840-Speed 5971.07 samples/sec   Loss 4.6297   LearningRate 0.0386   Epoch: 14   Global Step: 149410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:33:32,703-Speed 5968.68 samples/sec   Loss 4.6461   LearningRate 0.0386   Epoch: 14   Global Step: 149420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:33:39,559-Speed 5974.89 samples/sec   Loss 4.6804   LearningRate 0.0386   Epoch: 14   Global Step: 149430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:33:46,419-Speed 5972.82 samples/sec   Loss 4.6763   LearningRate 0.0385   Epoch: 14   Global Step: 149440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:33:53,302-Speed 5952.16 samples/sec   Loss 4.6948   LearningRate 0.0385   Epoch: 14   Global Step: 149450   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:34:00,168-Speed 5966.62 samples/sec   Loss 4.6613   LearningRate 0.0385   Epoch: 14   Global Step: 149460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:34:07,037-Speed 5964.09 samples/sec   Loss 4.6828   LearningRate 0.0385   Epoch: 14   Global Step: 149470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:34:13,896-Speed 5973.18 samples/sec   Loss 4.6578   LearningRate 0.0385   Epoch: 14   Global Step: 149480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:34:20,757-Speed 5973.21 samples/sec   Loss 4.6698   LearningRate 0.0385   Epoch: 14   Global Step: 149490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:34:27,636-Speed 5955.76 samples/sec   Loss 4.6492   LearningRate 0.0385   Epoch: 14   Global Step: 149500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:34:34,504-Speed 5964.87 samples/sec   Loss 4.6662   LearningRate 0.0385   Epoch: 14   Global Step: 149510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:34:41,366-Speed 5970.18 samples/sec   Loss 4.6685   LearningRate 0.0384   Epoch: 14   Global Step: 149520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:34:48,237-Speed 5964.73 samples/sec   Loss 4.6848   LearningRate 0.0384   Epoch: 14   Global Step: 149530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:34:55,097-Speed 5970.98 samples/sec   Loss 4.6879   LearningRate 0.0384   Epoch: 14   Global Step: 149540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:35:01,951-Speed 5979.13 samples/sec   Loss 4.6471   LearningRate 0.0384   Epoch: 14   Global Step: 149550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:35:08,812-Speed 5973.80 samples/sec   Loss 4.7073   LearningRate 0.0384   Epoch: 14   Global Step: 149560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:35:15,667-Speed 5976.12 samples/sec   Loss 4.6622   LearningRate 0.0384   Epoch: 14   Global Step: 149570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:35:22,537-Speed 5963.46 samples/sec   Loss 4.6977   LearningRate 0.0384   Epoch: 14   Global Step: 149580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:35:29,417-Speed 5955.39 samples/sec   Loss 4.6591   LearningRate 0.0383   Epoch: 14   Global Step: 149590   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:35:36,275-Speed 5972.99 samples/sec   Loss 4.6844   LearningRate 0.0383   Epoch: 14   Global Step: 149600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:35:43,139-Speed 5968.65 samples/sec   Loss 4.6502   LearningRate 0.0383   Epoch: 14   Global Step: 149610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:35:49,999-Speed 5971.87 samples/sec   Loss 4.6717   LearningRate 0.0383   Epoch: 14   Global Step: 149620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:35:56,852-Speed 5977.87 samples/sec   Loss 4.6458   LearningRate 0.0383   Epoch: 14   Global Step: 149630   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:36:03,708-Speed 5976.16 samples/sec   Loss 4.6397   LearningRate 0.0383   Epoch: 14   Global Step: 149640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:36:10,563-Speed 5978.13 samples/sec   Loss 4.7016   LearningRate 0.0383   Epoch: 14   Global Step: 149650   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:36:17,437-Speed 5959.56 samples/sec   Loss 4.6603   LearningRate 0.0383   Epoch: 14   Global Step: 149660   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:36:24,311-Speed 5960.20 samples/sec   Loss 4.6689   LearningRate 0.0382   Epoch: 14   Global Step: 149670   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:36:31,187-Speed 5957.84 samples/sec   Loss 4.6564   LearningRate 0.0382   Epoch: 14   Global Step: 149680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:36:38,055-Speed 5965.03 samples/sec   Loss 4.6526   LearningRate 0.0382   Epoch: 14   Global Step: 149690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:36:44,926-Speed 5962.69 samples/sec   Loss 4.6705   LearningRate 0.0382   Epoch: 14   Global Step: 149700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:36:51,788-Speed 5970.27 samples/sec   Loss 4.6328   LearningRate 0.0382   Epoch: 14   Global Step: 149710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:36:58,653-Speed 5967.68 samples/sec   Loss 4.7089   LearningRate 0.0382   Epoch: 14   Global Step: 149720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:37:05,524-Speed 5961.91 samples/sec   Loss 4.6346   LearningRate 0.0382   Epoch: 14   Global Step: 149730   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:37:12,387-Speed 5969.90 samples/sec   Loss 4.6370   LearningRate 0.0381   Epoch: 14   Global Step: 149740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:37:19,243-Speed 5975.22 samples/sec   Loss 4.6780   LearningRate 0.0381   Epoch: 14   Global Step: 149750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:37:26,096-Speed 5977.85 samples/sec   Loss 4.5460   LearningRate 0.0381   Epoch: 14   Global Step: 149760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:37:32,951-Speed 5976.95 samples/sec   Loss 4.6744   LearningRate 0.0381   Epoch: 14   Global Step: 149770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:37:39,831-Speed 5953.60 samples/sec   Loss 4.6439   LearningRate 0.0381   Epoch: 14   Global Step: 149780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:37:46,696-Speed 5967.50 samples/sec   Loss 4.6758   LearningRate 0.0381   Epoch: 14   Global Step: 149790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:37:53,573-Speed 5970.95 samples/sec   Loss 4.6412   LearningRate 0.0381   Epoch: 14   Global Step: 149800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:38:00,443-Speed 5963.27 samples/sec   Loss 4.6614   LearningRate 0.0381   Epoch: 14   Global Step: 149810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:38:07,309-Speed 5967.08 samples/sec   Loss 4.6430   LearningRate 0.0380   Epoch: 14   Global Step: 149820   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:38:14,241-Speed 5910.00 samples/sec   Loss 4.6367   LearningRate 0.0380   Epoch: 14   Global Step: 149830   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:38:21,198-Speed 5889.00 samples/sec   Loss 4.6410   LearningRate 0.0380   Epoch: 14   Global Step: 149840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:38:28,042-Speed 5985.89 samples/sec   Loss 4.7062   LearningRate 0.0380   Epoch: 14   Global Step: 149850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:38:34,917-Speed 5959.48 samples/sec   Loss 4.6755   LearningRate 0.0380   Epoch: 14   Global Step: 149860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:38:41,790-Speed 5961.27 samples/sec   Loss 4.6297   LearningRate 0.0380   Epoch: 14   Global Step: 149870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:38:48,666-Speed 5957.63 samples/sec   Loss 4.6269   LearningRate 0.0380   Epoch: 14   Global Step: 149880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:38:55,520-Speed 5977.26 samples/sec   Loss 4.6441   LearningRate 0.0380   Epoch: 14   Global Step: 149890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:39:02,405-Speed 5950.06 samples/sec   Loss 4.6433   LearningRate 0.0379   Epoch: 14   Global Step: 149900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:39:09,258-Speed 5978.87 samples/sec   Loss 4.6391   LearningRate 0.0379   Epoch: 14   Global Step: 149910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:39:16,131-Speed 5961.17 samples/sec   Loss 4.6348   LearningRate 0.0379   Epoch: 14   Global Step: 149920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:39:23,028-Speed 5939.22 samples/sec   Loss 4.6206   LearningRate 0.0379   Epoch: 14   Global Step: 149930   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:39:29,909-Speed 5954.37 samples/sec   Loss 4.6298   LearningRate 0.0379   Epoch: 14   Global Step: 149940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:39:36,755-Speed 5985.79 samples/sec   Loss 4.6019   LearningRate 0.0379   Epoch: 14   Global Step: 149950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:39:43,633-Speed 5956.24 samples/sec   Loss 4.6188   LearningRate 0.0379   Epoch: 14   Global Step: 149960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:39:50,483-Speed 5981.36 samples/sec   Loss 4.6226   LearningRate 0.0378   Epoch: 14   Global Step: 149970   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:39:57,356-Speed 5960.65 samples/sec   Loss 4.6479   LearningRate 0.0378   Epoch: 14   Global Step: 149980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:40:04,211-Speed 5975.69 samples/sec   Loss 4.6473   LearningRate 0.0378   Epoch: 14   Global Step: 149990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:40:11,127-Speed 5924.55 samples/sec   Loss 4.6106   LearningRate 0.0378   Epoch: 14   Global Step: 150000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:40:38,124-[lfw][150000]XNorm: 22.411340
Training: 2022-01-09 01:40:38,125-[lfw][150000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-01-09 01:40:38,125-[lfw][150000]Accuracy-Highest: 0.99800
Training: 2022-01-09 01:41:09,219-[cfp_fp][150000]XNorm: 20.023255
Training: 2022-01-09 01:41:09,220-[cfp_fp][150000]Accuracy-Flip: 0.98771+-0.00500
Training: 2022-01-09 01:41:09,221-[cfp_fp][150000]Accuracy-Highest: 0.98771
Training: 2022-01-09 01:41:36,021-[agedb_30][150000]XNorm: 22.346659
Training: 2022-01-09 01:41:36,022-[agedb_30][150000]Accuracy-Flip: 0.97833+-0.00671
Training: 2022-01-09 01:41:36,022-[agedb_30][150000]Accuracy-Highest: 0.97833
Training: 2022-01-09 01:41:42,864-Speed 446.50 samples/sec   Loss 4.6262   LearningRate 0.0378   Epoch: 14   Global Step: 150010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:41:49,708-Speed 5986.40 samples/sec   Loss 4.6483   LearningRate 0.0378   Epoch: 14   Global Step: 150020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:41:56,566-Speed 5974.21 samples/sec   Loss 4.6091   LearningRate 0.0378   Epoch: 14   Global Step: 150030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:42:03,426-Speed 5975.70 samples/sec   Loss 4.6700   LearningRate 0.0378   Epoch: 14   Global Step: 150040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:42:10,277-Speed 5979.96 samples/sec   Loss 4.6420   LearningRate 0.0377   Epoch: 14   Global Step: 150050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:42:17,139-Speed 5971.00 samples/sec   Loss 4.6055   LearningRate 0.0377   Epoch: 14   Global Step: 150060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:42:23,996-Speed 5974.61 samples/sec   Loss 4.6891   LearningRate 0.0377   Epoch: 14   Global Step: 150070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:42:30,870-Speed 5962.13 samples/sec   Loss 4.6371   LearningRate 0.0377   Epoch: 14   Global Step: 150080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:42:37,717-Speed 5983.71 samples/sec   Loss 4.6737   LearningRate 0.0377   Epoch: 14   Global Step: 150090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:42:44,645-Speed 5912.80 samples/sec   Loss 4.6222   LearningRate 0.0377   Epoch: 14   Global Step: 150100   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:42:51,486-Speed 5988.20 samples/sec   Loss 4.6311   LearningRate 0.0377   Epoch: 14   Global Step: 150110   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:42:58,351-Speed 5968.41 samples/sec   Loss 4.6717   LearningRate 0.0376   Epoch: 14   Global Step: 150120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:43:05,246-Speed 5941.50 samples/sec   Loss 4.6073   LearningRate 0.0376   Epoch: 14   Global Step: 150130   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:43:12,157-Speed 5927.65 samples/sec   Loss 4.5871   LearningRate 0.0376   Epoch: 14   Global Step: 150140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:43:19,010-Speed 5978.40 samples/sec   Loss 4.6501   LearningRate 0.0376   Epoch: 14   Global Step: 150150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:43:25,860-Speed 5980.61 samples/sec   Loss 4.6682   LearningRate 0.0376   Epoch: 14   Global Step: 150160   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:43:32,713-Speed 5977.18 samples/sec   Loss 4.6703   LearningRate 0.0376   Epoch: 14   Global Step: 150170   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:43:39,574-Speed 5971.45 samples/sec   Loss 4.6400   LearningRate 0.0376   Epoch: 14   Global Step: 150180   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:43:46,419-Speed 5985.22 samples/sec   Loss 4.6266   LearningRate 0.0376   Epoch: 14   Global Step: 150190   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:43:53,262-Speed 5986.97 samples/sec   Loss 4.6396   LearningRate 0.0375   Epoch: 14   Global Step: 150200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:44:00,132-Speed 5965.16 samples/sec   Loss 4.6324   LearningRate 0.0375   Epoch: 14   Global Step: 150210   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:44:07,002-Speed 5964.19 samples/sec   Loss 4.6348   LearningRate 0.0375   Epoch: 14   Global Step: 150220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:44:13,852-Speed 5980.20 samples/sec   Loss 4.6346   LearningRate 0.0375   Epoch: 14   Global Step: 150230   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:44:20,691-Speed 5990.47 samples/sec   Loss 4.6109   LearningRate 0.0375   Epoch: 14   Global Step: 150240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:44:27,542-Speed 5980.10 samples/sec   Loss 4.6675   LearningRate 0.0375   Epoch: 14   Global Step: 150250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:44:34,394-Speed 5978.48 samples/sec   Loss 4.6119   LearningRate 0.0375   Epoch: 14   Global Step: 150260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:44:41,249-Speed 5976.74 samples/sec   Loss 4.6164   LearningRate 0.0375   Epoch: 14   Global Step: 150270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:44:48,126-Speed 5957.64 samples/sec   Loss 4.6287   LearningRate 0.0374   Epoch: 14   Global Step: 150280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:44:54,979-Speed 5977.35 samples/sec   Loss 4.5875   LearningRate 0.0374   Epoch: 14   Global Step: 150290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:45:01,828-Speed 5982.29 samples/sec   Loss 4.5784   LearningRate 0.0374   Epoch: 14   Global Step: 150300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:45:08,725-Speed 5940.30 samples/sec   Loss 4.6331   LearningRate 0.0374   Epoch: 14   Global Step: 150310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:45:15,571-Speed 5984.36 samples/sec   Loss 4.6045   LearningRate 0.0374   Epoch: 14   Global Step: 150320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:45:22,423-Speed 5978.80 samples/sec   Loss 4.6129   LearningRate 0.0374   Epoch: 14   Global Step: 150330   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:45:29,283-Speed 5972.12 samples/sec   Loss 4.6547   LearningRate 0.0374   Epoch: 14   Global Step: 150340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:45:36,119-Speed 5993.18 samples/sec   Loss 4.6446   LearningRate 0.0373   Epoch: 14   Global Step: 150350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:45:42,971-Speed 5979.32 samples/sec   Loss 4.6367   LearningRate 0.0373   Epoch: 14   Global Step: 150360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:45:49,841-Speed 5963.79 samples/sec   Loss 4.5764   LearningRate 0.0373   Epoch: 14   Global Step: 150370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:45:56,696-Speed 5976.01 samples/sec   Loss 4.5664   LearningRate 0.0373   Epoch: 14   Global Step: 150380   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:46:03,547-Speed 5979.90 samples/sec   Loss 4.6063   LearningRate 0.0373   Epoch: 14   Global Step: 150390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:46:10,399-Speed 5980.67 samples/sec   Loss 4.6234   LearningRate 0.0373   Epoch: 14   Global Step: 150400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:46:17,247-Speed 5983.01 samples/sec   Loss 4.5833   LearningRate 0.0373   Epoch: 14   Global Step: 150410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:46:24,099-Speed 5978.80 samples/sec   Loss 4.5839   LearningRate 0.0373   Epoch: 14   Global Step: 150420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:46:30,963-Speed 5969.03 samples/sec   Loss 4.5923   LearningRate 0.0372   Epoch: 14   Global Step: 150430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:46:37,825-Speed 5970.18 samples/sec   Loss 4.6228   LearningRate 0.0372   Epoch: 14   Global Step: 150440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:46:44,683-Speed 5973.65 samples/sec   Loss 4.5576   LearningRate 0.0372   Epoch: 14   Global Step: 150450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:46:51,578-Speed 5941.57 samples/sec   Loss 4.6135   LearningRate 0.0372   Epoch: 14   Global Step: 150460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:46:58,426-Speed 5982.16 samples/sec   Loss 4.5591   LearningRate 0.0372   Epoch: 14   Global Step: 150470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:47:05,265-Speed 5990.35 samples/sec   Loss 4.5971   LearningRate 0.0372   Epoch: 14   Global Step: 150480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:47:12,114-Speed 5985.37 samples/sec   Loss 4.6151   LearningRate 0.0372   Epoch: 14   Global Step: 150490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:47:18,969-Speed 5976.48 samples/sec   Loss 4.6530   LearningRate 0.0372   Epoch: 14   Global Step: 150500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:47:25,820-Speed 5981.47 samples/sec   Loss 4.6167   LearningRate 0.0371   Epoch: 14   Global Step: 150510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:47:32,682-Speed 5970.01 samples/sec   Loss 4.5762   LearningRate 0.0371   Epoch: 14   Global Step: 150520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:47:39,593-Speed 5928.23 samples/sec   Loss 4.5519   LearningRate 0.0371   Epoch: 14   Global Step: 150530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:47:46,453-Speed 5972.21 samples/sec   Loss 4.6160   LearningRate 0.0371   Epoch: 14   Global Step: 150540   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:47:53,307-Speed 5980.62 samples/sec   Loss 4.5710   LearningRate 0.0371   Epoch: 14   Global Step: 150550   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:48:00,160-Speed 5978.05 samples/sec   Loss 4.6083   LearningRate 0.0371   Epoch: 14   Global Step: 150560   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:48:07,029-Speed 5964.26 samples/sec   Loss 4.6127   LearningRate 0.0371   Epoch: 14   Global Step: 150570   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:48:13,904-Speed 5959.18 samples/sec   Loss 4.6096   LearningRate 0.0370   Epoch: 14   Global Step: 150580   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:48:20,771-Speed 5965.44 samples/sec   Loss 4.5987   LearningRate 0.0370   Epoch: 14   Global Step: 150590   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:48:27,640-Speed 5965.08 samples/sec   Loss 4.5722   LearningRate 0.0370   Epoch: 14   Global Step: 150600   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:48:34,553-Speed 5926.17 samples/sec   Loss 4.6109   LearningRate 0.0370   Epoch: 14   Global Step: 150610   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:48:41,481-Speed 5912.88 samples/sec   Loss 4.5694   LearningRate 0.0370   Epoch: 14   Global Step: 150620   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:48:48,332-Speed 5979.88 samples/sec   Loss 4.6388   LearningRate 0.0370   Epoch: 14   Global Step: 150630   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:48:55,181-Speed 5982.02 samples/sec   Loss 4.5919   LearningRate 0.0370   Epoch: 14   Global Step: 150640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:49:02,041-Speed 5972.75 samples/sec   Loss 4.5906   LearningRate 0.0370   Epoch: 14   Global Step: 150650   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:49:08,904-Speed 5971.72 samples/sec   Loss 4.6038   LearningRate 0.0369   Epoch: 14   Global Step: 150660   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:49:15,777-Speed 5960.56 samples/sec   Loss 4.5863   LearningRate 0.0369   Epoch: 14   Global Step: 150670   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:49:22,660-Speed 5952.21 samples/sec   Loss 4.5869   LearningRate 0.0369   Epoch: 14   Global Step: 150680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:49:29,522-Speed 5970.34 samples/sec   Loss 4.5698   LearningRate 0.0369   Epoch: 14   Global Step: 150690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:49:36,369-Speed 5983.90 samples/sec   Loss 4.6550   LearningRate 0.0369   Epoch: 14   Global Step: 150700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:49:43,220-Speed 5979.05 samples/sec   Loss 4.6447   LearningRate 0.0369   Epoch: 14   Global Step: 150710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:49:50,074-Speed 5978.92 samples/sec   Loss 4.5802   LearningRate 0.0369   Epoch: 14   Global Step: 150720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:49:56,935-Speed 5971.65 samples/sec   Loss 4.5411   LearningRate 0.0369   Epoch: 14   Global Step: 150730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:50:03,783-Speed 5981.69 samples/sec   Loss 4.6041   LearningRate 0.0368   Epoch: 14   Global Step: 150740   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:50:10,649-Speed 5966.99 samples/sec   Loss 4.5894   LearningRate 0.0368   Epoch: 14   Global Step: 150750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:50:17,511-Speed 5970.41 samples/sec   Loss 4.5615   LearningRate 0.0368   Epoch: 14   Global Step: 150760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:50:24,370-Speed 5972.66 samples/sec   Loss 4.6175   LearningRate 0.0368   Epoch: 14   Global Step: 150770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:50:31,243-Speed 5960.75 samples/sec   Loss 4.5635   LearningRate 0.0368   Epoch: 14   Global Step: 150780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:50:38,114-Speed 5963.07 samples/sec   Loss 4.6030   LearningRate 0.0368   Epoch: 14   Global Step: 150790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:50:44,980-Speed 5966.43 samples/sec   Loss 4.5421   LearningRate 0.0368   Epoch: 14   Global Step: 150800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:50:51,840-Speed 5972.22 samples/sec   Loss 4.5256   LearningRate 0.0367   Epoch: 14   Global Step: 150810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:50:58,712-Speed 5961.66 samples/sec   Loss 4.5901   LearningRate 0.0367   Epoch: 14   Global Step: 150820   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:51:05,573-Speed 5971.63 samples/sec   Loss 4.5961   LearningRate 0.0367   Epoch: 14   Global Step: 150830   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:51:12,423-Speed 5980.48 samples/sec   Loss 4.5632   LearningRate 0.0367   Epoch: 14   Global Step: 150840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:51:19,277-Speed 5977.94 samples/sec   Loss 4.6064   LearningRate 0.0367   Epoch: 14   Global Step: 150850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:51:26,132-Speed 5976.37 samples/sec   Loss 4.5664   LearningRate 0.0367   Epoch: 14   Global Step: 150860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:51:32,983-Speed 5980.05 samples/sec   Loss 4.5564   LearningRate 0.0367   Epoch: 14   Global Step: 150870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:51:39,840-Speed 5974.24 samples/sec   Loss 4.5856   LearningRate 0.0367   Epoch: 14   Global Step: 150880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:51:46,704-Speed 5968.52 samples/sec   Loss 4.6019   LearningRate 0.0366   Epoch: 14   Global Step: 150890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:51:53,580-Speed 5958.75 samples/sec   Loss 4.5759   LearningRate 0.0366   Epoch: 14   Global Step: 150900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:52:00,459-Speed 5955.98 samples/sec   Loss 4.5785   LearningRate 0.0366   Epoch: 14   Global Step: 150910   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:52:07,342-Speed 5952.44 samples/sec   Loss 4.6215   LearningRate 0.0366   Epoch: 14   Global Step: 150920   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:52:14,205-Speed 5969.01 samples/sec   Loss 4.5532   LearningRate 0.0366   Epoch: 14   Global Step: 150930   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:52:21,077-Speed 5961.90 samples/sec   Loss 4.6048   LearningRate 0.0366   Epoch: 14   Global Step: 150940   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:52:27,967-Speed 5945.88 samples/sec   Loss 4.5413   LearningRate 0.0366   Epoch: 14   Global Step: 150950   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:52:34,830-Speed 5969.94 samples/sec   Loss 4.5895   LearningRate 0.0366   Epoch: 14   Global Step: 150960   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:52:41,690-Speed 5972.39 samples/sec   Loss 4.5683   LearningRate 0.0365   Epoch: 14   Global Step: 150970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:52:48,562-Speed 5961.17 samples/sec   Loss 4.5444   LearningRate 0.0365   Epoch: 14   Global Step: 150980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:52:55,425-Speed 5969.73 samples/sec   Loss 4.5723   LearningRate 0.0365   Epoch: 14   Global Step: 150990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:53:02,281-Speed 5975.58 samples/sec   Loss 4.5984   LearningRate 0.0365   Epoch: 14   Global Step: 151000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:53:09,151-Speed 5963.65 samples/sec   Loss 4.5613   LearningRate 0.0365   Epoch: 14   Global Step: 151010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:53:16,001-Speed 5980.14 samples/sec   Loss 4.5556   LearningRate 0.0365   Epoch: 14   Global Step: 151020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:53:22,870-Speed 5964.71 samples/sec   Loss 4.5902   LearningRate 0.0365   Epoch: 14   Global Step: 151030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:53:29,721-Speed 5979.96 samples/sec   Loss 4.5399   LearningRate 0.0364   Epoch: 14   Global Step: 151040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:53:36,588-Speed 5968.42 samples/sec   Loss 4.5780   LearningRate 0.0364   Epoch: 14   Global Step: 151050   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:53:43,472-Speed 5950.64 samples/sec   Loss 4.5374   LearningRate 0.0364   Epoch: 14   Global Step: 151060   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:53:50,327-Speed 5976.87 samples/sec   Loss 4.5556   LearningRate 0.0364   Epoch: 14   Global Step: 151070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:53:57,241-Speed 5925.21 samples/sec   Loss 4.5300   LearningRate 0.0364   Epoch: 14   Global Step: 151080   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:54:04,110-Speed 5964.25 samples/sec   Loss 4.5916   LearningRate 0.0364   Epoch: 14   Global Step: 151090   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:54:10,961-Speed 5979.34 samples/sec   Loss 4.5902   LearningRate 0.0364   Epoch: 14   Global Step: 151100   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:54:17,821-Speed 5972.00 samples/sec   Loss 4.5677   LearningRate 0.0364   Epoch: 14   Global Step: 151110   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:54:24,681-Speed 5972.09 samples/sec   Loss 4.5317   LearningRate 0.0363   Epoch: 14   Global Step: 151120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:54:31,541-Speed 5972.08 samples/sec   Loss 4.5619   LearningRate 0.0363   Epoch: 14   Global Step: 151130   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:54:38,398-Speed 5975.19 samples/sec   Loss 4.5301   LearningRate 0.0363   Epoch: 14   Global Step: 151140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:54:45,291-Speed 5943.10 samples/sec   Loss 4.5767   LearningRate 0.0363   Epoch: 14   Global Step: 151150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:54:52,137-Speed 5983.66 samples/sec   Loss 4.5143   LearningRate 0.0363   Epoch: 14   Global Step: 151160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:54:59,005-Speed 5964.86 samples/sec   Loss 4.5324   LearningRate 0.0363   Epoch: 14   Global Step: 151170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:55:05,872-Speed 5966.69 samples/sec   Loss 4.5843   LearningRate 0.0363   Epoch: 14   Global Step: 151180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:55:12,724-Speed 5978.78 samples/sec   Loss 4.5230   LearningRate 0.0363   Epoch: 14   Global Step: 151190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:55:19,594-Speed 5962.94 samples/sec   Loss 4.4888   LearningRate 0.0362   Epoch: 14   Global Step: 151200   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:55:26,464-Speed 5963.06 samples/sec   Loss 4.5608   LearningRate 0.0362   Epoch: 14   Global Step: 151210   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:55:33,317-Speed 5977.18 samples/sec   Loss 4.5664   LearningRate 0.0362   Epoch: 14   Global Step: 151220   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:55:40,178-Speed 5971.57 samples/sec   Loss 4.5168   LearningRate 0.0362   Epoch: 14   Global Step: 151230   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:55:47,055-Speed 5957.98 samples/sec   Loss 4.5977   LearningRate 0.0362   Epoch: 14   Global Step: 151240   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:55:53,929-Speed 5958.93 samples/sec   Loss 4.5509   LearningRate 0.0362   Epoch: 14   Global Step: 151250   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 01:56:00,800-Speed 5963.37 samples/sec   Loss 4.5708   LearningRate 0.0362   Epoch: 14   Global Step: 151260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:56:07,659-Speed 5972.20 samples/sec   Loss 4.5274   LearningRate 0.0362   Epoch: 14   Global Step: 151270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:56:14,515-Speed 5975.24 samples/sec   Loss 4.5724   LearningRate 0.0361   Epoch: 14   Global Step: 151280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:56:21,403-Speed 5948.40 samples/sec   Loss 4.5279   LearningRate 0.0361   Epoch: 14   Global Step: 151290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:56:28,257-Speed 5976.86 samples/sec   Loss 4.5476   LearningRate 0.0361   Epoch: 14   Global Step: 151300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:56:35,125-Speed 5965.27 samples/sec   Loss 4.5596   LearningRate 0.0361   Epoch: 14   Global Step: 151310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:56:41,974-Speed 5981.55 samples/sec   Loss 4.4914   LearningRate 0.0361   Epoch: 14   Global Step: 151320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:56:48,837-Speed 5969.46 samples/sec   Loss 4.5089   LearningRate 0.0361   Epoch: 14   Global Step: 151330   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:56:55,687-Speed 5980.05 samples/sec   Loss 4.5134   LearningRate 0.0361   Epoch: 14   Global Step: 151340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:57:02,558-Speed 5963.16 samples/sec   Loss 4.5045   LearningRate 0.0360   Epoch: 14   Global Step: 151350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:57:09,433-Speed 5959.41 samples/sec   Loss 4.5445   LearningRate 0.0360   Epoch: 14   Global Step: 151360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:57:16,298-Speed 5969.61 samples/sec   Loss 4.4978   LearningRate 0.0360   Epoch: 14   Global Step: 151370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:57:23,176-Speed 5956.66 samples/sec   Loss 4.5319   LearningRate 0.0360   Epoch: 14   Global Step: 151380   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:57:30,088-Speed 5927.94 samples/sec   Loss 4.5319   LearningRate 0.0360   Epoch: 14   Global Step: 151390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:57:37,007-Speed 5920.74 samples/sec   Loss 4.5352   LearningRate 0.0360   Epoch: 14   Global Step: 151400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:57:43,873-Speed 5966.90 samples/sec   Loss 4.5996   LearningRate 0.0360   Epoch: 14   Global Step: 151410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:57:50,727-Speed 5978.09 samples/sec   Loss 4.5388   LearningRate 0.0360   Epoch: 14   Global Step: 151420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:57:57,580-Speed 5977.84 samples/sec   Loss 4.5403   LearningRate 0.0359   Epoch: 14   Global Step: 151430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:58:04,436-Speed 5975.42 samples/sec   Loss 4.5447   LearningRate 0.0359   Epoch: 14   Global Step: 151440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:58:11,297-Speed 5970.61 samples/sec   Loss 4.5558   LearningRate 0.0359   Epoch: 14   Global Step: 151450   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:58:18,182-Speed 5950.72 samples/sec   Loss 4.5004   LearningRate 0.0359   Epoch: 14   Global Step: 151460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:58:25,044-Speed 5970.00 samples/sec   Loss 4.5883   LearningRate 0.0359   Epoch: 14   Global Step: 151470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:58:31,898-Speed 5977.87 samples/sec   Loss 4.5238   LearningRate 0.0359   Epoch: 14   Global Step: 151480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:58:38,750-Speed 5978.59 samples/sec   Loss 4.5773   LearningRate 0.0359   Epoch: 14   Global Step: 151490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:58:45,602-Speed 5978.75 samples/sec   Loss 4.5484   LearningRate 0.0359   Epoch: 14   Global Step: 151500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 01:58:52,451-Speed 5983.83 samples/sec   Loss 4.5452   LearningRate 0.0358   Epoch: 14   Global Step: 151510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:58:59,325-Speed 5959.75 samples/sec   Loss 4.5727   LearningRate 0.0358   Epoch: 14   Global Step: 151520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:59:06,193-Speed 5965.87 samples/sec   Loss 4.5459   LearningRate 0.0358   Epoch: 14   Global Step: 151530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:59:13,074-Speed 5954.17 samples/sec   Loss 4.5640   LearningRate 0.0358   Epoch: 14   Global Step: 151540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:59:19,930-Speed 5974.59 samples/sec   Loss 4.5713   LearningRate 0.0358   Epoch: 14   Global Step: 151550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:59:26,820-Speed 5946.98 samples/sec   Loss 4.5415   LearningRate 0.0358   Epoch: 14   Global Step: 151560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:59:33,692-Speed 5961.28 samples/sec   Loss 4.5851   LearningRate 0.0358   Epoch: 14   Global Step: 151570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:59:40,539-Speed 5983.29 samples/sec   Loss 4.5191   LearningRate 0.0358   Epoch: 14   Global Step: 151580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:59:47,394-Speed 5977.10 samples/sec   Loss 4.5023   LearningRate 0.0357   Epoch: 14   Global Step: 151590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 01:59:54,259-Speed 5970.42 samples/sec   Loss 4.6023   LearningRate 0.0357   Epoch: 14   Global Step: 151600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:00:01,116-Speed 5974.10 samples/sec   Loss 4.4988   LearningRate 0.0357   Epoch: 14   Global Step: 151610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:00:07,989-Speed 5961.18 samples/sec   Loss 4.5525   LearningRate 0.0357   Epoch: 14   Global Step: 151620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:00:14,844-Speed 5976.35 samples/sec   Loss 4.5104   LearningRate 0.0357   Epoch: 14   Global Step: 151630   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:00:21,705-Speed 5971.22 samples/sec   Loss 4.5048   LearningRate 0.0357   Epoch: 14   Global Step: 151640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:00:28,563-Speed 5976.50 samples/sec   Loss 4.5576   LearningRate 0.0357   Epoch: 14   Global Step: 151650   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:00:35,424-Speed 5971.38 samples/sec   Loss 4.5594   LearningRate 0.0357   Epoch: 14   Global Step: 151660   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:00:42,274-Speed 5980.45 samples/sec   Loss 4.5427   LearningRate 0.0356   Epoch: 14   Global Step: 151670   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:00:49,119-Speed 5984.71 samples/sec   Loss 4.4747   LearningRate 0.0356   Epoch: 14   Global Step: 151680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:00:55,984-Speed 5968.03 samples/sec   Loss 4.5557   LearningRate 0.0356   Epoch: 14   Global Step: 151690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:01:02,861-Speed 5957.01 samples/sec   Loss 4.5452   LearningRate 0.0356   Epoch: 14   Global Step: 151700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:01:09,728-Speed 5966.89 samples/sec   Loss 4.4597   LearningRate 0.0356   Epoch: 14   Global Step: 151710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:01:16,583-Speed 5976.71 samples/sec   Loss 4.5099   LearningRate 0.0356   Epoch: 14   Global Step: 151720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:01:23,434-Speed 5979.32 samples/sec   Loss 4.5234   LearningRate 0.0356   Epoch: 14   Global Step: 151730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:01:30,301-Speed 5966.34 samples/sec   Loss 4.5109   LearningRate 0.0355   Epoch: 14   Global Step: 151740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:01:37,167-Speed 5967.61 samples/sec   Loss 4.4772   LearningRate 0.0355   Epoch: 14   Global Step: 151750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:01:44,053-Speed 5949.12 samples/sec   Loss 4.5668   LearningRate 0.0355   Epoch: 14   Global Step: 151760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:01:50,900-Speed 5982.78 samples/sec   Loss 4.5524   LearningRate 0.0355   Epoch: 14   Global Step: 151770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:01:57,798-Speed 5939.06 samples/sec   Loss 4.4688   LearningRate 0.0355   Epoch: 14   Global Step: 151780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:02:04,650-Speed 5979.13 samples/sec   Loss 4.4479   LearningRate 0.0355   Epoch: 14   Global Step: 151790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:02:11,508-Speed 5974.11 samples/sec   Loss 4.5066   LearningRate 0.0355   Epoch: 14   Global Step: 151800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:02:18,361-Speed 5977.97 samples/sec   Loss 4.4896   LearningRate 0.0355   Epoch: 14   Global Step: 151810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:02:25,234-Speed 5961.12 samples/sec   Loss 4.4632   LearningRate 0.0354   Epoch: 14   Global Step: 151820   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:02:32,107-Speed 5960.78 samples/sec   Loss 4.5597   LearningRate 0.0354   Epoch: 14   Global Step: 151830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:02:38,984-Speed 5956.91 samples/sec   Loss 4.5356   LearningRate 0.0354   Epoch: 14   Global Step: 151840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:02:45,884-Speed 5937.06 samples/sec   Loss 4.4816   LearningRate 0.0354   Epoch: 14   Global Step: 151850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:02:52,737-Speed 5978.62 samples/sec   Loss 4.5331   LearningRate 0.0354   Epoch: 14   Global Step: 151860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:02:59,585-Speed 5985.16 samples/sec   Loss 4.5305   LearningRate 0.0354   Epoch: 14   Global Step: 151870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:03:06,441-Speed 5975.18 samples/sec   Loss 4.4717   LearningRate 0.0354   Epoch: 14   Global Step: 151880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:03:13,290-Speed 5981.85 samples/sec   Loss 4.4550   LearningRate 0.0354   Epoch: 14   Global Step: 151890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:03:20,157-Speed 5966.49 samples/sec   Loss 4.5251   LearningRate 0.0353   Epoch: 14   Global Step: 151900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:03:27,029-Speed 5961.91 samples/sec   Loss 4.5146   LearningRate 0.0353   Epoch: 14   Global Step: 151910   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:03:33,887-Speed 5973.38 samples/sec   Loss 4.5295   LearningRate 0.0353   Epoch: 14   Global Step: 151920   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:03:40,782-Speed 5942.24 samples/sec   Loss 4.5180   LearningRate 0.0353   Epoch: 14   Global Step: 151930   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:03:47,681-Speed 5937.82 samples/sec   Loss 4.4853   LearningRate 0.0353   Epoch: 14   Global Step: 151940   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:03:54,550-Speed 5964.61 samples/sec   Loss 4.4802   LearningRate 0.0353   Epoch: 14   Global Step: 151950   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:04:01,433-Speed 5952.62 samples/sec   Loss 4.5026   LearningRate 0.0353   Epoch: 14   Global Step: 151960   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:04:08,290-Speed 5974.50 samples/sec   Loss 4.5155   LearningRate 0.0353   Epoch: 14   Global Step: 151970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:04:15,138-Speed 5982.25 samples/sec   Loss 4.5060   LearningRate 0.0352   Epoch: 14   Global Step: 151980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:04:22,011-Speed 5961.17 samples/sec   Loss 4.4967   LearningRate 0.0352   Epoch: 14   Global Step: 151990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:04:28,868-Speed 5974.34 samples/sec   Loss 4.5442   LearningRate 0.0352   Epoch: 14   Global Step: 152000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:04:35,739-Speed 5962.78 samples/sec   Loss 4.4925   LearningRate 0.0352   Epoch: 14   Global Step: 152010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:04:42,615-Speed 5961.16 samples/sec   Loss 4.5089   LearningRate 0.0352   Epoch: 14   Global Step: 152020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:04:49,539-Speed 5916.62 samples/sec   Loss 4.5664   LearningRate 0.0352   Epoch: 14   Global Step: 152030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:04:56,394-Speed 5977.35 samples/sec   Loss 4.5030   LearningRate 0.0352   Epoch: 14   Global Step: 152040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:05:03,241-Speed 5983.33 samples/sec   Loss 4.4862   LearningRate 0.0352   Epoch: 14   Global Step: 152050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:05:10,104-Speed 5969.54 samples/sec   Loss 4.4801   LearningRate 0.0351   Epoch: 14   Global Step: 152060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:05:16,967-Speed 5969.75 samples/sec   Loss 4.4560   LearningRate 0.0351   Epoch: 14   Global Step: 152070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:05:23,811-Speed 5985.42 samples/sec   Loss 4.4754   LearningRate 0.0351   Epoch: 14   Global Step: 152080   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:05:30,676-Speed 5967.78 samples/sec   Loss 4.4912   LearningRate 0.0351   Epoch: 14   Global Step: 152090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:05:37,538-Speed 5969.72 samples/sec   Loss 4.4838   LearningRate 0.0351   Epoch: 14   Global Step: 152100   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:05:44,390-Speed 5980.42 samples/sec   Loss 4.4570   LearningRate 0.0351   Epoch: 14   Global Step: 152110   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:05:51,279-Speed 5946.71 samples/sec   Loss 4.4438   LearningRate 0.0351   Epoch: 14   Global Step: 152120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:05:58,132-Speed 5978.09 samples/sec   Loss 4.5009   LearningRate 0.0351   Epoch: 14   Global Step: 152130   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:06:05,031-Speed 5938.32 samples/sec   Loss 4.4393   LearningRate 0.0350   Epoch: 14   Global Step: 152140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:06:11,884-Speed 5977.93 samples/sec   Loss 4.4571   LearningRate 0.0350   Epoch: 14   Global Step: 152150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:06:18,740-Speed 5975.30 samples/sec   Loss 4.5104   LearningRate 0.0350   Epoch: 14   Global Step: 152160   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:06:25,591-Speed 5980.07 samples/sec   Loss 4.5255   LearningRate 0.0350   Epoch: 14   Global Step: 152170   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:06:32,488-Speed 5940.11 samples/sec   Loss 4.5029   LearningRate 0.0350   Epoch: 14   Global Step: 152180   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:06:39,375-Speed 5949.41 samples/sec   Loss 4.5043   LearningRate 0.0350   Epoch: 14   Global Step: 152190   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:06:46,272-Speed 5939.72 samples/sec   Loss 4.4971   LearningRate 0.0350   Epoch: 14   Global Step: 152200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:06:53,135-Speed 5969.34 samples/sec   Loss 4.4718   LearningRate 0.0350   Epoch: 14   Global Step: 152210   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:07:00,009-Speed 5959.97 samples/sec   Loss 4.4868   LearningRate 0.0349   Epoch: 14   Global Step: 152220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:07:06,871-Speed 5970.74 samples/sec   Loss 4.4816   LearningRate 0.0349   Epoch: 14   Global Step: 152230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:07:13,756-Speed 5950.01 samples/sec   Loss 4.5061   LearningRate 0.0349   Epoch: 14   Global Step: 152240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:07:20,632-Speed 5957.76 samples/sec   Loss 4.4364   LearningRate 0.0349   Epoch: 14   Global Step: 152250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:07:27,551-Speed 5921.40 samples/sec   Loss 4.5053   LearningRate 0.0349   Epoch: 14   Global Step: 152260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:07:34,424-Speed 5962.88 samples/sec   Loss 4.4798   LearningRate 0.0349   Epoch: 14   Global Step: 152270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:07:41,284-Speed 5972.74 samples/sec   Loss 4.4929   LearningRate 0.0349   Epoch: 14   Global Step: 152280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:07:48,130-Speed 5984.11 samples/sec   Loss 4.4911   LearningRate 0.0348   Epoch: 14   Global Step: 152290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:07:54,984-Speed 5976.05 samples/sec   Loss 4.4527   LearningRate 0.0348   Epoch: 14   Global Step: 152300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:08:01,853-Speed 5964.13 samples/sec   Loss 4.4952   LearningRate 0.0348   Epoch: 14   Global Step: 152310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:08:08,733-Speed 5954.98 samples/sec   Loss 4.4648   LearningRate 0.0348   Epoch: 14   Global Step: 152320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:08:15,599-Speed 5967.04 samples/sec   Loss 4.4335   LearningRate 0.0348   Epoch: 14   Global Step: 152330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:08:22,445-Speed 5984.38 samples/sec   Loss 4.4044   LearningRate 0.0348   Epoch: 14   Global Step: 152340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:08:29,290-Speed 5985.30 samples/sec   Loss 4.4257   LearningRate 0.0348   Epoch: 14   Global Step: 152350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:08:36,138-Speed 5981.99 samples/sec   Loss 4.4734   LearningRate 0.0348   Epoch: 14   Global Step: 152360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:08:42,991-Speed 5981.28 samples/sec   Loss 4.4671   LearningRate 0.0347   Epoch: 14   Global Step: 152370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:08:49,855-Speed 5968.26 samples/sec   Loss 4.4866   LearningRate 0.0347   Epoch: 14   Global Step: 152380   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:08:56,714-Speed 5973.20 samples/sec   Loss 4.4465   LearningRate 0.0347   Epoch: 14   Global Step: 152390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:09:03,622-Speed 5933.30 samples/sec   Loss 4.4644   LearningRate 0.0347   Epoch: 14   Global Step: 152400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:09:10,485-Speed 5969.77 samples/sec   Loss 4.4456   LearningRate 0.0347   Epoch: 14   Global Step: 152410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:09:17,360-Speed 5958.51 samples/sec   Loss 4.4727   LearningRate 0.0347   Epoch: 14   Global Step: 152420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:09:24,237-Speed 5957.05 samples/sec   Loss 4.4719   LearningRate 0.0347   Epoch: 14   Global Step: 152430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:09:31,101-Speed 5969.07 samples/sec   Loss 4.4512   LearningRate 0.0347   Epoch: 14   Global Step: 152440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:09:37,975-Speed 5959.62 samples/sec   Loss 4.5051   LearningRate 0.0346   Epoch: 14   Global Step: 152450   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:09:44,840-Speed 5968.00 samples/sec   Loss 4.3993   LearningRate 0.0346   Epoch: 14   Global Step: 152460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:09:51,689-Speed 5981.19 samples/sec   Loss 4.4299   LearningRate 0.0346   Epoch: 14   Global Step: 152470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:09:58,543-Speed 5977.44 samples/sec   Loss 4.4991   LearningRate 0.0346   Epoch: 14   Global Step: 152480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:10:05,405-Speed 5972.09 samples/sec   Loss 4.4750   LearningRate 0.0346   Epoch: 14   Global Step: 152490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:10:12,254-Speed 5983.48 samples/sec   Loss 4.4616   LearningRate 0.0346   Epoch: 14   Global Step: 152500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:10:19,108-Speed 5977.42 samples/sec   Loss 4.4466   LearningRate 0.0346   Epoch: 14   Global Step: 152510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:10:25,967-Speed 5972.71 samples/sec   Loss 4.4594   LearningRate 0.0346   Epoch: 14   Global Step: 152520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:10:32,838-Speed 5963.00 samples/sec   Loss 4.3987   LearningRate 0.0345   Epoch: 14   Global Step: 152530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:10:39,684-Speed 5983.45 samples/sec   Loss 4.5079   LearningRate 0.0345   Epoch: 14   Global Step: 152540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:10:46,558-Speed 5960.64 samples/sec   Loss 4.4105   LearningRate 0.0345   Epoch: 14   Global Step: 152550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:10:53,423-Speed 5967.76 samples/sec   Loss 4.4592   LearningRate 0.0345   Epoch: 14   Global Step: 152560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:11:00,273-Speed 5980.42 samples/sec   Loss 4.4459   LearningRate 0.0345   Epoch: 14   Global Step: 152570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:11:07,139-Speed 5967.03 samples/sec   Loss 4.4342   LearningRate 0.0345   Epoch: 14   Global Step: 152580   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:11:14,017-Speed 5956.60 samples/sec   Loss 4.4878   LearningRate 0.0345   Epoch: 14   Global Step: 152590   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:11:20,895-Speed 5956.27 samples/sec   Loss 4.4463   LearningRate 0.0345   Epoch: 14   Global Step: 152600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:11:27,755-Speed 5971.92 samples/sec   Loss 4.4586   LearningRate 0.0344   Epoch: 14   Global Step: 152610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:11:34,642-Speed 5949.16 samples/sec   Loss 4.4846   LearningRate 0.0344   Epoch: 14   Global Step: 152620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:11:41,489-Speed 5983.53 samples/sec   Loss 4.4986   LearningRate 0.0344   Epoch: 14   Global Step: 152630   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:11:48,379-Speed 5945.80 samples/sec   Loss 4.4246   LearningRate 0.0344   Epoch: 14   Global Step: 152640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:11:55,254-Speed 5959.58 samples/sec   Loss 4.4571   LearningRate 0.0344   Epoch: 14   Global Step: 152650   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:12:02,099-Speed 5984.54 samples/sec   Loss 4.4663   LearningRate 0.0344   Epoch: 14   Global Step: 152660   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:12:08,958-Speed 5975.40 samples/sec   Loss 4.4030   LearningRate 0.0344   Epoch: 14   Global Step: 152670   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:12:15,853-Speed 5943.81 samples/sec   Loss 4.4576   LearningRate 0.0344   Epoch: 14   Global Step: 152680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:12:22,699-Speed 5983.95 samples/sec   Loss 4.4713   LearningRate 0.0343   Epoch: 14   Global Step: 152690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:12:29,547-Speed 5982.56 samples/sec   Loss 4.4431   LearningRate 0.0343   Epoch: 14   Global Step: 152700   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:12:36,416-Speed 5964.32 samples/sec   Loss 4.4597   LearningRate 0.0343   Epoch: 14   Global Step: 152710   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:12:43,270-Speed 5977.08 samples/sec   Loss 4.4502   LearningRate 0.0343   Epoch: 14   Global Step: 152720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:12:50,134-Speed 5968.57 samples/sec   Loss 4.4245   LearningRate 0.0343   Epoch: 14   Global Step: 152730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:12:57,006-Speed 5961.12 samples/sec   Loss 4.4862   LearningRate 0.0343   Epoch: 14   Global Step: 152740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:13:03,852-Speed 5984.21 samples/sec   Loss 4.4791   LearningRate 0.0343   Epoch: 14   Global Step: 152750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:13:10,712-Speed 5972.12 samples/sec   Loss 4.4392   LearningRate 0.0343   Epoch: 14   Global Step: 152760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:13:17,561-Speed 5981.79 samples/sec   Loss 4.4602   LearningRate 0.0342   Epoch: 14   Global Step: 152770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:13:24,409-Speed 5981.33 samples/sec   Loss 4.4357   LearningRate 0.0342   Epoch: 14   Global Step: 152780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:13:31,285-Speed 5960.92 samples/sec   Loss 4.4396   LearningRate 0.0342   Epoch: 14   Global Step: 152790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:13:38,149-Speed 5969.26 samples/sec   Loss 4.3763   LearningRate 0.0342   Epoch: 14   Global Step: 152800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:13:45,018-Speed 5963.21 samples/sec   Loss 4.4031   LearningRate 0.0342   Epoch: 14   Global Step: 152810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:13:51,894-Speed 5958.59 samples/sec   Loss 4.4183   LearningRate 0.0342   Epoch: 14   Global Step: 152820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:13:58,750-Speed 5977.42 samples/sec   Loss 4.4773   LearningRate 0.0342   Epoch: 14   Global Step: 152830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:14:05,595-Speed 5985.23 samples/sec   Loss 4.4124   LearningRate 0.0342   Epoch: 14   Global Step: 152840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:14:12,447-Speed 5978.66 samples/sec   Loss 4.3907   LearningRate 0.0341   Epoch: 14   Global Step: 152850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:14:19,305-Speed 5974.19 samples/sec   Loss 4.4106   LearningRate 0.0341   Epoch: 14   Global Step: 152860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:14:26,158-Speed 5978.31 samples/sec   Loss 4.4263   LearningRate 0.0341   Epoch: 14   Global Step: 152870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:14:33,001-Speed 5987.04 samples/sec   Loss 4.4376   LearningRate 0.0341   Epoch: 14   Global Step: 152880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:14:39,851-Speed 5980.30 samples/sec   Loss 4.4025   LearningRate 0.0341   Epoch: 14   Global Step: 152890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:14:46,716-Speed 5968.06 samples/sec   Loss 4.4311   LearningRate 0.0341   Epoch: 14   Global Step: 152900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:14:53,578-Speed 5970.13 samples/sec   Loss 4.5032   LearningRate 0.0341   Epoch: 14   Global Step: 152910   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:15:00,426-Speed 5982.02 samples/sec   Loss 4.4393   LearningRate 0.0341   Epoch: 14   Global Step: 152920   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:15:07,277-Speed 5979.66 samples/sec   Loss 4.4195   LearningRate 0.0340   Epoch: 14   Global Step: 152930   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:15:14,137-Speed 5972.88 samples/sec   Loss 4.4349   LearningRate 0.0340   Epoch: 14   Global Step: 152940   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:15:20,986-Speed 5981.71 samples/sec   Loss 4.4294   LearningRate 0.0340   Epoch: 14   Global Step: 152950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:15:27,836-Speed 5980.35 samples/sec   Loss 4.4312   LearningRate 0.0340   Epoch: 14   Global Step: 152960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:15:34,691-Speed 5978.97 samples/sec   Loss 4.4477   LearningRate 0.0340   Epoch: 14   Global Step: 152970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:15:41,554-Speed 5969.49 samples/sec   Loss 4.4534   LearningRate 0.0340   Epoch: 14   Global Step: 152980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:15:48,439-Speed 5950.01 samples/sec   Loss 4.4532   LearningRate 0.0340   Epoch: 14   Global Step: 152990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:15:55,302-Speed 5969.46 samples/sec   Loss 4.3838   LearningRate 0.0340   Epoch: 14   Global Step: 153000   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:16:02,195-Speed 5943.32 samples/sec   Loss 4.4419   LearningRate 0.0339   Epoch: 14   Global Step: 153010   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:16:09,067-Speed 5961.99 samples/sec   Loss 4.4408   LearningRate 0.0339   Epoch: 14   Global Step: 153020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:16:15,919-Speed 5978.56 samples/sec   Loss 4.4180   LearningRate 0.0339   Epoch: 14   Global Step: 153030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:16:22,769-Speed 5981.13 samples/sec   Loss 4.3926   LearningRate 0.0339   Epoch: 14   Global Step: 153040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:16:29,621-Speed 5977.90 samples/sec   Loss 4.4193   LearningRate 0.0339   Epoch: 14   Global Step: 153050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:16:36,471-Speed 5980.88 samples/sec   Loss 4.4385   LearningRate 0.0339   Epoch: 14   Global Step: 153060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:16:43,321-Speed 5982.61 samples/sec   Loss 4.4282   LearningRate 0.0339   Epoch: 14   Global Step: 153070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:16:50,180-Speed 5972.96 samples/sec   Loss 4.4195   LearningRate 0.0339   Epoch: 14   Global Step: 153080   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:16:57,034-Speed 5977.11 samples/sec   Loss 4.4242   LearningRate 0.0338   Epoch: 14   Global Step: 153090   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:17:03,928-Speed 5943.34 samples/sec   Loss 4.4614   LearningRate 0.0338   Epoch: 14   Global Step: 153100   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:17:10,782-Speed 5976.29 samples/sec   Loss 4.4383   LearningRate 0.0338   Epoch: 14   Global Step: 153110   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:17:17,661-Speed 5956.25 samples/sec   Loss 4.4572   LearningRate 0.0338   Epoch: 14   Global Step: 153120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:17:24,509-Speed 5982.21 samples/sec   Loss 4.4766   LearningRate 0.0338   Epoch: 14   Global Step: 153130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:17:31,354-Speed 5984.52 samples/sec   Loss 4.4448   LearningRate 0.0338   Epoch: 14   Global Step: 153140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:17:38,207-Speed 5978.24 samples/sec   Loss 4.4330   LearningRate 0.0338   Epoch: 14   Global Step: 153150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:17:45,064-Speed 5974.30 samples/sec   Loss 4.3925   LearningRate 0.0338   Epoch: 14   Global Step: 153160   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:17:51,923-Speed 5972.81 samples/sec   Loss 4.3761   LearningRate 0.0337   Epoch: 14   Global Step: 153170   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:17:58,774-Speed 5980.55 samples/sec   Loss 4.4256   LearningRate 0.0337   Epoch: 14   Global Step: 153180   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:18:05,640-Speed 5967.62 samples/sec   Loss 4.3838   LearningRate 0.0337   Epoch: 14   Global Step: 153190   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:18:12,530-Speed 5945.29 samples/sec   Loss 4.3909   LearningRate 0.0337   Epoch: 14   Global Step: 153200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:18:19,381-Speed 5980.05 samples/sec   Loss 4.4148   LearningRate 0.0337   Epoch: 14   Global Step: 153210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:18:26,248-Speed 5967.57 samples/sec   Loss 4.4065   LearningRate 0.0337   Epoch: 14   Global Step: 153220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:18:33,135-Speed 5948.48 samples/sec   Loss 4.4414   LearningRate 0.0337   Epoch: 14   Global Step: 153230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:18:40,017-Speed 5952.93 samples/sec   Loss 4.4632   LearningRate 0.0337   Epoch: 14   Global Step: 153240   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:18:46,866-Speed 5981.39 samples/sec   Loss 4.4389   LearningRate 0.0336   Epoch: 14   Global Step: 153250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:18:53,738-Speed 5961.55 samples/sec   Loss 4.4362   LearningRate 0.0336   Epoch: 14   Global Step: 153260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:19:00,611-Speed 5960.62 samples/sec   Loss 4.4131   LearningRate 0.0336   Epoch: 14   Global Step: 153270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:19:07,473-Speed 5970.73 samples/sec   Loss 4.3817   LearningRate 0.0336   Epoch: 14   Global Step: 153280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:19:14,343-Speed 5962.50 samples/sec   Loss 4.4190   LearningRate 0.0336   Epoch: 14   Global Step: 153290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:19:21,211-Speed 5965.32 samples/sec   Loss 4.4000   LearningRate 0.0336   Epoch: 14   Global Step: 153300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:19:28,080-Speed 5964.50 samples/sec   Loss 4.4071   LearningRate 0.0336   Epoch: 14   Global Step: 153310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:19:34,934-Speed 5976.88 samples/sec   Loss 4.3644   LearningRate 0.0336   Epoch: 14   Global Step: 153320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:19:41,792-Speed 5974.77 samples/sec   Loss 4.3771   LearningRate 0.0335   Epoch: 14   Global Step: 153330   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:19:48,679-Speed 5948.32 samples/sec   Loss 4.4314   LearningRate 0.0335   Epoch: 14   Global Step: 153340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:19:55,526-Speed 5983.71 samples/sec   Loss 4.4169   LearningRate 0.0335   Epoch: 14   Global Step: 153350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:20:02,385-Speed 5972.90 samples/sec   Loss 4.3976   LearningRate 0.0335   Epoch: 14   Global Step: 153360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:20:09,225-Speed 5989.44 samples/sec   Loss 4.3710   LearningRate 0.0335   Epoch: 14   Global Step: 153370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:20:16,078-Speed 5977.30 samples/sec   Loss 4.4461   LearningRate 0.0335   Epoch: 14   Global Step: 153380   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:20:22,967-Speed 5947.22 samples/sec   Loss 4.4292   LearningRate 0.0335   Epoch: 14   Global Step: 153390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:20:29,843-Speed 5958.45 samples/sec   Loss 4.4269   LearningRate 0.0335   Epoch: 14   Global Step: 153400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:20:36,733-Speed 5946.46 samples/sec   Loss 4.4481   LearningRate 0.0334   Epoch: 14   Global Step: 153410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:20:43,608-Speed 5958.60 samples/sec   Loss 4.4240   LearningRate 0.0334   Epoch: 14   Global Step: 153420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:20:50,473-Speed 5967.98 samples/sec   Loss 4.3740   LearningRate 0.0334   Epoch: 14   Global Step: 153430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:20:57,334-Speed 5971.09 samples/sec   Loss 4.3999   LearningRate 0.0334   Epoch: 14   Global Step: 153440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:21:04,189-Speed 5976.59 samples/sec   Loss 4.3838   LearningRate 0.0334   Epoch: 14   Global Step: 153450   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:21:11,042-Speed 5978.59 samples/sec   Loss 4.3871   LearningRate 0.0334   Epoch: 14   Global Step: 153460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:21:17,930-Speed 5947.14 samples/sec   Loss 4.4179   LearningRate 0.0334   Epoch: 14   Global Step: 153470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:21:24,784-Speed 5977.29 samples/sec   Loss 4.3988   LearningRate 0.0334   Epoch: 14   Global Step: 153480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:21:31,675-Speed 5944.99 samples/sec   Loss 4.3958   LearningRate 0.0333   Epoch: 14   Global Step: 153490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:21:38,538-Speed 5969.48 samples/sec   Loss 4.3758   LearningRate 0.0333   Epoch: 14   Global Step: 153500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:21:45,425-Speed 5950.54 samples/sec   Loss 4.3978   LearningRate 0.0333   Epoch: 14   Global Step: 153510   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:21:52,274-Speed 5983.91 samples/sec   Loss 4.4140   LearningRate 0.0333   Epoch: 14   Global Step: 153520   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:21:59,158-Speed 5950.75 samples/sec   Loss 4.3729   LearningRate 0.0333   Epoch: 14   Global Step: 153530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:22:06,047-Speed 5947.56 samples/sec   Loss 4.3699   LearningRate 0.0333   Epoch: 14   Global Step: 153540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:22:12,918-Speed 5962.41 samples/sec   Loss 4.4156   LearningRate 0.0333   Epoch: 14   Global Step: 153550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:22:19,791-Speed 5960.33 samples/sec   Loss 4.3523   LearningRate 0.0333   Epoch: 14   Global Step: 153560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:22:26,660-Speed 5964.67 samples/sec   Loss 4.4175   LearningRate 0.0332   Epoch: 14   Global Step: 153570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:22:33,528-Speed 5964.77 samples/sec   Loss 4.3961   LearningRate 0.0332   Epoch: 14   Global Step: 153580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:22:40,405-Speed 5956.83 samples/sec   Loss 4.3890   LearningRate 0.0332   Epoch: 14   Global Step: 153590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:22:47,279-Speed 5959.68 samples/sec   Loss 4.3583   LearningRate 0.0332   Epoch: 14   Global Step: 153600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:22:54,149-Speed 5963.93 samples/sec   Loss 4.4129   LearningRate 0.0332   Epoch: 14   Global Step: 153610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:23:01,022-Speed 5960.77 samples/sec   Loss 4.3640   LearningRate 0.0332   Epoch: 14   Global Step: 153620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:23:07,887-Speed 5967.14 samples/sec   Loss 4.3509   LearningRate 0.0332   Epoch: 14   Global Step: 153630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:23:14,750-Speed 5970.06 samples/sec   Loss 4.3686   LearningRate 0.0332   Epoch: 14   Global Step: 153640   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-01-09 02:23:21,607-Speed 5974.28 samples/sec   Loss 4.3676   LearningRate 0.0331   Epoch: 14   Global Step: 153650   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:23:28,465-Speed 5974.00 samples/sec   Loss 4.4004   LearningRate 0.0331   Epoch: 14   Global Step: 153660   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:23:35,352-Speed 5949.41 samples/sec   Loss 4.3079   LearningRate 0.0331   Epoch: 14   Global Step: 153670   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:23:42,196-Speed 5984.78 samples/sec   Loss 4.3550   LearningRate 0.0331   Epoch: 14   Global Step: 153680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:23:49,060-Speed 5969.02 samples/sec   Loss 4.3913   LearningRate 0.0331   Epoch: 14   Global Step: 153690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:23:55,916-Speed 5976.46 samples/sec   Loss 4.3740   LearningRate 0.0331   Epoch: 14   Global Step: 153700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:24:02,778-Speed 5969.98 samples/sec   Loss 4.3729   LearningRate 0.0331   Epoch: 14   Global Step: 153710   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 02:24:09,657-Speed 5955.23 samples/sec   Loss 4.3921   LearningRate 0.0331   Epoch: 14   Global Step: 153720   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 02:24:16,512-Speed 5976.58 samples/sec   Loss 4.4667   LearningRate 0.0331   Epoch: 14   Global Step: 153730   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 02:24:23,397-Speed 5950.31 samples/sec   Loss 4.3595   LearningRate 0.0330   Epoch: 14   Global Step: 153740   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 02:24:30,252-Speed 5975.37 samples/sec   Loss 4.4136   LearningRate 0.0330   Epoch: 14   Global Step: 153750   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 02:24:37,131-Speed 5956.34 samples/sec   Loss 4.3789   LearningRate 0.0330   Epoch: 14   Global Step: 153760   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 02:24:43,975-Speed 5985.54 samples/sec   Loss 4.3391   LearningRate 0.0330   Epoch: 14   Global Step: 153770   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 02:24:50,824-Speed 5981.31 samples/sec   Loss 4.3856   LearningRate 0.0330   Epoch: 14   Global Step: 153780   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 02:24:57,688-Speed 5968.48 samples/sec   Loss 4.3584   LearningRate 0.0330   Epoch: 14   Global Step: 153790   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 02:25:04,540-Speed 5978.65 samples/sec   Loss 4.3982   LearningRate 0.0330   Epoch: 14   Global Step: 153800   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-01-09 02:25:11,395-Speed 5976.87 samples/sec   Loss 4.3910   LearningRate 0.0330   Epoch: 14   Global Step: 153810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-01-09 02:25:18,260-Speed 5970.29 samples/sec   Loss 4.3329   LearningRate 0.0329   Epoch: 14   Global Step: 153820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:25:25,129-Speed 5963.41 samples/sec   Loss 4.4185   LearningRate 0.0329   Epoch: 14   Global Step: 153830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:25:31,971-Speed 5988.39 samples/sec   Loss 4.3357   LearningRate 0.0329   Epoch: 14   Global Step: 153840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:25:38,827-Speed 5976.74 samples/sec   Loss 4.3635   LearningRate 0.0329   Epoch: 14   Global Step: 153850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:25:45,695-Speed 5965.27 samples/sec   Loss 4.3297   LearningRate 0.0329   Epoch: 14   Global Step: 153860   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:25:52,581-Speed 5949.60 samples/sec   Loss 4.3198   LearningRate 0.0329   Epoch: 14   Global Step: 153870   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:25:59,460-Speed 5958.30 samples/sec   Loss 4.3760   LearningRate 0.0329   Epoch: 14   Global Step: 153880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:26:06,313-Speed 5977.59 samples/sec   Loss 4.3775   LearningRate 0.0329   Epoch: 14   Global Step: 153890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:26:13,164-Speed 5982.37 samples/sec   Loss 4.3870   LearningRate 0.0328   Epoch: 14   Global Step: 153900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:26:20,038-Speed 5960.31 samples/sec   Loss 4.3694   LearningRate 0.0328   Epoch: 14   Global Step: 153910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:26:26,897-Speed 5973.06 samples/sec   Loss 4.3923   LearningRate 0.0328   Epoch: 14   Global Step: 153920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:26:33,742-Speed 5984.40 samples/sec   Loss 4.3073   LearningRate 0.0328   Epoch: 14   Global Step: 153930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:26:40,599-Speed 5974.69 samples/sec   Loss 4.3530   LearningRate 0.0328   Epoch: 14   Global Step: 153940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:26:47,474-Speed 5959.53 samples/sec   Loss 4.3461   LearningRate 0.0328   Epoch: 14   Global Step: 153950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:26:54,332-Speed 5973.21 samples/sec   Loss 4.3117   LearningRate 0.0328   Epoch: 14   Global Step: 153960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:27:01,191-Speed 5973.03 samples/sec   Loss 4.3394   LearningRate 0.0328   Epoch: 14   Global Step: 153970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:27:08,053-Speed 5972.04 samples/sec   Loss 4.3009   LearningRate 0.0327   Epoch: 14   Global Step: 153980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:27:14,903-Speed 5983.74 samples/sec   Loss 4.3437   LearningRate 0.0327   Epoch: 14   Global Step: 153990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:27:21,751-Speed 5984.59 samples/sec   Loss 4.3477   LearningRate 0.0327   Epoch: 14   Global Step: 154000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:27:28,597-Speed 5983.80 samples/sec   Loss 4.3598   LearningRate 0.0327   Epoch: 14   Global Step: 154010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:27:35,449-Speed 5979.43 samples/sec   Loss 4.2959   LearningRate 0.0327   Epoch: 14   Global Step: 154020   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:27:42,312-Speed 5969.64 samples/sec   Loss 4.3325   LearningRate 0.0327   Epoch: 14   Global Step: 154030   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:27:49,170-Speed 5973.62 samples/sec   Loss 4.3269   LearningRate 0.0327   Epoch: 14   Global Step: 154040   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:27:56,016-Speed 5983.79 samples/sec   Loss 4.3438   LearningRate 0.0327   Epoch: 14   Global Step: 154050   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:28:02,860-Speed 5986.07 samples/sec   Loss 4.2606   LearningRate 0.0326   Epoch: 14   Global Step: 154060   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:28:09,726-Speed 5966.50 samples/sec   Loss 4.3075   LearningRate 0.0326   Epoch: 14   Global Step: 154070   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:28:16,608-Speed 5954.21 samples/sec   Loss 4.3701   LearningRate 0.0326   Epoch: 14   Global Step: 154080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:28:23,493-Speed 5950.40 samples/sec   Loss 4.3802   LearningRate 0.0326   Epoch: 14   Global Step: 154090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:28:30,374-Speed 5954.48 samples/sec   Loss 4.3356   LearningRate 0.0326   Epoch: 14   Global Step: 154100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:28:37,239-Speed 5967.90 samples/sec   Loss 4.3479   LearningRate 0.0326   Epoch: 14   Global Step: 154110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:28:44,113-Speed 5959.53 samples/sec   Loss 4.3100   LearningRate 0.0326   Epoch: 14   Global Step: 154120   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:28:50,967-Speed 5977.00 samples/sec   Loss 4.3643   LearningRate 0.0326   Epoch: 14   Global Step: 154130   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:28:57,818-Speed 5980.57 samples/sec   Loss 4.3359   LearningRate 0.0325   Epoch: 14   Global Step: 154140   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:29:04,687-Speed 5964.31 samples/sec   Loss 4.3688   LearningRate 0.0325   Epoch: 14   Global Step: 154150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:29:11,592-Speed 5932.19 samples/sec   Loss 4.2785   LearningRate 0.0325   Epoch: 14   Global Step: 154160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:29:18,456-Speed 5969.19 samples/sec   Loss 4.3002   LearningRate 0.0325   Epoch: 14   Global Step: 154170   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:29:25,344-Speed 5947.45 samples/sec   Loss 4.3647   LearningRate 0.0325   Epoch: 14   Global Step: 154180   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:29:32,229-Speed 5950.17 samples/sec   Loss 4.3959   LearningRate 0.0325   Epoch: 14   Global Step: 154190   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:29:39,093-Speed 5968.91 samples/sec   Loss 4.3358   LearningRate 0.0325   Epoch: 14   Global Step: 154200   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:29:45,962-Speed 5966.61 samples/sec   Loss 4.3337   LearningRate 0.0325   Epoch: 14   Global Step: 154210   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:29:52,801-Speed 5990.34 samples/sec   Loss 4.3422   LearningRate 0.0324   Epoch: 14   Global Step: 154220   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:29:59,652-Speed 5978.95 samples/sec   Loss 4.3260   LearningRate 0.0324   Epoch: 14   Global Step: 154230   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:30:06,510-Speed 5974.44 samples/sec   Loss 4.3545   LearningRate 0.0324   Epoch: 14   Global Step: 154240   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:30:13,370-Speed 5972.22 samples/sec   Loss 4.3366   LearningRate 0.0324   Epoch: 14   Global Step: 154250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:30:20,231-Speed 5971.08 samples/sec   Loss 4.3707   LearningRate 0.0324   Epoch: 14   Global Step: 154260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:30:27,109-Speed 5956.57 samples/sec   Loss 4.3251   LearningRate 0.0324   Epoch: 14   Global Step: 154270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:30:33,963-Speed 5976.81 samples/sec   Loss 4.3369   LearningRate 0.0324   Epoch: 14   Global Step: 154280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:30:40,820-Speed 5974.83 samples/sec   Loss 4.3435   LearningRate 0.0324   Epoch: 14   Global Step: 154290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:30:47,689-Speed 5964.00 samples/sec   Loss 4.3121   LearningRate 0.0324   Epoch: 14   Global Step: 154300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:30:54,536-Speed 5982.70 samples/sec   Loss 4.3165   LearningRate 0.0323   Epoch: 14   Global Step: 154310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:31:01,391-Speed 5978.77 samples/sec   Loss 4.3448   LearningRate 0.0323   Epoch: 14   Global Step: 154320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:31:08,236-Speed 5984.79 samples/sec   Loss 4.3447   LearningRate 0.0323   Epoch: 14   Global Step: 154330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:31:15,090-Speed 5976.64 samples/sec   Loss 4.3036   LearningRate 0.0323   Epoch: 14   Global Step: 154340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:31:22,044-Speed 5892.14 samples/sec   Loss 4.3177   LearningRate 0.0323   Epoch: 14   Global Step: 154350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:31:28,999-Speed 5890.30 samples/sec   Loss 4.2825   LearningRate 0.0323   Epoch: 14   Global Step: 154360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:31:35,892-Speed 5942.89 samples/sec   Loss 4.3775   LearningRate 0.0323   Epoch: 14   Global Step: 154370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:31:42,742-Speed 5980.71 samples/sec   Loss 4.3843   LearningRate 0.0323   Epoch: 14   Global Step: 154380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:31:49,587-Speed 5984.72 samples/sec   Loss 4.2710   LearningRate 0.0322   Epoch: 14   Global Step: 154390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:31:56,558-Speed 5877.26 samples/sec   Loss 4.3160   LearningRate 0.0322   Epoch: 14   Global Step: 154400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:32:03,552-Speed 5859.71 samples/sec   Loss 4.3482   LearningRate 0.0322   Epoch: 14   Global Step: 154410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:32:10,508-Speed 5889.22 samples/sec   Loss 4.3482   LearningRate 0.0322   Epoch: 14   Global Step: 154420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:32:17,400-Speed 5944.09 samples/sec   Loss 4.3594   LearningRate 0.0322   Epoch: 14   Global Step: 154430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:32:24,267-Speed 5966.04 samples/sec   Loss 4.2837   LearningRate 0.0322   Epoch: 14   Global Step: 154440   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:32:31,141-Speed 5962.00 samples/sec   Loss 4.3107   LearningRate 0.0322   Epoch: 14   Global Step: 154450   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:32:38,062-Speed 5919.87 samples/sec   Loss 4.3282   LearningRate 0.0322   Epoch: 14   Global Step: 154460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:32:44,938-Speed 5957.70 samples/sec   Loss 4.3186   LearningRate 0.0321   Epoch: 14   Global Step: 154470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:32:51,793-Speed 5976.68 samples/sec   Loss 4.2945   LearningRate 0.0321   Epoch: 14   Global Step: 154480   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:32:58,644-Speed 5978.90 samples/sec   Loss 4.3459   LearningRate 0.0321   Epoch: 14   Global Step: 154490   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:33:05,497-Speed 5978.14 samples/sec   Loss 4.3422   LearningRate 0.0321   Epoch: 14   Global Step: 154500   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:33:12,382-Speed 5951.06 samples/sec   Loss 4.3283   LearningRate 0.0321   Epoch: 14   Global Step: 154510   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:33:19,236-Speed 5976.18 samples/sec   Loss 4.3736   LearningRate 0.0321   Epoch: 14   Global Step: 154520   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:33:26,122-Speed 5950.22 samples/sec   Loss 4.2433   LearningRate 0.0321   Epoch: 14   Global Step: 154530   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:33:32,975-Speed 5977.82 samples/sec   Loss 4.3333   LearningRate 0.0321   Epoch: 14   Global Step: 154540   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:33:39,833-Speed 5973.59 samples/sec   Loss 4.3002   LearningRate 0.0320   Epoch: 14   Global Step: 154550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:33:46,684-Speed 5979.61 samples/sec   Loss 4.3006   LearningRate 0.0320   Epoch: 14   Global Step: 154560   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:33:53,542-Speed 5974.19 samples/sec   Loss 4.3257   LearningRate 0.0320   Epoch: 14   Global Step: 154570   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:34:00,407-Speed 5967.62 samples/sec   Loss 4.3193   LearningRate 0.0320   Epoch: 14   Global Step: 154580   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:34:07,257-Speed 5981.69 samples/sec   Loss 4.3271   LearningRate 0.0320   Epoch: 14   Global Step: 154590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:34:14,107-Speed 5980.76 samples/sec   Loss 4.2992   LearningRate 0.0320   Epoch: 14   Global Step: 154600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:34:20,988-Speed 5953.13 samples/sec   Loss 4.3044   LearningRate 0.0320   Epoch: 14   Global Step: 154610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:34:27,859-Speed 5963.46 samples/sec   Loss 4.3517   LearningRate 0.0320   Epoch: 14   Global Step: 154620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:34:34,702-Speed 5987.02 samples/sec   Loss 4.3066   LearningRate 0.0320   Epoch: 14   Global Step: 154630   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:34:41,569-Speed 5965.58 samples/sec   Loss 4.3068   LearningRate 0.0319   Epoch: 14   Global Step: 154640   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:34:48,419-Speed 5980.84 samples/sec   Loss 4.3052   LearningRate 0.0319   Epoch: 14   Global Step: 154650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:34:55,403-Speed 5865.41 samples/sec   Loss 4.3227   LearningRate 0.0319   Epoch: 14   Global Step: 154660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:35:02,306-Speed 5935.07 samples/sec   Loss 4.3084   LearningRate 0.0319   Epoch: 14   Global Step: 154670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:35:09,160-Speed 5977.38 samples/sec   Loss 4.2935   LearningRate 0.0319   Epoch: 14   Global Step: 154680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:35:16,016-Speed 5975.56 samples/sec   Loss 4.2995   LearningRate 0.0319   Epoch: 14   Global Step: 154690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:35:22,869-Speed 5978.58 samples/sec   Loss 4.3284   LearningRate 0.0319   Epoch: 14   Global Step: 154700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:35:29,718-Speed 5981.44 samples/sec   Loss 4.3083   LearningRate 0.0319   Epoch: 14   Global Step: 154710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:35:36,603-Speed 5950.45 samples/sec   Loss 4.3050   LearningRate 0.0318   Epoch: 14   Global Step: 154720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:35:43,483-Speed 5955.33 samples/sec   Loss 4.3229   LearningRate 0.0318   Epoch: 14   Global Step: 154730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:35:50,325-Speed 5987.81 samples/sec   Loss 4.3528   LearningRate 0.0318   Epoch: 14   Global Step: 154740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:35:57,195-Speed 5963.52 samples/sec   Loss 4.3113   LearningRate 0.0318   Epoch: 14   Global Step: 154750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:36:04,071-Speed 5957.64 samples/sec   Loss 4.3181   LearningRate 0.0318   Epoch: 14   Global Step: 154760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:36:10,941-Speed 5963.80 samples/sec   Loss 4.2912   LearningRate 0.0318   Epoch: 14   Global Step: 154770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:36:17,834-Speed 5943.50 samples/sec   Loss 4.3067   LearningRate 0.0318   Epoch: 14   Global Step: 154780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:36:24,702-Speed 5964.98 samples/sec   Loss 4.3033   LearningRate 0.0318   Epoch: 14   Global Step: 154790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:36:31,557-Speed 5976.09 samples/sec   Loss 4.2853   LearningRate 0.0317   Epoch: 14   Global Step: 154800   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:36:38,406-Speed 5981.29 samples/sec   Loss 4.3034   LearningRate 0.0317   Epoch: 14   Global Step: 154810   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:36:45,255-Speed 5980.82 samples/sec   Loss 4.3056   LearningRate 0.0317   Epoch: 14   Global Step: 154820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:36:52,130-Speed 5959.49 samples/sec   Loss 4.3474   LearningRate 0.0317   Epoch: 14   Global Step: 154830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:36:58,979-Speed 5981.61 samples/sec   Loss 4.3084   LearningRate 0.0317   Epoch: 14   Global Step: 154840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:37:05,830-Speed 5979.74 samples/sec   Loss 4.2795   LearningRate 0.0317   Epoch: 14   Global Step: 154850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:37:12,704-Speed 5959.81 samples/sec   Loss 4.3561   LearningRate 0.0317   Epoch: 14   Global Step: 154860   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:37:19,559-Speed 5976.71 samples/sec   Loss 4.2758   LearningRate 0.0317   Epoch: 14   Global Step: 154870   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:37:26,427-Speed 5964.90 samples/sec   Loss 4.2877   LearningRate 0.0316   Epoch: 14   Global Step: 154880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:37:33,305-Speed 5955.88 samples/sec   Loss 4.2773   LearningRate 0.0316   Epoch: 14   Global Step: 154890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:37:40,182-Speed 5958.45 samples/sec   Loss 4.2535   LearningRate 0.0316   Epoch: 14   Global Step: 154900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:37:47,044-Speed 5969.87 samples/sec   Loss 4.2880   LearningRate 0.0316   Epoch: 14   Global Step: 154910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:37:53,946-Speed 5935.22 samples/sec   Loss 4.2694   LearningRate 0.0316   Epoch: 14   Global Step: 154920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:38:00,806-Speed 5971.77 samples/sec   Loss 4.2957   LearningRate 0.0316   Epoch: 14   Global Step: 154930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:38:07,647-Speed 5988.37 samples/sec   Loss 4.2588   LearningRate 0.0316   Epoch: 14   Global Step: 154940   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:38:14,507-Speed 5972.93 samples/sec   Loss 4.2510   LearningRate 0.0316   Epoch: 14   Global Step: 154950   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:38:21,366-Speed 5972.75 samples/sec   Loss 4.2805   LearningRate 0.0316   Epoch: 14   Global Step: 154960   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:38:28,228-Speed 5970.18 samples/sec   Loss 4.2933   LearningRate 0.0315   Epoch: 14   Global Step: 154970   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:38:35,082-Speed 5977.50 samples/sec   Loss 4.3160   LearningRate 0.0315   Epoch: 14   Global Step: 154980   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:38:41,934-Speed 5979.58 samples/sec   Loss 4.2585   LearningRate 0.0315   Epoch: 14   Global Step: 154990   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:38:48,781-Speed 5983.34 samples/sec   Loss 4.2504   LearningRate 0.0315   Epoch: 14   Global Step: 155000   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:39:15,476-[lfw][155000]XNorm: 23.453467
Training: 2022-01-09 02:39:15,477-[lfw][155000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-01-09 02:39:15,477-[lfw][155000]Accuracy-Highest: 0.99800
Training: 2022-01-09 02:39:46,296-[cfp_fp][155000]XNorm: 20.854993
Training: 2022-01-09 02:39:46,297-[cfp_fp][155000]Accuracy-Flip: 0.98786+-0.00539
Training: 2022-01-09 02:39:46,298-[cfp_fp][155000]Accuracy-Highest: 0.98786
Training: 2022-01-09 02:40:13,112-[agedb_30][155000]XNorm: 22.783735
Training: 2022-01-09 02:40:13,113-[agedb_30][155000]Accuracy-Flip: 0.97783+-0.00687
Training: 2022-01-09 02:40:13,113-[agedb_30][155000]Accuracy-Highest: 0.97833
Training: 2022-01-09 02:40:19,956-Speed 449.25 samples/sec   Loss 4.3124   LearningRate 0.0315   Epoch: 14   Global Step: 155010   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:40:26,788-Speed 5997.44 samples/sec   Loss 4.2883   LearningRate 0.0315   Epoch: 14   Global Step: 155020   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:40:33,629-Speed 5988.02 samples/sec   Loss 4.2443   LearningRate 0.0315   Epoch: 14   Global Step: 155030   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:40:40,476-Speed 5983.60 samples/sec   Loss 4.2971   LearningRate 0.0315   Epoch: 14   Global Step: 155040   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:40:47,347-Speed 5962.33 samples/sec   Loss 4.2781   LearningRate 0.0314   Epoch: 14   Global Step: 155050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:40:54,251-Speed 5935.17 samples/sec   Loss 4.2747   LearningRate 0.0314   Epoch: 14   Global Step: 155060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:41:01,127-Speed 5958.40 samples/sec   Loss 4.3241   LearningRate 0.0314   Epoch: 14   Global Step: 155070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:41:08,018-Speed 5944.57 samples/sec   Loss 4.2736   LearningRate 0.0314   Epoch: 14   Global Step: 155080   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:41:14,906-Speed 5948.02 samples/sec   Loss 4.2889   LearningRate 0.0314   Epoch: 14   Global Step: 155090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:41:21,780-Speed 5960.59 samples/sec   Loss 4.2550   LearningRate 0.0314   Epoch: 14   Global Step: 155100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:41:28,646-Speed 5966.30 samples/sec   Loss 4.3161   LearningRate 0.0314   Epoch: 14   Global Step: 155110   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:41:35,518-Speed 5963.78 samples/sec   Loss 4.2555   LearningRate 0.0314   Epoch: 14   Global Step: 155120   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:41:42,391-Speed 5961.08 samples/sec   Loss 4.2909   LearningRate 0.0313   Epoch: 14   Global Step: 155130   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:41:49,252-Speed 5970.84 samples/sec   Loss 4.2530   LearningRate 0.0313   Epoch: 14   Global Step: 155140   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:41:56,115-Speed 5969.87 samples/sec   Loss 4.2766   LearningRate 0.0313   Epoch: 14   Global Step: 155150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:42:02,985-Speed 5963.39 samples/sec   Loss 4.2944   LearningRate 0.0313   Epoch: 14   Global Step: 155160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:42:09,854-Speed 5964.50 samples/sec   Loss 4.2960   LearningRate 0.0313   Epoch: 14   Global Step: 155170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:42:16,712-Speed 5973.70 samples/sec   Loss 4.2706   LearningRate 0.0313   Epoch: 14   Global Step: 155180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:42:23,566-Speed 5977.70 samples/sec   Loss 4.2576   LearningRate 0.0313   Epoch: 14   Global Step: 155190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:42:30,433-Speed 5965.39 samples/sec   Loss 4.2696   LearningRate 0.0313   Epoch: 14   Global Step: 155200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:42:37,319-Speed 5949.65 samples/sec   Loss 4.2506   LearningRate 0.0313   Epoch: 14   Global Step: 155210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:42:44,204-Speed 5951.21 samples/sec   Loss 4.2887   LearningRate 0.0312   Epoch: 14   Global Step: 155220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:42:51,090-Speed 5948.82 samples/sec   Loss 4.1982   LearningRate 0.0312   Epoch: 14   Global Step: 155230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:42:57,957-Speed 5966.34 samples/sec   Loss 4.2777   LearningRate 0.0312   Epoch: 14   Global Step: 155240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:43:04,815-Speed 5974.00 samples/sec   Loss 4.2970   LearningRate 0.0312   Epoch: 14   Global Step: 155250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:43:11,674-Speed 5972.58 samples/sec   Loss 4.2564   LearningRate 0.0312   Epoch: 14   Global Step: 155260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:43:18,535-Speed 5970.76 samples/sec   Loss 4.3019   LearningRate 0.0312   Epoch: 14   Global Step: 155270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:43:25,401-Speed 5967.67 samples/sec   Loss 4.3289   LearningRate 0.0312   Epoch: 14   Global Step: 155280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:43:32,263-Speed 5970.11 samples/sec   Loss 4.2115   LearningRate 0.0312   Epoch: 14   Global Step: 155290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:43:39,129-Speed 5966.40 samples/sec   Loss 4.2839   LearningRate 0.0311   Epoch: 14   Global Step: 155300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:43:45,998-Speed 5965.04 samples/sec   Loss 4.2396   LearningRate 0.0311   Epoch: 14   Global Step: 155310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:43:52,856-Speed 5973.57 samples/sec   Loss 4.2511   LearningRate 0.0311   Epoch: 14   Global Step: 155320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:43:59,715-Speed 5973.97 samples/sec   Loss 4.3098   LearningRate 0.0311   Epoch: 14   Global Step: 155330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:44:06,568-Speed 5978.14 samples/sec   Loss 4.2232   LearningRate 0.0311   Epoch: 14   Global Step: 155340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:44:13,428-Speed 5972.41 samples/sec   Loss 4.2717   LearningRate 0.0311   Epoch: 14   Global Step: 155350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:44:20,307-Speed 5955.94 samples/sec   Loss 4.2730   LearningRate 0.0311   Epoch: 14   Global Step: 155360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:44:27,165-Speed 5976.65 samples/sec   Loss 4.2622   LearningRate 0.0311   Epoch: 14   Global Step: 155370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:44:34,022-Speed 5974.04 samples/sec   Loss 4.2605   LearningRate 0.0310   Epoch: 14   Global Step: 155380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:44:40,888-Speed 5969.84 samples/sec   Loss 4.2259   LearningRate 0.0310   Epoch: 14   Global Step: 155390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:44:47,735-Speed 5983.90 samples/sec   Loss 4.3006   LearningRate 0.0310   Epoch: 14   Global Step: 155400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:44:54,596-Speed 5970.24 samples/sec   Loss 4.2343   LearningRate 0.0310   Epoch: 14   Global Step: 155410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:45:01,461-Speed 5968.81 samples/sec   Loss 4.3273   LearningRate 0.0310   Epoch: 14   Global Step: 155420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:45:08,338-Speed 5957.54 samples/sec   Loss 4.2436   LearningRate 0.0310   Epoch: 14   Global Step: 155430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:45:15,187-Speed 5981.37 samples/sec   Loss 4.2263   LearningRate 0.0310   Epoch: 14   Global Step: 155440   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:45:22,056-Speed 5964.15 samples/sec   Loss 4.2613   LearningRate 0.0310   Epoch: 14   Global Step: 155450   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:45:28,949-Speed 5943.63 samples/sec   Loss 4.3042   LearningRate 0.0310   Epoch: 14   Global Step: 155460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:45:35,813-Speed 5968.59 samples/sec   Loss 4.2813   LearningRate 0.0309   Epoch: 14   Global Step: 155470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:45:42,692-Speed 5955.30 samples/sec   Loss 4.2546   LearningRate 0.0309   Epoch: 14   Global Step: 155480   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:45:49,563-Speed 5962.00 samples/sec   Loss 4.2613   LearningRate 0.0309   Epoch: 14   Global Step: 155490   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:45:56,430-Speed 5966.26 samples/sec   Loss 4.2554   LearningRate 0.0309   Epoch: 14   Global Step: 155500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:46:03,286-Speed 5977.93 samples/sec   Loss 4.2785   LearningRate 0.0309   Epoch: 14   Global Step: 155510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:46:10,138-Speed 5978.69 samples/sec   Loss 4.2971   LearningRate 0.0309   Epoch: 14   Global Step: 155520   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:46:17,008-Speed 5962.91 samples/sec   Loss 4.2538   LearningRate 0.0309   Epoch: 14   Global Step: 155530   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:46:23,867-Speed 5973.54 samples/sec   Loss 4.2244   LearningRate 0.0309   Epoch: 14   Global Step: 155540   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:46:48,072-Speed 1692.33 samples/sec   Loss 4.2801   LearningRate 0.0308   Epoch: 15   Global Step: 155550   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:46:54,898-Speed 6001.69 samples/sec   Loss 4.1888   LearningRate 0.0308   Epoch: 15   Global Step: 155560   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:47:01,742-Speed 5985.96 samples/sec   Loss 4.2823   LearningRate 0.0308   Epoch: 15   Global Step: 155570   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:47:08,569-Speed 6000.65 samples/sec   Loss 4.2387   LearningRate 0.0308   Epoch: 15   Global Step: 155580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:47:15,431-Speed 5970.06 samples/sec   Loss 4.2739   LearningRate 0.0308   Epoch: 15   Global Step: 155590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:47:22,291-Speed 5971.85 samples/sec   Loss 4.2980   LearningRate 0.0308   Epoch: 15   Global Step: 155600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:47:29,165-Speed 5960.08 samples/sec   Loss 4.2299   LearningRate 0.0308   Epoch: 15   Global Step: 155610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:47:36,007-Speed 5987.91 samples/sec   Loss 4.2514   LearningRate 0.0308   Epoch: 15   Global Step: 155620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:47:42,851-Speed 5985.94 samples/sec   Loss 4.1900   LearningRate 0.0308   Epoch: 15   Global Step: 155630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:47:49,703-Speed 5979.08 samples/sec   Loss 4.1991   LearningRate 0.0307   Epoch: 15   Global Step: 155640   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:47:56,573-Speed 5963.82 samples/sec   Loss 4.2476   LearningRate 0.0307   Epoch: 15   Global Step: 155650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:48:03,438-Speed 5967.40 samples/sec   Loss 4.2030   LearningRate 0.0307   Epoch: 15   Global Step: 155660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:48:10,313-Speed 5959.49 samples/sec   Loss 4.2293   LearningRate 0.0307   Epoch: 15   Global Step: 155670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:48:17,179-Speed 5967.33 samples/sec   Loss 4.2290   LearningRate 0.0307   Epoch: 15   Global Step: 155680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:48:24,037-Speed 5972.96 samples/sec   Loss 4.1781   LearningRate 0.0307   Epoch: 15   Global Step: 155690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:48:30,911-Speed 5960.10 samples/sec   Loss 4.2442   LearningRate 0.0307   Epoch: 15   Global Step: 155700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:48:37,772-Speed 5971.54 samples/sec   Loss 4.2495   LearningRate 0.0307   Epoch: 15   Global Step: 155710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:48:44,660-Speed 5947.51 samples/sec   Loss 4.2086   LearningRate 0.0306   Epoch: 15   Global Step: 155720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:48:51,541-Speed 5956.55 samples/sec   Loss 4.2615   LearningRate 0.0306   Epoch: 15   Global Step: 155730   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:48:58,400-Speed 5973.34 samples/sec   Loss 4.2139   LearningRate 0.0306   Epoch: 15   Global Step: 155740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:49:05,250-Speed 5980.26 samples/sec   Loss 4.2612   LearningRate 0.0306   Epoch: 15   Global Step: 155750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:49:12,115-Speed 5967.70 samples/sec   Loss 4.2376   LearningRate 0.0306   Epoch: 15   Global Step: 155760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:49:18,968-Speed 5977.47 samples/sec   Loss 4.2118   LearningRate 0.0306   Epoch: 15   Global Step: 155770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:49:25,828-Speed 5972.19 samples/sec   Loss 4.1898   LearningRate 0.0306   Epoch: 15   Global Step: 155780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:49:32,717-Speed 5947.17 samples/sec   Loss 4.2490   LearningRate 0.0306   Epoch: 15   Global Step: 155790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:49:39,582-Speed 5968.04 samples/sec   Loss 4.1872   LearningRate 0.0305   Epoch: 15   Global Step: 155800   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:49:46,467-Speed 5950.09 samples/sec   Loss 4.1895   LearningRate 0.0305   Epoch: 15   Global Step: 155810   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:49:53,350-Speed 5952.94 samples/sec   Loss 4.2206   LearningRate 0.0305   Epoch: 15   Global Step: 155820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:50:00,219-Speed 5964.72 samples/sec   Loss 4.1772   LearningRate 0.0305   Epoch: 15   Global Step: 155830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:50:07,111-Speed 5944.85 samples/sec   Loss 4.2335   LearningRate 0.0305   Epoch: 15   Global Step: 155840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:50:13,989-Speed 5957.64 samples/sec   Loss 4.2865   LearningRate 0.0305   Epoch: 15   Global Step: 155850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:50:20,862-Speed 5961.50 samples/sec   Loss 4.1800   LearningRate 0.0305   Epoch: 15   Global Step: 155860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:50:27,732-Speed 5963.12 samples/sec   Loss 4.2565   LearningRate 0.0305   Epoch: 15   Global Step: 155870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:50:34,589-Speed 5974.67 samples/sec   Loss 4.2556   LearningRate 0.0305   Epoch: 15   Global Step: 155880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:50:41,467-Speed 5956.22 samples/sec   Loss 4.2267   LearningRate 0.0304   Epoch: 15   Global Step: 155890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:50:48,317-Speed 5980.82 samples/sec   Loss 4.2292   LearningRate 0.0304   Epoch: 15   Global Step: 155900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:50:55,274-Speed 5889.16 samples/sec   Loss 4.2030   LearningRate 0.0304   Epoch: 15   Global Step: 155910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:51:02,136-Speed 5970.13 samples/sec   Loss 4.2335   LearningRate 0.0304   Epoch: 15   Global Step: 155920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:51:09,036-Speed 5937.71 samples/sec   Loss 4.2125   LearningRate 0.0304   Epoch: 15   Global Step: 155930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:51:15,894-Speed 5974.24 samples/sec   Loss 4.2619   LearningRate 0.0304   Epoch: 15   Global Step: 155940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:51:22,769-Speed 5959.76 samples/sec   Loss 4.2235   LearningRate 0.0304   Epoch: 15   Global Step: 155950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:51:29,628-Speed 5972.49 samples/sec   Loss 4.2252   LearningRate 0.0304   Epoch: 15   Global Step: 155960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:51:36,493-Speed 5967.72 samples/sec   Loss 4.2104   LearningRate 0.0303   Epoch: 15   Global Step: 155970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:51:43,369-Speed 5958.66 samples/sec   Loss 4.1925   LearningRate 0.0303   Epoch: 15   Global Step: 155980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:51:50,248-Speed 5954.91 samples/sec   Loss 4.2011   LearningRate 0.0303   Epoch: 15   Global Step: 155990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:51:57,108-Speed 5972.75 samples/sec   Loss 4.1925   LearningRate 0.0303   Epoch: 15   Global Step: 156000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:52:03,948-Speed 5989.44 samples/sec   Loss 4.2293   LearningRate 0.0303   Epoch: 15   Global Step: 156010   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:52:10,820-Speed 5961.66 samples/sec   Loss 4.2168   LearningRate 0.0303   Epoch: 15   Global Step: 156020   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:52:17,702-Speed 5954.27 samples/sec   Loss 4.2856   LearningRate 0.0303   Epoch: 15   Global Step: 156030   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:52:24,576-Speed 5960.56 samples/sec   Loss 4.2277   LearningRate 0.0303   Epoch: 15   Global Step: 156040   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:52:31,436-Speed 5972.06 samples/sec   Loss 4.2201   LearningRate 0.0303   Epoch: 15   Global Step: 156050   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:52:38,401-Speed 5881.91 samples/sec   Loss 4.2405   LearningRate 0.0302   Epoch: 15   Global Step: 156060   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:52:45,266-Speed 5969.37 samples/sec   Loss 4.2271   LearningRate 0.0302   Epoch: 15   Global Step: 156070   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:52:52,168-Speed 5935.79 samples/sec   Loss 4.1581   LearningRate 0.0302   Epoch: 15   Global Step: 156080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:52:59,052-Speed 5951.08 samples/sec   Loss 4.1912   LearningRate 0.0302   Epoch: 15   Global Step: 156090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:53:05,908-Speed 5975.80 samples/sec   Loss 4.1686   LearningRate 0.0302   Epoch: 15   Global Step: 156100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:53:12,787-Speed 5955.36 samples/sec   Loss 4.1739   LearningRate 0.0302   Epoch: 15   Global Step: 156110   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:53:19,679-Speed 5944.87 samples/sec   Loss 4.2002   LearningRate 0.0302   Epoch: 15   Global Step: 156120   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:53:26,558-Speed 5955.63 samples/sec   Loss 4.1871   LearningRate 0.0302   Epoch: 15   Global Step: 156130   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:53:33,408-Speed 5981.10 samples/sec   Loss 4.2306   LearningRate 0.0301   Epoch: 15   Global Step: 156140   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:53:40,277-Speed 5964.02 samples/sec   Loss 4.2017   LearningRate 0.0301   Epoch: 15   Global Step: 156150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:53:47,156-Speed 5955.59 samples/sec   Loss 4.2193   LearningRate 0.0301   Epoch: 15   Global Step: 156160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:53:54,012-Speed 5975.33 samples/sec   Loss 4.2393   LearningRate 0.0301   Epoch: 15   Global Step: 156170   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:54:00,898-Speed 5949.10 samples/sec   Loss 4.2036   LearningRate 0.0301   Epoch: 15   Global Step: 156180   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:54:07,753-Speed 5976.54 samples/sec   Loss 4.2491   LearningRate 0.0301   Epoch: 15   Global Step: 156190   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:54:14,625-Speed 5961.55 samples/sec   Loss 4.2117   LearningRate 0.0301   Epoch: 15   Global Step: 156200   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:54:21,486-Speed 5970.88 samples/sec   Loss 4.2042   LearningRate 0.0301   Epoch: 15   Global Step: 156210   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:54:28,343-Speed 5975.19 samples/sec   Loss 4.2044   LearningRate 0.0301   Epoch: 15   Global Step: 156220   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:54:35,203-Speed 5971.55 samples/sec   Loss 4.1924   LearningRate 0.0300   Epoch: 15   Global Step: 156230   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:54:42,062-Speed 5973.37 samples/sec   Loss 4.1767   LearningRate 0.0300   Epoch: 15   Global Step: 156240   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:54:48,943-Speed 5953.85 samples/sec   Loss 4.2053   LearningRate 0.0300   Epoch: 15   Global Step: 156250   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 02:54:55,814-Speed 5962.89 samples/sec   Loss 4.1770   LearningRate 0.0300   Epoch: 15   Global Step: 156260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:55:02,700-Speed 5949.36 samples/sec   Loss 4.1805   LearningRate 0.0300   Epoch: 15   Global Step: 156270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:55:09,588-Speed 5947.89 samples/sec   Loss 4.2158   LearningRate 0.0300   Epoch: 15   Global Step: 156280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:55:16,450-Speed 5969.95 samples/sec   Loss 4.1778   LearningRate 0.0300   Epoch: 15   Global Step: 156290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:55:23,402-Speed 5893.41 samples/sec   Loss 4.2068   LearningRate 0.0300   Epoch: 15   Global Step: 156300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:55:30,276-Speed 5962.43 samples/sec   Loss 4.1792   LearningRate 0.0299   Epoch: 15   Global Step: 156310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:55:37,127-Speed 5979.81 samples/sec   Loss 4.1443   LearningRate 0.0299   Epoch: 15   Global Step: 156320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:55:43,976-Speed 5981.55 samples/sec   Loss 4.1932   LearningRate 0.0299   Epoch: 15   Global Step: 156330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:55:50,826-Speed 5980.25 samples/sec   Loss 4.2016   LearningRate 0.0299   Epoch: 15   Global Step: 156340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:55:57,681-Speed 5976.27 samples/sec   Loss 4.1744   LearningRate 0.0299   Epoch: 15   Global Step: 156350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:56:04,537-Speed 5975.42 samples/sec   Loss 4.2295   LearningRate 0.0299   Epoch: 15   Global Step: 156360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:56:11,404-Speed 5966.61 samples/sec   Loss 4.1946   LearningRate 0.0299   Epoch: 15   Global Step: 156370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:56:18,278-Speed 5959.16 samples/sec   Loss 4.2344   LearningRate 0.0299   Epoch: 15   Global Step: 156380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:56:25,149-Speed 5963.58 samples/sec   Loss 4.2169   LearningRate 0.0299   Epoch: 15   Global Step: 156390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:56:32,019-Speed 5963.92 samples/sec   Loss 4.1918   LearningRate 0.0298   Epoch: 15   Global Step: 156400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:56:38,868-Speed 5981.68 samples/sec   Loss 4.2108   LearningRate 0.0298   Epoch: 15   Global Step: 156410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:56:45,749-Speed 5953.57 samples/sec   Loss 4.1969   LearningRate 0.0298   Epoch: 15   Global Step: 156420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:56:52,598-Speed 5981.86 samples/sec   Loss 4.1638   LearningRate 0.0298   Epoch: 15   Global Step: 156430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:56:59,469-Speed 5962.60 samples/sec   Loss 4.1562   LearningRate 0.0298   Epoch: 15   Global Step: 156440   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:57:06,319-Speed 5981.13 samples/sec   Loss 4.1777   LearningRate 0.0298   Epoch: 15   Global Step: 156450   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:57:13,197-Speed 5956.24 samples/sec   Loss 4.1241   LearningRate 0.0298   Epoch: 15   Global Step: 156460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:57:20,076-Speed 5955.80 samples/sec   Loss 4.1633   LearningRate 0.0298   Epoch: 15   Global Step: 156470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:57:26,940-Speed 5968.91 samples/sec   Loss 4.1631   LearningRate 0.0297   Epoch: 15   Global Step: 156480   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:57:33,797-Speed 5974.80 samples/sec   Loss 4.1743   LearningRate 0.0297   Epoch: 15   Global Step: 156490   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:57:40,672-Speed 5958.23 samples/sec   Loss 4.1752   LearningRate 0.0297   Epoch: 15   Global Step: 156500   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:57:47,535-Speed 5969.79 samples/sec   Loss 4.1755   LearningRate 0.0297   Epoch: 15   Global Step: 156510   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:57:54,399-Speed 5970.11 samples/sec   Loss 4.1447   LearningRate 0.0297   Epoch: 15   Global Step: 156520   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:58:01,273-Speed 5959.97 samples/sec   Loss 4.1756   LearningRate 0.0297   Epoch: 15   Global Step: 156530   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:58:08,132-Speed 5974.19 samples/sec   Loss 4.2218   LearningRate 0.0297   Epoch: 15   Global Step: 156540   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:58:14,973-Speed 5988.83 samples/sec   Loss 4.1610   LearningRate 0.0297   Epoch: 15   Global Step: 156550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:58:21,844-Speed 5961.62 samples/sec   Loss 4.1687   LearningRate 0.0297   Epoch: 15   Global Step: 156560   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 02:58:28,701-Speed 5974.56 samples/sec   Loss 4.2183   LearningRate 0.0296   Epoch: 15   Global Step: 156570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:58:35,550-Speed 5982.06 samples/sec   Loss 4.1760   LearningRate 0.0296   Epoch: 15   Global Step: 156580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:58:42,399-Speed 5981.00 samples/sec   Loss 4.1938   LearningRate 0.0296   Epoch: 15   Global Step: 156590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:58:49,253-Speed 5977.52 samples/sec   Loss 4.1692   LearningRate 0.0296   Epoch: 15   Global Step: 156600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:58:56,102-Speed 5982.39 samples/sec   Loss 4.1513   LearningRate 0.0296   Epoch: 15   Global Step: 156610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:59:02,952-Speed 5980.29 samples/sec   Loss 4.1658   LearningRate 0.0296   Epoch: 15   Global Step: 156620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:59:09,842-Speed 5950.50 samples/sec   Loss 4.1464   LearningRate 0.0296   Epoch: 15   Global Step: 156630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:59:16,693-Speed 5980.48 samples/sec   Loss 4.1519   LearningRate 0.0296   Epoch: 15   Global Step: 156640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:59:23,541-Speed 5981.87 samples/sec   Loss 4.1231   LearningRate 0.0296   Epoch: 15   Global Step: 156650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:59:30,392-Speed 5980.33 samples/sec   Loss 4.1937   LearningRate 0.0295   Epoch: 15   Global Step: 156660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:59:37,245-Speed 5977.39 samples/sec   Loss 4.1689   LearningRate 0.0295   Epoch: 15   Global Step: 156670   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-01-09 02:59:44,101-Speed 5975.54 samples/sec   Loss 4.1749   LearningRate 0.0295   Epoch: 15   Global Step: 156680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:59:50,964-Speed 5972.65 samples/sec   Loss 4.1754   LearningRate 0.0295   Epoch: 15   Global Step: 156690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 02:59:57,830-Speed 5966.97 samples/sec   Loss 4.1568   LearningRate 0.0295   Epoch: 15   Global Step: 156700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 03:00:04,694-Speed 5968.47 samples/sec   Loss 4.1461   LearningRate 0.0295   Epoch: 15   Global Step: 156710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 03:00:11,546-Speed 5980.19 samples/sec   Loss 4.1746   LearningRate 0.0295   Epoch: 15   Global Step: 156720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 03:00:18,435-Speed 5947.38 samples/sec   Loss 4.1771   LearningRate 0.0295   Epoch: 15   Global Step: 156730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 03:00:25,308-Speed 5960.94 samples/sec   Loss 4.1022   LearningRate 0.0294   Epoch: 15   Global Step: 156740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 03:00:32,164-Speed 5975.32 samples/sec   Loss 4.1516   LearningRate 0.0294   Epoch: 15   Global Step: 156750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 03:00:39,025-Speed 5972.62 samples/sec   Loss 4.1577   LearningRate 0.0294   Epoch: 15   Global Step: 156760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:00:45,899-Speed 5959.59 samples/sec   Loss 4.2068   LearningRate 0.0294   Epoch: 15   Global Step: 156770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:00:52,758-Speed 5973.40 samples/sec   Loss 4.1418   LearningRate 0.0294   Epoch: 15   Global Step: 156780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:00:59,614-Speed 5976.06 samples/sec   Loss 4.2003   LearningRate 0.0294   Epoch: 15   Global Step: 156790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:01:06,511-Speed 5940.19 samples/sec   Loss 4.1481   LearningRate 0.0294   Epoch: 15   Global Step: 156800   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:01:13,375-Speed 5969.59 samples/sec   Loss 4.1521   LearningRate 0.0294   Epoch: 15   Global Step: 156810   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:01:20,229-Speed 5977.88 samples/sec   Loss 4.1691   LearningRate 0.0294   Epoch: 15   Global Step: 156820   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:01:27,090-Speed 5969.73 samples/sec   Loss 4.1878   LearningRate 0.0293   Epoch: 15   Global Step: 156830   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:01:33,955-Speed 5968.03 samples/sec   Loss 4.2120   LearningRate 0.0293   Epoch: 15   Global Step: 156840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:01:40,838-Speed 5952.06 samples/sec   Loss 4.1504   LearningRate 0.0293   Epoch: 15   Global Step: 156850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:01:47,704-Speed 5966.85 samples/sec   Loss 4.0900   LearningRate 0.0293   Epoch: 15   Global Step: 156860   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:01:54,612-Speed 5930.69 samples/sec   Loss 4.1585   LearningRate 0.0293   Epoch: 15   Global Step: 156870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:02:01,493-Speed 5953.82 samples/sec   Loss 4.1356   LearningRate 0.0293   Epoch: 15   Global Step: 156880   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:02:08,347-Speed 5977.38 samples/sec   Loss 4.1211   LearningRate 0.0293   Epoch: 15   Global Step: 156890   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:02:15,198-Speed 5980.17 samples/sec   Loss 4.1373   LearningRate 0.0293   Epoch: 15   Global Step: 156900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:02:22,047-Speed 5981.71 samples/sec   Loss 4.1141   LearningRate 0.0292   Epoch: 15   Global Step: 156910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:02:28,908-Speed 5970.72 samples/sec   Loss 4.1448   LearningRate 0.0292   Epoch: 15   Global Step: 156920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:02:35,754-Speed 5984.85 samples/sec   Loss 4.1626   LearningRate 0.0292   Epoch: 15   Global Step: 156930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:02:42,663-Speed 5929.31 samples/sec   Loss 4.1596   LearningRate 0.0292   Epoch: 15   Global Step: 156940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:02:49,547-Speed 5951.43 samples/sec   Loss 4.1415   LearningRate 0.0292   Epoch: 15   Global Step: 156950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:02:56,420-Speed 5961.37 samples/sec   Loss 4.1384   LearningRate 0.0292   Epoch: 15   Global Step: 156960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:03:03,263-Speed 5986.32 samples/sec   Loss 4.1235   LearningRate 0.0292   Epoch: 15   Global Step: 156970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:03:10,128-Speed 5967.96 samples/sec   Loss 4.1331   LearningRate 0.0292   Epoch: 15   Global Step: 156980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:03:16,974-Speed 5984.06 samples/sec   Loss 4.1235   LearningRate 0.0292   Epoch: 15   Global Step: 156990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:03:23,836-Speed 5970.80 samples/sec   Loss 4.1226   LearningRate 0.0291   Epoch: 15   Global Step: 157000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:03:30,724-Speed 5947.90 samples/sec   Loss 4.1457   LearningRate 0.0291   Epoch: 15   Global Step: 157010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 03:03:37,609-Speed 5951.78 samples/sec   Loss 4.1498   LearningRate 0.0291   Epoch: 15   Global Step: 157020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:03:44,471-Speed 5970.45 samples/sec   Loss 4.1506   LearningRate 0.0291   Epoch: 15   Global Step: 157030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:03:51,331-Speed 5971.62 samples/sec   Loss 4.1670   LearningRate 0.0291   Epoch: 15   Global Step: 157040   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:03:58,198-Speed 5966.92 samples/sec   Loss 4.1241   LearningRate 0.0291   Epoch: 15   Global Step: 157050   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:04:05,057-Speed 5972.73 samples/sec   Loss 4.1754   LearningRate 0.0291   Epoch: 15   Global Step: 157060   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:04:11,909-Speed 5979.54 samples/sec   Loss 4.1145   LearningRate 0.0291   Epoch: 15   Global Step: 157070   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:04:18,782-Speed 5960.90 samples/sec   Loss 4.1807   LearningRate 0.0291   Epoch: 15   Global Step: 157080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:04:25,645-Speed 5969.56 samples/sec   Loss 4.1774   LearningRate 0.0290   Epoch: 15   Global Step: 157090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:04:32,497-Speed 5978.53 samples/sec   Loss 4.1043   LearningRate 0.0290   Epoch: 15   Global Step: 157100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:04:39,358-Speed 5971.30 samples/sec   Loss 4.1587   LearningRate 0.0290   Epoch: 15   Global Step: 157110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:04:46,224-Speed 5972.73 samples/sec   Loss 4.1221   LearningRate 0.0290   Epoch: 15   Global Step: 157120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:04:53,073-Speed 5981.39 samples/sec   Loss 4.1582   LearningRate 0.0290   Epoch: 15   Global Step: 157130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:04:59,948-Speed 5961.59 samples/sec   Loss 4.1386   LearningRate 0.0290   Epoch: 15   Global Step: 157140   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:05:06,799-Speed 5979.37 samples/sec   Loss 4.1764   LearningRate 0.0290   Epoch: 15   Global Step: 157150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:05:13,656-Speed 5974.43 samples/sec   Loss 4.1335   LearningRate 0.0290   Epoch: 15   Global Step: 157160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:05:20,537-Speed 5954.22 samples/sec   Loss 4.1002   LearningRate 0.0289   Epoch: 15   Global Step: 157170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:05:27,389-Speed 5979.52 samples/sec   Loss 4.1463   LearningRate 0.0289   Epoch: 15   Global Step: 157180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:05:34,233-Speed 5985.21 samples/sec   Loss 4.1443   LearningRate 0.0289   Epoch: 15   Global Step: 157190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:05:41,076-Speed 5987.52 samples/sec   Loss 4.1549   LearningRate 0.0289   Epoch: 15   Global Step: 157200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:05:47,925-Speed 5981.97 samples/sec   Loss 4.1182   LearningRate 0.0289   Epoch: 15   Global Step: 157210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:05:54,766-Speed 5987.88 samples/sec   Loss 4.0860   LearningRate 0.0289   Epoch: 15   Global Step: 157220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:06:01,616-Speed 5980.39 samples/sec   Loss 4.0975   LearningRate 0.0289   Epoch: 15   Global Step: 157230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:06:08,469-Speed 5980.46 samples/sec   Loss 4.1216   LearningRate 0.0289   Epoch: 15   Global Step: 157240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 03:06:15,340-Speed 5962.05 samples/sec   Loss 4.1086   LearningRate 0.0289   Epoch: 15   Global Step: 157250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 03:06:22,181-Speed 5988.95 samples/sec   Loss 4.0972   LearningRate 0.0288   Epoch: 15   Global Step: 157260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:06:29,018-Speed 5992.01 samples/sec   Loss 4.1149   LearningRate 0.0288   Epoch: 15   Global Step: 157270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:06:35,876-Speed 5973.42 samples/sec   Loss 4.0988   LearningRate 0.0288   Epoch: 15   Global Step: 157280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:06:42,716-Speed 5991.05 samples/sec   Loss 4.1578   LearningRate 0.0288   Epoch: 15   Global Step: 157290   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:06:49,563-Speed 5984.66 samples/sec   Loss 4.1290   LearningRate 0.0288   Epoch: 15   Global Step: 157300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:06:56,415-Speed 5977.98 samples/sec   Loss 4.0650   LearningRate 0.0288   Epoch: 15   Global Step: 157310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:07:03,277-Speed 5971.25 samples/sec   Loss 4.1333   LearningRate 0.0288   Epoch: 15   Global Step: 157320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:07:10,128-Speed 5980.36 samples/sec   Loss 4.0867   LearningRate 0.0288   Epoch: 15   Global Step: 157330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:07:16,977-Speed 5981.34 samples/sec   Loss 4.1083   LearningRate 0.0288   Epoch: 15   Global Step: 157340   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:07:23,832-Speed 5978.46 samples/sec   Loss 4.1164   LearningRate 0.0287   Epoch: 15   Global Step: 157350   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:07:30,678-Speed 5984.39 samples/sec   Loss 4.1165   LearningRate 0.0287   Epoch: 15   Global Step: 157360   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:07:37,534-Speed 5976.68 samples/sec   Loss 4.1019   LearningRate 0.0287   Epoch: 15   Global Step: 157370   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:07:44,414-Speed 5954.35 samples/sec   Loss 4.1153   LearningRate 0.0287   Epoch: 15   Global Step: 157380   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:07:51,268-Speed 5977.55 samples/sec   Loss 4.1249   LearningRate 0.0287   Epoch: 15   Global Step: 157390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:07:58,113-Speed 5984.43 samples/sec   Loss 4.0980   LearningRate 0.0287   Epoch: 15   Global Step: 157400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:08:04,979-Speed 5967.26 samples/sec   Loss 4.0585   LearningRate 0.0287   Epoch: 15   Global Step: 157410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:08:11,832-Speed 5978.44 samples/sec   Loss 4.1441   LearningRate 0.0287   Epoch: 15   Global Step: 157420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:08:18,706-Speed 5961.58 samples/sec   Loss 4.1191   LearningRate 0.0286   Epoch: 15   Global Step: 157430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:08:25,578-Speed 5961.95 samples/sec   Loss 4.0886   LearningRate 0.0286   Epoch: 15   Global Step: 157440   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:08:32,463-Speed 5950.62 samples/sec   Loss 4.0765   LearningRate 0.0286   Epoch: 15   Global Step: 157450   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:08:39,322-Speed 5972.80 samples/sec   Loss 4.1047   LearningRate 0.0286   Epoch: 15   Global Step: 157460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:08:46,167-Speed 5985.07 samples/sec   Loss 4.1302   LearningRate 0.0286   Epoch: 15   Global Step: 157470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:08:53,020-Speed 5980.64 samples/sec   Loss 4.1277   LearningRate 0.0286   Epoch: 15   Global Step: 157480   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:08:59,896-Speed 5957.96 samples/sec   Loss 4.0997   LearningRate 0.0286   Epoch: 15   Global Step: 157490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 03:09:06,736-Speed 5989.21 samples/sec   Loss 4.0890   LearningRate 0.0286   Epoch: 15   Global Step: 157500   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:09:13,591-Speed 5977.05 samples/sec   Loss 4.0679   LearningRate 0.0286   Epoch: 15   Global Step: 157510   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:09:20,449-Speed 5973.49 samples/sec   Loss 4.1639   LearningRate 0.0285   Epoch: 15   Global Step: 157520   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:09:27,328-Speed 5956.09 samples/sec   Loss 4.1010   LearningRate 0.0285   Epoch: 15   Global Step: 157530   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:09:34,197-Speed 5964.22 samples/sec   Loss 4.1327   LearningRate 0.0285   Epoch: 15   Global Step: 157540   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:09:41,052-Speed 5976.00 samples/sec   Loss 4.0896   LearningRate 0.0285   Epoch: 15   Global Step: 157550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:09:47,909-Speed 5974.91 samples/sec   Loss 4.1016   LearningRate 0.0285   Epoch: 15   Global Step: 157560   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:09:54,766-Speed 5974.15 samples/sec   Loss 4.1213   LearningRate 0.0285   Epoch: 15   Global Step: 157570   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:10:01,641-Speed 5959.24 samples/sec   Loss 4.0776   LearningRate 0.0285   Epoch: 15   Global Step: 157580   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:10:08,531-Speed 5946.29 samples/sec   Loss 4.0950   LearningRate 0.0285   Epoch: 15   Global Step: 157590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:10:15,381-Speed 5980.55 samples/sec   Loss 4.1173   LearningRate 0.0285   Epoch: 15   Global Step: 157600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 03:10:22,247-Speed 5967.00 samples/sec   Loss 4.1222   LearningRate 0.0284   Epoch: 15   Global Step: 157610   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:10:29,126-Speed 5956.39 samples/sec   Loss 4.0914   LearningRate 0.0284   Epoch: 15   Global Step: 157620   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:10:35,989-Speed 5969.91 samples/sec   Loss 4.0634   LearningRate 0.0284   Epoch: 15   Global Step: 157630   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:10:42,848-Speed 5972.09 samples/sec   Loss 4.0967   LearningRate 0.0284   Epoch: 15   Global Step: 157640   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:10:49,757-Speed 5929.99 samples/sec   Loss 4.0964   LearningRate 0.0284   Epoch: 15   Global Step: 157650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:10:56,607-Speed 5980.57 samples/sec   Loss 4.0664   LearningRate 0.0284   Epoch: 15   Global Step: 157660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:11:03,467-Speed 5972.29 samples/sec   Loss 4.0863   LearningRate 0.0284   Epoch: 15   Global Step: 157670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:11:10,325-Speed 5973.71 samples/sec   Loss 4.0356   LearningRate 0.0284   Epoch: 15   Global Step: 157680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:11:17,193-Speed 5965.35 samples/sec   Loss 4.1007   LearningRate 0.0284   Epoch: 15   Global Step: 157690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:11:24,047-Speed 5976.65 samples/sec   Loss 4.1201   LearningRate 0.0283   Epoch: 15   Global Step: 157700   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:11:30,925-Speed 5956.39 samples/sec   Loss 4.0702   LearningRate 0.0283   Epoch: 15   Global Step: 157710   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:11:37,850-Speed 5916.88 samples/sec   Loss 4.1187   LearningRate 0.0283   Epoch: 15   Global Step: 157720   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:11:44,701-Speed 5979.15 samples/sec   Loss 4.0497   LearningRate 0.0283   Epoch: 15   Global Step: 157730   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:11:51,566-Speed 5968.26 samples/sec   Loss 4.1071   LearningRate 0.0283   Epoch: 15   Global Step: 157740   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:11:58,416-Speed 5981.43 samples/sec   Loss 4.1067   LearningRate 0.0283   Epoch: 15   Global Step: 157750   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:12:05,266-Speed 5979.78 samples/sec   Loss 4.0950   LearningRate 0.0283   Epoch: 15   Global Step: 157760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:12:12,110-Speed 5986.47 samples/sec   Loss 4.1156   LearningRate 0.0283   Epoch: 15   Global Step: 157770   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:12:18,967-Speed 5974.40 samples/sec   Loss 4.0842   LearningRate 0.0282   Epoch: 15   Global Step: 157780   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:12:25,848-Speed 5954.09 samples/sec   Loss 4.1205   LearningRate 0.0282   Epoch: 15   Global Step: 157790   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:12:32,717-Speed 5964.00 samples/sec   Loss 4.1121   LearningRate 0.0282   Epoch: 15   Global Step: 157800   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:12:39,594-Speed 5957.63 samples/sec   Loss 4.0867   LearningRate 0.0282   Epoch: 15   Global Step: 157810   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:12:46,466-Speed 5961.32 samples/sec   Loss 4.0473   LearningRate 0.0282   Epoch: 15   Global Step: 157820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:12:53,359-Speed 5944.82 samples/sec   Loss 4.1340   LearningRate 0.0282   Epoch: 15   Global Step: 157830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:13:00,234-Speed 5959.09 samples/sec   Loss 4.1038   LearningRate 0.0282   Epoch: 15   Global Step: 157840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:13:07,094-Speed 5974.54 samples/sec   Loss 4.1040   LearningRate 0.0282   Epoch: 15   Global Step: 157850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:13:13,941-Speed 5983.89 samples/sec   Loss 4.0800   LearningRate 0.0282   Epoch: 15   Global Step: 157860   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:13:20,790-Speed 5981.14 samples/sec   Loss 4.0240   LearningRate 0.0281   Epoch: 15   Global Step: 157870   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:13:27,663-Speed 5960.70 samples/sec   Loss 4.0477   LearningRate 0.0281   Epoch: 15   Global Step: 157880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:13:34,535-Speed 5961.06 samples/sec   Loss 4.0608   LearningRate 0.0281   Epoch: 15   Global Step: 157890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:13:41,397-Speed 5971.13 samples/sec   Loss 4.1083   LearningRate 0.0281   Epoch: 15   Global Step: 157900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:13:48,269-Speed 5960.83 samples/sec   Loss 4.0791   LearningRate 0.0281   Epoch: 15   Global Step: 157910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:13:55,133-Speed 5969.05 samples/sec   Loss 4.0862   LearningRate 0.0281   Epoch: 15   Global Step: 157920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:14:01,994-Speed 5971.24 samples/sec   Loss 4.0541   LearningRate 0.0281   Epoch: 15   Global Step: 157930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:14:08,920-Speed 5914.71 samples/sec   Loss 4.0555   LearningRate 0.0281   Epoch: 15   Global Step: 157940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:14:15,775-Speed 5976.68 samples/sec   Loss 4.0383   LearningRate 0.0281   Epoch: 15   Global Step: 157950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:14:22,628-Speed 5978.82 samples/sec   Loss 4.1058   LearningRate 0.0280   Epoch: 15   Global Step: 157960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:14:29,489-Speed 5970.84 samples/sec   Loss 4.0599   LearningRate 0.0280   Epoch: 15   Global Step: 157970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:14:36,336-Speed 5984.91 samples/sec   Loss 4.0647   LearningRate 0.0280   Epoch: 15   Global Step: 157980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:14:43,184-Speed 5982.24 samples/sec   Loss 4.0538   LearningRate 0.0280   Epoch: 15   Global Step: 157990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:14:50,027-Speed 5986.45 samples/sec   Loss 4.0651   LearningRate 0.0280   Epoch: 15   Global Step: 158000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 03:14:56,931-Speed 5934.21 samples/sec   Loss 4.1176   LearningRate 0.0280   Epoch: 15   Global Step: 158010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 03:15:03,790-Speed 5973.45 samples/sec   Loss 4.0626   LearningRate 0.0280   Epoch: 15   Global Step: 158020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 03:15:10,668-Speed 5955.73 samples/sec   Loss 4.0581   LearningRate 0.0280   Epoch: 15   Global Step: 158030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 03:15:17,528-Speed 5973.36 samples/sec   Loss 4.0801   LearningRate 0.0280   Epoch: 15   Global Step: 158040   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:15:24,432-Speed 5934.53 samples/sec   Loss 4.0741   LearningRate 0.0279   Epoch: 15   Global Step: 158050   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:15:31,311-Speed 5954.49 samples/sec   Loss 4.0493   LearningRate 0.0279   Epoch: 15   Global Step: 158060   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:15:38,197-Speed 5949.71 samples/sec   Loss 4.0634   LearningRate 0.0279   Epoch: 15   Global Step: 158070   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:15:45,057-Speed 5972.82 samples/sec   Loss 4.0060   LearningRate 0.0279   Epoch: 15   Global Step: 158080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:15:51,919-Speed 5970.22 samples/sec   Loss 4.0307   LearningRate 0.0279   Epoch: 15   Global Step: 158090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:15:58,770-Speed 5981.75 samples/sec   Loss 4.0587   LearningRate 0.0279   Epoch: 15   Global Step: 158100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:16:05,627-Speed 5977.15 samples/sec   Loss 4.0398   LearningRate 0.0279   Epoch: 15   Global Step: 158110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:16:12,480-Speed 5977.50 samples/sec   Loss 4.0764   LearningRate 0.0279   Epoch: 15   Global Step: 158120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:16:19,361-Speed 5953.58 samples/sec   Loss 4.0278   LearningRate 0.0279   Epoch: 15   Global Step: 158130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:16:26,223-Speed 5970.48 samples/sec   Loss 4.0540   LearningRate 0.0278   Epoch: 15   Global Step: 158140   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:16:33,080-Speed 5974.68 samples/sec   Loss 4.0649   LearningRate 0.0278   Epoch: 15   Global Step: 158150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:16:39,930-Speed 5980.48 samples/sec   Loss 4.1051   LearningRate 0.0278   Epoch: 15   Global Step: 158160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:16:46,801-Speed 5962.53 samples/sec   Loss 4.0303   LearningRate 0.0278   Epoch: 15   Global Step: 158170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:16:53,651-Speed 5980.62 samples/sec   Loss 4.0544   LearningRate 0.0278   Epoch: 15   Global Step: 158180   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:17:00,499-Speed 5983.08 samples/sec   Loss 4.0711   LearningRate 0.0278   Epoch: 15   Global Step: 158190   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:17:07,347-Speed 5982.10 samples/sec   Loss 4.0404   LearningRate 0.0278   Epoch: 15   Global Step: 158200   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:17:14,193-Speed 5983.81 samples/sec   Loss 4.0777   LearningRate 0.0278   Epoch: 15   Global Step: 158210   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:17:21,051-Speed 5972.97 samples/sec   Loss 4.0835   LearningRate 0.0278   Epoch: 15   Global Step: 158220   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:17:27,923-Speed 5961.74 samples/sec   Loss 4.0317   LearningRate 0.0277   Epoch: 15   Global Step: 158230   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:17:34,776-Speed 5977.69 samples/sec   Loss 4.0261   LearningRate 0.0277   Epoch: 15   Global Step: 158240   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:17:41,650-Speed 5959.13 samples/sec   Loss 4.0772   LearningRate 0.0277   Epoch: 15   Global Step: 158250   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:17:48,498-Speed 5983.06 samples/sec   Loss 4.0628   LearningRate 0.0277   Epoch: 15   Global Step: 158260   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:17:55,353-Speed 5975.72 samples/sec   Loss 4.0708   LearningRate 0.0277   Epoch: 15   Global Step: 158270   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-01-09 03:18:02,216-Speed 5970.15 samples/sec   Loss 4.0536   LearningRate 0.0277   Epoch: 15   Global Step: 158280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:18:09,093-Speed 5957.80 samples/sec   Loss 4.0706   LearningRate 0.0277   Epoch: 15   Global Step: 158290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:18:15,946-Speed 5978.26 samples/sec   Loss 4.0080   LearningRate 0.0277   Epoch: 15   Global Step: 158300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:18:22,814-Speed 5964.59 samples/sec   Loss 4.0314   LearningRate 0.0276   Epoch: 15   Global Step: 158310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:18:29,686-Speed 5962.54 samples/sec   Loss 4.0395   LearningRate 0.0276   Epoch: 15   Global Step: 158320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:18:36,559-Speed 5961.54 samples/sec   Loss 4.0371   LearningRate 0.0276   Epoch: 15   Global Step: 158330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:18:43,429-Speed 5963.67 samples/sec   Loss 4.0750   LearningRate 0.0276   Epoch: 15   Global Step: 158340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:18:50,287-Speed 5973.08 samples/sec   Loss 4.0483   LearningRate 0.0276   Epoch: 15   Global Step: 158350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:18:57,138-Speed 5979.98 samples/sec   Loss 4.0717   LearningRate 0.0276   Epoch: 15   Global Step: 158360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:19:04,014-Speed 5961.11 samples/sec   Loss 4.0648   LearningRate 0.0276   Epoch: 15   Global Step: 158370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:19:10,885-Speed 5962.07 samples/sec   Loss 4.0490   LearningRate 0.0276   Epoch: 15   Global Step: 158380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 03:19:17,765-Speed 5955.24 samples/sec   Loss 4.0380   LearningRate 0.0276   Epoch: 15   Global Step: 158390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 03:19:24,625-Speed 5974.81 samples/sec   Loss 4.0561   LearningRate 0.0275   Epoch: 15   Global Step: 158400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 03:19:31,506-Speed 5953.61 samples/sec   Loss 4.0572   LearningRate 0.0275   Epoch: 15   Global Step: 158410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 03:19:38,369-Speed 5968.92 samples/sec   Loss 4.0087   LearningRate 0.0275   Epoch: 15   Global Step: 158420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 03:19:45,218-Speed 5981.63 samples/sec   Loss 4.0634   LearningRate 0.0275   Epoch: 15   Global Step: 158430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:19:52,065-Speed 5983.56 samples/sec   Loss 4.0442   LearningRate 0.0275   Epoch: 15   Global Step: 158440   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:19:58,915-Speed 5980.40 samples/sec   Loss 4.0522   LearningRate 0.0275   Epoch: 15   Global Step: 158450   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:20:05,769-Speed 5976.69 samples/sec   Loss 4.0747   LearningRate 0.0275   Epoch: 15   Global Step: 158460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:20:12,614-Speed 5985.52 samples/sec   Loss 4.0749   LearningRate 0.0275   Epoch: 15   Global Step: 158470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:20:19,471-Speed 5974.18 samples/sec   Loss 4.0281   LearningRate 0.0275   Epoch: 15   Global Step: 158480   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:20:26,337-Speed 5967.16 samples/sec   Loss 4.0193   LearningRate 0.0274   Epoch: 15   Global Step: 158490   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:20:33,186-Speed 5981.98 samples/sec   Loss 4.0431   LearningRate 0.0274   Epoch: 15   Global Step: 158500   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:20:40,046-Speed 5972.21 samples/sec   Loss 4.0445   LearningRate 0.0274   Epoch: 15   Global Step: 158510   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:20:46,915-Speed 5963.87 samples/sec   Loss 4.0484   LearningRate 0.0274   Epoch: 15   Global Step: 158520   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:20:53,767-Speed 5979.13 samples/sec   Loss 4.0336   LearningRate 0.0274   Epoch: 15   Global Step: 158530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 03:21:00,623-Speed 5975.04 samples/sec   Loss 4.0266   LearningRate 0.0274   Epoch: 15   Global Step: 158540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 03:21:07,473-Speed 5983.33 samples/sec   Loss 4.0695   LearningRate 0.0274   Epoch: 15   Global Step: 158550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:21:14,337-Speed 5967.78 samples/sec   Loss 4.0612   LearningRate 0.0274   Epoch: 15   Global Step: 158560   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:21:21,189-Speed 5979.30 samples/sec   Loss 4.0786   LearningRate 0.0274   Epoch: 15   Global Step: 158570   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:21:28,035-Speed 5984.98 samples/sec   Loss 4.0473   LearningRate 0.0273   Epoch: 15   Global Step: 158580   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:21:34,888-Speed 5978.68 samples/sec   Loss 3.9614   LearningRate 0.0273   Epoch: 15   Global Step: 158590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:21:41,737-Speed 5980.81 samples/sec   Loss 4.0150   LearningRate 0.0273   Epoch: 15   Global Step: 158600   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:21:48,590-Speed 5978.35 samples/sec   Loss 4.0109   LearningRate 0.0273   Epoch: 15   Global Step: 158610   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:21:55,460-Speed 5963.21 samples/sec   Loss 4.0254   LearningRate 0.0273   Epoch: 15   Global Step: 158620   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:22:02,314-Speed 5977.31 samples/sec   Loss 4.0399   LearningRate 0.0273   Epoch: 15   Global Step: 158630   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:22:09,188-Speed 5959.58 samples/sec   Loss 3.9822   LearningRate 0.0273   Epoch: 15   Global Step: 158640   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:22:16,070-Speed 5953.32 samples/sec   Loss 4.0373   LearningRate 0.0273   Epoch: 15   Global Step: 158650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 03:22:22,924-Speed 5976.12 samples/sec   Loss 4.0529   LearningRate 0.0273   Epoch: 15   Global Step: 158660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:22:29,800-Speed 5960.15 samples/sec   Loss 4.0520   LearningRate 0.0272   Epoch: 15   Global Step: 158670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:22:36,682-Speed 5954.66 samples/sec   Loss 4.0126   LearningRate 0.0272   Epoch: 15   Global Step: 158680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:22:43,536-Speed 5976.56 samples/sec   Loss 4.0135   LearningRate 0.0272   Epoch: 15   Global Step: 158690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:22:50,390-Speed 5977.19 samples/sec   Loss 4.0147   LearningRate 0.0272   Epoch: 15   Global Step: 158700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:22:57,244-Speed 5977.69 samples/sec   Loss 4.0329   LearningRate 0.0272   Epoch: 15   Global Step: 158710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:23:04,112-Speed 5964.30 samples/sec   Loss 4.0186   LearningRate 0.0272   Epoch: 15   Global Step: 158720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:23:10,984-Speed 5962.24 samples/sec   Loss 3.9832   LearningRate 0.0272   Epoch: 15   Global Step: 158730   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:23:17,836-Speed 5978.32 samples/sec   Loss 4.0162   LearningRate 0.0272   Epoch: 15   Global Step: 158740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:23:24,697-Speed 5971.26 samples/sec   Loss 3.9923   LearningRate 0.0272   Epoch: 15   Global Step: 158750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:23:31,562-Speed 5967.26 samples/sec   Loss 4.0776   LearningRate 0.0271   Epoch: 15   Global Step: 158760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-01-09 03:23:38,467-Speed 5933.15 samples/sec   Loss 4.0045   LearningRate 0.0271   Epoch: 15   Global Step: 158770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:23:45,333-Speed 5967.44 samples/sec   Loss 4.0368   LearningRate 0.0271   Epoch: 15   Global Step: 158780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:23:52,199-Speed 5966.49 samples/sec   Loss 4.0553   LearningRate 0.0271   Epoch: 15   Global Step: 158790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:23:59,051-Speed 5979.48 samples/sec   Loss 3.9756   LearningRate 0.0271   Epoch: 15   Global Step: 158800   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:24:05,928-Speed 5956.40 samples/sec   Loss 4.0120   LearningRate 0.0271   Epoch: 15   Global Step: 158810   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:24:12,774-Speed 5985.00 samples/sec   Loss 4.0121   LearningRate 0.0271   Epoch: 15   Global Step: 158820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:24:19,632-Speed 5973.11 samples/sec   Loss 3.9834   LearningRate 0.0271   Epoch: 15   Global Step: 158830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:24:26,479-Speed 5982.85 samples/sec   Loss 4.0482   LearningRate 0.0271   Epoch: 15   Global Step: 158840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:24:33,333-Speed 5977.88 samples/sec   Loss 4.0279   LearningRate 0.0270   Epoch: 15   Global Step: 158850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:24:40,180-Speed 5982.83 samples/sec   Loss 3.9974   LearningRate 0.0270   Epoch: 15   Global Step: 158860   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:24:47,027-Speed 5983.12 samples/sec   Loss 4.0448   LearningRate 0.0270   Epoch: 15   Global Step: 158870   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:24:53,879-Speed 5981.58 samples/sec   Loss 4.0143   LearningRate 0.0270   Epoch: 15   Global Step: 158880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:25:00,741-Speed 5970.45 samples/sec   Loss 4.0614   LearningRate 0.0270   Epoch: 15   Global Step: 158890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:25:07,601-Speed 5971.95 samples/sec   Loss 4.0524   LearningRate 0.0270   Epoch: 15   Global Step: 158900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:25:14,480-Speed 5955.83 samples/sec   Loss 4.0226   LearningRate 0.0270   Epoch: 15   Global Step: 158910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-01-09 03:25:21,364-Speed 5952.62 samples/sec   Loss 3.9965   LearningRate 0.0270   Epoch: 15   Global Step: 158920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:25:28,226-Speed 5970.46 samples/sec   Loss 3.9703   LearningRate 0.0270   Epoch: 15   Global Step: 158930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:25:35,079-Speed 5977.91 samples/sec   Loss 3.9650   LearningRate 0.0269   Epoch: 15   Global Step: 158940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:25:41,933-Speed 5976.89 samples/sec   Loss 3.9650   LearningRate 0.0269   Epoch: 15   Global Step: 158950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:25:48,808-Speed 5959.46 samples/sec   Loss 4.0116   LearningRate 0.0269   Epoch: 15   Global Step: 158960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:25:55,666-Speed 5974.45 samples/sec   Loss 4.0183   LearningRate 0.0269   Epoch: 15   Global Step: 158970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:26:02,522-Speed 5975.95 samples/sec   Loss 4.0095   LearningRate 0.0269   Epoch: 15   Global Step: 158980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:26:09,372-Speed 5980.61 samples/sec   Loss 3.9790   LearningRate 0.0269   Epoch: 15   Global Step: 158990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:26:16,244-Speed 5961.16 samples/sec   Loss 4.0058   LearningRate 0.0269   Epoch: 15   Global Step: 159000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:26:23,101-Speed 5974.41 samples/sec   Loss 3.9971   LearningRate 0.0269   Epoch: 15   Global Step: 159010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:26:29,943-Speed 5987.20 samples/sec   Loss 3.9795   LearningRate 0.0269   Epoch: 15   Global Step: 159020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:26:36,827-Speed 5954.80 samples/sec   Loss 4.0004   LearningRate 0.0268   Epoch: 15   Global Step: 159030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:26:43,691-Speed 5970.10 samples/sec   Loss 3.9778   LearningRate 0.0268   Epoch: 15   Global Step: 159040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:26:50,554-Speed 5968.28 samples/sec   Loss 4.0246   LearningRate 0.0268   Epoch: 15   Global Step: 159050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:26:57,398-Speed 5986.65 samples/sec   Loss 3.9643   LearningRate 0.0268   Epoch: 15   Global Step: 159060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:27:04,271-Speed 5960.87 samples/sec   Loss 3.9714   LearningRate 0.0268   Epoch: 15   Global Step: 159070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:27:11,141-Speed 5962.09 samples/sec   Loss 3.9817   LearningRate 0.0268   Epoch: 15   Global Step: 159080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:27:18,003-Speed 5970.58 samples/sec   Loss 4.0135   LearningRate 0.0268   Epoch: 15   Global Step: 159090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:27:24,856-Speed 5978.58 samples/sec   Loss 3.9765   LearningRate 0.0268   Epoch: 15   Global Step: 159100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:27:31,722-Speed 5966.56 samples/sec   Loss 3.9831   LearningRate 0.0268   Epoch: 15   Global Step: 159110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:27:38,565-Speed 5986.68 samples/sec   Loss 3.9788   LearningRate 0.0267   Epoch: 15   Global Step: 159120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:27:45,417-Speed 5978.51 samples/sec   Loss 4.0640   LearningRate 0.0267   Epoch: 15   Global Step: 159130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:27:52,271-Speed 5976.93 samples/sec   Loss 3.9851   LearningRate 0.0267   Epoch: 15   Global Step: 159140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:27:59,132-Speed 5970.94 samples/sec   Loss 4.0737   LearningRate 0.0267   Epoch: 15   Global Step: 159150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:28:05,997-Speed 5967.88 samples/sec   Loss 3.9984   LearningRate 0.0267   Epoch: 15   Global Step: 159160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:28:12,846-Speed 5980.96 samples/sec   Loss 3.9792   LearningRate 0.0267   Epoch: 15   Global Step: 159170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:28:19,718-Speed 5962.33 samples/sec   Loss 3.9901   LearningRate 0.0267   Epoch: 15   Global Step: 159180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:28:26,576-Speed 5973.19 samples/sec   Loss 4.0144   LearningRate 0.0267   Epoch: 15   Global Step: 159190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:28:33,434-Speed 5973.80 samples/sec   Loss 3.9947   LearningRate 0.0267   Epoch: 15   Global Step: 159200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:28:40,281-Speed 5983.42 samples/sec   Loss 4.0455   LearningRate 0.0266   Epoch: 15   Global Step: 159210   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:28:47,135-Speed 5980.40 samples/sec   Loss 3.9656   LearningRate 0.0266   Epoch: 15   Global Step: 159220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:28:54,014-Speed 5955.54 samples/sec   Loss 3.9672   LearningRate 0.0266   Epoch: 15   Global Step: 159230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:29:00,887-Speed 5960.62 samples/sec   Loss 3.9826   LearningRate 0.0266   Epoch: 15   Global Step: 159240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:29:07,747-Speed 5973.92 samples/sec   Loss 3.9929   LearningRate 0.0266   Epoch: 15   Global Step: 159250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:29:14,605-Speed 5973.79 samples/sec   Loss 3.9518   LearningRate 0.0266   Epoch: 15   Global Step: 159260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:29:21,465-Speed 5974.12 samples/sec   Loss 4.0110   LearningRate 0.0266   Epoch: 15   Global Step: 159270   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:29:28,318-Speed 5977.98 samples/sec   Loss 3.9783   LearningRate 0.0266   Epoch: 15   Global Step: 159280   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:29:35,192-Speed 5959.67 samples/sec   Loss 4.0029   LearningRate 0.0266   Epoch: 15   Global Step: 159290   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:29:42,059-Speed 5965.28 samples/sec   Loss 3.9803   LearningRate 0.0265   Epoch: 15   Global Step: 159300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:29:48,916-Speed 5976.57 samples/sec   Loss 3.9865   LearningRate 0.0265   Epoch: 15   Global Step: 159310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:29:55,765-Speed 5981.45 samples/sec   Loss 3.9577   LearningRate 0.0265   Epoch: 15   Global Step: 159320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:30:02,625-Speed 5973.24 samples/sec   Loss 3.9605   LearningRate 0.0265   Epoch: 15   Global Step: 159330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:30:09,472-Speed 5983.31 samples/sec   Loss 3.9610   LearningRate 0.0265   Epoch: 15   Global Step: 159340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:30:16,336-Speed 5968.16 samples/sec   Loss 3.9929   LearningRate 0.0265   Epoch: 15   Global Step: 159350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:30:23,187-Speed 5980.78 samples/sec   Loss 3.9526   LearningRate 0.0265   Epoch: 15   Global Step: 159360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:30:30,033-Speed 5984.41 samples/sec   Loss 3.9764   LearningRate 0.0265   Epoch: 15   Global Step: 159370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:30:36,874-Speed 5987.97 samples/sec   Loss 3.9191   LearningRate 0.0265   Epoch: 15   Global Step: 159380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:30:43,725-Speed 5980.57 samples/sec   Loss 4.0044   LearningRate 0.0264   Epoch: 15   Global Step: 159390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:30:50,596-Speed 5962.20 samples/sec   Loss 3.9631   LearningRate 0.0264   Epoch: 15   Global Step: 159400   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:30:57,454-Speed 5974.08 samples/sec   Loss 3.9893   LearningRate 0.0264   Epoch: 15   Global Step: 159410   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:31:04,321-Speed 5969.74 samples/sec   Loss 3.9580   LearningRate 0.0264   Epoch: 15   Global Step: 159420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:31:11,187-Speed 5967.23 samples/sec   Loss 3.9341   LearningRate 0.0264   Epoch: 15   Global Step: 159430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:31:18,048-Speed 5970.77 samples/sec   Loss 3.9882   LearningRate 0.0264   Epoch: 15   Global Step: 159440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:31:24,889-Speed 5988.74 samples/sec   Loss 4.0016   LearningRate 0.0264   Epoch: 15   Global Step: 159450   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:31:31,763-Speed 5959.99 samples/sec   Loss 3.9541   LearningRate 0.0264   Epoch: 15   Global Step: 159460   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:31:38,617-Speed 5976.77 samples/sec   Loss 4.0100   LearningRate 0.0264   Epoch: 15   Global Step: 159470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:31:45,495-Speed 5956.79 samples/sec   Loss 3.9513   LearningRate 0.0263   Epoch: 15   Global Step: 159480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:31:52,357-Speed 5969.76 samples/sec   Loss 4.0214   LearningRate 0.0263   Epoch: 15   Global Step: 159490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:31:59,216-Speed 5973.19 samples/sec   Loss 3.9519   LearningRate 0.0263   Epoch: 15   Global Step: 159500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:32:06,066-Speed 5983.30 samples/sec   Loss 3.9560   LearningRate 0.0263   Epoch: 15   Global Step: 159510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:32:12,929-Speed 5972.31 samples/sec   Loss 3.9548   LearningRate 0.0263   Epoch: 15   Global Step: 159520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:32:19,794-Speed 5970.05 samples/sec   Loss 3.9696   LearningRate 0.0263   Epoch: 15   Global Step: 159530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:32:26,660-Speed 5967.21 samples/sec   Loss 3.9091   LearningRate 0.0263   Epoch: 15   Global Step: 159540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:32:33,515-Speed 5977.58 samples/sec   Loss 3.9495   LearningRate 0.0263   Epoch: 15   Global Step: 159550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:32:40,363-Speed 5982.35 samples/sec   Loss 3.9484   LearningRate 0.0263   Epoch: 15   Global Step: 159560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:32:47,253-Speed 5946.55 samples/sec   Loss 4.0056   LearningRate 0.0262   Epoch: 15   Global Step: 159570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:32:54,095-Speed 5987.72 samples/sec   Loss 3.9813   LearningRate 0.0262   Epoch: 15   Global Step: 159580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:33:00,958-Speed 5969.33 samples/sec   Loss 4.0015   LearningRate 0.0262   Epoch: 15   Global Step: 159590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:33:07,918-Speed 5886.75 samples/sec   Loss 3.9830   LearningRate 0.0262   Epoch: 15   Global Step: 159600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:33:14,774-Speed 5975.86 samples/sec   Loss 3.9651   LearningRate 0.0262   Epoch: 15   Global Step: 159610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:33:21,776-Speed 5851.05 samples/sec   Loss 3.9301   LearningRate 0.0262   Epoch: 15   Global Step: 159620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:33:28,747-Speed 5876.76 samples/sec   Loss 3.9529   LearningRate 0.0262   Epoch: 15   Global Step: 159630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:33:35,600-Speed 5978.60 samples/sec   Loss 3.9526   LearningRate 0.0262   Epoch: 15   Global Step: 159640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:33:42,442-Speed 5987.31 samples/sec   Loss 3.9365   LearningRate 0.0262   Epoch: 15   Global Step: 159650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:33:49,327-Speed 5952.83 samples/sec   Loss 3.9363   LearningRate 0.0261   Epoch: 15   Global Step: 159660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:33:56,208-Speed 5954.49 samples/sec   Loss 3.9466   LearningRate 0.0261   Epoch: 15   Global Step: 159670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:34:03,056-Speed 5982.08 samples/sec   Loss 3.9281   LearningRate 0.0261   Epoch: 15   Global Step: 159680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:34:09,951-Speed 5941.49 samples/sec   Loss 3.9406   LearningRate 0.0261   Epoch: 15   Global Step: 159690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:34:16,809-Speed 5973.95 samples/sec   Loss 3.9435   LearningRate 0.0261   Epoch: 15   Global Step: 159700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:34:23,668-Speed 5972.19 samples/sec   Loss 3.9340   LearningRate 0.0261   Epoch: 15   Global Step: 159710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:34:30,536-Speed 5965.28 samples/sec   Loss 3.9763   LearningRate 0.0261   Epoch: 15   Global Step: 159720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:34:37,383-Speed 5983.17 samples/sec   Loss 3.9227   LearningRate 0.0261   Epoch: 15   Global Step: 159730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:34:44,237-Speed 5976.94 samples/sec   Loss 3.9324   LearningRate 0.0261   Epoch: 15   Global Step: 159740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:34:51,116-Speed 5955.94 samples/sec   Loss 3.9265   LearningRate 0.0260   Epoch: 15   Global Step: 159750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:34:58,012-Speed 5941.25 samples/sec   Loss 3.9342   LearningRate 0.0260   Epoch: 15   Global Step: 159760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:35:04,853-Speed 5989.21 samples/sec   Loss 3.9258   LearningRate 0.0260   Epoch: 15   Global Step: 159770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:35:11,708-Speed 5976.22 samples/sec   Loss 3.9302   LearningRate 0.0260   Epoch: 15   Global Step: 159780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:35:18,565-Speed 5975.07 samples/sec   Loss 3.9347   LearningRate 0.0260   Epoch: 15   Global Step: 159790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:35:25,414-Speed 5981.48 samples/sec   Loss 3.9470   LearningRate 0.0260   Epoch: 15   Global Step: 159800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:35:32,265-Speed 5981.55 samples/sec   Loss 3.9406   LearningRate 0.0260   Epoch: 15   Global Step: 159810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:35:39,129-Speed 5969.33 samples/sec   Loss 3.9215   LearningRate 0.0260   Epoch: 15   Global Step: 159820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:35:45,989-Speed 5971.26 samples/sec   Loss 3.9739   LearningRate 0.0260   Epoch: 15   Global Step: 159830   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:35:52,941-Speed 5895.64 samples/sec   Loss 3.9507   LearningRate 0.0260   Epoch: 15   Global Step: 159840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:35:59,809-Speed 5964.99 samples/sec   Loss 3.9700   LearningRate 0.0259   Epoch: 15   Global Step: 159850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:36:06,669-Speed 5971.82 samples/sec   Loss 3.9207   LearningRate 0.0259   Epoch: 15   Global Step: 159860   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:36:13,522-Speed 5978.14 samples/sec   Loss 3.9635   LearningRate 0.0259   Epoch: 15   Global Step: 159870   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:36:20,386-Speed 5968.83 samples/sec   Loss 3.9203   LearningRate 0.0259   Epoch: 15   Global Step: 159880   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:36:27,240-Speed 5976.24 samples/sec   Loss 3.9301   LearningRate 0.0259   Epoch: 15   Global Step: 159890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:36:34,111-Speed 5965.72 samples/sec   Loss 3.8890   LearningRate 0.0259   Epoch: 15   Global Step: 159900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:36:40,959-Speed 5983.10 samples/sec   Loss 3.9654   LearningRate 0.0259   Epoch: 15   Global Step: 159910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:36:47,799-Speed 5988.77 samples/sec   Loss 3.9176   LearningRate 0.0259   Epoch: 15   Global Step: 159920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:36:54,644-Speed 5985.55 samples/sec   Loss 3.9193   LearningRate 0.0259   Epoch: 15   Global Step: 159930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:37:01,494-Speed 5980.48 samples/sec   Loss 3.9256   LearningRate 0.0258   Epoch: 15   Global Step: 159940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:37:08,360-Speed 5966.29 samples/sec   Loss 3.9340   LearningRate 0.0258   Epoch: 15   Global Step: 159950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:37:15,221-Speed 5971.57 samples/sec   Loss 3.9361   LearningRate 0.0258   Epoch: 15   Global Step: 159960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:37:22,082-Speed 5971.14 samples/sec   Loss 3.9718   LearningRate 0.0258   Epoch: 15   Global Step: 159970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:37:28,936-Speed 5976.63 samples/sec   Loss 3.9438   LearningRate 0.0258   Epoch: 15   Global Step: 159980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:37:35,778-Speed 5987.67 samples/sec   Loss 3.9422   LearningRate 0.0258   Epoch: 15   Global Step: 159990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:37:42,632-Speed 5977.34 samples/sec   Loss 3.9010   LearningRate 0.0258   Epoch: 15   Global Step: 160000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:38:09,290-[lfw][160000]XNorm: 23.752233
Training: 2022-01-09 03:38:09,291-[lfw][160000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-01-09 03:38:09,292-[lfw][160000]Accuracy-Highest: 0.99817
Training: 2022-01-09 03:38:40,143-[cfp_fp][160000]XNorm: 21.176481
Training: 2022-01-09 03:38:40,144-[cfp_fp][160000]Accuracy-Flip: 0.98929+-0.00492
Training: 2022-01-09 03:38:40,145-[cfp_fp][160000]Accuracy-Highest: 0.98929
Training: 2022-01-09 03:39:06,843-[agedb_30][160000]XNorm: 23.048697
Training: 2022-01-09 03:39:06,844-[agedb_30][160000]Accuracy-Flip: 0.97817+-0.00608
Training: 2022-01-09 03:39:06,845-[agedb_30][160000]Accuracy-Highest: 0.97833
Training: 2022-01-09 03:39:13,708-Speed 449.74 samples/sec   Loss 3.9775   LearningRate 0.0258   Epoch: 15   Global Step: 160010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:39:20,563-Speed 5976.01 samples/sec   Loss 3.8734   LearningRate 0.0258   Epoch: 15   Global Step: 160020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:39:27,412-Speed 5981.98 samples/sec   Loss 3.9750   LearningRate 0.0257   Epoch: 15   Global Step: 160030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:39:34,267-Speed 5976.47 samples/sec   Loss 3.9275   LearningRate 0.0257   Epoch: 15   Global Step: 160040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:39:41,136-Speed 5964.33 samples/sec   Loss 3.9181   LearningRate 0.0257   Epoch: 15   Global Step: 160050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:39:47,990-Speed 5977.44 samples/sec   Loss 3.8779   LearningRate 0.0257   Epoch: 15   Global Step: 160060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:39:54,846-Speed 5975.83 samples/sec   Loss 3.9558   LearningRate 0.0257   Epoch: 15   Global Step: 160070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:40:01,723-Speed 5958.15 samples/sec   Loss 3.8993   LearningRate 0.0257   Epoch: 15   Global Step: 160080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:40:08,625-Speed 5935.67 samples/sec   Loss 3.9297   LearningRate 0.0257   Epoch: 15   Global Step: 160090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:40:15,528-Speed 5934.96 samples/sec   Loss 3.9346   LearningRate 0.0257   Epoch: 15   Global Step: 160100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:40:22,418-Speed 5945.50 samples/sec   Loss 3.9103   LearningRate 0.0257   Epoch: 15   Global Step: 160110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:40:29,279-Speed 5970.73 samples/sec   Loss 3.9255   LearningRate 0.0256   Epoch: 15   Global Step: 160120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:40:36,181-Speed 5936.41 samples/sec   Loss 3.9273   LearningRate 0.0256   Epoch: 15   Global Step: 160130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:40:43,054-Speed 5959.91 samples/sec   Loss 3.8907   LearningRate 0.0256   Epoch: 15   Global Step: 160140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:40:49,927-Speed 5961.11 samples/sec   Loss 3.9195   LearningRate 0.0256   Epoch: 15   Global Step: 160150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:40:56,781-Speed 5977.73 samples/sec   Loss 3.9425   LearningRate 0.0256   Epoch: 15   Global Step: 160160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:41:03,644-Speed 5968.91 samples/sec   Loss 3.9025   LearningRate 0.0256   Epoch: 15   Global Step: 160170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:41:10,496-Speed 5978.48 samples/sec   Loss 3.8854   LearningRate 0.0256   Epoch: 15   Global Step: 160180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:41:17,350-Speed 5977.42 samples/sec   Loss 3.8648   LearningRate 0.0256   Epoch: 15   Global Step: 160190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:41:24,196-Speed 5983.65 samples/sec   Loss 3.9445   LearningRate 0.0256   Epoch: 15   Global Step: 160200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:41:31,053-Speed 5974.78 samples/sec   Loss 3.9311   LearningRate 0.0255   Epoch: 15   Global Step: 160210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:41:37,923-Speed 5963.78 samples/sec   Loss 3.9401   LearningRate 0.0255   Epoch: 15   Global Step: 160220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:41:44,785-Speed 5970.35 samples/sec   Loss 3.9682   LearningRate 0.0255   Epoch: 15   Global Step: 160230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:41:51,633-Speed 5983.18 samples/sec   Loss 3.9203   LearningRate 0.0255   Epoch: 15   Global Step: 160240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:41:58,501-Speed 5965.07 samples/sec   Loss 3.9316   LearningRate 0.0255   Epoch: 15   Global Step: 160250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:42:05,358-Speed 5974.40 samples/sec   Loss 3.8972   LearningRate 0.0255   Epoch: 15   Global Step: 160260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:42:12,202-Speed 5985.93 samples/sec   Loss 3.8916   LearningRate 0.0255   Epoch: 15   Global Step: 160270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:42:19,052-Speed 5980.80 samples/sec   Loss 3.8748   LearningRate 0.0255   Epoch: 15   Global Step: 160280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:42:25,913-Speed 5970.96 samples/sec   Loss 3.8884   LearningRate 0.0255   Epoch: 15   Global Step: 160290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:42:32,779-Speed 5969.15 samples/sec   Loss 3.9500   LearningRate 0.0255   Epoch: 15   Global Step: 160300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:42:39,638-Speed 5972.69 samples/sec   Loss 3.9232   LearningRate 0.0254   Epoch: 15   Global Step: 160310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:42:46,517-Speed 5955.12 samples/sec   Loss 3.8999   LearningRate 0.0254   Epoch: 15   Global Step: 160320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:42:53,380-Speed 5971.92 samples/sec   Loss 3.8871   LearningRate 0.0254   Epoch: 15   Global Step: 160330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:43:00,259-Speed 5955.91 samples/sec   Loss 3.9009   LearningRate 0.0254   Epoch: 15   Global Step: 160340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:43:07,117-Speed 5973.63 samples/sec   Loss 3.9306   LearningRate 0.0254   Epoch: 15   Global Step: 160350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:43:13,979-Speed 5973.68 samples/sec   Loss 3.8635   LearningRate 0.0254   Epoch: 15   Global Step: 160360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:43:20,837-Speed 5973.49 samples/sec   Loss 3.9235   LearningRate 0.0254   Epoch: 15   Global Step: 160370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:43:27,710-Speed 5960.23 samples/sec   Loss 3.9401   LearningRate 0.0254   Epoch: 15   Global Step: 160380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:43:34,587-Speed 5957.96 samples/sec   Loss 3.8987   LearningRate 0.0254   Epoch: 15   Global Step: 160390   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:43:41,445-Speed 5973.50 samples/sec   Loss 3.9046   LearningRate 0.0253   Epoch: 15   Global Step: 160400   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:43:48,313-Speed 5964.71 samples/sec   Loss 3.8742   LearningRate 0.0253   Epoch: 15   Global Step: 160410   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:43:55,176-Speed 5969.11 samples/sec   Loss 3.8891   LearningRate 0.0253   Epoch: 15   Global Step: 160420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:44:02,113-Speed 5905.94 samples/sec   Loss 3.9252   LearningRate 0.0253   Epoch: 15   Global Step: 160430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:44:08,992-Speed 5955.26 samples/sec   Loss 3.8977   LearningRate 0.0253   Epoch: 15   Global Step: 160440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:44:15,926-Speed 5909.12 samples/sec   Loss 3.9182   LearningRate 0.0253   Epoch: 15   Global Step: 160450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:44:22,795-Speed 5964.39 samples/sec   Loss 3.9057   LearningRate 0.0253   Epoch: 15   Global Step: 160460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:44:29,649-Speed 5976.74 samples/sec   Loss 3.9033   LearningRate 0.0253   Epoch: 15   Global Step: 160470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:44:36,512-Speed 5972.55 samples/sec   Loss 3.8720   LearningRate 0.0253   Epoch: 15   Global Step: 160480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:44:43,378-Speed 5966.59 samples/sec   Loss 3.8944   LearningRate 0.0252   Epoch: 15   Global Step: 160490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:44:50,247-Speed 5964.36 samples/sec   Loss 3.9439   LearningRate 0.0252   Epoch: 15   Global Step: 160500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:44:57,125-Speed 5956.83 samples/sec   Loss 3.8709   LearningRate 0.0252   Epoch: 15   Global Step: 160510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:45:03,980-Speed 5976.53 samples/sec   Loss 3.8792   LearningRate 0.0252   Epoch: 15   Global Step: 160520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:45:10,835-Speed 5975.90 samples/sec   Loss 3.9087   LearningRate 0.0252   Epoch: 15   Global Step: 160530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:45:17,742-Speed 5931.79 samples/sec   Loss 3.8599   LearningRate 0.0252   Epoch: 15   Global Step: 160540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:45:24,653-Speed 5928.05 samples/sec   Loss 3.8920   LearningRate 0.0252   Epoch: 15   Global Step: 160550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:45:31,572-Speed 5921.21 samples/sec   Loss 3.8832   LearningRate 0.0252   Epoch: 15   Global Step: 160560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:45:38,434-Speed 5970.29 samples/sec   Loss 3.8521   LearningRate 0.0252   Epoch: 15   Global Step: 160570   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:45:45,287-Speed 5977.91 samples/sec   Loss 3.9201   LearningRate 0.0251   Epoch: 15   Global Step: 160580   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:45:52,137-Speed 5980.58 samples/sec   Loss 3.8693   LearningRate 0.0251   Epoch: 15   Global Step: 160590   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:45:59,007-Speed 5963.30 samples/sec   Loss 3.8452   LearningRate 0.0251   Epoch: 15   Global Step: 160600   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:46:05,869-Speed 5970.28 samples/sec   Loss 3.8779   LearningRate 0.0251   Epoch: 15   Global Step: 160610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:46:12,733-Speed 5968.26 samples/sec   Loss 3.9316   LearningRate 0.0251   Epoch: 15   Global Step: 160620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:46:19,599-Speed 5966.80 samples/sec   Loss 3.8480   LearningRate 0.0251   Epoch: 15   Global Step: 160630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:46:26,474-Speed 5959.24 samples/sec   Loss 3.9102   LearningRate 0.0251   Epoch: 15   Global Step: 160640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:46:33,323-Speed 5981.72 samples/sec   Loss 3.8838   LearningRate 0.0251   Epoch: 15   Global Step: 160650   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:46:40,234-Speed 5981.20 samples/sec   Loss 3.9297   LearningRate 0.0251   Epoch: 15   Global Step: 160660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:46:47,123-Speed 5946.33 samples/sec   Loss 3.8548   LearningRate 0.0251   Epoch: 15   Global Step: 160670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:46:54,007-Speed 5951.13 samples/sec   Loss 3.8948   LearningRate 0.0250   Epoch: 15   Global Step: 160680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:47:00,860-Speed 5978.17 samples/sec   Loss 3.8923   LearningRate 0.0250   Epoch: 15   Global Step: 160690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:47:07,752-Speed 5944.33 samples/sec   Loss 3.8675   LearningRate 0.0250   Epoch: 15   Global Step: 160700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:47:14,602-Speed 5981.48 samples/sec   Loss 3.8327   LearningRate 0.0250   Epoch: 15   Global Step: 160710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:47:21,449-Speed 5982.74 samples/sec   Loss 3.8947   LearningRate 0.0250   Epoch: 15   Global Step: 160720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:47:28,311-Speed 5970.97 samples/sec   Loss 3.8833   LearningRate 0.0250   Epoch: 15   Global Step: 160730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:47:35,181-Speed 5963.22 samples/sec   Loss 3.8828   LearningRate 0.0250   Epoch: 15   Global Step: 160740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:47:42,053-Speed 5961.74 samples/sec   Loss 3.8897   LearningRate 0.0250   Epoch: 15   Global Step: 160750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:47:48,924-Speed 5962.88 samples/sec   Loss 3.8654   LearningRate 0.0250   Epoch: 15   Global Step: 160760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:47:55,800-Speed 5957.29 samples/sec   Loss 3.8716   LearningRate 0.0249   Epoch: 15   Global Step: 160770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:48:02,642-Speed 5987.59 samples/sec   Loss 3.8586   LearningRate 0.0249   Epoch: 15   Global Step: 160780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:48:09,489-Speed 5983.63 samples/sec   Loss 3.8902   LearningRate 0.0249   Epoch: 15   Global Step: 160790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:48:16,355-Speed 5967.14 samples/sec   Loss 3.8683   LearningRate 0.0249   Epoch: 15   Global Step: 160800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:48:23,256-Speed 5936.05 samples/sec   Loss 3.8559   LearningRate 0.0249   Epoch: 15   Global Step: 160810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:48:30,120-Speed 5968.75 samples/sec   Loss 3.8047   LearningRate 0.0249   Epoch: 15   Global Step: 160820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:48:37,016-Speed 5941.25 samples/sec   Loss 3.8719   LearningRate 0.0249   Epoch: 15   Global Step: 160830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:48:43,881-Speed 5968.04 samples/sec   Loss 3.8712   LearningRate 0.0249   Epoch: 15   Global Step: 160840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:48:50,738-Speed 5976.89 samples/sec   Loss 3.8853   LearningRate 0.0249   Epoch: 15   Global Step: 160850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:48:57,603-Speed 5967.94 samples/sec   Loss 3.8674   LearningRate 0.0248   Epoch: 15   Global Step: 160860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:49:04,450-Speed 5982.84 samples/sec   Loss 3.8773   LearningRate 0.0248   Epoch: 15   Global Step: 160870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:49:11,333-Speed 5952.01 samples/sec   Loss 3.8655   LearningRate 0.0248   Epoch: 15   Global Step: 160880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:49:18,203-Speed 5963.92 samples/sec   Loss 3.8616   LearningRate 0.0248   Epoch: 15   Global Step: 160890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:49:25,082-Speed 5955.70 samples/sec   Loss 3.8878   LearningRate 0.0248   Epoch: 15   Global Step: 160900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:49:31,939-Speed 5974.13 samples/sec   Loss 3.8246   LearningRate 0.0248   Epoch: 15   Global Step: 160910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:49:38,792-Speed 5979.16 samples/sec   Loss 3.8410   LearningRate 0.0248   Epoch: 15   Global Step: 160920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:49:45,662-Speed 5963.44 samples/sec   Loss 3.8304   LearningRate 0.0248   Epoch: 15   Global Step: 160930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:49:52,524-Speed 5970.81 samples/sec   Loss 3.8655   LearningRate 0.0248   Epoch: 15   Global Step: 160940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:49:59,381-Speed 5974.23 samples/sec   Loss 3.8304   LearningRate 0.0248   Epoch: 15   Global Step: 160950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:50:06,232-Speed 5981.29 samples/sec   Loss 3.9205   LearningRate 0.0247   Epoch: 15   Global Step: 160960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:50:13,124-Speed 5944.77 samples/sec   Loss 3.8802   LearningRate 0.0247   Epoch: 15   Global Step: 160970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:50:19,993-Speed 5964.40 samples/sec   Loss 3.8925   LearningRate 0.0247   Epoch: 15   Global Step: 160980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:50:26,870-Speed 5957.00 samples/sec   Loss 3.9054   LearningRate 0.0247   Epoch: 15   Global Step: 160990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:50:33,734-Speed 5969.26 samples/sec   Loss 3.8931   LearningRate 0.0247   Epoch: 15   Global Step: 161000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:50:40,587-Speed 5978.52 samples/sec   Loss 3.8777   LearningRate 0.0247   Epoch: 15   Global Step: 161010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:50:47,455-Speed 5964.50 samples/sec   Loss 3.8469   LearningRate 0.0247   Epoch: 15   Global Step: 161020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:50:54,330-Speed 5959.22 samples/sec   Loss 3.8822   LearningRate 0.0247   Epoch: 15   Global Step: 161030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:51:01,175-Speed 5985.59 samples/sec   Loss 3.8475   LearningRate 0.0247   Epoch: 15   Global Step: 161040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:51:08,075-Speed 5937.69 samples/sec   Loss 3.8439   LearningRate 0.0246   Epoch: 15   Global Step: 161050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:51:14,933-Speed 5973.39 samples/sec   Loss 3.8750   LearningRate 0.0246   Epoch: 15   Global Step: 161060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:51:21,803-Speed 5974.75 samples/sec   Loss 3.8493   LearningRate 0.0246   Epoch: 15   Global Step: 161070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:51:28,664-Speed 5971.55 samples/sec   Loss 3.8354   LearningRate 0.0246   Epoch: 15   Global Step: 161080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:51:35,540-Speed 5958.27 samples/sec   Loss 3.8548   LearningRate 0.0246   Epoch: 15   Global Step: 161090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:51:42,404-Speed 5968.46 samples/sec   Loss 3.8336   LearningRate 0.0246   Epoch: 15   Global Step: 161100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:51:49,286-Speed 5952.13 samples/sec   Loss 3.9043   LearningRate 0.0246   Epoch: 15   Global Step: 161110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:51:56,141-Speed 5976.18 samples/sec   Loss 3.8318   LearningRate 0.0246   Epoch: 15   Global Step: 161120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:52:02,997-Speed 5976.18 samples/sec   Loss 3.8756   LearningRate 0.0246   Epoch: 15   Global Step: 161130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:52:09,848-Speed 5979.44 samples/sec   Loss 3.8341   LearningRate 0.0246   Epoch: 15   Global Step: 161140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:52:16,687-Speed 5990.54 samples/sec   Loss 3.8575   LearningRate 0.0245   Epoch: 15   Global Step: 161150   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:52:23,556-Speed 5964.29 samples/sec   Loss 3.8679   LearningRate 0.0245   Epoch: 15   Global Step: 161160   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:52:30,403-Speed 5982.82 samples/sec   Loss 3.8330   LearningRate 0.0245   Epoch: 15   Global Step: 161170   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:52:37,303-Speed 5937.08 samples/sec   Loss 3.8917   LearningRate 0.0245   Epoch: 15   Global Step: 161180   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:52:44,186-Speed 5952.94 samples/sec   Loss 3.8415   LearningRate 0.0245   Epoch: 15   Global Step: 161190   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:52:51,040-Speed 5977.00 samples/sec   Loss 3.8887   LearningRate 0.0245   Epoch: 15   Global Step: 161200   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:52:57,919-Speed 5956.11 samples/sec   Loss 3.8462   LearningRate 0.0245   Epoch: 15   Global Step: 161210   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:53:04,775-Speed 5975.24 samples/sec   Loss 3.8378   LearningRate 0.0245   Epoch: 15   Global Step: 161220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:53:11,648-Speed 5960.48 samples/sec   Loss 3.8309   LearningRate 0.0245   Epoch: 15   Global Step: 161230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:53:18,503-Speed 5976.78 samples/sec   Loss 3.8481   LearningRate 0.0244   Epoch: 15   Global Step: 161240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 03:53:25,362-Speed 5972.77 samples/sec   Loss 3.8836   LearningRate 0.0244   Epoch: 15   Global Step: 161250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:53:32,244-Speed 5953.43 samples/sec   Loss 3.8306   LearningRate 0.0244   Epoch: 15   Global Step: 161260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:53:39,171-Speed 5914.36 samples/sec   Loss 3.8154   LearningRate 0.0244   Epoch: 15   Global Step: 161270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:53:46,072-Speed 5937.35 samples/sec   Loss 3.7909   LearningRate 0.0244   Epoch: 15   Global Step: 161280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:53:53,070-Speed 5853.70 samples/sec   Loss 3.8438   LearningRate 0.0244   Epoch: 15   Global Step: 161290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:53:59,924-Speed 5977.84 samples/sec   Loss 3.8381   LearningRate 0.0244   Epoch: 15   Global Step: 161300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:54:06,795-Speed 5962.81 samples/sec   Loss 3.8298   LearningRate 0.0244   Epoch: 15   Global Step: 161310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:54:13,654-Speed 5972.71 samples/sec   Loss 3.8685   LearningRate 0.0244   Epoch: 15   Global Step: 161320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:54:20,518-Speed 5968.59 samples/sec   Loss 3.8274   LearningRate 0.0244   Epoch: 15   Global Step: 161330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:54:27,392-Speed 5959.82 samples/sec   Loss 3.8739   LearningRate 0.0243   Epoch: 15   Global Step: 161340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:54:34,250-Speed 5973.95 samples/sec   Loss 3.8120   LearningRate 0.0243   Epoch: 15   Global Step: 161350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:54:41,094-Speed 5986.23 samples/sec   Loss 3.8563   LearningRate 0.0243   Epoch: 15   Global Step: 161360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:54:47,952-Speed 5973.90 samples/sec   Loss 3.8206   LearningRate 0.0243   Epoch: 15   Global Step: 161370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:54:54,801-Speed 5981.42 samples/sec   Loss 3.8632   LearningRate 0.0243   Epoch: 15   Global Step: 161380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:55:01,656-Speed 5976.25 samples/sec   Loss 3.7954   LearningRate 0.0243   Epoch: 15   Global Step: 161390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:55:08,504-Speed 5982.55 samples/sec   Loss 3.8017   LearningRate 0.0243   Epoch: 15   Global Step: 161400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:55:15,352-Speed 5982.50 samples/sec   Loss 3.7989   LearningRate 0.0243   Epoch: 15   Global Step: 161410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:55:22,229-Speed 5956.98 samples/sec   Loss 3.8013   LearningRate 0.0243   Epoch: 15   Global Step: 161420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:55:29,117-Speed 5950.27 samples/sec   Loss 3.8023   LearningRate 0.0242   Epoch: 15   Global Step: 161430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:55:35,980-Speed 5969.07 samples/sec   Loss 3.7937   LearningRate 0.0242   Epoch: 15   Global Step: 161440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:55:42,833-Speed 5977.95 samples/sec   Loss 3.8277   LearningRate 0.0242   Epoch: 15   Global Step: 161450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:55:49,696-Speed 5969.16 samples/sec   Loss 3.8085   LearningRate 0.0242   Epoch: 15   Global Step: 161460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:55:56,542-Speed 5984.49 samples/sec   Loss 3.8205   LearningRate 0.0242   Epoch: 15   Global Step: 161470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:56:03,408-Speed 5968.00 samples/sec   Loss 3.8608   LearningRate 0.0242   Epoch: 15   Global Step: 161480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:56:10,272-Speed 5968.40 samples/sec   Loss 3.8165   LearningRate 0.0242   Epoch: 15   Global Step: 161490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:56:17,119-Speed 5982.69 samples/sec   Loss 3.8429   LearningRate 0.0242   Epoch: 15   Global Step: 161500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:56:23,985-Speed 5967.39 samples/sec   Loss 3.8467   LearningRate 0.0242   Epoch: 15   Global Step: 161510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:56:30,844-Speed 5974.05 samples/sec   Loss 3.8275   LearningRate 0.0241   Epoch: 15   Global Step: 161520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:56:37,692-Speed 5981.53 samples/sec   Loss 3.8060   LearningRate 0.0241   Epoch: 15   Global Step: 161530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:56:44,555-Speed 5969.80 samples/sec   Loss 3.7985   LearningRate 0.0241   Epoch: 15   Global Step: 161540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:56:51,412-Speed 5974.55 samples/sec   Loss 3.7725   LearningRate 0.0241   Epoch: 15   Global Step: 161550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:56:58,267-Speed 5976.50 samples/sec   Loss 3.8091   LearningRate 0.0241   Epoch: 15   Global Step: 161560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:57:05,135-Speed 5965.10 samples/sec   Loss 3.8652   LearningRate 0.0241   Epoch: 15   Global Step: 161570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:57:12,014-Speed 5956.48 samples/sec   Loss 3.7934   LearningRate 0.0241   Epoch: 15   Global Step: 161580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:57:18,912-Speed 5938.76 samples/sec   Loss 3.7746   LearningRate 0.0241   Epoch: 15   Global Step: 161590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:57:25,763-Speed 5981.59 samples/sec   Loss 3.8078   LearningRate 0.0241   Epoch: 15   Global Step: 161600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:57:32,624-Speed 5971.52 samples/sec   Loss 3.7965   LearningRate 0.0241   Epoch: 15   Global Step: 161610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:57:39,499-Speed 5958.29 samples/sec   Loss 3.8135   LearningRate 0.0240   Epoch: 15   Global Step: 161620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:57:46,360-Speed 5972.15 samples/sec   Loss 3.7889   LearningRate 0.0240   Epoch: 15   Global Step: 161630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:57:53,203-Speed 5986.75 samples/sec   Loss 3.7863   LearningRate 0.0240   Epoch: 15   Global Step: 161640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:58:00,053-Speed 5980.28 samples/sec   Loss 3.8420   LearningRate 0.0240   Epoch: 15   Global Step: 161650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:58:06,900-Speed 5983.17 samples/sec   Loss 3.8161   LearningRate 0.0240   Epoch: 15   Global Step: 161660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:58:13,763-Speed 5973.32 samples/sec   Loss 3.7994   LearningRate 0.0240   Epoch: 15   Global Step: 161670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:58:20,632-Speed 5964.48 samples/sec   Loss 3.7963   LearningRate 0.0240   Epoch: 15   Global Step: 161680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 03:58:27,489-Speed 5974.34 samples/sec   Loss 3.7710   LearningRate 0.0240   Epoch: 15   Global Step: 161690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:58:34,381-Speed 5945.85 samples/sec   Loss 3.8054   LearningRate 0.0240   Epoch: 15   Global Step: 161700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:58:41,247-Speed 5967.21 samples/sec   Loss 3.8476   LearningRate 0.0239   Epoch: 15   Global Step: 161710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:58:48,105-Speed 5973.57 samples/sec   Loss 3.8201   LearningRate 0.0239   Epoch: 15   Global Step: 161720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:58:54,967-Speed 5970.12 samples/sec   Loss 3.7801   LearningRate 0.0239   Epoch: 15   Global Step: 161730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:59:01,820-Speed 5977.45 samples/sec   Loss 3.8043   LearningRate 0.0239   Epoch: 15   Global Step: 161740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:59:08,688-Speed 5965.57 samples/sec   Loss 3.8512   LearningRate 0.0239   Epoch: 15   Global Step: 161750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:59:15,550-Speed 5971.65 samples/sec   Loss 3.7757   LearningRate 0.0239   Epoch: 15   Global Step: 161760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:59:22,416-Speed 5966.38 samples/sec   Loss 3.7757   LearningRate 0.0239   Epoch: 15   Global Step: 161770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:59:29,286-Speed 5963.85 samples/sec   Loss 3.7362   LearningRate 0.0239   Epoch: 15   Global Step: 161780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:59:36,134-Speed 5983.94 samples/sec   Loss 3.7731   LearningRate 0.0239   Epoch: 15   Global Step: 161790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:59:43,014-Speed 5955.29 samples/sec   Loss 3.7982   LearningRate 0.0239   Epoch: 15   Global Step: 161800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:59:49,873-Speed 5972.54 samples/sec   Loss 3.8229   LearningRate 0.0238   Epoch: 15   Global Step: 161810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 03:59:56,732-Speed 5972.58 samples/sec   Loss 3.7989   LearningRate 0.0238   Epoch: 15   Global Step: 161820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:00:03,580-Speed 5982.24 samples/sec   Loss 3.7659   LearningRate 0.0238   Epoch: 15   Global Step: 161830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:00:10,432-Speed 5979.83 samples/sec   Loss 3.8666   LearningRate 0.0238   Epoch: 15   Global Step: 161840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:00:17,306-Speed 5959.50 samples/sec   Loss 3.7937   LearningRate 0.0238   Epoch: 15   Global Step: 161850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:00:24,156-Speed 5980.90 samples/sec   Loss 3.7896   LearningRate 0.0238   Epoch: 15   Global Step: 161860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:00:31,006-Speed 5981.11 samples/sec   Loss 3.8608   LearningRate 0.0238   Epoch: 15   Global Step: 161870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:00:37,864-Speed 5973.21 samples/sec   Loss 3.8060   LearningRate 0.0238   Epoch: 15   Global Step: 161880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:00:44,721-Speed 5974.51 samples/sec   Loss 3.7719   LearningRate 0.0238   Epoch: 15   Global Step: 161890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 04:00:51,575-Speed 5977.30 samples/sec   Loss 3.8236   LearningRate 0.0238   Epoch: 15   Global Step: 161900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 04:00:58,432-Speed 5974.61 samples/sec   Loss 3.8335   LearningRate 0.0237   Epoch: 15   Global Step: 161910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 04:01:05,278-Speed 5984.16 samples/sec   Loss 3.7907   LearningRate 0.0237   Epoch: 15   Global Step: 161920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:01:12,122-Speed 5985.35 samples/sec   Loss 3.8050   LearningRate 0.0237   Epoch: 15   Global Step: 161930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:01:18,988-Speed 5967.53 samples/sec   Loss 3.8173   LearningRate 0.0237   Epoch: 15   Global Step: 161940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:01:25,860-Speed 5960.60 samples/sec   Loss 3.8109   LearningRate 0.0237   Epoch: 15   Global Step: 161950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:01:32,711-Speed 5980.52 samples/sec   Loss 3.8101   LearningRate 0.0237   Epoch: 15   Global Step: 161960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:01:39,561-Speed 5981.47 samples/sec   Loss 3.7712   LearningRate 0.0237   Epoch: 15   Global Step: 161970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:01:46,420-Speed 5971.81 samples/sec   Loss 3.7682   LearningRate 0.0237   Epoch: 15   Global Step: 161980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:01:53,282-Speed 5970.78 samples/sec   Loss 3.7413   LearningRate 0.0237   Epoch: 15   Global Step: 161990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:02:00,142-Speed 5972.16 samples/sec   Loss 3.8140   LearningRate 0.0236   Epoch: 15   Global Step: 162000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:02:07,002-Speed 5971.69 samples/sec   Loss 3.8019   LearningRate 0.0236   Epoch: 15   Global Step: 162010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:02:13,859-Speed 5974.84 samples/sec   Loss 3.8307   LearningRate 0.0236   Epoch: 15   Global Step: 162020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:02:20,731-Speed 5962.09 samples/sec   Loss 3.7990   LearningRate 0.0236   Epoch: 15   Global Step: 162030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:02:27,580-Speed 5981.54 samples/sec   Loss 3.7305   LearningRate 0.0236   Epoch: 15   Global Step: 162040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:02:35,125-Speed 5429.48 samples/sec   Loss 3.7962   LearningRate 0.0236   Epoch: 15   Global Step: 162050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:02:42,006-Speed 5954.75 samples/sec   Loss 3.8318   LearningRate 0.0236   Epoch: 15   Global Step: 162060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:02:48,878-Speed 5960.82 samples/sec   Loss 3.7764   LearningRate 0.0236   Epoch: 15   Global Step: 162070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:02:55,740-Speed 5971.05 samples/sec   Loss 3.8000   LearningRate 0.0236   Epoch: 15   Global Step: 162080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:03:02,597-Speed 5974.08 samples/sec   Loss 3.7649   LearningRate 0.0236   Epoch: 15   Global Step: 162090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:03:09,461-Speed 5968.51 samples/sec   Loss 3.7704   LearningRate 0.0235   Epoch: 15   Global Step: 162100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:03:16,308-Speed 5983.04 samples/sec   Loss 3.8143   LearningRate 0.0235   Epoch: 15   Global Step: 162110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:03:23,171-Speed 5970.58 samples/sec   Loss 3.7852   LearningRate 0.0235   Epoch: 15   Global Step: 162120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 04:03:30,031-Speed 5971.24 samples/sec   Loss 3.7782   LearningRate 0.0235   Epoch: 15   Global Step: 162130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 04:03:36,893-Speed 5970.98 samples/sec   Loss 3.8286   LearningRate 0.0235   Epoch: 15   Global Step: 162140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 04:03:43,768-Speed 5958.90 samples/sec   Loss 3.8138   LearningRate 0.0235   Epoch: 15   Global Step: 162150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 04:03:50,632-Speed 5968.25 samples/sec   Loss 3.7813   LearningRate 0.0235   Epoch: 15   Global Step: 162160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:03:57,507-Speed 5959.23 samples/sec   Loss 3.8111   LearningRate 0.0235   Epoch: 15   Global Step: 162170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:04:04,363-Speed 5975.75 samples/sec   Loss 3.7501   LearningRate 0.0235   Epoch: 15   Global Step: 162180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:04:11,243-Speed 5954.64 samples/sec   Loss 3.7411   LearningRate 0.0234   Epoch: 15   Global Step: 162190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:04:18,096-Speed 5978.30 samples/sec   Loss 3.7663   LearningRate 0.0234   Epoch: 15   Global Step: 162200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:04:24,950-Speed 5977.18 samples/sec   Loss 3.7889   LearningRate 0.0234   Epoch: 15   Global Step: 162210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:04:31,815-Speed 5967.17 samples/sec   Loss 3.8013   LearningRate 0.0234   Epoch: 15   Global Step: 162220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:04:38,668-Speed 5978.44 samples/sec   Loss 3.7840   LearningRate 0.0234   Epoch: 15   Global Step: 162230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:04:45,563-Speed 5943.14 samples/sec   Loss 3.7998   LearningRate 0.0234   Epoch: 15   Global Step: 162240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:04:52,425-Speed 5970.21 samples/sec   Loss 3.8083   LearningRate 0.0234   Epoch: 15   Global Step: 162250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:04:59,287-Speed 5970.02 samples/sec   Loss 3.7349   LearningRate 0.0234   Epoch: 15   Global Step: 162260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:05:06,138-Speed 5982.15 samples/sec   Loss 3.7786   LearningRate 0.0234   Epoch: 15   Global Step: 162270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:05:13,006-Speed 5965.38 samples/sec   Loss 3.8179   LearningRate 0.0234   Epoch: 15   Global Step: 162280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:05:19,858-Speed 5978.93 samples/sec   Loss 3.7523   LearningRate 0.0233   Epoch: 15   Global Step: 162290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:05:26,719-Speed 5971.65 samples/sec   Loss 3.7988   LearningRate 0.0233   Epoch: 15   Global Step: 162300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:05:33,580-Speed 5970.39 samples/sec   Loss 3.7215   LearningRate 0.0233   Epoch: 15   Global Step: 162310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:05:40,441-Speed 5971.89 samples/sec   Loss 3.7502   LearningRate 0.0233   Epoch: 15   Global Step: 162320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:05:47,294-Speed 5977.81 samples/sec   Loss 3.7896   LearningRate 0.0233   Epoch: 15   Global Step: 162330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:05:54,146-Speed 5979.18 samples/sec   Loss 3.7578   LearningRate 0.0233   Epoch: 15   Global Step: 162340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:06:01,008-Speed 5970.55 samples/sec   Loss 3.7476   LearningRate 0.0233   Epoch: 15   Global Step: 162350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:06:07,858-Speed 5980.99 samples/sec   Loss 3.7854   LearningRate 0.0233   Epoch: 15   Global Step: 162360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 04:06:14,721-Speed 5971.53 samples/sec   Loss 3.7791   LearningRate 0.0233   Epoch: 15   Global Step: 162370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 04:06:21,594-Speed 5960.72 samples/sec   Loss 3.7679   LearningRate 0.0233   Epoch: 15   Global Step: 162380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:06:28,468-Speed 5960.48 samples/sec   Loss 3.7580   LearningRate 0.0232   Epoch: 15   Global Step: 162390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:06:35,358-Speed 5945.30 samples/sec   Loss 3.7506   LearningRate 0.0232   Epoch: 15   Global Step: 162400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:06:42,227-Speed 5966.10 samples/sec   Loss 3.7728   LearningRate 0.0232   Epoch: 15   Global Step: 162410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:06:49,080-Speed 5978.57 samples/sec   Loss 3.7359   LearningRate 0.0232   Epoch: 15   Global Step: 162420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:06:55,934-Speed 5976.78 samples/sec   Loss 3.8314   LearningRate 0.0232   Epoch: 15   Global Step: 162430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:07:02,780-Speed 5983.64 samples/sec   Loss 3.7746   LearningRate 0.0232   Epoch: 15   Global Step: 162440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:07:09,648-Speed 5966.05 samples/sec   Loss 3.7056   LearningRate 0.0232   Epoch: 15   Global Step: 162450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:07:16,507-Speed 5972.22 samples/sec   Loss 3.6958   LearningRate 0.0232   Epoch: 15   Global Step: 162460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:07:23,380-Speed 5961.35 samples/sec   Loss 3.7511   LearningRate 0.0232   Epoch: 15   Global Step: 162470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:07:30,246-Speed 5968.21 samples/sec   Loss 3.7524   LearningRate 0.0231   Epoch: 15   Global Step: 162480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:07:37,116-Speed 5962.83 samples/sec   Loss 3.7733   LearningRate 0.0231   Epoch: 15   Global Step: 162490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:07:43,970-Speed 5977.13 samples/sec   Loss 3.7773   LearningRate 0.0231   Epoch: 15   Global Step: 162500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:07:50,830-Speed 5972.49 samples/sec   Loss 3.6699   LearningRate 0.0231   Epoch: 15   Global Step: 162510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:07:57,680-Speed 5983.21 samples/sec   Loss 3.7600   LearningRate 0.0231   Epoch: 15   Global Step: 162520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:08:04,529-Speed 5980.97 samples/sec   Loss 3.7531   LearningRate 0.0231   Epoch: 15   Global Step: 162530   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:08:11,397-Speed 5965.67 samples/sec   Loss 3.7425   LearningRate 0.0231   Epoch: 15   Global Step: 162540   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:08:18,265-Speed 5964.46 samples/sec   Loss 3.7922   LearningRate 0.0231   Epoch: 15   Global Step: 162550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:08:25,112-Speed 5983.60 samples/sec   Loss 3.7219   LearningRate 0.0231   Epoch: 15   Global Step: 162560   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:08:31,972-Speed 5972.46 samples/sec   Loss 3.7321   LearningRate 0.0231   Epoch: 15   Global Step: 162570   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:08:38,823-Speed 5979.19 samples/sec   Loss 3.7492   LearningRate 0.0230   Epoch: 15   Global Step: 162580   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:08:45,717-Speed 5942.65 samples/sec   Loss 3.7348   LearningRate 0.0230   Epoch: 15   Global Step: 162590   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:08:52,580-Speed 5969.16 samples/sec   Loss 3.7224   LearningRate 0.0230   Epoch: 15   Global Step: 162600   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:08:59,458-Speed 5957.03 samples/sec   Loss 3.7311   LearningRate 0.0230   Epoch: 15   Global Step: 162610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:09:06,336-Speed 5956.04 samples/sec   Loss 3.7297   LearningRate 0.0230   Epoch: 15   Global Step: 162620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:09:13,231-Speed 5942.10 samples/sec   Loss 3.7321   LearningRate 0.0230   Epoch: 15   Global Step: 162630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:09:20,189-Speed 5888.30 samples/sec   Loss 3.7516   LearningRate 0.0230   Epoch: 15   Global Step: 162640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:09:27,034-Speed 5984.93 samples/sec   Loss 3.7576   LearningRate 0.0230   Epoch: 15   Global Step: 162650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:09:33,926-Speed 5944.93 samples/sec   Loss 3.7620   LearningRate 0.0230   Epoch: 15   Global Step: 162660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:09:40,808-Speed 5952.68 samples/sec   Loss 3.7090   LearningRate 0.0230   Epoch: 15   Global Step: 162670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:09:47,689-Speed 5953.88 samples/sec   Loss 3.7553   LearningRate 0.0229   Epoch: 15   Global Step: 162680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:09:54,588-Speed 5938.26 samples/sec   Loss 3.7280   LearningRate 0.0229   Epoch: 15   Global Step: 162690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:10:01,450-Speed 5970.59 samples/sec   Loss 3.7295   LearningRate 0.0229   Epoch: 15   Global Step: 162700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:10:08,324-Speed 5959.79 samples/sec   Loss 3.7655   LearningRate 0.0229   Epoch: 15   Global Step: 162710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:10:15,178-Speed 5978.75 samples/sec   Loss 3.7398   LearningRate 0.0229   Epoch: 15   Global Step: 162720   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:10:22,042-Speed 5967.62 samples/sec   Loss 3.7195   LearningRate 0.0229   Epoch: 15   Global Step: 162730   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:10:28,915-Speed 5960.80 samples/sec   Loss 3.7708   LearningRate 0.0229   Epoch: 15   Global Step: 162740   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:10:35,764-Speed 5982.40 samples/sec   Loss 3.7147   LearningRate 0.0229   Epoch: 15   Global Step: 162750   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:10:42,616-Speed 5978.59 samples/sec   Loss 3.7315   LearningRate 0.0229   Epoch: 15   Global Step: 162760   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:10:49,465-Speed 5981.83 samples/sec   Loss 3.7939   LearningRate 0.0229   Epoch: 15   Global Step: 162770   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:10:56,319-Speed 5977.09 samples/sec   Loss 3.7461   LearningRate 0.0228   Epoch: 15   Global Step: 162780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:11:03,174-Speed 5976.23 samples/sec   Loss 3.7012   LearningRate 0.0228   Epoch: 15   Global Step: 162790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:11:10,073-Speed 5939.09 samples/sec   Loss 3.7227   LearningRate 0.0228   Epoch: 15   Global Step: 162800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:11:16,948-Speed 5958.66 samples/sec   Loss 3.7279   LearningRate 0.0228   Epoch: 15   Global Step: 162810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:11:23,798-Speed 5982.18 samples/sec   Loss 3.7738   LearningRate 0.0228   Epoch: 15   Global Step: 162820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:11:30,678-Speed 5954.66 samples/sec   Loss 3.6783   LearningRate 0.0228   Epoch: 15   Global Step: 162830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:11:37,571-Speed 5944.37 samples/sec   Loss 3.6806   LearningRate 0.0228   Epoch: 15   Global Step: 162840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:11:44,439-Speed 5964.52 samples/sec   Loss 3.7133   LearningRate 0.0228   Epoch: 15   Global Step: 162850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:11:51,303-Speed 5969.13 samples/sec   Loss 3.7471   LearningRate 0.0228   Epoch: 15   Global Step: 162860   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:11:58,160-Speed 5975.00 samples/sec   Loss 3.6959   LearningRate 0.0227   Epoch: 15   Global Step: 162870   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:12:05,026-Speed 5966.00 samples/sec   Loss 3.7349   LearningRate 0.0227   Epoch: 15   Global Step: 162880   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:12:11,889-Speed 5969.94 samples/sec   Loss 3.7547   LearningRate 0.0227   Epoch: 15   Global Step: 162890   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:12:18,772-Speed 5951.73 samples/sec   Loss 3.7508   LearningRate 0.0227   Epoch: 15   Global Step: 162900   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:12:25,649-Speed 5956.73 samples/sec   Loss 3.7005   LearningRate 0.0227   Epoch: 15   Global Step: 162910   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:12:32,556-Speed 5931.73 samples/sec   Loss 3.7185   LearningRate 0.0227   Epoch: 15   Global Step: 162920   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:12:39,417-Speed 5971.57 samples/sec   Loss 3.7148   LearningRate 0.0227   Epoch: 15   Global Step: 162930   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:12:46,269-Speed 5979.53 samples/sec   Loss 3.6983   LearningRate 0.0227   Epoch: 15   Global Step: 162940   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:12:53,123-Speed 5976.38 samples/sec   Loss 3.6911   LearningRate 0.0227   Epoch: 15   Global Step: 162950   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:12:59,974-Speed 5980.33 samples/sec   Loss 3.6983   LearningRate 0.0227   Epoch: 15   Global Step: 162960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:13:06,821-Speed 5982.62 samples/sec   Loss 3.7280   LearningRate 0.0226   Epoch: 15   Global Step: 162970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:13:13,695-Speed 5959.99 samples/sec   Loss 3.6848   LearningRate 0.0226   Epoch: 15   Global Step: 162980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:13:20,568-Speed 5961.12 samples/sec   Loss 3.7323   LearningRate 0.0226   Epoch: 15   Global Step: 162990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:13:27,459-Speed 5945.25 samples/sec   Loss 3.7557   LearningRate 0.0226   Epoch: 15   Global Step: 163000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:13:34,344-Speed 5950.36 samples/sec   Loss 3.7241   LearningRate 0.0226   Epoch: 15   Global Step: 163010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:13:41,212-Speed 5965.68 samples/sec   Loss 3.6835   LearningRate 0.0226   Epoch: 15   Global Step: 163020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:13:48,072-Speed 5971.47 samples/sec   Loss 3.6765   LearningRate 0.0226   Epoch: 15   Global Step: 163030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:13:54,923-Speed 5980.54 samples/sec   Loss 3.7507   LearningRate 0.0226   Epoch: 15   Global Step: 163040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:14:01,796-Speed 5960.39 samples/sec   Loss 3.6840   LearningRate 0.0226   Epoch: 15   Global Step: 163050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:14:08,662-Speed 5966.85 samples/sec   Loss 3.7067   LearningRate 0.0226   Epoch: 15   Global Step: 163060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 04:14:15,515-Speed 5978.52 samples/sec   Loss 3.7190   LearningRate 0.0225   Epoch: 15   Global Step: 163070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 04:14:22,382-Speed 5965.78 samples/sec   Loss 3.7497   LearningRate 0.0225   Epoch: 15   Global Step: 163080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:14:29,271-Speed 5946.62 samples/sec   Loss 3.7232   LearningRate 0.0225   Epoch: 15   Global Step: 163090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:14:36,142-Speed 5963.41 samples/sec   Loss 3.7629   LearningRate 0.0225   Epoch: 15   Global Step: 163100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:14:43,030-Speed 5949.38 samples/sec   Loss 3.7304   LearningRate 0.0225   Epoch: 15   Global Step: 163110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:14:49,884-Speed 5977.08 samples/sec   Loss 3.6703   LearningRate 0.0225   Epoch: 15   Global Step: 163120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:14:56,738-Speed 5976.97 samples/sec   Loss 3.7488   LearningRate 0.0225   Epoch: 15   Global Step: 163130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:15:03,589-Speed 5979.90 samples/sec   Loss 3.7324   LearningRate 0.0225   Epoch: 15   Global Step: 163140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:15:10,435-Speed 5983.71 samples/sec   Loss 3.6812   LearningRate 0.0225   Epoch: 15   Global Step: 163150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:15:17,274-Speed 5991.02 samples/sec   Loss 3.7158   LearningRate 0.0225   Epoch: 15   Global Step: 163160   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:15:24,165-Speed 5947.25 samples/sec   Loss 3.6945   LearningRate 0.0224   Epoch: 15   Global Step: 163170   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:15:31,033-Speed 5964.70 samples/sec   Loss 3.6837   LearningRate 0.0224   Epoch: 15   Global Step: 163180   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:15:37,899-Speed 5967.06 samples/sec   Loss 3.6917   LearningRate 0.0224   Epoch: 15   Global Step: 163190   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:15:44,745-Speed 5983.88 samples/sec   Loss 3.7098   LearningRate 0.0224   Epoch: 15   Global Step: 163200   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:15:51,590-Speed 5985.16 samples/sec   Loss 3.7244   LearningRate 0.0224   Epoch: 15   Global Step: 163210   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:15:58,438-Speed 5983.18 samples/sec   Loss 3.7073   LearningRate 0.0224   Epoch: 15   Global Step: 163220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:16:05,330-Speed 5944.57 samples/sec   Loss 3.6841   LearningRate 0.0224   Epoch: 15   Global Step: 163230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:16:12,212-Speed 5952.83 samples/sec   Loss 3.6911   LearningRate 0.0224   Epoch: 15   Global Step: 163240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:16:19,135-Speed 5917.10 samples/sec   Loss 3.7151   LearningRate 0.0224   Epoch: 15   Global Step: 163250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:16:26,002-Speed 5966.06 samples/sec   Loss 3.6821   LearningRate 0.0224   Epoch: 15   Global Step: 163260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:16:32,857-Speed 5977.05 samples/sec   Loss 3.6845   LearningRate 0.0223   Epoch: 15   Global Step: 163270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:16:39,727-Speed 5963.25 samples/sec   Loss 3.7027   LearningRate 0.0223   Epoch: 15   Global Step: 163280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:16:46,593-Speed 5967.23 samples/sec   Loss 3.7439   LearningRate 0.0223   Epoch: 15   Global Step: 163290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:16:53,485-Speed 5943.78 samples/sec   Loss 3.6947   LearningRate 0.0223   Epoch: 15   Global Step: 163300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:17:00,330-Speed 5985.83 samples/sec   Loss 3.6863   LearningRate 0.0223   Epoch: 15   Global Step: 163310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:17:07,172-Speed 5988.64 samples/sec   Loss 3.6987   LearningRate 0.0223   Epoch: 15   Global Step: 163320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:17:14,062-Speed 5946.15 samples/sec   Loss 3.7061   LearningRate 0.0223   Epoch: 15   Global Step: 163330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:17:20,952-Speed 5946.55 samples/sec   Loss 3.7412   LearningRate 0.0223   Epoch: 15   Global Step: 163340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:17:27,811-Speed 5972.64 samples/sec   Loss 3.6593   LearningRate 0.0223   Epoch: 15   Global Step: 163350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:17:34,666-Speed 5976.43 samples/sec   Loss 3.6663   LearningRate 0.0223   Epoch: 15   Global Step: 163360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 04:17:41,518-Speed 5980.37 samples/sec   Loss 3.6744   LearningRate 0.0222   Epoch: 15   Global Step: 163370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:17:48,405-Speed 5949.24 samples/sec   Loss 3.6393   LearningRate 0.0222   Epoch: 15   Global Step: 163380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:17:55,272-Speed 5965.04 samples/sec   Loss 3.6952   LearningRate 0.0222   Epoch: 15   Global Step: 163390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:18:02,130-Speed 5974.29 samples/sec   Loss 3.6690   LearningRate 0.0222   Epoch: 15   Global Step: 163400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:18:08,992-Speed 5971.80 samples/sec   Loss 3.6507   LearningRate 0.0222   Epoch: 15   Global Step: 163410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:18:15,839-Speed 5983.43 samples/sec   Loss 3.6775   LearningRate 0.0222   Epoch: 15   Global Step: 163420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:18:22,692-Speed 5978.57 samples/sec   Loss 3.6860   LearningRate 0.0222   Epoch: 15   Global Step: 163430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:18:29,553-Speed 5971.79 samples/sec   Loss 3.6412   LearningRate 0.0222   Epoch: 15   Global Step: 163440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:18:36,403-Speed 5980.20 samples/sec   Loss 3.6981   LearningRate 0.0222   Epoch: 15   Global Step: 163450   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:18:43,286-Speed 5954.22 samples/sec   Loss 3.7058   LearningRate 0.0221   Epoch: 15   Global Step: 163460   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:18:50,151-Speed 5967.80 samples/sec   Loss 3.6506   LearningRate 0.0221   Epoch: 15   Global Step: 163470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:18:56,995-Speed 5985.99 samples/sec   Loss 3.6942   LearningRate 0.0221   Epoch: 15   Global Step: 163480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:19:03,843-Speed 5984.25 samples/sec   Loss 3.6626   LearningRate 0.0221   Epoch: 15   Global Step: 163490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:19:10,707-Speed 5969.07 samples/sec   Loss 3.6379   LearningRate 0.0221   Epoch: 15   Global Step: 163500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:19:17,568-Speed 5970.79 samples/sec   Loss 3.7120   LearningRate 0.0221   Epoch: 15   Global Step: 163510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:19:24,443-Speed 5958.97 samples/sec   Loss 3.7026   LearningRate 0.0221   Epoch: 15   Global Step: 163520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:19:31,344-Speed 5937.26 samples/sec   Loss 3.7152   LearningRate 0.0221   Epoch: 15   Global Step: 163530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:19:38,201-Speed 5974.02 samples/sec   Loss 3.6631   LearningRate 0.0221   Epoch: 15   Global Step: 163540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:19:45,055-Speed 5979.52 samples/sec   Loss 3.6909   LearningRate 0.0221   Epoch: 15   Global Step: 163550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:19:51,903-Speed 5982.54 samples/sec   Loss 3.6830   LearningRate 0.0220   Epoch: 15   Global Step: 163560   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:19:58,767-Speed 5968.12 samples/sec   Loss 3.7119   LearningRate 0.0220   Epoch: 15   Global Step: 163570   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:20:05,636-Speed 5965.33 samples/sec   Loss 3.6688   LearningRate 0.0220   Epoch: 15   Global Step: 163580   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:20:12,499-Speed 5971.50 samples/sec   Loss 3.6704   LearningRate 0.0220   Epoch: 15   Global Step: 163590   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:20:19,352-Speed 5977.94 samples/sec   Loss 3.6506   LearningRate 0.0220   Epoch: 15   Global Step: 163600   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:20:26,216-Speed 5969.04 samples/sec   Loss 3.6754   LearningRate 0.0220   Epoch: 15   Global Step: 163610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:20:33,081-Speed 5972.43 samples/sec   Loss 3.6653   LearningRate 0.0220   Epoch: 15   Global Step: 163620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:20:39,936-Speed 5975.88 samples/sec   Loss 3.7002   LearningRate 0.0220   Epoch: 15   Global Step: 163630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:20:46,818-Speed 5953.77 samples/sec   Loss 3.7166   LearningRate 0.0220   Epoch: 15   Global Step: 163640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:20:53,666-Speed 5982.87 samples/sec   Loss 3.6932   LearningRate 0.0220   Epoch: 15   Global Step: 163650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:21:00,519-Speed 5978.04 samples/sec   Loss 3.6684   LearningRate 0.0219   Epoch: 15   Global Step: 163660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:21:07,381-Speed 5970.50 samples/sec   Loss 3.6843   LearningRate 0.0219   Epoch: 15   Global Step: 163670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:21:14,255-Speed 5960.71 samples/sec   Loss 3.7423   LearningRate 0.0219   Epoch: 15   Global Step: 163680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:21:21,124-Speed 5963.71 samples/sec   Loss 3.6520   LearningRate 0.0219   Epoch: 15   Global Step: 163690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:21:27,987-Speed 5969.08 samples/sec   Loss 3.6724   LearningRate 0.0219   Epoch: 15   Global Step: 163700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:21:34,839-Speed 5981.48 samples/sec   Loss 3.6677   LearningRate 0.0219   Epoch: 15   Global Step: 163710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:21:41,683-Speed 5986.72 samples/sec   Loss 3.6841   LearningRate 0.0219   Epoch: 15   Global Step: 163720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:21:48,538-Speed 5976.40 samples/sec   Loss 3.6459   LearningRate 0.0219   Epoch: 15   Global Step: 163730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:21:55,422-Speed 5951.86 samples/sec   Loss 3.6350   LearningRate 0.0219   Epoch: 15   Global Step: 163740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:22:02,268-Speed 5983.39 samples/sec   Loss 3.6680   LearningRate 0.0219   Epoch: 15   Global Step: 163750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:22:09,118-Speed 5982.40 samples/sec   Loss 3.6594   LearningRate 0.0218   Epoch: 15   Global Step: 163760   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:22:15,999-Speed 5954.37 samples/sec   Loss 3.6554   LearningRate 0.0218   Epoch: 15   Global Step: 163770   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:22:22,872-Speed 5960.24 samples/sec   Loss 3.6391   LearningRate 0.0218   Epoch: 15   Global Step: 163780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:22:29,738-Speed 5967.48 samples/sec   Loss 3.7036   LearningRate 0.0218   Epoch: 15   Global Step: 163790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:22:36,616-Speed 5957.11 samples/sec   Loss 3.6790   LearningRate 0.0218   Epoch: 15   Global Step: 163800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:22:43,463-Speed 5982.90 samples/sec   Loss 3.6728   LearningRate 0.0218   Epoch: 15   Global Step: 163810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:22:50,337-Speed 5959.97 samples/sec   Loss 3.6607   LearningRate 0.0218   Epoch: 15   Global Step: 163820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:22:57,199-Speed 5973.50 samples/sec   Loss 3.6237   LearningRate 0.0218   Epoch: 15   Global Step: 163830   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:23:04,053-Speed 5977.17 samples/sec   Loss 3.6728   LearningRate 0.0218   Epoch: 15   Global Step: 163840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:23:10,912-Speed 5973.15 samples/sec   Loss 3.6583   LearningRate 0.0218   Epoch: 15   Global Step: 163850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-01-09 04:23:17,775-Speed 5969.99 samples/sec   Loss 3.6701   LearningRate 0.0217   Epoch: 15   Global Step: 163860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:23:24,636-Speed 5971.41 samples/sec   Loss 3.6142   LearningRate 0.0217   Epoch: 15   Global Step: 163870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:23:31,520-Speed 5951.60 samples/sec   Loss 3.6464   LearningRate 0.0217   Epoch: 15   Global Step: 163880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:23:38,371-Speed 5979.93 samples/sec   Loss 3.6658   LearningRate 0.0217   Epoch: 15   Global Step: 163890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:23:45,257-Speed 5949.00 samples/sec   Loss 3.6059   LearningRate 0.0217   Epoch: 15   Global Step: 163900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:23:52,109-Speed 5978.35 samples/sec   Loss 3.6300   LearningRate 0.0217   Epoch: 15   Global Step: 163910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:23:59,010-Speed 5936.72 samples/sec   Loss 3.6338   LearningRate 0.0217   Epoch: 15   Global Step: 163920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:24:05,912-Speed 5935.40 samples/sec   Loss 3.6621   LearningRate 0.0217   Epoch: 15   Global Step: 163930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:24:12,816-Speed 5933.94 samples/sec   Loss 3.6660   LearningRate 0.0217   Epoch: 15   Global Step: 163940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:24:19,714-Speed 5939.16 samples/sec   Loss 3.6392   LearningRate 0.0217   Epoch: 15   Global Step: 163950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:24:26,597-Speed 5951.90 samples/sec   Loss 3.6574   LearningRate 0.0216   Epoch: 15   Global Step: 163960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 04:24:33,446-Speed 5981.47 samples/sec   Loss 3.6379   LearningRate 0.0216   Epoch: 15   Global Step: 163970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-01-09 04:24:40,292-Speed 5984.14 samples/sec   Loss 3.6745   LearningRate 0.0216   Epoch: 15   Global Step: 163980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:24:47,133-Speed 5988.59 samples/sec   Loss 3.6209   LearningRate 0.0216   Epoch: 15   Global Step: 163990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:24:54,008-Speed 5961.62 samples/sec   Loss 3.6341   LearningRate 0.0216   Epoch: 15   Global Step: 164000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:25:00,860-Speed 5979.68 samples/sec   Loss 3.6509   LearningRate 0.0216   Epoch: 15   Global Step: 164010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-01-09 04:25:07,745-Speed 5950.02 samples/sec   Loss 3.6108   LearningRate 0.0216   Epoch: 15   Global Step: 164020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:25:14,605-Speed 5972.30 samples/sec   Loss 3.6993   LearningRate 0.0216   Epoch: 15   Global Step: 164030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:25:21,471-Speed 5966.71 samples/sec   Loss 3.6680   LearningRate 0.0216   Epoch: 15   Global Step: 164040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:25:28,393-Speed 5918.62 samples/sec   Loss 3.6197   LearningRate 0.0216   Epoch: 15   Global Step: 164050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:25:35,277-Speed 5951.31 samples/sec   Loss 3.6257   LearningRate 0.0215   Epoch: 15   Global Step: 164060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:25:42,142-Speed 5966.95 samples/sec   Loss 3.6241   LearningRate 0.0215   Epoch: 15   Global Step: 164070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:25:50,186-Speed 5092.64 samples/sec   Loss 3.6349   LearningRate 0.0215   Epoch: 15   Global Step: 164080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:25:57,035-Speed 5982.69 samples/sec   Loss 3.6529   LearningRate 0.0215   Epoch: 15   Global Step: 164090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:26:03,891-Speed 5975.45 samples/sec   Loss 3.6726   LearningRate 0.0215   Epoch: 15   Global Step: 164100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:26:10,747-Speed 5975.46 samples/sec   Loss 3.6355   LearningRate 0.0215   Epoch: 15   Global Step: 164110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:26:17,609-Speed 5971.02 samples/sec   Loss 3.6620   LearningRate 0.0215   Epoch: 15   Global Step: 164120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:26:24,469-Speed 5972.38 samples/sec   Loss 3.6458   LearningRate 0.0215   Epoch: 15   Global Step: 164130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:26:31,326-Speed 5974.26 samples/sec   Loss 3.6395   LearningRate 0.0215   Epoch: 15   Global Step: 164140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:26:38,176-Speed 5981.42 samples/sec   Loss 3.6322   LearningRate 0.0215   Epoch: 15   Global Step: 164150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:26:45,038-Speed 5970.27 samples/sec   Loss 3.6661   LearningRate 0.0214   Epoch: 15   Global Step: 164160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:26:51,906-Speed 5964.71 samples/sec   Loss 3.6504   LearningRate 0.0214   Epoch: 15   Global Step: 164170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:26:58,769-Speed 5969.86 samples/sec   Loss 3.6271   LearningRate 0.0214   Epoch: 15   Global Step: 164180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 04:27:05,624-Speed 5976.62 samples/sec   Loss 3.6507   LearningRate 0.0214   Epoch: 15   Global Step: 164190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:27:12,486-Speed 5969.76 samples/sec   Loss 3.6032   LearningRate 0.0214   Epoch: 15   Global Step: 164200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:27:19,368-Speed 5953.13 samples/sec   Loss 3.5989   LearningRate 0.0214   Epoch: 15   Global Step: 164210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:27:26,248-Speed 5957.77 samples/sec   Loss 3.6144   LearningRate 0.0214   Epoch: 15   Global Step: 164220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:27:33,123-Speed 5959.35 samples/sec   Loss 3.6409   LearningRate 0.0214   Epoch: 15   Global Step: 164230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:27:39,989-Speed 5966.60 samples/sec   Loss 3.6374   LearningRate 0.0214   Epoch: 15   Global Step: 164240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:27:46,887-Speed 5939.43 samples/sec   Loss 3.5910   LearningRate 0.0214   Epoch: 15   Global Step: 164250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:27:53,763-Speed 5957.19 samples/sec   Loss 3.6342   LearningRate 0.0214   Epoch: 15   Global Step: 164260   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:28:00,619-Speed 5976.33 samples/sec   Loss 3.6187   LearningRate 0.0213   Epoch: 15   Global Step: 164270   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:28:07,495-Speed 5958.47 samples/sec   Loss 3.6340   LearningRate 0.0213   Epoch: 15   Global Step: 164280   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:28:14,344-Speed 5981.54 samples/sec   Loss 3.6085   LearningRate 0.0213   Epoch: 15   Global Step: 164290   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:28:21,229-Speed 5950.96 samples/sec   Loss 3.6243   LearningRate 0.0213   Epoch: 15   Global Step: 164300   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:28:28,158-Speed 5913.20 samples/sec   Loss 3.6041   LearningRate 0.0213   Epoch: 15   Global Step: 164310   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:28:35,009-Speed 5979.32 samples/sec   Loss 3.6196   LearningRate 0.0213   Epoch: 15   Global Step: 164320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:28:41,869-Speed 5972.22 samples/sec   Loss 3.6013   LearningRate 0.0213   Epoch: 15   Global Step: 164330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:28:48,722-Speed 5978.14 samples/sec   Loss 3.6236   LearningRate 0.0213   Epoch: 15   Global Step: 164340   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:28:55,612-Speed 5946.25 samples/sec   Loss 3.5861   LearningRate 0.0213   Epoch: 15   Global Step: 164350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:29:02,467-Speed 5976.33 samples/sec   Loss 3.5687   LearningRate 0.0213   Epoch: 15   Global Step: 164360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:29:09,375-Speed 5932.79 samples/sec   Loss 3.5994   LearningRate 0.0212   Epoch: 15   Global Step: 164370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:29:16,308-Speed 5908.72 samples/sec   Loss 3.6320   LearningRate 0.0212   Epoch: 15   Global Step: 164380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:29:23,204-Speed 5941.23 samples/sec   Loss 3.6268   LearningRate 0.0212   Epoch: 15   Global Step: 164390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:29:30,047-Speed 5987.14 samples/sec   Loss 3.5675   LearningRate 0.0212   Epoch: 15   Global Step: 164400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:29:36,902-Speed 5976.03 samples/sec   Loss 3.6167   LearningRate 0.0212   Epoch: 15   Global Step: 164410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:29:43,757-Speed 5977.89 samples/sec   Loss 3.6123   LearningRate 0.0212   Epoch: 15   Global Step: 164420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:29:50,642-Speed 5950.60 samples/sec   Loss 3.6270   LearningRate 0.0212   Epoch: 15   Global Step: 164430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:29:57,547-Speed 5932.98 samples/sec   Loss 3.6064   LearningRate 0.0212   Epoch: 15   Global Step: 164440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:30:04,406-Speed 5972.57 samples/sec   Loss 3.6009   LearningRate 0.0212   Epoch: 15   Global Step: 164450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:30:11,258-Speed 5978.98 samples/sec   Loss 3.5931   LearningRate 0.0212   Epoch: 15   Global Step: 164460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 04:30:18,131-Speed 5960.28 samples/sec   Loss 3.6304   LearningRate 0.0211   Epoch: 15   Global Step: 164470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 04:30:24,994-Speed 5970.33 samples/sec   Loss 3.6192   LearningRate 0.0211   Epoch: 15   Global Step: 164480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 04:30:31,841-Speed 5982.88 samples/sec   Loss 3.5977   LearningRate 0.0211   Epoch: 15   Global Step: 164490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 04:30:38,684-Speed 5987.35 samples/sec   Loss 3.5949   LearningRate 0.0211   Epoch: 15   Global Step: 164500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:30:45,554-Speed 5963.49 samples/sec   Loss 3.5901   LearningRate 0.0211   Epoch: 15   Global Step: 164510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:30:52,427-Speed 5961.61 samples/sec   Loss 3.6138   LearningRate 0.0211   Epoch: 15   Global Step: 164520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:30:59,272-Speed 5985.16 samples/sec   Loss 3.5933   LearningRate 0.0211   Epoch: 15   Global Step: 164530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:31:06,135-Speed 5969.59 samples/sec   Loss 3.5921   LearningRate 0.0211   Epoch: 15   Global Step: 164540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:31:13,009-Speed 5961.62 samples/sec   Loss 3.5969   LearningRate 0.0211   Epoch: 15   Global Step: 164550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:31:19,873-Speed 5968.91 samples/sec   Loss 3.6439   LearningRate 0.0211   Epoch: 15   Global Step: 164560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:31:26,748-Speed 5958.53 samples/sec   Loss 3.5765   LearningRate 0.0210   Epoch: 15   Global Step: 164570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:31:33,614-Speed 5967.74 samples/sec   Loss 3.5927   LearningRate 0.0210   Epoch: 15   Global Step: 164580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:31:40,471-Speed 5974.33 samples/sec   Loss 3.6103   LearningRate 0.0210   Epoch: 15   Global Step: 164590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:31:47,367-Speed 5942.02 samples/sec   Loss 3.5934   LearningRate 0.0210   Epoch: 15   Global Step: 164600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 04:31:54,207-Speed 5989.37 samples/sec   Loss 3.6143   LearningRate 0.0210   Epoch: 15   Global Step: 164610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:32:01,057-Speed 5980.88 samples/sec   Loss 3.6195   LearningRate 0.0210   Epoch: 15   Global Step: 164620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:32:07,921-Speed 5968.56 samples/sec   Loss 3.5932   LearningRate 0.0210   Epoch: 15   Global Step: 164630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:32:14,799-Speed 5956.74 samples/sec   Loss 3.6103   LearningRate 0.0210   Epoch: 15   Global Step: 164640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:32:21,644-Speed 5984.67 samples/sec   Loss 3.5812   LearningRate 0.0210   Epoch: 15   Global Step: 164650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:32:28,492-Speed 5982.20 samples/sec   Loss 3.6012   LearningRate 0.0210   Epoch: 15   Global Step: 164660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:32:35,357-Speed 5967.86 samples/sec   Loss 3.5742   LearningRate 0.0209   Epoch: 15   Global Step: 164670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:32:42,215-Speed 5973.86 samples/sec   Loss 3.5578   LearningRate 0.0209   Epoch: 15   Global Step: 164680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:32:49,068-Speed 5978.13 samples/sec   Loss 3.5848   LearningRate 0.0209   Epoch: 15   Global Step: 164690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:32:55,943-Speed 5959.48 samples/sec   Loss 3.6032   LearningRate 0.0209   Epoch: 15   Global Step: 164700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:33:02,801-Speed 5974.32 samples/sec   Loss 3.6312   LearningRate 0.0209   Epoch: 15   Global Step: 164710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:33:09,672-Speed 5962.47 samples/sec   Loss 3.6188   LearningRate 0.0209   Epoch: 15   Global Step: 164720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:33:16,518-Speed 5984.13 samples/sec   Loss 3.5658   LearningRate 0.0209   Epoch: 15   Global Step: 164730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:33:23,394-Speed 5958.00 samples/sec   Loss 3.5682   LearningRate 0.0209   Epoch: 15   Global Step: 164740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:33:30,261-Speed 5966.34 samples/sec   Loss 3.5869   LearningRate 0.0209   Epoch: 15   Global Step: 164750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:33:37,104-Speed 5986.47 samples/sec   Loss 3.5700   LearningRate 0.0209   Epoch: 15   Global Step: 164760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:33:43,950-Speed 5984.03 samples/sec   Loss 3.6182   LearningRate 0.0208   Epoch: 15   Global Step: 164770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:33:50,809-Speed 5973.45 samples/sec   Loss 3.5647   LearningRate 0.0208   Epoch: 15   Global Step: 164780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:33:57,663-Speed 5976.38 samples/sec   Loss 3.5739   LearningRate 0.0208   Epoch: 15   Global Step: 164790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:34:04,583-Speed 5920.54 samples/sec   Loss 3.5784   LearningRate 0.0208   Epoch: 15   Global Step: 164800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:34:11,515-Speed 5911.77 samples/sec   Loss 3.5702   LearningRate 0.0208   Epoch: 15   Global Step: 164810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 04:34:18,426-Speed 5928.61 samples/sec   Loss 3.5348   LearningRate 0.0208   Epoch: 15   Global Step: 164820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 04:34:25,295-Speed 5963.82 samples/sec   Loss 3.5490   LearningRate 0.0208   Epoch: 15   Global Step: 164830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 04:34:32,155-Speed 5972.37 samples/sec   Loss 3.5473   LearningRate 0.0208   Epoch: 15   Global Step: 164840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 04:34:39,033-Speed 5956.22 samples/sec   Loss 3.6038   LearningRate 0.0208   Epoch: 15   Global Step: 164850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 04:34:45,887-Speed 5977.01 samples/sec   Loss 3.6348   LearningRate 0.0208   Epoch: 15   Global Step: 164860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 04:34:52,749-Speed 5970.66 samples/sec   Loss 3.5740   LearningRate 0.0208   Epoch: 15   Global Step: 164870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 04:34:59,602-Speed 5977.79 samples/sec   Loss 3.6094   LearningRate 0.0207   Epoch: 15   Global Step: 164880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:35:06,453-Speed 5979.49 samples/sec   Loss 3.5607   LearningRate 0.0207   Epoch: 15   Global Step: 164890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:35:13,303-Speed 5980.99 samples/sec   Loss 3.6373   LearningRate 0.0207   Epoch: 15   Global Step: 164900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:35:21,485-Speed 5007.12 samples/sec   Loss 3.5996   LearningRate 0.0207   Epoch: 15   Global Step: 164910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:35:28,327-Speed 5987.76 samples/sec   Loss 3.6167   LearningRate 0.0207   Epoch: 15   Global Step: 164920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:35:35,197-Speed 5963.24 samples/sec   Loss 3.5188   LearningRate 0.0207   Epoch: 15   Global Step: 164930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:35:42,049-Speed 5979.83 samples/sec   Loss 3.5551   LearningRate 0.0207   Epoch: 15   Global Step: 164940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:35:48,910-Speed 5970.72 samples/sec   Loss 3.5811   LearningRate 0.0207   Epoch: 15   Global Step: 164950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:35:55,768-Speed 5974.59 samples/sec   Loss 3.5488   LearningRate 0.0207   Epoch: 15   Global Step: 164960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:36:02,668-Speed 5937.22 samples/sec   Loss 3.6046   LearningRate 0.0207   Epoch: 15   Global Step: 164970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:36:09,554-Speed 5949.76 samples/sec   Loss 3.6071   LearningRate 0.0206   Epoch: 15   Global Step: 164980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 04:36:16,442-Speed 5950.99 samples/sec   Loss 3.5620   LearningRate 0.0206   Epoch: 15   Global Step: 164990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 04:36:23,293-Speed 5979.89 samples/sec   Loss 3.5998   LearningRate 0.0206   Epoch: 15   Global Step: 165000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:36:50,204-[lfw][165000]XNorm: 23.243098
Training: 2022-01-09 04:36:50,204-[lfw][165000]Accuracy-Flip: 0.99767+-0.00300
Training: 2022-01-09 04:36:50,205-[lfw][165000]Accuracy-Highest: 0.99817
Training: 2022-01-09 04:37:21,341-[cfp_fp][165000]XNorm: 20.781484
Training: 2022-01-09 04:37:21,342-[cfp_fp][165000]Accuracy-Flip: 0.98814+-0.00438
Training: 2022-01-09 04:37:21,343-[cfp_fp][165000]Accuracy-Highest: 0.98929
Training: 2022-01-09 04:37:47,972-[agedb_30][165000]XNorm: 22.635031
Training: 2022-01-09 04:37:47,973-[agedb_30][165000]Accuracy-Flip: 0.98067+-0.00633
Training: 2022-01-09 04:37:47,973-[agedb_30][165000]Accuracy-Highest: 0.98067
Training: 2022-01-09 04:37:54,823-Speed 447.51 samples/sec   Loss 3.5845   LearningRate 0.0206   Epoch: 15   Global Step: 165010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:38:01,653-Speed 5998.64 samples/sec   Loss 3.5696   LearningRate 0.0206   Epoch: 15   Global Step: 165020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:38:08,488-Speed 5993.94 samples/sec   Loss 3.5748   LearningRate 0.0206   Epoch: 15   Global Step: 165030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:38:15,329-Speed 5989.68 samples/sec   Loss 3.5559   LearningRate 0.0206   Epoch: 15   Global Step: 165040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:38:22,201-Speed 5960.81 samples/sec   Loss 3.5500   LearningRate 0.0206   Epoch: 15   Global Step: 165050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:38:29,075-Speed 5960.04 samples/sec   Loss 3.6008   LearningRate 0.0206   Epoch: 15   Global Step: 165060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:38:35,948-Speed 5961.92 samples/sec   Loss 3.6060   LearningRate 0.0206   Epoch: 15   Global Step: 165070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:38:42,827-Speed 5956.01 samples/sec   Loss 3.5432   LearningRate 0.0205   Epoch: 15   Global Step: 165080   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:38:49,749-Speed 5918.96 samples/sec   Loss 3.5809   LearningRate 0.0205   Epoch: 15   Global Step: 165090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:38:56,655-Speed 5934.29 samples/sec   Loss 3.5925   LearningRate 0.0205   Epoch: 15   Global Step: 165100   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:39:03,528-Speed 5960.96 samples/sec   Loss 3.5392   LearningRate 0.0205   Epoch: 15   Global Step: 165110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:39:10,406-Speed 5955.65 samples/sec   Loss 3.5879   LearningRate 0.0205   Epoch: 15   Global Step: 165120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:39:17,272-Speed 5966.97 samples/sec   Loss 3.5372   LearningRate 0.0205   Epoch: 15   Global Step: 165130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:39:24,164-Speed 5944.82 samples/sec   Loss 3.5826   LearningRate 0.0205   Epoch: 15   Global Step: 165140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:39:31,020-Speed 5975.56 samples/sec   Loss 3.5842   LearningRate 0.0205   Epoch: 15   Global Step: 165150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:39:37,881-Speed 5971.62 samples/sec   Loss 3.5669   LearningRate 0.0205   Epoch: 15   Global Step: 165160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:39:44,733-Speed 5978.78 samples/sec   Loss 3.5781   LearningRate 0.0205   Epoch: 15   Global Step: 165170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:39:51,565-Speed 5997.09 samples/sec   Loss 3.5359   LearningRate 0.0204   Epoch: 15   Global Step: 165180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:39:58,427-Speed 5969.85 samples/sec   Loss 3.5923   LearningRate 0.0204   Epoch: 15   Global Step: 165190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:40:05,286-Speed 5972.79 samples/sec   Loss 3.5525   LearningRate 0.0204   Epoch: 15   Global Step: 165200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:40:12,166-Speed 5955.33 samples/sec   Loss 3.5863   LearningRate 0.0204   Epoch: 15   Global Step: 165210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:40:19,039-Speed 5961.11 samples/sec   Loss 3.5908   LearningRate 0.0204   Epoch: 15   Global Step: 165220   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:40:25,889-Speed 5979.81 samples/sec   Loss 3.5676   LearningRate 0.0204   Epoch: 15   Global Step: 165230   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:40:32,739-Speed 5981.14 samples/sec   Loss 3.5590   LearningRate 0.0204   Epoch: 15   Global Step: 165240   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:40:39,585-Speed 5984.76 samples/sec   Loss 3.5215   LearningRate 0.0204   Epoch: 15   Global Step: 165250   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:40:46,432-Speed 5982.85 samples/sec   Loss 3.6024   LearningRate 0.0204   Epoch: 15   Global Step: 165260   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:40:53,301-Speed 5963.84 samples/sec   Loss 3.5677   LearningRate 0.0204   Epoch: 15   Global Step: 165270   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:41:00,147-Speed 5984.58 samples/sec   Loss 3.5420   LearningRate 0.0204   Epoch: 15   Global Step: 165280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:41:06,983-Speed 5992.70 samples/sec   Loss 3.4933   LearningRate 0.0203   Epoch: 15   Global Step: 165290   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:41:13,840-Speed 5973.86 samples/sec   Loss 3.5449   LearningRate 0.0203   Epoch: 15   Global Step: 165300   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:41:20,694-Speed 5979.07 samples/sec   Loss 3.5514   LearningRate 0.0203   Epoch: 15   Global Step: 165310   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:41:27,568-Speed 5959.34 samples/sec   Loss 3.4881   LearningRate 0.0203   Epoch: 15   Global Step: 165320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:41:34,455-Speed 5949.40 samples/sec   Loss 3.5767   LearningRate 0.0203   Epoch: 15   Global Step: 165330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:41:41,310-Speed 5975.94 samples/sec   Loss 3.5540   LearningRate 0.0203   Epoch: 15   Global Step: 165340   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:41:48,164-Speed 5977.64 samples/sec   Loss 3.5344   LearningRate 0.0203   Epoch: 15   Global Step: 165350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:41:55,021-Speed 5980.60 samples/sec   Loss 3.5952   LearningRate 0.0203   Epoch: 15   Global Step: 165360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:42:01,875-Speed 5977.36 samples/sec   Loss 3.5400   LearningRate 0.0203   Epoch: 15   Global Step: 165370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:42:08,758-Speed 5952.48 samples/sec   Loss 3.5378   LearningRate 0.0203   Epoch: 15   Global Step: 165380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:42:15,641-Speed 5952.14 samples/sec   Loss 3.5642   LearningRate 0.0202   Epoch: 15   Global Step: 165390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:42:22,511-Speed 5963.77 samples/sec   Loss 3.5674   LearningRate 0.0202   Epoch: 15   Global Step: 165400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:42:29,396-Speed 5949.52 samples/sec   Loss 3.5204   LearningRate 0.0202   Epoch: 15   Global Step: 165410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:42:36,256-Speed 5972.42 samples/sec   Loss 3.5031   LearningRate 0.0202   Epoch: 15   Global Step: 165420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:42:43,108-Speed 5980.34 samples/sec   Loss 3.5663   LearningRate 0.0202   Epoch: 15   Global Step: 165430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:42:49,973-Speed 5967.23 samples/sec   Loss 3.5433   LearningRate 0.0202   Epoch: 15   Global Step: 165440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:42:56,844-Speed 5962.44 samples/sec   Loss 3.5559   LearningRate 0.0202   Epoch: 15   Global Step: 165450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:43:03,700-Speed 5977.21 samples/sec   Loss 3.5083   LearningRate 0.0202   Epoch: 15   Global Step: 165460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:43:10,559-Speed 5972.77 samples/sec   Loss 3.5872   LearningRate 0.0202   Epoch: 15   Global Step: 165470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:43:17,421-Speed 5969.99 samples/sec   Loss 3.5319   LearningRate 0.0202   Epoch: 15   Global Step: 165480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:43:24,344-Speed 5918.52 samples/sec   Loss 3.5010   LearningRate 0.0201   Epoch: 15   Global Step: 165490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 04:43:31,218-Speed 5959.62 samples/sec   Loss 3.5500   LearningRate 0.0201   Epoch: 15   Global Step: 165500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 04:43:38,082-Speed 5968.70 samples/sec   Loss 3.5366   LearningRate 0.0201   Epoch: 15   Global Step: 165510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 04:43:44,961-Speed 5955.78 samples/sec   Loss 3.5532   LearningRate 0.0201   Epoch: 15   Global Step: 165520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 04:43:51,800-Speed 5990.46 samples/sec   Loss 3.5329   LearningRate 0.0201   Epoch: 15   Global Step: 165530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:43:58,664-Speed 5967.89 samples/sec   Loss 3.5174   LearningRate 0.0201   Epoch: 15   Global Step: 165540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:44:05,534-Speed 5963.69 samples/sec   Loss 3.5272   LearningRate 0.0201   Epoch: 15   Global Step: 165550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:44:12,441-Speed 5931.84 samples/sec   Loss 3.5131   LearningRate 0.0201   Epoch: 15   Global Step: 165560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:44:19,314-Speed 5960.75 samples/sec   Loss 3.5901   LearningRate 0.0201   Epoch: 15   Global Step: 165570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:44:26,173-Speed 5974.81 samples/sec   Loss 3.4886   LearningRate 0.0201   Epoch: 15   Global Step: 165580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:44:33,027-Speed 5976.99 samples/sec   Loss 3.4889   LearningRate 0.0201   Epoch: 15   Global Step: 165590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:44:39,877-Speed 5980.42 samples/sec   Loss 3.5189   LearningRate 0.0200   Epoch: 15   Global Step: 165600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:44:46,734-Speed 5976.06 samples/sec   Loss 3.5160   LearningRate 0.0200   Epoch: 15   Global Step: 165610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:44:53,608-Speed 5959.16 samples/sec   Loss 3.5084   LearningRate 0.0200   Epoch: 15   Global Step: 165620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:45:00,516-Speed 5932.84 samples/sec   Loss 3.5455   LearningRate 0.0200   Epoch: 15   Global Step: 165630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 04:45:10,672-Speed 4033.60 samples/sec   Loss 3.5440   LearningRate 0.0200   Epoch: 15   Global Step: 165640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:45:17,537-Speed 5967.97 samples/sec   Loss 3.5654   LearningRate 0.0200   Epoch: 15   Global Step: 165650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:45:24,385-Speed 5982.49 samples/sec   Loss 3.5040   LearningRate 0.0200   Epoch: 15   Global Step: 165660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:45:31,235-Speed 5980.76 samples/sec   Loss 3.5355   LearningRate 0.0200   Epoch: 15   Global Step: 165670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:45:38,129-Speed 5944.73 samples/sec   Loss 3.5228   LearningRate 0.0200   Epoch: 15   Global Step: 165680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:45:44,985-Speed 5974.82 samples/sec   Loss 3.5370   LearningRate 0.0200   Epoch: 15   Global Step: 165690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:45:51,837-Speed 5979.17 samples/sec   Loss 3.5448   LearningRate 0.0199   Epoch: 15   Global Step: 165700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:45:58,700-Speed 5969.59 samples/sec   Loss 3.5172   LearningRate 0.0199   Epoch: 15   Global Step: 165710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:46:05,556-Speed 5976.14 samples/sec   Loss 3.4689   LearningRate 0.0199   Epoch: 15   Global Step: 165720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:46:12,402-Speed 5986.90 samples/sec   Loss 3.5044   LearningRate 0.0199   Epoch: 15   Global Step: 165730   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:46:19,272-Speed 5962.90 samples/sec   Loss 3.5167   LearningRate 0.0199   Epoch: 15   Global Step: 165740   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:46:26,156-Speed 5951.88 samples/sec   Loss 3.5482   LearningRate 0.0199   Epoch: 15   Global Step: 165750   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:46:33,067-Speed 5928.09 samples/sec   Loss 3.5407   LearningRate 0.0199   Epoch: 15   Global Step: 165760   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:46:39,943-Speed 5959.95 samples/sec   Loss 3.4881   LearningRate 0.0199   Epoch: 15   Global Step: 165770   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:46:46,800-Speed 5974.29 samples/sec   Loss 3.5077   LearningRate 0.0199   Epoch: 15   Global Step: 165780   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:46:53,657-Speed 5974.86 samples/sec   Loss 3.5897   LearningRate 0.0199   Epoch: 15   Global Step: 165790   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:47:00,509-Speed 5979.44 samples/sec   Loss 3.5295   LearningRate 0.0199   Epoch: 15   Global Step: 165800   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:47:07,369-Speed 5971.95 samples/sec   Loss 3.5433   LearningRate 0.0198   Epoch: 15   Global Step: 165810   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:47:14,300-Speed 5911.22 samples/sec   Loss 3.5124   LearningRate 0.0198   Epoch: 15   Global Step: 165820   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:47:21,254-Speed 5891.52 samples/sec   Loss 3.5613   LearningRate 0.0198   Epoch: 15   Global Step: 165830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:47:28,230-Speed 5871.92 samples/sec   Loss 3.5365   LearningRate 0.0198   Epoch: 15   Global Step: 165840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:47:35,189-Speed 5889.68 samples/sec   Loss 3.4921   LearningRate 0.0198   Epoch: 15   Global Step: 165850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:47:42,074-Speed 5949.84 samples/sec   Loss 3.5365   LearningRate 0.0198   Epoch: 15   Global Step: 165860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:47:48,949-Speed 5958.81 samples/sec   Loss 3.5262   LearningRate 0.0198   Epoch: 15   Global Step: 165870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:47:55,809-Speed 5972.44 samples/sec   Loss 3.5626   LearningRate 0.0198   Epoch: 15   Global Step: 165880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:48:02,679-Speed 5963.04 samples/sec   Loss 3.5363   LearningRate 0.0198   Epoch: 15   Global Step: 165890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:48:09,559-Speed 5954.44 samples/sec   Loss 3.5374   LearningRate 0.0198   Epoch: 15   Global Step: 165900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:48:16,416-Speed 5974.96 samples/sec   Loss 3.5355   LearningRate 0.0197   Epoch: 15   Global Step: 165910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:48:44,501-Speed 1458.61 samples/sec   Loss 3.5368   LearningRate 0.0197   Epoch: 16   Global Step: 165920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:48:51,356-Speed 5976.53 samples/sec   Loss 3.5072   LearningRate 0.0197   Epoch: 16   Global Step: 165930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:48:58,176-Speed 6007.37 samples/sec   Loss 3.5349   LearningRate 0.0197   Epoch: 16   Global Step: 165940   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:49:05,009-Speed 5995.56 samples/sec   Loss 3.5315   LearningRate 0.0197   Epoch: 16   Global Step: 165950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:49:11,909-Speed 5937.56 samples/sec   Loss 3.5290   LearningRate 0.0197   Epoch: 16   Global Step: 165960   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:49:18,847-Speed 5905.59 samples/sec   Loss 3.4958   LearningRate 0.0197   Epoch: 16   Global Step: 165970   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:49:25,713-Speed 5966.62 samples/sec   Loss 3.5159   LearningRate 0.0197   Epoch: 16   Global Step: 165980   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:49:32,578-Speed 5967.86 samples/sec   Loss 3.5275   LearningRate 0.0197   Epoch: 16   Global Step: 165990   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:49:39,448-Speed 5962.99 samples/sec   Loss 3.4770   LearningRate 0.0197   Epoch: 16   Global Step: 166000   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:49:46,331-Speed 5952.32 samples/sec   Loss 3.5326   LearningRate 0.0197   Epoch: 16   Global Step: 166010   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:49:53,204-Speed 5960.35 samples/sec   Loss 3.5053   LearningRate 0.0196   Epoch: 16   Global Step: 166020   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:50:00,096-Speed 5944.70 samples/sec   Loss 3.4435   LearningRate 0.0196   Epoch: 16   Global Step: 166030   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:50:06,991-Speed 5940.56 samples/sec   Loss 3.4806   LearningRate 0.0196   Epoch: 16   Global Step: 166040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:50:13,890-Speed 5938.69 samples/sec   Loss 3.4773   LearningRate 0.0196   Epoch: 16   Global Step: 166050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:50:20,776-Speed 5949.72 samples/sec   Loss 3.4763   LearningRate 0.0196   Epoch: 16   Global Step: 166060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:50:27,642-Speed 5966.16 samples/sec   Loss 3.5037   LearningRate 0.0196   Epoch: 16   Global Step: 166070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:50:34,527-Speed 5950.59 samples/sec   Loss 3.4470   LearningRate 0.0196   Epoch: 16   Global Step: 166080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:50:41,402-Speed 5959.51 samples/sec   Loss 3.4985   LearningRate 0.0196   Epoch: 16   Global Step: 166090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:50:48,305-Speed 5934.29 samples/sec   Loss 3.4637   LearningRate 0.0196   Epoch: 16   Global Step: 166100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:50:55,174-Speed 5964.44 samples/sec   Loss 3.5081   LearningRate 0.0196   Epoch: 16   Global Step: 166110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:51:02,035-Speed 5971.71 samples/sec   Loss 3.4840   LearningRate 0.0195   Epoch: 16   Global Step: 166120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:51:08,904-Speed 5963.49 samples/sec   Loss 3.4840   LearningRate 0.0195   Epoch: 16   Global Step: 166130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:51:15,777-Speed 5962.23 samples/sec   Loss 3.4876   LearningRate 0.0195   Epoch: 16   Global Step: 166140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:51:22,644-Speed 5966.09 samples/sec   Loss 3.5338   LearningRate 0.0195   Epoch: 16   Global Step: 166150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:51:29,505-Speed 5971.23 samples/sec   Loss 3.4706   LearningRate 0.0195   Epoch: 16   Global Step: 166160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:51:36,388-Speed 5951.95 samples/sec   Loss 3.4916   LearningRate 0.0195   Epoch: 16   Global Step: 166170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:51:43,273-Speed 5950.48 samples/sec   Loss 3.4557   LearningRate 0.0195   Epoch: 16   Global Step: 166180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:51:50,152-Speed 5955.17 samples/sec   Loss 3.4983   LearningRate 0.0195   Epoch: 16   Global Step: 166190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:51:57,016-Speed 5969.11 samples/sec   Loss 3.5054   LearningRate 0.0195   Epoch: 16   Global Step: 166200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:52:03,875-Speed 5973.12 samples/sec   Loss 3.4857   LearningRate 0.0195   Epoch: 16   Global Step: 166210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:52:10,750-Speed 5959.06 samples/sec   Loss 3.4445   LearningRate 0.0195   Epoch: 16   Global Step: 166220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:52:17,634-Speed 5951.28 samples/sec   Loss 3.4845   LearningRate 0.0194   Epoch: 16   Global Step: 166230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:52:24,515-Speed 5953.62 samples/sec   Loss 3.4816   LearningRate 0.0194   Epoch: 16   Global Step: 166240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 04:52:31,372-Speed 5974.43 samples/sec   Loss 3.4976   LearningRate 0.0194   Epoch: 16   Global Step: 166250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:52:38,240-Speed 5964.35 samples/sec   Loss 3.4651   LearningRate 0.0194   Epoch: 16   Global Step: 166260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:52:45,149-Speed 5929.78 samples/sec   Loss 3.4984   LearningRate 0.0194   Epoch: 16   Global Step: 166270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:52:52,057-Speed 5930.33 samples/sec   Loss 3.4824   LearningRate 0.0194   Epoch: 16   Global Step: 166280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:52:58,931-Speed 5960.68 samples/sec   Loss 3.5114   LearningRate 0.0194   Epoch: 16   Global Step: 166290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:53:05,802-Speed 5962.63 samples/sec   Loss 3.4907   LearningRate 0.0194   Epoch: 16   Global Step: 166300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:53:12,669-Speed 5965.71 samples/sec   Loss 3.4969   LearningRate 0.0194   Epoch: 16   Global Step: 166310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:53:19,524-Speed 5975.94 samples/sec   Loss 3.4818   LearningRate 0.0194   Epoch: 16   Global Step: 166320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:53:26,412-Speed 5948.17 samples/sec   Loss 3.4440   LearningRate 0.0193   Epoch: 16   Global Step: 166330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:53:33,289-Speed 5957.20 samples/sec   Loss 3.5228   LearningRate 0.0193   Epoch: 16   Global Step: 166340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:53:40,173-Speed 5951.65 samples/sec   Loss 3.5358   LearningRate 0.0193   Epoch: 16   Global Step: 166350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:53:47,024-Speed 5980.50 samples/sec   Loss 3.5287   LearningRate 0.0193   Epoch: 16   Global Step: 166360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:53:53,883-Speed 5972.17 samples/sec   Loss 3.5011   LearningRate 0.0193   Epoch: 16   Global Step: 166370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:54:00,749-Speed 5966.95 samples/sec   Loss 3.4695   LearningRate 0.0193   Epoch: 16   Global Step: 166380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:54:07,613-Speed 5969.44 samples/sec   Loss 3.4433   LearningRate 0.0193   Epoch: 16   Global Step: 166390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:54:14,490-Speed 5957.13 samples/sec   Loss 3.5144   LearningRate 0.0193   Epoch: 16   Global Step: 166400   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:54:21,379-Speed 5946.37 samples/sec   Loss 3.4491   LearningRate 0.0193   Epoch: 16   Global Step: 166410   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:54:28,235-Speed 5975.57 samples/sec   Loss 3.4911   LearningRate 0.0193   Epoch: 16   Global Step: 166420   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:54:35,108-Speed 5960.85 samples/sec   Loss 3.4815   LearningRate 0.0193   Epoch: 16   Global Step: 166430   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:54:41,963-Speed 5976.65 samples/sec   Loss 3.4990   LearningRate 0.0192   Epoch: 16   Global Step: 166440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:54:48,819-Speed 5975.88 samples/sec   Loss 3.4407   LearningRate 0.0192   Epoch: 16   Global Step: 166450   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:54:55,678-Speed 5973.22 samples/sec   Loss 3.4504   LearningRate 0.0192   Epoch: 16   Global Step: 166460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:55:02,561-Speed 5952.57 samples/sec   Loss 3.4949   LearningRate 0.0192   Epoch: 16   Global Step: 166470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:55:09,506-Speed 5898.86 samples/sec   Loss 3.4554   LearningRate 0.0192   Epoch: 16   Global Step: 166480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:55:16,379-Speed 5960.17 samples/sec   Loss 3.4603   LearningRate 0.0192   Epoch: 16   Global Step: 166490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:55:23,239-Speed 5971.93 samples/sec   Loss 3.4411   LearningRate 0.0192   Epoch: 16   Global Step: 166500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:55:30,173-Speed 5908.89 samples/sec   Loss 3.4776   LearningRate 0.0192   Epoch: 16   Global Step: 166510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:55:37,046-Speed 5959.90 samples/sec   Loss 3.4564   LearningRate 0.0192   Epoch: 16   Global Step: 166520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:55:43,968-Speed 5918.71 samples/sec   Loss 3.4567   LearningRate 0.0192   Epoch: 16   Global Step: 166530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:55:50,845-Speed 5957.08 samples/sec   Loss 3.4423   LearningRate 0.0192   Epoch: 16   Global Step: 166540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:55:57,702-Speed 5975.71 samples/sec   Loss 3.5075   LearningRate 0.0191   Epoch: 16   Global Step: 166550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:56:04,574-Speed 5961.55 samples/sec   Loss 3.4639   LearningRate 0.0191   Epoch: 16   Global Step: 166560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 04:56:11,436-Speed 5970.20 samples/sec   Loss 3.4934   LearningRate 0.0191   Epoch: 16   Global Step: 166570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:56:18,319-Speed 5951.98 samples/sec   Loss 3.4650   LearningRate 0.0191   Epoch: 16   Global Step: 166580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:56:25,229-Speed 5929.81 samples/sec   Loss 3.4615   LearningRate 0.0191   Epoch: 16   Global Step: 166590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:56:32,081-Speed 5979.63 samples/sec   Loss 3.4876   LearningRate 0.0191   Epoch: 16   Global Step: 166600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:56:38,941-Speed 5971.95 samples/sec   Loss 3.4546   LearningRate 0.0191   Epoch: 16   Global Step: 166610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:56:45,800-Speed 5974.25 samples/sec   Loss 3.4649   LearningRate 0.0191   Epoch: 16   Global Step: 166620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:56:52,680-Speed 5954.90 samples/sec   Loss 3.4177   LearningRate 0.0191   Epoch: 16   Global Step: 166630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:56:59,602-Speed 5918.05 samples/sec   Loss 3.4897   LearningRate 0.0191   Epoch: 16   Global Step: 166640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:57:06,471-Speed 5964.53 samples/sec   Loss 3.4361   LearningRate 0.0190   Epoch: 16   Global Step: 166650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:57:13,343-Speed 5961.72 samples/sec   Loss 3.3965   LearningRate 0.0190   Epoch: 16   Global Step: 166660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:57:20,212-Speed 5963.98 samples/sec   Loss 3.4748   LearningRate 0.0190   Epoch: 16   Global Step: 166670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 04:57:27,064-Speed 5978.71 samples/sec   Loss 3.4390   LearningRate 0.0190   Epoch: 16   Global Step: 166680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 04:57:34,007-Speed 5900.58 samples/sec   Loss 3.4554   LearningRate 0.0190   Epoch: 16   Global Step: 166690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:57:40,871-Speed 5968.28 samples/sec   Loss 3.4737   LearningRate 0.0190   Epoch: 16   Global Step: 166700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 04:57:47,721-Speed 5981.42 samples/sec   Loss 3.4386   LearningRate 0.0190   Epoch: 16   Global Step: 166710   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:57:54,591-Speed 5963.34 samples/sec   Loss 3.4264   LearningRate 0.0190   Epoch: 16   Global Step: 166720   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:58:01,451-Speed 5971.70 samples/sec   Loss 3.4640   LearningRate 0.0190   Epoch: 16   Global Step: 166730   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:58:08,307-Speed 5975.85 samples/sec   Loss 3.4492   LearningRate 0.0190   Epoch: 16   Global Step: 166740   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:58:15,164-Speed 5974.15 samples/sec   Loss 3.4233   LearningRate 0.0190   Epoch: 16   Global Step: 166750   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:58:22,023-Speed 5972.80 samples/sec   Loss 3.4697   LearningRate 0.0189   Epoch: 16   Global Step: 166760   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:58:28,887-Speed 5970.30 samples/sec   Loss 3.4567   LearningRate 0.0189   Epoch: 16   Global Step: 166770   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:58:35,778-Speed 5945.63 samples/sec   Loss 3.4406   LearningRate 0.0189   Epoch: 16   Global Step: 166780   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 04:58:42,677-Speed 5937.74 samples/sec   Loss 3.4203   LearningRate 0.0189   Epoch: 16   Global Step: 166790   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 04:58:49,538-Speed 5972.85 samples/sec   Loss 3.4523   LearningRate 0.0189   Epoch: 16   Global Step: 166800   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 04:58:56,403-Speed 5967.97 samples/sec   Loss 3.4262   LearningRate 0.0189   Epoch: 16   Global Step: 166810   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 04:59:03,261-Speed 5973.42 samples/sec   Loss 3.4765   LearningRate 0.0189   Epoch: 16   Global Step: 166820   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 04:59:10,126-Speed 5968.37 samples/sec   Loss 3.4282   LearningRate 0.0189   Epoch: 16   Global Step: 166830   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 04:59:17,014-Speed 5948.06 samples/sec   Loss 3.4815   LearningRate 0.0189   Epoch: 16   Global Step: 166840   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 04:59:23,872-Speed 5973.49 samples/sec   Loss 3.4524   LearningRate 0.0189   Epoch: 16   Global Step: 166850   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 04:59:30,734-Speed 5970.75 samples/sec   Loss 3.4219   LearningRate 0.0189   Epoch: 16   Global Step: 166860   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 04:59:37,595-Speed 5971.20 samples/sec   Loss 3.4362   LearningRate 0.0188   Epoch: 16   Global Step: 166870   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-01-09 04:59:44,460-Speed 5968.01 samples/sec   Loss 3.4188   LearningRate 0.0188   Epoch: 16   Global Step: 166880   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:59:51,320-Speed 5971.75 samples/sec   Loss 3.4461   LearningRate 0.0188   Epoch: 16   Global Step: 166890   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 04:59:58,182-Speed 5970.91 samples/sec   Loss 3.4341   LearningRate 0.0188   Epoch: 16   Global Step: 166900   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:00:05,057-Speed 5958.48 samples/sec   Loss 3.4044   LearningRate 0.0188   Epoch: 16   Global Step: 166910   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:00:11,930-Speed 5961.30 samples/sec   Loss 3.4269   LearningRate 0.0188   Epoch: 16   Global Step: 166920   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:00:18,788-Speed 5973.91 samples/sec   Loss 3.3778   LearningRate 0.0188   Epoch: 16   Global Step: 166930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:00:25,640-Speed 5979.20 samples/sec   Loss 3.4798   LearningRate 0.0188   Epoch: 16   Global Step: 166940   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:00:32,522-Speed 5952.68 samples/sec   Loss 3.4219   LearningRate 0.0188   Epoch: 16   Global Step: 166950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:00:39,470-Speed 5896.30 samples/sec   Loss 3.4763   LearningRate 0.0188   Epoch: 16   Global Step: 166960   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:00:46,356-Speed 5949.47 samples/sec   Loss 3.4463   LearningRate 0.0188   Epoch: 16   Global Step: 166970   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:00:53,211-Speed 5976.82 samples/sec   Loss 3.4302   LearningRate 0.0187   Epoch: 16   Global Step: 166980   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:01:00,092-Speed 5954.23 samples/sec   Loss 3.3958   LearningRate 0.0187   Epoch: 16   Global Step: 166990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:01:06,954-Speed 5970.28 samples/sec   Loss 3.4348   LearningRate 0.0187   Epoch: 16   Global Step: 167000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:01:13,822-Speed 5964.57 samples/sec   Loss 3.4605   LearningRate 0.0187   Epoch: 16   Global Step: 167010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:01:20,696-Speed 5967.71 samples/sec   Loss 3.4249   LearningRate 0.0187   Epoch: 16   Global Step: 167020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:01:27,574-Speed 5956.66 samples/sec   Loss 3.4043   LearningRate 0.0187   Epoch: 16   Global Step: 167030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:01:34,436-Speed 5970.24 samples/sec   Loss 3.4195   LearningRate 0.0187   Epoch: 16   Global Step: 167040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:01:41,289-Speed 5978.20 samples/sec   Loss 3.4066   LearningRate 0.0187   Epoch: 16   Global Step: 167050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:01:48,166-Speed 5957.01 samples/sec   Loss 3.4771   LearningRate 0.0187   Epoch: 16   Global Step: 167060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:01:55,020-Speed 5977.78 samples/sec   Loss 3.4226   LearningRate 0.0187   Epoch: 16   Global Step: 167070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:02:01,890-Speed 5964.55 samples/sec   Loss 3.4314   LearningRate 0.0186   Epoch: 16   Global Step: 167080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 05:02:08,763-Speed 5960.11 samples/sec   Loss 3.4559   LearningRate 0.0186   Epoch: 16   Global Step: 167090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 05:02:15,610-Speed 5983.23 samples/sec   Loss 3.4168   LearningRate 0.0186   Epoch: 16   Global Step: 167100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:02:22,470-Speed 5972.29 samples/sec   Loss 3.4099   LearningRate 0.0186   Epoch: 16   Global Step: 167110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:02:29,343-Speed 5959.50 samples/sec   Loss 3.4199   LearningRate 0.0186   Epoch: 16   Global Step: 167120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:02:36,207-Speed 5968.91 samples/sec   Loss 3.4188   LearningRate 0.0186   Epoch: 16   Global Step: 167130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:02:43,075-Speed 5965.65 samples/sec   Loss 3.3593   LearningRate 0.0186   Epoch: 16   Global Step: 167140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:02:49,942-Speed 5965.61 samples/sec   Loss 3.4443   LearningRate 0.0186   Epoch: 16   Global Step: 167150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:02:56,907-Speed 5885.31 samples/sec   Loss 3.4487   LearningRate 0.0186   Epoch: 16   Global Step: 167160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:03:03,761-Speed 5977.19 samples/sec   Loss 3.3726   LearningRate 0.0186   Epoch: 16   Global Step: 167170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:03:10,626-Speed 5967.82 samples/sec   Loss 3.4249   LearningRate 0.0186   Epoch: 16   Global Step: 167180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:03:17,488-Speed 5969.76 samples/sec   Loss 3.3976   LearningRate 0.0185   Epoch: 16   Global Step: 167190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:03:24,337-Speed 5982.36 samples/sec   Loss 3.3986   LearningRate 0.0185   Epoch: 16   Global Step: 167200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:03:31,212-Speed 5958.27 samples/sec   Loss 3.3942   LearningRate 0.0185   Epoch: 16   Global Step: 167210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:03:38,082-Speed 5963.32 samples/sec   Loss 3.3762   LearningRate 0.0185   Epoch: 16   Global Step: 167220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:03:44,935-Speed 5977.98 samples/sec   Loss 3.4467   LearningRate 0.0185   Epoch: 16   Global Step: 167230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:03:51,810-Speed 5959.11 samples/sec   Loss 3.4380   LearningRate 0.0185   Epoch: 16   Global Step: 167240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:03:58,677-Speed 5965.57 samples/sec   Loss 3.3996   LearningRate 0.0185   Epoch: 16   Global Step: 167250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:04:05,543-Speed 5966.91 samples/sec   Loss 3.3957   LearningRate 0.0185   Epoch: 16   Global Step: 167260   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:04:12,433-Speed 5946.45 samples/sec   Loss 3.4381   LearningRate 0.0185   Epoch: 16   Global Step: 167270   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:04:19,293-Speed 5972.00 samples/sec   Loss 3.4152   LearningRate 0.0185   Epoch: 16   Global Step: 167280   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:04:26,159-Speed 5967.14 samples/sec   Loss 3.4550   LearningRate 0.0185   Epoch: 16   Global Step: 167290   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:04:33,035-Speed 5957.98 samples/sec   Loss 3.3921   LearningRate 0.0184   Epoch: 16   Global Step: 167300   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:04:39,913-Speed 5956.55 samples/sec   Loss 3.4015   LearningRate 0.0184   Epoch: 16   Global Step: 167310   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:04:46,762-Speed 5981.57 samples/sec   Loss 3.3913   LearningRate 0.0184   Epoch: 16   Global Step: 167320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:04:53,608-Speed 5984.34 samples/sec   Loss 3.3896   LearningRate 0.0184   Epoch: 16   Global Step: 167330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:05:00,457-Speed 5981.72 samples/sec   Loss 3.4412   LearningRate 0.0184   Epoch: 16   Global Step: 167340   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:05:07,360-Speed 5934.52 samples/sec   Loss 3.4459   LearningRate 0.0184   Epoch: 16   Global Step: 167350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:05:14,202-Speed 5987.85 samples/sec   Loss 3.3592   LearningRate 0.0184   Epoch: 16   Global Step: 167360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:05:21,053-Speed 5979.32 samples/sec   Loss 3.3993   LearningRate 0.0184   Epoch: 16   Global Step: 167370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:05:27,907-Speed 5977.84 samples/sec   Loss 3.3972   LearningRate 0.0184   Epoch: 16   Global Step: 167380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:05:34,775-Speed 5964.89 samples/sec   Loss 3.3952   LearningRate 0.0184   Epoch: 16   Global Step: 167390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:05:41,636-Speed 5970.94 samples/sec   Loss 3.4426   LearningRate 0.0184   Epoch: 16   Global Step: 167400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:05:48,520-Speed 5950.69 samples/sec   Loss 3.4170   LearningRate 0.0183   Epoch: 16   Global Step: 167410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:05:55,366-Speed 5984.94 samples/sec   Loss 3.4170   LearningRate 0.0183   Epoch: 16   Global Step: 167420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:06:02,265-Speed 5938.02 samples/sec   Loss 3.4000   LearningRate 0.0183   Epoch: 16   Global Step: 167430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:06:09,165-Speed 5938.66 samples/sec   Loss 3.3901   LearningRate 0.0183   Epoch: 16   Global Step: 167440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:06:16,023-Speed 5974.36 samples/sec   Loss 3.3913   LearningRate 0.0183   Epoch: 16   Global Step: 167450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:06:22,902-Speed 5955.50 samples/sec   Loss 3.3762   LearningRate 0.0183   Epoch: 16   Global Step: 167460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 05:06:29,779-Speed 5957.04 samples/sec   Loss 3.4101   LearningRate 0.0183   Epoch: 16   Global Step: 167470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 05:06:36,643-Speed 5968.05 samples/sec   Loss 3.4307   LearningRate 0.0183   Epoch: 16   Global Step: 167480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 05:06:43,495-Speed 5980.37 samples/sec   Loss 3.4272   LearningRate 0.0183   Epoch: 16   Global Step: 167490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:06:50,366-Speed 5962.58 samples/sec   Loss 3.4275   LearningRate 0.0183   Epoch: 16   Global Step: 167500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:06:57,227-Speed 5970.50 samples/sec   Loss 3.3935   LearningRate 0.0183   Epoch: 16   Global Step: 167510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:07:04,083-Speed 5976.17 samples/sec   Loss 3.4010   LearningRate 0.0182   Epoch: 16   Global Step: 167520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:07:10,935-Speed 5978.65 samples/sec   Loss 3.3423   LearningRate 0.0182   Epoch: 16   Global Step: 167530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:07:17,789-Speed 5977.11 samples/sec   Loss 3.3960   LearningRate 0.0182   Epoch: 16   Global Step: 167540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:07:24,652-Speed 5970.05 samples/sec   Loss 3.3925   LearningRate 0.0182   Epoch: 16   Global Step: 167550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:07:31,503-Speed 5980.48 samples/sec   Loss 3.4217   LearningRate 0.0182   Epoch: 16   Global Step: 167560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:07:38,362-Speed 5972.64 samples/sec   Loss 3.3929   LearningRate 0.0182   Epoch: 16   Global Step: 167570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:07:45,213-Speed 5980.06 samples/sec   Loss 3.3999   LearningRate 0.0182   Epoch: 16   Global Step: 167580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:07:52,067-Speed 5977.01 samples/sec   Loss 3.3786   LearningRate 0.0182   Epoch: 16   Global Step: 167590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 05:07:58,918-Speed 5979.77 samples/sec   Loss 3.3556   LearningRate 0.0182   Epoch: 16   Global Step: 167600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:08:05,780-Speed 5970.89 samples/sec   Loss 3.3431   LearningRate 0.0182   Epoch: 16   Global Step: 167610   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:08:12,739-Speed 5887.31 samples/sec   Loss 3.3793   LearningRate 0.0182   Epoch: 16   Global Step: 167620   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:08:19,639-Speed 5936.60 samples/sec   Loss 3.3636   LearningRate 0.0181   Epoch: 16   Global Step: 167630   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:08:26,556-Speed 5923.27 samples/sec   Loss 3.3768   LearningRate 0.0181   Epoch: 16   Global Step: 167640   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:08:33,512-Speed 5889.54 samples/sec   Loss 3.3522   LearningRate 0.0181   Epoch: 16   Global Step: 167650   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:08:40,432-Speed 5920.40 samples/sec   Loss 3.3839   LearningRate 0.0181   Epoch: 16   Global Step: 167660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:08:47,439-Speed 5846.92 samples/sec   Loss 3.3939   LearningRate 0.0181   Epoch: 16   Global Step: 167670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:08:54,305-Speed 5970.03 samples/sec   Loss 3.3764   LearningRate 0.0181   Epoch: 16   Global Step: 167680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:09:01,211-Speed 5931.82 samples/sec   Loss 3.4037   LearningRate 0.0181   Epoch: 16   Global Step: 167690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:09:08,110-Speed 5937.54 samples/sec   Loss 3.3836   LearningRate 0.0181   Epoch: 16   Global Step: 167700   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:09:14,963-Speed 5978.65 samples/sec   Loss 3.3985   LearningRate 0.0181   Epoch: 16   Global Step: 167710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:09:21,813-Speed 5980.23 samples/sec   Loss 3.3539   LearningRate 0.0181   Epoch: 16   Global Step: 167720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:09:28,678-Speed 5968.16 samples/sec   Loss 3.4336   LearningRate 0.0181   Epoch: 16   Global Step: 167730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:09:35,537-Speed 5972.38 samples/sec   Loss 3.3489   LearningRate 0.0180   Epoch: 16   Global Step: 167740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:09:42,382-Speed 5985.21 samples/sec   Loss 3.4011   LearningRate 0.0180   Epoch: 16   Global Step: 167750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:09:49,236-Speed 5978.98 samples/sec   Loss 3.3530   LearningRate 0.0180   Epoch: 16   Global Step: 167760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:09:56,099-Speed 5969.13 samples/sec   Loss 3.4045   LearningRate 0.0180   Epoch: 16   Global Step: 167770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:10:02,965-Speed 5967.03 samples/sec   Loss 3.3730   LearningRate 0.0180   Epoch: 16   Global Step: 167780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:10:09,848-Speed 5952.02 samples/sec   Loss 3.3866   LearningRate 0.0180   Epoch: 16   Global Step: 167790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:10:16,751-Speed 5934.43 samples/sec   Loss 3.3564   LearningRate 0.0180   Epoch: 16   Global Step: 167800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:10:23,607-Speed 5975.86 samples/sec   Loss 3.3512   LearningRate 0.0180   Epoch: 16   Global Step: 167810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 05:10:30,455-Speed 5982.02 samples/sec   Loss 3.3632   LearningRate 0.0180   Epoch: 16   Global Step: 167820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 05:10:37,298-Speed 5986.96 samples/sec   Loss 3.3705   LearningRate 0.0180   Epoch: 16   Global Step: 167830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:10:44,170-Speed 5961.62 samples/sec   Loss 3.3908   LearningRate 0.0180   Epoch: 16   Global Step: 167840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:10:51,022-Speed 5979.06 samples/sec   Loss 3.3398   LearningRate 0.0179   Epoch: 16   Global Step: 167850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:10:57,867-Speed 5985.17 samples/sec   Loss 3.3991   LearningRate 0.0179   Epoch: 16   Global Step: 167860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:11:04,729-Speed 5970.29 samples/sec   Loss 3.3892   LearningRate 0.0179   Epoch: 16   Global Step: 167870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:11:11,620-Speed 5945.60 samples/sec   Loss 3.3823   LearningRate 0.0179   Epoch: 16   Global Step: 167880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:11:18,496-Speed 5958.40 samples/sec   Loss 3.3780   LearningRate 0.0179   Epoch: 16   Global Step: 167890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:11:25,350-Speed 5976.87 samples/sec   Loss 3.3497   LearningRate 0.0179   Epoch: 16   Global Step: 167900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:11:32,228-Speed 5956.29 samples/sec   Loss 3.3690   LearningRate 0.0179   Epoch: 16   Global Step: 167910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:11:39,118-Speed 5946.58 samples/sec   Loss 3.3967   LearningRate 0.0179   Epoch: 16   Global Step: 167920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:11:45,976-Speed 5973.63 samples/sec   Loss 3.3685   LearningRate 0.0179   Epoch: 16   Global Step: 167930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 05:11:52,842-Speed 5966.93 samples/sec   Loss 3.3491   LearningRate 0.0179   Epoch: 16   Global Step: 167940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:11:59,682-Speed 5989.23 samples/sec   Loss 3.3258   LearningRate 0.0179   Epoch: 16   Global Step: 167950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:12:06,554-Speed 5961.58 samples/sec   Loss 3.3640   LearningRate 0.0178   Epoch: 16   Global Step: 167960   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:12:13,402-Speed 5982.35 samples/sec   Loss 3.3517   LearningRate 0.0178   Epoch: 16   Global Step: 167970   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:12:20,257-Speed 5977.08 samples/sec   Loss 3.3309   LearningRate 0.0178   Epoch: 16   Global Step: 167980   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:12:27,121-Speed 5968.64 samples/sec   Loss 3.3382   LearningRate 0.0178   Epoch: 16   Global Step: 167990   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:12:33,982-Speed 5971.37 samples/sec   Loss 3.3461   LearningRate 0.0178   Epoch: 16   Global Step: 168000   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:12:40,830-Speed 5982.06 samples/sec   Loss 3.3611   LearningRate 0.0178   Epoch: 16   Global Step: 168010   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:12:47,686-Speed 5975.09 samples/sec   Loss 3.3931   LearningRate 0.0178   Epoch: 16   Global Step: 168020   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:12:54,534-Speed 5984.13 samples/sec   Loss 3.3156   LearningRate 0.0178   Epoch: 16   Global Step: 168030   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:13:01,398-Speed 5968.99 samples/sec   Loss 3.3848   LearningRate 0.0178   Epoch: 16   Global Step: 168040   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:13:08,267-Speed 5964.06 samples/sec   Loss 3.3221   LearningRate 0.0178   Epoch: 16   Global Step: 168050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:13:15,142-Speed 5959.65 samples/sec   Loss 3.3302   LearningRate 0.0178   Epoch: 16   Global Step: 168060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:13:22,002-Speed 5971.64 samples/sec   Loss 3.3504   LearningRate 0.0177   Epoch: 16   Global Step: 168070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:13:28,861-Speed 5972.33 samples/sec   Loss 3.3996   LearningRate 0.0177   Epoch: 16   Global Step: 168080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:13:35,742-Speed 5955.70 samples/sec   Loss 3.3290   LearningRate 0.0177   Epoch: 16   Global Step: 168090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:13:42,615-Speed 5960.72 samples/sec   Loss 3.3021   LearningRate 0.0177   Epoch: 16   Global Step: 168100   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:13:49,468-Speed 5977.98 samples/sec   Loss 3.3578   LearningRate 0.0177   Epoch: 16   Global Step: 168110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:13:56,328-Speed 5972.31 samples/sec   Loss 3.3658   LearningRate 0.0177   Epoch: 16   Global Step: 168120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:14:03,186-Speed 5974.04 samples/sec   Loss 3.3491   LearningRate 0.0177   Epoch: 16   Global Step: 168130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:14:10,061-Speed 5959.06 samples/sec   Loss 3.3717   LearningRate 0.0177   Epoch: 16   Global Step: 168140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:14:16,914-Speed 5977.67 samples/sec   Loss 3.3215   LearningRate 0.0177   Epoch: 16   Global Step: 168150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:14:23,800-Speed 5950.68 samples/sec   Loss 3.3160   LearningRate 0.0177   Epoch: 16   Global Step: 168160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:14:30,648-Speed 5981.78 samples/sec   Loss 3.3660   LearningRate 0.0177   Epoch: 16   Global Step: 168170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:14:37,521-Speed 5961.24 samples/sec   Loss 3.3258   LearningRate 0.0176   Epoch: 16   Global Step: 168180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:14:44,429-Speed 5931.15 samples/sec   Loss 3.3595   LearningRate 0.0176   Epoch: 16   Global Step: 168190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:14:51,294-Speed 5966.82 samples/sec   Loss 3.3512   LearningRate 0.0176   Epoch: 16   Global Step: 168200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:14:58,158-Speed 5968.54 samples/sec   Loss 3.3589   LearningRate 0.0176   Epoch: 16   Global Step: 168210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:15:05,051-Speed 5944.43 samples/sec   Loss 3.3613   LearningRate 0.0176   Epoch: 16   Global Step: 168220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:15:11,999-Speed 5896.25 samples/sec   Loss 3.3341   LearningRate 0.0176   Epoch: 16   Global Step: 168230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:15:18,863-Speed 5969.01 samples/sec   Loss 3.3246   LearningRate 0.0176   Epoch: 16   Global Step: 168240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:15:25,735-Speed 5962.21 samples/sec   Loss 3.3425   LearningRate 0.0176   Epoch: 16   Global Step: 168250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:15:32,618-Speed 5951.87 samples/sec   Loss 3.3147   LearningRate 0.0176   Epoch: 16   Global Step: 168260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:15:39,488-Speed 5964.29 samples/sec   Loss 3.3423   LearningRate 0.0176   Epoch: 16   Global Step: 168270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:15:46,366-Speed 5956.30 samples/sec   Loss 3.3648   LearningRate 0.0176   Epoch: 16   Global Step: 168280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:15:53,245-Speed 5954.98 samples/sec   Loss 3.3453   LearningRate 0.0175   Epoch: 16   Global Step: 168290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:16:00,110-Speed 5971.67 samples/sec   Loss 3.3315   LearningRate 0.0175   Epoch: 16   Global Step: 168300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:16:06,988-Speed 5957.37 samples/sec   Loss 3.3312   LearningRate 0.0175   Epoch: 16   Global Step: 168310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:16:13,861-Speed 5960.34 samples/sec   Loss 3.3687   LearningRate 0.0175   Epoch: 16   Global Step: 168320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:16:20,708-Speed 5984.04 samples/sec   Loss 3.3438   LearningRate 0.0175   Epoch: 16   Global Step: 168330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:16:27,582-Speed 5959.05 samples/sec   Loss 3.3105   LearningRate 0.0175   Epoch: 16   Global Step: 168340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:16:34,428-Speed 5984.78 samples/sec   Loss 3.3243   LearningRate 0.0175   Epoch: 16   Global Step: 168350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:16:41,296-Speed 5965.20 samples/sec   Loss 3.3698   LearningRate 0.0175   Epoch: 16   Global Step: 168360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:16:48,143-Speed 5984.26 samples/sec   Loss 3.3658   LearningRate 0.0175   Epoch: 16   Global Step: 168370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:16:55,011-Speed 5965.12 samples/sec   Loss 3.3483   LearningRate 0.0175   Epoch: 16   Global Step: 168380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:17:01,873-Speed 5972.07 samples/sec   Loss 3.3383   LearningRate 0.0175   Epoch: 16   Global Step: 168390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:17:08,717-Speed 5985.64 samples/sec   Loss 3.3797   LearningRate 0.0174   Epoch: 16   Global Step: 168400   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:17:15,563-Speed 5983.95 samples/sec   Loss 3.3543   LearningRate 0.0174   Epoch: 16   Global Step: 168410   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:17:22,435-Speed 5961.87 samples/sec   Loss 3.2987   LearningRate 0.0174   Epoch: 16   Global Step: 168420   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:17:29,299-Speed 5968.85 samples/sec   Loss 3.3149   LearningRate 0.0174   Epoch: 16   Global Step: 168430   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:17:36,147-Speed 5982.00 samples/sec   Loss 3.3281   LearningRate 0.0174   Epoch: 16   Global Step: 168440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:17:42,992-Speed 5985.11 samples/sec   Loss 3.3027   LearningRate 0.0174   Epoch: 16   Global Step: 168450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:17:49,862-Speed 5963.84 samples/sec   Loss 3.3070   LearningRate 0.0174   Epoch: 16   Global Step: 168460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:17:56,708-Speed 5984.17 samples/sec   Loss 3.3134   LearningRate 0.0174   Epoch: 16   Global Step: 168470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:18:03,646-Speed 5904.78 samples/sec   Loss 3.3066   LearningRate 0.0174   Epoch: 16   Global Step: 168480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:18:10,502-Speed 5974.81 samples/sec   Loss 3.3576   LearningRate 0.0174   Epoch: 16   Global Step: 168490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:18:17,345-Speed 5986.88 samples/sec   Loss 3.3806   LearningRate 0.0174   Epoch: 16   Global Step: 168500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:18:24,195-Speed 5980.52 samples/sec   Loss 3.3587   LearningRate 0.0173   Epoch: 16   Global Step: 168510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:18:31,040-Speed 5984.84 samples/sec   Loss 3.3022   LearningRate 0.0173   Epoch: 16   Global Step: 168520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:18:37,903-Speed 5969.91 samples/sec   Loss 3.2899   LearningRate 0.0173   Epoch: 16   Global Step: 168530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:18:44,890-Speed 5863.18 samples/sec   Loss 3.3750   LearningRate 0.0173   Epoch: 16   Global Step: 168540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:18:51,741-Speed 5979.94 samples/sec   Loss 3.3147   LearningRate 0.0173   Epoch: 16   Global Step: 168550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 05:18:58,578-Speed 5991.95 samples/sec   Loss 3.3295   LearningRate 0.0173   Epoch: 16   Global Step: 168560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:19:05,455-Speed 5957.79 samples/sec   Loss 3.2963   LearningRate 0.0173   Epoch: 16   Global Step: 168570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:19:12,302-Speed 5983.27 samples/sec   Loss 3.2869   LearningRate 0.0173   Epoch: 16   Global Step: 168580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:19:19,151-Speed 5981.76 samples/sec   Loss 3.2720   LearningRate 0.0173   Epoch: 16   Global Step: 168590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:19:26,012-Speed 5971.19 samples/sec   Loss 3.3528   LearningRate 0.0173   Epoch: 16   Global Step: 168600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:19:32,871-Speed 5972.67 samples/sec   Loss 3.3331   LearningRate 0.0173   Epoch: 16   Global Step: 168610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:19:39,729-Speed 5974.09 samples/sec   Loss 3.3298   LearningRate 0.0173   Epoch: 16   Global Step: 168620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:19:46,595-Speed 5967.10 samples/sec   Loss 3.3209   LearningRate 0.0172   Epoch: 16   Global Step: 168630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:19:53,456-Speed 5970.91 samples/sec   Loss 3.3105   LearningRate 0.0172   Epoch: 16   Global Step: 168640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:20:00,315-Speed 5972.68 samples/sec   Loss 3.2893   LearningRate 0.0172   Epoch: 16   Global Step: 168650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:20:07,160-Speed 5985.67 samples/sec   Loss 3.2986   LearningRate 0.0172   Epoch: 16   Global Step: 168660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:20:14,016-Speed 5975.22 samples/sec   Loss 3.3038   LearningRate 0.0172   Epoch: 16   Global Step: 168670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:20:20,892-Speed 5958.23 samples/sec   Loss 3.3114   LearningRate 0.0172   Epoch: 16   Global Step: 168680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:20:27,734-Speed 5987.13 samples/sec   Loss 3.3261   LearningRate 0.0172   Epoch: 16   Global Step: 168690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:20:34,596-Speed 5970.36 samples/sec   Loss 3.3019   LearningRate 0.0172   Epoch: 16   Global Step: 168700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:20:41,455-Speed 5972.81 samples/sec   Loss 3.2894   LearningRate 0.0172   Epoch: 16   Global Step: 168710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:20:48,314-Speed 5972.73 samples/sec   Loss 3.3035   LearningRate 0.0172   Epoch: 16   Global Step: 168720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:20:55,188-Speed 5959.48 samples/sec   Loss 3.2959   LearningRate 0.0172   Epoch: 16   Global Step: 168730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:21:02,041-Speed 5978.14 samples/sec   Loss 3.3372   LearningRate 0.0171   Epoch: 16   Global Step: 168740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:21:08,905-Speed 5968.21 samples/sec   Loss 3.3358   LearningRate 0.0171   Epoch: 16   Global Step: 168750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:21:15,759-Speed 5977.74 samples/sec   Loss 3.3318   LearningRate 0.0171   Epoch: 16   Global Step: 168760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 05:21:22,607-Speed 5981.99 samples/sec   Loss 3.3287   LearningRate 0.0171   Epoch: 16   Global Step: 168770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:21:29,487-Speed 5954.97 samples/sec   Loss 3.3485   LearningRate 0.0171   Epoch: 16   Global Step: 168780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:21:36,356-Speed 5964.51 samples/sec   Loss 3.2984   LearningRate 0.0171   Epoch: 16   Global Step: 168790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:21:43,216-Speed 5972.27 samples/sec   Loss 3.2926   LearningRate 0.0171   Epoch: 16   Global Step: 168800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:21:50,184-Speed 5879.16 samples/sec   Loss 3.2740   LearningRate 0.0171   Epoch: 16   Global Step: 168810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:21:57,029-Speed 5985.38 samples/sec   Loss 3.3012   LearningRate 0.0171   Epoch: 16   Global Step: 168820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:22:03,888-Speed 5972.07 samples/sec   Loss 3.3105   LearningRate 0.0171   Epoch: 16   Global Step: 168830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:22:10,733-Speed 5984.79 samples/sec   Loss 3.3167   LearningRate 0.0171   Epoch: 16   Global Step: 168840   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:22:17,604-Speed 5962.65 samples/sec   Loss 3.3113   LearningRate 0.0170   Epoch: 16   Global Step: 168850   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:22:24,469-Speed 5967.89 samples/sec   Loss 3.3086   LearningRate 0.0170   Epoch: 16   Global Step: 168860   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:22:31,327-Speed 5974.03 samples/sec   Loss 3.2941   LearningRate 0.0170   Epoch: 16   Global Step: 168870   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:22:38,182-Speed 5976.12 samples/sec   Loss 3.2829   LearningRate 0.0170   Epoch: 16   Global Step: 168880   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:22:45,066-Speed 5951.36 samples/sec   Loss 3.2843   LearningRate 0.0170   Epoch: 16   Global Step: 168890   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:22:51,930-Speed 5968.56 samples/sec   Loss 3.3165   LearningRate 0.0170   Epoch: 16   Global Step: 168900   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:22:58,796-Speed 5966.39 samples/sec   Loss 3.2983   LearningRate 0.0170   Epoch: 16   Global Step: 168910   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:23:05,687-Speed 5945.05 samples/sec   Loss 3.3470   LearningRate 0.0170   Epoch: 16   Global Step: 168920   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:23:12,554-Speed 5966.01 samples/sec   Loss 3.3219   LearningRate 0.0170   Epoch: 16   Global Step: 168930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-01-09 05:23:19,399-Speed 5986.29 samples/sec   Loss 3.3152   LearningRate 0.0170   Epoch: 16   Global Step: 168940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:23:26,252-Speed 5977.22 samples/sec   Loss 3.3094   LearningRate 0.0170   Epoch: 16   Global Step: 168950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:23:33,122-Speed 5965.16 samples/sec   Loss 3.2982   LearningRate 0.0169   Epoch: 16   Global Step: 168960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:23:39,980-Speed 5974.47 samples/sec   Loss 3.3018   LearningRate 0.0169   Epoch: 16   Global Step: 168970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:23:46,831-Speed 5979.95 samples/sec   Loss 3.2591   LearningRate 0.0169   Epoch: 16   Global Step: 168980   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:23:53,699-Speed 5965.62 samples/sec   Loss 3.2740   LearningRate 0.0169   Epoch: 16   Global Step: 168990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:24:00,555-Speed 5976.13 samples/sec   Loss 3.2701   LearningRate 0.0169   Epoch: 16   Global Step: 169000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:24:07,428-Speed 5959.85 samples/sec   Loss 3.3040   LearningRate 0.0169   Epoch: 16   Global Step: 169010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:24:14,292-Speed 5968.89 samples/sec   Loss 3.2644   LearningRate 0.0169   Epoch: 16   Global Step: 169020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:24:21,160-Speed 5965.14 samples/sec   Loss 3.3070   LearningRate 0.0169   Epoch: 16   Global Step: 169030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:24:28,017-Speed 5974.02 samples/sec   Loss 3.2762   LearningRate 0.0169   Epoch: 16   Global Step: 169040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-01-09 05:24:34,894-Speed 5957.19 samples/sec   Loss 3.2770   LearningRate 0.0169   Epoch: 16   Global Step: 169050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:24:41,781-Speed 5948.39 samples/sec   Loss 3.2544   LearningRate 0.0169   Epoch: 16   Global Step: 169060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:24:48,635-Speed 5977.15 samples/sec   Loss 3.2630   LearningRate 0.0169   Epoch: 16   Global Step: 169070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:24:55,507-Speed 5962.33 samples/sec   Loss 3.2767   LearningRate 0.0168   Epoch: 16   Global Step: 169080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:25:02,384-Speed 5957.57 samples/sec   Loss 3.2769   LearningRate 0.0168   Epoch: 16   Global Step: 169090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:25:09,284-Speed 5936.58 samples/sec   Loss 3.2649   LearningRate 0.0168   Epoch: 16   Global Step: 169100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:25:16,192-Speed 5930.82 samples/sec   Loss 3.2433   LearningRate 0.0168   Epoch: 16   Global Step: 169110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:25:23,125-Speed 5909.34 samples/sec   Loss 3.2807   LearningRate 0.0168   Epoch: 16   Global Step: 169120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-01-09 05:25:30,024-Speed 5937.77 samples/sec   Loss 3.2812   LearningRate 0.0168   Epoch: 16   Global Step: 169130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:25:36,916-Speed 5944.01 samples/sec   Loss 3.3044   LearningRate 0.0168   Epoch: 16   Global Step: 169140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:25:43,817-Speed 5936.76 samples/sec   Loss 3.2666   LearningRate 0.0168   Epoch: 16   Global Step: 169150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:25:50,689-Speed 5961.60 samples/sec   Loss 3.3305   LearningRate 0.0168   Epoch: 16   Global Step: 169160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:25:57,537-Speed 5982.70 samples/sec   Loss 3.3176   LearningRate 0.0168   Epoch: 16   Global Step: 169170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:26:04,387-Speed 5980.56 samples/sec   Loss 3.2633   LearningRate 0.0168   Epoch: 16   Global Step: 169180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:26:11,242-Speed 5976.02 samples/sec   Loss 3.2772   LearningRate 0.0167   Epoch: 16   Global Step: 169190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:26:18,100-Speed 5973.93 samples/sec   Loss 3.2711   LearningRate 0.0167   Epoch: 16   Global Step: 169200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:26:24,999-Speed 5939.16 samples/sec   Loss 3.2392   LearningRate 0.0167   Epoch: 16   Global Step: 169210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:26:31,899-Speed 5936.90 samples/sec   Loss 3.2626   LearningRate 0.0167   Epoch: 16   Global Step: 169220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:26:38,751-Speed 5979.74 samples/sec   Loss 3.3152   LearningRate 0.0167   Epoch: 16   Global Step: 169230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:26:45,597-Speed 5983.77 samples/sec   Loss 3.2477   LearningRate 0.0167   Epoch: 16   Global Step: 169240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:26:52,456-Speed 5973.24 samples/sec   Loss 3.2524   LearningRate 0.0167   Epoch: 16   Global Step: 169250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:26:59,312-Speed 5975.73 samples/sec   Loss 3.2982   LearningRate 0.0167   Epoch: 16   Global Step: 169260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:27:06,175-Speed 5969.40 samples/sec   Loss 3.2639   LearningRate 0.0167   Epoch: 16   Global Step: 169270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:27:13,042-Speed 5966.19 samples/sec   Loss 3.2747   LearningRate 0.0167   Epoch: 16   Global Step: 169280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:27:19,908-Speed 5966.52 samples/sec   Loss 3.2840   LearningRate 0.0167   Epoch: 16   Global Step: 169290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:27:26,761-Speed 5980.98 samples/sec   Loss 3.2318   LearningRate 0.0167   Epoch: 16   Global Step: 169300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:27:33,633-Speed 5961.50 samples/sec   Loss 3.2740   LearningRate 0.0166   Epoch: 16   Global Step: 169310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:27:40,517-Speed 5951.38 samples/sec   Loss 3.2762   LearningRate 0.0166   Epoch: 16   Global Step: 169320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:27:47,385-Speed 5964.99 samples/sec   Loss 3.2514   LearningRate 0.0166   Epoch: 16   Global Step: 169330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:27:54,225-Speed 5989.32 samples/sec   Loss 3.2623   LearningRate 0.0166   Epoch: 16   Global Step: 169340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:28:01,077-Speed 5979.03 samples/sec   Loss 3.2208   LearningRate 0.0166   Epoch: 16   Global Step: 169350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:28:07,951-Speed 5959.77 samples/sec   Loss 3.2483   LearningRate 0.0166   Epoch: 16   Global Step: 169360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:28:14,808-Speed 5974.26 samples/sec   Loss 3.2640   LearningRate 0.0166   Epoch: 16   Global Step: 169370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:28:21,659-Speed 5980.70 samples/sec   Loss 3.2457   LearningRate 0.0166   Epoch: 16   Global Step: 169380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:28:28,515-Speed 5975.43 samples/sec   Loss 3.2558   LearningRate 0.0166   Epoch: 16   Global Step: 169390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:28:35,400-Speed 5950.04 samples/sec   Loss 3.2612   LearningRate 0.0166   Epoch: 16   Global Step: 169400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:28:42,253-Speed 5979.14 samples/sec   Loss 3.2852   LearningRate 0.0166   Epoch: 16   Global Step: 169410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:28:49,114-Speed 5971.14 samples/sec   Loss 3.2572   LearningRate 0.0165   Epoch: 16   Global Step: 169420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:28:55,986-Speed 5961.42 samples/sec   Loss 3.2397   LearningRate 0.0165   Epoch: 16   Global Step: 169430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:29:02,847-Speed 5970.81 samples/sec   Loss 3.2831   LearningRate 0.0165   Epoch: 16   Global Step: 169440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-09 05:29:09,690-Speed 5988.94 samples/sec   Loss 3.2911   LearningRate 0.0165   Epoch: 16   Global Step: 169450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:29:16,560-Speed 5963.12 samples/sec   Loss 3.2610   LearningRate 0.0165   Epoch: 16   Global Step: 169460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:29:23,420-Speed 5972.44 samples/sec   Loss 3.2463   LearningRate 0.0165   Epoch: 16   Global Step: 169470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:29:30,268-Speed 5982.25 samples/sec   Loss 3.2565   LearningRate 0.0165   Epoch: 16   Global Step: 169480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:29:37,164-Speed 5940.39 samples/sec   Loss 3.2343   LearningRate 0.0165   Epoch: 16   Global Step: 169490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:29:44,016-Speed 5978.76 samples/sec   Loss 3.2448   LearningRate 0.0165   Epoch: 16   Global Step: 169500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:29:50,882-Speed 5967.17 samples/sec   Loss 3.3016   LearningRate 0.0165   Epoch: 16   Global Step: 169510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:29:57,753-Speed 5962.19 samples/sec   Loss 3.2844   LearningRate 0.0165   Epoch: 16   Global Step: 169520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:30:04,603-Speed 5980.65 samples/sec   Loss 3.2646   LearningRate 0.0165   Epoch: 16   Global Step: 169530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:30:11,446-Speed 5987.34 samples/sec   Loss 3.2422   LearningRate 0.0164   Epoch: 16   Global Step: 169540   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:30:18,310-Speed 5967.82 samples/sec   Loss 3.2522   LearningRate 0.0164   Epoch: 16   Global Step: 169550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:30:25,175-Speed 5969.32 samples/sec   Loss 3.2700   LearningRate 0.0164   Epoch: 16   Global Step: 169560   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:30:32,032-Speed 5974.75 samples/sec   Loss 3.2472   LearningRate 0.0164   Epoch: 16   Global Step: 169570   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:30:38,877-Speed 5985.27 samples/sec   Loss 3.2974   LearningRate 0.0164   Epoch: 16   Global Step: 169580   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:30:45,734-Speed 5974.37 samples/sec   Loss 3.2444   LearningRate 0.0164   Epoch: 16   Global Step: 169590   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:30:52,590-Speed 5975.31 samples/sec   Loss 3.2393   LearningRate 0.0164   Epoch: 16   Global Step: 169600   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:30:59,465-Speed 5959.14 samples/sec   Loss 3.3322   LearningRate 0.0164   Epoch: 16   Global Step: 169610   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:31:06,312-Speed 5982.85 samples/sec   Loss 3.2412   LearningRate 0.0164   Epoch: 16   Global Step: 169620   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:31:13,162-Speed 5981.26 samples/sec   Loss 3.2240   LearningRate 0.0164   Epoch: 16   Global Step: 169630   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:31:20,020-Speed 5973.99 samples/sec   Loss 3.2370   LearningRate 0.0164   Epoch: 16   Global Step: 169640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:31:26,885-Speed 5967.36 samples/sec   Loss 3.2254   LearningRate 0.0163   Epoch: 16   Global Step: 169650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:31:33,734-Speed 5981.89 samples/sec   Loss 3.2200   LearningRate 0.0163   Epoch: 16   Global Step: 169660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:31:40,585-Speed 5979.35 samples/sec   Loss 3.2544   LearningRate 0.0163   Epoch: 16   Global Step: 169670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:31:47,464-Speed 5957.70 samples/sec   Loss 3.2473   LearningRate 0.0163   Epoch: 16   Global Step: 169680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:31:54,396-Speed 5909.85 samples/sec   Loss 3.2436   LearningRate 0.0163   Epoch: 16   Global Step: 169690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:32:01,330-Speed 5908.32 samples/sec   Loss 3.2502   LearningRate 0.0163   Epoch: 16   Global Step: 169700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:32:08,251-Speed 5919.55 samples/sec   Loss 3.2031   LearningRate 0.0163   Epoch: 16   Global Step: 169710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:32:15,195-Speed 5899.31 samples/sec   Loss 3.2334   LearningRate 0.0163   Epoch: 16   Global Step: 169720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:32:22,128-Speed 5909.03 samples/sec   Loss 3.2088   LearningRate 0.0163   Epoch: 16   Global Step: 169730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:32:29,061-Speed 5910.00 samples/sec   Loss 3.2519   LearningRate 0.0163   Epoch: 16   Global Step: 169740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:32:36,001-Speed 5903.05 samples/sec   Loss 3.2193   LearningRate 0.0163   Epoch: 16   Global Step: 169750   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:32:42,863-Speed 5969.52 samples/sec   Loss 3.2149   LearningRate 0.0163   Epoch: 16   Global Step: 169760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:32:49,710-Speed 5983.40 samples/sec   Loss 3.2302   LearningRate 0.0162   Epoch: 16   Global Step: 169770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:32:56,563-Speed 5978.25 samples/sec   Loss 3.2444   LearningRate 0.0162   Epoch: 16   Global Step: 169780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:33:03,443-Speed 5954.61 samples/sec   Loss 3.2446   LearningRate 0.0162   Epoch: 16   Global Step: 169790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:33:10,329-Speed 5949.46 samples/sec   Loss 3.1972   LearningRate 0.0162   Epoch: 16   Global Step: 169800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:33:17,204-Speed 5959.56 samples/sec   Loss 3.2564   LearningRate 0.0162   Epoch: 16   Global Step: 169810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:33:24,098-Speed 5941.68 samples/sec   Loss 3.2152   LearningRate 0.0162   Epoch: 16   Global Step: 169820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:33:30,962-Speed 5968.99 samples/sec   Loss 3.1643   LearningRate 0.0162   Epoch: 16   Global Step: 169830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:33:37,810-Speed 5983.08 samples/sec   Loss 3.2632   LearningRate 0.0162   Epoch: 16   Global Step: 169840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:33:44,675-Speed 5966.60 samples/sec   Loss 3.2216   LearningRate 0.0162   Epoch: 16   Global Step: 169850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:33:51,524-Speed 5981.70 samples/sec   Loss 3.2372   LearningRate 0.0162   Epoch: 16   Global Step: 169860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:33:58,384-Speed 5972.28 samples/sec   Loss 3.2117   LearningRate 0.0162   Epoch: 16   Global Step: 169870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:34:05,265-Speed 5954.00 samples/sec   Loss 3.2107   LearningRate 0.0161   Epoch: 16   Global Step: 169880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:34:12,111-Speed 5984.05 samples/sec   Loss 3.2214   LearningRate 0.0161   Epoch: 16   Global Step: 169890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:34:18,996-Speed 5950.93 samples/sec   Loss 3.2092   LearningRate 0.0161   Epoch: 16   Global Step: 169900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:34:25,844-Speed 5982.80 samples/sec   Loss 3.1975   LearningRate 0.0161   Epoch: 16   Global Step: 169910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:34:32,700-Speed 5974.74 samples/sec   Loss 3.2490   LearningRate 0.0161   Epoch: 16   Global Step: 169920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:34:39,572-Speed 5962.20 samples/sec   Loss 3.2522   LearningRate 0.0161   Epoch: 16   Global Step: 169930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:34:46,445-Speed 5960.29 samples/sec   Loss 3.2375   LearningRate 0.0161   Epoch: 16   Global Step: 169940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:34:53,343-Speed 5939.63 samples/sec   Loss 3.2385   LearningRate 0.0161   Epoch: 16   Global Step: 169950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:35:00,239-Speed 5941.50 samples/sec   Loss 3.2423   LearningRate 0.0161   Epoch: 16   Global Step: 169960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:35:07,099-Speed 5971.69 samples/sec   Loss 3.2481   LearningRate 0.0161   Epoch: 16   Global Step: 169970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:35:13,953-Speed 5977.66 samples/sec   Loss 3.2133   LearningRate 0.0161   Epoch: 16   Global Step: 169980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:35:20,815-Speed 5970.49 samples/sec   Loss 3.2062   LearningRate 0.0161   Epoch: 16   Global Step: 169990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:35:27,709-Speed 5942.85 samples/sec   Loss 3.2472   LearningRate 0.0160   Epoch: 16   Global Step: 170000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:35:54,537-[lfw][170000]XNorm: 23.536367
Training: 2022-01-09 05:35:54,538-[lfw][170000]Accuracy-Flip: 0.99800+-0.00287
Training: 2022-01-09 05:35:54,539-[lfw][170000]Accuracy-Highest: 0.99817
Training: 2022-01-09 05:36:25,456-[cfp_fp][170000]XNorm: 21.273448
Training: 2022-01-09 05:36:25,457-[cfp_fp][170000]Accuracy-Flip: 0.99071+-0.00363
Training: 2022-01-09 05:36:25,458-[cfp_fp][170000]Accuracy-Highest: 0.99071
Training: 2022-01-09 05:36:52,343-[agedb_30][170000]XNorm: 22.943632
Training: 2022-01-09 05:36:52,344-[agedb_30][170000]Accuracy-Flip: 0.97750+-0.00523
Training: 2022-01-09 05:36:52,344-[agedb_30][170000]Accuracy-Highest: 0.98067
Training: 2022-01-09 05:36:59,223-Speed 447.60 samples/sec   Loss 3.2569   LearningRate 0.0160   Epoch: 16   Global Step: 170010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:37:06,060-Speed 5992.13 samples/sec   Loss 3.2460   LearningRate 0.0160   Epoch: 16   Global Step: 170020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:37:12,916-Speed 5974.96 samples/sec   Loss 3.2212   LearningRate 0.0160   Epoch: 16   Global Step: 170030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:37:19,772-Speed 5975.81 samples/sec   Loss 3.2334   LearningRate 0.0160   Epoch: 16   Global Step: 170040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:37:26,641-Speed 5964.28 samples/sec   Loss 3.1970   LearningRate 0.0160   Epoch: 16   Global Step: 170050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:37:33,533-Speed 5944.60 samples/sec   Loss 3.2156   LearningRate 0.0160   Epoch: 16   Global Step: 170060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:37:40,411-Speed 5956.03 samples/sec   Loss 3.2145   LearningRate 0.0160   Epoch: 16   Global Step: 170070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:37:47,294-Speed 5952.10 samples/sec   Loss 3.2137   LearningRate 0.0160   Epoch: 16   Global Step: 170080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:37:54,176-Speed 5954.03 samples/sec   Loss 3.1833   LearningRate 0.0160   Epoch: 16   Global Step: 170090   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:38:01,055-Speed 5955.59 samples/sec   Loss 3.1531   LearningRate 0.0160   Epoch: 16   Global Step: 170100   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:38:07,944-Speed 5946.55 samples/sec   Loss 3.2260   LearningRate 0.0159   Epoch: 16   Global Step: 170110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:38:14,830-Speed 5949.86 samples/sec   Loss 3.1974   LearningRate 0.0159   Epoch: 16   Global Step: 170120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:38:21,710-Speed 5955.02 samples/sec   Loss 3.2017   LearningRate 0.0159   Epoch: 16   Global Step: 170130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:38:28,571-Speed 5970.83 samples/sec   Loss 3.2278   LearningRate 0.0159   Epoch: 16   Global Step: 170140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:38:35,423-Speed 5979.28 samples/sec   Loss 3.1771   LearningRate 0.0159   Epoch: 16   Global Step: 170150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:38:42,286-Speed 5970.05 samples/sec   Loss 3.2233   LearningRate 0.0159   Epoch: 16   Global Step: 170160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:38:49,170-Speed 5951.68 samples/sec   Loss 3.1730   LearningRate 0.0159   Epoch: 16   Global Step: 170170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:38:56,048-Speed 5956.33 samples/sec   Loss 3.2394   LearningRate 0.0159   Epoch: 16   Global Step: 170180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:39:02,915-Speed 5966.44 samples/sec   Loss 3.1952   LearningRate 0.0159   Epoch: 16   Global Step: 170190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:39:09,778-Speed 5968.64 samples/sec   Loss 3.2629   LearningRate 0.0159   Epoch: 16   Global Step: 170200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:39:16,635-Speed 5975.11 samples/sec   Loss 3.2175   LearningRate 0.0159   Epoch: 16   Global Step: 170210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:39:23,495-Speed 5974.63 samples/sec   Loss 3.2159   LearningRate 0.0159   Epoch: 16   Global Step: 170220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:39:30,370-Speed 5958.63 samples/sec   Loss 3.1773   LearningRate 0.0158   Epoch: 16   Global Step: 170230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:39:37,233-Speed 5969.33 samples/sec   Loss 3.1832   LearningRate 0.0158   Epoch: 16   Global Step: 170240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:39:44,113-Speed 5955.00 samples/sec   Loss 3.1725   LearningRate 0.0158   Epoch: 16   Global Step: 170250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:39:50,970-Speed 5974.22 samples/sec   Loss 3.2130   LearningRate 0.0158   Epoch: 16   Global Step: 170260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:39:57,847-Speed 5958.13 samples/sec   Loss 3.2137   LearningRate 0.0158   Epoch: 16   Global Step: 170270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:40:04,718-Speed 5962.69 samples/sec   Loss 3.1930   LearningRate 0.0158   Epoch: 16   Global Step: 170280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:40:11,560-Speed 5987.06 samples/sec   Loss 3.2495   LearningRate 0.0158   Epoch: 16   Global Step: 170290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:40:18,425-Speed 5968.16 samples/sec   Loss 3.2020   LearningRate 0.0158   Epoch: 16   Global Step: 170300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:40:25,292-Speed 5966.16 samples/sec   Loss 3.1722   LearningRate 0.0158   Epoch: 16   Global Step: 170310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:40:32,149-Speed 5974.38 samples/sec   Loss 3.2063   LearningRate 0.0158   Epoch: 16   Global Step: 170320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:40:39,021-Speed 5962.13 samples/sec   Loss 3.2013   LearningRate 0.0158   Epoch: 16   Global Step: 170330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:40:45,884-Speed 5969.75 samples/sec   Loss 3.1918   LearningRate 0.0158   Epoch: 16   Global Step: 170340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:40:52,843-Speed 5886.57 samples/sec   Loss 3.1629   LearningRate 0.0157   Epoch: 16   Global Step: 170350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:40:59,734-Speed 5945.80 samples/sec   Loss 3.1839   LearningRate 0.0157   Epoch: 16   Global Step: 170360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:41:06,589-Speed 5975.62 samples/sec   Loss 3.1789   LearningRate 0.0157   Epoch: 16   Global Step: 170370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:41:13,448-Speed 5972.58 samples/sec   Loss 3.1893   LearningRate 0.0157   Epoch: 16   Global Step: 170380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:41:20,326-Speed 5956.76 samples/sec   Loss 3.2327   LearningRate 0.0157   Epoch: 16   Global Step: 170390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-09 05:41:27,196-Speed 5963.35 samples/sec   Loss 3.1547   LearningRate 0.0157   Epoch: 16   Global Step: 170400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:41:34,061-Speed 5967.41 samples/sec   Loss 3.1712   LearningRate 0.0157   Epoch: 16   Global Step: 170410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:41:40,994-Speed 5909.75 samples/sec   Loss 3.1590   LearningRate 0.0157   Epoch: 16   Global Step: 170420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:41:47,848-Speed 5977.24 samples/sec   Loss 3.2272   LearningRate 0.0157   Epoch: 16   Global Step: 170430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:41:54,701-Speed 5977.24 samples/sec   Loss 3.1639   LearningRate 0.0157   Epoch: 16   Global Step: 170440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:42:01,552-Speed 5979.75 samples/sec   Loss 3.1822   LearningRate 0.0157   Epoch: 16   Global Step: 170450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:42:08,432-Speed 5955.17 samples/sec   Loss 3.1594   LearningRate 0.0157   Epoch: 16   Global Step: 170460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:42:15,284-Speed 5978.66 samples/sec   Loss 3.1606   LearningRate 0.0156   Epoch: 16   Global Step: 170470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:42:22,156-Speed 5961.27 samples/sec   Loss 3.1924   LearningRate 0.0156   Epoch: 16   Global Step: 170480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:42:28,990-Speed 5994.59 samples/sec   Loss 3.1537   LearningRate 0.0156   Epoch: 16   Global Step: 170490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:42:35,852-Speed 5970.38 samples/sec   Loss 3.2003   LearningRate 0.0156   Epoch: 16   Global Step: 170500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:42:42,735-Speed 5952.69 samples/sec   Loss 3.1853   LearningRate 0.0156   Epoch: 16   Global Step: 170510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:42:49,602-Speed 5966.41 samples/sec   Loss 3.1860   LearningRate 0.0156   Epoch: 16   Global Step: 170520   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:42:56,447-Speed 5984.73 samples/sec   Loss 3.1912   LearningRate 0.0156   Epoch: 16   Global Step: 170530   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:43:03,299-Speed 5978.67 samples/sec   Loss 3.1697   LearningRate 0.0156   Epoch: 16   Global Step: 170540   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:43:10,164-Speed 5968.35 samples/sec   Loss 3.1725   LearningRate 0.0156   Epoch: 16   Global Step: 170550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:43:17,054-Speed 5945.11 samples/sec   Loss 3.1843   LearningRate 0.0156   Epoch: 16   Global Step: 170560   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:43:23,941-Speed 5949.29 samples/sec   Loss 3.1245   LearningRate 0.0156   Epoch: 16   Global Step: 170570   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:43:30,814-Speed 5961.11 samples/sec   Loss 3.2197   LearningRate 0.0156   Epoch: 16   Global Step: 170580   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:43:37,672-Speed 5973.71 samples/sec   Loss 3.2125   LearningRate 0.0155   Epoch: 16   Global Step: 170590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:43:44,551-Speed 5957.70 samples/sec   Loss 3.1812   LearningRate 0.0155   Epoch: 16   Global Step: 170600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:43:51,407-Speed 5975.84 samples/sec   Loss 3.1479   LearningRate 0.0155   Epoch: 16   Global Step: 170610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:43:58,286-Speed 5955.39 samples/sec   Loss 3.1582   LearningRate 0.0155   Epoch: 16   Global Step: 170620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:44:05,142-Speed 5975.35 samples/sec   Loss 3.1600   LearningRate 0.0155   Epoch: 16   Global Step: 170630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:44:12,002-Speed 5971.81 samples/sec   Loss 3.1720   LearningRate 0.0155   Epoch: 16   Global Step: 170640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:44:18,891-Speed 5946.65 samples/sec   Loss 3.1831   LearningRate 0.0155   Epoch: 16   Global Step: 170650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:44:25,754-Speed 5970.32 samples/sec   Loss 3.1648   LearningRate 0.0155   Epoch: 16   Global Step: 170660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:44:32,621-Speed 5965.24 samples/sec   Loss 3.2343   LearningRate 0.0155   Epoch: 16   Global Step: 170670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:44:39,477-Speed 5975.64 samples/sec   Loss 3.1820   LearningRate 0.0155   Epoch: 16   Global Step: 170680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:44:46,326-Speed 5981.30 samples/sec   Loss 3.1617   LearningRate 0.0155   Epoch: 16   Global Step: 170690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:44:53,197-Speed 5963.30 samples/sec   Loss 3.1646   LearningRate 0.0154   Epoch: 16   Global Step: 170700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:45:00,051-Speed 5976.94 samples/sec   Loss 3.1697   LearningRate 0.0154   Epoch: 16   Global Step: 170710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:45:06,934-Speed 5954.59 samples/sec   Loss 3.1625   LearningRate 0.0154   Epoch: 16   Global Step: 170720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:45:13,795-Speed 5971.40 samples/sec   Loss 3.1655   LearningRate 0.0154   Epoch: 16   Global Step: 170730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:45:20,652-Speed 5974.32 samples/sec   Loss 3.1445   LearningRate 0.0154   Epoch: 16   Global Step: 170740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:45:27,511-Speed 5976.11 samples/sec   Loss 3.1567   LearningRate 0.0154   Epoch: 16   Global Step: 170750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:45:34,378-Speed 5965.64 samples/sec   Loss 3.1468   LearningRate 0.0154   Epoch: 16   Global Step: 170760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:45:41,230-Speed 5978.65 samples/sec   Loss 3.1480   LearningRate 0.0154   Epoch: 16   Global Step: 170770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:45:48,083-Speed 5978.26 samples/sec   Loss 3.1646   LearningRate 0.0154   Epoch: 16   Global Step: 170780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:45:54,955-Speed 5962.47 samples/sec   Loss 3.1680   LearningRate 0.0154   Epoch: 16   Global Step: 170790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:46:01,820-Speed 5966.92 samples/sec   Loss 3.1391   LearningRate 0.0154   Epoch: 16   Global Step: 170800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:46:08,684-Speed 5968.69 samples/sec   Loss 3.1615   LearningRate 0.0154   Epoch: 16   Global Step: 170810   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:46:15,553-Speed 5967.23 samples/sec   Loss 3.1456   LearningRate 0.0153   Epoch: 16   Global Step: 170820   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:46:22,410-Speed 5975.22 samples/sec   Loss 3.2030   LearningRate 0.0153   Epoch: 16   Global Step: 170830   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:46:29,295-Speed 5950.62 samples/sec   Loss 3.1425   LearningRate 0.0153   Epoch: 16   Global Step: 170840   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:46:36,149-Speed 5977.09 samples/sec   Loss 3.1554   LearningRate 0.0153   Epoch: 16   Global Step: 170850   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:46:43,009-Speed 5971.77 samples/sec   Loss 3.1906   LearningRate 0.0153   Epoch: 16   Global Step: 170860   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:46:49,863-Speed 5976.94 samples/sec   Loss 3.1539   LearningRate 0.0153   Epoch: 16   Global Step: 170870   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:46:56,708-Speed 5985.44 samples/sec   Loss 3.1504   LearningRate 0.0153   Epoch: 16   Global Step: 170880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:47:03,566-Speed 5972.70 samples/sec   Loss 3.1967   LearningRate 0.0153   Epoch: 16   Global Step: 170890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:47:10,456-Speed 5946.72 samples/sec   Loss 3.1629   LearningRate 0.0153   Epoch: 16   Global Step: 170900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:47:17,306-Speed 5980.79 samples/sec   Loss 3.1466   LearningRate 0.0153   Epoch: 16   Global Step: 170910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:47:24,174-Speed 5965.41 samples/sec   Loss 3.2054   LearningRate 0.0153   Epoch: 16   Global Step: 170920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:47:31,025-Speed 5979.41 samples/sec   Loss 3.1313   LearningRate 0.0153   Epoch: 16   Global Step: 170930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:47:37,873-Speed 5983.36 samples/sec   Loss 3.1900   LearningRate 0.0152   Epoch: 16   Global Step: 170940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:47:44,734-Speed 5970.48 samples/sec   Loss 3.1932   LearningRate 0.0152   Epoch: 16   Global Step: 170950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:47:51,594-Speed 5971.79 samples/sec   Loss 3.1314   LearningRate 0.0152   Epoch: 16   Global Step: 170960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:47:58,439-Speed 5985.65 samples/sec   Loss 3.1839   LearningRate 0.0152   Epoch: 16   Global Step: 170970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:48:05,299-Speed 5971.92 samples/sec   Loss 3.1190   LearningRate 0.0152   Epoch: 16   Global Step: 170980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-09 05:48:12,182-Speed 5951.81 samples/sec   Loss 3.1601   LearningRate 0.0152   Epoch: 16   Global Step: 170990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-09 05:48:19,046-Speed 5968.05 samples/sec   Loss 3.1495   LearningRate 0.0152   Epoch: 16   Global Step: 171000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-09 05:48:25,894-Speed 5982.41 samples/sec   Loss 3.1449   LearningRate 0.0152   Epoch: 16   Global Step: 171010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-09 05:48:32,743-Speed 5984.90 samples/sec   Loss 3.1678   LearningRate 0.0152   Epoch: 16   Global Step: 171020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:48:39,616-Speed 5960.56 samples/sec   Loss 3.1622   LearningRate 0.0152   Epoch: 16   Global Step: 171030   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:48:46,468-Speed 5978.18 samples/sec   Loss 3.1138   LearningRate 0.0152   Epoch: 16   Global Step: 171040   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:48:53,332-Speed 5968.89 samples/sec   Loss 3.1314   LearningRate 0.0152   Epoch: 16   Global Step: 171050   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:49:00,182-Speed 5981.13 samples/sec   Loss 3.1192   LearningRate 0.0151   Epoch: 16   Global Step: 171060   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:49:07,039-Speed 5974.91 samples/sec   Loss 3.1416   LearningRate 0.0151   Epoch: 16   Global Step: 171070   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:49:13,897-Speed 5973.54 samples/sec   Loss 3.0987   LearningRate 0.0151   Epoch: 16   Global Step: 171080   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:49:20,774-Speed 5957.21 samples/sec   Loss 3.1234   LearningRate 0.0151   Epoch: 16   Global Step: 171090   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:49:27,634-Speed 5971.79 samples/sec   Loss 3.1770   LearningRate 0.0151   Epoch: 16   Global Step: 171100   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:49:34,485-Speed 5979.33 samples/sec   Loss 3.1571   LearningRate 0.0151   Epoch: 16   Global Step: 171110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:49:41,336-Speed 5983.23 samples/sec   Loss 3.1523   LearningRate 0.0151   Epoch: 16   Global Step: 171120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:49:48,206-Speed 5963.42 samples/sec   Loss 3.1222   LearningRate 0.0151   Epoch: 16   Global Step: 171130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:49:55,060-Speed 5977.68 samples/sec   Loss 3.0931   LearningRate 0.0151   Epoch: 16   Global Step: 171140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:50:01,930-Speed 5963.59 samples/sec   Loss 3.1470   LearningRate 0.0151   Epoch: 16   Global Step: 171150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:50:08,793-Speed 5969.00 samples/sec   Loss 3.1410   LearningRate 0.0151   Epoch: 16   Global Step: 171160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:50:15,681-Speed 5947.75 samples/sec   Loss 3.1574   LearningRate 0.0151   Epoch: 16   Global Step: 171170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:50:22,547-Speed 5966.61 samples/sec   Loss 3.1300   LearningRate 0.0150   Epoch: 16   Global Step: 171180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:50:29,408-Speed 5971.60 samples/sec   Loss 3.1386   LearningRate 0.0150   Epoch: 16   Global Step: 171190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:50:36,275-Speed 5965.79 samples/sec   Loss 3.1090   LearningRate 0.0150   Epoch: 16   Global Step: 171200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:50:43,199-Speed 5917.86 samples/sec   Loss 3.1041   LearningRate 0.0150   Epoch: 16   Global Step: 171210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:50:50,119-Speed 5920.64 samples/sec   Loss 3.0902   LearningRate 0.0150   Epoch: 16   Global Step: 171220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:50:56,980-Speed 5971.24 samples/sec   Loss 3.1126   LearningRate 0.0150   Epoch: 16   Global Step: 171230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:51:03,830-Speed 5980.42 samples/sec   Loss 3.1167   LearningRate 0.0150   Epoch: 16   Global Step: 171240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:51:10,685-Speed 5976.64 samples/sec   Loss 3.1361   LearningRate 0.0150   Epoch: 16   Global Step: 171250   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:51:17,558-Speed 5961.16 samples/sec   Loss 3.1471   LearningRate 0.0150   Epoch: 16   Global Step: 171260   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:51:24,439-Speed 5955.87 samples/sec   Loss 3.1138   LearningRate 0.0150   Epoch: 16   Global Step: 171270   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:51:31,315-Speed 5958.40 samples/sec   Loss 3.1567   LearningRate 0.0150   Epoch: 16   Global Step: 171280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:51:38,167-Speed 5978.63 samples/sec   Loss 3.1042   LearningRate 0.0150   Epoch: 16   Global Step: 171290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:51:45,024-Speed 5974.58 samples/sec   Loss 3.1435   LearningRate 0.0149   Epoch: 16   Global Step: 171300   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:51:51,905-Speed 5953.55 samples/sec   Loss 3.1099   LearningRate 0.0149   Epoch: 16   Global Step: 171310   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:51:58,762-Speed 5975.53 samples/sec   Loss 3.1305   LearningRate 0.0149   Epoch: 16   Global Step: 171320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:52:05,614-Speed 5978.50 samples/sec   Loss 3.1728   LearningRate 0.0149   Epoch: 16   Global Step: 171330   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:52:12,475-Speed 5971.05 samples/sec   Loss 3.1309   LearningRate 0.0149   Epoch: 16   Global Step: 171340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:52:19,346-Speed 5964.26 samples/sec   Loss 3.1171   LearningRate 0.0149   Epoch: 16   Global Step: 171350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:52:26,212-Speed 5967.56 samples/sec   Loss 3.1760   LearningRate 0.0149   Epoch: 16   Global Step: 171360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:52:33,091-Speed 5955.06 samples/sec   Loss 3.1532   LearningRate 0.0149   Epoch: 16   Global Step: 171370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:52:39,952-Speed 5973.29 samples/sec   Loss 3.1441   LearningRate 0.0149   Epoch: 16   Global Step: 171380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:52:46,819-Speed 5965.69 samples/sec   Loss 3.1154   LearningRate 0.0149   Epoch: 16   Global Step: 171390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:52:53,673-Speed 5977.09 samples/sec   Loss 3.1527   LearningRate 0.0149   Epoch: 16   Global Step: 171400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:53:00,554-Speed 5953.88 samples/sec   Loss 3.1416   LearningRate 0.0149   Epoch: 16   Global Step: 171410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:53:07,402-Speed 5982.71 samples/sec   Loss 3.1147   LearningRate 0.0148   Epoch: 16   Global Step: 171420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:53:14,267-Speed 5967.59 samples/sec   Loss 3.1330   LearningRate 0.0148   Epoch: 16   Global Step: 171430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:53:21,155-Speed 5947.35 samples/sec   Loss 3.1314   LearningRate 0.0148   Epoch: 16   Global Step: 171440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-09 05:53:28,004-Speed 5982.67 samples/sec   Loss 3.0862   LearningRate 0.0148   Epoch: 16   Global Step: 171450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:53:34,881-Speed 5956.81 samples/sec   Loss 3.1221   LearningRate 0.0148   Epoch: 16   Global Step: 171460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:53:41,732-Speed 5979.94 samples/sec   Loss 3.0859   LearningRate 0.0148   Epoch: 16   Global Step: 171470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:53:48,600-Speed 5965.76 samples/sec   Loss 3.1292   LearningRate 0.0148   Epoch: 16   Global Step: 171480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:53:55,452-Speed 5978.89 samples/sec   Loss 3.1213   LearningRate 0.0148   Epoch: 16   Global Step: 171490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:54:02,299-Speed 5983.61 samples/sec   Loss 3.1054   LearningRate 0.0148   Epoch: 16   Global Step: 171500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:54:09,159-Speed 5972.48 samples/sec   Loss 3.1294   LearningRate 0.0148   Epoch: 16   Global Step: 171510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:54:16,045-Speed 5949.20 samples/sec   Loss 3.1389   LearningRate 0.0148   Epoch: 16   Global Step: 171520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:54:22,899-Speed 5977.95 samples/sec   Loss 3.0738   LearningRate 0.0148   Epoch: 16   Global Step: 171530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:54:29,795-Speed 5940.87 samples/sec   Loss 3.1472   LearningRate 0.0147   Epoch: 16   Global Step: 171540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:54:36,675-Speed 5954.68 samples/sec   Loss 3.1317   LearningRate 0.0147   Epoch: 16   Global Step: 171550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-09 05:54:43,517-Speed 5987.24 samples/sec   Loss 3.1201   LearningRate 0.0147   Epoch: 16   Global Step: 171560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:54:50,404-Speed 5948.93 samples/sec   Loss 3.1427   LearningRate 0.0147   Epoch: 16   Global Step: 171570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:54:57,278-Speed 5959.57 samples/sec   Loss 3.1464   LearningRate 0.0147   Epoch: 16   Global Step: 171580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:55:04,149-Speed 5963.03 samples/sec   Loss 3.1009   LearningRate 0.0147   Epoch: 16   Global Step: 171590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:55:11,007-Speed 5973.62 samples/sec   Loss 3.1422   LearningRate 0.0147   Epoch: 16   Global Step: 171600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:55:17,878-Speed 5962.85 samples/sec   Loss 3.0954   LearningRate 0.0147   Epoch: 16   Global Step: 171610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:55:24,741-Speed 5970.54 samples/sec   Loss 3.1053   LearningRate 0.0147   Epoch: 16   Global Step: 171620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:55:31,592-Speed 5980.54 samples/sec   Loss 3.1316   LearningRate 0.0147   Epoch: 16   Global Step: 171630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:55:38,442-Speed 5980.10 samples/sec   Loss 3.1094   LearningRate 0.0147   Epoch: 16   Global Step: 171640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:55:45,296-Speed 5976.91 samples/sec   Loss 3.0856   LearningRate 0.0147   Epoch: 16   Global Step: 171650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:55:52,140-Speed 5986.03 samples/sec   Loss 3.0795   LearningRate 0.0147   Epoch: 16   Global Step: 171660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:55:58,997-Speed 5973.76 samples/sec   Loss 3.0967   LearningRate 0.0146   Epoch: 16   Global Step: 171670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:56:05,879-Speed 5953.07 samples/sec   Loss 3.0977   LearningRate 0.0146   Epoch: 16   Global Step: 171680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:56:12,744-Speed 5967.62 samples/sec   Loss 3.0835   LearningRate 0.0146   Epoch: 16   Global Step: 171690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:56:19,601-Speed 5974.02 samples/sec   Loss 3.0872   LearningRate 0.0146   Epoch: 16   Global Step: 171700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:56:26,485-Speed 5952.08 samples/sec   Loss 3.1300   LearningRate 0.0146   Epoch: 16   Global Step: 171710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:56:33,340-Speed 5976.24 samples/sec   Loss 3.0936   LearningRate 0.0146   Epoch: 16   Global Step: 171720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:56:40,184-Speed 5985.25 samples/sec   Loss 3.1218   LearningRate 0.0146   Epoch: 16   Global Step: 171730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:56:47,040-Speed 5975.99 samples/sec   Loss 3.1144   LearningRate 0.0146   Epoch: 16   Global Step: 171740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:56:53,879-Speed 5990.58 samples/sec   Loss 3.0608   LearningRate 0.0146   Epoch: 16   Global Step: 171750   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:57:00,730-Speed 5979.01 samples/sec   Loss 3.0445   LearningRate 0.0146   Epoch: 16   Global Step: 171760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:57:07,595-Speed 5967.56 samples/sec   Loss 3.1164   LearningRate 0.0146   Epoch: 16   Global Step: 171770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:57:14,456-Speed 5971.25 samples/sec   Loss 3.1045   LearningRate 0.0146   Epoch: 16   Global Step: 171780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:57:21,307-Speed 5979.31 samples/sec   Loss 3.0505   LearningRate 0.0145   Epoch: 16   Global Step: 171790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:57:28,168-Speed 5971.46 samples/sec   Loss 3.0355   LearningRate 0.0145   Epoch: 16   Global Step: 171800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:57:35,028-Speed 5972.58 samples/sec   Loss 3.0660   LearningRate 0.0145   Epoch: 16   Global Step: 171810   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:57:41,881-Speed 5977.09 samples/sec   Loss 3.1170   LearningRate 0.0145   Epoch: 16   Global Step: 171820   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:57:48,741-Speed 5975.20 samples/sec   Loss 3.0664   LearningRate 0.0145   Epoch: 16   Global Step: 171830   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:57:55,596-Speed 5976.61 samples/sec   Loss 3.1034   LearningRate 0.0145   Epoch: 16   Global Step: 171840   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 05:58:02,453-Speed 5974.22 samples/sec   Loss 3.1071   LearningRate 0.0145   Epoch: 16   Global Step: 171850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:58:09,301-Speed 5982.43 samples/sec   Loss 3.0675   LearningRate 0.0145   Epoch: 16   Global Step: 171860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:58:16,170-Speed 5963.49 samples/sec   Loss 3.0349   LearningRate 0.0145   Epoch: 16   Global Step: 171870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:58:23,015-Speed 5985.26 samples/sec   Loss 3.0911   LearningRate 0.0145   Epoch: 16   Global Step: 171880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:58:29,893-Speed 5956.41 samples/sec   Loss 3.0902   LearningRate 0.0145   Epoch: 16   Global Step: 171890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:58:36,759-Speed 5966.94 samples/sec   Loss 3.1037   LearningRate 0.0145   Epoch: 16   Global Step: 171900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:58:43,621-Speed 5969.58 samples/sec   Loss 3.1095   LearningRate 0.0144   Epoch: 16   Global Step: 171910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:58:50,478-Speed 5978.10 samples/sec   Loss 3.0839   LearningRate 0.0144   Epoch: 16   Global Step: 171920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:58:57,335-Speed 5977.21 samples/sec   Loss 3.0668   LearningRate 0.0144   Epoch: 16   Global Step: 171930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:59:04,231-Speed 5940.79 samples/sec   Loss 3.1194   LearningRate 0.0144   Epoch: 16   Global Step: 171940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:59:11,170-Speed 5904.12 samples/sec   Loss 3.0689   LearningRate 0.0144   Epoch: 16   Global Step: 171950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-09 05:59:18,027-Speed 5974.47 samples/sec   Loss 3.0858   LearningRate 0.0144   Epoch: 16   Global Step: 171960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:59:24,902-Speed 5958.97 samples/sec   Loss 3.0718   LearningRate 0.0144   Epoch: 16   Global Step: 171970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:59:31,779-Speed 5957.48 samples/sec   Loss 3.0450   LearningRate 0.0144   Epoch: 16   Global Step: 171980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:59:38,661-Speed 5953.34 samples/sec   Loss 3.0821   LearningRate 0.0144   Epoch: 16   Global Step: 171990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:59:45,570-Speed 5929.61 samples/sec   Loss 3.0375   LearningRate 0.0144   Epoch: 16   Global Step: 172000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:59:52,421-Speed 5980.06 samples/sec   Loss 3.0766   LearningRate 0.0144   Epoch: 16   Global Step: 172010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 05:59:59,312-Speed 5945.28 samples/sec   Loss 3.1133   LearningRate 0.0144   Epoch: 16   Global Step: 172020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:00:06,184-Speed 5961.63 samples/sec   Loss 3.0966   LearningRate 0.0143   Epoch: 16   Global Step: 172030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:00:13,061-Speed 5957.04 samples/sec   Loss 3.0444   LearningRate 0.0143   Epoch: 16   Global Step: 172040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:00:19,935-Speed 5960.60 samples/sec   Loss 3.0874   LearningRate 0.0143   Epoch: 16   Global Step: 172050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:00:26,785-Speed 5979.96 samples/sec   Loss 3.1126   LearningRate 0.0143   Epoch: 16   Global Step: 172060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:00:33,773-Speed 5862.75 samples/sec   Loss 3.0576   LearningRate 0.0143   Epoch: 16   Global Step: 172070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:00:40,649-Speed 5958.14 samples/sec   Loss 3.0798   LearningRate 0.0143   Epoch: 16   Global Step: 172080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:00:47,526-Speed 5957.73 samples/sec   Loss 3.0198   LearningRate 0.0143   Epoch: 16   Global Step: 172090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:00:54,423-Speed 5939.63 samples/sec   Loss 3.0009   LearningRate 0.0143   Epoch: 16   Global Step: 172100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:01:01,290-Speed 5966.12 samples/sec   Loss 3.0432   LearningRate 0.0143   Epoch: 16   Global Step: 172110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:01:08,158-Speed 5965.21 samples/sec   Loss 3.0482   LearningRate 0.0143   Epoch: 16   Global Step: 172120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:01:15,055-Speed 5940.31 samples/sec   Loss 3.1049   LearningRate 0.0143   Epoch: 16   Global Step: 172130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:01:21,913-Speed 5973.82 samples/sec   Loss 3.0425   LearningRate 0.0143   Epoch: 16   Global Step: 172140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:01:28,759-Speed 5983.86 samples/sec   Loss 3.0428   LearningRate 0.0143   Epoch: 16   Global Step: 172150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:01:35,620-Speed 5971.55 samples/sec   Loss 3.0668   LearningRate 0.0142   Epoch: 16   Global Step: 172160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-09 06:01:42,491-Speed 5963.06 samples/sec   Loss 3.0323   LearningRate 0.0142   Epoch: 16   Global Step: 172170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-09 06:01:49,330-Speed 5989.65 samples/sec   Loss 3.0486   LearningRate 0.0142   Epoch: 16   Global Step: 172180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:01:56,202-Speed 5964.41 samples/sec   Loss 3.0469   LearningRate 0.0142   Epoch: 16   Global Step: 172190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:02:03,088-Speed 5949.79 samples/sec   Loss 3.0898   LearningRate 0.0142   Epoch: 16   Global Step: 172200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:02:09,929-Speed 5988.40 samples/sec   Loss 3.0830   LearningRate 0.0142   Epoch: 16   Global Step: 172210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:02:16,781-Speed 5978.87 samples/sec   Loss 3.1137   LearningRate 0.0142   Epoch: 16   Global Step: 172220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:02:23,686-Speed 5933.81 samples/sec   Loss 3.0401   LearningRate 0.0142   Epoch: 16   Global Step: 172230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:02:30,564-Speed 5955.69 samples/sec   Loss 3.0767   LearningRate 0.0142   Epoch: 16   Global Step: 172240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:02:37,422-Speed 5974.40 samples/sec   Loss 3.0775   LearningRate 0.0142   Epoch: 16   Global Step: 172250   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:02:44,303-Speed 5953.26 samples/sec   Loss 3.0616   LearningRate 0.0142   Epoch: 16   Global Step: 172260   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:02:51,180-Speed 5957.70 samples/sec   Loss 3.1239   LearningRate 0.0142   Epoch: 16   Global Step: 172270   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:02:58,062-Speed 5952.95 samples/sec   Loss 3.0832   LearningRate 0.0141   Epoch: 16   Global Step: 172280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:03:04,934-Speed 5962.14 samples/sec   Loss 3.0518   LearningRate 0.0141   Epoch: 16   Global Step: 172290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:03:11,788-Speed 5977.50 samples/sec   Loss 3.0443   LearningRate 0.0141   Epoch: 16   Global Step: 172300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:03:18,648-Speed 5972.56 samples/sec   Loss 3.0522   LearningRate 0.0141   Epoch: 16   Global Step: 172310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:03:25,493-Speed 5984.89 samples/sec   Loss 3.0359   LearningRate 0.0141   Epoch: 16   Global Step: 172320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:03:32,355-Speed 5969.61 samples/sec   Loss 3.0446   LearningRate 0.0141   Epoch: 16   Global Step: 172330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:03:39,203-Speed 5982.81 samples/sec   Loss 3.0487   LearningRate 0.0141   Epoch: 16   Global Step: 172340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:03:46,054-Speed 5979.91 samples/sec   Loss 3.0504   LearningRate 0.0141   Epoch: 16   Global Step: 172350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:03:52,929-Speed 5959.07 samples/sec   Loss 3.0590   LearningRate 0.0141   Epoch: 16   Global Step: 172360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:03:59,787-Speed 5974.33 samples/sec   Loss 3.0746   LearningRate 0.0141   Epoch: 16   Global Step: 172370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:04:06,633-Speed 5984.59 samples/sec   Loss 3.0688   LearningRate 0.0141   Epoch: 16   Global Step: 172380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:04:13,501-Speed 5964.92 samples/sec   Loss 3.0721   LearningRate 0.0141   Epoch: 16   Global Step: 172390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:04:20,366-Speed 5968.99 samples/sec   Loss 3.0505   LearningRate 0.0141   Epoch: 16   Global Step: 172400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-09 06:04:27,220-Speed 5977.63 samples/sec   Loss 3.0264   LearningRate 0.0140   Epoch: 16   Global Step: 172410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-09 06:04:34,063-Speed 5986.50 samples/sec   Loss 3.0794   LearningRate 0.0140   Epoch: 16   Global Step: 172420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:04:40,918-Speed 5975.94 samples/sec   Loss 3.0458   LearningRate 0.0140   Epoch: 16   Global Step: 172430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:04:47,793-Speed 5961.36 samples/sec   Loss 3.0432   LearningRate 0.0140   Epoch: 16   Global Step: 172440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:04:54,645-Speed 5978.68 samples/sec   Loss 3.0299   LearningRate 0.0140   Epoch: 16   Global Step: 172450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:05:01,519-Speed 5960.17 samples/sec   Loss 3.0427   LearningRate 0.0140   Epoch: 16   Global Step: 172460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:05:08,372-Speed 5977.45 samples/sec   Loss 3.0499   LearningRate 0.0140   Epoch: 16   Global Step: 172470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:05:15,233-Speed 5971.37 samples/sec   Loss 3.0166   LearningRate 0.0140   Epoch: 16   Global Step: 172480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:05:22,090-Speed 5976.97 samples/sec   Loss 2.9922   LearningRate 0.0140   Epoch: 16   Global Step: 172490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:05:28,963-Speed 5960.49 samples/sec   Loss 3.0725   LearningRate 0.0140   Epoch: 16   Global Step: 172500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:05:35,825-Speed 5969.74 samples/sec   Loss 3.0574   LearningRate 0.0140   Epoch: 16   Global Step: 172510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:05:42,686-Speed 5971.29 samples/sec   Loss 3.0386   LearningRate 0.0140   Epoch: 16   Global Step: 172520   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:05:49,558-Speed 5962.38 samples/sec   Loss 3.0304   LearningRate 0.0139   Epoch: 16   Global Step: 172530   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:05:56,411-Speed 5977.81 samples/sec   Loss 3.0800   LearningRate 0.0139   Epoch: 16   Global Step: 172540   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:06:03,279-Speed 5965.02 samples/sec   Loss 3.0542   LearningRate 0.0139   Epoch: 16   Global Step: 172550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:06:10,145-Speed 5967.34 samples/sec   Loss 3.0595   LearningRate 0.0139   Epoch: 16   Global Step: 172560   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:06:16,998-Speed 5978.56 samples/sec   Loss 2.9959   LearningRate 0.0139   Epoch: 16   Global Step: 172570   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:06:23,848-Speed 5980.00 samples/sec   Loss 3.0632   LearningRate 0.0139   Epoch: 16   Global Step: 172580   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:06:30,715-Speed 5966.31 samples/sec   Loss 3.0672   LearningRate 0.0139   Epoch: 16   Global Step: 172590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:06:37,586-Speed 5962.61 samples/sec   Loss 3.0436   LearningRate 0.0139   Epoch: 16   Global Step: 172600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:06:44,442-Speed 5975.58 samples/sec   Loss 3.0123   LearningRate 0.0139   Epoch: 16   Global Step: 172610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:06:51,304-Speed 5970.12 samples/sec   Loss 3.0948   LearningRate 0.0139   Epoch: 16   Global Step: 172620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:06:58,143-Speed 5990.65 samples/sec   Loss 3.0241   LearningRate 0.0139   Epoch: 16   Global Step: 172630   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:07:05,022-Speed 5955.61 samples/sec   Loss 3.0152   LearningRate 0.0139   Epoch: 16   Global Step: 172640   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:07:11,883-Speed 5972.78 samples/sec   Loss 3.0538   LearningRate 0.0139   Epoch: 16   Global Step: 172650   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:07:18,741-Speed 5973.53 samples/sec   Loss 3.0379   LearningRate 0.0138   Epoch: 16   Global Step: 172660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:07:25,610-Speed 5964.44 samples/sec   Loss 3.0384   LearningRate 0.0138   Epoch: 16   Global Step: 172670   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:07:32,462-Speed 5979.53 samples/sec   Loss 3.0303   LearningRate 0.0138   Epoch: 16   Global Step: 172680   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:07:39,325-Speed 5968.82 samples/sec   Loss 3.0207   LearningRate 0.0138   Epoch: 16   Global Step: 172690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:07:46,199-Speed 5960.35 samples/sec   Loss 3.0202   LearningRate 0.0138   Epoch: 16   Global Step: 172700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:07:53,063-Speed 5969.15 samples/sec   Loss 2.9961   LearningRate 0.0138   Epoch: 16   Global Step: 172710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:07:59,906-Speed 5985.91 samples/sec   Loss 3.0405   LearningRate 0.0138   Epoch: 16   Global Step: 172720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:08:06,759-Speed 5978.33 samples/sec   Loss 3.0372   LearningRate 0.0138   Epoch: 16   Global Step: 172730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:08:13,615-Speed 5976.53 samples/sec   Loss 3.0579   LearningRate 0.0138   Epoch: 16   Global Step: 172740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:08:20,486-Speed 5962.77 samples/sec   Loss 3.0615   LearningRate 0.0138   Epoch: 16   Global Step: 172750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:08:27,354-Speed 5964.70 samples/sec   Loss 3.0527   LearningRate 0.0138   Epoch: 16   Global Step: 172760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:08:34,205-Speed 5981.95 samples/sec   Loss 3.0395   LearningRate 0.0138   Epoch: 16   Global Step: 172770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:08:41,057-Speed 5978.92 samples/sec   Loss 3.0037   LearningRate 0.0137   Epoch: 16   Global Step: 172780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:08:47,908-Speed 5979.27 samples/sec   Loss 3.0483   LearningRate 0.0137   Epoch: 16   Global Step: 172790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:08:54,762-Speed 5977.41 samples/sec   Loss 3.0054   LearningRate 0.0137   Epoch: 16   Global Step: 172800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:09:01,610-Speed 5982.38 samples/sec   Loss 3.0748   LearningRate 0.0137   Epoch: 16   Global Step: 172810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:09:08,468-Speed 5974.17 samples/sec   Loss 3.0042   LearningRate 0.0137   Epoch: 16   Global Step: 172820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:09:15,315-Speed 5983.68 samples/sec   Loss 3.0369   LearningRate 0.0137   Epoch: 16   Global Step: 172830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:09:22,171-Speed 5975.02 samples/sec   Loss 3.0296   LearningRate 0.0137   Epoch: 16   Global Step: 172840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:09:29,035-Speed 5968.83 samples/sec   Loss 3.0054   LearningRate 0.0137   Epoch: 16   Global Step: 172850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:09:35,900-Speed 5967.74 samples/sec   Loss 3.0382   LearningRate 0.0137   Epoch: 16   Global Step: 172860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:09:42,765-Speed 5967.45 samples/sec   Loss 3.0407   LearningRate 0.0137   Epoch: 16   Global Step: 172870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:09:49,620-Speed 5976.41 samples/sec   Loss 3.0349   LearningRate 0.0137   Epoch: 16   Global Step: 172880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:09:56,483-Speed 5970.87 samples/sec   Loss 3.0194   LearningRate 0.0137   Epoch: 16   Global Step: 172890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:10:03,348-Speed 5967.84 samples/sec   Loss 3.0122   LearningRate 0.0137   Epoch: 16   Global Step: 172900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:10:10,207-Speed 5972.93 samples/sec   Loss 2.9937   LearningRate 0.0136   Epoch: 16   Global Step: 172910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:10:17,072-Speed 5968.55 samples/sec   Loss 3.0397   LearningRate 0.0136   Epoch: 16   Global Step: 172920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:10:23,918-Speed 5983.20 samples/sec   Loss 3.0216   LearningRate 0.0136   Epoch: 16   Global Step: 172930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-09 06:10:30,791-Speed 5960.64 samples/sec   Loss 3.0508   LearningRate 0.0136   Epoch: 16   Global Step: 172940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-09 06:10:37,656-Speed 5968.13 samples/sec   Loss 3.0026   LearningRate 0.0136   Epoch: 16   Global Step: 172950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:10:44,507-Speed 5979.30 samples/sec   Loss 3.0146   LearningRate 0.0136   Epoch: 16   Global Step: 172960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:10:51,371-Speed 5969.17 samples/sec   Loss 2.9998   LearningRate 0.0136   Epoch: 16   Global Step: 172970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:10:58,215-Speed 5985.89 samples/sec   Loss 3.0497   LearningRate 0.0136   Epoch: 16   Global Step: 172980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:11:05,073-Speed 5973.24 samples/sec   Loss 3.0028   LearningRate 0.0136   Epoch: 16   Global Step: 172990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:11:11,942-Speed 5963.95 samples/sec   Loss 3.0135   LearningRate 0.0136   Epoch: 16   Global Step: 173000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:11:18,801-Speed 5972.88 samples/sec   Loss 3.0161   LearningRate 0.0136   Epoch: 16   Global Step: 173010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:11:25,664-Speed 5968.84 samples/sec   Loss 3.0213   LearningRate 0.0136   Epoch: 16   Global Step: 173020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:11:32,541-Speed 5960.03 samples/sec   Loss 2.9543   LearningRate 0.0135   Epoch: 16   Global Step: 173030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:11:39,403-Speed 5972.22 samples/sec   Loss 3.0195   LearningRate 0.0135   Epoch: 16   Global Step: 173040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:11:46,245-Speed 5986.80 samples/sec   Loss 2.9766   LearningRate 0.0135   Epoch: 16   Global Step: 173050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:11:53,096-Speed 5979.85 samples/sec   Loss 3.0068   LearningRate 0.0135   Epoch: 16   Global Step: 173060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:11:59,966-Speed 5963.88 samples/sec   Loss 2.9893   LearningRate 0.0135   Epoch: 16   Global Step: 173070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:12:06,823-Speed 5974.44 samples/sec   Loss 2.9978   LearningRate 0.0135   Epoch: 16   Global Step: 173080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:12:13,682-Speed 5972.63 samples/sec   Loss 2.9798   LearningRate 0.0135   Epoch: 16   Global Step: 173090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:12:20,530-Speed 5985.68 samples/sec   Loss 3.0467   LearningRate 0.0135   Epoch: 16   Global Step: 173100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:12:27,373-Speed 5986.60 samples/sec   Loss 3.0011   LearningRate 0.0135   Epoch: 16   Global Step: 173110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:12:34,214-Speed 5990.59 samples/sec   Loss 2.9646   LearningRate 0.0135   Epoch: 16   Global Step: 173120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:12:41,081-Speed 5965.96 samples/sec   Loss 2.9866   LearningRate 0.0135   Epoch: 16   Global Step: 173130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:12:47,946-Speed 5967.81 samples/sec   Loss 2.9787   LearningRate 0.0135   Epoch: 16   Global Step: 173140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:12:54,794-Speed 5983.84 samples/sec   Loss 2.9661   LearningRate 0.0135   Epoch: 16   Global Step: 173150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-09 06:13:01,657-Speed 5969.90 samples/sec   Loss 3.0282   LearningRate 0.0134   Epoch: 16   Global Step: 173160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:13:08,540-Speed 5951.33 samples/sec   Loss 2.9763   LearningRate 0.0134   Epoch: 16   Global Step: 173170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:13:15,392-Speed 5979.17 samples/sec   Loss 2.9972   LearningRate 0.0134   Epoch: 16   Global Step: 173180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:13:22,250-Speed 5978.34 samples/sec   Loss 3.0127   LearningRate 0.0134   Epoch: 16   Global Step: 173190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:13:29,101-Speed 5980.17 samples/sec   Loss 3.0009   LearningRate 0.0134   Epoch: 16   Global Step: 173200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:13:36,005-Speed 5933.70 samples/sec   Loss 2.9938   LearningRate 0.0134   Epoch: 16   Global Step: 173210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:13:42,900-Speed 5942.19 samples/sec   Loss 3.0107   LearningRate 0.0134   Epoch: 16   Global Step: 173220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:13:49,755-Speed 5975.86 samples/sec   Loss 2.9842   LearningRate 0.0134   Epoch: 16   Global Step: 173230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:13:56,618-Speed 5969.31 samples/sec   Loss 2.9909   LearningRate 0.0134   Epoch: 16   Global Step: 173240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:14:03,484-Speed 5967.20 samples/sec   Loss 2.9524   LearningRate 0.0134   Epoch: 16   Global Step: 173250   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:14:10,370-Speed 5949.49 samples/sec   Loss 2.9878   LearningRate 0.0134   Epoch: 16   Global Step: 173260   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:14:17,282-Speed 5927.24 samples/sec   Loss 2.9697   LearningRate 0.0134   Epoch: 16   Global Step: 173270   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:14:24,228-Speed 5898.16 samples/sec   Loss 2.9817   LearningRate 0.0134   Epoch: 16   Global Step: 173280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:14:31,154-Speed 5915.08 samples/sec   Loss 3.0025   LearningRate 0.0133   Epoch: 16   Global Step: 173290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:14:38,070-Speed 5924.10 samples/sec   Loss 2.9885   LearningRate 0.0133   Epoch: 16   Global Step: 173300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:14:44,923-Speed 5977.87 samples/sec   Loss 3.0049   LearningRate 0.0133   Epoch: 16   Global Step: 173310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:14:51,814-Speed 5945.08 samples/sec   Loss 2.9455   LearningRate 0.0133   Epoch: 16   Global Step: 173320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:14:58,667-Speed 5978.87 samples/sec   Loss 2.9792   LearningRate 0.0133   Epoch: 16   Global Step: 173330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:15:05,522-Speed 5976.29 samples/sec   Loss 2.9908   LearningRate 0.0133   Epoch: 16   Global Step: 173340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:15:12,383-Speed 5971.22 samples/sec   Loss 2.9755   LearningRate 0.0133   Epoch: 16   Global Step: 173350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:15:19,239-Speed 5975.73 samples/sec   Loss 3.0114   LearningRate 0.0133   Epoch: 16   Global Step: 173360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:15:26,122-Speed 5951.99 samples/sec   Loss 3.0357   LearningRate 0.0133   Epoch: 16   Global Step: 173370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:15:32,992-Speed 5963.31 samples/sec   Loss 2.9722   LearningRate 0.0133   Epoch: 16   Global Step: 173380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:15:39,874-Speed 5953.38 samples/sec   Loss 2.9875   LearningRate 0.0133   Epoch: 16   Global Step: 173390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:15:46,798-Speed 5916.80 samples/sec   Loss 2.9525   LearningRate 0.0133   Epoch: 16   Global Step: 173400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:15:53,659-Speed 5971.13 samples/sec   Loss 2.9765   LearningRate 0.0133   Epoch: 16   Global Step: 173410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:16:00,518-Speed 5972.79 samples/sec   Loss 3.0022   LearningRate 0.0132   Epoch: 16   Global Step: 173420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:16:07,487-Speed 5878.84 samples/sec   Loss 2.9936   LearningRate 0.0132   Epoch: 16   Global Step: 173430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:16:14,390-Speed 5934.40 samples/sec   Loss 3.0057   LearningRate 0.0132   Epoch: 16   Global Step: 173440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:16:21,251-Speed 5971.96 samples/sec   Loss 2.9871   LearningRate 0.0132   Epoch: 16   Global Step: 173450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:16:28,112-Speed 5971.40 samples/sec   Loss 2.9769   LearningRate 0.0132   Epoch: 16   Global Step: 173460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:16:34,974-Speed 5970.13 samples/sec   Loss 2.9869   LearningRate 0.0132   Epoch: 16   Global Step: 173470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:16:41,843-Speed 5964.66 samples/sec   Loss 2.9606   LearningRate 0.0132   Epoch: 16   Global Step: 173480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:16:48,705-Speed 5969.34 samples/sec   Loss 2.9653   LearningRate 0.0132   Epoch: 16   Global Step: 173490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:16:55,580-Speed 5959.36 samples/sec   Loss 2.9609   LearningRate 0.0132   Epoch: 16   Global Step: 173500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-09 06:17:02,412-Speed 5996.71 samples/sec   Loss 2.9531   LearningRate 0.0132   Epoch: 16   Global Step: 173510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:17:09,291-Speed 5955.51 samples/sec   Loss 2.9782   LearningRate 0.0132   Epoch: 16   Global Step: 173520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:17:16,155-Speed 5968.11 samples/sec   Loss 2.9841   LearningRate 0.0132   Epoch: 16   Global Step: 173530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:17:23,008-Speed 5978.36 samples/sec   Loss 2.9961   LearningRate 0.0131   Epoch: 16   Global Step: 173540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:17:29,913-Speed 5933.22 samples/sec   Loss 2.9649   LearningRate 0.0131   Epoch: 16   Global Step: 173550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:17:36,800-Speed 5948.87 samples/sec   Loss 2.9728   LearningRate 0.0131   Epoch: 16   Global Step: 173560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:17:43,658-Speed 5973.52 samples/sec   Loss 2.9734   LearningRate 0.0131   Epoch: 16   Global Step: 173570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:17:50,527-Speed 5964.38 samples/sec   Loss 2.9165   LearningRate 0.0131   Epoch: 16   Global Step: 173580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:17:57,411-Speed 5951.24 samples/sec   Loss 2.9770   LearningRate 0.0131   Epoch: 16   Global Step: 173590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:18:04,267-Speed 5975.88 samples/sec   Loss 2.9290   LearningRate 0.0131   Epoch: 16   Global Step: 173600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:18:11,135-Speed 5964.66 samples/sec   Loss 2.9950   LearningRate 0.0131   Epoch: 16   Global Step: 173610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:18:18,017-Speed 5953.15 samples/sec   Loss 2.9752   LearningRate 0.0131   Epoch: 16   Global Step: 173620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:18:24,886-Speed 5963.95 samples/sec   Loss 2.9579   LearningRate 0.0131   Epoch: 16   Global Step: 173630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:18:31,743-Speed 5974.73 samples/sec   Loss 2.9705   LearningRate 0.0131   Epoch: 16   Global Step: 173640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:18:38,600-Speed 5974.29 samples/sec   Loss 2.9597   LearningRate 0.0131   Epoch: 16   Global Step: 173650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:18:45,476-Speed 5957.79 samples/sec   Loss 2.9542   LearningRate 0.0131   Epoch: 16   Global Step: 173660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:18:52,335-Speed 5972.34 samples/sec   Loss 2.9867   LearningRate 0.0130   Epoch: 16   Global Step: 173670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:18:59,217-Speed 5953.42 samples/sec   Loss 2.9790   LearningRate 0.0130   Epoch: 16   Global Step: 173680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:19:06,081-Speed 5968.53 samples/sec   Loss 2.9645   LearningRate 0.0130   Epoch: 16   Global Step: 173690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:19:12,975-Speed 5942.48 samples/sec   Loss 2.9369   LearningRate 0.0130   Epoch: 16   Global Step: 173700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:19:19,829-Speed 5977.11 samples/sec   Loss 2.9569   LearningRate 0.0130   Epoch: 16   Global Step: 173710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-09 06:19:26,692-Speed 5969.96 samples/sec   Loss 2.9373   LearningRate 0.0130   Epoch: 16   Global Step: 173720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-01-09 06:19:33,539-Speed 5983.81 samples/sec   Loss 2.9642   LearningRate 0.0130   Epoch: 16   Global Step: 173730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:19:40,406-Speed 5965.67 samples/sec   Loss 2.9898   LearningRate 0.0130   Epoch: 16   Global Step: 173740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:19:47,266-Speed 5971.88 samples/sec   Loss 2.9776   LearningRate 0.0130   Epoch: 16   Global Step: 173750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:19:54,097-Speed 5997.87 samples/sec   Loss 2.9030   LearningRate 0.0130   Epoch: 16   Global Step: 173760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:20:00,957-Speed 5971.54 samples/sec   Loss 2.9478   LearningRate 0.0130   Epoch: 16   Global Step: 173770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:20:07,797-Speed 5989.62 samples/sec   Loss 2.9482   LearningRate 0.0130   Epoch: 16   Global Step: 173780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:20:14,655-Speed 5973.35 samples/sec   Loss 2.9437   LearningRate 0.0130   Epoch: 16   Global Step: 173790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:20:21,504-Speed 5981.19 samples/sec   Loss 2.9326   LearningRate 0.0129   Epoch: 16   Global Step: 173800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:20:28,356-Speed 5979.62 samples/sec   Loss 2.9394   LearningRate 0.0129   Epoch: 16   Global Step: 173810   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:20:35,249-Speed 5943.42 samples/sec   Loss 2.9528   LearningRate 0.0129   Epoch: 16   Global Step: 173820   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:20:42,151-Speed 5935.61 samples/sec   Loss 2.9165   LearningRate 0.0129   Epoch: 16   Global Step: 173830   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:20:48,996-Speed 5984.95 samples/sec   Loss 2.9428   LearningRate 0.0129   Epoch: 16   Global Step: 173840   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:20:55,927-Speed 5910.86 samples/sec   Loss 2.9415   LearningRate 0.0129   Epoch: 16   Global Step: 173850   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:21:02,774-Speed 5983.71 samples/sec   Loss 2.9574   LearningRate 0.0129   Epoch: 16   Global Step: 173860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:21:09,619-Speed 5984.54 samples/sec   Loss 2.9745   LearningRate 0.0129   Epoch: 16   Global Step: 173870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:21:16,481-Speed 5970.67 samples/sec   Loss 2.9543   LearningRate 0.0129   Epoch: 16   Global Step: 173880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:21:23,350-Speed 5963.66 samples/sec   Loss 2.9473   LearningRate 0.0129   Epoch: 16   Global Step: 173890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:21:30,200-Speed 5980.98 samples/sec   Loss 2.9487   LearningRate 0.0129   Epoch: 16   Global Step: 173900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:21:37,065-Speed 5968.45 samples/sec   Loss 2.9495   LearningRate 0.0129   Epoch: 16   Global Step: 173910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:21:43,932-Speed 5966.11 samples/sec   Loss 2.9320   LearningRate 0.0129   Epoch: 16   Global Step: 173920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:21:50,798-Speed 5967.14 samples/sec   Loss 2.9352   LearningRate 0.0128   Epoch: 16   Global Step: 173930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:21:57,642-Speed 5987.93 samples/sec   Loss 2.9588   LearningRate 0.0128   Epoch: 16   Global Step: 173940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:22:04,491-Speed 5981.16 samples/sec   Loss 2.9437   LearningRate 0.0128   Epoch: 16   Global Step: 173950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:22:11,343-Speed 5978.50 samples/sec   Loss 2.9241   LearningRate 0.0128   Epoch: 16   Global Step: 173960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:22:18,198-Speed 5976.64 samples/sec   Loss 2.9367   LearningRate 0.0128   Epoch: 16   Global Step: 173970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:22:25,081-Speed 5951.93 samples/sec   Loss 2.9676   LearningRate 0.0128   Epoch: 16   Global Step: 173980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:22:31,946-Speed 5968.14 samples/sec   Loss 2.9707   LearningRate 0.0128   Epoch: 16   Global Step: 173990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:22:38,811-Speed 5968.22 samples/sec   Loss 2.9613   LearningRate 0.0128   Epoch: 16   Global Step: 174000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:22:45,680-Speed 5963.74 samples/sec   Loss 2.9460   LearningRate 0.0128   Epoch: 16   Global Step: 174010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:22:52,567-Speed 5948.70 samples/sec   Loss 2.9505   LearningRate 0.0128   Epoch: 16   Global Step: 174020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:22:59,458-Speed 5945.43 samples/sec   Loss 2.9112   LearningRate 0.0128   Epoch: 16   Global Step: 174030   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:23:06,311-Speed 5977.84 samples/sec   Loss 2.9117   LearningRate 0.0128   Epoch: 16   Global Step: 174040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:23:13,163-Speed 5979.18 samples/sec   Loss 2.9227   LearningRate 0.0128   Epoch: 16   Global Step: 174050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:23:20,066-Speed 5937.62 samples/sec   Loss 2.9402   LearningRate 0.0127   Epoch: 16   Global Step: 174060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:23:26,925-Speed 5972.27 samples/sec   Loss 2.9325   LearningRate 0.0127   Epoch: 16   Global Step: 174070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:23:33,775-Speed 5980.99 samples/sec   Loss 2.9618   LearningRate 0.0127   Epoch: 16   Global Step: 174080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:23:40,627-Speed 5981.53 samples/sec   Loss 2.9634   LearningRate 0.0127   Epoch: 16   Global Step: 174090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:23:47,506-Speed 5955.00 samples/sec   Loss 2.9623   LearningRate 0.0127   Epoch: 16   Global Step: 174100   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:23:54,369-Speed 5969.04 samples/sec   Loss 2.9441   LearningRate 0.0127   Epoch: 16   Global Step: 174110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:24:01,224-Speed 5977.16 samples/sec   Loss 2.9182   LearningRate 0.0127   Epoch: 16   Global Step: 174120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:24:08,078-Speed 5976.39 samples/sec   Loss 2.9132   LearningRate 0.0127   Epoch: 16   Global Step: 174130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:24:14,965-Speed 5948.88 samples/sec   Loss 2.9614   LearningRate 0.0127   Epoch: 16   Global Step: 174140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:24:21,822-Speed 5975.15 samples/sec   Loss 2.9562   LearningRate 0.0127   Epoch: 16   Global Step: 174150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:24:28,712-Speed 5945.52 samples/sec   Loss 2.9463   LearningRate 0.0127   Epoch: 16   Global Step: 174160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:24:35,565-Speed 5978.37 samples/sec   Loss 2.9671   LearningRate 0.0127   Epoch: 16   Global Step: 174170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:24:42,418-Speed 5977.75 samples/sec   Loss 2.9142   LearningRate 0.0127   Epoch: 16   Global Step: 174180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:24:49,275-Speed 5973.41 samples/sec   Loss 2.9184   LearningRate 0.0126   Epoch: 16   Global Step: 174190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-01-09 06:24:56,129-Speed 5977.06 samples/sec   Loss 2.9438   LearningRate 0.0126   Epoch: 16   Global Step: 174200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:25:02,988-Speed 5973.85 samples/sec   Loss 2.9367   LearningRate 0.0126   Epoch: 16   Global Step: 174210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-01-09 06:25:09,837-Speed 5980.74 samples/sec   Loss 2.9417   LearningRate 0.0126   Epoch: 16   Global Step: 174220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:25:16,710-Speed 5960.67 samples/sec   Loss 2.9555   LearningRate 0.0126   Epoch: 16   Global Step: 174230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:25:23,606-Speed 5941.05 samples/sec   Loss 2.9275   LearningRate 0.0126   Epoch: 16   Global Step: 174240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:25:30,486-Speed 5954.06 samples/sec   Loss 2.9382   LearningRate 0.0126   Epoch: 16   Global Step: 174250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:25:37,352-Speed 5967.17 samples/sec   Loss 2.9634   LearningRate 0.0126   Epoch: 16   Global Step: 174260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:25:44,205-Speed 5978.92 samples/sec   Loss 2.9624   LearningRate 0.0126   Epoch: 16   Global Step: 174270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:25:51,046-Speed 5987.84 samples/sec   Loss 2.9287   LearningRate 0.0126   Epoch: 16   Global Step: 174280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:25:57,910-Speed 5971.33 samples/sec   Loss 2.8770   LearningRate 0.0126   Epoch: 16   Global Step: 174290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:26:04,754-Speed 5985.87 samples/sec   Loss 2.9293   LearningRate 0.0126   Epoch: 16   Global Step: 174300   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 06:26:11,609-Speed 5978.66 samples/sec   Loss 2.8951   LearningRate 0.0126   Epoch: 16   Global Step: 174310   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 06:26:18,457-Speed 5982.68 samples/sec   Loss 2.8887   LearningRate 0.0126   Epoch: 16   Global Step: 174320   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 06:26:25,309-Speed 5979.43 samples/sec   Loss 2.9112   LearningRate 0.0125   Epoch: 16   Global Step: 174330   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 06:26:32,163-Speed 5976.24 samples/sec   Loss 2.9291   LearningRate 0.0125   Epoch: 16   Global Step: 174340   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 06:26:39,023-Speed 5972.58 samples/sec   Loss 2.9287   LearningRate 0.0125   Epoch: 16   Global Step: 174350   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 06:26:45,877-Speed 5978.09 samples/sec   Loss 2.9098   LearningRate 0.0125   Epoch: 16   Global Step: 174360   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 06:26:52,724-Speed 5982.68 samples/sec   Loss 2.9030   LearningRate 0.0125   Epoch: 16   Global Step: 174370   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 06:26:59,570-Speed 5984.25 samples/sec   Loss 2.8606   LearningRate 0.0125   Epoch: 16   Global Step: 174380   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 06:27:06,450-Speed 5955.73 samples/sec   Loss 2.8844   LearningRate 0.0125   Epoch: 16   Global Step: 174390   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-01-09 06:27:13,329-Speed 5955.18 samples/sec   Loss 2.8900   LearningRate 0.0125   Epoch: 16   Global Step: 174400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:27:20,183-Speed 5976.90 samples/sec   Loss 2.8885   LearningRate 0.0125   Epoch: 16   Global Step: 174410   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:27:27,036-Speed 5978.44 samples/sec   Loss 2.9312   LearningRate 0.0125   Epoch: 16   Global Step: 174420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:27:33,885-Speed 5981.38 samples/sec   Loss 2.9080   LearningRate 0.0125   Epoch: 16   Global Step: 174430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:27:40,736-Speed 5979.64 samples/sec   Loss 2.9423   LearningRate 0.0125   Epoch: 16   Global Step: 174440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:27:47,604-Speed 5967.69 samples/sec   Loss 2.8566   LearningRate 0.0125   Epoch: 16   Global Step: 174450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:27:54,454-Speed 5980.74 samples/sec   Loss 2.9389   LearningRate 0.0124   Epoch: 16   Global Step: 174460   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:28:01,295-Speed 5988.57 samples/sec   Loss 2.8984   LearningRate 0.0124   Epoch: 16   Global Step: 174470   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:28:08,154-Speed 5973.25 samples/sec   Loss 2.9132   LearningRate 0.0124   Epoch: 16   Global Step: 174480   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:28:14,995-Speed 5988.23 samples/sec   Loss 2.9137   LearningRate 0.0124   Epoch: 16   Global Step: 174490   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:28:21,863-Speed 5965.30 samples/sec   Loss 2.9384   LearningRate 0.0124   Epoch: 16   Global Step: 174500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:28:28,753-Speed 5946.69 samples/sec   Loss 2.9233   LearningRate 0.0124   Epoch: 16   Global Step: 174510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:28:35,603-Speed 5980.51 samples/sec   Loss 2.9472   LearningRate 0.0124   Epoch: 16   Global Step: 174520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:28:42,450-Speed 5983.46 samples/sec   Loss 2.8921   LearningRate 0.0124   Epoch: 16   Global Step: 174530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:28:49,308-Speed 5975.00 samples/sec   Loss 2.9192   LearningRate 0.0124   Epoch: 16   Global Step: 174540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:28:56,184-Speed 5958.00 samples/sec   Loss 2.8750   LearningRate 0.0124   Epoch: 16   Global Step: 174550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:29:03,053-Speed 5964.28 samples/sec   Loss 2.8989   LearningRate 0.0124   Epoch: 16   Global Step: 174560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:29:09,893-Speed 5989.97 samples/sec   Loss 2.9070   LearningRate 0.0124   Epoch: 16   Global Step: 174570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:29:16,762-Speed 5963.99 samples/sec   Loss 2.9270   LearningRate 0.0124   Epoch: 16   Global Step: 174580   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:29:23,615-Speed 5977.78 samples/sec   Loss 2.9031   LearningRate 0.0123   Epoch: 16   Global Step: 174590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:29:30,503-Speed 5948.18 samples/sec   Loss 2.8917   LearningRate 0.0123   Epoch: 16   Global Step: 174600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:29:37,358-Speed 5975.53 samples/sec   Loss 2.9313   LearningRate 0.0123   Epoch: 16   Global Step: 174610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:29:44,233-Speed 5959.48 samples/sec   Loss 2.8837   LearningRate 0.0123   Epoch: 16   Global Step: 174620   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:29:51,097-Speed 5970.05 samples/sec   Loss 2.9085   LearningRate 0.0123   Epoch: 16   Global Step: 174630   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:29:57,962-Speed 5968.20 samples/sec   Loss 2.9047   LearningRate 0.0123   Epoch: 16   Global Step: 174640   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:30:04,817-Speed 5978.04 samples/sec   Loss 2.8943   LearningRate 0.0123   Epoch: 16   Global Step: 174650   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:30:11,662-Speed 5985.06 samples/sec   Loss 2.8750   LearningRate 0.0123   Epoch: 16   Global Step: 174660   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:30:18,519-Speed 5974.94 samples/sec   Loss 2.8970   LearningRate 0.0123   Epoch: 16   Global Step: 174670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:30:25,415-Speed 5941.26 samples/sec   Loss 2.8614   LearningRate 0.0123   Epoch: 16   Global Step: 174680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:30:32,315-Speed 5937.44 samples/sec   Loss 2.8866   LearningRate 0.0123   Epoch: 16   Global Step: 174690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:30:39,184-Speed 5963.98 samples/sec   Loss 2.8951   LearningRate 0.0123   Epoch: 16   Global Step: 174700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:30:46,057-Speed 5960.72 samples/sec   Loss 2.9034   LearningRate 0.0123   Epoch: 16   Global Step: 174710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:30:52,946-Speed 5949.11 samples/sec   Loss 2.9068   LearningRate 0.0122   Epoch: 16   Global Step: 174720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:30:59,804-Speed 5974.21 samples/sec   Loss 2.8774   LearningRate 0.0122   Epoch: 16   Global Step: 174730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:31:06,654-Speed 5980.38 samples/sec   Loss 2.8730   LearningRate 0.0122   Epoch: 16   Global Step: 174740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:31:13,512-Speed 5973.80 samples/sec   Loss 2.8488   LearningRate 0.0122   Epoch: 16   Global Step: 174750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:31:20,360-Speed 5982.04 samples/sec   Loss 2.8897   LearningRate 0.0122   Epoch: 16   Global Step: 174760   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:31:27,231-Speed 5964.56 samples/sec   Loss 2.8834   LearningRate 0.0122   Epoch: 16   Global Step: 174770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:31:34,091-Speed 5971.42 samples/sec   Loss 2.8373   LearningRate 0.0122   Epoch: 16   Global Step: 174780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:31:40,947-Speed 5975.52 samples/sec   Loss 2.8551   LearningRate 0.0122   Epoch: 16   Global Step: 174790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:31:47,805-Speed 5974.13 samples/sec   Loss 2.8790   LearningRate 0.0122   Epoch: 16   Global Step: 174800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:31:54,674-Speed 5964.43 samples/sec   Loss 2.8662   LearningRate 0.0122   Epoch: 16   Global Step: 174810   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:32:01,528-Speed 5976.84 samples/sec   Loss 2.8818   LearningRate 0.0122   Epoch: 16   Global Step: 174820   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:32:08,395-Speed 5967.57 samples/sec   Loss 2.8925   LearningRate 0.0122   Epoch: 16   Global Step: 174830   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:32:15,384-Speed 5862.45 samples/sec   Loss 2.9285   LearningRate 0.0122   Epoch: 16   Global Step: 174840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:32:22,242-Speed 5973.67 samples/sec   Loss 2.8669   LearningRate 0.0122   Epoch: 16   Global Step: 174850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:32:29,134-Speed 5946.92 samples/sec   Loss 2.8926   LearningRate 0.0121   Epoch: 16   Global Step: 174860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:32:36,007-Speed 5960.81 samples/sec   Loss 2.8729   LearningRate 0.0121   Epoch: 16   Global Step: 174870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:32:42,885-Speed 5956.49 samples/sec   Loss 2.8224   LearningRate 0.0121   Epoch: 16   Global Step: 174880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:32:49,747-Speed 5970.16 samples/sec   Loss 2.8828   LearningRate 0.0121   Epoch: 16   Global Step: 174890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:32:56,619-Speed 5961.84 samples/sec   Loss 2.8896   LearningRate 0.0121   Epoch: 16   Global Step: 174900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:33:03,490-Speed 5962.78 samples/sec   Loss 2.8523   LearningRate 0.0121   Epoch: 16   Global Step: 174910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:33:10,350-Speed 5972.27 samples/sec   Loss 2.8895   LearningRate 0.0121   Epoch: 16   Global Step: 174920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:33:17,183-Speed 5995.37 samples/sec   Loss 2.8366   LearningRate 0.0121   Epoch: 16   Global Step: 174930   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:33:24,042-Speed 5972.65 samples/sec   Loss 2.8765   LearningRate 0.0121   Epoch: 16   Global Step: 174940   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:33:30,898-Speed 5975.83 samples/sec   Loss 2.8561   LearningRate 0.0121   Epoch: 16   Global Step: 174950   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:33:37,750-Speed 5978.86 samples/sec   Loss 2.8677   LearningRate 0.0121   Epoch: 16   Global Step: 174960   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:33:44,592-Speed 5986.92 samples/sec   Loss 2.9110   LearningRate 0.0121   Epoch: 16   Global Step: 174970   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:33:51,438-Speed 5983.97 samples/sec   Loss 2.8573   LearningRate 0.0121   Epoch: 16   Global Step: 174980   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:33:58,296-Speed 5974.67 samples/sec   Loss 2.8670   LearningRate 0.0120   Epoch: 16   Global Step: 174990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:34:05,222-Speed 5914.87 samples/sec   Loss 2.8607   LearningRate 0.0120   Epoch: 16   Global Step: 175000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:34:31,893-[lfw][175000]XNorm: 23.882786
Training: 2022-01-09 06:34:31,893-[lfw][175000]Accuracy-Flip: 0.99800+-0.00277
Training: 2022-01-09 06:34:31,894-[lfw][175000]Accuracy-Highest: 0.99817
Training: 2022-01-09 06:35:05,381-[cfp_fp][175000]XNorm: 21.642738
Training: 2022-01-09 06:35:05,382-[cfp_fp][175000]Accuracy-Flip: 0.99229+-0.00448
Training: 2022-01-09 06:35:05,383-[cfp_fp][175000]Accuracy-Highest: 0.99229
Training: 2022-01-09 06:35:32,122-[agedb_30][175000]XNorm: 23.278589
Training: 2022-01-09 06:35:32,123-[agedb_30][175000]Accuracy-Flip: 0.97833+-0.00654
Training: 2022-01-09 06:35:32,123-[agedb_30][175000]Accuracy-Highest: 0.98067
Training: 2022-01-09 06:35:38,974-Speed 436.91 samples/sec   Loss 2.8500   LearningRate 0.0120   Epoch: 16   Global Step: 175010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:35:45,809-Speed 5994.52 samples/sec   Loss 2.8315   LearningRate 0.0120   Epoch: 16   Global Step: 175020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:35:52,677-Speed 5965.87 samples/sec   Loss 2.8599   LearningRate 0.0120   Epoch: 16   Global Step: 175030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:35:59,562-Speed 5954.18 samples/sec   Loss 2.8450   LearningRate 0.0120   Epoch: 16   Global Step: 175040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:36:06,414-Speed 5978.39 samples/sec   Loss 2.8548   LearningRate 0.0120   Epoch: 16   Global Step: 175050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:36:13,271-Speed 5974.20 samples/sec   Loss 2.8598   LearningRate 0.0120   Epoch: 16   Global Step: 175060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:36:20,127-Speed 5975.70 samples/sec   Loss 2.8733   LearningRate 0.0120   Epoch: 16   Global Step: 175070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:36:26,993-Speed 5966.23 samples/sec   Loss 2.8930   LearningRate 0.0120   Epoch: 16   Global Step: 175080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:36:33,849-Speed 5976.15 samples/sec   Loss 2.8703   LearningRate 0.0120   Epoch: 16   Global Step: 175090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:36:40,698-Speed 5982.36 samples/sec   Loss 2.8938   LearningRate 0.0120   Epoch: 16   Global Step: 175100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:36:47,567-Speed 5963.72 samples/sec   Loss 2.8735   LearningRate 0.0120   Epoch: 16   Global Step: 175110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:36:54,416-Speed 5982.29 samples/sec   Loss 2.8739   LearningRate 0.0120   Epoch: 16   Global Step: 175120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:37:01,265-Speed 5980.96 samples/sec   Loss 2.8596   LearningRate 0.0119   Epoch: 16   Global Step: 175130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-09 06:37:08,108-Speed 5987.02 samples/sec   Loss 2.8371   LearningRate 0.0119   Epoch: 16   Global Step: 175140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:37:14,959-Speed 5980.04 samples/sec   Loss 2.8572   LearningRate 0.0119   Epoch: 16   Global Step: 175150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:37:21,828-Speed 5964.09 samples/sec   Loss 2.8730   LearningRate 0.0119   Epoch: 16   Global Step: 175160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:37:28,671-Speed 5986.52 samples/sec   Loss 2.8736   LearningRate 0.0119   Epoch: 16   Global Step: 175170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:37:35,525-Speed 5977.97 samples/sec   Loss 2.8544   LearningRate 0.0119   Epoch: 16   Global Step: 175180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:37:42,396-Speed 5962.67 samples/sec   Loss 2.8796   LearningRate 0.0119   Epoch: 16   Global Step: 175190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:37:49,297-Speed 5936.61 samples/sec   Loss 2.8295   LearningRate 0.0119   Epoch: 16   Global Step: 175200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:37:56,148-Speed 5979.95 samples/sec   Loss 2.8082   LearningRate 0.0119   Epoch: 16   Global Step: 175210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:38:03,001-Speed 5981.24 samples/sec   Loss 2.8603   LearningRate 0.0119   Epoch: 16   Global Step: 175220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:38:09,857-Speed 5975.10 samples/sec   Loss 2.8497   LearningRate 0.0119   Epoch: 16   Global Step: 175230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:38:16,702-Speed 5985.90 samples/sec   Loss 2.8614   LearningRate 0.0119   Epoch: 16   Global Step: 175240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:38:23,571-Speed 5964.19 samples/sec   Loss 2.8490   LearningRate 0.0119   Epoch: 16   Global Step: 175250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:38:30,435-Speed 5968.44 samples/sec   Loss 2.8499   LearningRate 0.0118   Epoch: 16   Global Step: 175260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:38:37,310-Speed 5959.28 samples/sec   Loss 2.8715   LearningRate 0.0118   Epoch: 16   Global Step: 175270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:38:44,149-Speed 5990.89 samples/sec   Loss 2.8733   LearningRate 0.0118   Epoch: 16   Global Step: 175280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:38:50,992-Speed 5986.21 samples/sec   Loss 2.8313   LearningRate 0.0118   Epoch: 16   Global Step: 175290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:38:57,846-Speed 5980.17 samples/sec   Loss 2.8469   LearningRate 0.0118   Epoch: 16   Global Step: 175300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:39:07,394-Speed 4290.80 samples/sec   Loss 2.8283   LearningRate 0.0118   Epoch: 16   Global Step: 175310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:39:14,250-Speed 5975.61 samples/sec   Loss 2.8736   LearningRate 0.0118   Epoch: 16   Global Step: 175320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:39:21,096-Speed 5984.37 samples/sec   Loss 2.8608   LearningRate 0.0118   Epoch: 16   Global Step: 175330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:39:27,933-Speed 5991.97 samples/sec   Loss 2.8461   LearningRate 0.0118   Epoch: 16   Global Step: 175340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:39:34,802-Speed 5963.91 samples/sec   Loss 2.8554   LearningRate 0.0118   Epoch: 16   Global Step: 175350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:39:41,652-Speed 5981.08 samples/sec   Loss 2.8524   LearningRate 0.0118   Epoch: 16   Global Step: 175360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:39:48,481-Speed 5998.45 samples/sec   Loss 2.8535   LearningRate 0.0118   Epoch: 16   Global Step: 175370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:39:55,323-Speed 5988.43 samples/sec   Loss 2.8439   LearningRate 0.0118   Epoch: 16   Global Step: 175380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:40:02,166-Speed 5989.60 samples/sec   Loss 2.8340   LearningRate 0.0118   Epoch: 16   Global Step: 175390   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:40:09,020-Speed 5977.37 samples/sec   Loss 2.8379   LearningRate 0.0117   Epoch: 16   Global Step: 175400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:40:15,854-Speed 5994.27 samples/sec   Loss 2.8381   LearningRate 0.0117   Epoch: 16   Global Step: 175410   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:40:22,720-Speed 5969.80 samples/sec   Loss 2.8722   LearningRate 0.0117   Epoch: 16   Global Step: 175420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:40:29,571-Speed 5979.34 samples/sec   Loss 2.8539   LearningRate 0.0117   Epoch: 16   Global Step: 175430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:40:36,407-Speed 5993.22 samples/sec   Loss 2.8002   LearningRate 0.0117   Epoch: 16   Global Step: 175440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:40:43,252-Speed 5984.87 samples/sec   Loss 2.8568   LearningRate 0.0117   Epoch: 16   Global Step: 175450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:40:50,094-Speed 5988.06 samples/sec   Loss 2.8538   LearningRate 0.0117   Epoch: 16   Global Step: 175460   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:40:56,931-Speed 5991.85 samples/sec   Loss 2.8908   LearningRate 0.0117   Epoch: 16   Global Step: 175470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:41:03,764-Speed 5995.18 samples/sec   Loss 2.8241   LearningRate 0.0117   Epoch: 16   Global Step: 175480   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:41:10,616-Speed 5980.55 samples/sec   Loss 2.8208   LearningRate 0.0117   Epoch: 16   Global Step: 175490   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:41:17,457-Speed 5988.49 samples/sec   Loss 2.8478   LearningRate 0.0117   Epoch: 16   Global Step: 175500   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:41:24,310-Speed 5978.66 samples/sec   Loss 2.8595   LearningRate 0.0117   Epoch: 16   Global Step: 175510   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:41:31,163-Speed 5977.20 samples/sec   Loss 2.8865   LearningRate 0.0117   Epoch: 16   Global Step: 175520   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:41:38,010-Speed 5983.61 samples/sec   Loss 2.8337   LearningRate 0.0116   Epoch: 16   Global Step: 175530   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:41:44,861-Speed 5980.18 samples/sec   Loss 2.8562   LearningRate 0.0116   Epoch: 16   Global Step: 175540   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:41:51,742-Speed 5953.69 samples/sec   Loss 2.8485   LearningRate 0.0116   Epoch: 16   Global Step: 175550   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:41:58,623-Speed 5954.13 samples/sec   Loss 2.8440   LearningRate 0.0116   Epoch: 16   Global Step: 175560   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:42:05,474-Speed 5979.85 samples/sec   Loss 2.8476   LearningRate 0.0116   Epoch: 16   Global Step: 175570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:42:12,330-Speed 5975.43 samples/sec   Loss 2.8307   LearningRate 0.0116   Epoch: 16   Global Step: 175580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:42:19,184-Speed 5977.23 samples/sec   Loss 2.8578   LearningRate 0.0116   Epoch: 16   Global Step: 175590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:42:26,037-Speed 5978.53 samples/sec   Loss 2.8261   LearningRate 0.0116   Epoch: 16   Global Step: 175600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:42:32,886-Speed 5981.30 samples/sec   Loss 2.8557   LearningRate 0.0116   Epoch: 16   Global Step: 175610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:42:39,793-Speed 5931.54 samples/sec   Loss 2.8178   LearningRate 0.0116   Epoch: 16   Global Step: 175620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:42:46,652-Speed 5973.49 samples/sec   Loss 2.8036   LearningRate 0.0116   Epoch: 16   Global Step: 175630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:42:53,504-Speed 5978.46 samples/sec   Loss 2.8100   LearningRate 0.0116   Epoch: 16   Global Step: 175640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:43:00,368-Speed 5968.51 samples/sec   Loss 2.8340   LearningRate 0.0116   Epoch: 16   Global Step: 175650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:43:07,209-Speed 5988.10 samples/sec   Loss 2.8261   LearningRate 0.0116   Epoch: 16   Global Step: 175660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:43:14,073-Speed 5968.96 samples/sec   Loss 2.9055   LearningRate 0.0115   Epoch: 16   Global Step: 175670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:43:20,924-Speed 5979.54 samples/sec   Loss 2.8538   LearningRate 0.0115   Epoch: 16   Global Step: 175680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-09 06:43:27,784-Speed 5972.42 samples/sec   Loss 2.8059   LearningRate 0.0115   Epoch: 16   Global Step: 175690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-09 06:43:34,633-Speed 5981.33 samples/sec   Loss 2.8264   LearningRate 0.0115   Epoch: 16   Global Step: 175700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:43:41,498-Speed 5967.25 samples/sec   Loss 2.7862   LearningRate 0.0115   Epoch: 16   Global Step: 175710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:43:48,358-Speed 5972.05 samples/sec   Loss 2.8200   LearningRate 0.0115   Epoch: 16   Global Step: 175720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:43:55,233-Speed 5958.60 samples/sec   Loss 2.8409   LearningRate 0.0115   Epoch: 16   Global Step: 175730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:44:02,097-Speed 5968.99 samples/sec   Loss 2.7827   LearningRate 0.0115   Epoch: 16   Global Step: 175740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:44:08,961-Speed 5969.33 samples/sec   Loss 2.7925   LearningRate 0.0115   Epoch: 16   Global Step: 175750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:44:15,826-Speed 5968.52 samples/sec   Loss 2.8436   LearningRate 0.0115   Epoch: 16   Global Step: 175760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:44:22,689-Speed 5969.24 samples/sec   Loss 2.8386   LearningRate 0.0115   Epoch: 16   Global Step: 175770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:44:29,554-Speed 5967.99 samples/sec   Loss 2.8416   LearningRate 0.0115   Epoch: 16   Global Step: 175780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:44:36,402-Speed 5981.92 samples/sec   Loss 2.7980   LearningRate 0.0115   Epoch: 16   Global Step: 175790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:44:43,247-Speed 5987.05 samples/sec   Loss 2.8688   LearningRate 0.0115   Epoch: 16   Global Step: 175800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:44:50,099-Speed 5978.67 samples/sec   Loss 2.8263   LearningRate 0.0114   Epoch: 16   Global Step: 175810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:44:56,952-Speed 5978.38 samples/sec   Loss 2.8442   LearningRate 0.0114   Epoch: 16   Global Step: 175820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:45:03,817-Speed 5967.42 samples/sec   Loss 2.8212   LearningRate 0.0114   Epoch: 16   Global Step: 175830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:45:10,688-Speed 5962.39 samples/sec   Loss 2.7901   LearningRate 0.0114   Epoch: 16   Global Step: 175840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:45:17,571-Speed 5951.78 samples/sec   Loss 2.8100   LearningRate 0.0114   Epoch: 16   Global Step: 175850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:45:24,447-Speed 5958.89 samples/sec   Loss 2.8421   LearningRate 0.0114   Epoch: 16   Global Step: 175860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:45:31,299-Speed 5978.93 samples/sec   Loss 2.8108   LearningRate 0.0114   Epoch: 16   Global Step: 175870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:45:38,180-Speed 5953.62 samples/sec   Loss 2.7786   LearningRate 0.0114   Epoch: 16   Global Step: 175880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:45:45,055-Speed 5959.13 samples/sec   Loss 2.8000   LearningRate 0.0114   Epoch: 16   Global Step: 175890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:45:51,908-Speed 5978.13 samples/sec   Loss 2.8223   LearningRate 0.0114   Epoch: 16   Global Step: 175900   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:45:58,788-Speed 5954.29 samples/sec   Loss 2.7703   LearningRate 0.0114   Epoch: 16   Global Step: 175910   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:46:05,662-Speed 5960.51 samples/sec   Loss 2.7733   LearningRate 0.0114   Epoch: 16   Global Step: 175920   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:46:12,554-Speed 5944.52 samples/sec   Loss 2.7659   LearningRate 0.0114   Epoch: 16   Global Step: 175930   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:46:19,521-Speed 5879.61 samples/sec   Loss 2.8275   LearningRate 0.0114   Epoch: 16   Global Step: 175940   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:46:26,481-Speed 5886.85 samples/sec   Loss 2.8177   LearningRate 0.0113   Epoch: 16   Global Step: 175950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:46:33,366-Speed 5950.01 samples/sec   Loss 2.7706   LearningRate 0.0113   Epoch: 16   Global Step: 175960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:46:40,224-Speed 5973.97 samples/sec   Loss 2.8103   LearningRate 0.0113   Epoch: 16   Global Step: 175970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:46:47,078-Speed 5976.80 samples/sec   Loss 2.7977   LearningRate 0.0113   Epoch: 16   Global Step: 175980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:46:54,005-Speed 5913.95 samples/sec   Loss 2.8141   LearningRate 0.0113   Epoch: 16   Global Step: 175990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:47:00,856-Speed 5980.18 samples/sec   Loss 2.8289   LearningRate 0.0113   Epoch: 16   Global Step: 176000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:47:07,751-Speed 5941.74 samples/sec   Loss 2.8048   LearningRate 0.0113   Epoch: 16   Global Step: 176010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:47:14,613-Speed 5970.07 samples/sec   Loss 2.8305   LearningRate 0.0113   Epoch: 16   Global Step: 176020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:47:21,479-Speed 5967.00 samples/sec   Loss 2.7841   LearningRate 0.0113   Epoch: 16   Global Step: 176030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:47:28,346-Speed 5968.53 samples/sec   Loss 2.8135   LearningRate 0.0113   Epoch: 16   Global Step: 176040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:47:35,253-Speed 5932.87 samples/sec   Loss 2.8589   LearningRate 0.0113   Epoch: 16   Global Step: 176050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-09 06:47:42,099-Speed 5984.03 samples/sec   Loss 2.7736   LearningRate 0.0113   Epoch: 16   Global Step: 176060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:47:48,966-Speed 5965.87 samples/sec   Loss 2.8245   LearningRate 0.0113   Epoch: 16   Global Step: 176070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:47:55,832-Speed 5966.68 samples/sec   Loss 2.7940   LearningRate 0.0112   Epoch: 16   Global Step: 176080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:48:02,719-Speed 5948.91 samples/sec   Loss 2.7934   LearningRate 0.0112   Epoch: 16   Global Step: 176090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:48:09,593-Speed 5959.83 samples/sec   Loss 2.7814   LearningRate 0.0112   Epoch: 16   Global Step: 176100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:48:16,459-Speed 5966.83 samples/sec   Loss 2.8333   LearningRate 0.0112   Epoch: 16   Global Step: 176110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:48:23,320-Speed 5970.88 samples/sec   Loss 2.8079   LearningRate 0.0112   Epoch: 16   Global Step: 176120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:48:30,193-Speed 5961.04 samples/sec   Loss 2.8043   LearningRate 0.0112   Epoch: 16   Global Step: 176130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:48:37,076-Speed 5952.83 samples/sec   Loss 2.7947   LearningRate 0.0112   Epoch: 16   Global Step: 176140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:48:43,947-Speed 5962.37 samples/sec   Loss 2.7758   LearningRate 0.0112   Epoch: 16   Global Step: 176150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:48:50,839-Speed 5947.49 samples/sec   Loss 2.8168   LearningRate 0.0112   Epoch: 16   Global Step: 176160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:48:57,720-Speed 5954.13 samples/sec   Loss 2.7860   LearningRate 0.0112   Epoch: 16   Global Step: 176170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:49:04,591-Speed 5961.74 samples/sec   Loss 2.8099   LearningRate 0.0112   Epoch: 16   Global Step: 176180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:49:11,443-Speed 5979.53 samples/sec   Loss 2.8403   LearningRate 0.0112   Epoch: 16   Global Step: 176190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:49:18,295-Speed 5978.47 samples/sec   Loss 2.7343   LearningRate 0.0112   Epoch: 16   Global Step: 176200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:49:25,162-Speed 5965.52 samples/sec   Loss 2.7701   LearningRate 0.0112   Epoch: 16   Global Step: 176210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:49:32,023-Speed 5971.19 samples/sec   Loss 2.8083   LearningRate 0.0111   Epoch: 16   Global Step: 176220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:49:38,894-Speed 5962.71 samples/sec   Loss 2.7771   LearningRate 0.0111   Epoch: 16   Global Step: 176230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:49:45,758-Speed 5968.28 samples/sec   Loss 2.8028   LearningRate 0.0111   Epoch: 16   Global Step: 176240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:49:52,630-Speed 5962.20 samples/sec   Loss 2.7874   LearningRate 0.0111   Epoch: 16   Global Step: 176250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:49:59,479-Speed 5981.39 samples/sec   Loss 2.8081   LearningRate 0.0111   Epoch: 16   Global Step: 176260   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:50:06,338-Speed 5973.11 samples/sec   Loss 2.7670   LearningRate 0.0111   Epoch: 16   Global Step: 176270   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:50:13,194-Speed 5976.14 samples/sec   Loss 2.8266   LearningRate 0.0111   Epoch: 16   Global Step: 176280   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:50:36,588-Speed 1751.01 samples/sec   Loss 2.8342   LearningRate 0.0111   Epoch: 17   Global Step: 176290   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:50:43,436-Speed 5983.23 samples/sec   Loss 2.7696   LearningRate 0.0111   Epoch: 17   Global Step: 176300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:50:50,279-Speed 5986.51 samples/sec   Loss 2.8336   LearningRate 0.0111   Epoch: 17   Global Step: 176310   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:50:57,109-Speed 5998.75 samples/sec   Loss 2.8112   LearningRate 0.0111   Epoch: 17   Global Step: 176320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:51:03,966-Speed 5976.44 samples/sec   Loss 2.8130   LearningRate 0.0111   Epoch: 17   Global Step: 176330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:51:10,808-Speed 5987.19 samples/sec   Loss 2.7831   LearningRate 0.0111   Epoch: 17   Global Step: 176340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:51:17,637-Speed 5998.96 samples/sec   Loss 2.7881   LearningRate 0.0111   Epoch: 17   Global Step: 176350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:51:24,492-Speed 5976.69 samples/sec   Loss 2.7898   LearningRate 0.0110   Epoch: 17   Global Step: 176360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:51:31,332-Speed 5990.14 samples/sec   Loss 2.7399   LearningRate 0.0110   Epoch: 17   Global Step: 176370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:51:38,204-Speed 5961.03 samples/sec   Loss 2.7776   LearningRate 0.0110   Epoch: 17   Global Step: 176380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:51:45,072-Speed 5964.72 samples/sec   Loss 2.7911   LearningRate 0.0110   Epoch: 17   Global Step: 176390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:51:51,916-Speed 5986.23 samples/sec   Loss 2.7523   LearningRate 0.0110   Epoch: 17   Global Step: 176400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:51:58,796-Speed 5954.21 samples/sec   Loss 2.7906   LearningRate 0.0110   Epoch: 17   Global Step: 176410   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:52:05,656-Speed 5972.76 samples/sec   Loss 2.7823   LearningRate 0.0110   Epoch: 17   Global Step: 176420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:52:12,509-Speed 5977.45 samples/sec   Loss 2.7766   LearningRate 0.0110   Epoch: 17   Global Step: 176430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:52:19,384-Speed 5959.60 samples/sec   Loss 2.7870   LearningRate 0.0110   Epoch: 17   Global Step: 176440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:52:26,249-Speed 5967.62 samples/sec   Loss 2.7499   LearningRate 0.0110   Epoch: 17   Global Step: 176450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:52:33,105-Speed 5974.73 samples/sec   Loss 2.7716   LearningRate 0.0110   Epoch: 17   Global Step: 176460   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:52:39,980-Speed 5959.36 samples/sec   Loss 2.7529   LearningRate 0.0110   Epoch: 17   Global Step: 176470   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:52:46,830-Speed 5980.82 samples/sec   Loss 2.7411   LearningRate 0.0110   Epoch: 17   Global Step: 176480   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:52:53,691-Speed 5971.26 samples/sec   Loss 2.7373   LearningRate 0.0110   Epoch: 17   Global Step: 176490   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:53:00,575-Speed 5951.03 samples/sec   Loss 2.7710   LearningRate 0.0109   Epoch: 17   Global Step: 176500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:53:07,447-Speed 5961.31 samples/sec   Loss 2.7767   LearningRate 0.0109   Epoch: 17   Global Step: 176510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:53:14,320-Speed 5960.17 samples/sec   Loss 2.7517   LearningRate 0.0109   Epoch: 17   Global Step: 176520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:53:21,179-Speed 5973.17 samples/sec   Loss 2.7669   LearningRate 0.0109   Epoch: 17   Global Step: 176530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:53:28,047-Speed 5964.77 samples/sec   Loss 2.7473   LearningRate 0.0109   Epoch: 17   Global Step: 176540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:53:34,898-Speed 5979.90 samples/sec   Loss 2.7134   LearningRate 0.0109   Epoch: 17   Global Step: 176550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:53:41,759-Speed 5971.17 samples/sec   Loss 2.7529   LearningRate 0.0109   Epoch: 17   Global Step: 176560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:53:48,613-Speed 5976.83 samples/sec   Loss 2.7694   LearningRate 0.0109   Epoch: 17   Global Step: 176570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:53:55,473-Speed 5971.51 samples/sec   Loss 2.7732   LearningRate 0.0109   Epoch: 17   Global Step: 176580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:54:02,338-Speed 5968.17 samples/sec   Loss 2.7705   LearningRate 0.0109   Epoch: 17   Global Step: 176590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:54:09,225-Speed 5948.38 samples/sec   Loss 2.7651   LearningRate 0.0109   Epoch: 17   Global Step: 176600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-09 06:54:16,080-Speed 5976.54 samples/sec   Loss 2.7475   LearningRate 0.0109   Epoch: 17   Global Step: 176610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:54:22,927-Speed 5982.67 samples/sec   Loss 2.7269   LearningRate 0.0109   Epoch: 17   Global Step: 176620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:54:29,790-Speed 5969.41 samples/sec   Loss 2.8017   LearningRate 0.0109   Epoch: 17   Global Step: 176630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:54:36,650-Speed 5973.24 samples/sec   Loss 2.7319   LearningRate 0.0109   Epoch: 17   Global Step: 176640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:54:43,506-Speed 5977.92 samples/sec   Loss 2.7446   LearningRate 0.0108   Epoch: 17   Global Step: 176650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:54:50,362-Speed 5975.75 samples/sec   Loss 2.7489   LearningRate 0.0108   Epoch: 17   Global Step: 176660   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:54:57,238-Speed 5958.11 samples/sec   Loss 2.7382   LearningRate 0.0108   Epoch: 17   Global Step: 176670   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:55:04,091-Speed 5978.32 samples/sec   Loss 2.7719   LearningRate 0.0108   Epoch: 17   Global Step: 176680   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:55:10,961-Speed 5963.56 samples/sec   Loss 2.8172   LearningRate 0.0108   Epoch: 17   Global Step: 176690   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:55:17,838-Speed 5957.57 samples/sec   Loss 2.7730   LearningRate 0.0108   Epoch: 17   Global Step: 176700   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:55:24,708-Speed 5963.80 samples/sec   Loss 2.7629   LearningRate 0.0108   Epoch: 17   Global Step: 176710   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:55:31,571-Speed 5968.82 samples/sec   Loss 2.7439   LearningRate 0.0108   Epoch: 17   Global Step: 176720   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:55:38,448-Speed 5958.73 samples/sec   Loss 2.7618   LearningRate 0.0108   Epoch: 17   Global Step: 176730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:55:45,304-Speed 5976.90 samples/sec   Loss 2.7684   LearningRate 0.0108   Epoch: 17   Global Step: 176740   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:55:52,172-Speed 5964.68 samples/sec   Loss 2.7666   LearningRate 0.0108   Epoch: 17   Global Step: 176750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:55:59,038-Speed 5966.74 samples/sec   Loss 2.7383   LearningRate 0.0108   Epoch: 17   Global Step: 176760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:56:05,913-Speed 5959.44 samples/sec   Loss 2.7577   LearningRate 0.0108   Epoch: 17   Global Step: 176770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:56:12,776-Speed 5969.20 samples/sec   Loss 2.7443   LearningRate 0.0108   Epoch: 17   Global Step: 176780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:56:19,646-Speed 5963.82 samples/sec   Loss 2.7274   LearningRate 0.0107   Epoch: 17   Global Step: 176790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:56:26,546-Speed 5941.24 samples/sec   Loss 2.7479   LearningRate 0.0107   Epoch: 17   Global Step: 176800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:56:33,412-Speed 5967.26 samples/sec   Loss 2.7575   LearningRate 0.0107   Epoch: 17   Global Step: 176810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:56:40,262-Speed 5980.49 samples/sec   Loss 2.7609   LearningRate 0.0107   Epoch: 17   Global Step: 176820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:56:47,110-Speed 5983.12 samples/sec   Loss 2.7605   LearningRate 0.0107   Epoch: 17   Global Step: 176830   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:56:53,982-Speed 5964.07 samples/sec   Loss 2.7814   LearningRate 0.0107   Epoch: 17   Global Step: 176840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:57:00,867-Speed 5950.77 samples/sec   Loss 2.7255   LearningRate 0.0107   Epoch: 17   Global Step: 176850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:57:07,720-Speed 5977.98 samples/sec   Loss 2.7519   LearningRate 0.0107   Epoch: 17   Global Step: 176860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:57:14,580-Speed 5972.01 samples/sec   Loss 2.7184   LearningRate 0.0107   Epoch: 17   Global Step: 176870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:57:21,536-Speed 5889.79 samples/sec   Loss 2.7707   LearningRate 0.0107   Epoch: 17   Global Step: 176880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:57:28,476-Speed 5903.30 samples/sec   Loss 2.7278   LearningRate 0.0107   Epoch: 17   Global Step: 176890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:57:35,324-Speed 5982.36 samples/sec   Loss 2.7199   LearningRate 0.0107   Epoch: 17   Global Step: 176900   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:57:42,186-Speed 5970.14 samples/sec   Loss 2.7534   LearningRate 0.0107   Epoch: 17   Global Step: 176910   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:57:49,072-Speed 5948.63 samples/sec   Loss 2.7676   LearningRate 0.0107   Epoch: 17   Global Step: 176920   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:57:55,953-Speed 5954.50 samples/sec   Loss 2.7259   LearningRate 0.0106   Epoch: 17   Global Step: 176930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:58:02,819-Speed 5966.91 samples/sec   Loss 2.7272   LearningRate 0.0106   Epoch: 17   Global Step: 176940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:58:09,667-Speed 5982.34 samples/sec   Loss 2.7554   LearningRate 0.0106   Epoch: 17   Global Step: 176950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:58:16,520-Speed 5978.31 samples/sec   Loss 2.7691   LearningRate 0.0106   Epoch: 17   Global Step: 176960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:58:23,396-Speed 5958.31 samples/sec   Loss 2.7543   LearningRate 0.0106   Epoch: 17   Global Step: 176970   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:58:30,244-Speed 5982.96 samples/sec   Loss 2.7015   LearningRate 0.0106   Epoch: 17   Global Step: 176980   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:58:37,109-Speed 5967.40 samples/sec   Loss 2.7299   LearningRate 0.0106   Epoch: 17   Global Step: 176990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:58:43,969-Speed 5974.12 samples/sec   Loss 2.7385   LearningRate 0.0106   Epoch: 17   Global Step: 177000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:58:50,821-Speed 5978.82 samples/sec   Loss 2.7307   LearningRate 0.0106   Epoch: 17   Global Step: 177010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:58:57,682-Speed 5971.76 samples/sec   Loss 2.7537   LearningRate 0.0106   Epoch: 17   Global Step: 177020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:59:04,530-Speed 5982.64 samples/sec   Loss 2.7442   LearningRate 0.0106   Epoch: 17   Global Step: 177030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:59:11,384-Speed 5976.85 samples/sec   Loss 2.7154   LearningRate 0.0106   Epoch: 17   Global Step: 177040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:59:18,277-Speed 5943.35 samples/sec   Loss 2.7429   LearningRate 0.0106   Epoch: 17   Global Step: 177050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:59:25,188-Speed 5927.71 samples/sec   Loss 2.7060   LearningRate 0.0106   Epoch: 17   Global Step: 177060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 06:59:32,086-Speed 5939.62 samples/sec   Loss 2.7132   LearningRate 0.0105   Epoch: 17   Global Step: 177070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:59:38,950-Speed 5968.92 samples/sec   Loss 2.7581   LearningRate 0.0105   Epoch: 17   Global Step: 177080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:59:45,812-Speed 5969.55 samples/sec   Loss 2.7497   LearningRate 0.0105   Epoch: 17   Global Step: 177090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:59:52,715-Speed 5934.56 samples/sec   Loss 2.7152   LearningRate 0.0105   Epoch: 17   Global Step: 177100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 06:59:59,566-Speed 5980.01 samples/sec   Loss 2.7541   LearningRate 0.0105   Epoch: 17   Global Step: 177110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:00:06,445-Speed 5955.61 samples/sec   Loss 2.7858   LearningRate 0.0105   Epoch: 17   Global Step: 177120   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:00:13,319-Speed 5960.12 samples/sec   Loss 2.7116   LearningRate 0.0105   Epoch: 17   Global Step: 177130   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:00:20,174-Speed 5976.57 samples/sec   Loss 2.7126   LearningRate 0.0105   Epoch: 17   Global Step: 177140   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:00:27,032-Speed 5973.47 samples/sec   Loss 2.7134   LearningRate 0.0105   Epoch: 17   Global Step: 177150   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:00:33,881-Speed 5981.36 samples/sec   Loss 2.7265   LearningRate 0.0105   Epoch: 17   Global Step: 177160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:00:40,722-Speed 5988.37 samples/sec   Loss 2.7383   LearningRate 0.0105   Epoch: 17   Global Step: 177170   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:00:47,596-Speed 5960.03 samples/sec   Loss 2.7375   LearningRate 0.0105   Epoch: 17   Global Step: 177180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:00:54,450-Speed 5976.92 samples/sec   Loss 2.7458   LearningRate 0.0105   Epoch: 17   Global Step: 177190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:01:01,328-Speed 5956.77 samples/sec   Loss 2.7218   LearningRate 0.0105   Epoch: 17   Global Step: 177200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:01:08,204-Speed 5958.05 samples/sec   Loss 2.7450   LearningRate 0.0105   Epoch: 17   Global Step: 177210   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:01:15,045-Speed 5987.55 samples/sec   Loss 2.7078   LearningRate 0.0104   Epoch: 17   Global Step: 177220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:01:21,918-Speed 5963.41 samples/sec   Loss 2.6950   LearningRate 0.0104   Epoch: 17   Global Step: 177230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:01:28,854-Speed 5906.47 samples/sec   Loss 2.6957   LearningRate 0.0104   Epoch: 17   Global Step: 177240   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:01:35,778-Speed 5917.42 samples/sec   Loss 2.6837   LearningRate 0.0104   Epoch: 17   Global Step: 177250   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:01:42,645-Speed 5965.00 samples/sec   Loss 2.6934   LearningRate 0.0104   Epoch: 17   Global Step: 177260   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:01:49,492-Speed 5984.90 samples/sec   Loss 2.7672   LearningRate 0.0104   Epoch: 17   Global Step: 177270   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:01:56,369-Speed 5957.11 samples/sec   Loss 2.7568   LearningRate 0.0104   Epoch: 17   Global Step: 177280   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:02:03,222-Speed 5980.59 samples/sec   Loss 2.7230   LearningRate 0.0104   Epoch: 17   Global Step: 177290   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:02:10,081-Speed 5972.49 samples/sec   Loss 2.6781   LearningRate 0.0104   Epoch: 17   Global Step: 177300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:02:16,965-Speed 5953.57 samples/sec   Loss 2.7079   LearningRate 0.0104   Epoch: 17   Global Step: 177310   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:02:23,830-Speed 5968.01 samples/sec   Loss 2.7463   LearningRate 0.0104   Epoch: 17   Global Step: 177320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:02:30,702-Speed 5961.99 samples/sec   Loss 2.6582   LearningRate 0.0104   Epoch: 17   Global Step: 177330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:02:37,559-Speed 5974.77 samples/sec   Loss 2.7323   LearningRate 0.0104   Epoch: 17   Global Step: 177340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:02:44,427-Speed 5965.21 samples/sec   Loss 2.6858   LearningRate 0.0104   Epoch: 17   Global Step: 177350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:02:51,286-Speed 5972.54 samples/sec   Loss 2.7400   LearningRate 0.0103   Epoch: 17   Global Step: 177360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:02:58,167-Speed 5953.46 samples/sec   Loss 2.6829   LearningRate 0.0103   Epoch: 17   Global Step: 177370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:03:05,071-Speed 5933.99 samples/sec   Loss 2.7211   LearningRate 0.0103   Epoch: 17   Global Step: 177380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:03:11,929-Speed 5973.49 samples/sec   Loss 2.6947   LearningRate 0.0103   Epoch: 17   Global Step: 177390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:03:18,796-Speed 5966.06 samples/sec   Loss 2.6992   LearningRate 0.0103   Epoch: 17   Global Step: 177400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:03:25,647-Speed 5979.77 samples/sec   Loss 2.7105   LearningRate 0.0103   Epoch: 17   Global Step: 177410   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:03:32,499-Speed 5979.19 samples/sec   Loss 2.6804   LearningRate 0.0103   Epoch: 17   Global Step: 177420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:03:39,361-Speed 5970.33 samples/sec   Loss 2.7121   LearningRate 0.0103   Epoch: 17   Global Step: 177430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:03:46,253-Speed 5946.69 samples/sec   Loss 2.6996   LearningRate 0.0103   Epoch: 17   Global Step: 177440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:03:53,113-Speed 5971.83 samples/sec   Loss 2.6657   LearningRate 0.0103   Epoch: 17   Global Step: 177450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:03:59,980-Speed 5966.34 samples/sec   Loss 2.7297   LearningRate 0.0103   Epoch: 17   Global Step: 177460   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:04:06,836-Speed 5974.70 samples/sec   Loss 2.6997   LearningRate 0.0103   Epoch: 17   Global Step: 177470   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:04:13,698-Speed 5970.83 samples/sec   Loss 2.7366   LearningRate 0.0103   Epoch: 17   Global Step: 177480   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:04:20,546-Speed 5983.18 samples/sec   Loss 2.6768   LearningRate 0.0103   Epoch: 17   Global Step: 177490   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:04:27,398-Speed 5978.73 samples/sec   Loss 2.7018   LearningRate 0.0103   Epoch: 17   Global Step: 177500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:04:34,259-Speed 5971.13 samples/sec   Loss 2.6961   LearningRate 0.0102   Epoch: 17   Global Step: 177510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:04:41,117-Speed 5973.58 samples/sec   Loss 2.7023   LearningRate 0.0102   Epoch: 17   Global Step: 177520   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:04:48,025-Speed 5930.36 samples/sec   Loss 2.7178   LearningRate 0.0102   Epoch: 17   Global Step: 177530   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:04:54,907-Speed 5954.44 samples/sec   Loss 2.6888   LearningRate 0.0102   Epoch: 17   Global Step: 177540   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:05:01,756-Speed 5980.81 samples/sec   Loss 2.7121   LearningRate 0.0102   Epoch: 17   Global Step: 177550   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:05:08,625-Speed 5966.65 samples/sec   Loss 2.7158   LearningRate 0.0102   Epoch: 17   Global Step: 177560   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:05:15,497-Speed 5961.78 samples/sec   Loss 2.6914   LearningRate 0.0102   Epoch: 17   Global Step: 177570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:05:22,376-Speed 5957.01 samples/sec   Loss 2.6658   LearningRate 0.0102   Epoch: 17   Global Step: 177580   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:05:29,248-Speed 5961.67 samples/sec   Loss 2.7230   LearningRate 0.0102   Epoch: 17   Global Step: 177590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:05:36,180-Speed 5910.33 samples/sec   Loss 2.6782   LearningRate 0.0102   Epoch: 17   Global Step: 177600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:05:43,022-Speed 5988.28 samples/sec   Loss 2.7075   LearningRate 0.0102   Epoch: 17   Global Step: 177610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:05:49,987-Speed 5882.02 samples/sec   Loss 2.6942   LearningRate 0.0102   Epoch: 17   Global Step: 177620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:05:56,959-Speed 5876.81 samples/sec   Loss 2.7280   LearningRate 0.0102   Epoch: 17   Global Step: 177630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:06:03,920-Speed 5885.07 samples/sec   Loss 2.6758   LearningRate 0.0102   Epoch: 17   Global Step: 177640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:06:10,868-Speed 5896.23 samples/sec   Loss 2.7089   LearningRate 0.0101   Epoch: 17   Global Step: 177650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:06:17,712-Speed 5986.31 samples/sec   Loss 2.7021   LearningRate 0.0101   Epoch: 17   Global Step: 177660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:06:24,576-Speed 5968.69 samples/sec   Loss 2.6602   LearningRate 0.0101   Epoch: 17   Global Step: 177670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:06:31,432-Speed 5976.11 samples/sec   Loss 2.6651   LearningRate 0.0101   Epoch: 17   Global Step: 177680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:06:38,279-Speed 5983.96 samples/sec   Loss 2.6964   LearningRate 0.0101   Epoch: 17   Global Step: 177690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:06:45,136-Speed 5974.95 samples/sec   Loss 2.6735   LearningRate 0.0101   Epoch: 17   Global Step: 177700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:06:52,117-Speed 5871.43 samples/sec   Loss 2.6883   LearningRate 0.0101   Epoch: 17   Global Step: 177710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:06:59,060-Speed 5900.75 samples/sec   Loss 2.6599   LearningRate 0.0101   Epoch: 17   Global Step: 177720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-09 07:07:05,924-Speed 5968.90 samples/sec   Loss 2.7087   LearningRate 0.0101   Epoch: 17   Global Step: 177730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-09 07:07:12,779-Speed 5977.01 samples/sec   Loss 2.6558   LearningRate 0.0101   Epoch: 17   Global Step: 177740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:07:19,649-Speed 5966.28 samples/sec   Loss 2.6780   LearningRate 0.0101   Epoch: 17   Global Step: 177750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:07:26,508-Speed 5974.04 samples/sec   Loss 2.7003   LearningRate 0.0101   Epoch: 17   Global Step: 177760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:07:33,377-Speed 5964.51 samples/sec   Loss 2.6609   LearningRate 0.0101   Epoch: 17   Global Step: 177770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:07:40,231-Speed 5977.18 samples/sec   Loss 2.6851   LearningRate 0.0101   Epoch: 17   Global Step: 177780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:07:47,098-Speed 5965.83 samples/sec   Loss 2.6684   LearningRate 0.0101   Epoch: 17   Global Step: 177790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:07:53,951-Speed 5978.18 samples/sec   Loss 2.7002   LearningRate 0.0100   Epoch: 17   Global Step: 177800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:08:00,819-Speed 5965.15 samples/sec   Loss 2.6883   LearningRate 0.0100   Epoch: 17   Global Step: 177810   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:08:07,697-Speed 5958.08 samples/sec   Loss 2.6874   LearningRate 0.0100   Epoch: 17   Global Step: 177820   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:08:14,573-Speed 5959.15 samples/sec   Loss 2.6956   LearningRate 0.0100   Epoch: 17   Global Step: 177830   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:08:21,431-Speed 5974.19 samples/sec   Loss 2.7028   LearningRate 0.0100   Epoch: 17   Global Step: 177840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:08:28,289-Speed 5973.32 samples/sec   Loss 2.7089   LearningRate 0.0100   Epoch: 17   Global Step: 177850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:08:35,141-Speed 5981.31 samples/sec   Loss 2.6851   LearningRate 0.0100   Epoch: 17   Global Step: 177860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:08:42,023-Speed 5952.47 samples/sec   Loss 2.7273   LearningRate 0.0100   Epoch: 17   Global Step: 177870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:08:48,868-Speed 5985.53 samples/sec   Loss 2.6737   LearningRate 0.0100   Epoch: 17   Global Step: 177880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:08:55,726-Speed 5973.53 samples/sec   Loss 2.6815   LearningRate 0.0100   Epoch: 17   Global Step: 177890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:09:02,576-Speed 5980.76 samples/sec   Loss 2.6612   LearningRate 0.0100   Epoch: 17   Global Step: 177900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:09:09,428-Speed 5978.46 samples/sec   Loss 2.6966   LearningRate 0.0100   Epoch: 17   Global Step: 177910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:09:16,306-Speed 5957.29 samples/sec   Loss 2.6776   LearningRate 0.0100   Epoch: 17   Global Step: 177920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:09:23,179-Speed 5960.68 samples/sec   Loss 2.6534   LearningRate 0.0100   Epoch: 17   Global Step: 177930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:09:30,041-Speed 5969.78 samples/sec   Loss 2.6866   LearningRate 0.0100   Epoch: 17   Global Step: 177940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:09:36,932-Speed 5945.30 samples/sec   Loss 2.6658   LearningRate 0.0099   Epoch: 17   Global Step: 177950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:09:43,795-Speed 5969.67 samples/sec   Loss 2.6722   LearningRate 0.0099   Epoch: 17   Global Step: 177960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:09:50,666-Speed 5962.15 samples/sec   Loss 2.6726   LearningRate 0.0099   Epoch: 17   Global Step: 177970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-09 07:09:57,514-Speed 5982.66 samples/sec   Loss 2.6688   LearningRate 0.0099   Epoch: 17   Global Step: 177980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:10:04,364-Speed 5980.13 samples/sec   Loss 2.6644   LearningRate 0.0099   Epoch: 17   Global Step: 177990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:10:11,237-Speed 5960.46 samples/sec   Loss 2.6361   LearningRate 0.0099   Epoch: 17   Global Step: 178000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:10:18,100-Speed 5969.99 samples/sec   Loss 2.6482   LearningRate 0.0099   Epoch: 17   Global Step: 178010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:10:24,943-Speed 5986.78 samples/sec   Loss 2.6211   LearningRate 0.0099   Epoch: 17   Global Step: 178020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:10:31,805-Speed 5970.14 samples/sec   Loss 2.6497   LearningRate 0.0099   Epoch: 17   Global Step: 178030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:10:38,661-Speed 5974.87 samples/sec   Loss 2.6481   LearningRate 0.0099   Epoch: 17   Global Step: 178040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:10:45,512-Speed 5979.71 samples/sec   Loss 2.6432   LearningRate 0.0099   Epoch: 17   Global Step: 178050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:10:52,404-Speed 5943.52 samples/sec   Loss 2.6572   LearningRate 0.0099   Epoch: 17   Global Step: 178060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:10:59,259-Speed 5978.97 samples/sec   Loss 2.7166   LearningRate 0.0099   Epoch: 17   Global Step: 178070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:11:06,113-Speed 5980.79 samples/sec   Loss 2.6622   LearningRate 0.0099   Epoch: 17   Global Step: 178080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-09 07:11:12,963-Speed 5980.16 samples/sec   Loss 2.6457   LearningRate 0.0099   Epoch: 17   Global Step: 178090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-09 07:11:19,808-Speed 5985.48 samples/sec   Loss 2.6497   LearningRate 0.0098   Epoch: 17   Global Step: 178100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:11:26,662-Speed 5977.84 samples/sec   Loss 2.6722   LearningRate 0.0098   Epoch: 17   Global Step: 178110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:11:33,522-Speed 5975.22 samples/sec   Loss 2.6126   LearningRate 0.0098   Epoch: 17   Global Step: 178120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:11:40,381-Speed 5972.05 samples/sec   Loss 2.6506   LearningRate 0.0098   Epoch: 17   Global Step: 178130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:11:47,242-Speed 5971.21 samples/sec   Loss 2.6819   LearningRate 0.0098   Epoch: 17   Global Step: 178140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:11:54,093-Speed 5979.09 samples/sec   Loss 2.6359   LearningRate 0.0098   Epoch: 17   Global Step: 178150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:12:00,989-Speed 5943.32 samples/sec   Loss 2.6808   LearningRate 0.0098   Epoch: 17   Global Step: 178160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:12:07,861-Speed 5961.62 samples/sec   Loss 2.6683   LearningRate 0.0098   Epoch: 17   Global Step: 178170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:12:14,716-Speed 5975.57 samples/sec   Loss 2.6793   LearningRate 0.0098   Epoch: 17   Global Step: 178180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:12:21,573-Speed 5974.96 samples/sec   Loss 2.6265   LearningRate 0.0098   Epoch: 17   Global Step: 178190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:12:28,456-Speed 5951.64 samples/sec   Loss 2.6476   LearningRate 0.0098   Epoch: 17   Global Step: 178200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:12:35,308-Speed 5979.34 samples/sec   Loss 2.6298   LearningRate 0.0098   Epoch: 17   Global Step: 178210   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:12:42,150-Speed 5988.53 samples/sec   Loss 2.6537   LearningRate 0.0098   Epoch: 17   Global Step: 178220   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:12:49,019-Speed 5963.88 samples/sec   Loss 2.6294   LearningRate 0.0098   Epoch: 17   Global Step: 178230   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:12:55,882-Speed 5969.38 samples/sec   Loss 2.6423   LearningRate 0.0098   Epoch: 17   Global Step: 178240   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:13:02,756-Speed 5960.86 samples/sec   Loss 2.6870   LearningRate 0.0097   Epoch: 17   Global Step: 178250   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:13:09,608-Speed 5979.25 samples/sec   Loss 2.6799   LearningRate 0.0097   Epoch: 17   Global Step: 178260   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:13:16,458-Speed 5980.59 samples/sec   Loss 2.6866   LearningRate 0.0097   Epoch: 17   Global Step: 178270   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:13:23,324-Speed 5966.81 samples/sec   Loss 2.6436   LearningRate 0.0097   Epoch: 17   Global Step: 178280   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:13:30,190-Speed 5966.38 samples/sec   Loss 2.6343   LearningRate 0.0097   Epoch: 17   Global Step: 178290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:13:37,067-Speed 5958.27 samples/sec   Loss 2.6714   LearningRate 0.0097   Epoch: 17   Global Step: 178300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:13:43,926-Speed 5973.19 samples/sec   Loss 2.6466   LearningRate 0.0097   Epoch: 17   Global Step: 178310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:13:50,782-Speed 5975.19 samples/sec   Loss 2.6617   LearningRate 0.0097   Epoch: 17   Global Step: 178320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:13:57,643-Speed 5971.05 samples/sec   Loss 2.6392   LearningRate 0.0097   Epoch: 17   Global Step: 178330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:14:04,493-Speed 5981.60 samples/sec   Loss 2.6609   LearningRate 0.0097   Epoch: 17   Global Step: 178340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:14:11,339-Speed 5984.53 samples/sec   Loss 2.6645   LearningRate 0.0097   Epoch: 17   Global Step: 178350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:14:18,184-Speed 5985.13 samples/sec   Loss 2.6379   LearningRate 0.0097   Epoch: 17   Global Step: 178360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:14:25,038-Speed 5976.60 samples/sec   Loss 2.6393   LearningRate 0.0097   Epoch: 17   Global Step: 178370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:14:31,889-Speed 5982.03 samples/sec   Loss 2.6425   LearningRate 0.0097   Epoch: 17   Global Step: 178380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:14:38,730-Speed 5989.50 samples/sec   Loss 2.6166   LearningRate 0.0097   Epoch: 17   Global Step: 178390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:14:45,607-Speed 5956.59 samples/sec   Loss 2.6411   LearningRate 0.0096   Epoch: 17   Global Step: 178400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:14:52,451-Speed 5986.48 samples/sec   Loss 2.6095   LearningRate 0.0096   Epoch: 17   Global Step: 178410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:14:59,289-Speed 5990.43 samples/sec   Loss 2.6281   LearningRate 0.0096   Epoch: 17   Global Step: 178420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:15:06,173-Speed 5951.20 samples/sec   Loss 2.6619   LearningRate 0.0096   Epoch: 17   Global Step: 178430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:15:13,028-Speed 5980.28 samples/sec   Loss 2.6304   LearningRate 0.0096   Epoch: 17   Global Step: 178440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:15:19,879-Speed 5979.59 samples/sec   Loss 2.6375   LearningRate 0.0096   Epoch: 17   Global Step: 178450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:15:26,729-Speed 5983.10 samples/sec   Loss 2.6484   LearningRate 0.0096   Epoch: 17   Global Step: 178460   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:15:33,579-Speed 5982.01 samples/sec   Loss 2.6184   LearningRate 0.0096   Epoch: 17   Global Step: 178470   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:15:40,436-Speed 5974.89 samples/sec   Loss 2.6722   LearningRate 0.0096   Epoch: 17   Global Step: 178480   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:15:47,296-Speed 5971.74 samples/sec   Loss 2.5991   LearningRate 0.0096   Epoch: 17   Global Step: 178490   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:15:54,154-Speed 5973.85 samples/sec   Loss 2.6534   LearningRate 0.0096   Epoch: 17   Global Step: 178500   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:16:01,013-Speed 5973.15 samples/sec   Loss 2.6710   LearningRate 0.0096   Epoch: 17   Global Step: 178510   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:16:07,877-Speed 5969.14 samples/sec   Loss 2.6307   LearningRate 0.0096   Epoch: 17   Global Step: 178520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:16:14,751-Speed 5959.49 samples/sec   Loss 2.6446   LearningRate 0.0096   Epoch: 17   Global Step: 178530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:16:21,611-Speed 5972.39 samples/sec   Loss 2.6561   LearningRate 0.0096   Epoch: 17   Global Step: 178540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:16:28,475-Speed 5967.90 samples/sec   Loss 2.6146   LearningRate 0.0095   Epoch: 17   Global Step: 178550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:16:35,331-Speed 5975.82 samples/sec   Loss 2.6425   LearningRate 0.0095   Epoch: 17   Global Step: 178560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:16:42,220-Speed 5947.00 samples/sec   Loss 2.6218   LearningRate 0.0095   Epoch: 17   Global Step: 178570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:16:49,070-Speed 5980.70 samples/sec   Loss 2.6481   LearningRate 0.0095   Epoch: 17   Global Step: 178580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:16:55,921-Speed 5980.00 samples/sec   Loss 2.5916   LearningRate 0.0095   Epoch: 17   Global Step: 178590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:17:02,780-Speed 5973.32 samples/sec   Loss 2.6171   LearningRate 0.0095   Epoch: 17   Global Step: 178600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:17:09,627-Speed 5983.06 samples/sec   Loss 2.6408   LearningRate 0.0095   Epoch: 17   Global Step: 178610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:17:16,485-Speed 5973.86 samples/sec   Loss 2.6043   LearningRate 0.0095   Epoch: 17   Global Step: 178620   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:17:23,336-Speed 5980.14 samples/sec   Loss 2.6296   LearningRate 0.0095   Epoch: 17   Global Step: 178630   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:17:30,208-Speed 5960.93 samples/sec   Loss 2.6595   LearningRate 0.0095   Epoch: 17   Global Step: 178640   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:17:37,060-Speed 5979.15 samples/sec   Loss 2.6186   LearningRate 0.0095   Epoch: 17   Global Step: 178650   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:17:43,932-Speed 5963.60 samples/sec   Loss 2.5984   LearningRate 0.0095   Epoch: 17   Global Step: 178660   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:17:50,810-Speed 5955.64 samples/sec   Loss 2.6270   LearningRate 0.0095   Epoch: 17   Global Step: 178670   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:17:57,670-Speed 5972.30 samples/sec   Loss 2.6185   LearningRate 0.0095   Epoch: 17   Global Step: 178680   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:18:04,540-Speed 5962.84 samples/sec   Loss 2.6059   LearningRate 0.0095   Epoch: 17   Global Step: 178690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:18:11,395-Speed 5976.34 samples/sec   Loss 2.6157   LearningRate 0.0094   Epoch: 17   Global Step: 178700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:18:18,263-Speed 5965.19 samples/sec   Loss 2.6176   LearningRate 0.0094   Epoch: 17   Global Step: 178710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:18:25,173-Speed 5931.36 samples/sec   Loss 2.6295   LearningRate 0.0094   Epoch: 17   Global Step: 178720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:18:32,028-Speed 5977.09 samples/sec   Loss 2.6237   LearningRate 0.0094   Epoch: 17   Global Step: 178730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:18:38,879-Speed 5980.10 samples/sec   Loss 2.6324   LearningRate 0.0094   Epoch: 17   Global Step: 178740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:18:45,768-Speed 5946.86 samples/sec   Loss 2.6280   LearningRate 0.0094   Epoch: 17   Global Step: 178750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:18:52,623-Speed 5975.90 samples/sec   Loss 2.6300   LearningRate 0.0094   Epoch: 17   Global Step: 178760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:18:59,486-Speed 5969.79 samples/sec   Loss 2.6069   LearningRate 0.0094   Epoch: 17   Global Step: 178770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:19:06,365-Speed 5955.02 samples/sec   Loss 2.5742   LearningRate 0.0094   Epoch: 17   Global Step: 178780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:19:13,217-Speed 5981.55 samples/sec   Loss 2.6415   LearningRate 0.0094   Epoch: 17   Global Step: 178790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:19:20,069-Speed 5979.02 samples/sec   Loss 2.5962   LearningRate 0.0094   Epoch: 17   Global Step: 178800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:19:26,943-Speed 5959.41 samples/sec   Loss 2.6357   LearningRate 0.0094   Epoch: 17   Global Step: 178810   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:19:33,792-Speed 5981.30 samples/sec   Loss 2.6165   LearningRate 0.0094   Epoch: 17   Global Step: 178820   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:19:40,638-Speed 5986.01 samples/sec   Loss 2.6040   LearningRate 0.0094   Epoch: 17   Global Step: 178830   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:19:47,487-Speed 5981.52 samples/sec   Loss 2.6124   LearningRate 0.0094   Epoch: 17   Global Step: 178840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:19:54,354-Speed 5965.78 samples/sec   Loss 2.5931   LearningRate 0.0093   Epoch: 17   Global Step: 178850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:20:01,217-Speed 5969.48 samples/sec   Loss 2.6297   LearningRate 0.0093   Epoch: 17   Global Step: 178860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:20:08,065-Speed 5982.84 samples/sec   Loss 2.6144   LearningRate 0.0093   Epoch: 17   Global Step: 178870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:20:14,913-Speed 5982.38 samples/sec   Loss 2.6354   LearningRate 0.0093   Epoch: 17   Global Step: 178880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:20:21,791-Speed 5956.68 samples/sec   Loss 2.6155   LearningRate 0.0093   Epoch: 17   Global Step: 178890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:20:28,655-Speed 5968.12 samples/sec   Loss 2.6109   LearningRate 0.0093   Epoch: 17   Global Step: 178900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:20:35,583-Speed 5913.55 samples/sec   Loss 2.5674   LearningRate 0.0093   Epoch: 17   Global Step: 178910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:20:42,565-Speed 5868.38 samples/sec   Loss 2.6347   LearningRate 0.0093   Epoch: 17   Global Step: 178920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:20:49,426-Speed 5970.87 samples/sec   Loss 2.6119   LearningRate 0.0093   Epoch: 17   Global Step: 178930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:20:56,296-Speed 5965.42 samples/sec   Loss 2.6041   LearningRate 0.0093   Epoch: 17   Global Step: 178940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:21:03,206-Speed 5928.10 samples/sec   Loss 2.5958   LearningRate 0.0093   Epoch: 17   Global Step: 178950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:21:10,065-Speed 5975.47 samples/sec   Loss 2.5955   LearningRate 0.0093   Epoch: 17   Global Step: 178960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:21:16,905-Speed 5989.80 samples/sec   Loss 2.6161   LearningRate 0.0093   Epoch: 17   Global Step: 178970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:21:23,766-Speed 5970.86 samples/sec   Loss 2.5918   LearningRate 0.0093   Epoch: 17   Global Step: 178980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:21:30,612-Speed 5983.96 samples/sec   Loss 2.5943   LearningRate 0.0093   Epoch: 17   Global Step: 178990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:21:37,464-Speed 5979.19 samples/sec   Loss 2.6100   LearningRate 0.0092   Epoch: 17   Global Step: 179000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:21:44,326-Speed 5969.93 samples/sec   Loss 2.6033   LearningRate 0.0092   Epoch: 17   Global Step: 179010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:21:51,182-Speed 5975.48 samples/sec   Loss 2.5989   LearningRate 0.0092   Epoch: 17   Global Step: 179020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:21:58,037-Speed 5976.31 samples/sec   Loss 2.6074   LearningRate 0.0092   Epoch: 17   Global Step: 179030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:22:04,908-Speed 5961.99 samples/sec   Loss 2.6147   LearningRate 0.0092   Epoch: 17   Global Step: 179040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:22:11,781-Speed 5960.65 samples/sec   Loss 2.6352   LearningRate 0.0092   Epoch: 17   Global Step: 179050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:22:18,658-Speed 5958.50 samples/sec   Loss 2.5793   LearningRate 0.0092   Epoch: 17   Global Step: 179060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:22:25,517-Speed 5973.99 samples/sec   Loss 2.5815   LearningRate 0.0092   Epoch: 17   Global Step: 179070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:22:32,364-Speed 5983.12 samples/sec   Loss 2.6226   LearningRate 0.0092   Epoch: 17   Global Step: 179080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-01-09 07:22:39,225-Speed 5972.48 samples/sec   Loss 2.5740   LearningRate 0.0092   Epoch: 17   Global Step: 179090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:22:46,070-Speed 5984.95 samples/sec   Loss 2.5865   LearningRate 0.0092   Epoch: 17   Global Step: 179100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:22:52,939-Speed 5964.68 samples/sec   Loss 2.6167   LearningRate 0.0092   Epoch: 17   Global Step: 179110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:22:59,789-Speed 5980.86 samples/sec   Loss 2.6150   LearningRate 0.0092   Epoch: 17   Global Step: 179120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:23:06,648-Speed 5975.31 samples/sec   Loss 2.5833   LearningRate 0.0092   Epoch: 17   Global Step: 179130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:23:13,521-Speed 5961.14 samples/sec   Loss 2.5988   LearningRate 0.0092   Epoch: 17   Global Step: 179140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:23:20,406-Speed 5950.20 samples/sec   Loss 2.5836   LearningRate 0.0092   Epoch: 17   Global Step: 179150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:23:27,249-Speed 5987.14 samples/sec   Loss 2.6243   LearningRate 0.0091   Epoch: 17   Global Step: 179160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:23:34,120-Speed 5962.24 samples/sec   Loss 2.5752   LearningRate 0.0091   Epoch: 17   Global Step: 179170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:23:41,005-Speed 5952.03 samples/sec   Loss 2.6094   LearningRate 0.0091   Epoch: 17   Global Step: 179180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:23:47,871-Speed 5966.97 samples/sec   Loss 2.6053   LearningRate 0.0091   Epoch: 17   Global Step: 179190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-01-09 07:23:54,722-Speed 5979.56 samples/sec   Loss 2.5970   LearningRate 0.0091   Epoch: 17   Global Step: 179200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:24:01,595-Speed 5961.23 samples/sec   Loss 2.6088   LearningRate 0.0091   Epoch: 17   Global Step: 179210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:24:08,469-Speed 5959.73 samples/sec   Loss 2.6042   LearningRate 0.0091   Epoch: 17   Global Step: 179220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:24:15,322-Speed 5978.20 samples/sec   Loss 2.6043   LearningRate 0.0091   Epoch: 17   Global Step: 179230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:24:22,166-Speed 5985.68 samples/sec   Loss 2.5635   LearningRate 0.0091   Epoch: 17   Global Step: 179240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:24:29,021-Speed 5976.50 samples/sec   Loss 2.5980   LearningRate 0.0091   Epoch: 17   Global Step: 179250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:24:35,874-Speed 5977.92 samples/sec   Loss 2.6034   LearningRate 0.0091   Epoch: 17   Global Step: 179260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:24:42,718-Speed 5986.31 samples/sec   Loss 2.6087   LearningRate 0.0091   Epoch: 17   Global Step: 179270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:24:49,570-Speed 5979.06 samples/sec   Loss 2.6133   LearningRate 0.0091   Epoch: 17   Global Step: 179280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:24:56,413-Speed 5986.57 samples/sec   Loss 2.5973   LearningRate 0.0091   Epoch: 17   Global Step: 179290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:25:03,265-Speed 5978.83 samples/sec   Loss 2.5676   LearningRate 0.0091   Epoch: 17   Global Step: 179300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:25:10,124-Speed 5973.05 samples/sec   Loss 2.6063   LearningRate 0.0090   Epoch: 17   Global Step: 179310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:25:17,013-Speed 5946.89 samples/sec   Loss 2.5935   LearningRate 0.0090   Epoch: 17   Global Step: 179320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-01-09 07:25:23,873-Speed 5972.74 samples/sec   Loss 2.6066   LearningRate 0.0090   Epoch: 17   Global Step: 179330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:25:30,830-Speed 5888.65 samples/sec   Loss 2.5397   LearningRate 0.0090   Epoch: 17   Global Step: 179340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:25:37,662-Speed 5998.04 samples/sec   Loss 2.5235   LearningRate 0.0090   Epoch: 17   Global Step: 179350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:25:44,515-Speed 5977.87 samples/sec   Loss 2.5768   LearningRate 0.0090   Epoch: 17   Global Step: 179360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:25:51,369-Speed 5977.16 samples/sec   Loss 2.5803   LearningRate 0.0090   Epoch: 17   Global Step: 179370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:25:58,233-Speed 5971.12 samples/sec   Loss 2.5933   LearningRate 0.0090   Epoch: 17   Global Step: 179380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:26:05,097-Speed 5968.22 samples/sec   Loss 2.6006   LearningRate 0.0090   Epoch: 17   Global Step: 179390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:26:11,951-Speed 5977.22 samples/sec   Loss 2.5742   LearningRate 0.0090   Epoch: 17   Global Step: 179400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:26:18,813-Speed 5969.92 samples/sec   Loss 2.5846   LearningRate 0.0090   Epoch: 17   Global Step: 179410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:26:25,715-Speed 5936.22 samples/sec   Loss 2.5388   LearningRate 0.0090   Epoch: 17   Global Step: 179420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:26:32,607-Speed 5943.70 samples/sec   Loss 2.5679   LearningRate 0.0090   Epoch: 17   Global Step: 179430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:26:39,469-Speed 5970.52 samples/sec   Loss 2.5438   LearningRate 0.0090   Epoch: 17   Global Step: 179440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:26:46,320-Speed 5979.76 samples/sec   Loss 2.5877   LearningRate 0.0090   Epoch: 17   Global Step: 179450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:26:53,184-Speed 5969.30 samples/sec   Loss 2.5763   LearningRate 0.0090   Epoch: 17   Global Step: 179460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:27:00,039-Speed 5976.03 samples/sec   Loss 2.5713   LearningRate 0.0089   Epoch: 17   Global Step: 179470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:27:06,920-Speed 5953.96 samples/sec   Loss 2.5806   LearningRate 0.0089   Epoch: 17   Global Step: 179480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:27:13,821-Speed 5936.86 samples/sec   Loss 2.5736   LearningRate 0.0089   Epoch: 17   Global Step: 179490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:27:20,729-Speed 5930.73 samples/sec   Loss 2.5702   LearningRate 0.0089   Epoch: 17   Global Step: 179500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:27:27,633-Speed 5933.94 samples/sec   Loss 2.5851   LearningRate 0.0089   Epoch: 17   Global Step: 179510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:27:34,508-Speed 5959.20 samples/sec   Loss 2.5598   LearningRate 0.0089   Epoch: 17   Global Step: 179520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:27:41,395-Speed 5948.52 samples/sec   Loss 2.5385   LearningRate 0.0089   Epoch: 17   Global Step: 179530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:27:48,243-Speed 5982.73 samples/sec   Loss 2.5851   LearningRate 0.0089   Epoch: 17   Global Step: 179540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:27:55,110-Speed 5965.84 samples/sec   Loss 2.5521   LearningRate 0.0089   Epoch: 17   Global Step: 179550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:28:02,010-Speed 5938.71 samples/sec   Loss 2.5691   LearningRate 0.0089   Epoch: 17   Global Step: 179560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:28:08,851-Speed 5989.75 samples/sec   Loss 2.5682   LearningRate 0.0089   Epoch: 17   Global Step: 179570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:28:15,699-Speed 5982.35 samples/sec   Loss 2.5525   LearningRate 0.0089   Epoch: 17   Global Step: 179580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:28:22,556-Speed 5974.81 samples/sec   Loss 2.5869   LearningRate 0.0089   Epoch: 17   Global Step: 179590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:28:29,398-Speed 5987.22 samples/sec   Loss 2.5235   LearningRate 0.0089   Epoch: 17   Global Step: 179600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:28:36,263-Speed 5967.83 samples/sec   Loss 2.5353   LearningRate 0.0089   Epoch: 17   Global Step: 179610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:28:43,134-Speed 5962.82 samples/sec   Loss 2.5734   LearningRate 0.0088   Epoch: 17   Global Step: 179620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:28:50,029-Speed 5941.99 samples/sec   Loss 2.5497   LearningRate 0.0088   Epoch: 17   Global Step: 179630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:28:56,884-Speed 5978.32 samples/sec   Loss 2.5637   LearningRate 0.0088   Epoch: 17   Global Step: 179640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:29:03,773-Speed 5946.83 samples/sec   Loss 2.5682   LearningRate 0.0088   Epoch: 17   Global Step: 179650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:29:10,615-Speed 5987.87 samples/sec   Loss 2.5683   LearningRate 0.0088   Epoch: 17   Global Step: 179660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:29:17,469-Speed 5979.34 samples/sec   Loss 2.5364   LearningRate 0.0088   Epoch: 17   Global Step: 179670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:29:24,320-Speed 5980.00 samples/sec   Loss 2.5609   LearningRate 0.0088   Epoch: 17   Global Step: 179680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:29:31,174-Speed 5976.92 samples/sec   Loss 2.5157   LearningRate 0.0088   Epoch: 17   Global Step: 179690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:29:38,042-Speed 5965.42 samples/sec   Loss 2.5703   LearningRate 0.0088   Epoch: 17   Global Step: 179700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:29:44,892-Speed 5980.40 samples/sec   Loss 2.5655   LearningRate 0.0088   Epoch: 17   Global Step: 179710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:29:51,740-Speed 5982.85 samples/sec   Loss 2.5432   LearningRate 0.0088   Epoch: 17   Global Step: 179720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:29:58,580-Speed 5989.76 samples/sec   Loss 2.5464   LearningRate 0.0088   Epoch: 17   Global Step: 179730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:30:05,437-Speed 5974.37 samples/sec   Loss 2.5593   LearningRate 0.0088   Epoch: 17   Global Step: 179740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:30:12,284-Speed 5985.65 samples/sec   Loss 2.5678   LearningRate 0.0088   Epoch: 17   Global Step: 179750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:30:19,148-Speed 5968.94 samples/sec   Loss 2.5566   LearningRate 0.0088   Epoch: 17   Global Step: 179760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:30:26,022-Speed 5960.38 samples/sec   Loss 2.5450   LearningRate 0.0088   Epoch: 17   Global Step: 179770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:30:32,888-Speed 5966.77 samples/sec   Loss 2.5444   LearningRate 0.0087   Epoch: 17   Global Step: 179780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:30:39,757-Speed 5964.29 samples/sec   Loss 2.5677   LearningRate 0.0087   Epoch: 17   Global Step: 179790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:30:46,598-Speed 5988.92 samples/sec   Loss 2.5828   LearningRate 0.0087   Epoch: 17   Global Step: 179800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:30:53,439-Speed 5988.61 samples/sec   Loss 2.5641   LearningRate 0.0087   Epoch: 17   Global Step: 179810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:31:00,319-Speed 5955.27 samples/sec   Loss 2.5646   LearningRate 0.0087   Epoch: 17   Global Step: 179820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:31:07,185-Speed 5966.56 samples/sec   Loss 2.5205   LearningRate 0.0087   Epoch: 17   Global Step: 179830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:31:14,034-Speed 5981.41 samples/sec   Loss 2.5379   LearningRate 0.0087   Epoch: 17   Global Step: 179840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:31:20,904-Speed 5962.55 samples/sec   Loss 2.5869   LearningRate 0.0087   Epoch: 17   Global Step: 179850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:31:27,749-Speed 5985.12 samples/sec   Loss 2.5442   LearningRate 0.0087   Epoch: 17   Global Step: 179860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:31:34,598-Speed 5981.75 samples/sec   Loss 2.5870   LearningRate 0.0087   Epoch: 17   Global Step: 179870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:31:41,452-Speed 5976.90 samples/sec   Loss 2.5528   LearningRate 0.0087   Epoch: 17   Global Step: 179880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:31:48,849-Speed 5540.77 samples/sec   Loss 2.5337   LearningRate 0.0087   Epoch: 17   Global Step: 179890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:31:55,744-Speed 5960.51 samples/sec   Loss 2.5422   LearningRate 0.0087   Epoch: 17   Global Step: 179900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:32:02,599-Speed 5977.04 samples/sec   Loss 2.5382   LearningRate 0.0087   Epoch: 17   Global Step: 179910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:32:09,455-Speed 5975.03 samples/sec   Loss 2.5595   LearningRate 0.0087   Epoch: 17   Global Step: 179920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:32:16,307-Speed 5978.59 samples/sec   Loss 2.5447   LearningRate 0.0087   Epoch: 17   Global Step: 179930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:32:23,180-Speed 5961.24 samples/sec   Loss 2.5516   LearningRate 0.0086   Epoch: 17   Global Step: 179940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:32:30,034-Speed 5977.34 samples/sec   Loss 2.5531   LearningRate 0.0086   Epoch: 17   Global Step: 179950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:32:36,880-Speed 5984.00 samples/sec   Loss 2.5434   LearningRate 0.0086   Epoch: 17   Global Step: 179960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:32:43,746-Speed 5967.04 samples/sec   Loss 2.5319   LearningRate 0.0086   Epoch: 17   Global Step: 179970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:32:50,602-Speed 5975.25 samples/sec   Loss 2.4881   LearningRate 0.0086   Epoch: 17   Global Step: 179980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:32:57,446-Speed 5986.37 samples/sec   Loss 2.5487   LearningRate 0.0086   Epoch: 17   Global Step: 179990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:33:04,302-Speed 5977.41 samples/sec   Loss 2.5412   LearningRate 0.0086   Epoch: 17   Global Step: 180000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:33:31,318-[lfw][180000]XNorm: 23.782969
Training: 2022-01-09 07:33:31,319-[lfw][180000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-01-09 07:33:31,320-[lfw][180000]Accuracy-Highest: 0.99833
Training: 2022-01-09 07:34:02,720-[cfp_fp][180000]XNorm: 21.364016
Training: 2022-01-09 07:34:02,721-[cfp_fp][180000]Accuracy-Flip: 0.99086+-0.00400
Training: 2022-01-09 07:34:02,722-[cfp_fp][180000]Accuracy-Highest: 0.99229
Training: 2022-01-09 07:34:29,288-[agedb_30][180000]XNorm: 23.191659
Training: 2022-01-09 07:34:29,289-[agedb_30][180000]Accuracy-Flip: 0.98200+-0.00591
Training: 2022-01-09 07:34:29,289-[agedb_30][180000]Accuracy-Highest: 0.98200
Training: 2022-01-09 07:34:36,114-Speed 446.13 samples/sec   Loss 2.5488   LearningRate 0.0086   Epoch: 17   Global Step: 180010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:34:42,974-Speed 5972.64 samples/sec   Loss 2.5654   LearningRate 0.0086   Epoch: 17   Global Step: 180020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:34:49,820-Speed 5984.54 samples/sec   Loss 2.5724   LearningRate 0.0086   Epoch: 17   Global Step: 180030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:34:56,683-Speed 5971.08 samples/sec   Loss 2.5476   LearningRate 0.0086   Epoch: 17   Global Step: 180040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:35:03,547-Speed 5968.62 samples/sec   Loss 2.5303   LearningRate 0.0086   Epoch: 17   Global Step: 180050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:35:10,403-Speed 5975.68 samples/sec   Loss 2.5235   LearningRate 0.0086   Epoch: 17   Global Step: 180060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:35:17,250-Speed 5983.42 samples/sec   Loss 2.5218   LearningRate 0.0086   Epoch: 17   Global Step: 180070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:35:24,110-Speed 5972.16 samples/sec   Loss 2.5479   LearningRate 0.0086   Epoch: 17   Global Step: 180080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:35:30,987-Speed 5957.33 samples/sec   Loss 2.5110   LearningRate 0.0086   Epoch: 17   Global Step: 180090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:35:37,858-Speed 5963.04 samples/sec   Loss 2.5242   LearningRate 0.0085   Epoch: 17   Global Step: 180100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:35:44,695-Speed 5992.61 samples/sec   Loss 2.5311   LearningRate 0.0085   Epoch: 17   Global Step: 180110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:35:51,552-Speed 5974.28 samples/sec   Loss 2.4757   LearningRate 0.0085   Epoch: 17   Global Step: 180120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:35:58,401-Speed 5982.03 samples/sec   Loss 2.5228   LearningRate 0.0085   Epoch: 17   Global Step: 180130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:36:05,272-Speed 5962.94 samples/sec   Loss 2.5186   LearningRate 0.0085   Epoch: 17   Global Step: 180140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:36:12,122-Speed 5983.13 samples/sec   Loss 2.5538   LearningRate 0.0085   Epoch: 17   Global Step: 180150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:36:18,984-Speed 5970.38 samples/sec   Loss 2.5708   LearningRate 0.0085   Epoch: 17   Global Step: 180160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:36:25,834-Speed 5980.43 samples/sec   Loss 2.5308   LearningRate 0.0085   Epoch: 17   Global Step: 180170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:36:32,689-Speed 5976.58 samples/sec   Loss 2.4817   LearningRate 0.0085   Epoch: 17   Global Step: 180180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:36:39,547-Speed 5973.88 samples/sec   Loss 2.4986   LearningRate 0.0085   Epoch: 17   Global Step: 180190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:36:46,387-Speed 5989.85 samples/sec   Loss 2.5272   LearningRate 0.0085   Epoch: 17   Global Step: 180200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:36:53,284-Speed 5940.25 samples/sec   Loss 2.5318   LearningRate 0.0085   Epoch: 17   Global Step: 180210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:37:00,174-Speed 5946.06 samples/sec   Loss 2.5337   LearningRate 0.0085   Epoch: 17   Global Step: 180220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:37:07,033-Speed 5973.10 samples/sec   Loss 2.4955   LearningRate 0.0085   Epoch: 17   Global Step: 180230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:37:13,923-Speed 5946.53 samples/sec   Loss 2.5104   LearningRate 0.0085   Epoch: 17   Global Step: 180240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:37:20,759-Speed 5992.62 samples/sec   Loss 2.5289   LearningRate 0.0085   Epoch: 17   Global Step: 180250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:37:27,599-Speed 5989.43 samples/sec   Loss 2.5460   LearningRate 0.0084   Epoch: 17   Global Step: 180260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:37:34,472-Speed 5960.57 samples/sec   Loss 2.5245   LearningRate 0.0084   Epoch: 17   Global Step: 180270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:37:41,333-Speed 5971.15 samples/sec   Loss 2.5399   LearningRate 0.0084   Epoch: 17   Global Step: 180280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:37:48,175-Speed 5990.43 samples/sec   Loss 2.4967   LearningRate 0.0084   Epoch: 17   Global Step: 180290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:37:55,027-Speed 5978.98 samples/sec   Loss 2.5441   LearningRate 0.0084   Epoch: 17   Global Step: 180300   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:38:01,872-Speed 5985.04 samples/sec   Loss 2.5133   LearningRate 0.0084   Epoch: 17   Global Step: 180310   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:38:08,723-Speed 5980.19 samples/sec   Loss 2.5050   LearningRate 0.0084   Epoch: 17   Global Step: 180320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:38:15,568-Speed 5985.02 samples/sec   Loss 2.5173   LearningRate 0.0084   Epoch: 17   Global Step: 180330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:38:22,435-Speed 5965.37 samples/sec   Loss 2.4786   LearningRate 0.0084   Epoch: 17   Global Step: 180340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:38:29,349-Speed 5926.11 samples/sec   Loss 2.5234   LearningRate 0.0084   Epoch: 17   Global Step: 180350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:38:36,218-Speed 5964.55 samples/sec   Loss 2.5091   LearningRate 0.0084   Epoch: 17   Global Step: 180360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:38:43,062-Speed 5985.80 samples/sec   Loss 2.4930   LearningRate 0.0084   Epoch: 17   Global Step: 180370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:38:49,917-Speed 5977.03 samples/sec   Loss 2.5021   LearningRate 0.0084   Epoch: 17   Global Step: 180380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:38:56,768-Speed 5979.65 samples/sec   Loss 2.5424   LearningRate 0.0084   Epoch: 17   Global Step: 180390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:39:03,649-Speed 5956.91 samples/sec   Loss 2.5327   LearningRate 0.0084   Epoch: 17   Global Step: 180400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:39:10,506-Speed 5975.69 samples/sec   Loss 2.5749   LearningRate 0.0084   Epoch: 17   Global Step: 180410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:39:17,380-Speed 5959.92 samples/sec   Loss 2.5383   LearningRate 0.0083   Epoch: 17   Global Step: 180420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:39:24,246-Speed 5967.15 samples/sec   Loss 2.4971   LearningRate 0.0083   Epoch: 17   Global Step: 180430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:39:31,088-Speed 5987.22 samples/sec   Loss 2.5086   LearningRate 0.0083   Epoch: 17   Global Step: 180440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:39:37,941-Speed 5979.14 samples/sec   Loss 2.5255   LearningRate 0.0083   Epoch: 17   Global Step: 180450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:39:44,795-Speed 5977.00 samples/sec   Loss 2.4816   LearningRate 0.0083   Epoch: 17   Global Step: 180460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:39:51,638-Speed 5986.74 samples/sec   Loss 2.4887   LearningRate 0.0083   Epoch: 17   Global Step: 180470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:39:58,497-Speed 5972.89 samples/sec   Loss 2.5200   LearningRate 0.0083   Epoch: 17   Global Step: 180480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:40:05,361-Speed 5968.69 samples/sec   Loss 2.5337   LearningRate 0.0083   Epoch: 17   Global Step: 180490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:40:12,209-Speed 5982.29 samples/sec   Loss 2.4890   LearningRate 0.0083   Epoch: 17   Global Step: 180500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:40:19,072-Speed 5969.41 samples/sec   Loss 2.5120   LearningRate 0.0083   Epoch: 17   Global Step: 180510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:40:25,920-Speed 5982.72 samples/sec   Loss 2.4712   LearningRate 0.0083   Epoch: 17   Global Step: 180520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:40:32,777-Speed 5974.38 samples/sec   Loss 2.4527   LearningRate 0.0083   Epoch: 17   Global Step: 180530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:40:39,638-Speed 5971.51 samples/sec   Loss 2.5007   LearningRate 0.0083   Epoch: 17   Global Step: 180540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:40:46,499-Speed 5971.15 samples/sec   Loss 2.4702   LearningRate 0.0083   Epoch: 17   Global Step: 180550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:40:53,367-Speed 5965.48 samples/sec   Loss 2.5145   LearningRate 0.0083   Epoch: 17   Global Step: 180560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:41:00,222-Speed 5976.53 samples/sec   Loss 2.5173   LearningRate 0.0083   Epoch: 17   Global Step: 180570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:41:07,069-Speed 5982.89 samples/sec   Loss 2.5355   LearningRate 0.0082   Epoch: 17   Global Step: 180580   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 07:41:13,916-Speed 5983.10 samples/sec   Loss 2.4946   LearningRate 0.0082   Epoch: 17   Global Step: 180590   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 07:41:20,760-Speed 5987.04 samples/sec   Loss 2.4974   LearningRate 0.0082   Epoch: 17   Global Step: 180600   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 07:41:27,602-Speed 5988.29 samples/sec   Loss 2.5073   LearningRate 0.0082   Epoch: 17   Global Step: 180610   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 07:41:34,475-Speed 5960.14 samples/sec   Loss 2.4802   LearningRate 0.0082   Epoch: 17   Global Step: 180620   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 07:41:41,327-Speed 5978.73 samples/sec   Loss 2.5181   LearningRate 0.0082   Epoch: 17   Global Step: 180630   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 07:41:48,281-Speed 5891.70 samples/sec   Loss 2.5239   LearningRate 0.0082   Epoch: 17   Global Step: 180640   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 07:41:55,130-Speed 5981.56 samples/sec   Loss 2.4907   LearningRate 0.0082   Epoch: 17   Global Step: 180650   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 07:42:02,006-Speed 5958.09 samples/sec   Loss 2.4871   LearningRate 0.0082   Epoch: 17   Global Step: 180660   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 07:42:08,883-Speed 5957.11 samples/sec   Loss 2.4913   LearningRate 0.0082   Epoch: 17   Global Step: 180670   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 07:42:16,411-Speed 5442.41 samples/sec   Loss 2.4654   LearningRate 0.0082   Epoch: 17   Global Step: 180680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:42:23,300-Speed 5946.84 samples/sec   Loss 2.5029   LearningRate 0.0082   Epoch: 17   Global Step: 180690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:42:30,170-Speed 5963.65 samples/sec   Loss 2.4833   LearningRate 0.0082   Epoch: 17   Global Step: 180700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:42:37,027-Speed 5974.69 samples/sec   Loss 2.4865   LearningRate 0.0082   Epoch: 17   Global Step: 180710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:42:43,890-Speed 5971.01 samples/sec   Loss 2.4861   LearningRate 0.0082   Epoch: 17   Global Step: 180720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:42:50,743-Speed 5978.40 samples/sec   Loss 2.4456   LearningRate 0.0082   Epoch: 17   Global Step: 180730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:42:57,635-Speed 5943.12 samples/sec   Loss 2.4720   LearningRate 0.0081   Epoch: 17   Global Step: 180740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:43:04,531-Speed 5941.42 samples/sec   Loss 2.4896   LearningRate 0.0081   Epoch: 17   Global Step: 180750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:43:11,429-Speed 5938.96 samples/sec   Loss 2.5145   LearningRate 0.0081   Epoch: 17   Global Step: 180760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:43:18,290-Speed 5970.52 samples/sec   Loss 2.4903   LearningRate 0.0081   Epoch: 17   Global Step: 180770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:43:25,162-Speed 5960.87 samples/sec   Loss 2.4979   LearningRate 0.0081   Epoch: 17   Global Step: 180780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:43:32,015-Speed 5978.66 samples/sec   Loss 2.4909   LearningRate 0.0081   Epoch: 17   Global Step: 180790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:43:38,862-Speed 5982.65 samples/sec   Loss 2.4916   LearningRate 0.0081   Epoch: 17   Global Step: 180800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:43:45,717-Speed 5976.87 samples/sec   Loss 2.4740   LearningRate 0.0081   Epoch: 17   Global Step: 180810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:43:52,567-Speed 5980.34 samples/sec   Loss 2.4903   LearningRate 0.0081   Epoch: 17   Global Step: 180820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:43:59,429-Speed 5970.41 samples/sec   Loss 2.4667   LearningRate 0.0081   Epoch: 17   Global Step: 180830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:44:06,276-Speed 5983.58 samples/sec   Loss 2.4756   LearningRate 0.0081   Epoch: 17   Global Step: 180840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:44:13,138-Speed 5970.25 samples/sec   Loss 2.4847   LearningRate 0.0081   Epoch: 17   Global Step: 180850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:44:20,010-Speed 5961.38 samples/sec   Loss 2.5092   LearningRate 0.0081   Epoch: 17   Global Step: 180860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:44:26,881-Speed 5962.26 samples/sec   Loss 2.4558   LearningRate 0.0081   Epoch: 17   Global Step: 180870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:44:33,740-Speed 5972.61 samples/sec   Loss 2.4941   LearningRate 0.0081   Epoch: 17   Global Step: 180880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-09 07:44:40,589-Speed 5981.81 samples/sec   Loss 2.5028   LearningRate 0.0081   Epoch: 17   Global Step: 180890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:44:47,458-Speed 5964.07 samples/sec   Loss 2.4573   LearningRate 0.0081   Epoch: 17   Global Step: 180900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:44:54,313-Speed 5976.87 samples/sec   Loss 2.4644   LearningRate 0.0080   Epoch: 17   Global Step: 180910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:45:01,167-Speed 5976.76 samples/sec   Loss 2.4658   LearningRate 0.0080   Epoch: 17   Global Step: 180920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:45:08,027-Speed 5972.12 samples/sec   Loss 2.4522   LearningRate 0.0080   Epoch: 17   Global Step: 180930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:45:14,901-Speed 5960.61 samples/sec   Loss 2.4948   LearningRate 0.0080   Epoch: 17   Global Step: 180940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:45:21,796-Speed 5941.47 samples/sec   Loss 2.4681   LearningRate 0.0080   Epoch: 17   Global Step: 180950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:45:28,710-Speed 5925.17 samples/sec   Loss 2.4704   LearningRate 0.0080   Epoch: 17   Global Step: 180960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:45:35,587-Speed 5958.66 samples/sec   Loss 2.4574   LearningRate 0.0080   Epoch: 17   Global Step: 180970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:45:42,450-Speed 5968.73 samples/sec   Loss 2.4789   LearningRate 0.0080   Epoch: 17   Global Step: 180980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:45:49,312-Speed 5975.78 samples/sec   Loss 2.4677   LearningRate 0.0080   Epoch: 17   Global Step: 180990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:45:56,300-Speed 5862.39 samples/sec   Loss 2.4898   LearningRate 0.0080   Epoch: 17   Global Step: 181000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:46:03,255-Speed 5890.68 samples/sec   Loss 2.4528   LearningRate 0.0080   Epoch: 17   Global Step: 181010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:46:10,113-Speed 5973.26 samples/sec   Loss 2.4587   LearningRate 0.0080   Epoch: 17   Global Step: 181020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:46:16,973-Speed 5972.63 samples/sec   Loss 2.4633   LearningRate 0.0080   Epoch: 17   Global Step: 181030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:46:23,841-Speed 5967.48 samples/sec   Loss 2.4624   LearningRate 0.0080   Epoch: 17   Global Step: 181040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:46:30,692-Speed 5978.79 samples/sec   Loss 2.4972   LearningRate 0.0080   Epoch: 17   Global Step: 181050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:46:37,565-Speed 5960.65 samples/sec   Loss 2.4621   LearningRate 0.0080   Epoch: 17   Global Step: 181060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:46:44,427-Speed 5970.53 samples/sec   Loss 2.4350   LearningRate 0.0079   Epoch: 17   Global Step: 181070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:46:51,293-Speed 5967.34 samples/sec   Loss 2.4778   LearningRate 0.0079   Epoch: 17   Global Step: 181080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:46:58,163-Speed 5963.18 samples/sec   Loss 2.4735   LearningRate 0.0079   Epoch: 17   Global Step: 181090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:47:05,020-Speed 5974.54 samples/sec   Loss 2.4592   LearningRate 0.0079   Epoch: 17   Global Step: 181100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:47:11,879-Speed 5972.33 samples/sec   Loss 2.4867   LearningRate 0.0079   Epoch: 17   Global Step: 181110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:47:18,742-Speed 5969.37 samples/sec   Loss 2.4977   LearningRate 0.0079   Epoch: 17   Global Step: 181120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:47:25,614-Speed 5961.67 samples/sec   Loss 2.4512   LearningRate 0.0079   Epoch: 17   Global Step: 181130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:47:32,485-Speed 5962.19 samples/sec   Loss 2.4410   LearningRate 0.0079   Epoch: 17   Global Step: 181140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:47:39,344-Speed 5972.62 samples/sec   Loss 2.4926   LearningRate 0.0079   Epoch: 17   Global Step: 181150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:47:46,241-Speed 5940.56 samples/sec   Loss 2.4557   LearningRate 0.0079   Epoch: 17   Global Step: 181160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:47:53,129-Speed 5948.00 samples/sec   Loss 2.4430   LearningRate 0.0079   Epoch: 17   Global Step: 181170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:47:59,988-Speed 5972.57 samples/sec   Loss 2.4964   LearningRate 0.0079   Epoch: 17   Global Step: 181180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:48:06,863-Speed 5959.12 samples/sec   Loss 2.4493   LearningRate 0.0079   Epoch: 17   Global Step: 181190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:48:13,726-Speed 5969.09 samples/sec   Loss 2.4830   LearningRate 0.0079   Epoch: 17   Global Step: 181200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:48:20,599-Speed 5960.72 samples/sec   Loss 2.4436   LearningRate 0.0079   Epoch: 17   Global Step: 181210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:48:27,465-Speed 5966.63 samples/sec   Loss 2.4679   LearningRate 0.0079   Epoch: 17   Global Step: 181220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:48:34,331-Speed 5966.75 samples/sec   Loss 2.4481   LearningRate 0.0079   Epoch: 17   Global Step: 181230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:48:41,211-Speed 5954.80 samples/sec   Loss 2.4510   LearningRate 0.0078   Epoch: 17   Global Step: 181240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:48:48,081-Speed 5969.12 samples/sec   Loss 2.4668   LearningRate 0.0078   Epoch: 17   Global Step: 181250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:48:54,952-Speed 5962.57 samples/sec   Loss 2.4447   LearningRate 0.0078   Epoch: 17   Global Step: 181260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:49:01,835-Speed 5952.24 samples/sec   Loss 2.4645   LearningRate 0.0078   Epoch: 17   Global Step: 181270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:49:08,730-Speed 5943.63 samples/sec   Loss 2.4285   LearningRate 0.0078   Epoch: 17   Global Step: 181280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:49:15,590-Speed 5971.93 samples/sec   Loss 2.4528   LearningRate 0.0078   Epoch: 17   Global Step: 181290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:49:22,512-Speed 5919.44 samples/sec   Loss 2.4630   LearningRate 0.0078   Epoch: 17   Global Step: 181300   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:49:29,392-Speed 5954.74 samples/sec   Loss 2.4634   LearningRate 0.0078   Epoch: 17   Global Step: 181310   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:49:36,257-Speed 5967.43 samples/sec   Loss 2.4523   LearningRate 0.0078   Epoch: 17   Global Step: 181320   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:49:43,136-Speed 5955.57 samples/sec   Loss 2.4546   LearningRate 0.0078   Epoch: 17   Global Step: 181330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:49:50,015-Speed 5954.89 samples/sec   Loss 2.4429   LearningRate 0.0078   Epoch: 17   Global Step: 181340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:49:56,875-Speed 5972.41 samples/sec   Loss 2.4560   LearningRate 0.0078   Epoch: 17   Global Step: 181350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:50:03,746-Speed 5961.77 samples/sec   Loss 2.4369   LearningRate 0.0078   Epoch: 17   Global Step: 181360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:50:10,606-Speed 5971.34 samples/sec   Loss 2.4921   LearningRate 0.0078   Epoch: 17   Global Step: 181370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:50:17,474-Speed 5965.36 samples/sec   Loss 2.4588   LearningRate 0.0078   Epoch: 17   Global Step: 181380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:50:24,342-Speed 5964.87 samples/sec   Loss 2.4561   LearningRate 0.0078   Epoch: 17   Global Step: 181390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:50:31,229-Speed 5948.92 samples/sec   Loss 2.4565   LearningRate 0.0078   Epoch: 17   Global Step: 181400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:50:38,083-Speed 5977.56 samples/sec   Loss 2.4458   LearningRate 0.0077   Epoch: 17   Global Step: 181410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:50:44,945-Speed 5970.00 samples/sec   Loss 2.4580   LearningRate 0.0077   Epoch: 17   Global Step: 181420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:50:51,808-Speed 5968.89 samples/sec   Loss 2.4618   LearningRate 0.0077   Epoch: 17   Global Step: 181430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:50:58,715-Speed 5932.73 samples/sec   Loss 2.4140   LearningRate 0.0077   Epoch: 17   Global Step: 181440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:51:05,580-Speed 5968.21 samples/sec   Loss 2.4285   LearningRate 0.0077   Epoch: 17   Global Step: 181450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:51:12,445-Speed 5967.46 samples/sec   Loss 2.4331   LearningRate 0.0077   Epoch: 17   Global Step: 181460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:51:19,307-Speed 5970.18 samples/sec   Loss 2.4529   LearningRate 0.0077   Epoch: 17   Global Step: 181470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:51:26,162-Speed 5975.63 samples/sec   Loss 2.4240   LearningRate 0.0077   Epoch: 17   Global Step: 181480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:51:33,035-Speed 5961.96 samples/sec   Loss 2.4427   LearningRate 0.0077   Epoch: 17   Global Step: 181490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:51:39,899-Speed 5969.54 samples/sec   Loss 2.4349   LearningRate 0.0077   Epoch: 17   Global Step: 181500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:51:46,756-Speed 5973.82 samples/sec   Loss 2.4260   LearningRate 0.0077   Epoch: 17   Global Step: 181510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:51:53,638-Speed 5953.12 samples/sec   Loss 2.4222   LearningRate 0.0077   Epoch: 17   Global Step: 181520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:52:00,496-Speed 5974.79 samples/sec   Loss 2.4625   LearningRate 0.0077   Epoch: 17   Global Step: 181530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:52:07,354-Speed 5973.94 samples/sec   Loss 2.4364   LearningRate 0.0077   Epoch: 17   Global Step: 181540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:52:14,246-Speed 5944.53 samples/sec   Loss 2.4530   LearningRate 0.0077   Epoch: 17   Global Step: 181550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:52:21,099-Speed 5978.42 samples/sec   Loss 2.4337   LearningRate 0.0077   Epoch: 17   Global Step: 181560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:52:27,948-Speed 5981.91 samples/sec   Loss 2.4158   LearningRate 0.0076   Epoch: 17   Global Step: 181570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:52:34,795-Speed 5983.37 samples/sec   Loss 2.4068   LearningRate 0.0076   Epoch: 17   Global Step: 181580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:52:41,645-Speed 5980.32 samples/sec   Loss 2.4027   LearningRate 0.0076   Epoch: 17   Global Step: 181590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:52:48,494-Speed 5981.60 samples/sec   Loss 2.4200   LearningRate 0.0076   Epoch: 17   Global Step: 181600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:52:55,366-Speed 5961.51 samples/sec   Loss 2.4061   LearningRate 0.0076   Epoch: 17   Global Step: 181610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:53:02,215-Speed 5982.25 samples/sec   Loss 2.4399   LearningRate 0.0076   Epoch: 17   Global Step: 181620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:53:09,086-Speed 5962.30 samples/sec   Loss 2.4265   LearningRate 0.0076   Epoch: 17   Global Step: 181630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:53:15,940-Speed 5977.04 samples/sec   Loss 2.4159   LearningRate 0.0076   Epoch: 17   Global Step: 181640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:53:22,799-Speed 5972.57 samples/sec   Loss 2.4702   LearningRate 0.0076   Epoch: 17   Global Step: 181650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:53:29,656-Speed 5975.03 samples/sec   Loss 2.4265   LearningRate 0.0076   Epoch: 17   Global Step: 181660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:53:36,505-Speed 5981.42 samples/sec   Loss 2.4495   LearningRate 0.0076   Epoch: 17   Global Step: 181670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:53:43,370-Speed 5967.89 samples/sec   Loss 2.4133   LearningRate 0.0076   Epoch: 17   Global Step: 181680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:53:50,214-Speed 5985.90 samples/sec   Loss 2.4294   LearningRate 0.0076   Epoch: 17   Global Step: 181690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:53:57,063-Speed 5981.20 samples/sec   Loss 2.4311   LearningRate 0.0076   Epoch: 17   Global Step: 181700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:54:03,921-Speed 5974.28 samples/sec   Loss 2.4512   LearningRate 0.0076   Epoch: 17   Global Step: 181710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:54:10,784-Speed 5969.54 samples/sec   Loss 2.4511   LearningRate 0.0076   Epoch: 17   Global Step: 181720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:54:17,661-Speed 5957.95 samples/sec   Loss 2.4912   LearningRate 0.0076   Epoch: 17   Global Step: 181730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:54:24,536-Speed 5958.35 samples/sec   Loss 2.4500   LearningRate 0.0075   Epoch: 17   Global Step: 181740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:54:31,399-Speed 5969.19 samples/sec   Loss 2.4021   LearningRate 0.0075   Epoch: 17   Global Step: 181750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:54:38,262-Speed 5969.24 samples/sec   Loss 2.4609   LearningRate 0.0075   Epoch: 17   Global Step: 181760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:54:45,129-Speed 5966.35 samples/sec   Loss 2.4037   LearningRate 0.0075   Epoch: 17   Global Step: 181770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:54:51,990-Speed 5970.92 samples/sec   Loss 2.4665   LearningRate 0.0075   Epoch: 17   Global Step: 181780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:54:58,847-Speed 5973.79 samples/sec   Loss 2.4485   LearningRate 0.0075   Epoch: 17   Global Step: 181790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:55:05,699-Speed 5979.40 samples/sec   Loss 2.4502   LearningRate 0.0075   Epoch: 17   Global Step: 181800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:55:12,545-Speed 5983.78 samples/sec   Loss 2.4283   LearningRate 0.0075   Epoch: 17   Global Step: 181810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:55:19,405-Speed 5972.53 samples/sec   Loss 2.4568   LearningRate 0.0075   Epoch: 17   Global Step: 181820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:55:26,266-Speed 5970.48 samples/sec   Loss 2.4014   LearningRate 0.0075   Epoch: 17   Global Step: 181830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:55:33,125-Speed 5973.18 samples/sec   Loss 2.4355   LearningRate 0.0075   Epoch: 17   Global Step: 181840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:55:40,006-Speed 5955.90 samples/sec   Loss 2.4123   LearningRate 0.0075   Epoch: 17   Global Step: 181850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:55:46,874-Speed 5964.81 samples/sec   Loss 2.4176   LearningRate 0.0075   Epoch: 17   Global Step: 181860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:55:53,730-Speed 5975.76 samples/sec   Loss 2.4399   LearningRate 0.0075   Epoch: 17   Global Step: 181870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:56:00,601-Speed 5962.21 samples/sec   Loss 2.4220   LearningRate 0.0075   Epoch: 17   Global Step: 181880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:56:07,444-Speed 5986.97 samples/sec   Loss 2.3658   LearningRate 0.0075   Epoch: 17   Global Step: 181890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:56:14,301-Speed 5974.36 samples/sec   Loss 2.3940   LearningRate 0.0075   Epoch: 17   Global Step: 181900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:56:21,150-Speed 5981.81 samples/sec   Loss 2.4006   LearningRate 0.0074   Epoch: 17   Global Step: 181910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:56:28,016-Speed 5967.21 samples/sec   Loss 2.4511   LearningRate 0.0074   Epoch: 17   Global Step: 181920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:56:34,879-Speed 5969.11 samples/sec   Loss 2.4168   LearningRate 0.0074   Epoch: 17   Global Step: 181930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:56:41,776-Speed 5939.80 samples/sec   Loss 2.3872   LearningRate 0.0074   Epoch: 17   Global Step: 181940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:56:48,665-Speed 5947.71 samples/sec   Loss 2.4208   LearningRate 0.0074   Epoch: 17   Global Step: 181950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:56:55,521-Speed 5979.45 samples/sec   Loss 2.4037   LearningRate 0.0074   Epoch: 17   Global Step: 181960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:57:02,378-Speed 5974.53 samples/sec   Loss 2.4317   LearningRate 0.0074   Epoch: 17   Global Step: 181970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:57:09,234-Speed 5975.84 samples/sec   Loss 2.4104   LearningRate 0.0074   Epoch: 17   Global Step: 181980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:57:16,110-Speed 5957.55 samples/sec   Loss 2.4059   LearningRate 0.0074   Epoch: 17   Global Step: 181990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:57:22,958-Speed 5984.06 samples/sec   Loss 2.4360   LearningRate 0.0074   Epoch: 17   Global Step: 182000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:57:29,826-Speed 5964.82 samples/sec   Loss 2.4262   LearningRate 0.0074   Epoch: 17   Global Step: 182010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:57:36,670-Speed 5985.59 samples/sec   Loss 2.4179   LearningRate 0.0074   Epoch: 17   Global Step: 182020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:57:43,548-Speed 5957.00 samples/sec   Loss 2.3923   LearningRate 0.0074   Epoch: 17   Global Step: 182030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:57:50,552-Speed 5849.49 samples/sec   Loss 2.4115   LearningRate 0.0074   Epoch: 17   Global Step: 182040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:57:57,504-Speed 5893.39 samples/sec   Loss 2.3960   LearningRate 0.0074   Epoch: 17   Global Step: 182050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:58:04,411-Speed 5931.94 samples/sec   Loss 2.3850   LearningRate 0.0074   Epoch: 17   Global Step: 182060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:58:11,262-Speed 5978.94 samples/sec   Loss 2.4188   LearningRate 0.0074   Epoch: 17   Global Step: 182070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:58:18,127-Speed 5968.38 samples/sec   Loss 2.3757   LearningRate 0.0073   Epoch: 17   Global Step: 182080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:58:24,973-Speed 5984.03 samples/sec   Loss 2.4052   LearningRate 0.0073   Epoch: 17   Global Step: 182090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:58:31,884-Speed 5928.09 samples/sec   Loss 2.4706   LearningRate 0.0073   Epoch: 17   Global Step: 182100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:58:38,740-Speed 5975.81 samples/sec   Loss 2.3990   LearningRate 0.0073   Epoch: 17   Global Step: 182110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 07:58:45,593-Speed 5977.95 samples/sec   Loss 2.4267   LearningRate 0.0073   Epoch: 17   Global Step: 182120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:58:52,476-Speed 5952.15 samples/sec   Loss 2.4029   LearningRate 0.0073   Epoch: 17   Global Step: 182130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:58:59,401-Speed 5915.67 samples/sec   Loss 2.4031   LearningRate 0.0073   Epoch: 17   Global Step: 182140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:59:06,265-Speed 5968.62 samples/sec   Loss 2.3707   LearningRate 0.0073   Epoch: 17   Global Step: 182150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:59:13,140-Speed 5958.37 samples/sec   Loss 2.3830   LearningRate 0.0073   Epoch: 17   Global Step: 182160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:59:19,998-Speed 5973.55 samples/sec   Loss 2.4386   LearningRate 0.0073   Epoch: 17   Global Step: 182170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:59:26,847-Speed 5981.29 samples/sec   Loss 2.3621   LearningRate 0.0073   Epoch: 17   Global Step: 182180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:59:33,701-Speed 5977.06 samples/sec   Loss 2.4133   LearningRate 0.0073   Epoch: 17   Global Step: 182190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:59:40,565-Speed 5968.57 samples/sec   Loss 2.4207   LearningRate 0.0073   Epoch: 17   Global Step: 182200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:59:47,427-Speed 5972.07 samples/sec   Loss 2.3792   LearningRate 0.0073   Epoch: 17   Global Step: 182210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 07:59:54,279-Speed 5978.77 samples/sec   Loss 2.3715   LearningRate 0.0073   Epoch: 17   Global Step: 182220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:00:01,145-Speed 5967.12 samples/sec   Loss 2.4069   LearningRate 0.0073   Epoch: 17   Global Step: 182230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:00:08,007-Speed 5970.18 samples/sec   Loss 2.3732   LearningRate 0.0073   Epoch: 17   Global Step: 182240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:00:14,855-Speed 5982.45 samples/sec   Loss 2.3816   LearningRate 0.0073   Epoch: 17   Global Step: 182250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:00:21,709-Speed 5977.46 samples/sec   Loss 2.3908   LearningRate 0.0072   Epoch: 17   Global Step: 182260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:00:28,595-Speed 5949.37 samples/sec   Loss 2.3951   LearningRate 0.0072   Epoch: 17   Global Step: 182270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:00:35,525-Speed 5911.43 samples/sec   Loss 2.4082   LearningRate 0.0072   Epoch: 17   Global Step: 182280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:00:42,402-Speed 5956.99 samples/sec   Loss 2.3759   LearningRate 0.0072   Epoch: 17   Global Step: 182290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:00:49,245-Speed 5986.87 samples/sec   Loss 2.3782   LearningRate 0.0072   Epoch: 17   Global Step: 182300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:00:56,096-Speed 5980.54 samples/sec   Loss 2.3656   LearningRate 0.0072   Epoch: 17   Global Step: 182310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:01:02,960-Speed 5968.38 samples/sec   Loss 2.4231   LearningRate 0.0072   Epoch: 17   Global Step: 182320   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:01:09,842-Speed 5954.21 samples/sec   Loss 2.3672   LearningRate 0.0072   Epoch: 17   Global Step: 182330   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:01:16,716-Speed 5961.62 samples/sec   Loss 2.4350   LearningRate 0.0072   Epoch: 17   Global Step: 182340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:01:23,575-Speed 5972.46 samples/sec   Loss 2.3892   LearningRate 0.0072   Epoch: 17   Global Step: 182350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:01:30,448-Speed 5960.64 samples/sec   Loss 2.3663   LearningRate 0.0072   Epoch: 17   Global Step: 182360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:01:37,311-Speed 5972.30 samples/sec   Loss 2.4000   LearningRate 0.0072   Epoch: 17   Global Step: 182370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:01:44,173-Speed 5970.07 samples/sec   Loss 2.4069   LearningRate 0.0072   Epoch: 17   Global Step: 182380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:01:51,022-Speed 5981.95 samples/sec   Loss 2.4214   LearningRate 0.0072   Epoch: 17   Global Step: 182390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:01:57,876-Speed 5977.22 samples/sec   Loss 2.3503   LearningRate 0.0072   Epoch: 17   Global Step: 182400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:02:04,729-Speed 5978.50 samples/sec   Loss 2.3704   LearningRate 0.0072   Epoch: 17   Global Step: 182410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:02:11,597-Speed 5964.61 samples/sec   Loss 2.3674   LearningRate 0.0072   Epoch: 17   Global Step: 182420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:02:18,459-Speed 5969.82 samples/sec   Loss 2.3514   LearningRate 0.0071   Epoch: 17   Global Step: 182430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:02:25,320-Speed 5973.86 samples/sec   Loss 2.3945   LearningRate 0.0071   Epoch: 17   Global Step: 182440   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 08:02:32,181-Speed 5971.86 samples/sec   Loss 2.3951   LearningRate 0.0071   Epoch: 17   Global Step: 182450   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 08:02:39,077-Speed 5940.24 samples/sec   Loss 2.3539   LearningRate 0.0071   Epoch: 17   Global Step: 182460   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 08:02:45,950-Speed 5960.43 samples/sec   Loss 2.4117   LearningRate 0.0071   Epoch: 17   Global Step: 182470   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 08:02:52,806-Speed 5975.24 samples/sec   Loss 2.3656   LearningRate 0.0071   Epoch: 17   Global Step: 182480   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 08:02:59,663-Speed 5975.10 samples/sec   Loss 2.3425   LearningRate 0.0071   Epoch: 17   Global Step: 182490   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 08:03:06,535-Speed 5962.00 samples/sec   Loss 2.3937   LearningRate 0.0071   Epoch: 17   Global Step: 182500   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 08:03:13,393-Speed 5973.95 samples/sec   Loss 2.3211   LearningRate 0.0071   Epoch: 17   Global Step: 182510   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 08:03:20,252-Speed 5973.12 samples/sec   Loss 2.4037   LearningRate 0.0071   Epoch: 17   Global Step: 182520   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 08:03:27,155-Speed 5934.69 samples/sec   Loss 2.3478   LearningRate 0.0071   Epoch: 17   Global Step: 182530   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-01-09 08:03:34,013-Speed 5974.05 samples/sec   Loss 2.3949   LearningRate 0.0071   Epoch: 17   Global Step: 182540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:03:40,871-Speed 5973.14 samples/sec   Loss 2.3311   LearningRate 0.0071   Epoch: 17   Global Step: 182550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:03:47,716-Speed 5985.03 samples/sec   Loss 2.3782   LearningRate 0.0071   Epoch: 17   Global Step: 182560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:03:54,564-Speed 5982.58 samples/sec   Loss 2.3672   LearningRate 0.0071   Epoch: 17   Global Step: 182570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:04:01,428-Speed 5968.23 samples/sec   Loss 2.3426   LearningRate 0.0071   Epoch: 17   Global Step: 182580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:04:08,286-Speed 5974.16 samples/sec   Loss 2.3675   LearningRate 0.0071   Epoch: 17   Global Step: 182590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:04:15,136-Speed 5979.88 samples/sec   Loss 2.3847   LearningRate 0.0071   Epoch: 17   Global Step: 182600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:04:21,994-Speed 5975.88 samples/sec   Loss 2.4151   LearningRate 0.0070   Epoch: 17   Global Step: 182610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:04:28,861-Speed 5965.57 samples/sec   Loss 2.3470   LearningRate 0.0070   Epoch: 17   Global Step: 182620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:04:35,712-Speed 5979.97 samples/sec   Loss 2.4114   LearningRate 0.0070   Epoch: 17   Global Step: 182630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:04:42,607-Speed 5944.74 samples/sec   Loss 2.3777   LearningRate 0.0070   Epoch: 17   Global Step: 182640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:04:49,478-Speed 5962.93 samples/sec   Loss 2.3676   LearningRate 0.0070   Epoch: 17   Global Step: 182650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:04:56,321-Speed 5986.20 samples/sec   Loss 2.3630   LearningRate 0.0070   Epoch: 17   Global Step: 182660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:05:03,177-Speed 5975.70 samples/sec   Loss 2.3927   LearningRate 0.0070   Epoch: 17   Global Step: 182670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:05:10,045-Speed 5967.84 samples/sec   Loss 2.3825   LearningRate 0.0070   Epoch: 17   Global Step: 182680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:05:16,916-Speed 5962.29 samples/sec   Loss 2.3604   LearningRate 0.0070   Epoch: 17   Global Step: 182690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:05:23,771-Speed 5977.69 samples/sec   Loss 2.3800   LearningRate 0.0070   Epoch: 17   Global Step: 182700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:05:30,625-Speed 5976.98 samples/sec   Loss 2.3838   LearningRate 0.0070   Epoch: 17   Global Step: 182710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:05:37,517-Speed 5947.09 samples/sec   Loss 2.3949   LearningRate 0.0070   Epoch: 17   Global Step: 182720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:05:44,436-Speed 5921.28 samples/sec   Loss 2.3702   LearningRate 0.0070   Epoch: 17   Global Step: 182730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:05:51,302-Speed 5967.09 samples/sec   Loss 2.3576   LearningRate 0.0070   Epoch: 17   Global Step: 182740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:05:58,149-Speed 5982.54 samples/sec   Loss 2.3692   LearningRate 0.0070   Epoch: 17   Global Step: 182750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:06:05,004-Speed 5975.90 samples/sec   Loss 2.3525   LearningRate 0.0070   Epoch: 17   Global Step: 182760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:06:11,855-Speed 5982.77 samples/sec   Loss 2.3472   LearningRate 0.0070   Epoch: 17   Global Step: 182770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:06:18,723-Speed 5965.10 samples/sec   Loss 2.3754   LearningRate 0.0069   Epoch: 17   Global Step: 182780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:06:25,566-Speed 5987.12 samples/sec   Loss 2.3589   LearningRate 0.0069   Epoch: 17   Global Step: 182790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:06:32,415-Speed 5981.68 samples/sec   Loss 2.3618   LearningRate 0.0069   Epoch: 17   Global Step: 182800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:06:39,261-Speed 5984.19 samples/sec   Loss 2.3712   LearningRate 0.0069   Epoch: 17   Global Step: 182810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:06:46,124-Speed 5969.99 samples/sec   Loss 2.3489   LearningRate 0.0069   Epoch: 17   Global Step: 182820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:06:52,968-Speed 5985.73 samples/sec   Loss 2.3544   LearningRate 0.0069   Epoch: 17   Global Step: 182830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:06:59,831-Speed 5969.36 samples/sec   Loss 2.3591   LearningRate 0.0069   Epoch: 17   Global Step: 182840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:07:06,680-Speed 5981.57 samples/sec   Loss 2.3788   LearningRate 0.0069   Epoch: 17   Global Step: 182850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:07:13,531-Speed 5982.07 samples/sec   Loss 2.3731   LearningRate 0.0069   Epoch: 17   Global Step: 182860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:07:20,374-Speed 5986.45 samples/sec   Loss 2.3841   LearningRate 0.0069   Epoch: 17   Global Step: 182870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:07:27,249-Speed 5961.61 samples/sec   Loss 2.3518   LearningRate 0.0069   Epoch: 17   Global Step: 182880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:07:34,097-Speed 5983.28 samples/sec   Loss 2.3590   LearningRate 0.0069   Epoch: 17   Global Step: 182890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:07:40,982-Speed 5951.56 samples/sec   Loss 2.3302   LearningRate 0.0069   Epoch: 17   Global Step: 182900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:07:47,829-Speed 5982.83 samples/sec   Loss 2.3628   LearningRate 0.0069   Epoch: 17   Global Step: 182910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:07:54,702-Speed 5963.70 samples/sec   Loss 2.3773   LearningRate 0.0069   Epoch: 17   Global Step: 182920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:08:01,570-Speed 5965.12 samples/sec   Loss 2.3350   LearningRate 0.0069   Epoch: 17   Global Step: 182930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:08:08,414-Speed 5986.45 samples/sec   Loss 2.3358   LearningRate 0.0069   Epoch: 17   Global Step: 182940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:08:15,272-Speed 5973.89 samples/sec   Loss 2.3568   LearningRate 0.0069   Epoch: 17   Global Step: 182950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:08:22,129-Speed 5975.34 samples/sec   Loss 2.3725   LearningRate 0.0068   Epoch: 17   Global Step: 182960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:08:28,992-Speed 5968.51 samples/sec   Loss 2.3476   LearningRate 0.0068   Epoch: 17   Global Step: 182970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:08:35,844-Speed 5979.18 samples/sec   Loss 2.3666   LearningRate 0.0068   Epoch: 17   Global Step: 182980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:08:42,698-Speed 5977.98 samples/sec   Loss 2.3665   LearningRate 0.0068   Epoch: 17   Global Step: 182990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:08:49,571-Speed 5960.56 samples/sec   Loss 2.3450   LearningRate 0.0068   Epoch: 17   Global Step: 183000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:08:56,446-Speed 5959.73 samples/sec   Loss 2.3308   LearningRate 0.0068   Epoch: 17   Global Step: 183010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:09:03,311-Speed 5967.34 samples/sec   Loss 2.3634   LearningRate 0.0068   Epoch: 17   Global Step: 183020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:09:10,162-Speed 5979.45 samples/sec   Loss 2.3399   LearningRate 0.0068   Epoch: 17   Global Step: 183030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:09:17,018-Speed 5975.31 samples/sec   Loss 2.3680   LearningRate 0.0068   Epoch: 17   Global Step: 183040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:09:23,889-Speed 5962.61 samples/sec   Loss 2.3336   LearningRate 0.0068   Epoch: 17   Global Step: 183050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:09:30,757-Speed 5964.86 samples/sec   Loss 2.3662   LearningRate 0.0068   Epoch: 17   Global Step: 183060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:09:37,622-Speed 5968.62 samples/sec   Loss 2.3754   LearningRate 0.0068   Epoch: 17   Global Step: 183070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:09:44,480-Speed 5973.31 samples/sec   Loss 2.3289   LearningRate 0.0068   Epoch: 17   Global Step: 183080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:09:51,341-Speed 5971.30 samples/sec   Loss 2.3224   LearningRate 0.0068   Epoch: 17   Global Step: 183090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:09:58,235-Speed 5942.68 samples/sec   Loss 2.3721   LearningRate 0.0068   Epoch: 17   Global Step: 183100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:10:05,113-Speed 5956.19 samples/sec   Loss 2.3694   LearningRate 0.0068   Epoch: 17   Global Step: 183110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:10:12,001-Speed 5947.62 samples/sec   Loss 2.3324   LearningRate 0.0068   Epoch: 17   Global Step: 183120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:10:18,900-Speed 5938.88 samples/sec   Loss 2.3273   LearningRate 0.0068   Epoch: 17   Global Step: 183130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:10:25,778-Speed 5955.69 samples/sec   Loss 2.2948   LearningRate 0.0067   Epoch: 17   Global Step: 183140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:10:32,709-Speed 5911.39 samples/sec   Loss 2.3331   LearningRate 0.0067   Epoch: 17   Global Step: 183150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:10:39,581-Speed 5961.43 samples/sec   Loss 2.3319   LearningRate 0.0067   Epoch: 17   Global Step: 183160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:10:46,444-Speed 5971.09 samples/sec   Loss 2.3674   LearningRate 0.0067   Epoch: 17   Global Step: 183170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:10:53,320-Speed 5957.58 samples/sec   Loss 2.3140   LearningRate 0.0067   Epoch: 17   Global Step: 183180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:11:00,173-Speed 5978.53 samples/sec   Loss 2.3473   LearningRate 0.0067   Epoch: 17   Global Step: 183190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:11:07,050-Speed 5957.06 samples/sec   Loss 2.3254   LearningRate 0.0067   Epoch: 17   Global Step: 183200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:11:13,926-Speed 5957.96 samples/sec   Loss 2.3425   LearningRate 0.0067   Epoch: 17   Global Step: 183210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:11:20,786-Speed 5972.49 samples/sec   Loss 2.3655   LearningRate 0.0067   Epoch: 17   Global Step: 183220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:11:27,650-Speed 5968.60 samples/sec   Loss 2.3302   LearningRate 0.0067   Epoch: 17   Global Step: 183230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:11:34,510-Speed 5971.99 samples/sec   Loss 2.3418   LearningRate 0.0067   Epoch: 17   Global Step: 183240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:11:41,369-Speed 5973.06 samples/sec   Loss 2.3123   LearningRate 0.0067   Epoch: 17   Global Step: 183250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:11:48,236-Speed 5966.12 samples/sec   Loss 2.3260   LearningRate 0.0067   Epoch: 17   Global Step: 183260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:11:55,081-Speed 5984.92 samples/sec   Loss 2.3514   LearningRate 0.0067   Epoch: 17   Global Step: 183270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:12:01,987-Speed 5932.44 samples/sec   Loss 2.3385   LearningRate 0.0067   Epoch: 17   Global Step: 183280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:12:08,859-Speed 5961.69 samples/sec   Loss 2.3230   LearningRate 0.0067   Epoch: 17   Global Step: 183290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:12:15,716-Speed 5974.14 samples/sec   Loss 2.3304   LearningRate 0.0067   Epoch: 17   Global Step: 183300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:12:22,653-Speed 5906.07 samples/sec   Loss 2.3380   LearningRate 0.0067   Epoch: 17   Global Step: 183310   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:12:29,630-Speed 5871.60 samples/sec   Loss 2.3411   LearningRate 0.0066   Epoch: 17   Global Step: 183320   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:12:36,486-Speed 5975.45 samples/sec   Loss 2.3487   LearningRate 0.0066   Epoch: 17   Global Step: 183330   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:12:43,334-Speed 5983.37 samples/sec   Loss 2.3516   LearningRate 0.0066   Epoch: 17   Global Step: 183340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:12:50,189-Speed 5976.37 samples/sec   Loss 2.3684   LearningRate 0.0066   Epoch: 17   Global Step: 183350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:12:57,048-Speed 5972.79 samples/sec   Loss 2.3417   LearningRate 0.0066   Epoch: 17   Global Step: 183360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:13:03,906-Speed 5974.29 samples/sec   Loss 2.3013   LearningRate 0.0066   Epoch: 17   Global Step: 183370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:13:10,778-Speed 5960.82 samples/sec   Loss 2.3226   LearningRate 0.0066   Epoch: 17   Global Step: 183380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:13:17,632-Speed 5978.04 samples/sec   Loss 2.3102   LearningRate 0.0066   Epoch: 17   Global Step: 183390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:13:24,489-Speed 5974.71 samples/sec   Loss 2.2961   LearningRate 0.0066   Epoch: 17   Global Step: 183400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:13:31,364-Speed 5959.11 samples/sec   Loss 2.3090   LearningRate 0.0066   Epoch: 17   Global Step: 183410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:13:38,218-Speed 5977.01 samples/sec   Loss 2.3510   LearningRate 0.0066   Epoch: 17   Global Step: 183420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:13:45,069-Speed 5979.89 samples/sec   Loss 2.3141   LearningRate 0.0066   Epoch: 17   Global Step: 183430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:13:51,931-Speed 5969.97 samples/sec   Loss 2.3466   LearningRate 0.0066   Epoch: 17   Global Step: 183440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:13:58,826-Speed 5941.68 samples/sec   Loss 2.3440   LearningRate 0.0066   Epoch: 17   Global Step: 183450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:14:05,705-Speed 5955.30 samples/sec   Loss 2.3029   LearningRate 0.0066   Epoch: 17   Global Step: 183460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:14:12,562-Speed 5974.29 samples/sec   Loss 2.2943   LearningRate 0.0066   Epoch: 17   Global Step: 183470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:14:19,422-Speed 5972.26 samples/sec   Loss 2.3005   LearningRate 0.0066   Epoch: 17   Global Step: 183480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:14:26,268-Speed 5984.07 samples/sec   Loss 2.2990   LearningRate 0.0066   Epoch: 17   Global Step: 183490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:14:33,118-Speed 5980.27 samples/sec   Loss 2.3218   LearningRate 0.0065   Epoch: 17   Global Step: 183500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:14:39,953-Speed 5993.80 samples/sec   Loss 2.3130   LearningRate 0.0065   Epoch: 17   Global Step: 183510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:14:46,795-Speed 5987.32 samples/sec   Loss 2.3281   LearningRate 0.0065   Epoch: 17   Global Step: 183520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:14:53,664-Speed 5963.83 samples/sec   Loss 2.3100   LearningRate 0.0065   Epoch: 17   Global Step: 183530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:15:00,525-Speed 5971.69 samples/sec   Loss 2.3170   LearningRate 0.0065   Epoch: 17   Global Step: 183540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:15:07,395-Speed 5963.16 samples/sec   Loss 2.2818   LearningRate 0.0065   Epoch: 17   Global Step: 183550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:15:14,288-Speed 5943.28 samples/sec   Loss 2.3521   LearningRate 0.0065   Epoch: 17   Global Step: 183560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:15:21,150-Speed 5970.68 samples/sec   Loss 2.2993   LearningRate 0.0065   Epoch: 17   Global Step: 183570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:15:28,016-Speed 5967.32 samples/sec   Loss 2.3262   LearningRate 0.0065   Epoch: 17   Global Step: 183580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:15:34,854-Speed 5990.88 samples/sec   Loss 2.3306   LearningRate 0.0065   Epoch: 17   Global Step: 183590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:15:41,705-Speed 5980.08 samples/sec   Loss 2.3078   LearningRate 0.0065   Epoch: 17   Global Step: 183600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:15:48,589-Speed 5950.66 samples/sec   Loss 2.2866   LearningRate 0.0065   Epoch: 17   Global Step: 183610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:15:55,433-Speed 5986.29 samples/sec   Loss 2.3228   LearningRate 0.0065   Epoch: 17   Global Step: 183620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:16:02,271-Speed 5991.03 samples/sec   Loss 2.3049   LearningRate 0.0065   Epoch: 17   Global Step: 183630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:16:09,117-Speed 5984.29 samples/sec   Loss 2.3169   LearningRate 0.0065   Epoch: 17   Global Step: 183640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:16:15,962-Speed 5984.35 samples/sec   Loss 2.3148   LearningRate 0.0065   Epoch: 17   Global Step: 183650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:16:22,795-Speed 5995.48 samples/sec   Loss 2.3127   LearningRate 0.0065   Epoch: 17   Global Step: 183660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:16:29,642-Speed 5983.12 samples/sec   Loss 2.3056   LearningRate 0.0065   Epoch: 17   Global Step: 183670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:16:36,488-Speed 5983.88 samples/sec   Loss 2.2772   LearningRate 0.0064   Epoch: 17   Global Step: 183680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:16:43,333-Speed 5984.70 samples/sec   Loss 2.2730   LearningRate 0.0064   Epoch: 17   Global Step: 183690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:16:50,183-Speed 5980.96 samples/sec   Loss 2.3135   LearningRate 0.0064   Epoch: 17   Global Step: 183700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:16:57,041-Speed 5974.06 samples/sec   Loss 2.3291   LearningRate 0.0064   Epoch: 17   Global Step: 183710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:17:03,902-Speed 5970.76 samples/sec   Loss 2.2969   LearningRate 0.0064   Epoch: 17   Global Step: 183720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:17:10,772-Speed 5962.91 samples/sec   Loss 2.3176   LearningRate 0.0064   Epoch: 17   Global Step: 183730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:17:17,623-Speed 5979.88 samples/sec   Loss 2.3291   LearningRate 0.0064   Epoch: 17   Global Step: 183740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:17:24,478-Speed 5976.20 samples/sec   Loss 2.3223   LearningRate 0.0064   Epoch: 17   Global Step: 183750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:17:31,320-Speed 5987.75 samples/sec   Loss 2.2693   LearningRate 0.0064   Epoch: 17   Global Step: 183760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:17:38,167-Speed 5983.06 samples/sec   Loss 2.3001   LearningRate 0.0064   Epoch: 17   Global Step: 183770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:17:45,035-Speed 5965.28 samples/sec   Loss 2.2865   LearningRate 0.0064   Epoch: 17   Global Step: 183780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:17:51,927-Speed 5944.06 samples/sec   Loss 2.3074   LearningRate 0.0064   Epoch: 17   Global Step: 183790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:17:58,791-Speed 5968.83 samples/sec   Loss 2.2846   LearningRate 0.0064   Epoch: 17   Global Step: 183800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:18:05,635-Speed 5985.85 samples/sec   Loss 2.2844   LearningRate 0.0064   Epoch: 17   Global Step: 183810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:18:12,492-Speed 5974.81 samples/sec   Loss 2.3149   LearningRate 0.0064   Epoch: 17   Global Step: 183820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:18:19,353-Speed 5970.90 samples/sec   Loss 2.2992   LearningRate 0.0064   Epoch: 17   Global Step: 183830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:18:26,280-Speed 5915.10 samples/sec   Loss 2.3256   LearningRate 0.0064   Epoch: 17   Global Step: 183840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:18:33,133-Speed 5978.05 samples/sec   Loss 2.2986   LearningRate 0.0064   Epoch: 17   Global Step: 183850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:18:39,992-Speed 5973.47 samples/sec   Loss 2.3065   LearningRate 0.0064   Epoch: 17   Global Step: 183860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-01-09 08:18:46,872-Speed 5953.78 samples/sec   Loss 2.3015   LearningRate 0.0063   Epoch: 17   Global Step: 183870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:18:53,730-Speed 5974.13 samples/sec   Loss 2.3156   LearningRate 0.0063   Epoch: 17   Global Step: 183880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:19:00,601-Speed 5962.60 samples/sec   Loss 2.3064   LearningRate 0.0063   Epoch: 17   Global Step: 183890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:19:07,458-Speed 5974.68 samples/sec   Loss 2.3110   LearningRate 0.0063   Epoch: 17   Global Step: 183900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:19:14,311-Speed 5978.05 samples/sec   Loss 2.3062   LearningRate 0.0063   Epoch: 17   Global Step: 183910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:19:21,230-Speed 5920.82 samples/sec   Loss 2.2641   LearningRate 0.0063   Epoch: 17   Global Step: 183920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:19:28,081-Speed 5980.24 samples/sec   Loss 2.2616   LearningRate 0.0063   Epoch: 17   Global Step: 183930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:19:34,942-Speed 5970.79 samples/sec   Loss 2.3041   LearningRate 0.0063   Epoch: 17   Global Step: 183940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:19:41,784-Speed 5988.19 samples/sec   Loss 2.3026   LearningRate 0.0063   Epoch: 17   Global Step: 183950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:19:48,675-Speed 5944.81 samples/sec   Loss 2.3064   LearningRate 0.0063   Epoch: 17   Global Step: 183960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:19:55,533-Speed 5973.36 samples/sec   Loss 2.2970   LearningRate 0.0063   Epoch: 17   Global Step: 183970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:20:02,391-Speed 5973.81 samples/sec   Loss 2.2726   LearningRate 0.0063   Epoch: 17   Global Step: 183980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:20:09,253-Speed 5970.45 samples/sec   Loss 2.2628   LearningRate 0.0063   Epoch: 17   Global Step: 183990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:20:16,097-Speed 5985.85 samples/sec   Loss 2.2959   LearningRate 0.0063   Epoch: 17   Global Step: 184000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:20:22,940-Speed 5988.05 samples/sec   Loss 2.2863   LearningRate 0.0063   Epoch: 17   Global Step: 184010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:20:29,783-Speed 5987.29 samples/sec   Loss 2.3138   LearningRate 0.0063   Epoch: 17   Global Step: 184020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:20:36,632-Speed 5981.35 samples/sec   Loss 2.2418   LearningRate 0.0063   Epoch: 17   Global Step: 184030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:20:43,495-Speed 5969.89 samples/sec   Loss 2.2898   LearningRate 0.0063   Epoch: 17   Global Step: 184040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:20:50,421-Speed 5915.11 samples/sec   Loss 2.2944   LearningRate 0.0062   Epoch: 17   Global Step: 184050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:20:57,263-Speed 5987.62 samples/sec   Loss 2.3140   LearningRate 0.0062   Epoch: 17   Global Step: 184060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:21:04,114-Speed 5980.05 samples/sec   Loss 2.2936   LearningRate 0.0062   Epoch: 17   Global Step: 184070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:21:10,959-Speed 5985.84 samples/sec   Loss 2.2601   LearningRate 0.0062   Epoch: 17   Global Step: 184080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:21:17,849-Speed 5946.77 samples/sec   Loss 2.2877   LearningRate 0.0062   Epoch: 17   Global Step: 184090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:21:24,720-Speed 5961.82 samples/sec   Loss 2.3320   LearningRate 0.0062   Epoch: 17   Global Step: 184100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:21:31,665-Speed 5900.00 samples/sec   Loss 2.2826   LearningRate 0.0062   Epoch: 17   Global Step: 184110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:21:38,544-Speed 5955.33 samples/sec   Loss 2.3253   LearningRate 0.0062   Epoch: 17   Global Step: 184120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:21:45,400-Speed 5975.12 samples/sec   Loss 2.2471   LearningRate 0.0062   Epoch: 17   Global Step: 184130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:21:52,252-Speed 5979.83 samples/sec   Loss 2.2378   LearningRate 0.0062   Epoch: 17   Global Step: 184140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:21:59,114-Speed 5970.00 samples/sec   Loss 2.3296   LearningRate 0.0062   Epoch: 17   Global Step: 184150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:22:05,986-Speed 5960.98 samples/sec   Loss 2.2677   LearningRate 0.0062   Epoch: 17   Global Step: 184160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:22:12,876-Speed 5946.15 samples/sec   Loss 2.2378   LearningRate 0.0062   Epoch: 17   Global Step: 184170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:22:19,755-Speed 5955.36 samples/sec   Loss 2.2743   LearningRate 0.0062   Epoch: 17   Global Step: 184180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:22:26,632-Speed 5956.76 samples/sec   Loss 2.2916   LearningRate 0.0062   Epoch: 17   Global Step: 184190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:22:33,474-Speed 5987.79 samples/sec   Loss 2.2768   LearningRate 0.0062   Epoch: 17   Global Step: 184200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:22:40,358-Speed 5951.71 samples/sec   Loss 2.2782   LearningRate 0.0062   Epoch: 17   Global Step: 184210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:22:47,224-Speed 5968.81 samples/sec   Loss 2.2574   LearningRate 0.0062   Epoch: 17   Global Step: 184220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:22:54,069-Speed 5985.50 samples/sec   Loss 2.2847   LearningRate 0.0062   Epoch: 17   Global Step: 184230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:23:00,944-Speed 5958.89 samples/sec   Loss 2.2947   LearningRate 0.0061   Epoch: 17   Global Step: 184240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:23:07,811-Speed 5965.19 samples/sec   Loss 2.2743   LearningRate 0.0061   Epoch: 17   Global Step: 184250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:23:14,655-Speed 5985.73 samples/sec   Loss 2.2810   LearningRate 0.0061   Epoch: 17   Global Step: 184260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:23:21,507-Speed 5979.30 samples/sec   Loss 2.2832   LearningRate 0.0061   Epoch: 17   Global Step: 184270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:23:28,365-Speed 5973.31 samples/sec   Loss 2.2779   LearningRate 0.0061   Epoch: 17   Global Step: 184280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:23:35,244-Speed 5955.13 samples/sec   Loss 2.2598   LearningRate 0.0061   Epoch: 17   Global Step: 184290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:23:42,094-Speed 5981.40 samples/sec   Loss 2.2817   LearningRate 0.0061   Epoch: 17   Global Step: 184300   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:23:48,950-Speed 5976.31 samples/sec   Loss 2.2967   LearningRate 0.0061   Epoch: 17   Global Step: 184310   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:23:55,799-Speed 5981.39 samples/sec   Loss 2.2765   LearningRate 0.0061   Epoch: 17   Global Step: 184320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:24:02,646-Speed 5983.78 samples/sec   Loss 2.2870   LearningRate 0.0061   Epoch: 17   Global Step: 184330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-01-09 08:24:09,574-Speed 5912.80 samples/sec   Loss 2.2577   LearningRate 0.0061   Epoch: 17   Global Step: 184340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:24:16,508-Speed 5908.43 samples/sec   Loss 2.2645   LearningRate 0.0061   Epoch: 17   Global Step: 184350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:24:23,379-Speed 5962.09 samples/sec   Loss 2.2540   LearningRate 0.0061   Epoch: 17   Global Step: 184360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:24:30,232-Speed 5977.62 samples/sec   Loss 2.2617   LearningRate 0.0061   Epoch: 17   Global Step: 184370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:24:37,090-Speed 5973.49 samples/sec   Loss 2.2819   LearningRate 0.0061   Epoch: 17   Global Step: 184380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:24:43,944-Speed 5976.47 samples/sec   Loss 2.2694   LearningRate 0.0061   Epoch: 17   Global Step: 184390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:24:50,793-Speed 5981.65 samples/sec   Loss 2.2567   LearningRate 0.0061   Epoch: 17   Global Step: 184400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:24:57,638-Speed 5984.66 samples/sec   Loss 2.2452   LearningRate 0.0061   Epoch: 17   Global Step: 184410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:25:04,493-Speed 5976.28 samples/sec   Loss 2.2516   LearningRate 0.0061   Epoch: 17   Global Step: 184420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-01-09 08:25:11,344-Speed 5979.68 samples/sec   Loss 2.2692   LearningRate 0.0060   Epoch: 17   Global Step: 184430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:25:18,217-Speed 5960.87 samples/sec   Loss 2.2653   LearningRate 0.0060   Epoch: 17   Global Step: 184440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:25:25,083-Speed 5966.43 samples/sec   Loss 2.2629   LearningRate 0.0060   Epoch: 17   Global Step: 184450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:25:31,942-Speed 5972.69 samples/sec   Loss 2.2525   LearningRate 0.0060   Epoch: 17   Global Step: 184460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:25:38,791-Speed 5981.48 samples/sec   Loss 2.2878   LearningRate 0.0060   Epoch: 17   Global Step: 184470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:25:45,650-Speed 5972.61 samples/sec   Loss 2.2473   LearningRate 0.0060   Epoch: 17   Global Step: 184480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:25:52,494-Speed 5987.61 samples/sec   Loss 2.2547   LearningRate 0.0060   Epoch: 17   Global Step: 184490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:25:59,360-Speed 5967.83 samples/sec   Loss 2.2477   LearningRate 0.0060   Epoch: 17   Global Step: 184500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:26:06,213-Speed 5977.80 samples/sec   Loss 2.2471   LearningRate 0.0060   Epoch: 17   Global Step: 184510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:26:13,057-Speed 5985.57 samples/sec   Loss 2.2686   LearningRate 0.0060   Epoch: 17   Global Step: 184520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:26:19,936-Speed 5955.57 samples/sec   Loss 2.3005   LearningRate 0.0060   Epoch: 17   Global Step: 184530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:26:26,797-Speed 5972.02 samples/sec   Loss 2.2600   LearningRate 0.0060   Epoch: 17   Global Step: 184540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:26:33,646-Speed 5981.06 samples/sec   Loss 2.2444   LearningRate 0.0060   Epoch: 17   Global Step: 184550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:26:40,515-Speed 5963.75 samples/sec   Loss 2.2531   LearningRate 0.0060   Epoch: 17   Global Step: 184560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:26:47,381-Speed 5967.39 samples/sec   Loss 2.2412   LearningRate 0.0060   Epoch: 17   Global Step: 184570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:26:54,239-Speed 5974.27 samples/sec   Loss 2.2518   LearningRate 0.0060   Epoch: 17   Global Step: 184580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:27:01,096-Speed 5975.29 samples/sec   Loss 2.2544   LearningRate 0.0060   Epoch: 17   Global Step: 184590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:27:07,972-Speed 5957.78 samples/sec   Loss 2.2513   LearningRate 0.0060   Epoch: 17   Global Step: 184600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:27:14,842-Speed 5962.92 samples/sec   Loss 2.2685   LearningRate 0.0060   Epoch: 17   Global Step: 184610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:27:21,708-Speed 5967.41 samples/sec   Loss 2.2763   LearningRate 0.0059   Epoch: 17   Global Step: 184620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:27:28,568-Speed 5974.02 samples/sec   Loss 2.2589   LearningRate 0.0059   Epoch: 17   Global Step: 184630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:27:35,421-Speed 5980.68 samples/sec   Loss 2.2200   LearningRate 0.0059   Epoch: 17   Global Step: 184640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:27:42,271-Speed 5980.22 samples/sec   Loss 2.2227   LearningRate 0.0059   Epoch: 17   Global Step: 184650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:27:49,136-Speed 5967.49 samples/sec   Loss 2.2466   LearningRate 0.0059   Epoch: 17   Global Step: 184660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:27:55,991-Speed 5976.91 samples/sec   Loss 2.2297   LearningRate 0.0059   Epoch: 17   Global Step: 184670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:28:02,867-Speed 5957.84 samples/sec   Loss 2.2526   LearningRate 0.0059   Epoch: 17   Global Step: 184680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:28:09,728-Speed 5971.25 samples/sec   Loss 2.2973   LearningRate 0.0059   Epoch: 17   Global Step: 184690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:28:16,584-Speed 5976.01 samples/sec   Loss 2.2288   LearningRate 0.0059   Epoch: 17   Global Step: 184700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:28:23,437-Speed 5978.22 samples/sec   Loss 2.2865   LearningRate 0.0059   Epoch: 17   Global Step: 184710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:28:30,292-Speed 5977.00 samples/sec   Loss 2.2525   LearningRate 0.0059   Epoch: 17   Global Step: 184720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:28:37,147-Speed 5976.16 samples/sec   Loss 2.2373   LearningRate 0.0059   Epoch: 17   Global Step: 184730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:28:43,996-Speed 5981.48 samples/sec   Loss 2.2544   LearningRate 0.0059   Epoch: 17   Global Step: 184740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:28:50,861-Speed 5967.41 samples/sec   Loss 2.1925   LearningRate 0.0059   Epoch: 17   Global Step: 184750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:28:57,709-Speed 5982.53 samples/sec   Loss 2.2427   LearningRate 0.0059   Epoch: 17   Global Step: 184760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:29:04,563-Speed 5977.41 samples/sec   Loss 2.2293   LearningRate 0.0059   Epoch: 17   Global Step: 184770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:29:11,414-Speed 5979.27 samples/sec   Loss 2.2347   LearningRate 0.0059   Epoch: 17   Global Step: 184780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:29:18,268-Speed 5978.20 samples/sec   Loss 2.2342   LearningRate 0.0059   Epoch: 17   Global Step: 184790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:29:25,139-Speed 5962.02 samples/sec   Loss 2.2481   LearningRate 0.0059   Epoch: 17   Global Step: 184800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:29:31,995-Speed 5976.32 samples/sec   Loss 2.2463   LearningRate 0.0058   Epoch: 17   Global Step: 184810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:29:38,846-Speed 5979.04 samples/sec   Loss 2.2251   LearningRate 0.0058   Epoch: 17   Global Step: 184820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:29:45,701-Speed 5976.29 samples/sec   Loss 2.2058   LearningRate 0.0058   Epoch: 17   Global Step: 184830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:29:52,552-Speed 5980.62 samples/sec   Loss 2.2344   LearningRate 0.0058   Epoch: 17   Global Step: 184840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:29:59,401-Speed 5983.27 samples/sec   Loss 2.2447   LearningRate 0.0058   Epoch: 17   Global Step: 184850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:30:06,257-Speed 5975.23 samples/sec   Loss 2.2522   LearningRate 0.0058   Epoch: 17   Global Step: 184860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:30:13,117-Speed 5972.51 samples/sec   Loss 2.2596   LearningRate 0.0058   Epoch: 17   Global Step: 184870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:30:19,979-Speed 5969.92 samples/sec   Loss 2.2671   LearningRate 0.0058   Epoch: 17   Global Step: 184880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:30:26,833-Speed 5977.67 samples/sec   Loss 2.2552   LearningRate 0.0058   Epoch: 17   Global Step: 184890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:30:33,702-Speed 5964.13 samples/sec   Loss 2.2752   LearningRate 0.0058   Epoch: 17   Global Step: 184900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:30:40,590-Speed 5947.01 samples/sec   Loss 2.2489   LearningRate 0.0058   Epoch: 17   Global Step: 184910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:30:47,467-Speed 5958.10 samples/sec   Loss 2.2194   LearningRate 0.0058   Epoch: 17   Global Step: 184920   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:30:54,338-Speed 5972.00 samples/sec   Loss 2.2116   LearningRate 0.0058   Epoch: 17   Global Step: 184930   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:31:01,195-Speed 5974.03 samples/sec   Loss 2.2152   LearningRate 0.0058   Epoch: 17   Global Step: 184940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:31:08,059-Speed 5968.96 samples/sec   Loss 2.2217   LearningRate 0.0058   Epoch: 17   Global Step: 184950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:31:14,917-Speed 5974.86 samples/sec   Loss 2.2438   LearningRate 0.0058   Epoch: 17   Global Step: 184960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:31:21,788-Speed 5962.60 samples/sec   Loss 2.2196   LearningRate 0.0058   Epoch: 17   Global Step: 184970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:31:28,641-Speed 5977.96 samples/sec   Loss 2.2137   LearningRate 0.0058   Epoch: 17   Global Step: 184980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:31:35,484-Speed 5987.59 samples/sec   Loss 2.1970   LearningRate 0.0058   Epoch: 17   Global Step: 184990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:31:42,322-Speed 5990.42 samples/sec   Loss 2.2412   LearningRate 0.0058   Epoch: 17   Global Step: 185000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:32:09,082-[lfw][185000]XNorm: 23.374321
Training: 2022-01-09 08:32:09,083-[lfw][185000]Accuracy-Flip: 0.99783+-0.00279
Training: 2022-01-09 08:32:09,083-[lfw][185000]Accuracy-Highest: 0.99833
Training: 2022-01-09 08:32:40,032-[cfp_fp][185000]XNorm: 21.096349
Training: 2022-01-09 08:32:40,033-[cfp_fp][185000]Accuracy-Flip: 0.99200+-0.00479
Training: 2022-01-09 08:32:40,034-[cfp_fp][185000]Accuracy-Highest: 0.99229
Training: 2022-01-09 08:33:06,746-[agedb_30][185000]XNorm: 22.766516
Training: 2022-01-09 08:33:06,747-[agedb_30][185000]Accuracy-Flip: 0.98033+-0.00618
Training: 2022-01-09 08:33:06,747-[agedb_30][185000]Accuracy-Highest: 0.98200
Training: 2022-01-09 08:33:13,626-Speed 448.62 samples/sec   Loss 2.1973   LearningRate 0.0057   Epoch: 17   Global Step: 185010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:33:20,455-Speed 6000.40 samples/sec   Loss 2.2451   LearningRate 0.0057   Epoch: 17   Global Step: 185020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:33:27,304-Speed 5981.51 samples/sec   Loss 2.2024   LearningRate 0.0057   Epoch: 17   Global Step: 185030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:33:34,168-Speed 5968.51 samples/sec   Loss 2.2103   LearningRate 0.0057   Epoch: 17   Global Step: 185040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:33:41,013-Speed 5984.96 samples/sec   Loss 2.2262   LearningRate 0.0057   Epoch: 17   Global Step: 185050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:33:47,861-Speed 5982.59 samples/sec   Loss 2.2339   LearningRate 0.0057   Epoch: 17   Global Step: 185060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:33:54,734-Speed 5961.01 samples/sec   Loss 2.2266   LearningRate 0.0057   Epoch: 17   Global Step: 185070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:34:01,610-Speed 5967.70 samples/sec   Loss 2.2282   LearningRate 0.0057   Epoch: 17   Global Step: 185080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:34:08,483-Speed 5960.65 samples/sec   Loss 2.2121   LearningRate 0.0057   Epoch: 17   Global Step: 185090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:34:15,346-Speed 5969.44 samples/sec   Loss 2.2161   LearningRate 0.0057   Epoch: 17   Global Step: 185100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:34:22,194-Speed 5982.84 samples/sec   Loss 2.2057   LearningRate 0.0057   Epoch: 17   Global Step: 185110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:34:29,052-Speed 5974.07 samples/sec   Loss 2.1872   LearningRate 0.0057   Epoch: 17   Global Step: 185120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:34:35,932-Speed 5953.74 samples/sec   Loss 2.1965   LearningRate 0.0057   Epoch: 17   Global Step: 185130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:34:42,794-Speed 5970.72 samples/sec   Loss 2.2433   LearningRate 0.0057   Epoch: 17   Global Step: 185140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:34:49,652-Speed 5973.38 samples/sec   Loss 2.1967   LearningRate 0.0057   Epoch: 17   Global Step: 185150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:34:56,507-Speed 5976.48 samples/sec   Loss 2.2087   LearningRate 0.0057   Epoch: 17   Global Step: 185160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:35:03,385-Speed 5956.89 samples/sec   Loss 2.2214   LearningRate 0.0057   Epoch: 17   Global Step: 185170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:35:10,227-Speed 5987.78 samples/sec   Loss 2.2437   LearningRate 0.0057   Epoch: 17   Global Step: 185180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:35:17,068-Speed 5988.61 samples/sec   Loss 2.2219   LearningRate 0.0057   Epoch: 17   Global Step: 185190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:35:23,912-Speed 5985.51 samples/sec   Loss 2.2200   LearningRate 0.0056   Epoch: 17   Global Step: 185200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:35:30,755-Speed 5987.37 samples/sec   Loss 2.2252   LearningRate 0.0056   Epoch: 17   Global Step: 185210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:35:37,617-Speed 5969.81 samples/sec   Loss 2.2056   LearningRate 0.0056   Epoch: 17   Global Step: 185220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:35:44,482-Speed 5970.58 samples/sec   Loss 2.1873   LearningRate 0.0056   Epoch: 17   Global Step: 185230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:35:51,340-Speed 5973.43 samples/sec   Loss 2.1566   LearningRate 0.0056   Epoch: 17   Global Step: 185240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:35:58,203-Speed 5969.11 samples/sec   Loss 2.2330   LearningRate 0.0056   Epoch: 17   Global Step: 185250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:36:05,091-Speed 5946.73 samples/sec   Loss 2.2220   LearningRate 0.0056   Epoch: 17   Global Step: 185260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:36:11,950-Speed 5973.50 samples/sec   Loss 2.1771   LearningRate 0.0056   Epoch: 17   Global Step: 185270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:36:18,821-Speed 5965.60 samples/sec   Loss 2.2209   LearningRate 0.0056   Epoch: 17   Global Step: 185280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:36:25,691-Speed 5964.55 samples/sec   Loss 2.1954   LearningRate 0.0056   Epoch: 17   Global Step: 185290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:36:32,550-Speed 5972.36 samples/sec   Loss 2.2083   LearningRate 0.0056   Epoch: 17   Global Step: 185300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:36:39,402-Speed 5979.36 samples/sec   Loss 2.2154   LearningRate 0.0056   Epoch: 17   Global Step: 185310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:36:46,251-Speed 5980.88 samples/sec   Loss 2.2136   LearningRate 0.0056   Epoch: 17   Global Step: 185320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:36:53,118-Speed 5966.06 samples/sec   Loss 2.1821   LearningRate 0.0056   Epoch: 17   Global Step: 185330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:36:59,994-Speed 5957.68 samples/sec   Loss 2.2044   LearningRate 0.0056   Epoch: 17   Global Step: 185340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:37:06,864-Speed 5963.54 samples/sec   Loss 2.1702   LearningRate 0.0056   Epoch: 17   Global Step: 185350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:37:13,713-Speed 5982.39 samples/sec   Loss 2.2251   LearningRate 0.0056   Epoch: 17   Global Step: 185360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:37:20,567-Speed 5976.65 samples/sec   Loss 2.2209   LearningRate 0.0056   Epoch: 17   Global Step: 185370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:37:27,440-Speed 5960.53 samples/sec   Loss 2.1751   LearningRate 0.0056   Epoch: 17   Global Step: 185380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:37:34,284-Speed 5986.00 samples/sec   Loss 2.1951   LearningRate 0.0056   Epoch: 17   Global Step: 185390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-09 08:37:41,137-Speed 5980.17 samples/sec   Loss 2.2064   LearningRate 0.0055   Epoch: 17   Global Step: 185400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:37:47,989-Speed 5978.95 samples/sec   Loss 2.1798   LearningRate 0.0055   Epoch: 17   Global Step: 185410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:37:54,840-Speed 5979.95 samples/sec   Loss 2.1919   LearningRate 0.0055   Epoch: 17   Global Step: 185420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:38:01,705-Speed 5969.39 samples/sec   Loss 2.2259   LearningRate 0.0055   Epoch: 17   Global Step: 185430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:38:08,555-Speed 5981.39 samples/sec   Loss 2.2273   LearningRate 0.0055   Epoch: 17   Global Step: 185440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:38:15,421-Speed 5968.04 samples/sec   Loss 2.2131   LearningRate 0.0055   Epoch: 17   Global Step: 185450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:38:22,288-Speed 5965.82 samples/sec   Loss 2.2224   LearningRate 0.0055   Epoch: 17   Global Step: 185460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:38:29,152-Speed 5968.49 samples/sec   Loss 2.1942   LearningRate 0.0055   Epoch: 17   Global Step: 185470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:38:36,027-Speed 5958.91 samples/sec   Loss 2.2022   LearningRate 0.0055   Epoch: 17   Global Step: 185480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:38:42,904-Speed 5957.31 samples/sec   Loss 2.2009   LearningRate 0.0055   Epoch: 17   Global Step: 185490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:38:49,777-Speed 5960.94 samples/sec   Loss 2.1824   LearningRate 0.0055   Epoch: 17   Global Step: 185500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:38:56,635-Speed 5973.96 samples/sec   Loss 2.2004   LearningRate 0.0055   Epoch: 17   Global Step: 185510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:39:03,500-Speed 5967.31 samples/sec   Loss 2.1867   LearningRate 0.0055   Epoch: 17   Global Step: 185520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:39:10,359-Speed 5973.82 samples/sec   Loss 2.1801   LearningRate 0.0055   Epoch: 17   Global Step: 185530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:39:17,223-Speed 5968.57 samples/sec   Loss 2.1871   LearningRate 0.0055   Epoch: 17   Global Step: 185540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:39:24,097-Speed 5960.44 samples/sec   Loss 2.2165   LearningRate 0.0055   Epoch: 17   Global Step: 185550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:39:30,970-Speed 5960.65 samples/sec   Loss 2.1761   LearningRate 0.0055   Epoch: 17   Global Step: 185560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:39:37,847-Speed 5958.09 samples/sec   Loss 2.1926   LearningRate 0.0055   Epoch: 17   Global Step: 185570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:39:44,717-Speed 5964.76 samples/sec   Loss 2.1956   LearningRate 0.0055   Epoch: 17   Global Step: 185580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:39:51,573-Speed 5974.76 samples/sec   Loss 2.1924   LearningRate 0.0055   Epoch: 17   Global Step: 185590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:39:58,422-Speed 5982.15 samples/sec   Loss 2.2134   LearningRate 0.0054   Epoch: 17   Global Step: 185600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:40:05,261-Speed 5989.83 samples/sec   Loss 2.1834   LearningRate 0.0054   Epoch: 17   Global Step: 185610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:40:12,128-Speed 5965.69 samples/sec   Loss 2.1932   LearningRate 0.0054   Epoch: 17   Global Step: 185620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:40:18,990-Speed 5970.32 samples/sec   Loss 2.1876   LearningRate 0.0054   Epoch: 17   Global Step: 185630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:40:25,849-Speed 5972.77 samples/sec   Loss 2.1834   LearningRate 0.0054   Epoch: 17   Global Step: 185640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:40:32,714-Speed 5966.51 samples/sec   Loss 2.1679   LearningRate 0.0054   Epoch: 17   Global Step: 185650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:40:39,573-Speed 5972.69 samples/sec   Loss 2.2173   LearningRate 0.0054   Epoch: 17   Global Step: 185660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:40:46,445-Speed 5962.46 samples/sec   Loss 2.1945   LearningRate 0.0054   Epoch: 17   Global Step: 185670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:40:53,319-Speed 5959.72 samples/sec   Loss 2.1763   LearningRate 0.0054   Epoch: 17   Global Step: 185680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:41:00,201-Speed 5952.68 samples/sec   Loss 2.1705   LearningRate 0.0054   Epoch: 17   Global Step: 185690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:41:07,058-Speed 5977.48 samples/sec   Loss 2.2075   LearningRate 0.0054   Epoch: 17   Global Step: 185700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:41:13,932-Speed 5960.97 samples/sec   Loss 2.2194   LearningRate 0.0054   Epoch: 17   Global Step: 185710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:41:20,785-Speed 5978.22 samples/sec   Loss 2.2135   LearningRate 0.0054   Epoch: 17   Global Step: 185720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:41:27,662-Speed 5956.72 samples/sec   Loss 2.2018   LearningRate 0.0054   Epoch: 17   Global Step: 185730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:41:34,513-Speed 5979.87 samples/sec   Loss 2.1834   LearningRate 0.0054   Epoch: 17   Global Step: 185740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:41:41,384-Speed 5962.36 samples/sec   Loss 2.1782   LearningRate 0.0054   Epoch: 17   Global Step: 185750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:41:48,260-Speed 5958.12 samples/sec   Loss 2.1681   LearningRate 0.0054   Epoch: 17   Global Step: 185760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:41:55,127-Speed 5968.71 samples/sec   Loss 2.1507   LearningRate 0.0054   Epoch: 17   Global Step: 185770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:42:01,986-Speed 5972.61 samples/sec   Loss 2.1730   LearningRate 0.0054   Epoch: 17   Global Step: 185780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:42:08,870-Speed 5951.24 samples/sec   Loss 2.1953   LearningRate 0.0054   Epoch: 17   Global Step: 185790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:42:15,720-Speed 5981.32 samples/sec   Loss 2.1967   LearningRate 0.0053   Epoch: 17   Global Step: 185800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:42:22,598-Speed 5956.54 samples/sec   Loss 2.2070   LearningRate 0.0053   Epoch: 17   Global Step: 185810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:42:29,482-Speed 5951.48 samples/sec   Loss 2.1898   LearningRate 0.0053   Epoch: 17   Global Step: 185820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:42:36,338-Speed 5974.55 samples/sec   Loss 2.1845   LearningRate 0.0053   Epoch: 17   Global Step: 185830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:42:43,242-Speed 5937.33 samples/sec   Loss 2.1834   LearningRate 0.0053   Epoch: 17   Global Step: 185840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:42:50,127-Speed 5949.78 samples/sec   Loss 2.1642   LearningRate 0.0053   Epoch: 17   Global Step: 185850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:42:56,984-Speed 5976.99 samples/sec   Loss 2.1888   LearningRate 0.0053   Epoch: 17   Global Step: 185860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:43:03,846-Speed 5970.14 samples/sec   Loss 2.1925   LearningRate 0.0053   Epoch: 17   Global Step: 185870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:43:10,728-Speed 5953.59 samples/sec   Loss 2.1813   LearningRate 0.0053   Epoch: 17   Global Step: 185880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:43:17,580-Speed 5978.44 samples/sec   Loss 2.1554   LearningRate 0.0053   Epoch: 17   Global Step: 185890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:43:24,427-Speed 5983.89 samples/sec   Loss 2.1977   LearningRate 0.0053   Epoch: 17   Global Step: 185900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:43:31,279-Speed 5978.83 samples/sec   Loss 2.2051   LearningRate 0.0053   Epoch: 17   Global Step: 185910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:43:38,171-Speed 5944.57 samples/sec   Loss 2.1850   LearningRate 0.0053   Epoch: 17   Global Step: 185920   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:43:45,016-Speed 5984.43 samples/sec   Loss 2.1933   LearningRate 0.0053   Epoch: 17   Global Step: 185930   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:43:51,953-Speed 5906.39 samples/sec   Loss 2.1895   LearningRate 0.0053   Epoch: 17   Global Step: 185940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:43:58,923-Speed 5878.03 samples/sec   Loss 2.1761   LearningRate 0.0053   Epoch: 17   Global Step: 185950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:44:05,879-Speed 5889.07 samples/sec   Loss 2.1577   LearningRate 0.0053   Epoch: 17   Global Step: 185960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:44:12,866-Speed 5864.17 samples/sec   Loss 2.1471   LearningRate 0.0053   Epoch: 17   Global Step: 185970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:44:19,720-Speed 5977.61 samples/sec   Loss 2.1988   LearningRate 0.0053   Epoch: 17   Global Step: 185980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:44:26,686-Speed 5881.44 samples/sec   Loss 2.1855   LearningRate 0.0053   Epoch: 17   Global Step: 185990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:44:33,616-Speed 5911.39 samples/sec   Loss 2.1674   LearningRate 0.0052   Epoch: 17   Global Step: 186000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:44:40,490-Speed 5959.56 samples/sec   Loss 2.1772   LearningRate 0.0052   Epoch: 17   Global Step: 186010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:44:47,362-Speed 5962.53 samples/sec   Loss 2.1503   LearningRate 0.0052   Epoch: 17   Global Step: 186020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:44:54,224-Speed 5970.20 samples/sec   Loss 2.1774   LearningRate 0.0052   Epoch: 17   Global Step: 186030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:45:01,097-Speed 5960.12 samples/sec   Loss 2.1575   LearningRate 0.0052   Epoch: 17   Global Step: 186040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:45:07,962-Speed 5967.76 samples/sec   Loss 2.1925   LearningRate 0.0052   Epoch: 17   Global Step: 186050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:45:14,829-Speed 5966.09 samples/sec   Loss 2.1866   LearningRate 0.0052   Epoch: 17   Global Step: 186060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:45:21,683-Speed 5976.98 samples/sec   Loss 2.1870   LearningRate 0.0052   Epoch: 17   Global Step: 186070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:45:28,551-Speed 5964.42 samples/sec   Loss 2.1597   LearningRate 0.0052   Epoch: 17   Global Step: 186080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:45:35,432-Speed 5956.86 samples/sec   Loss 2.1666   LearningRate 0.0052   Epoch: 17   Global Step: 186090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:45:42,287-Speed 5976.22 samples/sec   Loss 2.1566   LearningRate 0.0052   Epoch: 17   Global Step: 186100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 08:45:49,142-Speed 5975.98 samples/sec   Loss 2.1749   LearningRate 0.0052   Epoch: 17   Global Step: 186110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 08:45:56,003-Speed 5972.85 samples/sec   Loss 2.1550   LearningRate 0.0052   Epoch: 17   Global Step: 186120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 08:46:02,892-Speed 5948.43 samples/sec   Loss 2.1433   LearningRate 0.0052   Epoch: 17   Global Step: 186130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 08:46:09,758-Speed 5966.95 samples/sec   Loss 2.1497   LearningRate 0.0052   Epoch: 17   Global Step: 186140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 08:46:16,599-Speed 5988.27 samples/sec   Loss 2.1543   LearningRate 0.0052   Epoch: 17   Global Step: 186150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 08:46:23,448-Speed 5984.44 samples/sec   Loss 2.1650   LearningRate 0.0052   Epoch: 17   Global Step: 186160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 08:46:30,301-Speed 5985.92 samples/sec   Loss 2.1609   LearningRate 0.0052   Epoch: 17   Global Step: 186170   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 08:46:37,155-Speed 5976.29 samples/sec   Loss 2.1399   LearningRate 0.0052   Epoch: 17   Global Step: 186180   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 08:46:44,007-Speed 5981.42 samples/sec   Loss 2.1595   LearningRate 0.0052   Epoch: 17   Global Step: 186190   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 08:46:50,863-Speed 5974.92 samples/sec   Loss 2.1348   LearningRate 0.0052   Epoch: 17   Global Step: 186200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:46:57,735-Speed 5961.98 samples/sec   Loss 2.1837   LearningRate 0.0051   Epoch: 17   Global Step: 186210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:47:04,588-Speed 5978.05 samples/sec   Loss 2.1521   LearningRate 0.0051   Epoch: 17   Global Step: 186220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:47:11,433-Speed 5985.06 samples/sec   Loss 2.1561   LearningRate 0.0051   Epoch: 17   Global Step: 186230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:47:18,289-Speed 5975.19 samples/sec   Loss 2.1330   LearningRate 0.0051   Epoch: 17   Global Step: 186240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:47:25,144-Speed 5976.10 samples/sec   Loss 2.1941   LearningRate 0.0051   Epoch: 17   Global Step: 186250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:47:31,996-Speed 5979.71 samples/sec   Loss 2.1624   LearningRate 0.0051   Epoch: 17   Global Step: 186260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:47:38,839-Speed 5986.87 samples/sec   Loss 2.1381   LearningRate 0.0051   Epoch: 17   Global Step: 186270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:47:45,688-Speed 5983.28 samples/sec   Loss 2.1335   LearningRate 0.0051   Epoch: 17   Global Step: 186280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:47:52,605-Speed 5922.54 samples/sec   Loss 2.1593   LearningRate 0.0051   Epoch: 17   Global Step: 186290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:47:59,489-Speed 5951.72 samples/sec   Loss 2.1560   LearningRate 0.0051   Epoch: 17   Global Step: 186300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:48:06,371-Speed 5955.02 samples/sec   Loss 2.1073   LearningRate 0.0051   Epoch: 17   Global Step: 186310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:48:13,255-Speed 5950.45 samples/sec   Loss 2.1514   LearningRate 0.0051   Epoch: 17   Global Step: 186320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:48:20,143-Speed 5948.19 samples/sec   Loss 2.1306   LearningRate 0.0051   Epoch: 17   Global Step: 186330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:48:27,094-Speed 5894.31 samples/sec   Loss 2.1508   LearningRate 0.0051   Epoch: 17   Global Step: 186340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:48:33,950-Speed 5975.23 samples/sec   Loss 2.1716   LearningRate 0.0051   Epoch: 17   Global Step: 186350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:48:40,810-Speed 5971.97 samples/sec   Loss 2.1389   LearningRate 0.0051   Epoch: 17   Global Step: 186360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:48:47,651-Speed 5988.03 samples/sec   Loss 2.1677   LearningRate 0.0051   Epoch: 17   Global Step: 186370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:48:54,542-Speed 5944.69 samples/sec   Loss 2.1556   LearningRate 0.0051   Epoch: 17   Global Step: 186380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:49:01,388-Speed 5984.34 samples/sec   Loss 2.1682   LearningRate 0.0051   Epoch: 17   Global Step: 186390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:49:08,236-Speed 5982.45 samples/sec   Loss 2.1619   LearningRate 0.0051   Epoch: 17   Global Step: 186400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-01-09 08:49:15,086-Speed 5983.61 samples/sec   Loss 2.1472   LearningRate 0.0050   Epoch: 17   Global Step: 186410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:49:21,927-Speed 5987.85 samples/sec   Loss 2.1552   LearningRate 0.0050   Epoch: 17   Global Step: 186420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:49:28,774-Speed 5983.62 samples/sec   Loss 2.1455   LearningRate 0.0050   Epoch: 17   Global Step: 186430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:49:35,629-Speed 5976.20 samples/sec   Loss 2.1252   LearningRate 0.0050   Epoch: 17   Global Step: 186440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:49:42,533-Speed 5934.07 samples/sec   Loss 2.1482   LearningRate 0.0050   Epoch: 17   Global Step: 186450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:49:49,401-Speed 5965.91 samples/sec   Loss 2.1208   LearningRate 0.0050   Epoch: 17   Global Step: 186460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:49:56,253-Speed 5978.43 samples/sec   Loss 2.1616   LearningRate 0.0050   Epoch: 17   Global Step: 186470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:50:03,103-Speed 5981.13 samples/sec   Loss 2.1704   LearningRate 0.0050   Epoch: 17   Global Step: 186480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:50:09,955-Speed 5978.24 samples/sec   Loss 2.1744   LearningRate 0.0050   Epoch: 17   Global Step: 186490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:50:16,830-Speed 5958.90 samples/sec   Loss 2.1463   LearningRate 0.0050   Epoch: 17   Global Step: 186500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:50:23,696-Speed 5966.85 samples/sec   Loss 2.1473   LearningRate 0.0050   Epoch: 17   Global Step: 186510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:50:30,543-Speed 5984.79 samples/sec   Loss 2.1805   LearningRate 0.0050   Epoch: 17   Global Step: 186520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:50:37,390-Speed 5983.28 samples/sec   Loss 2.1623   LearningRate 0.0050   Epoch: 17   Global Step: 186530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:50:44,242-Speed 5979.06 samples/sec   Loss 2.1749   LearningRate 0.0050   Epoch: 17   Global Step: 186540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:50:51,091-Speed 5980.77 samples/sec   Loss 2.1364   LearningRate 0.0050   Epoch: 17   Global Step: 186550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:50:57,962-Speed 5962.40 samples/sec   Loss 2.1298   LearningRate 0.0050   Epoch: 17   Global Step: 186560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:51:04,829-Speed 5966.37 samples/sec   Loss 2.1671   LearningRate 0.0050   Epoch: 17   Global Step: 186570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:51:11,700-Speed 5961.96 samples/sec   Loss 2.1397   LearningRate 0.0050   Epoch: 17   Global Step: 186580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:51:18,591-Speed 5947.91 samples/sec   Loss 2.1367   LearningRate 0.0050   Epoch: 17   Global Step: 186590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:51:25,523-Speed 5910.38 samples/sec   Loss 2.1591   LearningRate 0.0050   Epoch: 17   Global Step: 186600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:51:32,379-Speed 5975.45 samples/sec   Loss 2.1222   LearningRate 0.0050   Epoch: 17   Global Step: 186610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:51:39,254-Speed 5960.06 samples/sec   Loss 2.1457   LearningRate 0.0049   Epoch: 17   Global Step: 186620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:51:46,104-Speed 5979.67 samples/sec   Loss 2.1366   LearningRate 0.0049   Epoch: 17   Global Step: 186630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:51:52,949-Speed 5985.91 samples/sec   Loss 2.1545   LearningRate 0.0049   Epoch: 17   Global Step: 186640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:51:59,793-Speed 5985.80 samples/sec   Loss 2.1356   LearningRate 0.0049   Epoch: 17   Global Step: 186650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:52:22,706-Speed 1787.77 samples/sec   Loss 2.1843   LearningRate 0.0049   Epoch: 18   Global Step: 186660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:52:29,533-Speed 6004.30 samples/sec   Loss 2.1161   LearningRate 0.0049   Epoch: 18   Global Step: 186670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:52:36,386-Speed 5978.24 samples/sec   Loss 2.1415   LearningRate 0.0049   Epoch: 18   Global Step: 186680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:52:43,226-Speed 5989.56 samples/sec   Loss 2.1449   LearningRate 0.0049   Epoch: 18   Global Step: 186690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:52:50,079-Speed 5977.90 samples/sec   Loss 2.1549   LearningRate 0.0049   Epoch: 18   Global Step: 186700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:52:56,918-Speed 5990.56 samples/sec   Loss 2.1328   LearningRate 0.0049   Epoch: 18   Global Step: 186710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:53:03,780-Speed 5973.50 samples/sec   Loss 2.1084   LearningRate 0.0049   Epoch: 18   Global Step: 186720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:53:10,630-Speed 5980.25 samples/sec   Loss 2.1204   LearningRate 0.0049   Epoch: 18   Global Step: 186730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:53:17,490-Speed 5971.57 samples/sec   Loss 2.1409   LearningRate 0.0049   Epoch: 18   Global Step: 186740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:53:24,375-Speed 5950.57 samples/sec   Loss 2.1152   LearningRate 0.0049   Epoch: 18   Global Step: 186750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:53:31,223-Speed 5984.43 samples/sec   Loss 2.0949   LearningRate 0.0049   Epoch: 18   Global Step: 186760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:53:38,101-Speed 5956.64 samples/sec   Loss 2.0984   LearningRate 0.0049   Epoch: 18   Global Step: 186770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:53:44,950-Speed 5981.46 samples/sec   Loss 2.1274   LearningRate 0.0049   Epoch: 18   Global Step: 186780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:53:51,820-Speed 5963.21 samples/sec   Loss 2.1257   LearningRate 0.0049   Epoch: 18   Global Step: 186790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:53:58,665-Speed 5984.84 samples/sec   Loss 2.1213   LearningRate 0.0049   Epoch: 18   Global Step: 186800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:54:05,530-Speed 5968.07 samples/sec   Loss 2.1228   LearningRate 0.0049   Epoch: 18   Global Step: 186810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:54:12,402-Speed 5962.14 samples/sec   Loss 2.1398   LearningRate 0.0049   Epoch: 18   Global Step: 186820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:54:19,255-Speed 5977.65 samples/sec   Loss 2.1033   LearningRate 0.0048   Epoch: 18   Global Step: 186830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:54:26,107-Speed 5983.78 samples/sec   Loss 2.1368   LearningRate 0.0048   Epoch: 18   Global Step: 186840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:54:32,966-Speed 5972.66 samples/sec   Loss 2.0972   LearningRate 0.0048   Epoch: 18   Global Step: 186850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:54:39,812-Speed 5983.47 samples/sec   Loss 2.0914   LearningRate 0.0048   Epoch: 18   Global Step: 186860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:54:46,713-Speed 5936.79 samples/sec   Loss 2.0817   LearningRate 0.0048   Epoch: 18   Global Step: 186870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:54:53,569-Speed 5975.76 samples/sec   Loss 2.1391   LearningRate 0.0048   Epoch: 18   Global Step: 186880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:55:00,431-Speed 5970.99 samples/sec   Loss 2.1108   LearningRate 0.0048   Epoch: 18   Global Step: 186890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:55:07,292-Speed 5971.35 samples/sec   Loss 2.0666   LearningRate 0.0048   Epoch: 18   Global Step: 186900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:55:14,235-Speed 5900.82 samples/sec   Loss 2.1305   LearningRate 0.0048   Epoch: 18   Global Step: 186910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:55:21,076-Speed 5988.45 samples/sec   Loss 2.1097   LearningRate 0.0048   Epoch: 18   Global Step: 186920   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:55:27,961-Speed 5950.72 samples/sec   Loss 2.0906   LearningRate 0.0048   Epoch: 18   Global Step: 186930   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:55:34,807-Speed 5983.51 samples/sec   Loss 2.0852   LearningRate 0.0048   Epoch: 18   Global Step: 186940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:55:41,674-Speed 5965.98 samples/sec   Loss 2.1019   LearningRate 0.0048   Epoch: 18   Global Step: 186950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:55:48,643-Speed 5878.75 samples/sec   Loss 2.1116   LearningRate 0.0048   Epoch: 18   Global Step: 186960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:55:55,510-Speed 5966.42 samples/sec   Loss 2.1337   LearningRate 0.0048   Epoch: 18   Global Step: 186970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:56:02,358-Speed 5982.63 samples/sec   Loss 2.1347   LearningRate 0.0048   Epoch: 18   Global Step: 186980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:56:09,231-Speed 5961.72 samples/sec   Loss 2.1267   LearningRate 0.0048   Epoch: 18   Global Step: 186990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:56:16,080-Speed 5981.72 samples/sec   Loss 2.1102   LearningRate 0.0048   Epoch: 18   Global Step: 187000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:56:22,925-Speed 5985.05 samples/sec   Loss 2.1083   LearningRate 0.0048   Epoch: 18   Global Step: 187010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:56:29,772-Speed 5983.14 samples/sec   Loss 2.1345   LearningRate 0.0048   Epoch: 18   Global Step: 187020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:56:36,628-Speed 5976.98 samples/sec   Loss 2.1445   LearningRate 0.0048   Epoch: 18   Global Step: 187030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:56:43,500-Speed 5960.57 samples/sec   Loss 2.1106   LearningRate 0.0048   Epoch: 18   Global Step: 187040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:56:50,374-Speed 5960.16 samples/sec   Loss 2.1102   LearningRate 0.0047   Epoch: 18   Global Step: 187050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:56:57,226-Speed 5980.32 samples/sec   Loss 2.1019   LearningRate 0.0047   Epoch: 18   Global Step: 187060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:57:04,084-Speed 5973.18 samples/sec   Loss 2.1322   LearningRate 0.0047   Epoch: 18   Global Step: 187070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:57:10,947-Speed 5969.93 samples/sec   Loss 2.1011   LearningRate 0.0047   Epoch: 18   Global Step: 187080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:57:17,817-Speed 5963.87 samples/sec   Loss 2.0967   LearningRate 0.0047   Epoch: 18   Global Step: 187090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:57:24,701-Speed 5950.45 samples/sec   Loss 2.1250   LearningRate 0.0047   Epoch: 18   Global Step: 187100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:57:31,569-Speed 5965.70 samples/sec   Loss 2.1281   LearningRate 0.0047   Epoch: 18   Global Step: 187110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:57:38,453-Speed 5951.71 samples/sec   Loss 2.1270   LearningRate 0.0047   Epoch: 18   Global Step: 187120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:57:45,303-Speed 5980.00 samples/sec   Loss 2.1158   LearningRate 0.0047   Epoch: 18   Global Step: 187130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:57:52,188-Speed 5950.74 samples/sec   Loss 2.0958   LearningRate 0.0047   Epoch: 18   Global Step: 187140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:57:59,041-Speed 5979.25 samples/sec   Loss 2.1182   LearningRate 0.0047   Epoch: 18   Global Step: 187150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:58:05,888-Speed 5982.68 samples/sec   Loss 2.1100   LearningRate 0.0047   Epoch: 18   Global Step: 187160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:58:12,745-Speed 5974.80 samples/sec   Loss 2.1465   LearningRate 0.0047   Epoch: 18   Global Step: 187170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:58:19,598-Speed 5977.95 samples/sec   Loss 2.0962   LearningRate 0.0047   Epoch: 18   Global Step: 187180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:58:26,440-Speed 5987.53 samples/sec   Loss 2.0693   LearningRate 0.0047   Epoch: 18   Global Step: 187190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:58:33,325-Speed 5950.99 samples/sec   Loss 2.1033   LearningRate 0.0047   Epoch: 18   Global Step: 187200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:58:40,206-Speed 5953.83 samples/sec   Loss 2.1249   LearningRate 0.0047   Epoch: 18   Global Step: 187210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:58:47,068-Speed 5969.77 samples/sec   Loss 2.1010   LearningRate 0.0047   Epoch: 18   Global Step: 187220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:58:53,956-Speed 5948.02 samples/sec   Loss 2.0954   LearningRate 0.0047   Epoch: 18   Global Step: 187230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:59:00,836-Speed 5954.27 samples/sec   Loss 2.0920   LearningRate 0.0047   Epoch: 18   Global Step: 187240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:59:07,700-Speed 5968.54 samples/sec   Loss 2.0805   LearningRate 0.0047   Epoch: 18   Global Step: 187250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 08:59:14,562-Speed 5970.28 samples/sec   Loss 2.0723   LearningRate 0.0046   Epoch: 18   Global Step: 187260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:59:21,410-Speed 5982.76 samples/sec   Loss 2.1209   LearningRate 0.0046   Epoch: 18   Global Step: 187270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:59:28,258-Speed 5982.31 samples/sec   Loss 2.0879   LearningRate 0.0046   Epoch: 18   Global Step: 187280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:59:35,105-Speed 5982.51 samples/sec   Loss 2.1088   LearningRate 0.0046   Epoch: 18   Global Step: 187290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:59:41,970-Speed 5967.87 samples/sec   Loss 2.0798   LearningRate 0.0046   Epoch: 18   Global Step: 187300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:59:48,828-Speed 5974.03 samples/sec   Loss 2.0930   LearningRate 0.0046   Epoch: 18   Global Step: 187310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 08:59:55,662-Speed 5994.89 samples/sec   Loss 2.0962   LearningRate 0.0046   Epoch: 18   Global Step: 187320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:00:02,520-Speed 5973.82 samples/sec   Loss 2.0859   LearningRate 0.0046   Epoch: 18   Global Step: 187330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:00:09,365-Speed 5984.70 samples/sec   Loss 2.0672   LearningRate 0.0046   Epoch: 18   Global Step: 187340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:00:16,216-Speed 5979.71 samples/sec   Loss 2.1086   LearningRate 0.0046   Epoch: 18   Global Step: 187350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:00:23,074-Speed 5974.62 samples/sec   Loss 2.0853   LearningRate 0.0046   Epoch: 18   Global Step: 187360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:00:29,930-Speed 5975.00 samples/sec   Loss 2.1237   LearningRate 0.0046   Epoch: 18   Global Step: 187370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:00:36,798-Speed 5965.04 samples/sec   Loss 2.0891   LearningRate 0.0046   Epoch: 18   Global Step: 187380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:00:43,683-Speed 5950.59 samples/sec   Loss 2.0886   LearningRate 0.0046   Epoch: 18   Global Step: 187390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:00:50,530-Speed 5983.06 samples/sec   Loss 2.0870   LearningRate 0.0046   Epoch: 18   Global Step: 187400   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 09:00:57,382-Speed 5978.99 samples/sec   Loss 2.0746   LearningRate 0.0046   Epoch: 18   Global Step: 187410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 09:01:04,256-Speed 5960.62 samples/sec   Loss 2.0999   LearningRate 0.0046   Epoch: 18   Global Step: 187420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 09:01:11,138-Speed 5952.50 samples/sec   Loss 2.0847   LearningRate 0.0046   Epoch: 18   Global Step: 187430   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 09:01:18,014-Speed 5958.53 samples/sec   Loss 2.0789   LearningRate 0.0046   Epoch: 18   Global Step: 187440   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 09:01:24,877-Speed 5969.57 samples/sec   Loss 2.0705   LearningRate 0.0046   Epoch: 18   Global Step: 187450   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 09:01:31,736-Speed 5973.16 samples/sec   Loss 2.0757   LearningRate 0.0046   Epoch: 18   Global Step: 187460   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 09:01:38,584-Speed 5982.74 samples/sec   Loss 2.0898   LearningRate 0.0046   Epoch: 18   Global Step: 187470   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 09:01:45,452-Speed 5965.31 samples/sec   Loss 2.1020   LearningRate 0.0045   Epoch: 18   Global Step: 187480   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 09:01:52,340-Speed 5947.69 samples/sec   Loss 2.0809   LearningRate 0.0045   Epoch: 18   Global Step: 187490   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-01-09 09:01:59,208-Speed 5965.05 samples/sec   Loss 2.0843   LearningRate 0.0045   Epoch: 18   Global Step: 187500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:02:06,080-Speed 5962.06 samples/sec   Loss 2.1155   LearningRate 0.0045   Epoch: 18   Global Step: 187510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:02:12,961-Speed 5953.68 samples/sec   Loss 2.0874   LearningRate 0.0045   Epoch: 18   Global Step: 187520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:02:19,817-Speed 5976.02 samples/sec   Loss 2.1177   LearningRate 0.0045   Epoch: 18   Global Step: 187530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:02:26,678-Speed 5972.36 samples/sec   Loss 2.0964   LearningRate 0.0045   Epoch: 18   Global Step: 187540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:02:33,532-Speed 5977.08 samples/sec   Loss 2.0553   LearningRate 0.0045   Epoch: 18   Global Step: 187550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:02:40,387-Speed 5975.82 samples/sec   Loss 2.0740   LearningRate 0.0045   Epoch: 18   Global Step: 187560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:02:47,255-Speed 5966.12 samples/sec   Loss 2.0848   LearningRate 0.0045   Epoch: 18   Global Step: 187570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:02:54,111-Speed 5974.66 samples/sec   Loss 2.1059   LearningRate 0.0045   Epoch: 18   Global Step: 187580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:03:00,974-Speed 5972.70 samples/sec   Loss 2.0734   LearningRate 0.0045   Epoch: 18   Global Step: 187590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:03:07,821-Speed 5983.49 samples/sec   Loss 2.0857   LearningRate 0.0045   Epoch: 18   Global Step: 187600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:03:14,668-Speed 5982.73 samples/sec   Loss 2.0624   LearningRate 0.0045   Epoch: 18   Global Step: 187610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:03:21,528-Speed 5971.85 samples/sec   Loss 2.0767   LearningRate 0.0045   Epoch: 18   Global Step: 187620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:03:28,368-Speed 5989.72 samples/sec   Loss 2.0652   LearningRate 0.0045   Epoch: 18   Global Step: 187630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:03:35,231-Speed 5968.36 samples/sec   Loss 2.1186   LearningRate 0.0045   Epoch: 18   Global Step: 187640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:03:42,106-Speed 5963.22 samples/sec   Loss 2.0600   LearningRate 0.0045   Epoch: 18   Global Step: 187650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:03:48,968-Speed 5970.34 samples/sec   Loss 2.0595   LearningRate 0.0045   Epoch: 18   Global Step: 187660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:03:55,837-Speed 5964.15 samples/sec   Loss 2.0903   LearningRate 0.0045   Epoch: 18   Global Step: 187670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:04:02,689-Speed 5979.31 samples/sec   Loss 2.0744   LearningRate 0.0045   Epoch: 18   Global Step: 187680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:04:09,558-Speed 5967.07 samples/sec   Loss 2.0497   LearningRate 0.0045   Epoch: 18   Global Step: 187690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:04:16,403-Speed 5985.38 samples/sec   Loss 2.0766   LearningRate 0.0044   Epoch: 18   Global Step: 187700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:04:23,263-Speed 5971.93 samples/sec   Loss 2.1049   LearningRate 0.0044   Epoch: 18   Global Step: 187710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:04:30,131-Speed 5966.25 samples/sec   Loss 2.0892   LearningRate 0.0044   Epoch: 18   Global Step: 187720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:04:36,993-Speed 5970.20 samples/sec   Loss 2.0658   LearningRate 0.0044   Epoch: 18   Global Step: 187730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:04:43,835-Speed 5988.23 samples/sec   Loss 2.0753   LearningRate 0.0044   Epoch: 18   Global Step: 187740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:04:50,711-Speed 5957.60 samples/sec   Loss 2.0684   LearningRate 0.0044   Epoch: 18   Global Step: 187750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:04:57,584-Speed 5960.64 samples/sec   Loss 2.0550   LearningRate 0.0044   Epoch: 18   Global Step: 187760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:05:04,464-Speed 5956.32 samples/sec   Loss 2.0631   LearningRate 0.0044   Epoch: 18   Global Step: 187770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:05:11,316-Speed 5979.03 samples/sec   Loss 2.0843   LearningRate 0.0044   Epoch: 18   Global Step: 187780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:05:18,171-Speed 5975.63 samples/sec   Loss 2.0900   LearningRate 0.0044   Epoch: 18   Global Step: 187790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:05:25,045-Speed 5961.27 samples/sec   Loss 2.0718   LearningRate 0.0044   Epoch: 18   Global Step: 187800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:05:31,902-Speed 5974.67 samples/sec   Loss 2.0901   LearningRate 0.0044   Epoch: 18   Global Step: 187810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:05:38,767-Speed 5966.73 samples/sec   Loss 2.0718   LearningRate 0.0044   Epoch: 18   Global Step: 187820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:05:45,638-Speed 5965.43 samples/sec   Loss 2.0402   LearningRate 0.0044   Epoch: 18   Global Step: 187830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:05:52,536-Speed 5939.60 samples/sec   Loss 2.0342   LearningRate 0.0044   Epoch: 18   Global Step: 187840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:05:59,455-Speed 5920.43 samples/sec   Loss 2.0836   LearningRate 0.0044   Epoch: 18   Global Step: 187850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:06:06,311-Speed 5976.15 samples/sec   Loss 2.0648   LearningRate 0.0044   Epoch: 18   Global Step: 187860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:06:13,201-Speed 5948.33 samples/sec   Loss 2.0679   LearningRate 0.0044   Epoch: 18   Global Step: 187870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:06:20,053-Speed 5978.86 samples/sec   Loss 2.0821   LearningRate 0.0044   Epoch: 18   Global Step: 187880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:06:26,910-Speed 5975.30 samples/sec   Loss 2.0320   LearningRate 0.0044   Epoch: 18   Global Step: 187890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:06:33,764-Speed 5978.12 samples/sec   Loss 2.0495   LearningRate 0.0044   Epoch: 18   Global Step: 187900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:06:40,639-Speed 5959.06 samples/sec   Loss 2.0492   LearningRate 0.0044   Epoch: 18   Global Step: 187910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:06:47,528-Speed 5946.39 samples/sec   Loss 2.0594   LearningRate 0.0043   Epoch: 18   Global Step: 187920   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:06:54,383-Speed 5976.54 samples/sec   Loss 2.0624   LearningRate 0.0043   Epoch: 18   Global Step: 187930   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:07:01,234-Speed 5979.43 samples/sec   Loss 2.0713   LearningRate 0.0043   Epoch: 18   Global Step: 187940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:07:08,098-Speed 5968.57 samples/sec   Loss 2.0995   LearningRate 0.0043   Epoch: 18   Global Step: 187950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:07:14,958-Speed 5972.23 samples/sec   Loss 2.0675   LearningRate 0.0043   Epoch: 18   Global Step: 187960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:07:21,819-Speed 5970.83 samples/sec   Loss 2.0490   LearningRate 0.0043   Epoch: 18   Global Step: 187970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:07:28,683-Speed 5968.58 samples/sec   Loss 2.0546   LearningRate 0.0043   Epoch: 18   Global Step: 187980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:07:35,540-Speed 5975.12 samples/sec   Loss 2.0482   LearningRate 0.0043   Epoch: 18   Global Step: 187990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:07:42,387-Speed 5982.84 samples/sec   Loss 2.0208   LearningRate 0.0043   Epoch: 18   Global Step: 188000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:07:49,239-Speed 5979.53 samples/sec   Loss 2.0618   LearningRate 0.0043   Epoch: 18   Global Step: 188010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:07:56,098-Speed 5972.24 samples/sec   Loss 2.0335   LearningRate 0.0043   Epoch: 18   Global Step: 188020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:08:02,952-Speed 5977.37 samples/sec   Loss 2.0621   LearningRate 0.0043   Epoch: 18   Global Step: 188030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:08:09,817-Speed 5967.99 samples/sec   Loss 2.0718   LearningRate 0.0043   Epoch: 18   Global Step: 188040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:08:16,673-Speed 5976.15 samples/sec   Loss 2.0624   LearningRate 0.0043   Epoch: 18   Global Step: 188050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:08:23,523-Speed 5980.91 samples/sec   Loss 2.0262   LearningRate 0.0043   Epoch: 18   Global Step: 188060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:08:30,403-Speed 5954.71 samples/sec   Loss 2.0763   LearningRate 0.0043   Epoch: 18   Global Step: 188070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:08:37,269-Speed 5967.02 samples/sec   Loss 2.0498   LearningRate 0.0043   Epoch: 18   Global Step: 188080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:08:44,118-Speed 5981.03 samples/sec   Loss 2.0422   LearningRate 0.0043   Epoch: 18   Global Step: 188090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:08:50,976-Speed 5975.64 samples/sec   Loss 2.0638   LearningRate 0.0043   Epoch: 18   Global Step: 188100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:08:57,827-Speed 5979.49 samples/sec   Loss 2.0379   LearningRate 0.0043   Epoch: 18   Global Step: 188110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:09:04,683-Speed 5975.30 samples/sec   Loss 2.0390   LearningRate 0.0043   Epoch: 18   Global Step: 188120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:09:11,558-Speed 5958.61 samples/sec   Loss 2.0762   LearningRate 0.0043   Epoch: 18   Global Step: 188130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:09:18,419-Speed 5973.38 samples/sec   Loss 2.0366   LearningRate 0.0043   Epoch: 18   Global Step: 188140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:09:25,270-Speed 5979.44 samples/sec   Loss 2.0837   LearningRate 0.0042   Epoch: 18   Global Step: 188150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:09:32,129-Speed 5972.94 samples/sec   Loss 2.0818   LearningRate 0.0042   Epoch: 18   Global Step: 188160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:09:38,983-Speed 5976.66 samples/sec   Loss 2.0696   LearningRate 0.0042   Epoch: 18   Global Step: 188170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:09:45,825-Speed 5987.59 samples/sec   Loss 2.0388   LearningRate 0.0042   Epoch: 18   Global Step: 188180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:09:52,716-Speed 5945.67 samples/sec   Loss 2.0637   LearningRate 0.0042   Epoch: 18   Global Step: 188190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:09:59,588-Speed 5961.03 samples/sec   Loss 2.1025   LearningRate 0.0042   Epoch: 18   Global Step: 188200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:10:06,436-Speed 5982.19 samples/sec   Loss 2.0354   LearningRate 0.0042   Epoch: 18   Global Step: 188210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:10:13,299-Speed 5971.82 samples/sec   Loss 2.0687   LearningRate 0.0042   Epoch: 18   Global Step: 188220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:10:20,157-Speed 5974.09 samples/sec   Loss 2.0471   LearningRate 0.0042   Epoch: 18   Global Step: 188230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:10:27,022-Speed 5966.88 samples/sec   Loss 2.0128   LearningRate 0.0042   Epoch: 18   Global Step: 188240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:10:33,884-Speed 5970.93 samples/sec   Loss 2.0808   LearningRate 0.0042   Epoch: 18   Global Step: 188250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:10:40,738-Speed 5977.67 samples/sec   Loss 2.0143   LearningRate 0.0042   Epoch: 18   Global Step: 188260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:10:47,633-Speed 5943.54 samples/sec   Loss 2.0182   LearningRate 0.0042   Epoch: 18   Global Step: 188270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:10:54,501-Speed 5965.55 samples/sec   Loss 2.0656   LearningRate 0.0042   Epoch: 18   Global Step: 188280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:11:01,368-Speed 5968.04 samples/sec   Loss 2.0253   LearningRate 0.0042   Epoch: 18   Global Step: 188290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:11:08,232-Speed 5968.95 samples/sec   Loss 2.0583   LearningRate 0.0042   Epoch: 18   Global Step: 188300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:11:15,086-Speed 5977.43 samples/sec   Loss 2.0224   LearningRate 0.0042   Epoch: 18   Global Step: 188310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:11:21,943-Speed 5974.85 samples/sec   Loss 2.0474   LearningRate 0.0042   Epoch: 18   Global Step: 188320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:11:28,835-Speed 5944.12 samples/sec   Loss 2.0567   LearningRate 0.0042   Epoch: 18   Global Step: 188330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:11:35,691-Speed 5975.58 samples/sec   Loss 2.0394   LearningRate 0.0042   Epoch: 18   Global Step: 188340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:11:42,544-Speed 5977.96 samples/sec   Loss 2.0329   LearningRate 0.0042   Epoch: 18   Global Step: 188350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:11:49,395-Speed 5979.10 samples/sec   Loss 2.0263   LearningRate 0.0042   Epoch: 18   Global Step: 188360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:11:56,250-Speed 5977.09 samples/sec   Loss 2.0418   LearningRate 0.0041   Epoch: 18   Global Step: 188370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:12:03,132-Speed 5952.98 samples/sec   Loss 2.0296   LearningRate 0.0041   Epoch: 18   Global Step: 188380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:12:10,017-Speed 5950.18 samples/sec   Loss 2.0362   LearningRate 0.0041   Epoch: 18   Global Step: 188390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:12:16,860-Speed 5986.28 samples/sec   Loss 2.0307   LearningRate 0.0041   Epoch: 18   Global Step: 188400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:12:23,702-Speed 5988.08 samples/sec   Loss 2.0520   LearningRate 0.0041   Epoch: 18   Global Step: 188410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:12:30,564-Speed 5970.40 samples/sec   Loss 2.0571   LearningRate 0.0041   Epoch: 18   Global Step: 188420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:12:37,412-Speed 5982.44 samples/sec   Loss 2.0312   LearningRate 0.0041   Epoch: 18   Global Step: 188430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:12:44,260-Speed 5982.04 samples/sec   Loss 2.0469   LearningRate 0.0041   Epoch: 18   Global Step: 188440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:12:51,098-Speed 5990.81 samples/sec   Loss 2.0822   LearningRate 0.0041   Epoch: 18   Global Step: 188450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:12:57,964-Speed 5967.52 samples/sec   Loss 2.0137   LearningRate 0.0041   Epoch: 18   Global Step: 188460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:13:04,816-Speed 5978.96 samples/sec   Loss 2.0224   LearningRate 0.0041   Epoch: 18   Global Step: 188470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:13:11,689-Speed 5960.27 samples/sec   Loss 2.0337   LearningRate 0.0041   Epoch: 18   Global Step: 188480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:13:18,543-Speed 5977.36 samples/sec   Loss 2.0497   LearningRate 0.0041   Epoch: 18   Global Step: 188490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:13:25,390-Speed 5983.52 samples/sec   Loss 2.0327   LearningRate 0.0041   Epoch: 18   Global Step: 188500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:13:32,247-Speed 5975.34 samples/sec   Loss 2.0285   LearningRate 0.0041   Epoch: 18   Global Step: 188510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:13:39,107-Speed 5971.20 samples/sec   Loss 1.9796   LearningRate 0.0041   Epoch: 18   Global Step: 188520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:13:45,959-Speed 5979.36 samples/sec   Loss 2.0538   LearningRate 0.0041   Epoch: 18   Global Step: 188530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:13:52,797-Speed 5990.73 samples/sec   Loss 1.9738   LearningRate 0.0041   Epoch: 18   Global Step: 188540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:13:59,639-Speed 5987.61 samples/sec   Loss 2.0135   LearningRate 0.0041   Epoch: 18   Global Step: 188550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:14:06,494-Speed 5976.32 samples/sec   Loss 2.0449   LearningRate 0.0041   Epoch: 18   Global Step: 188560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:14:13,359-Speed 5968.05 samples/sec   Loss 2.0339   LearningRate 0.0041   Epoch: 18   Global Step: 188570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:14:20,211-Speed 5979.22 samples/sec   Loss 2.0419   LearningRate 0.0041   Epoch: 18   Global Step: 188580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:14:27,083-Speed 5961.40 samples/sec   Loss 2.0381   LearningRate 0.0041   Epoch: 18   Global Step: 188590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:14:34,066-Speed 5867.09 samples/sec   Loss 2.0395   LearningRate 0.0040   Epoch: 18   Global Step: 188600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:14:41,061-Speed 5856.82 samples/sec   Loss 2.0060   LearningRate 0.0040   Epoch: 18   Global Step: 188610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:14:47,927-Speed 5969.08 samples/sec   Loss 2.0062   LearningRate 0.0040   Epoch: 18   Global Step: 188620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:14:54,778-Speed 5979.68 samples/sec   Loss 2.0433   LearningRate 0.0040   Epoch: 18   Global Step: 188630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:15:01,635-Speed 5974.82 samples/sec   Loss 2.0080   LearningRate 0.0040   Epoch: 18   Global Step: 188640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:15:08,482-Speed 5983.58 samples/sec   Loss 2.0128   LearningRate 0.0040   Epoch: 18   Global Step: 188650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:15:15,326-Speed 5986.02 samples/sec   Loss 2.0228   LearningRate 0.0040   Epoch: 18   Global Step: 188660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:15:22,168-Speed 5987.15 samples/sec   Loss 2.0175   LearningRate 0.0040   Epoch: 18   Global Step: 188670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:15:29,021-Speed 5978.91 samples/sec   Loss 2.0495   LearningRate 0.0040   Epoch: 18   Global Step: 188680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:15:35,884-Speed 5972.62 samples/sec   Loss 2.0650   LearningRate 0.0040   Epoch: 18   Global Step: 188690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:15:42,720-Speed 5992.09 samples/sec   Loss 2.0059   LearningRate 0.0040   Epoch: 18   Global Step: 188700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:15:49,571-Speed 5980.08 samples/sec   Loss 2.0371   LearningRate 0.0040   Epoch: 18   Global Step: 188710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:15:56,437-Speed 5967.24 samples/sec   Loss 2.0200   LearningRate 0.0040   Epoch: 18   Global Step: 188720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:16:03,301-Speed 5968.36 samples/sec   Loss 2.0332   LearningRate 0.0040   Epoch: 18   Global Step: 188730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:16:10,156-Speed 5976.93 samples/sec   Loss 2.0357   LearningRate 0.0040   Epoch: 18   Global Step: 188740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:16:17,050-Speed 5941.78 samples/sec   Loss 1.9803   LearningRate 0.0040   Epoch: 18   Global Step: 188750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:16:23,899-Speed 5982.18 samples/sec   Loss 2.0652   LearningRate 0.0040   Epoch: 18   Global Step: 188760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:16:30,882-Speed 5867.64 samples/sec   Loss 1.9857   LearningRate 0.0040   Epoch: 18   Global Step: 188770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:16:37,741-Speed 5973.02 samples/sec   Loss 1.9977   LearningRate 0.0040   Epoch: 18   Global Step: 188780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:16:44,601-Speed 5972.32 samples/sec   Loss 1.9984   LearningRate 0.0040   Epoch: 18   Global Step: 188790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:16:51,444-Speed 5986.34 samples/sec   Loss 2.0131   LearningRate 0.0040   Epoch: 18   Global Step: 188800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:16:58,322-Speed 5956.51 samples/sec   Loss 2.0410   LearningRate 0.0040   Epoch: 18   Global Step: 188810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:17:05,185-Speed 5969.90 samples/sec   Loss 2.0081   LearningRate 0.0040   Epoch: 18   Global Step: 188820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:17:12,077-Speed 5947.80 samples/sec   Loss 2.0199   LearningRate 0.0040   Epoch: 18   Global Step: 188830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:17:18,948-Speed 5962.80 samples/sec   Loss 2.0154   LearningRate 0.0039   Epoch: 18   Global Step: 188840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:17:25,796-Speed 5982.12 samples/sec   Loss 2.0322   LearningRate 0.0039   Epoch: 18   Global Step: 188850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:17:32,643-Speed 5984.10 samples/sec   Loss 2.0438   LearningRate 0.0039   Epoch: 18   Global Step: 188860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:17:39,612-Speed 5878.45 samples/sec   Loss 1.9911   LearningRate 0.0039   Epoch: 18   Global Step: 188870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:17:46,456-Speed 5986.01 samples/sec   Loss 1.9969   LearningRate 0.0039   Epoch: 18   Global Step: 188880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:17:53,307-Speed 5980.58 samples/sec   Loss 2.0100   LearningRate 0.0039   Epoch: 18   Global Step: 188890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:18:00,159-Speed 5978.90 samples/sec   Loss 2.0339   LearningRate 0.0039   Epoch: 18   Global Step: 188900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:18:07,037-Speed 5956.28 samples/sec   Loss 2.0016   LearningRate 0.0039   Epoch: 18   Global Step: 188910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:18:13,905-Speed 5965.79 samples/sec   Loss 2.0101   LearningRate 0.0039   Epoch: 18   Global Step: 188920   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:18:20,780-Speed 5958.12 samples/sec   Loss 1.9567   LearningRate 0.0039   Epoch: 18   Global Step: 188930   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:18:27,617-Speed 5992.31 samples/sec   Loss 2.0113   LearningRate 0.0039   Epoch: 18   Global Step: 188940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:18:34,462-Speed 5984.95 samples/sec   Loss 1.9730   LearningRate 0.0039   Epoch: 18   Global Step: 188950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:18:41,302-Speed 5989.40 samples/sec   Loss 2.0109   LearningRate 0.0039   Epoch: 18   Global Step: 188960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:18:48,158-Speed 5975.39 samples/sec   Loss 2.0014   LearningRate 0.0039   Epoch: 18   Global Step: 188970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:18:55,003-Speed 5984.88 samples/sec   Loss 2.0395   LearningRate 0.0039   Epoch: 18   Global Step: 188980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:19:01,845-Speed 5987.88 samples/sec   Loss 1.9899   LearningRate 0.0039   Epoch: 18   Global Step: 188990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:19:08,703-Speed 5973.54 samples/sec   Loss 1.9969   LearningRate 0.0039   Epoch: 18   Global Step: 189000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:19:15,564-Speed 5971.95 samples/sec   Loss 2.0235   LearningRate 0.0039   Epoch: 18   Global Step: 189010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:19:22,434-Speed 5963.10 samples/sec   Loss 1.9943   LearningRate 0.0039   Epoch: 18   Global Step: 189020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:19:29,296-Speed 5970.67 samples/sec   Loss 1.9749   LearningRate 0.0039   Epoch: 18   Global Step: 189030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:19:36,254-Speed 5888.23 samples/sec   Loss 2.0145   LearningRate 0.0039   Epoch: 18   Global Step: 189040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:19:43,207-Speed 5891.64 samples/sec   Loss 2.0147   LearningRate 0.0039   Epoch: 18   Global Step: 189050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:19:50,060-Speed 5978.98 samples/sec   Loss 2.0011   LearningRate 0.0039   Epoch: 18   Global Step: 189060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:19:57,015-Speed 5890.26 samples/sec   Loss 2.0055   LearningRate 0.0038   Epoch: 18   Global Step: 189070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:20:03,860-Speed 5985.30 samples/sec   Loss 1.9613   LearningRate 0.0038   Epoch: 18   Global Step: 189080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:20:10,779-Speed 5921.97 samples/sec   Loss 2.0402   LearningRate 0.0038   Epoch: 18   Global Step: 189090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:20:17,687-Speed 5930.10 samples/sec   Loss 2.0130   LearningRate 0.0038   Epoch: 18   Global Step: 189100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:20:24,550-Speed 5969.27 samples/sec   Loss 1.9977   LearningRate 0.0038   Epoch: 18   Global Step: 189110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:20:31,425-Speed 5959.08 samples/sec   Loss 1.9694   LearningRate 0.0038   Epoch: 18   Global Step: 189120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:20:38,333-Speed 5930.68 samples/sec   Loss 1.9787   LearningRate 0.0038   Epoch: 18   Global Step: 189130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:20:45,182-Speed 5981.08 samples/sec   Loss 1.9834   LearningRate 0.0038   Epoch: 18   Global Step: 189140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:20:52,032-Speed 5981.06 samples/sec   Loss 1.9982   LearningRate 0.0038   Epoch: 18   Global Step: 189150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:20:58,892-Speed 5972.06 samples/sec   Loss 1.9710   LearningRate 0.0038   Epoch: 18   Global Step: 189160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:21:05,736-Speed 5985.24 samples/sec   Loss 2.0373   LearningRate 0.0038   Epoch: 18   Global Step: 189170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:21:12,580-Speed 5986.36 samples/sec   Loss 2.0077   LearningRate 0.0038   Epoch: 18   Global Step: 189180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:21:19,450-Speed 5963.72 samples/sec   Loss 1.9863   LearningRate 0.0038   Epoch: 18   Global Step: 189190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:21:26,320-Speed 5963.27 samples/sec   Loss 1.9699   LearningRate 0.0038   Epoch: 18   Global Step: 189200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:21:33,186-Speed 5967.13 samples/sec   Loss 1.9926   LearningRate 0.0038   Epoch: 18   Global Step: 189210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:21:40,055-Speed 5964.42 samples/sec   Loss 1.9975   LearningRate 0.0038   Epoch: 18   Global Step: 189220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:21:46,926-Speed 5962.11 samples/sec   Loss 1.9817   LearningRate 0.0038   Epoch: 18   Global Step: 189230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:21:53,766-Speed 5989.45 samples/sec   Loss 2.0023   LearningRate 0.0038   Epoch: 18   Global Step: 189240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:22:00,623-Speed 5974.67 samples/sec   Loss 2.0084   LearningRate 0.0038   Epoch: 18   Global Step: 189250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:22:07,492-Speed 5964.04 samples/sec   Loss 2.0110   LearningRate 0.0038   Epoch: 18   Global Step: 189260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:22:14,344-Speed 5979.64 samples/sec   Loss 1.9733   LearningRate 0.0038   Epoch: 18   Global Step: 189270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:22:21,280-Speed 5906.91 samples/sec   Loss 2.0040   LearningRate 0.0038   Epoch: 18   Global Step: 189280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:22:28,210-Speed 5911.53 samples/sec   Loss 2.0092   LearningRate 0.0038   Epoch: 18   Global Step: 189290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:22:35,101-Speed 5944.91 samples/sec   Loss 2.0304   LearningRate 0.0038   Epoch: 18   Global Step: 189300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:22:41,990-Speed 5947.66 samples/sec   Loss 2.0128   LearningRate 0.0037   Epoch: 18   Global Step: 189310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:22:48,848-Speed 5973.71 samples/sec   Loss 1.9832   LearningRate 0.0037   Epoch: 18   Global Step: 189320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:22:55,715-Speed 5965.87 samples/sec   Loss 1.9713   LearningRate 0.0037   Epoch: 18   Global Step: 189330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:23:02,586-Speed 5963.39 samples/sec   Loss 1.9664   LearningRate 0.0037   Epoch: 18   Global Step: 189340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:23:09,447-Speed 5970.40 samples/sec   Loss 2.0095   LearningRate 0.0037   Epoch: 18   Global Step: 189350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:23:16,324-Speed 5957.45 samples/sec   Loss 1.9990   LearningRate 0.0037   Epoch: 18   Global Step: 189360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:23:23,203-Speed 5955.76 samples/sec   Loss 2.0145   LearningRate 0.0037   Epoch: 18   Global Step: 189370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:23:30,071-Speed 5965.37 samples/sec   Loss 1.9672   LearningRate 0.0037   Epoch: 18   Global Step: 189380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:23:36,933-Speed 5970.70 samples/sec   Loss 1.9688   LearningRate 0.0037   Epoch: 18   Global Step: 189390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:23:43,851-Speed 5921.56 samples/sec   Loss 2.0016   LearningRate 0.0037   Epoch: 18   Global Step: 189400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:23:50,715-Speed 5968.74 samples/sec   Loss 1.9655   LearningRate 0.0037   Epoch: 18   Global Step: 189410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:23:57,563-Speed 5982.82 samples/sec   Loss 1.9566   LearningRate 0.0037   Epoch: 18   Global Step: 189420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:24:04,442-Speed 5956.50 samples/sec   Loss 2.0069   LearningRate 0.0037   Epoch: 18   Global Step: 189430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:24:11,281-Speed 5989.63 samples/sec   Loss 1.9968   LearningRate 0.0037   Epoch: 18   Global Step: 189440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:24:18,131-Speed 5981.08 samples/sec   Loss 1.9652   LearningRate 0.0037   Epoch: 18   Global Step: 189450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:24:24,985-Speed 5976.78 samples/sec   Loss 1.9739   LearningRate 0.0037   Epoch: 18   Global Step: 189460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:24:31,842-Speed 5974.53 samples/sec   Loss 1.9902   LearningRate 0.0037   Epoch: 18   Global Step: 189470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:24:38,699-Speed 5976.41 samples/sec   Loss 1.9794   LearningRate 0.0037   Epoch: 18   Global Step: 189480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-01-09 09:24:45,550-Speed 5980.14 samples/sec   Loss 1.9993   LearningRate 0.0037   Epoch: 18   Global Step: 189490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:24:52,437-Speed 5947.81 samples/sec   Loss 1.9944   LearningRate 0.0037   Epoch: 18   Global Step: 189500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:24:59,329-Speed 5944.72 samples/sec   Loss 2.0138   LearningRate 0.0037   Epoch: 18   Global Step: 189510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:25:06,184-Speed 5976.99 samples/sec   Loss 1.9568   LearningRate 0.0037   Epoch: 18   Global Step: 189520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-01-09 09:25:13,068-Speed 5951.19 samples/sec   Loss 1.9737   LearningRate 0.0037   Epoch: 18   Global Step: 189530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:25:19,978-Speed 5928.24 samples/sec   Loss 1.9177   LearningRate 0.0037   Epoch: 18   Global Step: 189540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:25:26,825-Speed 5984.67 samples/sec   Loss 1.9455   LearningRate 0.0037   Epoch: 18   Global Step: 189550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:25:33,685-Speed 5971.87 samples/sec   Loss 1.9859   LearningRate 0.0036   Epoch: 18   Global Step: 189560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:25:40,554-Speed 5964.08 samples/sec   Loss 1.9836   LearningRate 0.0036   Epoch: 18   Global Step: 189570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:25:47,426-Speed 5961.97 samples/sec   Loss 1.9753   LearningRate 0.0036   Epoch: 18   Global Step: 189580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:25:54,300-Speed 5959.99 samples/sec   Loss 1.9811   LearningRate 0.0036   Epoch: 18   Global Step: 189590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:26:01,154-Speed 5979.19 samples/sec   Loss 1.9954   LearningRate 0.0036   Epoch: 18   Global Step: 189600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:26:08,116-Speed 5883.96 samples/sec   Loss 2.0175   LearningRate 0.0036   Epoch: 18   Global Step: 189610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:26:14,969-Speed 5978.36 samples/sec   Loss 1.9725   LearningRate 0.0036   Epoch: 18   Global Step: 189620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:26:21,849-Speed 5956.70 samples/sec   Loss 1.9873   LearningRate 0.0036   Epoch: 18   Global Step: 189630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:26:28,714-Speed 5967.60 samples/sec   Loss 1.9969   LearningRate 0.0036   Epoch: 18   Global Step: 189640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:26:35,560-Speed 5984.07 samples/sec   Loss 1.9701   LearningRate 0.0036   Epoch: 18   Global Step: 189650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:26:42,412-Speed 5978.99 samples/sec   Loss 1.9706   LearningRate 0.0036   Epoch: 18   Global Step: 189660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:26:49,253-Speed 5988.65 samples/sec   Loss 1.9818   LearningRate 0.0036   Epoch: 18   Global Step: 189670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:26:56,103-Speed 5980.25 samples/sec   Loss 1.9897   LearningRate 0.0036   Epoch: 18   Global Step: 189680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:27:03,019-Speed 5923.69 samples/sec   Loss 1.9907   LearningRate 0.0036   Epoch: 18   Global Step: 189690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:27:09,873-Speed 5977.74 samples/sec   Loss 1.9765   LearningRate 0.0036   Epoch: 18   Global Step: 189700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:27:16,716-Speed 5986.23 samples/sec   Loss 1.9443   LearningRate 0.0036   Epoch: 18   Global Step: 189710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:27:23,576-Speed 5971.85 samples/sec   Loss 1.9864   LearningRate 0.0036   Epoch: 18   Global Step: 189720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:27:30,428-Speed 5978.96 samples/sec   Loss 2.0171   LearningRate 0.0036   Epoch: 18   Global Step: 189730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:27:37,296-Speed 5964.79 samples/sec   Loss 1.9772   LearningRate 0.0036   Epoch: 18   Global Step: 189740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:27:44,160-Speed 5969.45 samples/sec   Loss 1.9294   LearningRate 0.0036   Epoch: 18   Global Step: 189750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:27:51,064-Speed 5934.40 samples/sec   Loss 1.9586   LearningRate 0.0036   Epoch: 18   Global Step: 189760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:27:57,922-Speed 5973.81 samples/sec   Loss 1.9550   LearningRate 0.0036   Epoch: 18   Global Step: 189770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:28:04,785-Speed 5968.57 samples/sec   Loss 1.9914   LearningRate 0.0036   Epoch: 18   Global Step: 189780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:28:11,646-Speed 5971.58 samples/sec   Loss 1.9438   LearningRate 0.0036   Epoch: 18   Global Step: 189790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:28:18,550-Speed 5933.97 samples/sec   Loss 1.9561   LearningRate 0.0035   Epoch: 18   Global Step: 189800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:28:25,398-Speed 5984.66 samples/sec   Loss 1.9634   LearningRate 0.0035   Epoch: 18   Global Step: 189810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:28:32,238-Speed 5989.55 samples/sec   Loss 1.9988   LearningRate 0.0035   Epoch: 18   Global Step: 189820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:28:39,093-Speed 5975.51 samples/sec   Loss 1.9925   LearningRate 0.0035   Epoch: 18   Global Step: 189830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:28:45,946-Speed 5978.17 samples/sec   Loss 1.9630   LearningRate 0.0035   Epoch: 18   Global Step: 189840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:28:52,802-Speed 5976.42 samples/sec   Loss 1.9604   LearningRate 0.0035   Epoch: 18   Global Step: 189850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:28:59,762-Speed 5886.20 samples/sec   Loss 1.9227   LearningRate 0.0035   Epoch: 18   Global Step: 189860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:29:06,623-Speed 5971.34 samples/sec   Loss 1.9347   LearningRate 0.0035   Epoch: 18   Global Step: 189870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:29:13,473-Speed 5983.17 samples/sec   Loss 1.9430   LearningRate 0.0035   Epoch: 18   Global Step: 189880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:29:20,326-Speed 5977.92 samples/sec   Loss 1.9406   LearningRate 0.0035   Epoch: 18   Global Step: 189890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:29:27,184-Speed 5973.79 samples/sec   Loss 1.9470   LearningRate 0.0035   Epoch: 18   Global Step: 189900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:29:34,061-Speed 5957.45 samples/sec   Loss 1.9672   LearningRate 0.0035   Epoch: 18   Global Step: 189910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:29:40,929-Speed 5965.68 samples/sec   Loss 1.9428   LearningRate 0.0035   Epoch: 18   Global Step: 189920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:29:47,798-Speed 5964.02 samples/sec   Loss 1.9921   LearningRate 0.0035   Epoch: 18   Global Step: 189930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:29:54,682-Speed 5951.55 samples/sec   Loss 1.9582   LearningRate 0.0035   Epoch: 18   Global Step: 189940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:30:01,533-Speed 5980.17 samples/sec   Loss 1.9427   LearningRate 0.0035   Epoch: 18   Global Step: 189950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:30:08,371-Speed 5990.89 samples/sec   Loss 1.9601   LearningRate 0.0035   Epoch: 18   Global Step: 189960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:30:15,240-Speed 5964.30 samples/sec   Loss 1.9627   LearningRate 0.0035   Epoch: 18   Global Step: 189970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:30:22,091-Speed 5979.22 samples/sec   Loss 1.9134   LearningRate 0.0035   Epoch: 18   Global Step: 189980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:30:28,959-Speed 5968.04 samples/sec   Loss 1.9835   LearningRate 0.0035   Epoch: 18   Global Step: 189990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:30:35,841-Speed 5953.42 samples/sec   Loss 1.9990   LearningRate 0.0035   Epoch: 18   Global Step: 190000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:31:02,610-[lfw][190000]XNorm: 23.655104
Training: 2022-01-09 09:31:02,611-[lfw][190000]Accuracy-Flip: 0.99767+-0.00271
Training: 2022-01-09 09:31:02,611-[lfw][190000]Accuracy-Highest: 0.99833
Training: 2022-01-09 09:31:33,605-[cfp_fp][190000]XNorm: 21.504125
Training: 2022-01-09 09:31:33,606-[cfp_fp][190000]Accuracy-Flip: 0.99286+-0.00389
Training: 2022-01-09 09:31:33,607-[cfp_fp][190000]Accuracy-Highest: 0.99286
Training: 2022-01-09 09:32:00,336-[agedb_30][190000]XNorm: 23.125094
Training: 2022-01-09 09:32:00,337-[agedb_30][190000]Accuracy-Flip: 0.98117+-0.00606
Training: 2022-01-09 09:32:00,337-[agedb_30][190000]Accuracy-Highest: 0.98200
Training: 2022-01-09 09:32:07,190-Speed 448.40 samples/sec   Loss 1.9535   LearningRate 0.0035   Epoch: 18   Global Step: 190010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:32:14,024-Speed 5995.09 samples/sec   Loss 1.9457   LearningRate 0.0035   Epoch: 18   Global Step: 190020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:32:20,866-Speed 5987.77 samples/sec   Loss 1.9517   LearningRate 0.0035   Epoch: 18   Global Step: 190030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:32:27,710-Speed 5986.20 samples/sec   Loss 1.9016   LearningRate 0.0035   Epoch: 18   Global Step: 190040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:32:34,572-Speed 5969.69 samples/sec   Loss 1.9386   LearningRate 0.0034   Epoch: 18   Global Step: 190050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:32:41,423-Speed 5979.55 samples/sec   Loss 1.9291   LearningRate 0.0034   Epoch: 18   Global Step: 190060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:32:48,273-Speed 5983.23 samples/sec   Loss 1.9685   LearningRate 0.0034   Epoch: 18   Global Step: 190070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:32:55,119-Speed 5984.66 samples/sec   Loss 1.9775   LearningRate 0.0034   Epoch: 18   Global Step: 190080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:33:01,968-Speed 5981.46 samples/sec   Loss 1.9759   LearningRate 0.0034   Epoch: 18   Global Step: 190090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:33:08,874-Speed 5932.62 samples/sec   Loss 1.9617   LearningRate 0.0034   Epoch: 18   Global Step: 190100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:33:15,725-Speed 5979.84 samples/sec   Loss 1.9287   LearningRate 0.0034   Epoch: 18   Global Step: 190110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:33:22,582-Speed 5974.74 samples/sec   Loss 1.9386   LearningRate 0.0034   Epoch: 18   Global Step: 190120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:33:29,427-Speed 5985.77 samples/sec   Loss 1.9605   LearningRate 0.0034   Epoch: 18   Global Step: 190130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:33:36,272-Speed 5984.95 samples/sec   Loss 1.9512   LearningRate 0.0034   Epoch: 18   Global Step: 190140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:33:43,165-Speed 5944.16 samples/sec   Loss 1.9365   LearningRate 0.0034   Epoch: 18   Global Step: 190150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:33:50,069-Speed 5937.06 samples/sec   Loss 1.9457   LearningRate 0.0034   Epoch: 18   Global Step: 190160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:33:56,994-Speed 5915.91 samples/sec   Loss 1.9530   LearningRate 0.0034   Epoch: 18   Global Step: 190170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:34:03,887-Speed 5943.69 samples/sec   Loss 1.9263   LearningRate 0.0034   Epoch: 18   Global Step: 190180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:34:10,769-Speed 5952.60 samples/sec   Loss 1.9444   LearningRate 0.0034   Epoch: 18   Global Step: 190190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:34:17,623-Speed 5977.43 samples/sec   Loss 1.9505   LearningRate 0.0034   Epoch: 18   Global Step: 190200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:34:24,480-Speed 5974.39 samples/sec   Loss 1.9474   LearningRate 0.0034   Epoch: 18   Global Step: 190210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:34:31,341-Speed 5971.37 samples/sec   Loss 1.9513   LearningRate 0.0034   Epoch: 18   Global Step: 190220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:34:38,194-Speed 5977.48 samples/sec   Loss 1.9403   LearningRate 0.0034   Epoch: 18   Global Step: 190230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:34:45,047-Speed 5977.74 samples/sec   Loss 1.9531   LearningRate 0.0034   Epoch: 18   Global Step: 190240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:34:51,891-Speed 5986.07 samples/sec   Loss 1.9578   LearningRate 0.0034   Epoch: 18   Global Step: 190250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:34:58,768-Speed 5957.79 samples/sec   Loss 1.9612   LearningRate 0.0034   Epoch: 18   Global Step: 190260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:35:05,617-Speed 5981.76 samples/sec   Loss 1.9231   LearningRate 0.0034   Epoch: 18   Global Step: 190270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:35:12,480-Speed 5969.54 samples/sec   Loss 1.9556   LearningRate 0.0034   Epoch: 18   Global Step: 190280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:35:19,325-Speed 5984.47 samples/sec   Loss 1.9493   LearningRate 0.0034   Epoch: 18   Global Step: 190290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:35:26,183-Speed 5974.07 samples/sec   Loss 1.9546   LearningRate 0.0033   Epoch: 18   Global Step: 190300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:35:33,042-Speed 5973.41 samples/sec   Loss 1.9435   LearningRate 0.0033   Epoch: 18   Global Step: 190310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:35:39,903-Speed 5971.14 samples/sec   Loss 1.9440   LearningRate 0.0033   Epoch: 18   Global Step: 190320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:35:46,751-Speed 5982.51 samples/sec   Loss 1.9309   LearningRate 0.0033   Epoch: 18   Global Step: 190330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:35:53,612-Speed 5971.69 samples/sec   Loss 1.9351   LearningRate 0.0033   Epoch: 18   Global Step: 190340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:36:00,471-Speed 5971.80 samples/sec   Loss 1.9087   LearningRate 0.0033   Epoch: 18   Global Step: 190350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:36:07,369-Speed 5942.06 samples/sec   Loss 1.9295   LearningRate 0.0033   Epoch: 18   Global Step: 190360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:36:14,247-Speed 5956.30 samples/sec   Loss 1.9299   LearningRate 0.0033   Epoch: 18   Global Step: 190370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:36:21,093-Speed 5983.82 samples/sec   Loss 1.9182   LearningRate 0.0033   Epoch: 18   Global Step: 190380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:36:27,931-Speed 5991.88 samples/sec   Loss 1.9402   LearningRate 0.0033   Epoch: 18   Global Step: 190390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:36:34,800-Speed 5964.30 samples/sec   Loss 1.9342   LearningRate 0.0033   Epoch: 18   Global Step: 190400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:36:41,682-Speed 5952.42 samples/sec   Loss 1.9694   LearningRate 0.0033   Epoch: 18   Global Step: 190410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:36:54,053-Speed 3311.90 samples/sec   Loss 1.9326   LearningRate 0.0033   Epoch: 18   Global Step: 190420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-09 09:37:00,889-Speed 5993.26 samples/sec   Loss 1.9396   LearningRate 0.0033   Epoch: 18   Global Step: 190430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:37:07,739-Speed 5980.70 samples/sec   Loss 1.8962   LearningRate 0.0033   Epoch: 18   Global Step: 190440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:37:14,613-Speed 5959.58 samples/sec   Loss 1.9028   LearningRate 0.0033   Epoch: 18   Global Step: 190450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:37:21,462-Speed 5981.89 samples/sec   Loss 1.9093   LearningRate 0.0033   Epoch: 18   Global Step: 190460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:37:28,322-Speed 5971.68 samples/sec   Loss 1.9351   LearningRate 0.0033   Epoch: 18   Global Step: 190470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:37:35,178-Speed 5975.88 samples/sec   Loss 1.9521   LearningRate 0.0033   Epoch: 18   Global Step: 190480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:37:42,025-Speed 5983.27 samples/sec   Loss 1.9738   LearningRate 0.0033   Epoch: 18   Global Step: 190490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:37:48,883-Speed 5973.18 samples/sec   Loss 1.9393   LearningRate 0.0033   Epoch: 18   Global Step: 190500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:37:55,766-Speed 5953.06 samples/sec   Loss 1.9068   LearningRate 0.0033   Epoch: 18   Global Step: 190510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:38:02,630-Speed 5968.78 samples/sec   Loss 1.9454   LearningRate 0.0033   Epoch: 18   Global Step: 190520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:38:09,477-Speed 5983.11 samples/sec   Loss 1.9344   LearningRate 0.0033   Epoch: 18   Global Step: 190530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:38:16,329-Speed 5979.48 samples/sec   Loss 1.9478   LearningRate 0.0033   Epoch: 18   Global Step: 190540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:38:23,194-Speed 5967.77 samples/sec   Loss 1.9243   LearningRate 0.0033   Epoch: 18   Global Step: 190550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:38:30,049-Speed 5976.33 samples/sec   Loss 1.9204   LearningRate 0.0032   Epoch: 18   Global Step: 190560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:38:36,922-Speed 5961.47 samples/sec   Loss 1.9399   LearningRate 0.0032   Epoch: 18   Global Step: 190570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:38:43,791-Speed 5964.12 samples/sec   Loss 1.9160   LearningRate 0.0032   Epoch: 18   Global Step: 190580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:38:50,655-Speed 5968.85 samples/sec   Loss 1.9478   LearningRate 0.0032   Epoch: 18   Global Step: 190590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:38:57,520-Speed 5967.08 samples/sec   Loss 1.9186   LearningRate 0.0032   Epoch: 18   Global Step: 190600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:39:04,370-Speed 5981.46 samples/sec   Loss 1.9294   LearningRate 0.0032   Epoch: 18   Global Step: 190610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:39:11,249-Speed 5954.83 samples/sec   Loss 1.9456   LearningRate 0.0032   Epoch: 18   Global Step: 190620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:39:18,139-Speed 5945.95 samples/sec   Loss 1.8763   LearningRate 0.0032   Epoch: 18   Global Step: 190630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:39:25,014-Speed 5959.30 samples/sec   Loss 1.9158   LearningRate 0.0032   Epoch: 18   Global Step: 190640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:39:31,871-Speed 5974.39 samples/sec   Loss 1.9531   LearningRate 0.0032   Epoch: 18   Global Step: 190650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:39:38,777-Speed 5931.74 samples/sec   Loss 1.9450   LearningRate 0.0032   Epoch: 18   Global Step: 190660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:39:45,650-Speed 5960.62 samples/sec   Loss 1.8935   LearningRate 0.0032   Epoch: 18   Global Step: 190670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:39:52,506-Speed 5976.01 samples/sec   Loss 1.8904   LearningRate 0.0032   Epoch: 18   Global Step: 190680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:39:59,360-Speed 5976.88 samples/sec   Loss 1.9426   LearningRate 0.0032   Epoch: 18   Global Step: 190690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:40:06,265-Speed 5932.91 samples/sec   Loss 1.9212   LearningRate 0.0032   Epoch: 18   Global Step: 190700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:40:13,121-Speed 5975.17 samples/sec   Loss 1.9146   LearningRate 0.0032   Epoch: 18   Global Step: 190710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:40:20,019-Speed 5939.34 samples/sec   Loss 1.9275   LearningRate 0.0032   Epoch: 18   Global Step: 190720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:40:26,875-Speed 5976.02 samples/sec   Loss 1.9572   LearningRate 0.0032   Epoch: 18   Global Step: 190730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:40:33,732-Speed 5974.11 samples/sec   Loss 1.9124   LearningRate 0.0032   Epoch: 18   Global Step: 190740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:40:40,609-Speed 5957.65 samples/sec   Loss 1.9370   LearningRate 0.0032   Epoch: 18   Global Step: 190750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:40:47,465-Speed 5975.89 samples/sec   Loss 1.9308   LearningRate 0.0032   Epoch: 18   Global Step: 190760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:40:54,333-Speed 5964.33 samples/sec   Loss 1.9278   LearningRate 0.0032   Epoch: 18   Global Step: 190770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:41:01,245-Speed 5927.22 samples/sec   Loss 1.9349   LearningRate 0.0032   Epoch: 18   Global Step: 190780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:41:08,220-Speed 5874.23 samples/sec   Loss 1.8965   LearningRate 0.0032   Epoch: 18   Global Step: 190790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:41:15,182-Speed 5883.64 samples/sec   Loss 1.9081   LearningRate 0.0032   Epoch: 18   Global Step: 190800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:41:22,048-Speed 5967.25 samples/sec   Loss 1.9314   LearningRate 0.0032   Epoch: 18   Global Step: 190810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:41:28,898-Speed 5980.46 samples/sec   Loss 1.9230   LearningRate 0.0031   Epoch: 18   Global Step: 190820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:41:35,756-Speed 5973.63 samples/sec   Loss 1.8953   LearningRate 0.0031   Epoch: 18   Global Step: 190830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:41:42,634-Speed 5956.55 samples/sec   Loss 1.9208   LearningRate 0.0031   Epoch: 18   Global Step: 190840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:41:49,499-Speed 5968.22 samples/sec   Loss 1.9086   LearningRate 0.0031   Epoch: 18   Global Step: 190850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:41:56,365-Speed 5966.91 samples/sec   Loss 1.9269   LearningRate 0.0031   Epoch: 18   Global Step: 190860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:42:03,256-Speed 5945.60 samples/sec   Loss 1.9068   LearningRate 0.0031   Epoch: 18   Global Step: 190870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:42:10,111-Speed 5976.53 samples/sec   Loss 1.8977   LearningRate 0.0031   Epoch: 18   Global Step: 190880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:42:16,979-Speed 5964.70 samples/sec   Loss 1.8982   LearningRate 0.0031   Epoch: 18   Global Step: 190890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:42:23,834-Speed 5976.60 samples/sec   Loss 1.9039   LearningRate 0.0031   Epoch: 18   Global Step: 190900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:42:30,688-Speed 5977.54 samples/sec   Loss 1.9078   LearningRate 0.0031   Epoch: 18   Global Step: 190910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:42:37,546-Speed 5972.96 samples/sec   Loss 1.8950   LearningRate 0.0031   Epoch: 18   Global Step: 190920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:42:44,419-Speed 5961.17 samples/sec   Loss 1.9289   LearningRate 0.0031   Epoch: 18   Global Step: 190930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:42:51,377-Speed 5887.78 samples/sec   Loss 1.9220   LearningRate 0.0031   Epoch: 18   Global Step: 190940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:42:58,230-Speed 5977.61 samples/sec   Loss 1.8811   LearningRate 0.0031   Epoch: 18   Global Step: 190950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:43:05,094-Speed 5969.45 samples/sec   Loss 1.9541   LearningRate 0.0031   Epoch: 18   Global Step: 190960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:43:11,977-Speed 5952.47 samples/sec   Loss 1.9019   LearningRate 0.0031   Epoch: 18   Global Step: 190970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:43:18,833-Speed 5975.76 samples/sec   Loss 1.9063   LearningRate 0.0031   Epoch: 18   Global Step: 190980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:43:25,695-Speed 5970.22 samples/sec   Loss 1.9114   LearningRate 0.0031   Epoch: 18   Global Step: 190990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:43:32,576-Speed 5954.51 samples/sec   Loss 1.8995   LearningRate 0.0031   Epoch: 18   Global Step: 191000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:43:39,424-Speed 5981.46 samples/sec   Loss 1.8966   LearningRate 0.0031   Epoch: 18   Global Step: 191010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:43:46,278-Speed 5977.25 samples/sec   Loss 1.9086   LearningRate 0.0031   Epoch: 18   Global Step: 191020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:43:53,154-Speed 5958.73 samples/sec   Loss 1.9344   LearningRate 0.0031   Epoch: 18   Global Step: 191030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:44:00,014-Speed 5971.35 samples/sec   Loss 1.8934   LearningRate 0.0031   Epoch: 18   Global Step: 191040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:44:06,869-Speed 5976.14 samples/sec   Loss 1.9068   LearningRate 0.0031   Epoch: 18   Global Step: 191050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:44:13,713-Speed 5985.65 samples/sec   Loss 1.8873   LearningRate 0.0031   Epoch: 18   Global Step: 191060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:44:20,560-Speed 5983.08 samples/sec   Loss 1.9122   LearningRate 0.0031   Epoch: 18   Global Step: 191070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:44:27,423-Speed 5969.69 samples/sec   Loss 1.9066   LearningRate 0.0031   Epoch: 18   Global Step: 191080   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-09 09:44:34,286-Speed 5969.29 samples/sec   Loss 1.8858   LearningRate 0.0030   Epoch: 18   Global Step: 191090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:44:41,143-Speed 5974.86 samples/sec   Loss 1.9034   LearningRate 0.0030   Epoch: 18   Global Step: 191100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:44:48,002-Speed 5972.18 samples/sec   Loss 1.8654   LearningRate 0.0030   Epoch: 18   Global Step: 191110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:44:54,864-Speed 5970.54 samples/sec   Loss 1.9124   LearningRate 0.0030   Epoch: 18   Global Step: 191120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:45:01,727-Speed 5968.52 samples/sec   Loss 1.8936   LearningRate 0.0030   Epoch: 18   Global Step: 191130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:45:08,617-Speed 5948.90 samples/sec   Loss 1.9032   LearningRate 0.0030   Epoch: 18   Global Step: 191140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:45:15,469-Speed 5979.33 samples/sec   Loss 1.9290   LearningRate 0.0030   Epoch: 18   Global Step: 191150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:45:22,325-Speed 5974.81 samples/sec   Loss 1.8901   LearningRate 0.0030   Epoch: 18   Global Step: 191160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:45:29,183-Speed 5973.75 samples/sec   Loss 1.8810   LearningRate 0.0030   Epoch: 18   Global Step: 191170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:45:36,046-Speed 5969.82 samples/sec   Loss 1.9018   LearningRate 0.0030   Epoch: 18   Global Step: 191180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:45:42,896-Speed 5980.36 samples/sec   Loss 1.9079   LearningRate 0.0030   Epoch: 18   Global Step: 191190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:45:49,754-Speed 5973.66 samples/sec   Loss 1.8798   LearningRate 0.0030   Epoch: 18   Global Step: 191200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:45:56,598-Speed 5986.22 samples/sec   Loss 1.8680   LearningRate 0.0030   Epoch: 18   Global Step: 191210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:46:03,478-Speed 5954.45 samples/sec   Loss 1.9017   LearningRate 0.0030   Epoch: 18   Global Step: 191220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:46:10,334-Speed 5977.41 samples/sec   Loss 1.8968   LearningRate 0.0030   Epoch: 18   Global Step: 191230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:46:17,201-Speed 5966.33 samples/sec   Loss 1.9330   LearningRate 0.0030   Epoch: 18   Global Step: 191240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:46:24,066-Speed 5967.61 samples/sec   Loss 1.8681   LearningRate 0.0030   Epoch: 18   Global Step: 191250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:46:30,928-Speed 5970.24 samples/sec   Loss 1.9019   LearningRate 0.0030   Epoch: 18   Global Step: 191260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:46:37,780-Speed 5978.94 samples/sec   Loss 1.8861   LearningRate 0.0030   Epoch: 18   Global Step: 191270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:46:44,625-Speed 5984.98 samples/sec   Loss 1.9106   LearningRate 0.0030   Epoch: 18   Global Step: 191280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:46:51,471-Speed 5984.56 samples/sec   Loss 1.9066   LearningRate 0.0030   Epoch: 18   Global Step: 191290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:46:58,333-Speed 5970.56 samples/sec   Loss 1.8785   LearningRate 0.0030   Epoch: 18   Global Step: 191300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:47:05,190-Speed 5973.67 samples/sec   Loss 1.8939   LearningRate 0.0030   Epoch: 18   Global Step: 191310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:47:12,065-Speed 5959.71 samples/sec   Loss 1.8937   LearningRate 0.0030   Epoch: 18   Global Step: 191320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:47:18,917-Speed 5979.20 samples/sec   Loss 1.9150   LearningRate 0.0030   Epoch: 18   Global Step: 191330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:47:25,773-Speed 5974.80 samples/sec   Loss 1.8692   LearningRate 0.0030   Epoch: 18   Global Step: 191340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:47:32,682-Speed 5929.82 samples/sec   Loss 1.9004   LearningRate 0.0030   Epoch: 18   Global Step: 191350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:47:39,564-Speed 5952.75 samples/sec   Loss 1.9017   LearningRate 0.0029   Epoch: 18   Global Step: 191360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:47:46,421-Speed 5974.96 samples/sec   Loss 1.8677   LearningRate 0.0029   Epoch: 18   Global Step: 191370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:47:53,276-Speed 5975.80 samples/sec   Loss 1.8774   LearningRate 0.0029   Epoch: 18   Global Step: 191380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:48:00,157-Speed 5954.50 samples/sec   Loss 1.9238   LearningRate 0.0029   Epoch: 18   Global Step: 191390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:48:07,038-Speed 5954.03 samples/sec   Loss 1.8656   LearningRate 0.0029   Epoch: 18   Global Step: 191400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:48:13,888-Speed 5980.72 samples/sec   Loss 1.8945   LearningRate 0.0029   Epoch: 18   Global Step: 191410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:48:20,756-Speed 5964.93 samples/sec   Loss 1.9148   LearningRate 0.0029   Epoch: 18   Global Step: 191420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:48:27,663-Speed 5931.23 samples/sec   Loss 1.8854   LearningRate 0.0029   Epoch: 18   Global Step: 191430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:48:34,514-Speed 5980.30 samples/sec   Loss 1.8443   LearningRate 0.0029   Epoch: 18   Global Step: 191440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:48:41,402-Speed 5947.84 samples/sec   Loss 1.8953   LearningRate 0.0029   Epoch: 18   Global Step: 191450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:48:48,265-Speed 5969.14 samples/sec   Loss 1.8725   LearningRate 0.0029   Epoch: 18   Global Step: 191460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:48:55,114-Speed 5981.32 samples/sec   Loss 1.8769   LearningRate 0.0029   Epoch: 18   Global Step: 191470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:49:01,982-Speed 5965.56 samples/sec   Loss 1.8928   LearningRate 0.0029   Epoch: 18   Global Step: 191480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:49:08,836-Speed 5977.39 samples/sec   Loss 1.8851   LearningRate 0.0029   Epoch: 18   Global Step: 191490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:49:15,690-Speed 5977.20 samples/sec   Loss 1.9004   LearningRate 0.0029   Epoch: 18   Global Step: 191500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:49:22,564-Speed 5959.25 samples/sec   Loss 1.8900   LearningRate 0.0029   Epoch: 18   Global Step: 191510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:49:29,430-Speed 5966.85 samples/sec   Loss 1.8470   LearningRate 0.0029   Epoch: 18   Global Step: 191520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:49:36,277-Speed 5983.23 samples/sec   Loss 1.8869   LearningRate 0.0029   Epoch: 18   Global Step: 191530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:49:43,130-Speed 5978.36 samples/sec   Loss 1.8139   LearningRate 0.0029   Epoch: 18   Global Step: 191540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:49:50,005-Speed 5960.66 samples/sec   Loss 1.9098   LearningRate 0.0029   Epoch: 18   Global Step: 191550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:49:56,862-Speed 5973.99 samples/sec   Loss 1.8736   LearningRate 0.0029   Epoch: 18   Global Step: 191560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:50:03,735-Speed 5960.18 samples/sec   Loss 1.9075   LearningRate 0.0029   Epoch: 18   Global Step: 191570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:50:10,579-Speed 5986.22 samples/sec   Loss 1.8622   LearningRate 0.0029   Epoch: 18   Global Step: 191580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:50:17,421-Speed 5987.60 samples/sec   Loss 1.8912   LearningRate 0.0029   Epoch: 18   Global Step: 191590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:50:24,279-Speed 5975.87 samples/sec   Loss 1.8730   LearningRate 0.0029   Epoch: 18   Global Step: 191600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:50:31,127-Speed 5982.02 samples/sec   Loss 1.8989   LearningRate 0.0029   Epoch: 18   Global Step: 191610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:50:37,993-Speed 5966.84 samples/sec   Loss 1.8517   LearningRate 0.0029   Epoch: 18   Global Step: 191620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:50:44,836-Speed 5986.79 samples/sec   Loss 1.9210   LearningRate 0.0028   Epoch: 18   Global Step: 191630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:50:51,685-Speed 5981.66 samples/sec   Loss 1.8960   LearningRate 0.0028   Epoch: 18   Global Step: 191640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:50:58,544-Speed 5973.03 samples/sec   Loss 1.8692   LearningRate 0.0028   Epoch: 18   Global Step: 191650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:51:05,429-Speed 5951.73 samples/sec   Loss 1.8914   LearningRate 0.0028   Epoch: 18   Global Step: 191660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:51:12,294-Speed 5967.14 samples/sec   Loss 1.8415   LearningRate 0.0028   Epoch: 18   Global Step: 191670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:51:19,148-Speed 5977.66 samples/sec   Loss 1.8631   LearningRate 0.0028   Epoch: 18   Global Step: 191680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:51:25,992-Speed 5986.21 samples/sec   Loss 1.8938   LearningRate 0.0028   Epoch: 18   Global Step: 191690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:51:32,846-Speed 5976.63 samples/sec   Loss 1.8632   LearningRate 0.0028   Epoch: 18   Global Step: 191700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:51:39,732-Speed 5950.26 samples/sec   Loss 1.8717   LearningRate 0.0028   Epoch: 18   Global Step: 191710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:51:46,599-Speed 5966.35 samples/sec   Loss 1.8801   LearningRate 0.0028   Epoch: 18   Global Step: 191720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:51:53,445-Speed 5983.39 samples/sec   Loss 1.8743   LearningRate 0.0028   Epoch: 18   Global Step: 191730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:52:00,296-Speed 5980.69 samples/sec   Loss 1.9089   LearningRate 0.0028   Epoch: 18   Global Step: 191740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:52:07,155-Speed 5974.70 samples/sec   Loss 1.8540   LearningRate 0.0028   Epoch: 18   Global Step: 191750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:52:13,995-Speed 5989.56 samples/sec   Loss 1.8440   LearningRate 0.0028   Epoch: 18   Global Step: 191760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:52:20,860-Speed 5966.97 samples/sec   Loss 1.8618   LearningRate 0.0028   Epoch: 18   Global Step: 191770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:52:27,739-Speed 5958.13 samples/sec   Loss 1.8792   LearningRate 0.0028   Epoch: 18   Global Step: 191780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:52:34,598-Speed 5972.96 samples/sec   Loss 1.8496   LearningRate 0.0028   Epoch: 18   Global Step: 191790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:52:41,459-Speed 5971.37 samples/sec   Loss 1.8877   LearningRate 0.0028   Epoch: 18   Global Step: 191800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:52:48,326-Speed 5965.66 samples/sec   Loss 1.8763   LearningRate 0.0028   Epoch: 18   Global Step: 191810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:52:55,215-Speed 5946.69 samples/sec   Loss 1.8688   LearningRate 0.0028   Epoch: 18   Global Step: 191820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:53:02,075-Speed 5972.34 samples/sec   Loss 1.8421   LearningRate 0.0028   Epoch: 18   Global Step: 191830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:53:08,930-Speed 5976.30 samples/sec   Loss 1.8611   LearningRate 0.0028   Epoch: 18   Global Step: 191840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:53:15,775-Speed 5984.43 samples/sec   Loss 1.8727   LearningRate 0.0028   Epoch: 18   Global Step: 191850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:53:22,632-Speed 5974.85 samples/sec   Loss 1.8654   LearningRate 0.0028   Epoch: 18   Global Step: 191860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:53:29,488-Speed 5975.84 samples/sec   Loss 1.8663   LearningRate 0.0028   Epoch: 18   Global Step: 191870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:53:36,348-Speed 5972.17 samples/sec   Loss 1.8474   LearningRate 0.0028   Epoch: 18   Global Step: 191880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:53:43,239-Speed 5953.58 samples/sec   Loss 1.8550   LearningRate 0.0028   Epoch: 18   Global Step: 191890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:53:50,109-Speed 5963.15 samples/sec   Loss 1.8369   LearningRate 0.0028   Epoch: 18   Global Step: 191900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:53:56,963-Speed 5977.13 samples/sec   Loss 1.8787   LearningRate 0.0027   Epoch: 18   Global Step: 191910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:54:03,811-Speed 5982.58 samples/sec   Loss 1.8697   LearningRate 0.0027   Epoch: 18   Global Step: 191920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:54:10,710-Speed 5937.67 samples/sec   Loss 1.8501   LearningRate 0.0027   Epoch: 18   Global Step: 191930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:54:17,562-Speed 5978.62 samples/sec   Loss 1.8591   LearningRate 0.0027   Epoch: 18   Global Step: 191940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:54:24,417-Speed 5976.43 samples/sec   Loss 1.8698   LearningRate 0.0027   Epoch: 18   Global Step: 191950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:54:31,304-Speed 5948.98 samples/sec   Loss 1.8723   LearningRate 0.0027   Epoch: 18   Global Step: 191960   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-09 09:54:38,176-Speed 5961.86 samples/sec   Loss 1.8496   LearningRate 0.0027   Epoch: 18   Global Step: 191970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:54:45,053-Speed 5957.36 samples/sec   Loss 1.8559   LearningRate 0.0027   Epoch: 18   Global Step: 191980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:54:51,898-Speed 5984.49 samples/sec   Loss 1.8578   LearningRate 0.0027   Epoch: 18   Global Step: 191990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:54:58,761-Speed 5969.72 samples/sec   Loss 1.8596   LearningRate 0.0027   Epoch: 18   Global Step: 192000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:55:05,649-Speed 5947.87 samples/sec   Loss 1.8680   LearningRate 0.0027   Epoch: 18   Global Step: 192010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:55:12,512-Speed 5969.21 samples/sec   Loss 1.8533   LearningRate 0.0027   Epoch: 18   Global Step: 192020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:55:19,375-Speed 5968.64 samples/sec   Loss 1.8795   LearningRate 0.0027   Epoch: 18   Global Step: 192030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:55:26,252-Speed 5958.14 samples/sec   Loss 1.8443   LearningRate 0.0027   Epoch: 18   Global Step: 192040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:55:33,105-Speed 5977.73 samples/sec   Loss 1.8262   LearningRate 0.0027   Epoch: 18   Global Step: 192050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:55:39,958-Speed 5978.61 samples/sec   Loss 1.8627   LearningRate 0.0027   Epoch: 18   Global Step: 192060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:55:46,812-Speed 5976.99 samples/sec   Loss 1.8709   LearningRate 0.0027   Epoch: 18   Global Step: 192070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:55:53,681-Speed 5966.02 samples/sec   Loss 1.8652   LearningRate 0.0027   Epoch: 18   Global Step: 192080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:56:00,532-Speed 5979.23 samples/sec   Loss 1.8725   LearningRate 0.0027   Epoch: 18   Global Step: 192090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:56:07,381-Speed 5982.84 samples/sec   Loss 1.8678   LearningRate 0.0027   Epoch: 18   Global Step: 192100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:56:14,253-Speed 5962.24 samples/sec   Loss 1.8623   LearningRate 0.0027   Epoch: 18   Global Step: 192110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:56:21,128-Speed 5958.98 samples/sec   Loss 1.8657   LearningRate 0.0027   Epoch: 18   Global Step: 192120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:56:27,982-Speed 5977.30 samples/sec   Loss 1.8704   LearningRate 0.0027   Epoch: 18   Global Step: 192130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:56:34,834-Speed 5979.49 samples/sec   Loss 1.8844   LearningRate 0.0027   Epoch: 18   Global Step: 192140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:56:41,687-Speed 5977.59 samples/sec   Loss 1.8624   LearningRate 0.0027   Epoch: 18   Global Step: 192150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:56:48,547-Speed 5972.30 samples/sec   Loss 1.8695   LearningRate 0.0027   Epoch: 18   Global Step: 192160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:56:55,444-Speed 5940.16 samples/sec   Loss 1.8723   LearningRate 0.0027   Epoch: 18   Global Step: 192170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:57:02,292-Speed 5982.40 samples/sec   Loss 1.8473   LearningRate 0.0027   Epoch: 18   Global Step: 192180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:57:09,139-Speed 5982.66 samples/sec   Loss 1.8506   LearningRate 0.0026   Epoch: 18   Global Step: 192190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:57:16,004-Speed 5968.50 samples/sec   Loss 1.8456   LearningRate 0.0026   Epoch: 18   Global Step: 192200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 09:57:22,850-Speed 5983.76 samples/sec   Loss 1.8351   LearningRate 0.0026   Epoch: 18   Global Step: 192210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:57:29,698-Speed 5984.19 samples/sec   Loss 1.8394   LearningRate 0.0026   Epoch: 18   Global Step: 192220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:57:36,541-Speed 5987.25 samples/sec   Loss 1.8752   LearningRate 0.0026   Epoch: 18   Global Step: 192230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:57:43,390-Speed 5980.83 samples/sec   Loss 1.8843   LearningRate 0.0026   Epoch: 18   Global Step: 192240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:57:50,244-Speed 5977.55 samples/sec   Loss 1.8757   LearningRate 0.0026   Epoch: 18   Global Step: 192250   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 09:57:57,095-Speed 5980.14 samples/sec   Loss 1.8574   LearningRate 0.0026   Epoch: 18   Global Step: 192260   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 09:58:03,944-Speed 5981.05 samples/sec   Loss 1.8346   LearningRate 0.0026   Epoch: 18   Global Step: 192270   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 09:58:10,785-Speed 5988.82 samples/sec   Loss 1.8453   LearningRate 0.0026   Epoch: 18   Global Step: 192280   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 09:58:17,639-Speed 5978.73 samples/sec   Loss 1.8206   LearningRate 0.0026   Epoch: 18   Global Step: 192290   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 09:58:24,488-Speed 5981.15 samples/sec   Loss 1.8532   LearningRate 0.0026   Epoch: 18   Global Step: 192300   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 09:58:31,344-Speed 5975.97 samples/sec   Loss 1.8571   LearningRate 0.0026   Epoch: 18   Global Step: 192310   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 09:58:38,195-Speed 5980.04 samples/sec   Loss 1.8391   LearningRate 0.0026   Epoch: 18   Global Step: 192320   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 09:58:45,044-Speed 5980.58 samples/sec   Loss 1.8251   LearningRate 0.0026   Epoch: 18   Global Step: 192330   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 09:59:07,939-Speed 1789.25 samples/sec   Loss 1.8344   LearningRate 0.0026   Epoch: 18   Global Step: 192340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-01-09 09:59:14,803-Speed 5968.98 samples/sec   Loss 1.8601   LearningRate 0.0026   Epoch: 18   Global Step: 192350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:59:21,648-Speed 5984.98 samples/sec   Loss 1.8551   LearningRate 0.0026   Epoch: 18   Global Step: 192360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:59:28,493-Speed 5984.89 samples/sec   Loss 1.8296   LearningRate 0.0026   Epoch: 18   Global Step: 192370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:59:35,354-Speed 5970.96 samples/sec   Loss 1.8180   LearningRate 0.0026   Epoch: 18   Global Step: 192380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:59:42,216-Speed 5969.67 samples/sec   Loss 1.8486   LearningRate 0.0026   Epoch: 18   Global Step: 192390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:59:49,073-Speed 5975.76 samples/sec   Loss 1.8307   LearningRate 0.0026   Epoch: 18   Global Step: 192400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 09:59:55,927-Speed 5977.48 samples/sec   Loss 1.8292   LearningRate 0.0026   Epoch: 18   Global Step: 192410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:00:02,801-Speed 5959.09 samples/sec   Loss 1.8633   LearningRate 0.0026   Epoch: 18   Global Step: 192420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:00:09,689-Speed 5948.21 samples/sec   Loss 1.8238   LearningRate 0.0026   Epoch: 18   Global Step: 192430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:00:16,583-Speed 5943.27 samples/sec   Loss 1.8266   LearningRate 0.0026   Epoch: 18   Global Step: 192440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:00:23,468-Speed 5950.16 samples/sec   Loss 1.8314   LearningRate 0.0026   Epoch: 18   Global Step: 192450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:00:30,340-Speed 5961.69 samples/sec   Loss 1.8394   LearningRate 0.0026   Epoch: 18   Global Step: 192460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:00:37,218-Speed 5956.52 samples/sec   Loss 1.8305   LearningRate 0.0026   Epoch: 18   Global Step: 192470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:00:44,091-Speed 5960.60 samples/sec   Loss 1.8145   LearningRate 0.0025   Epoch: 18   Global Step: 192480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:00:50,957-Speed 5967.24 samples/sec   Loss 1.8299   LearningRate 0.0025   Epoch: 18   Global Step: 192490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:00:57,809-Speed 5980.76 samples/sec   Loss 1.8112   LearningRate 0.0025   Epoch: 18   Global Step: 192500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:01:04,664-Speed 5976.26 samples/sec   Loss 1.8518   LearningRate 0.0025   Epoch: 18   Global Step: 192510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:01:11,518-Speed 5977.67 samples/sec   Loss 1.8061   LearningRate 0.0025   Epoch: 18   Global Step: 192520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:01:18,388-Speed 5963.12 samples/sec   Loss 1.8211   LearningRate 0.0025   Epoch: 18   Global Step: 192530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:01:25,248-Speed 5972.06 samples/sec   Loss 1.8470   LearningRate 0.0025   Epoch: 18   Global Step: 192540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:01:32,116-Speed 5964.78 samples/sec   Loss 1.8426   LearningRate 0.0025   Epoch: 18   Global Step: 192550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:01:38,989-Speed 5961.57 samples/sec   Loss 1.8263   LearningRate 0.0025   Epoch: 18   Global Step: 192560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:01:45,860-Speed 5962.16 samples/sec   Loss 1.8680   LearningRate 0.0025   Epoch: 18   Global Step: 192570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:01:52,738-Speed 5957.05 samples/sec   Loss 1.8417   LearningRate 0.0025   Epoch: 18   Global Step: 192580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:01:59,597-Speed 5974.67 samples/sec   Loss 1.8178   LearningRate 0.0025   Epoch: 18   Global Step: 192590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:02:06,482-Speed 5950.24 samples/sec   Loss 1.8385   LearningRate 0.0025   Epoch: 18   Global Step: 192600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:02:13,364-Speed 5953.04 samples/sec   Loss 1.8157   LearningRate 0.0025   Epoch: 18   Global Step: 192610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:02:20,270-Speed 5934.24 samples/sec   Loss 1.8349   LearningRate 0.0025   Epoch: 18   Global Step: 192620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:02:27,127-Speed 5974.08 samples/sec   Loss 1.8239   LearningRate 0.0025   Epoch: 18   Global Step: 192630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:02:33,984-Speed 5974.91 samples/sec   Loss 1.8315   LearningRate 0.0025   Epoch: 18   Global Step: 192640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:02:40,856-Speed 5961.82 samples/sec   Loss 1.8086   LearningRate 0.0025   Epoch: 18   Global Step: 192650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:02:47,717-Speed 5971.58 samples/sec   Loss 1.8429   LearningRate 0.0025   Epoch: 18   Global Step: 192660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:02:54,590-Speed 5960.70 samples/sec   Loss 1.8266   LearningRate 0.0025   Epoch: 18   Global Step: 192670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:03:01,459-Speed 5964.98 samples/sec   Loss 1.8127   LearningRate 0.0025   Epoch: 18   Global Step: 192680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:03:08,319-Speed 5971.67 samples/sec   Loss 1.8584   LearningRate 0.0025   Epoch: 18   Global Step: 192690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:03:15,156-Speed 5992.28 samples/sec   Loss 1.8132   LearningRate 0.0025   Epoch: 18   Global Step: 192700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:03:22,002-Speed 5984.24 samples/sec   Loss 1.8155   LearningRate 0.0025   Epoch: 18   Global Step: 192710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:03:28,883-Speed 5953.62 samples/sec   Loss 1.8179   LearningRate 0.0025   Epoch: 18   Global Step: 192720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:03:35,743-Speed 5973.80 samples/sec   Loss 1.8359   LearningRate 0.0025   Epoch: 18   Global Step: 192730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:03:42,608-Speed 5970.69 samples/sec   Loss 1.8486   LearningRate 0.0025   Epoch: 18   Global Step: 192740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:03:49,483-Speed 5959.01 samples/sec   Loss 1.8145   LearningRate 0.0025   Epoch: 18   Global Step: 192750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:03:56,340-Speed 5974.83 samples/sec   Loss 1.8373   LearningRate 0.0025   Epoch: 18   Global Step: 192760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:04:03,207-Speed 5965.66 samples/sec   Loss 1.8062   LearningRate 0.0025   Epoch: 18   Global Step: 192770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:04:10,072-Speed 5967.34 samples/sec   Loss 1.8317   LearningRate 0.0024   Epoch: 18   Global Step: 192780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:04:16,926-Speed 5977.86 samples/sec   Loss 1.8019   LearningRate 0.0024   Epoch: 18   Global Step: 192790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:04:23,774-Speed 5982.88 samples/sec   Loss 1.8033   LearningRate 0.0024   Epoch: 18   Global Step: 192800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:04:30,626-Speed 5978.29 samples/sec   Loss 1.8142   LearningRate 0.0024   Epoch: 18   Global Step: 192810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:04:37,479-Speed 5978.24 samples/sec   Loss 1.8427   LearningRate 0.0024   Epoch: 18   Global Step: 192820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:04:44,332-Speed 5979.28 samples/sec   Loss 1.8451   LearningRate 0.0024   Epoch: 18   Global Step: 192830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:04:51,197-Speed 5966.57 samples/sec   Loss 1.8376   LearningRate 0.0024   Epoch: 18   Global Step: 192840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:04:58,054-Speed 5974.89 samples/sec   Loss 1.7947   LearningRate 0.0024   Epoch: 18   Global Step: 192850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:05:04,921-Speed 5966.67 samples/sec   Loss 1.7947   LearningRate 0.0024   Epoch: 18   Global Step: 192860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:05:11,820-Speed 5937.97 samples/sec   Loss 1.8383   LearningRate 0.0024   Epoch: 18   Global Step: 192870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:05:18,709-Speed 5946.64 samples/sec   Loss 1.8074   LearningRate 0.0024   Epoch: 18   Global Step: 192880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:05:25,587-Speed 5959.53 samples/sec   Loss 1.8001   LearningRate 0.0024   Epoch: 18   Global Step: 192890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:05:32,440-Speed 5978.03 samples/sec   Loss 1.7838   LearningRate 0.0024   Epoch: 18   Global Step: 192900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:05:39,304-Speed 5968.54 samples/sec   Loss 1.7864   LearningRate 0.0024   Epoch: 18   Global Step: 192910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:05:46,163-Speed 5973.94 samples/sec   Loss 1.8223   LearningRate 0.0024   Epoch: 18   Global Step: 192920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:05:53,030-Speed 5965.55 samples/sec   Loss 1.8218   LearningRate 0.0024   Epoch: 18   Global Step: 192930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:05:59,874-Speed 5986.15 samples/sec   Loss 1.8109   LearningRate 0.0024   Epoch: 18   Global Step: 192940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:06:06,736-Speed 5970.54 samples/sec   Loss 1.8019   LearningRate 0.0024   Epoch: 18   Global Step: 192950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:06:13,579-Speed 5986.34 samples/sec   Loss 1.8140   LearningRate 0.0024   Epoch: 18   Global Step: 192960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:06:20,434-Speed 5976.84 samples/sec   Loss 1.8315   LearningRate 0.0024   Epoch: 18   Global Step: 192970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:06:27,294-Speed 5971.38 samples/sec   Loss 1.7965   LearningRate 0.0024   Epoch: 18   Global Step: 192980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:06:34,139-Speed 5985.53 samples/sec   Loss 1.8586   LearningRate 0.0024   Epoch: 18   Global Step: 192990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:06:40,994-Speed 5976.45 samples/sec   Loss 1.8151   LearningRate 0.0024   Epoch: 18   Global Step: 193000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:06:47,837-Speed 5986.86 samples/sec   Loss 1.8087   LearningRate 0.0024   Epoch: 18   Global Step: 193010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:06:54,710-Speed 5960.03 samples/sec   Loss 1.8252   LearningRate 0.0024   Epoch: 18   Global Step: 193020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:07:01,556-Speed 5984.37 samples/sec   Loss 1.8302   LearningRate 0.0024   Epoch: 18   Global Step: 193030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:07:08,409-Speed 5978.62 samples/sec   Loss 1.8182   LearningRate 0.0024   Epoch: 18   Global Step: 193040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:07:15,265-Speed 5975.52 samples/sec   Loss 1.8355   LearningRate 0.0024   Epoch: 18   Global Step: 193050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:07:22,121-Speed 5975.55 samples/sec   Loss 1.8136   LearningRate 0.0024   Epoch: 18   Global Step: 193060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:07:29,036-Speed 5925.35 samples/sec   Loss 1.8199   LearningRate 0.0024   Epoch: 18   Global Step: 193070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:07:35,886-Speed 5981.21 samples/sec   Loss 1.7841   LearningRate 0.0023   Epoch: 18   Global Step: 193080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:07:42,749-Speed 5969.11 samples/sec   Loss 1.8230   LearningRate 0.0023   Epoch: 18   Global Step: 193090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:07:49,599-Speed 5981.01 samples/sec   Loss 1.8181   LearningRate 0.0023   Epoch: 18   Global Step: 193100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:07:56,449-Speed 5980.51 samples/sec   Loss 1.8146   LearningRate 0.0023   Epoch: 18   Global Step: 193110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:08:03,306-Speed 5974.41 samples/sec   Loss 1.8417   LearningRate 0.0023   Epoch: 18   Global Step: 193120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:08:10,160-Speed 5977.98 samples/sec   Loss 1.7874   LearningRate 0.0023   Epoch: 18   Global Step: 193130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:08:17,007-Speed 5983.11 samples/sec   Loss 1.8025   LearningRate 0.0023   Epoch: 18   Global Step: 193140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:08:23,854-Speed 5983.11 samples/sec   Loss 1.8148   LearningRate 0.0023   Epoch: 18   Global Step: 193150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:08:30,709-Speed 5976.95 samples/sec   Loss 1.8075   LearningRate 0.0023   Epoch: 18   Global Step: 193160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:08:37,570-Speed 5970.40 samples/sec   Loss 1.8250   LearningRate 0.0023   Epoch: 18   Global Step: 193170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:08:44,419-Speed 5981.45 samples/sec   Loss 1.8366   LearningRate 0.0023   Epoch: 18   Global Step: 193180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:08:51,286-Speed 5966.38 samples/sec   Loss 1.8329   LearningRate 0.0023   Epoch: 18   Global Step: 193190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:08:58,155-Speed 5964.04 samples/sec   Loss 1.7875   LearningRate 0.0023   Epoch: 18   Global Step: 193200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:09:05,000-Speed 5985.31 samples/sec   Loss 1.7961   LearningRate 0.0023   Epoch: 18   Global Step: 193210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:09:11,865-Speed 5967.29 samples/sec   Loss 1.8260   LearningRate 0.0023   Epoch: 18   Global Step: 193220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:09:18,713-Speed 5982.22 samples/sec   Loss 1.8017   LearningRate 0.0023   Epoch: 18   Global Step: 193230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:09:25,595-Speed 5954.85 samples/sec   Loss 1.8211   LearningRate 0.0023   Epoch: 18   Global Step: 193240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:09:32,487-Speed 5944.20 samples/sec   Loss 1.7938   LearningRate 0.0023   Epoch: 18   Global Step: 193250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:09:39,375-Speed 5947.86 samples/sec   Loss 1.8190   LearningRate 0.0023   Epoch: 18   Global Step: 193260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:09:46,250-Speed 5958.94 samples/sec   Loss 1.8360   LearningRate 0.0023   Epoch: 18   Global Step: 193270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:09:53,122-Speed 5961.44 samples/sec   Loss 1.8128   LearningRate 0.0023   Epoch: 18   Global Step: 193280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:09:59,984-Speed 5970.79 samples/sec   Loss 1.8181   LearningRate 0.0023   Epoch: 18   Global Step: 193290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:10:06,823-Speed 5990.26 samples/sec   Loss 1.7730   LearningRate 0.0023   Epoch: 18   Global Step: 193300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:10:13,684-Speed 5971.61 samples/sec   Loss 1.7763   LearningRate 0.0023   Epoch: 18   Global Step: 193310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:10:20,526-Speed 5986.47 samples/sec   Loss 1.7860   LearningRate 0.0023   Epoch: 18   Global Step: 193320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:10:27,380-Speed 5977.88 samples/sec   Loss 1.7921   LearningRate 0.0023   Epoch: 18   Global Step: 193330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:10:34,272-Speed 5943.99 samples/sec   Loss 1.8132   LearningRate 0.0023   Epoch: 18   Global Step: 193340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:10:41,143-Speed 5962.73 samples/sec   Loss 1.8051   LearningRate 0.0023   Epoch: 18   Global Step: 193350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:10:48,001-Speed 5973.44 samples/sec   Loss 1.7861   LearningRate 0.0023   Epoch: 18   Global Step: 193360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:10:54,857-Speed 5975.61 samples/sec   Loss 1.7927   LearningRate 0.0023   Epoch: 18   Global Step: 193370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:11:01,710-Speed 5978.29 samples/sec   Loss 1.7867   LearningRate 0.0023   Epoch: 18   Global Step: 193380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:11:08,570-Speed 5972.42 samples/sec   Loss 1.8211   LearningRate 0.0022   Epoch: 18   Global Step: 193390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:11:15,429-Speed 5973.19 samples/sec   Loss 1.8031   LearningRate 0.0022   Epoch: 18   Global Step: 193400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:11:22,299-Speed 5962.62 samples/sec   Loss 1.7806   LearningRate 0.0022   Epoch: 18   Global Step: 193410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:11:29,155-Speed 5976.02 samples/sec   Loss 1.7923   LearningRate 0.0022   Epoch: 18   Global Step: 193420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:11:36,015-Speed 5971.86 samples/sec   Loss 1.8032   LearningRate 0.0022   Epoch: 18   Global Step: 193430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:11:42,900-Speed 5950.70 samples/sec   Loss 1.7724   LearningRate 0.0022   Epoch: 18   Global Step: 193440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:11:49,767-Speed 5965.84 samples/sec   Loss 1.8116   LearningRate 0.0022   Epoch: 18   Global Step: 193450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:11:56,617-Speed 5981.03 samples/sec   Loss 1.8062   LearningRate 0.0022   Epoch: 18   Global Step: 193460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:12:03,466-Speed 5981.41 samples/sec   Loss 1.7804   LearningRate 0.0022   Epoch: 18   Global Step: 193470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:12:10,318-Speed 5979.77 samples/sec   Loss 1.7980   LearningRate 0.0022   Epoch: 18   Global Step: 193480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:12:17,182-Speed 5971.17 samples/sec   Loss 1.7792   LearningRate 0.0022   Epoch: 18   Global Step: 193490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:12:24,055-Speed 5960.54 samples/sec   Loss 1.8008   LearningRate 0.0022   Epoch: 18   Global Step: 193500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:12:30,900-Speed 5984.58 samples/sec   Loss 1.7888   LearningRate 0.0022   Epoch: 18   Global Step: 193510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:12:37,746-Speed 5984.37 samples/sec   Loss 1.7775   LearningRate 0.0022   Epoch: 18   Global Step: 193520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:12:44,596-Speed 5980.48 samples/sec   Loss 1.7808   LearningRate 0.0022   Epoch: 18   Global Step: 193530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:12:51,444-Speed 5983.13 samples/sec   Loss 1.8202   LearningRate 0.0022   Epoch: 18   Global Step: 193540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:12:58,334-Speed 5945.64 samples/sec   Loss 1.7698   LearningRate 0.0022   Epoch: 18   Global Step: 193550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:13:05,194-Speed 5971.69 samples/sec   Loss 1.8218   LearningRate 0.0022   Epoch: 18   Global Step: 193560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:13:12,073-Speed 5955.75 samples/sec   Loss 1.7923   LearningRate 0.0022   Epoch: 18   Global Step: 193570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:13:18,931-Speed 5974.73 samples/sec   Loss 1.7796   LearningRate 0.0022   Epoch: 18   Global Step: 193580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:13:25,797-Speed 5966.52 samples/sec   Loss 1.7688   LearningRate 0.0022   Epoch: 18   Global Step: 193590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:13:32,654-Speed 5974.87 samples/sec   Loss 1.8086   LearningRate 0.0022   Epoch: 18   Global Step: 193600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:13:45,111-Speed 3288.79 samples/sec   Loss 1.7868   LearningRate 0.0022   Epoch: 18   Global Step: 193610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:13:51,957-Speed 5984.28 samples/sec   Loss 1.7919   LearningRate 0.0022   Epoch: 18   Global Step: 193620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:13:58,823-Speed 5966.90 samples/sec   Loss 1.8039   LearningRate 0.0022   Epoch: 18   Global Step: 193630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:14:05,661-Speed 5990.92 samples/sec   Loss 1.7771   LearningRate 0.0022   Epoch: 18   Global Step: 193640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:14:12,494-Speed 5994.91 samples/sec   Loss 1.7618   LearningRate 0.0022   Epoch: 18   Global Step: 193650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:14:19,323-Speed 5998.83 samples/sec   Loss 1.8232   LearningRate 0.0022   Epoch: 18   Global Step: 193660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:14:26,192-Speed 5964.26 samples/sec   Loss 1.8087   LearningRate 0.0022   Epoch: 18   Global Step: 193670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:14:33,072-Speed 5954.95 samples/sec   Loss 1.8127   LearningRate 0.0022   Epoch: 18   Global Step: 193680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:14:39,980-Speed 5930.52 samples/sec   Loss 1.7775   LearningRate 0.0022   Epoch: 18   Global Step: 193690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:14:46,894-Speed 5926.87 samples/sec   Loss 1.7689   LearningRate 0.0021   Epoch: 18   Global Step: 193700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:14:53,739-Speed 5985.61 samples/sec   Loss 1.7786   LearningRate 0.0021   Epoch: 18   Global Step: 193710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:15:00,632-Speed 5943.66 samples/sec   Loss 1.7799   LearningRate 0.0021   Epoch: 18   Global Step: 193720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:15:07,563-Speed 5910.74 samples/sec   Loss 1.7910   LearningRate 0.0021   Epoch: 18   Global Step: 193730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:15:14,473-Speed 5931.47 samples/sec   Loss 1.7779   LearningRate 0.0021   Epoch: 18   Global Step: 193740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:15:21,332-Speed 5972.45 samples/sec   Loss 1.7766   LearningRate 0.0021   Epoch: 18   Global Step: 193750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:15:28,189-Speed 5974.53 samples/sec   Loss 1.7905   LearningRate 0.0021   Epoch: 18   Global Step: 193760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:15:35,098-Speed 5930.57 samples/sec   Loss 1.7758   LearningRate 0.0021   Epoch: 18   Global Step: 193770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:15:42,026-Speed 5913.52 samples/sec   Loss 1.7866   LearningRate 0.0021   Epoch: 18   Global Step: 193780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:15:48,952-Speed 5915.22 samples/sec   Loss 1.7764   LearningRate 0.0021   Epoch: 18   Global Step: 193790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:15:55,845-Speed 5946.25 samples/sec   Loss 1.7882   LearningRate 0.0021   Epoch: 18   Global Step: 193800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:16:02,707-Speed 5970.50 samples/sec   Loss 1.7483   LearningRate 0.0021   Epoch: 18   Global Step: 193810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:16:09,574-Speed 5965.76 samples/sec   Loss 1.7472   LearningRate 0.0021   Epoch: 18   Global Step: 193820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:16:16,432-Speed 5974.50 samples/sec   Loss 1.8054   LearningRate 0.0021   Epoch: 18   Global Step: 193830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:16:23,292-Speed 5971.77 samples/sec   Loss 1.7665   LearningRate 0.0021   Epoch: 18   Global Step: 193840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:16:30,207-Speed 5926.43 samples/sec   Loss 1.7569   LearningRate 0.0021   Epoch: 18   Global Step: 193850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:16:37,074-Speed 5965.93 samples/sec   Loss 1.7515   LearningRate 0.0021   Epoch: 18   Global Step: 193860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:16:43,933-Speed 5972.26 samples/sec   Loss 1.7581   LearningRate 0.0021   Epoch: 18   Global Step: 193870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:16:50,798-Speed 5968.81 samples/sec   Loss 1.7944   LearningRate 0.0021   Epoch: 18   Global Step: 193880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:16:57,655-Speed 5976.35 samples/sec   Loss 1.8005   LearningRate 0.0021   Epoch: 18   Global Step: 193890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:17:04,506-Speed 5979.57 samples/sec   Loss 1.7861   LearningRate 0.0021   Epoch: 18   Global Step: 193900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:17:11,370-Speed 5968.53 samples/sec   Loss 1.8068   LearningRate 0.0021   Epoch: 18   Global Step: 193910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:17:18,241-Speed 5964.31 samples/sec   Loss 1.7427   LearningRate 0.0021   Epoch: 18   Global Step: 193920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:17:25,098-Speed 5974.83 samples/sec   Loss 1.7938   LearningRate 0.0021   Epoch: 18   Global Step: 193930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:17:32,075-Speed 5871.77 samples/sec   Loss 1.7635   LearningRate 0.0021   Epoch: 18   Global Step: 193940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:17:39,023-Speed 5897.30 samples/sec   Loss 1.7434   LearningRate 0.0021   Epoch: 18   Global Step: 193950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:17:45,974-Speed 5895.86 samples/sec   Loss 1.7826   LearningRate 0.0021   Epoch: 18   Global Step: 193960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:17:52,840-Speed 5967.10 samples/sec   Loss 1.7823   LearningRate 0.0021   Epoch: 18   Global Step: 193970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:17:59,686-Speed 5983.68 samples/sec   Loss 1.7702   LearningRate 0.0021   Epoch: 18   Global Step: 193980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:18:06,538-Speed 5979.45 samples/sec   Loss 1.8009   LearningRate 0.0021   Epoch: 18   Global Step: 193990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:18:13,485-Speed 5898.46 samples/sec   Loss 1.7487   LearningRate 0.0021   Epoch: 18   Global Step: 194000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:18:20,354-Speed 5964.02 samples/sec   Loss 1.7654   LearningRate 0.0021   Epoch: 18   Global Step: 194010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:18:27,224-Speed 5963.34 samples/sec   Loss 1.7757   LearningRate 0.0020   Epoch: 18   Global Step: 194020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:18:34,096-Speed 5961.42 samples/sec   Loss 1.7995   LearningRate 0.0020   Epoch: 18   Global Step: 194030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:18:40,972-Speed 5960.49 samples/sec   Loss 1.7589   LearningRate 0.0020   Epoch: 18   Global Step: 194040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:18:47,873-Speed 5936.25 samples/sec   Loss 1.7469   LearningRate 0.0020   Epoch: 18   Global Step: 194050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:18:54,746-Speed 5961.36 samples/sec   Loss 1.7954   LearningRate 0.0020   Epoch: 18   Global Step: 194060   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-01-09 10:19:01,584-Speed 5991.36 samples/sec   Loss 1.7577   LearningRate 0.0020   Epoch: 18   Global Step: 194070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:19:08,467-Speed 5952.28 samples/sec   Loss 1.7673   LearningRate 0.0020   Epoch: 18   Global Step: 194080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:19:15,407-Speed 5903.21 samples/sec   Loss 1.7684   LearningRate 0.0020   Epoch: 18   Global Step: 194090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:19:22,278-Speed 5962.41 samples/sec   Loss 1.7890   LearningRate 0.0020   Epoch: 18   Global Step: 194100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:19:29,143-Speed 5968.27 samples/sec   Loss 1.7647   LearningRate 0.0020   Epoch: 18   Global Step: 194110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:19:36,043-Speed 5937.28 samples/sec   Loss 1.7828   LearningRate 0.0020   Epoch: 18   Global Step: 194120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:19:42,911-Speed 5967.12 samples/sec   Loss 1.7733   LearningRate 0.0020   Epoch: 18   Global Step: 194130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:19:49,773-Speed 5969.89 samples/sec   Loss 1.7412   LearningRate 0.0020   Epoch: 18   Global Step: 194140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:19:56,640-Speed 5966.59 samples/sec   Loss 1.7601   LearningRate 0.0020   Epoch: 18   Global Step: 194150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:20:03,493-Speed 5978.17 samples/sec   Loss 1.7621   LearningRate 0.0020   Epoch: 18   Global Step: 194160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:20:10,347-Speed 5976.79 samples/sec   Loss 1.7674   LearningRate 0.0020   Epoch: 18   Global Step: 194170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:20:17,214-Speed 5965.81 samples/sec   Loss 1.7763   LearningRate 0.0020   Epoch: 18   Global Step: 194180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:20:24,064-Speed 5980.79 samples/sec   Loss 1.7542   LearningRate 0.0020   Epoch: 18   Global Step: 194190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:20:30,907-Speed 5986.62 samples/sec   Loss 1.7567   LearningRate 0.0020   Epoch: 18   Global Step: 194200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:20:37,822-Speed 5924.62 samples/sec   Loss 1.7646   LearningRate 0.0020   Epoch: 18   Global Step: 194210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:20:44,696-Speed 5960.32 samples/sec   Loss 1.7626   LearningRate 0.0020   Epoch: 18   Global Step: 194220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:20:51,602-Speed 5931.88 samples/sec   Loss 1.7835   LearningRate 0.0020   Epoch: 18   Global Step: 194230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:20:58,500-Speed 5939.35 samples/sec   Loss 1.7559   LearningRate 0.0020   Epoch: 18   Global Step: 194240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:21:05,398-Speed 5939.05 samples/sec   Loss 1.7578   LearningRate 0.0020   Epoch: 18   Global Step: 194250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:21:12,304-Speed 5931.95 samples/sec   Loss 1.7630   LearningRate 0.0020   Epoch: 18   Global Step: 194260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:21:19,151-Speed 5983.83 samples/sec   Loss 1.7817   LearningRate 0.0020   Epoch: 18   Global Step: 194270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:21:26,036-Speed 5950.43 samples/sec   Loss 1.7793   LearningRate 0.0020   Epoch: 18   Global Step: 194280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:21:32,889-Speed 5977.49 samples/sec   Loss 1.7593   LearningRate 0.0020   Epoch: 18   Global Step: 194290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:21:39,750-Speed 5971.62 samples/sec   Loss 1.7594   LearningRate 0.0020   Epoch: 18   Global Step: 194300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:21:46,618-Speed 5967.42 samples/sec   Loss 1.7581   LearningRate 0.0020   Epoch: 18   Global Step: 194310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:21:53,478-Speed 5971.34 samples/sec   Loss 1.7764   LearningRate 0.0020   Epoch: 18   Global Step: 194320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:22:00,359-Speed 5954.67 samples/sec   Loss 1.7723   LearningRate 0.0020   Epoch: 18   Global Step: 194330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:22:07,228-Speed 5964.11 samples/sec   Loss 1.7702   LearningRate 0.0020   Epoch: 18   Global Step: 194340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:22:14,074-Speed 5983.48 samples/sec   Loss 1.7845   LearningRate 0.0019   Epoch: 18   Global Step: 194350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:22:20,929-Speed 5976.21 samples/sec   Loss 1.7717   LearningRate 0.0019   Epoch: 18   Global Step: 194360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:22:27,776-Speed 5983.29 samples/sec   Loss 1.7716   LearningRate 0.0019   Epoch: 18   Global Step: 194370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:22:34,633-Speed 5974.57 samples/sec   Loss 1.7658   LearningRate 0.0019   Epoch: 18   Global Step: 194380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:22:41,480-Speed 5982.96 samples/sec   Loss 1.7584   LearningRate 0.0019   Epoch: 18   Global Step: 194390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:22:48,326-Speed 5984.32 samples/sec   Loss 1.7823   LearningRate 0.0019   Epoch: 18   Global Step: 194400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:22:55,175-Speed 5981.27 samples/sec   Loss 1.7293   LearningRate 0.0019   Epoch: 18   Global Step: 194410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:23:02,033-Speed 5973.64 samples/sec   Loss 1.7678   LearningRate 0.0019   Epoch: 18   Global Step: 194420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:23:08,896-Speed 5969.64 samples/sec   Loss 1.7385   LearningRate 0.0019   Epoch: 18   Global Step: 194430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:23:15,744-Speed 5982.13 samples/sec   Loss 1.7692   LearningRate 0.0019   Epoch: 18   Global Step: 194440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:23:22,578-Speed 5994.45 samples/sec   Loss 1.7799   LearningRate 0.0019   Epoch: 18   Global Step: 194450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:23:29,443-Speed 5968.33 samples/sec   Loss 1.7399   LearningRate 0.0019   Epoch: 18   Global Step: 194460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:23:36,296-Speed 5977.77 samples/sec   Loss 1.7434   LearningRate 0.0019   Epoch: 18   Global Step: 194470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:23:43,179-Speed 5952.24 samples/sec   Loss 1.7399   LearningRate 0.0019   Epoch: 18   Global Step: 194480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:23:50,039-Speed 5972.59 samples/sec   Loss 1.7338   LearningRate 0.0019   Epoch: 18   Global Step: 194490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:23:56,913-Speed 5959.23 samples/sec   Loss 1.7473   LearningRate 0.0019   Epoch: 18   Global Step: 194500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:24:03,760-Speed 5983.95 samples/sec   Loss 1.7626   LearningRate 0.0019   Epoch: 18   Global Step: 194510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:24:10,627-Speed 5966.05 samples/sec   Loss 1.7466   LearningRate 0.0019   Epoch: 18   Global Step: 194520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:24:17,485-Speed 5974.07 samples/sec   Loss 1.7643   LearningRate 0.0019   Epoch: 18   Global Step: 194530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:24:24,335-Speed 5980.45 samples/sec   Loss 1.7885   LearningRate 0.0019   Epoch: 18   Global Step: 194540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:24:31,188-Speed 5978.96 samples/sec   Loss 1.7330   LearningRate 0.0019   Epoch: 18   Global Step: 194550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:24:38,046-Speed 5973.09 samples/sec   Loss 1.7408   LearningRate 0.0019   Epoch: 18   Global Step: 194560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:24:44,901-Speed 5976.83 samples/sec   Loss 1.7314   LearningRate 0.0019   Epoch: 18   Global Step: 194570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-01-09 10:24:51,750-Speed 5981.92 samples/sec   Loss 1.7367   LearningRate 0.0019   Epoch: 18   Global Step: 194580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:24:58,601-Speed 5979.57 samples/sec   Loss 1.7476   LearningRate 0.0019   Epoch: 18   Global Step: 194590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:25:05,459-Speed 5973.47 samples/sec   Loss 1.7038   LearningRate 0.0019   Epoch: 18   Global Step: 194600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:25:12,443-Speed 5866.68 samples/sec   Loss 1.7261   LearningRate 0.0019   Epoch: 18   Global Step: 194610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:25:19,305-Speed 5970.70 samples/sec   Loss 1.7274   LearningRate 0.0019   Epoch: 18   Global Step: 194620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-01-09 10:25:26,172-Speed 5966.27 samples/sec   Loss 1.7742   LearningRate 0.0019   Epoch: 18   Global Step: 194630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:25:33,112-Speed 5904.98 samples/sec   Loss 1.7232   LearningRate 0.0019   Epoch: 18   Global Step: 194640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:25:40,065-Speed 5892.42 samples/sec   Loss 1.7495   LearningRate 0.0019   Epoch: 18   Global Step: 194650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:25:46,918-Speed 5978.11 samples/sec   Loss 1.7664   LearningRate 0.0019   Epoch: 18   Global Step: 194660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:25:53,775-Speed 5977.82 samples/sec   Loss 1.7841   LearningRate 0.0019   Epoch: 18   Global Step: 194670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:26:00,632-Speed 5974.18 samples/sec   Loss 1.7694   LearningRate 0.0019   Epoch: 18   Global Step: 194680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:26:07,506-Speed 5960.02 samples/sec   Loss 1.7407   LearningRate 0.0018   Epoch: 18   Global Step: 194690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:26:14,356-Speed 5980.63 samples/sec   Loss 1.7221   LearningRate 0.0018   Epoch: 18   Global Step: 194700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:26:21,222-Speed 5966.93 samples/sec   Loss 1.7469   LearningRate 0.0018   Epoch: 18   Global Step: 194710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:26:28,072-Speed 5980.75 samples/sec   Loss 1.7537   LearningRate 0.0018   Epoch: 18   Global Step: 194720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:26:34,930-Speed 5974.77 samples/sec   Loss 1.7400   LearningRate 0.0018   Epoch: 18   Global Step: 194730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:26:41,864-Speed 5907.97 samples/sec   Loss 1.7681   LearningRate 0.0018   Epoch: 18   Global Step: 194740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:26:48,736-Speed 5961.35 samples/sec   Loss 1.7200   LearningRate 0.0018   Epoch: 18   Global Step: 194750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:26:55,586-Speed 5981.42 samples/sec   Loss 1.7412   LearningRate 0.0018   Epoch: 18   Global Step: 194760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:27:02,457-Speed 5961.68 samples/sec   Loss 1.7615   LearningRate 0.0018   Epoch: 18   Global Step: 194770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:27:09,315-Speed 5974.26 samples/sec   Loss 1.7476   LearningRate 0.0018   Epoch: 18   Global Step: 194780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:27:16,160-Speed 5985.36 samples/sec   Loss 1.7276   LearningRate 0.0018   Epoch: 18   Global Step: 194790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:27:23,011-Speed 5979.19 samples/sec   Loss 1.7462   LearningRate 0.0018   Epoch: 18   Global Step: 194800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:27:29,871-Speed 5972.79 samples/sec   Loss 1.7468   LearningRate 0.0018   Epoch: 18   Global Step: 194810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:27:36,719-Speed 5982.37 samples/sec   Loss 1.7058   LearningRate 0.0018   Epoch: 18   Global Step: 194820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:27:43,588-Speed 5964.10 samples/sec   Loss 1.7613   LearningRate 0.0018   Epoch: 18   Global Step: 194830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:27:50,429-Speed 5989.08 samples/sec   Loss 1.7428   LearningRate 0.0018   Epoch: 18   Global Step: 194840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:27:57,302-Speed 5960.51 samples/sec   Loss 1.7261   LearningRate 0.0018   Epoch: 18   Global Step: 194850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:28:04,164-Speed 5970.02 samples/sec   Loss 1.7654   LearningRate 0.0018   Epoch: 18   Global Step: 194860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:28:11,044-Speed 5956.31 samples/sec   Loss 1.7441   LearningRate 0.0018   Epoch: 18   Global Step: 194870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:28:17,899-Speed 5976.37 samples/sec   Loss 1.7211   LearningRate 0.0018   Epoch: 18   Global Step: 194880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:28:24,759-Speed 5972.20 samples/sec   Loss 1.7770   LearningRate 0.0018   Epoch: 18   Global Step: 194890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:28:31,662-Speed 5934.92 samples/sec   Loss 1.7382   LearningRate 0.0018   Epoch: 18   Global Step: 194900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:28:38,560-Speed 5939.79 samples/sec   Loss 1.7394   LearningRate 0.0018   Epoch: 18   Global Step: 194910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:28:45,431-Speed 5962.23 samples/sec   Loss 1.7137   LearningRate 0.0018   Epoch: 18   Global Step: 194920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:28:52,268-Speed 5991.70 samples/sec   Loss 1.7464   LearningRate 0.0018   Epoch: 18   Global Step: 194930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:28:59,118-Speed 5981.10 samples/sec   Loss 1.7214   LearningRate 0.0018   Epoch: 18   Global Step: 194940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:29:05,951-Speed 5994.65 samples/sec   Loss 1.7371   LearningRate 0.0018   Epoch: 18   Global Step: 194950   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 10:29:12,814-Speed 5969.21 samples/sec   Loss 1.7662   LearningRate 0.0018   Epoch: 18   Global Step: 194960   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 10:29:19,673-Speed 5973.33 samples/sec   Loss 1.7393   LearningRate 0.0018   Epoch: 18   Global Step: 194970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 10:29:26,533-Speed 5971.40 samples/sec   Loss 1.7329   LearningRate 0.0018   Epoch: 18   Global Step: 194980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 10:29:33,386-Speed 5978.44 samples/sec   Loss 1.7445   LearningRate 0.0018   Epoch: 18   Global Step: 194990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 10:29:40,235-Speed 5981.25 samples/sec   Loss 1.7025   LearningRate 0.0018   Epoch: 18   Global Step: 195000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 10:30:06,980-[lfw][195000]XNorm: 23.604671
Training: 2022-01-09 10:30:06,980-[lfw][195000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-01-09 10:30:06,981-[lfw][195000]Accuracy-Highest: 0.99833
Training: 2022-01-09 10:30:38,026-[cfp_fp][195000]XNorm: 21.649061
Training: 2022-01-09 10:30:38,027-[cfp_fp][195000]Accuracy-Flip: 0.99286+-0.00313
Training: 2022-01-09 10:30:38,027-[cfp_fp][195000]Accuracy-Highest: 0.99286
Training: 2022-01-09 10:31:04,928-[agedb_30][195000]XNorm: 23.145942
Training: 2022-01-09 10:31:04,929-[agedb_30][195000]Accuracy-Flip: 0.98233+-0.00583
Training: 2022-01-09 10:31:04,930-[agedb_30][195000]Accuracy-Highest: 0.98233
Training: 2022-01-09 10:31:11,761-Speed 447.53 samples/sec   Loss 1.7544   LearningRate 0.0018   Epoch: 18   Global Step: 195010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 10:31:18,587-Speed 6001.40 samples/sec   Loss 1.7183   LearningRate 0.0018   Epoch: 18   Global Step: 195020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 10:31:25,454-Speed 5965.80 samples/sec   Loss 1.7370   LearningRate 0.0018   Epoch: 18   Global Step: 195030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 10:31:32,299-Speed 5985.02 samples/sec   Loss 1.7591   LearningRate 0.0017   Epoch: 18   Global Step: 195040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-01-09 10:31:39,157-Speed 5976.46 samples/sec   Loss 1.7161   LearningRate 0.0017   Epoch: 18   Global Step: 195050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:31:46,034-Speed 5957.56 samples/sec   Loss 1.7307   LearningRate 0.0017   Epoch: 18   Global Step: 195060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:31:52,906-Speed 5962.11 samples/sec   Loss 1.7245   LearningRate 0.0017   Epoch: 18   Global Step: 195070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:31:59,772-Speed 5966.14 samples/sec   Loss 1.7288   LearningRate 0.0017   Epoch: 18   Global Step: 195080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:32:06,643-Speed 5963.34 samples/sec   Loss 1.7354   LearningRate 0.0017   Epoch: 18   Global Step: 195090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:32:13,497-Speed 5977.28 samples/sec   Loss 1.7238   LearningRate 0.0017   Epoch: 18   Global Step: 195100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:32:20,363-Speed 5966.90 samples/sec   Loss 1.7420   LearningRate 0.0017   Epoch: 18   Global Step: 195110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:32:27,237-Speed 5959.98 samples/sec   Loss 1.7292   LearningRate 0.0017   Epoch: 18   Global Step: 195120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:32:34,099-Speed 5970.12 samples/sec   Loss 1.7216   LearningRate 0.0017   Epoch: 18   Global Step: 195130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:32:40,961-Speed 5969.98 samples/sec   Loss 1.7074   LearningRate 0.0017   Epoch: 18   Global Step: 195140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:32:47,815-Speed 5977.75 samples/sec   Loss 1.7156   LearningRate 0.0017   Epoch: 18   Global Step: 195150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:32:54,664-Speed 5981.67 samples/sec   Loss 1.7159   LearningRate 0.0017   Epoch: 18   Global Step: 195160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:33:01,513-Speed 5981.69 samples/sec   Loss 1.7444   LearningRate 0.0017   Epoch: 18   Global Step: 195170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:33:08,371-Speed 5973.39 samples/sec   Loss 1.7191   LearningRate 0.0017   Epoch: 18   Global Step: 195180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:33:15,240-Speed 5964.54 samples/sec   Loss 1.7140   LearningRate 0.0017   Epoch: 18   Global Step: 195190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:33:22,096-Speed 5975.24 samples/sec   Loss 1.7133   LearningRate 0.0017   Epoch: 18   Global Step: 195200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:33:28,947-Speed 5980.90 samples/sec   Loss 1.7083   LearningRate 0.0017   Epoch: 18   Global Step: 195210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:33:35,785-Speed 5990.90 samples/sec   Loss 1.6837   LearningRate 0.0017   Epoch: 18   Global Step: 195220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:33:42,624-Speed 5990.12 samples/sec   Loss 1.7575   LearningRate 0.0017   Epoch: 18   Global Step: 195230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:33:49,482-Speed 5973.41 samples/sec   Loss 1.7172   LearningRate 0.0017   Epoch: 18   Global Step: 195240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:33:56,351-Speed 5965.18 samples/sec   Loss 1.7695   LearningRate 0.0017   Epoch: 18   Global Step: 195250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:34:03,193-Speed 5987.97 samples/sec   Loss 1.7496   LearningRate 0.0017   Epoch: 18   Global Step: 195260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:34:10,044-Speed 5979.97 samples/sec   Loss 1.7412   LearningRate 0.0017   Epoch: 18   Global Step: 195270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:34:16,887-Speed 5986.89 samples/sec   Loss 1.7342   LearningRate 0.0017   Epoch: 18   Global Step: 195280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:34:23,724-Speed 5991.79 samples/sec   Loss 1.7104   LearningRate 0.0017   Epoch: 18   Global Step: 195290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:34:30,577-Speed 5978.13 samples/sec   Loss 1.7188   LearningRate 0.0017   Epoch: 18   Global Step: 195300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:34:37,459-Speed 5954.14 samples/sec   Loss 1.7096   LearningRate 0.0017   Epoch: 18   Global Step: 195310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:34:44,326-Speed 5965.56 samples/sec   Loss 1.7102   LearningRate 0.0017   Epoch: 18   Global Step: 195320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:34:51,176-Speed 5983.65 samples/sec   Loss 1.6973   LearningRate 0.0017   Epoch: 18   Global Step: 195330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:34:58,035-Speed 5972.27 samples/sec   Loss 1.7199   LearningRate 0.0017   Epoch: 18   Global Step: 195340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:35:04,892-Speed 5974.89 samples/sec   Loss 1.7149   LearningRate 0.0017   Epoch: 18   Global Step: 195350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:35:11,738-Speed 5984.59 samples/sec   Loss 1.7409   LearningRate 0.0017   Epoch: 18   Global Step: 195360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:35:18,585-Speed 5983.18 samples/sec   Loss 1.7160   LearningRate 0.0017   Epoch: 18   Global Step: 195370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:35:25,420-Speed 5993.85 samples/sec   Loss 1.7131   LearningRate 0.0017   Epoch: 18   Global Step: 195380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:35:32,254-Speed 5995.17 samples/sec   Loss 1.7335   LearningRate 0.0017   Epoch: 18   Global Step: 195390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:35:39,087-Speed 5995.28 samples/sec   Loss 1.7317   LearningRate 0.0016   Epoch: 18   Global Step: 195400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:35:45,952-Speed 5967.33 samples/sec   Loss 1.7064   LearningRate 0.0016   Epoch: 18   Global Step: 195410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:35:52,838-Speed 5949.67 samples/sec   Loss 1.7099   LearningRate 0.0016   Epoch: 18   Global Step: 195420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:36:05,519-Speed 3230.33 samples/sec   Loss 1.7341   LearningRate 0.0016   Epoch: 18   Global Step: 195430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:36:12,363-Speed 5987.63 samples/sec   Loss 1.7249   LearningRate 0.0016   Epoch: 18   Global Step: 195440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:36:19,194-Speed 5996.40 samples/sec   Loss 1.7054   LearningRate 0.0016   Epoch: 18   Global Step: 195450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:36:26,068-Speed 5960.53 samples/sec   Loss 1.6874   LearningRate 0.0016   Epoch: 18   Global Step: 195460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:36:32,927-Speed 5972.56 samples/sec   Loss 1.7170   LearningRate 0.0016   Epoch: 18   Global Step: 195470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:36:39,764-Speed 5991.72 samples/sec   Loss 1.7128   LearningRate 0.0016   Epoch: 18   Global Step: 195480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:36:46,616-Speed 5979.47 samples/sec   Loss 1.7087   LearningRate 0.0016   Epoch: 18   Global Step: 195490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:36:53,468-Speed 5978.88 samples/sec   Loss 1.6823   LearningRate 0.0016   Epoch: 18   Global Step: 195500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:37:00,352-Speed 5951.45 samples/sec   Loss 1.6892   LearningRate 0.0016   Epoch: 18   Global Step: 195510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:37:07,210-Speed 5973.88 samples/sec   Loss 1.7056   LearningRate 0.0016   Epoch: 18   Global Step: 195520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:37:14,107-Speed 5940.45 samples/sec   Loss 1.6998   LearningRate 0.0016   Epoch: 18   Global Step: 195530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:37:20,956-Speed 5981.18 samples/sec   Loss 1.7038   LearningRate 0.0016   Epoch: 18   Global Step: 195540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:37:27,823-Speed 5965.91 samples/sec   Loss 1.7508   LearningRate 0.0016   Epoch: 18   Global Step: 195550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:37:34,673-Speed 5981.26 samples/sec   Loss 1.6877   LearningRate 0.0016   Epoch: 18   Global Step: 195560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:37:41,544-Speed 5962.25 samples/sec   Loss 1.7134   LearningRate 0.0016   Epoch: 18   Global Step: 195570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:37:48,479-Speed 5907.26 samples/sec   Loss 1.7025   LearningRate 0.0016   Epoch: 18   Global Step: 195580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:37:55,335-Speed 5976.37 samples/sec   Loss 1.7447   LearningRate 0.0016   Epoch: 18   Global Step: 195590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:38:02,223-Speed 5947.32 samples/sec   Loss 1.6777   LearningRate 0.0016   Epoch: 18   Global Step: 195600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:38:09,097-Speed 5959.96 samples/sec   Loss 1.7076   LearningRate 0.0016   Epoch: 18   Global Step: 195610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:38:15,976-Speed 5955.83 samples/sec   Loss 1.7021   LearningRate 0.0016   Epoch: 18   Global Step: 195620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:38:22,827-Speed 5979.47 samples/sec   Loss 1.6898   LearningRate 0.0016   Epoch: 18   Global Step: 195630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:38:29,676-Speed 5981.62 samples/sec   Loss 1.6844   LearningRate 0.0016   Epoch: 18   Global Step: 195640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:38:36,546-Speed 5963.01 samples/sec   Loss 1.6869   LearningRate 0.0016   Epoch: 18   Global Step: 195650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:38:43,415-Speed 5964.14 samples/sec   Loss 1.7270   LearningRate 0.0016   Epoch: 18   Global Step: 195660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:38:50,261-Speed 5984.81 samples/sec   Loss 1.7157   LearningRate 0.0016   Epoch: 18   Global Step: 195670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:38:57,202-Speed 5902.02 samples/sec   Loss 1.6719   LearningRate 0.0016   Epoch: 18   Global Step: 195680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:39:04,058-Speed 5975.23 samples/sec   Loss 1.6935   LearningRate 0.0016   Epoch: 18   Global Step: 195690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:39:11,015-Speed 5889.26 samples/sec   Loss 1.6966   LearningRate 0.0016   Epoch: 18   Global Step: 195700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:39:17,866-Speed 5979.63 samples/sec   Loss 1.6939   LearningRate 0.0016   Epoch: 18   Global Step: 195710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:39:24,748-Speed 5952.37 samples/sec   Loss 1.6817   LearningRate 0.0016   Epoch: 18   Global Step: 195720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:39:31,603-Speed 5976.28 samples/sec   Loss 1.7125   LearningRate 0.0016   Epoch: 18   Global Step: 195730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:39:38,481-Speed 5956.81 samples/sec   Loss 1.7097   LearningRate 0.0016   Epoch: 18   Global Step: 195740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:39:45,344-Speed 5969.22 samples/sec   Loss 1.7239   LearningRate 0.0016   Epoch: 18   Global Step: 195750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:39:52,211-Speed 5966.23 samples/sec   Loss 1.7072   LearningRate 0.0016   Epoch: 18   Global Step: 195760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:39:59,083-Speed 5961.07 samples/sec   Loss 1.6745   LearningRate 0.0015   Epoch: 18   Global Step: 195770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:40:05,957-Speed 5960.22 samples/sec   Loss 1.6872   LearningRate 0.0015   Epoch: 18   Global Step: 195780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:40:12,827-Speed 5963.71 samples/sec   Loss 1.7020   LearningRate 0.0015   Epoch: 18   Global Step: 195790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:40:19,679-Speed 5978.83 samples/sec   Loss 1.7194   LearningRate 0.0015   Epoch: 18   Global Step: 195800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:40:26,535-Speed 5975.05 samples/sec   Loss 1.6713   LearningRate 0.0015   Epoch: 18   Global Step: 195810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:40:33,389-Speed 5977.83 samples/sec   Loss 1.7442   LearningRate 0.0015   Epoch: 18   Global Step: 195820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:40:40,269-Speed 5954.39 samples/sec   Loss 1.7045   LearningRate 0.0015   Epoch: 18   Global Step: 195830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:40:47,122-Speed 5977.68 samples/sec   Loss 1.7158   LearningRate 0.0015   Epoch: 18   Global Step: 195840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:40:53,979-Speed 5974.35 samples/sec   Loss 1.7337   LearningRate 0.0015   Epoch: 18   Global Step: 195850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:41:00,826-Speed 5984.06 samples/sec   Loss 1.6786   LearningRate 0.0015   Epoch: 18   Global Step: 195860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:41:07,717-Speed 5944.79 samples/sec   Loss 1.7079   LearningRate 0.0015   Epoch: 18   Global Step: 195870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:41:14,613-Speed 5940.94 samples/sec   Loss 1.7059   LearningRate 0.0015   Epoch: 18   Global Step: 195880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:41:21,467-Speed 5976.40 samples/sec   Loss 1.7095   LearningRate 0.0015   Epoch: 18   Global Step: 195890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:41:28,350-Speed 5952.16 samples/sec   Loss 1.7063   LearningRate 0.0015   Epoch: 18   Global Step: 195900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:41:35,228-Speed 5956.55 samples/sec   Loss 1.6683   LearningRate 0.0015   Epoch: 18   Global Step: 195910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:41:42,084-Speed 5975.63 samples/sec   Loss 1.6920   LearningRate 0.0015   Epoch: 18   Global Step: 195920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:41:48,958-Speed 5960.21 samples/sec   Loss 1.7375   LearningRate 0.0015   Epoch: 18   Global Step: 195930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:41:55,813-Speed 5975.95 samples/sec   Loss 1.6900   LearningRate 0.0015   Epoch: 18   Global Step: 195940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:42:02,690-Speed 5957.42 samples/sec   Loss 1.6653   LearningRate 0.0015   Epoch: 18   Global Step: 195950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:42:09,554-Speed 5969.30 samples/sec   Loss 1.6878   LearningRate 0.0015   Epoch: 18   Global Step: 195960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:42:16,419-Speed 5967.28 samples/sec   Loss 1.7122   LearningRate 0.0015   Epoch: 18   Global Step: 195970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:42:23,287-Speed 5965.25 samples/sec   Loss 1.7153   LearningRate 0.0015   Epoch: 18   Global Step: 195980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:42:30,165-Speed 5956.27 samples/sec   Loss 1.6809   LearningRate 0.0015   Epoch: 18   Global Step: 195990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:42:37,041-Speed 5958.23 samples/sec   Loss 1.6897   LearningRate 0.0015   Epoch: 18   Global Step: 196000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:42:43,912-Speed 5962.42 samples/sec   Loss 1.7097   LearningRate 0.0015   Epoch: 18   Global Step: 196010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:42:50,780-Speed 5965.01 samples/sec   Loss 1.6897   LearningRate 0.0015   Epoch: 18   Global Step: 196020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:42:57,636-Speed 5974.97 samples/sec   Loss 1.6842   LearningRate 0.0015   Epoch: 18   Global Step: 196030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:43:04,488-Speed 5979.29 samples/sec   Loss 1.6844   LearningRate 0.0015   Epoch: 18   Global Step: 196040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:43:11,373-Speed 5950.70 samples/sec   Loss 1.7073   LearningRate 0.0015   Epoch: 18   Global Step: 196050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:43:18,235-Speed 5969.98 samples/sec   Loss 1.7014   LearningRate 0.0015   Epoch: 18   Global Step: 196060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:43:25,092-Speed 5974.61 samples/sec   Loss 1.7043   LearningRate 0.0015   Epoch: 18   Global Step: 196070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:43:31,950-Speed 5974.28 samples/sec   Loss 1.6815   LearningRate 0.0015   Epoch: 18   Global Step: 196080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:43:38,807-Speed 5974.72 samples/sec   Loss 1.7081   LearningRate 0.0015   Epoch: 18   Global Step: 196090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:43:45,667-Speed 5972.75 samples/sec   Loss 1.6953   LearningRate 0.0015   Epoch: 18   Global Step: 196100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:43:52,524-Speed 5974.49 samples/sec   Loss 1.6974   LearningRate 0.0015   Epoch: 18   Global Step: 196110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:43:59,379-Speed 5976.34 samples/sec   Loss 1.6911   LearningRate 0.0015   Epoch: 18   Global Step: 196120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:44:06,234-Speed 5976.19 samples/sec   Loss 1.6691   LearningRate 0.0015   Epoch: 18   Global Step: 196130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:44:13,155-Speed 5919.59 samples/sec   Loss 1.6498   LearningRate 0.0015   Epoch: 18   Global Step: 196140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:44:20,067-Speed 5927.43 samples/sec   Loss 1.7171   LearningRate 0.0014   Epoch: 18   Global Step: 196150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:44:26,920-Speed 5977.62 samples/sec   Loss 1.6733   LearningRate 0.0014   Epoch: 18   Global Step: 196160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:44:33,783-Speed 5968.98 samples/sec   Loss 1.7031   LearningRate 0.0014   Epoch: 18   Global Step: 196170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:44:40,642-Speed 5973.49 samples/sec   Loss 1.7103   LearningRate 0.0014   Epoch: 18   Global Step: 196180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:44:47,503-Speed 5970.75 samples/sec   Loss 1.6949   LearningRate 0.0014   Epoch: 18   Global Step: 196190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:44:54,380-Speed 5957.66 samples/sec   Loss 1.6804   LearningRate 0.0014   Epoch: 18   Global Step: 196200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:45:01,254-Speed 5959.40 samples/sec   Loss 1.7076   LearningRate 0.0014   Epoch: 18   Global Step: 196210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:45:08,124-Speed 5963.33 samples/sec   Loss 1.6869   LearningRate 0.0014   Epoch: 18   Global Step: 196220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:45:15,008-Speed 5951.38 samples/sec   Loss 1.6718   LearningRate 0.0014   Epoch: 18   Global Step: 196230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:45:21,884-Speed 5958.54 samples/sec   Loss 1.7132   LearningRate 0.0014   Epoch: 18   Global Step: 196240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:45:28,747-Speed 5968.56 samples/sec   Loss 1.6746   LearningRate 0.0014   Epoch: 18   Global Step: 196250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:45:35,621-Speed 5960.29 samples/sec   Loss 1.7088   LearningRate 0.0014   Epoch: 18   Global Step: 196260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:45:42,486-Speed 5967.99 samples/sec   Loss 1.6844   LearningRate 0.0014   Epoch: 18   Global Step: 196270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:45:49,349-Speed 5968.91 samples/sec   Loss 1.7075   LearningRate 0.0014   Epoch: 18   Global Step: 196280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:45:56,214-Speed 5968.17 samples/sec   Loss 1.7055   LearningRate 0.0014   Epoch: 18   Global Step: 196290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:46:03,071-Speed 5974.26 samples/sec   Loss 1.6624   LearningRate 0.0014   Epoch: 18   Global Step: 196300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:46:09,923-Speed 5979.40 samples/sec   Loss 1.6900   LearningRate 0.0014   Epoch: 18   Global Step: 196310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:46:16,782-Speed 5972.18 samples/sec   Loss 1.6779   LearningRate 0.0014   Epoch: 18   Global Step: 196320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:46:23,650-Speed 5969.13 samples/sec   Loss 1.6666   LearningRate 0.0014   Epoch: 18   Global Step: 196330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:46:30,528-Speed 5956.28 samples/sec   Loss 1.6806   LearningRate 0.0014   Epoch: 18   Global Step: 196340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:46:37,378-Speed 5979.90 samples/sec   Loss 1.6754   LearningRate 0.0014   Epoch: 18   Global Step: 196350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:46:44,239-Speed 5971.53 samples/sec   Loss 1.6923   LearningRate 0.0014   Epoch: 18   Global Step: 196360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:46:51,119-Speed 5955.22 samples/sec   Loss 1.6986   LearningRate 0.0014   Epoch: 18   Global Step: 196370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:46:57,993-Speed 5959.10 samples/sec   Loss 1.7057   LearningRate 0.0014   Epoch: 18   Global Step: 196380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:47:04,888-Speed 5942.19 samples/sec   Loss 1.6580   LearningRate 0.0014   Epoch: 18   Global Step: 196390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:47:11,770-Speed 5952.76 samples/sec   Loss 1.6517   LearningRate 0.0014   Epoch: 18   Global Step: 196400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:47:18,634-Speed 5968.50 samples/sec   Loss 1.6372   LearningRate 0.0014   Epoch: 18   Global Step: 196410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:47:25,494-Speed 5972.31 samples/sec   Loss 1.7040   LearningRate 0.0014   Epoch: 18   Global Step: 196420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:47:32,351-Speed 5973.78 samples/sec   Loss 1.6886   LearningRate 0.0014   Epoch: 18   Global Step: 196430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:47:39,220-Speed 5964.61 samples/sec   Loss 1.6997   LearningRate 0.0014   Epoch: 18   Global Step: 196440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:47:46,088-Speed 5964.73 samples/sec   Loss 1.6828   LearningRate 0.0014   Epoch: 18   Global Step: 196450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:47:52,958-Speed 5963.71 samples/sec   Loss 1.6812   LearningRate 0.0014   Epoch: 18   Global Step: 196460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:47:59,839-Speed 5954.15 samples/sec   Loss 1.6826   LearningRate 0.0014   Epoch: 18   Global Step: 196470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:48:06,682-Speed 5986.65 samples/sec   Loss 1.6810   LearningRate 0.0014   Epoch: 18   Global Step: 196480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:48:13,554-Speed 5961.27 samples/sec   Loss 1.6864   LearningRate 0.0014   Epoch: 18   Global Step: 196490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:48:20,410-Speed 5976.04 samples/sec   Loss 1.6827   LearningRate 0.0014   Epoch: 18   Global Step: 196500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:48:27,272-Speed 5970.15 samples/sec   Loss 1.6778   LearningRate 0.0014   Epoch: 18   Global Step: 196510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:48:34,128-Speed 5974.91 samples/sec   Loss 1.6756   LearningRate 0.0014   Epoch: 18   Global Step: 196520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:48:41,012-Speed 5951.48 samples/sec   Loss 1.6641   LearningRate 0.0014   Epoch: 18   Global Step: 196530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:48:47,882-Speed 5964.01 samples/sec   Loss 1.6851   LearningRate 0.0013   Epoch: 18   Global Step: 196540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:48:54,733-Speed 5979.63 samples/sec   Loss 1.6724   LearningRate 0.0013   Epoch: 18   Global Step: 196550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:49:01,579-Speed 5983.71 samples/sec   Loss 1.6883   LearningRate 0.0013   Epoch: 18   Global Step: 196560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:49:08,436-Speed 5974.20 samples/sec   Loss 1.6636   LearningRate 0.0013   Epoch: 18   Global Step: 196570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:49:15,299-Speed 5969.79 samples/sec   Loss 1.6708   LearningRate 0.0013   Epoch: 18   Global Step: 196580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:49:22,144-Speed 5985.57 samples/sec   Loss 1.6675   LearningRate 0.0013   Epoch: 18   Global Step: 196590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:49:29,005-Speed 5970.96 samples/sec   Loss 1.6695   LearningRate 0.0013   Epoch: 18   Global Step: 196600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:49:35,856-Speed 5979.80 samples/sec   Loss 1.6403   LearningRate 0.0013   Epoch: 18   Global Step: 196610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:49:42,711-Speed 5976.91 samples/sec   Loss 1.6767   LearningRate 0.0013   Epoch: 18   Global Step: 196620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:49:49,575-Speed 5968.22 samples/sec   Loss 1.6836   LearningRate 0.0013   Epoch: 18   Global Step: 196630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:49:56,425-Speed 5980.31 samples/sec   Loss 1.6820   LearningRate 0.0013   Epoch: 18   Global Step: 196640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:50:03,274-Speed 5982.33 samples/sec   Loss 1.7011   LearningRate 0.0013   Epoch: 18   Global Step: 196650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:50:10,148-Speed 5959.50 samples/sec   Loss 1.6543   LearningRate 0.0013   Epoch: 18   Global Step: 196660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:50:17,018-Speed 5963.54 samples/sec   Loss 1.6870   LearningRate 0.0013   Epoch: 18   Global Step: 196670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:50:23,935-Speed 5923.66 samples/sec   Loss 1.6949   LearningRate 0.0013   Epoch: 18   Global Step: 196680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:50:30,788-Speed 5980.86 samples/sec   Loss 1.6690   LearningRate 0.0013   Epoch: 18   Global Step: 196690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:50:37,647-Speed 5972.61 samples/sec   Loss 1.6508   LearningRate 0.0013   Epoch: 18   Global Step: 196700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:50:44,499-Speed 5979.86 samples/sec   Loss 1.6952   LearningRate 0.0013   Epoch: 18   Global Step: 196710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:50:51,341-Speed 5986.75 samples/sec   Loss 1.6986   LearningRate 0.0013   Epoch: 18   Global Step: 196720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:50:58,207-Speed 5967.10 samples/sec   Loss 1.6855   LearningRate 0.0013   Epoch: 18   Global Step: 196730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:51:05,067-Speed 5972.23 samples/sec   Loss 1.6619   LearningRate 0.0013   Epoch: 18   Global Step: 196740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:51:12,018-Speed 5893.46 samples/sec   Loss 1.6710   LearningRate 0.0013   Epoch: 18   Global Step: 196750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:51:18,866-Speed 5982.48 samples/sec   Loss 1.6756   LearningRate 0.0013   Epoch: 18   Global Step: 196760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:51:25,753-Speed 5951.65 samples/sec   Loss 1.6881   LearningRate 0.0013   Epoch: 18   Global Step: 196770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:51:32,610-Speed 5974.54 samples/sec   Loss 1.6740   LearningRate 0.0013   Epoch: 18   Global Step: 196780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:51:39,470-Speed 5971.90 samples/sec   Loss 1.6464   LearningRate 0.0013   Epoch: 18   Global Step: 196790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:51:46,317-Speed 5983.52 samples/sec   Loss 1.6416   LearningRate 0.0013   Epoch: 18   Global Step: 196800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:51:53,162-Speed 5985.84 samples/sec   Loss 1.6949   LearningRate 0.0013   Epoch: 18   Global Step: 196810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:52:00,018-Speed 5975.16 samples/sec   Loss 1.6878   LearningRate 0.0013   Epoch: 18   Global Step: 196820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:52:06,891-Speed 5960.74 samples/sec   Loss 1.6578   LearningRate 0.0013   Epoch: 18   Global Step: 196830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:52:13,739-Speed 5982.60 samples/sec   Loss 1.6684   LearningRate 0.0013   Epoch: 18   Global Step: 196840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:52:20,606-Speed 5966.38 samples/sec   Loss 1.6891   LearningRate 0.0013   Epoch: 18   Global Step: 196850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:52:27,477-Speed 5962.12 samples/sec   Loss 1.6828   LearningRate 0.0013   Epoch: 18   Global Step: 196860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:52:34,347-Speed 5963.77 samples/sec   Loss 1.6570   LearningRate 0.0013   Epoch: 18   Global Step: 196870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:52:41,201-Speed 5976.94 samples/sec   Loss 1.6649   LearningRate 0.0013   Epoch: 18   Global Step: 196880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:52:48,061-Speed 5972.48 samples/sec   Loss 1.6776   LearningRate 0.0013   Epoch: 18   Global Step: 196890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:52:54,909-Speed 5981.80 samples/sec   Loss 1.6634   LearningRate 0.0013   Epoch: 18   Global Step: 196900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:53:01,766-Speed 5975.03 samples/sec   Loss 1.6733   LearningRate 0.0013   Epoch: 18   Global Step: 196910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:53:08,652-Speed 5949.76 samples/sec   Loss 1.6719   LearningRate 0.0013   Epoch: 18   Global Step: 196920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:53:15,511-Speed 5972.70 samples/sec   Loss 1.6668   LearningRate 0.0013   Epoch: 18   Global Step: 196930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:53:22,360-Speed 5981.69 samples/sec   Loss 1.6895   LearningRate 0.0013   Epoch: 18   Global Step: 196940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:53:29,210-Speed 5980.18 samples/sec   Loss 1.6773   LearningRate 0.0012   Epoch: 18   Global Step: 196950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:53:36,057-Speed 5986.63 samples/sec   Loss 1.6850   LearningRate 0.0012   Epoch: 18   Global Step: 196960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:53:42,909-Speed 5978.48 samples/sec   Loss 1.6660   LearningRate 0.0012   Epoch: 18   Global Step: 196970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:53:49,773-Speed 5968.42 samples/sec   Loss 1.6837   LearningRate 0.0012   Epoch: 18   Global Step: 196980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:53:56,622-Speed 5981.37 samples/sec   Loss 1.6763   LearningRate 0.0012   Epoch: 18   Global Step: 196990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:54:03,468-Speed 5984.05 samples/sec   Loss 1.6805   LearningRate 0.0012   Epoch: 18   Global Step: 197000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:54:10,311-Speed 5987.24 samples/sec   Loss 1.6854   LearningRate 0.0012   Epoch: 18   Global Step: 197010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:54:17,169-Speed 5973.54 samples/sec   Loss 1.6723   LearningRate 0.0012   Epoch: 18   Global Step: 197020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:54:40,395-Speed 1763.62 samples/sec   Loss 1.6555   LearningRate 0.0012   Epoch: 19   Global Step: 197030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:54:47,215-Speed 6008.94 samples/sec   Loss 1.6754   LearningRate 0.0012   Epoch: 19   Global Step: 197040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:54:54,059-Speed 5985.95 samples/sec   Loss 1.6617   LearningRate 0.0012   Epoch: 19   Global Step: 197050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:55:00,909-Speed 5980.89 samples/sec   Loss 1.6777   LearningRate 0.0012   Epoch: 19   Global Step: 197060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:55:07,744-Speed 5993.81 samples/sec   Loss 1.6860   LearningRate 0.0012   Epoch: 19   Global Step: 197070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:55:14,585-Speed 5988.88 samples/sec   Loss 1.6748   LearningRate 0.0012   Epoch: 19   Global Step: 197080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:55:21,480-Speed 5953.31 samples/sec   Loss 1.6620   LearningRate 0.0012   Epoch: 19   Global Step: 197090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:55:28,314-Speed 5994.97 samples/sec   Loss 1.6439   LearningRate 0.0012   Epoch: 19   Global Step: 197100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:55:35,149-Speed 5993.48 samples/sec   Loss 1.6449   LearningRate 0.0012   Epoch: 19   Global Step: 197110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:55:42,016-Speed 5966.42 samples/sec   Loss 1.6692   LearningRate 0.0012   Epoch: 19   Global Step: 197120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:55:48,874-Speed 5974.01 samples/sec   Loss 1.6711   LearningRate 0.0012   Epoch: 19   Global Step: 197130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:55:55,743-Speed 5963.83 samples/sec   Loss 1.6286   LearningRate 0.0012   Epoch: 19   Global Step: 197140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:56:02,633-Speed 5946.34 samples/sec   Loss 1.6375   LearningRate 0.0012   Epoch: 19   Global Step: 197150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:56:09,494-Speed 5971.21 samples/sec   Loss 1.6625   LearningRate 0.0012   Epoch: 19   Global Step: 197160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:56:16,354-Speed 5972.64 samples/sec   Loss 1.6626   LearningRate 0.0012   Epoch: 19   Global Step: 197170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:56:23,219-Speed 5967.17 samples/sec   Loss 1.6596   LearningRate 0.0012   Epoch: 19   Global Step: 197180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:56:30,175-Speed 5891.34 samples/sec   Loss 1.6279   LearningRate 0.0012   Epoch: 19   Global Step: 197190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:56:37,054-Speed 5955.71 samples/sec   Loss 1.6381   LearningRate 0.0012   Epoch: 19   Global Step: 197200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:56:43,938-Speed 5951.01 samples/sec   Loss 1.6318   LearningRate 0.0012   Epoch: 19   Global Step: 197210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:56:50,787-Speed 5981.68 samples/sec   Loss 1.6446   LearningRate 0.0012   Epoch: 19   Global Step: 197220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:56:57,656-Speed 5964.34 samples/sec   Loss 1.6106   LearningRate 0.0012   Epoch: 19   Global Step: 197230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:57:04,512-Speed 5975.59 samples/sec   Loss 1.6629   LearningRate 0.0012   Epoch: 19   Global Step: 197240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:57:11,371-Speed 5972.67 samples/sec   Loss 1.6403   LearningRate 0.0012   Epoch: 19   Global Step: 197250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:57:18,219-Speed 5985.18 samples/sec   Loss 1.6476   LearningRate 0.0012   Epoch: 19   Global Step: 197260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:57:25,065-Speed 5984.41 samples/sec   Loss 1.6260   LearningRate 0.0012   Epoch: 19   Global Step: 197270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:57:31,960-Speed 5941.65 samples/sec   Loss 1.6383   LearningRate 0.0012   Epoch: 19   Global Step: 197280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:57:38,814-Speed 5977.51 samples/sec   Loss 1.6413   LearningRate 0.0012   Epoch: 19   Global Step: 197290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:57:45,665-Speed 5978.94 samples/sec   Loss 1.6378   LearningRate 0.0012   Epoch: 19   Global Step: 197300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:57:52,522-Speed 5974.96 samples/sec   Loss 1.6518   LearningRate 0.0012   Epoch: 19   Global Step: 197310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:57:59,370-Speed 5982.45 samples/sec   Loss 1.6792   LearningRate 0.0012   Epoch: 19   Global Step: 197320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:58:06,225-Speed 5976.26 samples/sec   Loss 1.6328   LearningRate 0.0012   Epoch: 19   Global Step: 197330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:58:13,134-Speed 5929.31 samples/sec   Loss 1.6555   LearningRate 0.0012   Epoch: 19   Global Step: 197340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:58:20,001-Speed 5965.97 samples/sec   Loss 1.6415   LearningRate 0.0012   Epoch: 19   Global Step: 197350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:58:26,857-Speed 5976.22 samples/sec   Loss 1.6878   LearningRate 0.0012   Epoch: 19   Global Step: 197360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:58:33,724-Speed 5966.65 samples/sec   Loss 1.6525   LearningRate 0.0012   Epoch: 19   Global Step: 197370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:58:40,579-Speed 5976.23 samples/sec   Loss 1.6505   LearningRate 0.0011   Epoch: 19   Global Step: 197380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:58:47,447-Speed 5964.89 samples/sec   Loss 1.6568   LearningRate 0.0011   Epoch: 19   Global Step: 197390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 10:58:54,316-Speed 5964.84 samples/sec   Loss 1.6495   LearningRate 0.0011   Epoch: 19   Global Step: 197400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:59:01,203-Speed 5948.88 samples/sec   Loss 1.6294   LearningRate 0.0011   Epoch: 19   Global Step: 197410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:59:08,080-Speed 5956.77 samples/sec   Loss 1.6652   LearningRate 0.0011   Epoch: 19   Global Step: 197420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:59:14,919-Speed 5990.92 samples/sec   Loss 1.6386   LearningRate 0.0011   Epoch: 19   Global Step: 197430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:59:21,763-Speed 5985.82 samples/sec   Loss 1.6645   LearningRate 0.0011   Epoch: 19   Global Step: 197440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:59:28,644-Speed 5953.73 samples/sec   Loss 1.6629   LearningRate 0.0011   Epoch: 19   Global Step: 197450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:59:35,503-Speed 5972.18 samples/sec   Loss 1.6520   LearningRate 0.0011   Epoch: 19   Global Step: 197460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:59:42,366-Speed 5969.21 samples/sec   Loss 1.6514   LearningRate 0.0011   Epoch: 19   Global Step: 197470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:59:49,214-Speed 5981.97 samples/sec   Loss 1.6347   LearningRate 0.0011   Epoch: 19   Global Step: 197480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 10:59:56,071-Speed 5975.50 samples/sec   Loss 1.6581   LearningRate 0.0011   Epoch: 19   Global Step: 197490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:00:02,927-Speed 5974.71 samples/sec   Loss 1.6695   LearningRate 0.0011   Epoch: 19   Global Step: 197500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:00:09,812-Speed 5949.96 samples/sec   Loss 1.6474   LearningRate 0.0011   Epoch: 19   Global Step: 197510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:00:16,670-Speed 5974.23 samples/sec   Loss 1.6485   LearningRate 0.0011   Epoch: 19   Global Step: 197520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:00:23,527-Speed 5975.10 samples/sec   Loss 1.6319   LearningRate 0.0011   Epoch: 19   Global Step: 197530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:00:30,389-Speed 5969.90 samples/sec   Loss 1.6199   LearningRate 0.0011   Epoch: 19   Global Step: 197540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:00:37,237-Speed 5982.78 samples/sec   Loss 1.6115   LearningRate 0.0011   Epoch: 19   Global Step: 197550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:00:44,082-Speed 5984.89 samples/sec   Loss 1.6421   LearningRate 0.0011   Epoch: 19   Global Step: 197560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:00:50,943-Speed 5971.56 samples/sec   Loss 1.6374   LearningRate 0.0011   Epoch: 19   Global Step: 197570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:00:57,819-Speed 5958.33 samples/sec   Loss 1.6439   LearningRate 0.0011   Epoch: 19   Global Step: 197580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:01:04,669-Speed 5980.29 samples/sec   Loss 1.6757   LearningRate 0.0011   Epoch: 19   Global Step: 197590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:01:11,545-Speed 5958.17 samples/sec   Loss 1.6711   LearningRate 0.0011   Epoch: 19   Global Step: 197600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:01:18,421-Speed 5958.29 samples/sec   Loss 1.6437   LearningRate 0.0011   Epoch: 19   Global Step: 197610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:01:25,296-Speed 5959.45 samples/sec   Loss 1.6436   LearningRate 0.0011   Epoch: 19   Global Step: 197620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:01:32,150-Speed 5977.03 samples/sec   Loss 1.6614   LearningRate 0.0011   Epoch: 19   Global Step: 197630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:01:39,036-Speed 5949.14 samples/sec   Loss 1.6468   LearningRate 0.0011   Epoch: 19   Global Step: 197640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:01:45,885-Speed 5982.11 samples/sec   Loss 1.6203   LearningRate 0.0011   Epoch: 19   Global Step: 197650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:01:52,760-Speed 5958.68 samples/sec   Loss 1.6262   LearningRate 0.0011   Epoch: 19   Global Step: 197660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:01:59,622-Speed 5970.81 samples/sec   Loss 1.6491   LearningRate 0.0011   Epoch: 19   Global Step: 197670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:02:06,516-Speed 5942.63 samples/sec   Loss 1.6188   LearningRate 0.0011   Epoch: 19   Global Step: 197680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:02:13,367-Speed 5979.99 samples/sec   Loss 1.6454   LearningRate 0.0011   Epoch: 19   Global Step: 197690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:02:20,235-Speed 5965.46 samples/sec   Loss 1.6275   LearningRate 0.0011   Epoch: 19   Global Step: 197700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:02:27,117-Speed 5952.39 samples/sec   Loss 1.6420   LearningRate 0.0011   Epoch: 19   Global Step: 197710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:02:33,990-Speed 5962.85 samples/sec   Loss 1.6197   LearningRate 0.0011   Epoch: 19   Global Step: 197720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:02:40,836-Speed 5984.22 samples/sec   Loss 1.6502   LearningRate 0.0011   Epoch: 19   Global Step: 197730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:02:47,702-Speed 5966.79 samples/sec   Loss 1.6319   LearningRate 0.0011   Epoch: 19   Global Step: 197740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:02:54,571-Speed 5964.56 samples/sec   Loss 1.6257   LearningRate 0.0011   Epoch: 19   Global Step: 197750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:03:01,434-Speed 5969.59 samples/sec   Loss 1.5994   LearningRate 0.0011   Epoch: 19   Global Step: 197760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:03:08,304-Speed 5964.09 samples/sec   Loss 1.6708   LearningRate 0.0011   Epoch: 19   Global Step: 197770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:03:15,167-Speed 5969.26 samples/sec   Loss 1.6316   LearningRate 0.0011   Epoch: 19   Global Step: 197780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:03:22,088-Speed 5920.48 samples/sec   Loss 1.6243   LearningRate 0.0011   Epoch: 19   Global Step: 197790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:03:28,969-Speed 5953.59 samples/sec   Loss 1.6188   LearningRate 0.0011   Epoch: 19   Global Step: 197800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:03:35,836-Speed 5966.08 samples/sec   Loss 1.6303   LearningRate 0.0011   Epoch: 19   Global Step: 197810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:03:42,702-Speed 5967.40 samples/sec   Loss 1.6556   LearningRate 0.0010   Epoch: 19   Global Step: 197820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:03:49,558-Speed 5975.36 samples/sec   Loss 1.6501   LearningRate 0.0010   Epoch: 19   Global Step: 197830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:03:56,416-Speed 5973.41 samples/sec   Loss 1.6492   LearningRate 0.0010   Epoch: 19   Global Step: 197840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:04:03,269-Speed 5978.26 samples/sec   Loss 1.6516   LearningRate 0.0010   Epoch: 19   Global Step: 197850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:04:10,139-Speed 5963.18 samples/sec   Loss 1.6266   LearningRate 0.0010   Epoch: 19   Global Step: 197860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:04:17,008-Speed 5963.87 samples/sec   Loss 1.5936   LearningRate 0.0010   Epoch: 19   Global Step: 197870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:04:23,874-Speed 5972.38 samples/sec   Loss 1.6376   LearningRate 0.0010   Epoch: 19   Global Step: 197880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:04:30,745-Speed 5962.90 samples/sec   Loss 1.6287   LearningRate 0.0010   Epoch: 19   Global Step: 197890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:04:37,717-Speed 5876.24 samples/sec   Loss 1.6258   LearningRate 0.0010   Epoch: 19   Global Step: 197900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:04:44,580-Speed 5969.79 samples/sec   Loss 1.6130   LearningRate 0.0010   Epoch: 19   Global Step: 197910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:04:51,422-Speed 5988.74 samples/sec   Loss 1.6379   LearningRate 0.0010   Epoch: 19   Global Step: 197920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:04:58,321-Speed 5937.81 samples/sec   Loss 1.6336   LearningRate 0.0010   Epoch: 19   Global Step: 197930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:05:05,176-Speed 5978.14 samples/sec   Loss 1.6379   LearningRate 0.0010   Epoch: 19   Global Step: 197940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:05:12,037-Speed 5972.15 samples/sec   Loss 1.6234   LearningRate 0.0010   Epoch: 19   Global Step: 197950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:05:18,902-Speed 5967.12 samples/sec   Loss 1.6220   LearningRate 0.0010   Epoch: 19   Global Step: 197960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:05:25,756-Speed 5977.17 samples/sec   Loss 1.6355   LearningRate 0.0010   Epoch: 19   Global Step: 197970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:05:32,620-Speed 5968.93 samples/sec   Loss 1.6160   LearningRate 0.0010   Epoch: 19   Global Step: 197980   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-01-09 11:05:39,476-Speed 5976.06 samples/sec   Loss 1.6364   LearningRate 0.0010   Epoch: 19   Global Step: 197990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:05:46,321-Speed 5984.80 samples/sec   Loss 1.6291   LearningRate 0.0010   Epoch: 19   Global Step: 198000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:05:53,169-Speed 5982.18 samples/sec   Loss 1.6587   LearningRate 0.0010   Epoch: 19   Global Step: 198010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:06:00,035-Speed 5967.12 samples/sec   Loss 1.6330   LearningRate 0.0010   Epoch: 19   Global Step: 198020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:06:06,911-Speed 5958.07 samples/sec   Loss 1.6317   LearningRate 0.0010   Epoch: 19   Global Step: 198030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:06:13,784-Speed 5960.98 samples/sec   Loss 1.6536   LearningRate 0.0010   Epoch: 19   Global Step: 198040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:06:20,641-Speed 5974.30 samples/sec   Loss 1.6242   LearningRate 0.0010   Epoch: 19   Global Step: 198050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:06:27,521-Speed 5954.68 samples/sec   Loss 1.6367   LearningRate 0.0010   Epoch: 19   Global Step: 198060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:06:34,388-Speed 5965.63 samples/sec   Loss 1.6225   LearningRate 0.0010   Epoch: 19   Global Step: 198070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:06:41,245-Speed 5974.79 samples/sec   Loss 1.6235   LearningRate 0.0010   Epoch: 19   Global Step: 198080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:06:48,098-Speed 5977.83 samples/sec   Loss 1.6086   LearningRate 0.0010   Epoch: 19   Global Step: 198090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:06:54,950-Speed 5979.78 samples/sec   Loss 1.6186   LearningRate 0.0010   Epoch: 19   Global Step: 198100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:07:01,841-Speed 5944.75 samples/sec   Loss 1.6113   LearningRate 0.0010   Epoch: 19   Global Step: 198110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:07:08,687-Speed 5984.37 samples/sec   Loss 1.6415   LearningRate 0.0010   Epoch: 19   Global Step: 198120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:07:15,555-Speed 5964.96 samples/sec   Loss 1.6238   LearningRate 0.0010   Epoch: 19   Global Step: 198130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:07:22,417-Speed 5970.53 samples/sec   Loss 1.6276   LearningRate 0.0010   Epoch: 19   Global Step: 198140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:07:29,299-Speed 5953.35 samples/sec   Loss 1.6334   LearningRate 0.0010   Epoch: 19   Global Step: 198150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:07:36,157-Speed 5973.56 samples/sec   Loss 1.6593   LearningRate 0.0010   Epoch: 19   Global Step: 198160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:07:43,006-Speed 5981.98 samples/sec   Loss 1.6317   LearningRate 0.0010   Epoch: 19   Global Step: 198170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:07:49,851-Speed 5985.26 samples/sec   Loss 1.6210   LearningRate 0.0010   Epoch: 19   Global Step: 198180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:07:56,704-Speed 5978.05 samples/sec   Loss 1.6107   LearningRate 0.0010   Epoch: 19   Global Step: 198190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:08:03,575-Speed 5963.23 samples/sec   Loss 1.5958   LearningRate 0.0010   Epoch: 19   Global Step: 198200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:08:10,424-Speed 5981.55 samples/sec   Loss 1.6519   LearningRate 0.0010   Epoch: 19   Global Step: 198210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:08:17,312-Speed 5947.66 samples/sec   Loss 1.6077   LearningRate 0.0010   Epoch: 19   Global Step: 198220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:08:24,175-Speed 5969.59 samples/sec   Loss 1.6037   LearningRate 0.0010   Epoch: 19   Global Step: 198230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:08:31,027-Speed 5980.90 samples/sec   Loss 1.6306   LearningRate 0.0010   Epoch: 19   Global Step: 198240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:08:37,882-Speed 5976.56 samples/sec   Loss 1.6256   LearningRate 0.0010   Epoch: 19   Global Step: 198250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:08:44,777-Speed 5941.24 samples/sec   Loss 1.6471   LearningRate 0.0010   Epoch: 19   Global Step: 198260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:08:51,639-Speed 5970.27 samples/sec   Loss 1.6248   LearningRate 0.0010   Epoch: 19   Global Step: 198270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:08:58,494-Speed 5976.35 samples/sec   Loss 1.6193   LearningRate 0.0010   Epoch: 19   Global Step: 198280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:09:05,373-Speed 5955.74 samples/sec   Loss 1.5961   LearningRate 0.0009   Epoch: 19   Global Step: 198290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:09:12,251-Speed 5956.80 samples/sec   Loss 1.6094   LearningRate 0.0009   Epoch: 19   Global Step: 198300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:09:19,136-Speed 5949.77 samples/sec   Loss 1.5846   LearningRate 0.0009   Epoch: 19   Global Step: 198310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:09:25,988-Speed 5978.50 samples/sec   Loss 1.6093   LearningRate 0.0009   Epoch: 19   Global Step: 198320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:09:32,856-Speed 5965.48 samples/sec   Loss 1.6465   LearningRate 0.0009   Epoch: 19   Global Step: 198330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:09:39,711-Speed 5978.85 samples/sec   Loss 1.5720   LearningRate 0.0009   Epoch: 19   Global Step: 198340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:09:46,587-Speed 5957.73 samples/sec   Loss 1.6154   LearningRate 0.0009   Epoch: 19   Global Step: 198350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:09:53,456-Speed 5964.38 samples/sec   Loss 1.6294   LearningRate 0.0009   Epoch: 19   Global Step: 198360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:10:00,322-Speed 5967.94 samples/sec   Loss 1.5990   LearningRate 0.0009   Epoch: 19   Global Step: 198370   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-01-09 11:10:07,166-Speed 5985.48 samples/sec   Loss 1.6204   LearningRate 0.0009   Epoch: 19   Global Step: 198380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:10:14,009-Speed 5987.50 samples/sec   Loss 1.6429   LearningRate 0.0009   Epoch: 19   Global Step: 198390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:10:20,853-Speed 5987.47 samples/sec   Loss 1.6207   LearningRate 0.0009   Epoch: 19   Global Step: 198400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:10:27,707-Speed 5977.01 samples/sec   Loss 1.6190   LearningRate 0.0009   Epoch: 19   Global Step: 198410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:10:34,571-Speed 5970.31 samples/sec   Loss 1.6015   LearningRate 0.0009   Epoch: 19   Global Step: 198420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:10:41,425-Speed 5977.47 samples/sec   Loss 1.6139   LearningRate 0.0009   Epoch: 19   Global Step: 198430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:10:48,303-Speed 5955.75 samples/sec   Loss 1.6456   LearningRate 0.0009   Epoch: 19   Global Step: 198440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:10:55,163-Speed 5972.44 samples/sec   Loss 1.6292   LearningRate 0.0009   Epoch: 19   Global Step: 198450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:11:02,017-Speed 5976.63 samples/sec   Loss 1.6162   LearningRate 0.0009   Epoch: 19   Global Step: 198460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:11:08,864-Speed 5983.21 samples/sec   Loss 1.5901   LearningRate 0.0009   Epoch: 19   Global Step: 198470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:11:15,717-Speed 5979.27 samples/sec   Loss 1.6444   LearningRate 0.0009   Epoch: 19   Global Step: 198480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:11:22,562-Speed 5985.09 samples/sec   Loss 1.6207   LearningRate 0.0009   Epoch: 19   Global Step: 198490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:11:29,413-Speed 5980.09 samples/sec   Loss 1.5993   LearningRate 0.0009   Epoch: 19   Global Step: 198500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:11:36,278-Speed 5967.85 samples/sec   Loss 1.6203   LearningRate 0.0009   Epoch: 19   Global Step: 198510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:11:43,141-Speed 5972.24 samples/sec   Loss 1.6277   LearningRate 0.0009   Epoch: 19   Global Step: 198520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:11:49,977-Speed 5992.53 samples/sec   Loss 1.6110   LearningRate 0.0009   Epoch: 19   Global Step: 198530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:11:56,824-Speed 5983.34 samples/sec   Loss 1.6299   LearningRate 0.0009   Epoch: 19   Global Step: 198540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:12:03,687-Speed 5970.76 samples/sec   Loss 1.6273   LearningRate 0.0009   Epoch: 19   Global Step: 198550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:12:10,545-Speed 5973.43 samples/sec   Loss 1.6002   LearningRate 0.0009   Epoch: 19   Global Step: 198560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:12:17,401-Speed 5976.70 samples/sec   Loss 1.6235   LearningRate 0.0009   Epoch: 19   Global Step: 198570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:12:24,264-Speed 5968.83 samples/sec   Loss 1.6136   LearningRate 0.0009   Epoch: 19   Global Step: 198580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:12:31,122-Speed 5974.54 samples/sec   Loss 1.5873   LearningRate 0.0009   Epoch: 19   Global Step: 198590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:12:37,983-Speed 5974.16 samples/sec   Loss 1.6231   LearningRate 0.0009   Epoch: 19   Global Step: 198600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:12:44,832-Speed 5981.65 samples/sec   Loss 1.6132   LearningRate 0.0009   Epoch: 19   Global Step: 198610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:12:51,687-Speed 5976.36 samples/sec   Loss 1.6163   LearningRate 0.0009   Epoch: 19   Global Step: 198620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:12:58,552-Speed 5968.40 samples/sec   Loss 1.6081   LearningRate 0.0009   Epoch: 19   Global Step: 198630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:13:05,400-Speed 5982.33 samples/sec   Loss 1.5962   LearningRate 0.0009   Epoch: 19   Global Step: 198640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:13:12,257-Speed 5974.57 samples/sec   Loss 1.6316   LearningRate 0.0009   Epoch: 19   Global Step: 198650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:13:19,121-Speed 5968.07 samples/sec   Loss 1.6080   LearningRate 0.0009   Epoch: 19   Global Step: 198660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:13:25,986-Speed 5968.32 samples/sec   Loss 1.6096   LearningRate 0.0009   Epoch: 19   Global Step: 198670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:13:32,828-Speed 5989.34 samples/sec   Loss 1.6559   LearningRate 0.0009   Epoch: 19   Global Step: 198680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:13:39,667-Speed 5990.06 samples/sec   Loss 1.6196   LearningRate 0.0009   Epoch: 19   Global Step: 198690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:13:46,669-Speed 5852.55 samples/sec   Loss 1.6055   LearningRate 0.0009   Epoch: 19   Global Step: 198700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:13:53,620-Speed 5893.63 samples/sec   Loss 1.6319   LearningRate 0.0009   Epoch: 19   Global Step: 198710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:14:00,582-Speed 5884.48 samples/sec   Loss 1.5941   LearningRate 0.0009   Epoch: 19   Global Step: 198720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:14:07,445-Speed 5969.64 samples/sec   Loss 1.5862   LearningRate 0.0009   Epoch: 19   Global Step: 198730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:14:14,303-Speed 5975.99 samples/sec   Loss 1.5943   LearningRate 0.0009   Epoch: 19   Global Step: 198740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:14:21,170-Speed 5966.16 samples/sec   Loss 1.5957   LearningRate 0.0009   Epoch: 19   Global Step: 198750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:14:28,026-Speed 5976.30 samples/sec   Loss 1.6007   LearningRate 0.0009   Epoch: 19   Global Step: 198760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:14:34,867-Speed 5988.00 samples/sec   Loss 1.6084   LearningRate 0.0009   Epoch: 19   Global Step: 198770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:14:41,726-Speed 5974.43 samples/sec   Loss 1.5994   LearningRate 0.0008   Epoch: 19   Global Step: 198780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:14:48,570-Speed 5985.22 samples/sec   Loss 1.6331   LearningRate 0.0008   Epoch: 19   Global Step: 198790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:14:55,421-Speed 5980.21 samples/sec   Loss 1.6011   LearningRate 0.0008   Epoch: 19   Global Step: 198800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:15:02,274-Speed 5977.96 samples/sec   Loss 1.6198   LearningRate 0.0008   Epoch: 19   Global Step: 198810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:15:09,116-Speed 5987.72 samples/sec   Loss 1.6167   LearningRate 0.0008   Epoch: 19   Global Step: 198820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:15:15,949-Speed 5995.23 samples/sec   Loss 1.6204   LearningRate 0.0008   Epoch: 19   Global Step: 198830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:15:22,796-Speed 5983.80 samples/sec   Loss 1.5832   LearningRate 0.0008   Epoch: 19   Global Step: 198840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:15:29,642-Speed 5983.77 samples/sec   Loss 1.6080   LearningRate 0.0008   Epoch: 19   Global Step: 198850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:15:36,500-Speed 5973.98 samples/sec   Loss 1.6121   LearningRate 0.0008   Epoch: 19   Global Step: 198860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:15:43,355-Speed 5976.80 samples/sec   Loss 1.6132   LearningRate 0.0008   Epoch: 19   Global Step: 198870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:15:50,224-Speed 5965.06 samples/sec   Loss 1.5926   LearningRate 0.0008   Epoch: 19   Global Step: 198880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:15:57,078-Speed 5977.56 samples/sec   Loss 1.6076   LearningRate 0.0008   Epoch: 19   Global Step: 198890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:16:03,927-Speed 5980.87 samples/sec   Loss 1.5986   LearningRate 0.0008   Epoch: 19   Global Step: 198900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:16:10,777-Speed 5980.17 samples/sec   Loss 1.6031   LearningRate 0.0008   Epoch: 19   Global Step: 198910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:16:17,622-Speed 5985.50 samples/sec   Loss 1.6084   LearningRate 0.0008   Epoch: 19   Global Step: 198920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:16:24,464-Speed 5987.77 samples/sec   Loss 1.6028   LearningRate 0.0008   Epoch: 19   Global Step: 198930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:16:31,314-Speed 5980.47 samples/sec   Loss 1.5922   LearningRate 0.0008   Epoch: 19   Global Step: 198940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:16:38,189-Speed 5959.10 samples/sec   Loss 1.5944   LearningRate 0.0008   Epoch: 19   Global Step: 198950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:16:45,066-Speed 5957.36 samples/sec   Loss 1.6017   LearningRate 0.0008   Epoch: 19   Global Step: 198960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:16:51,943-Speed 5957.09 samples/sec   Loss 1.6238   LearningRate 0.0008   Epoch: 19   Global Step: 198970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:16:58,785-Speed 5988.25 samples/sec   Loss 1.5722   LearningRate 0.0008   Epoch: 19   Global Step: 198980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:17:05,655-Speed 5963.25 samples/sec   Loss 1.5570   LearningRate 0.0008   Epoch: 19   Global Step: 198990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:17:12,579-Speed 5917.30 samples/sec   Loss 1.6120   LearningRate 0.0008   Epoch: 19   Global Step: 199000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:17:19,484-Speed 5932.74 samples/sec   Loss 1.6020   LearningRate 0.0008   Epoch: 19   Global Step: 199010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:17:26,336-Speed 5981.51 samples/sec   Loss 1.5936   LearningRate 0.0008   Epoch: 19   Global Step: 199020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:17:33,180-Speed 5986.13 samples/sec   Loss 1.6048   LearningRate 0.0008   Epoch: 19   Global Step: 199030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:17:40,036-Speed 5976.02 samples/sec   Loss 1.5968   LearningRate 0.0008   Epoch: 19   Global Step: 199040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:17:46,894-Speed 5973.82 samples/sec   Loss 1.6015   LearningRate 0.0008   Epoch: 19   Global Step: 199050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:17:53,741-Speed 5982.99 samples/sec   Loss 1.6093   LearningRate 0.0008   Epoch: 19   Global Step: 199060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:18:00,585-Speed 5985.17 samples/sec   Loss 1.6191   LearningRate 0.0008   Epoch: 19   Global Step: 199070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:18:07,442-Speed 5975.19 samples/sec   Loss 1.5763   LearningRate 0.0008   Epoch: 19   Global Step: 199080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:18:14,295-Speed 5977.30 samples/sec   Loss 1.6159   LearningRate 0.0008   Epoch: 19   Global Step: 199090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:18:21,146-Speed 5982.39 samples/sec   Loss 1.6180   LearningRate 0.0008   Epoch: 19   Global Step: 199100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:18:28,029-Speed 5952.17 samples/sec   Loss 1.6095   LearningRate 0.0008   Epoch: 19   Global Step: 199110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:18:34,871-Speed 5986.68 samples/sec   Loss 1.6054   LearningRate 0.0008   Epoch: 19   Global Step: 199120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:18:41,724-Speed 5978.14 samples/sec   Loss 1.6045   LearningRate 0.0008   Epoch: 19   Global Step: 199130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:18:48,589-Speed 5968.49 samples/sec   Loss 1.5916   LearningRate 0.0008   Epoch: 19   Global Step: 199140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:18:55,540-Speed 5893.63 samples/sec   Loss 1.5965   LearningRate 0.0008   Epoch: 19   Global Step: 199150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:19:02,510-Speed 5877.78 samples/sec   Loss 1.6098   LearningRate 0.0008   Epoch: 19   Global Step: 199160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:19:09,374-Speed 5968.63 samples/sec   Loss 1.5956   LearningRate 0.0008   Epoch: 19   Global Step: 199170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:19:16,222-Speed 5982.76 samples/sec   Loss 1.5780   LearningRate 0.0008   Epoch: 19   Global Step: 199180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:19:23,067-Speed 5985.47 samples/sec   Loss 1.5825   LearningRate 0.0008   Epoch: 19   Global Step: 199190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:19:29,933-Speed 5966.48 samples/sec   Loss 1.5962   LearningRate 0.0008   Epoch: 19   Global Step: 199200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:19:36,800-Speed 5966.01 samples/sec   Loss 1.5916   LearningRate 0.0008   Epoch: 19   Global Step: 199210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:19:43,685-Speed 5951.03 samples/sec   Loss 1.5780   LearningRate 0.0008   Epoch: 19   Global Step: 199220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:19:50,534-Speed 5981.26 samples/sec   Loss 1.6039   LearningRate 0.0008   Epoch: 19   Global Step: 199230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:19:57,413-Speed 5956.25 samples/sec   Loss 1.6030   LearningRate 0.0008   Epoch: 19   Global Step: 199240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:20:04,273-Speed 5971.47 samples/sec   Loss 1.5831   LearningRate 0.0008   Epoch: 19   Global Step: 199250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:20:11,115-Speed 5987.44 samples/sec   Loss 1.5902   LearningRate 0.0008   Epoch: 19   Global Step: 199260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:20:17,984-Speed 5964.34 samples/sec   Loss 1.6051   LearningRate 0.0008   Epoch: 19   Global Step: 199270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:20:24,840-Speed 5975.52 samples/sec   Loss 1.5660   LearningRate 0.0008   Epoch: 19   Global Step: 199280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:20:31,711-Speed 5963.99 samples/sec   Loss 1.6402   LearningRate 0.0008   Epoch: 19   Global Step: 199290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:20:38,579-Speed 5964.73 samples/sec   Loss 1.5590   LearningRate 0.0007   Epoch: 19   Global Step: 199300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:20:45,455-Speed 5958.35 samples/sec   Loss 1.5908   LearningRate 0.0007   Epoch: 19   Global Step: 199310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:20:52,311-Speed 5974.85 samples/sec   Loss 1.5784   LearningRate 0.0007   Epoch: 19   Global Step: 199320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:20:59,192-Speed 5954.44 samples/sec   Loss 1.5782   LearningRate 0.0007   Epoch: 19   Global Step: 199330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:21:06,047-Speed 5976.28 samples/sec   Loss 1.6191   LearningRate 0.0007   Epoch: 19   Global Step: 199340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:21:12,904-Speed 5974.74 samples/sec   Loss 1.6094   LearningRate 0.0007   Epoch: 19   Global Step: 199350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:21:19,750-Speed 5983.97 samples/sec   Loss 1.5989   LearningRate 0.0007   Epoch: 19   Global Step: 199360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:21:26,622-Speed 5961.95 samples/sec   Loss 1.5815   LearningRate 0.0007   Epoch: 19   Global Step: 199370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:21:33,476-Speed 5976.95 samples/sec   Loss 1.5950   LearningRate 0.0007   Epoch: 19   Global Step: 199380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:21:40,362-Speed 5949.31 samples/sec   Loss 1.6008   LearningRate 0.0007   Epoch: 19   Global Step: 199390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:21:47,207-Speed 5986.26 samples/sec   Loss 1.6180   LearningRate 0.0007   Epoch: 19   Global Step: 199400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:21:54,082-Speed 5960.67 samples/sec   Loss 1.5548   LearningRate 0.0007   Epoch: 19   Global Step: 199410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:22:00,945-Speed 5968.77 samples/sec   Loss 1.5759   LearningRate 0.0007   Epoch: 19   Global Step: 199420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:22:07,804-Speed 5973.17 samples/sec   Loss 1.5991   LearningRate 0.0007   Epoch: 19   Global Step: 199430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:22:14,667-Speed 5969.36 samples/sec   Loss 1.5727   LearningRate 0.0007   Epoch: 19   Global Step: 199440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:22:21,560-Speed 5942.73 samples/sec   Loss 1.5575   LearningRate 0.0007   Epoch: 19   Global Step: 199450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:22:28,422-Speed 5970.25 samples/sec   Loss 1.6270   LearningRate 0.0007   Epoch: 19   Global Step: 199460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:22:35,291-Speed 5964.42 samples/sec   Loss 1.5874   LearningRate 0.0007   Epoch: 19   Global Step: 199470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:22:42,146-Speed 5976.76 samples/sec   Loss 1.6117   LearningRate 0.0007   Epoch: 19   Global Step: 199480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:22:49,035-Speed 5949.66 samples/sec   Loss 1.5919   LearningRate 0.0007   Epoch: 19   Global Step: 199490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:22:55,890-Speed 5978.15 samples/sec   Loss 1.5841   LearningRate 0.0007   Epoch: 19   Global Step: 199500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:23:02,747-Speed 5974.11 samples/sec   Loss 1.5963   LearningRate 0.0007   Epoch: 19   Global Step: 199510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:23:09,601-Speed 5977.36 samples/sec   Loss 1.5798   LearningRate 0.0007   Epoch: 19   Global Step: 199520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:23:16,474-Speed 5961.63 samples/sec   Loss 1.5981   LearningRate 0.0007   Epoch: 19   Global Step: 199530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:23:23,371-Speed 5940.12 samples/sec   Loss 1.6100   LearningRate 0.0007   Epoch: 19   Global Step: 199540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:23:30,228-Speed 5974.67 samples/sec   Loss 1.6090   LearningRate 0.0007   Epoch: 19   Global Step: 199550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:23:37,097-Speed 5964.41 samples/sec   Loss 1.5537   LearningRate 0.0007   Epoch: 19   Global Step: 199560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:23:43,952-Speed 5976.12 samples/sec   Loss 1.5876   LearningRate 0.0007   Epoch: 19   Global Step: 199570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:23:50,823-Speed 5964.29 samples/sec   Loss 1.5981   LearningRate 0.0007   Epoch: 19   Global Step: 199580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:23:57,793-Speed 5877.89 samples/sec   Loss 1.5812   LearningRate 0.0007   Epoch: 19   Global Step: 199590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:24:04,646-Speed 5978.16 samples/sec   Loss 1.5998   LearningRate 0.0007   Epoch: 19   Global Step: 199600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:24:11,500-Speed 5976.83 samples/sec   Loss 1.5706   LearningRate 0.0007   Epoch: 19   Global Step: 199610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:24:18,337-Speed 5992.07 samples/sec   Loss 1.5802   LearningRate 0.0007   Epoch: 19   Global Step: 199620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:24:25,226-Speed 5946.75 samples/sec   Loss 1.5929   LearningRate 0.0007   Epoch: 19   Global Step: 199630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:24:32,085-Speed 5973.23 samples/sec   Loss 1.5835   LearningRate 0.0007   Epoch: 19   Global Step: 199640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:24:38,980-Speed 5942.31 samples/sec   Loss 1.5677   LearningRate 0.0007   Epoch: 19   Global Step: 199650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:24:45,846-Speed 5966.26 samples/sec   Loss 1.5873   LearningRate 0.0007   Epoch: 19   Global Step: 199660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:24:52,685-Speed 5990.28 samples/sec   Loss 1.5799   LearningRate 0.0007   Epoch: 19   Global Step: 199670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:24:59,566-Speed 5953.57 samples/sec   Loss 1.5857   LearningRate 0.0007   Epoch: 19   Global Step: 199680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:25:06,454-Speed 5947.91 samples/sec   Loss 1.5868   LearningRate 0.0007   Epoch: 19   Global Step: 199690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:25:13,317-Speed 5970.84 samples/sec   Loss 1.5704   LearningRate 0.0007   Epoch: 19   Global Step: 199700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-01-09 11:25:20,177-Speed 5971.57 samples/sec   Loss 1.5993   LearningRate 0.0007   Epoch: 19   Global Step: 199710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:25:27,025-Speed 5985.10 samples/sec   Loss 1.5915   LearningRate 0.0007   Epoch: 19   Global Step: 199720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-01-09 11:25:33,905-Speed 5955.48 samples/sec   Loss 1.5920   LearningRate 0.0007   Epoch: 19   Global Step: 199730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:25:40,870-Speed 5881.99 samples/sec   Loss 1.5801   LearningRate 0.0007   Epoch: 19   Global Step: 199740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:25:47,733-Speed 5969.91 samples/sec   Loss 1.5621   LearningRate 0.0007   Epoch: 19   Global Step: 199750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:25:54,581-Speed 5982.46 samples/sec   Loss 1.5788   LearningRate 0.0007   Epoch: 19   Global Step: 199760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:26:07,181-Speed 3251.22 samples/sec   Loss 1.5546   LearningRate 0.0007   Epoch: 19   Global Step: 199770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:26:14,024-Speed 5987.28 samples/sec   Loss 1.5962   LearningRate 0.0007   Epoch: 19   Global Step: 199780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:26:20,866-Speed 5987.47 samples/sec   Loss 1.5995   LearningRate 0.0007   Epoch: 19   Global Step: 199790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:26:27,729-Speed 5969.33 samples/sec   Loss 1.5545   LearningRate 0.0007   Epoch: 19   Global Step: 199800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:26:34,568-Speed 5990.35 samples/sec   Loss 1.5918   LearningRate 0.0007   Epoch: 19   Global Step: 199810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:26:41,419-Speed 5979.91 samples/sec   Loss 1.5775   LearningRate 0.0007   Epoch: 19   Global Step: 199820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:26:48,263-Speed 5984.84 samples/sec   Loss 1.5654   LearningRate 0.0007   Epoch: 19   Global Step: 199830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:26:55,109-Speed 5984.45 samples/sec   Loss 1.5702   LearningRate 0.0007   Epoch: 19   Global Step: 199840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:27:01,957-Speed 5983.09 samples/sec   Loss 1.5982   LearningRate 0.0007   Epoch: 19   Global Step: 199850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:27:08,890-Speed 5908.90 samples/sec   Loss 1.5821   LearningRate 0.0006   Epoch: 19   Global Step: 199860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:27:15,764-Speed 5959.85 samples/sec   Loss 1.5808   LearningRate 0.0006   Epoch: 19   Global Step: 199870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:27:22,617-Speed 5978.31 samples/sec   Loss 1.5932   LearningRate 0.0006   Epoch: 19   Global Step: 199880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:27:29,472-Speed 5976.00 samples/sec   Loss 1.5849   LearningRate 0.0006   Epoch: 19   Global Step: 199890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:27:36,325-Speed 5977.99 samples/sec   Loss 1.5932   LearningRate 0.0006   Epoch: 19   Global Step: 199900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:27:43,185-Speed 5972.37 samples/sec   Loss 1.5581   LearningRate 0.0006   Epoch: 19   Global Step: 199910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:27:50,061-Speed 5957.17 samples/sec   Loss 1.5563   LearningRate 0.0006   Epoch: 19   Global Step: 199920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:27:56,932-Speed 5962.99 samples/sec   Loss 1.5771   LearningRate 0.0006   Epoch: 19   Global Step: 199930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:28:03,790-Speed 5973.63 samples/sec   Loss 1.6112   LearningRate 0.0006   Epoch: 19   Global Step: 199940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:28:10,656-Speed 5965.75 samples/sec   Loss 1.5818   LearningRate 0.0006   Epoch: 19   Global Step: 199950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:28:17,506-Speed 5981.41 samples/sec   Loss 1.5770   LearningRate 0.0006   Epoch: 19   Global Step: 199960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:28:24,359-Speed 5979.35 samples/sec   Loss 1.5902   LearningRate 0.0006   Epoch: 19   Global Step: 199970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:28:31,223-Speed 5968.35 samples/sec   Loss 1.5844   LearningRate 0.0006   Epoch: 19   Global Step: 199980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:28:38,077-Speed 5978.17 samples/sec   Loss 1.6035   LearningRate 0.0006   Epoch: 19   Global Step: 199990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:28:44,940-Speed 5969.11 samples/sec   Loss 1.5668   LearningRate 0.0006   Epoch: 19   Global Step: 200000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:29:11,620-[lfw][200000]XNorm: 23.359737
Training: 2022-01-09 11:29:11,620-[lfw][200000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-01-09 11:29:11,621-[lfw][200000]Accuracy-Highest: 0.99833
Training: 2022-01-09 11:29:42,551-[cfp_fp][200000]XNorm: 21.488663
Training: 2022-01-09 11:29:42,552-[cfp_fp][200000]Accuracy-Flip: 0.99243+-0.00320
Training: 2022-01-09 11:29:42,553-[cfp_fp][200000]Accuracy-Highest: 0.99286
Training: 2022-01-09 11:30:09,161-[agedb_30][200000]XNorm: 22.877173
Training: 2022-01-09 11:30:09,161-[agedb_30][200000]Accuracy-Flip: 0.98300+-0.00515
Training: 2022-01-09 11:30:09,162-[agedb_30][200000]Accuracy-Highest: 0.98300
Training: 2022-01-09 11:30:16,016-Speed 449.74 samples/sec   Loss 1.5715   LearningRate 0.0006   Epoch: 19   Global Step: 200010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:30:22,851-Speed 5994.20 samples/sec   Loss 1.5735   LearningRate 0.0006   Epoch: 19   Global Step: 200020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:30:29,700-Speed 5982.53 samples/sec   Loss 1.5884   LearningRate 0.0006   Epoch: 19   Global Step: 200030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:30:36,581-Speed 5954.30 samples/sec   Loss 1.5814   LearningRate 0.0006   Epoch: 19   Global Step: 200040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:30:43,436-Speed 5976.01 samples/sec   Loss 1.5978   LearningRate 0.0006   Epoch: 19   Global Step: 200050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:30:50,275-Speed 5990.13 samples/sec   Loss 1.5836   LearningRate 0.0006   Epoch: 19   Global Step: 200060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:30:57,119-Speed 5985.76 samples/sec   Loss 1.5743   LearningRate 0.0006   Epoch: 19   Global Step: 200070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:31:03,961-Speed 5987.75 samples/sec   Loss 1.5940   LearningRate 0.0006   Epoch: 19   Global Step: 200080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:31:10,830-Speed 5963.81 samples/sec   Loss 1.5589   LearningRate 0.0006   Epoch: 19   Global Step: 200090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:31:17,675-Speed 5985.45 samples/sec   Loss 1.5705   LearningRate 0.0006   Epoch: 19   Global Step: 200100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:31:24,517-Speed 5987.66 samples/sec   Loss 1.5858   LearningRate 0.0006   Epoch: 19   Global Step: 200110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:31:31,358-Speed 5988.53 samples/sec   Loss 1.5636   LearningRate 0.0006   Epoch: 19   Global Step: 200120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:31:38,208-Speed 5980.90 samples/sec   Loss 1.5990   LearningRate 0.0006   Epoch: 19   Global Step: 200130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:31:45,076-Speed 5964.71 samples/sec   Loss 1.5777   LearningRate 0.0006   Epoch: 19   Global Step: 200140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:31:55,229-Speed 6001.77 samples/sec   Loss 1.5498   LearningRate 0.0006   Epoch: 19   Global Step: 200150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:32:02,070-Speed 5988.60 samples/sec   Loss 1.5696   LearningRate 0.0006   Epoch: 19   Global Step: 200160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:32:08,900-Speed 5998.41 samples/sec   Loss 1.5582   LearningRate 0.0006   Epoch: 19   Global Step: 200170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:32:15,737-Speed 5992.06 samples/sec   Loss 1.5902   LearningRate 0.0006   Epoch: 19   Global Step: 200180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:32:22,585-Speed 5982.00 samples/sec   Loss 1.5738   LearningRate 0.0006   Epoch: 19   Global Step: 200190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:32:29,420-Speed 5994.36 samples/sec   Loss 1.5907   LearningRate 0.0006   Epoch: 19   Global Step: 200200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:32:36,269-Speed 5984.14 samples/sec   Loss 1.5525   LearningRate 0.0006   Epoch: 19   Global Step: 200210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:32:43,131-Speed 5970.70 samples/sec   Loss 1.5817   LearningRate 0.0006   Epoch: 19   Global Step: 200220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:32:49,981-Speed 5980.37 samples/sec   Loss 1.5569   LearningRate 0.0006   Epoch: 19   Global Step: 200230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:32:56,834-Speed 5979.22 samples/sec   Loss 1.5855   LearningRate 0.0006   Epoch: 19   Global Step: 200240   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 11:33:03,665-Speed 5997.43 samples/sec   Loss 1.5742   LearningRate 0.0006   Epoch: 19   Global Step: 200250   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 11:33:10,511-Speed 5984.02 samples/sec   Loss 1.5227   LearningRate 0.0006   Epoch: 19   Global Step: 200260   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 11:33:17,346-Speed 5994.16 samples/sec   Loss 1.5615   LearningRate 0.0006   Epoch: 19   Global Step: 200270   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 11:33:24,183-Speed 5991.85 samples/sec   Loss 1.5727   LearningRate 0.0006   Epoch: 19   Global Step: 200280   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 11:33:31,032-Speed 5982.84 samples/sec   Loss 1.5422   LearningRate 0.0006   Epoch: 19   Global Step: 200290   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 11:33:37,874-Speed 5988.03 samples/sec   Loss 1.5631   LearningRate 0.0006   Epoch: 19   Global Step: 200300   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 11:33:44,710-Speed 5992.69 samples/sec   Loss 1.5616   LearningRate 0.0006   Epoch: 19   Global Step: 200310   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 11:33:51,577-Speed 5965.92 samples/sec   Loss 1.5887   LearningRate 0.0006   Epoch: 19   Global Step: 200320   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 11:33:58,443-Speed 5966.72 samples/sec   Loss 1.5746   LearningRate 0.0006   Epoch: 19   Global Step: 200330   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 11:34:05,281-Speed 5990.92 samples/sec   Loss 1.5633   LearningRate 0.0006   Epoch: 19   Global Step: 200340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:34:12,116-Speed 5993.66 samples/sec   Loss 1.5620   LearningRate 0.0006   Epoch: 19   Global Step: 200350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:34:18,980-Speed 5968.14 samples/sec   Loss 1.5726   LearningRate 0.0006   Epoch: 19   Global Step: 200360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:34:25,819-Speed 5990.76 samples/sec   Loss 1.5785   LearningRate 0.0006   Epoch: 19   Global Step: 200370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:34:32,668-Speed 5981.84 samples/sec   Loss 1.5437   LearningRate 0.0006   Epoch: 19   Global Step: 200380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:34:39,526-Speed 5973.56 samples/sec   Loss 1.5666   LearningRate 0.0006   Epoch: 19   Global Step: 200390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:34:46,391-Speed 5967.24 samples/sec   Loss 1.5596   LearningRate 0.0006   Epoch: 19   Global Step: 200400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:34:53,245-Speed 5977.55 samples/sec   Loss 1.5702   LearningRate 0.0006   Epoch: 19   Global Step: 200410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:35:00,089-Speed 5985.40 samples/sec   Loss 1.5919   LearningRate 0.0006   Epoch: 19   Global Step: 200420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:35:06,937-Speed 5982.45 samples/sec   Loss 1.5536   LearningRate 0.0006   Epoch: 19   Global Step: 200430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:35:13,798-Speed 5970.46 samples/sec   Loss 1.5563   LearningRate 0.0006   Epoch: 19   Global Step: 200440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:35:20,664-Speed 5967.33 samples/sec   Loss 1.5496   LearningRate 0.0006   Epoch: 19   Global Step: 200450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:35:27,540-Speed 5959.53 samples/sec   Loss 1.5813   LearningRate 0.0005   Epoch: 19   Global Step: 200460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:35:34,381-Speed 5988.65 samples/sec   Loss 1.5833   LearningRate 0.0005   Epoch: 19   Global Step: 200470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:35:41,228-Speed 5982.92 samples/sec   Loss 1.5454   LearningRate 0.0005   Epoch: 19   Global Step: 200480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:35:48,068-Speed 5989.82 samples/sec   Loss 1.5714   LearningRate 0.0005   Epoch: 19   Global Step: 200490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:35:54,927-Speed 5972.96 samples/sec   Loss 1.5702   LearningRate 0.0005   Epoch: 19   Global Step: 200500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:36:01,785-Speed 5973.43 samples/sec   Loss 1.5730   LearningRate 0.0005   Epoch: 19   Global Step: 200510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:36:08,634-Speed 5982.60 samples/sec   Loss 1.5530   LearningRate 0.0005   Epoch: 19   Global Step: 200520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:36:15,561-Speed 5913.71 samples/sec   Loss 1.5444   LearningRate 0.0005   Epoch: 19   Global Step: 200530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:36:22,438-Speed 5958.73 samples/sec   Loss 1.5966   LearningRate 0.0005   Epoch: 19   Global Step: 200540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:36:29,290-Speed 5980.61 samples/sec   Loss 1.5792   LearningRate 0.0005   Epoch: 19   Global Step: 200550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:36:36,129-Speed 5989.85 samples/sec   Loss 1.5595   LearningRate 0.0005   Epoch: 19   Global Step: 200560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:36:42,995-Speed 5966.69 samples/sec   Loss 1.5571   LearningRate 0.0005   Epoch: 19   Global Step: 200570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:36:49,859-Speed 5968.58 samples/sec   Loss 1.5724   LearningRate 0.0005   Epoch: 19   Global Step: 200580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:36:56,702-Speed 5987.86 samples/sec   Loss 1.5849   LearningRate 0.0005   Epoch: 19   Global Step: 200590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:37:03,546-Speed 5987.17 samples/sec   Loss 1.5743   LearningRate 0.0005   Epoch: 19   Global Step: 200600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:37:10,396-Speed 5980.42 samples/sec   Loss 1.5617   LearningRate 0.0005   Epoch: 19   Global Step: 200610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:37:17,234-Speed 5990.89 samples/sec   Loss 1.5659   LearningRate 0.0005   Epoch: 19   Global Step: 200620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:37:24,200-Speed 5881.09 samples/sec   Loss 1.5880   LearningRate 0.0005   Epoch: 19   Global Step: 200630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:37:31,089-Speed 5947.20 samples/sec   Loss 1.5568   LearningRate 0.0005   Epoch: 19   Global Step: 200640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:37:37,923-Speed 5994.37 samples/sec   Loss 1.5297   LearningRate 0.0005   Epoch: 19   Global Step: 200650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:37:44,772-Speed 5981.17 samples/sec   Loss 1.5672   LearningRate 0.0005   Epoch: 19   Global Step: 200660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:37:51,618-Speed 5984.95 samples/sec   Loss 1.5544   LearningRate 0.0005   Epoch: 19   Global Step: 200670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:37:58,476-Speed 5974.02 samples/sec   Loss 1.5379   LearningRate 0.0005   Epoch: 19   Global Step: 200680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:38:05,330-Speed 5976.44 samples/sec   Loss 1.5624   LearningRate 0.0005   Epoch: 19   Global Step: 200690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:38:12,173-Speed 5986.90 samples/sec   Loss 1.5590   LearningRate 0.0005   Epoch: 19   Global Step: 200700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:38:19,626-Speed 5497.00 samples/sec   Loss 1.5389   LearningRate 0.0005   Epoch: 19   Global Step: 200710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:38:26,487-Speed 5970.78 samples/sec   Loss 1.5803   LearningRate 0.0005   Epoch: 19   Global Step: 200720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:38:33,340-Speed 5978.53 samples/sec   Loss 1.5623   LearningRate 0.0005   Epoch: 19   Global Step: 200730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:38:40,186-Speed 5983.55 samples/sec   Loss 1.5692   LearningRate 0.0005   Epoch: 19   Global Step: 200740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:38:47,077-Speed 5945.47 samples/sec   Loss 1.5585   LearningRate 0.0005   Epoch: 19   Global Step: 200750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:38:53,918-Speed 5988.79 samples/sec   Loss 1.5553   LearningRate 0.0005   Epoch: 19   Global Step: 200760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:39:00,761-Speed 5987.05 samples/sec   Loss 1.5234   LearningRate 0.0005   Epoch: 19   Global Step: 200770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:39:07,618-Speed 5974.20 samples/sec   Loss 1.5599   LearningRate 0.0005   Epoch: 19   Global Step: 200780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:39:14,468-Speed 5981.36 samples/sec   Loss 1.5655   LearningRate 0.0005   Epoch: 19   Global Step: 200790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:39:21,319-Speed 5979.25 samples/sec   Loss 1.5633   LearningRate 0.0005   Epoch: 19   Global Step: 200800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:39:28,168-Speed 5981.47 samples/sec   Loss 1.5941   LearningRate 0.0005   Epoch: 19   Global Step: 200810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:39:35,010-Speed 5987.26 samples/sec   Loss 1.5557   LearningRate 0.0005   Epoch: 19   Global Step: 200820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:39:41,861-Speed 5980.36 samples/sec   Loss 1.5815   LearningRate 0.0005   Epoch: 19   Global Step: 200830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:39:48,724-Speed 5968.65 samples/sec   Loss 1.5635   LearningRate 0.0005   Epoch: 19   Global Step: 200840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:39:55,561-Speed 5991.86 samples/sec   Loss 1.5556   LearningRate 0.0005   Epoch: 19   Global Step: 200850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:40:02,433-Speed 5961.76 samples/sec   Loss 1.5592   LearningRate 0.0005   Epoch: 19   Global Step: 200860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:40:09,305-Speed 5962.36 samples/sec   Loss 1.5643   LearningRate 0.0005   Epoch: 19   Global Step: 200870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:40:16,155-Speed 5980.56 samples/sec   Loss 1.5534   LearningRate 0.0005   Epoch: 19   Global Step: 200880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:40:23,022-Speed 5965.96 samples/sec   Loss 1.5224   LearningRate 0.0005   Epoch: 19   Global Step: 200890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:40:29,876-Speed 5977.35 samples/sec   Loss 1.5536   LearningRate 0.0005   Epoch: 19   Global Step: 200900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:40:36,727-Speed 5979.10 samples/sec   Loss 1.5801   LearningRate 0.0005   Epoch: 19   Global Step: 200910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:40:43,577-Speed 5980.56 samples/sec   Loss 1.5567   LearningRate 0.0005   Epoch: 19   Global Step: 200920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:40:50,423-Speed 5986.82 samples/sec   Loss 1.5447   LearningRate 0.0005   Epoch: 19   Global Step: 200930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:40:57,266-Speed 5986.05 samples/sec   Loss 1.5471   LearningRate 0.0005   Epoch: 19   Global Step: 200940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:41:04,122-Speed 5975.83 samples/sec   Loss 1.5602   LearningRate 0.0005   Epoch: 19   Global Step: 200950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:41:10,993-Speed 5962.42 samples/sec   Loss 1.5483   LearningRate 0.0005   Epoch: 19   Global Step: 200960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:41:17,866-Speed 5961.27 samples/sec   Loss 1.5587   LearningRate 0.0005   Epoch: 19   Global Step: 200970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:41:24,737-Speed 5962.61 samples/sec   Loss 1.5622   LearningRate 0.0005   Epoch: 19   Global Step: 200980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:41:31,619-Speed 5952.46 samples/sec   Loss 1.5248   LearningRate 0.0005   Epoch: 19   Global Step: 200990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:41:38,483-Speed 5968.28 samples/sec   Loss 1.5479   LearningRate 0.0005   Epoch: 19   Global Step: 201000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:41:45,338-Speed 5976.81 samples/sec   Loss 1.6080   LearningRate 0.0005   Epoch: 19   Global Step: 201010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:41:52,189-Speed 5978.98 samples/sec   Loss 1.5522   LearningRate 0.0005   Epoch: 19   Global Step: 201020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:41:59,096-Speed 5933.79 samples/sec   Loss 1.5434   LearningRate 0.0005   Epoch: 19   Global Step: 201030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:42:05,999-Speed 5934.77 samples/sec   Loss 1.5972   LearningRate 0.0005   Epoch: 19   Global Step: 201040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:42:12,870-Speed 5962.23 samples/sec   Loss 1.5906   LearningRate 0.0005   Epoch: 19   Global Step: 201050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:42:19,843-Speed 5875.74 samples/sec   Loss 1.5255   LearningRate 0.0005   Epoch: 19   Global Step: 201060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:42:26,700-Speed 5974.15 samples/sec   Loss 1.5712   LearningRate 0.0005   Epoch: 19   Global Step: 201070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:42:33,550-Speed 5981.08 samples/sec   Loss 1.5479   LearningRate 0.0005   Epoch: 19   Global Step: 201080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:42:40,401-Speed 5979.69 samples/sec   Loss 1.5414   LearningRate 0.0005   Epoch: 19   Global Step: 201090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:42:47,260-Speed 5973.18 samples/sec   Loss 1.5431   LearningRate 0.0005   Epoch: 19   Global Step: 201100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:42:54,110-Speed 5980.80 samples/sec   Loss 1.5211   LearningRate 0.0005   Epoch: 19   Global Step: 201110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:43:00,964-Speed 5976.06 samples/sec   Loss 1.5582   LearningRate 0.0004   Epoch: 19   Global Step: 201120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:43:07,817-Speed 5980.67 samples/sec   Loss 1.5381   LearningRate 0.0004   Epoch: 19   Global Step: 201130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:43:14,682-Speed 5967.00 samples/sec   Loss 1.5792   LearningRate 0.0004   Epoch: 19   Global Step: 201140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:43:21,532-Speed 5981.06 samples/sec   Loss 1.5576   LearningRate 0.0004   Epoch: 19   Global Step: 201150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:43:28,410-Speed 5956.32 samples/sec   Loss 1.5237   LearningRate 0.0004   Epoch: 19   Global Step: 201160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:43:35,268-Speed 5974.00 samples/sec   Loss 1.5392   LearningRate 0.0004   Epoch: 19   Global Step: 201170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:43:42,142-Speed 5961.93 samples/sec   Loss 1.5471   LearningRate 0.0004   Epoch: 19   Global Step: 201180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:43:49,014-Speed 5961.51 samples/sec   Loss 1.5446   LearningRate 0.0004   Epoch: 19   Global Step: 201190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:43:55,876-Speed 5970.08 samples/sec   Loss 1.5452   LearningRate 0.0004   Epoch: 19   Global Step: 201200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:44:02,745-Speed 5964.39 samples/sec   Loss 1.5706   LearningRate 0.0004   Epoch: 19   Global Step: 201210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:44:09,604-Speed 5972.89 samples/sec   Loss 1.5558   LearningRate 0.0004   Epoch: 19   Global Step: 201220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:44:16,476-Speed 5962.49 samples/sec   Loss 1.5433   LearningRate 0.0004   Epoch: 19   Global Step: 201230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:44:23,361-Speed 5950.75 samples/sec   Loss 1.5202   LearningRate 0.0004   Epoch: 19   Global Step: 201240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:44:30,217-Speed 5975.01 samples/sec   Loss 1.5581   LearningRate 0.0004   Epoch: 19   Global Step: 201250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:44:43,152-Speed 3167.04 samples/sec   Loss 1.5319   LearningRate 0.0004   Epoch: 19   Global Step: 201260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:44:50,002-Speed 5980.20 samples/sec   Loss 1.5551   LearningRate 0.0004   Epoch: 19   Global Step: 201270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:44:56,839-Speed 5991.85 samples/sec   Loss 1.5476   LearningRate 0.0004   Epoch: 19   Global Step: 201280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:45:03,705-Speed 5966.90 samples/sec   Loss 1.5684   LearningRate 0.0004   Epoch: 19   Global Step: 201290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:45:10,574-Speed 5964.50 samples/sec   Loss 1.5439   LearningRate 0.0004   Epoch: 19   Global Step: 201300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:45:17,424-Speed 5981.07 samples/sec   Loss 1.5293   LearningRate 0.0004   Epoch: 19   Global Step: 201310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:45:24,283-Speed 5972.93 samples/sec   Loss 1.5555   LearningRate 0.0004   Epoch: 19   Global Step: 201320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:45:31,132-Speed 5981.68 samples/sec   Loss 1.5452   LearningRate 0.0004   Epoch: 19   Global Step: 201330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:45:38,010-Speed 5955.94 samples/sec   Loss 1.5637   LearningRate 0.0004   Epoch: 19   Global Step: 201340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:45:44,876-Speed 5967.37 samples/sec   Loss 1.5534   LearningRate 0.0004   Epoch: 19   Global Step: 201350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:45:51,750-Speed 5959.61 samples/sec   Loss 1.5630   LearningRate 0.0004   Epoch: 19   Global Step: 201360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:45:58,614-Speed 5968.20 samples/sec   Loss 1.5422   LearningRate 0.0004   Epoch: 19   Global Step: 201370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:46:05,513-Speed 5938.51 samples/sec   Loss 1.5436   LearningRate 0.0004   Epoch: 19   Global Step: 201380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:46:12,386-Speed 5960.72 samples/sec   Loss 1.5461   LearningRate 0.0004   Epoch: 19   Global Step: 201390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:46:19,254-Speed 5965.69 samples/sec   Loss 1.5575   LearningRate 0.0004   Epoch: 19   Global Step: 201400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:46:26,131-Speed 5957.91 samples/sec   Loss 1.5623   LearningRate 0.0004   Epoch: 19   Global Step: 201410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:46:33,010-Speed 5955.92 samples/sec   Loss 1.5536   LearningRate 0.0004   Epoch: 19   Global Step: 201420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:46:39,878-Speed 5964.85 samples/sec   Loss 1.5608   LearningRate 0.0004   Epoch: 19   Global Step: 201430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:46:46,753-Speed 5959.54 samples/sec   Loss 1.5596   LearningRate 0.0004   Epoch: 19   Global Step: 201440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:46:53,626-Speed 5961.25 samples/sec   Loss 1.5548   LearningRate 0.0004   Epoch: 19   Global Step: 201450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:47:00,500-Speed 5959.82 samples/sec   Loss 1.5286   LearningRate 0.0004   Epoch: 19   Global Step: 201460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:47:07,359-Speed 5972.46 samples/sec   Loss 1.5300   LearningRate 0.0004   Epoch: 19   Global Step: 201470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:47:14,216-Speed 5975.10 samples/sec   Loss 1.5500   LearningRate 0.0004   Epoch: 19   Global Step: 201480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:47:21,088-Speed 5964.30 samples/sec   Loss 1.5398   LearningRate 0.0004   Epoch: 19   Global Step: 201490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:47:27,943-Speed 5975.92 samples/sec   Loss 1.5286   LearningRate 0.0004   Epoch: 19   Global Step: 201500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:47:34,807-Speed 5967.87 samples/sec   Loss 1.5434   LearningRate 0.0004   Epoch: 19   Global Step: 201510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:47:41,659-Speed 5979.47 samples/sec   Loss 1.5383   LearningRate 0.0004   Epoch: 19   Global Step: 201520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:47:48,573-Speed 5925.03 samples/sec   Loss 1.5545   LearningRate 0.0004   Epoch: 19   Global Step: 201530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:47:55,453-Speed 5954.09 samples/sec   Loss 1.5532   LearningRate 0.0004   Epoch: 19   Global Step: 201540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:48:02,306-Speed 5978.39 samples/sec   Loss 1.5331   LearningRate 0.0004   Epoch: 19   Global Step: 201550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:48:09,177-Speed 5962.68 samples/sec   Loss 1.5294   LearningRate 0.0004   Epoch: 19   Global Step: 201560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:48:16,068-Speed 5945.37 samples/sec   Loss 1.5263   LearningRate 0.0004   Epoch: 19   Global Step: 201570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:48:22,920-Speed 5978.51 samples/sec   Loss 1.5534   LearningRate 0.0004   Epoch: 19   Global Step: 201580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:48:29,787-Speed 5966.25 samples/sec   Loss 1.5146   LearningRate 0.0004   Epoch: 19   Global Step: 201590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:48:36,652-Speed 5967.07 samples/sec   Loss 1.5663   LearningRate 0.0004   Epoch: 19   Global Step: 201600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:48:43,508-Speed 5975.66 samples/sec   Loss 1.5445   LearningRate 0.0004   Epoch: 19   Global Step: 201610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:48:50,369-Speed 5971.01 samples/sec   Loss 1.5455   LearningRate 0.0004   Epoch: 19   Global Step: 201620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:48:57,267-Speed 5938.77 samples/sec   Loss 1.5470   LearningRate 0.0004   Epoch: 19   Global Step: 201630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:49:04,135-Speed 5965.75 samples/sec   Loss 1.5358   LearningRate 0.0004   Epoch: 19   Global Step: 201640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:49:10,987-Speed 5979.24 samples/sec   Loss 1.5588   LearningRate 0.0004   Epoch: 19   Global Step: 201650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:49:17,871-Speed 5950.48 samples/sec   Loss 1.5400   LearningRate 0.0004   Epoch: 19   Global Step: 201660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:49:24,740-Speed 5964.43 samples/sec   Loss 1.5743   LearningRate 0.0004   Epoch: 19   Global Step: 201670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:49:31,603-Speed 5972.07 samples/sec   Loss 1.5408   LearningRate 0.0004   Epoch: 19   Global Step: 201680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:49:38,466-Speed 5968.84 samples/sec   Loss 1.5410   LearningRate 0.0004   Epoch: 19   Global Step: 201690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:49:45,326-Speed 5972.74 samples/sec   Loss 1.5486   LearningRate 0.0004   Epoch: 19   Global Step: 201700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:49:52,182-Speed 5974.80 samples/sec   Loss 1.5522   LearningRate 0.0004   Epoch: 19   Global Step: 201710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:49:59,041-Speed 5973.06 samples/sec   Loss 1.5306   LearningRate 0.0004   Epoch: 19   Global Step: 201720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:50:05,934-Speed 5943.36 samples/sec   Loss 1.5514   LearningRate 0.0004   Epoch: 19   Global Step: 201730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:50:12,810-Speed 5959.22 samples/sec   Loss 1.5394   LearningRate 0.0004   Epoch: 19   Global Step: 201740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:50:19,681-Speed 5963.16 samples/sec   Loss 1.5195   LearningRate 0.0004   Epoch: 19   Global Step: 201750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:50:26,561-Speed 5953.77 samples/sec   Loss 1.5446   LearningRate 0.0004   Epoch: 19   Global Step: 201760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:50:33,414-Speed 5978.17 samples/sec   Loss 1.5427   LearningRate 0.0004   Epoch: 19   Global Step: 201770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:50:40,271-Speed 5974.70 samples/sec   Loss 1.5429   LearningRate 0.0004   Epoch: 19   Global Step: 201780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:50:47,135-Speed 5967.58 samples/sec   Loss 1.4964   LearningRate 0.0004   Epoch: 19   Global Step: 201790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:50:54,008-Speed 5960.97 samples/sec   Loss 1.5073   LearningRate 0.0004   Epoch: 19   Global Step: 201800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:51:00,873-Speed 5967.55 samples/sec   Loss 1.5584   LearningRate 0.0004   Epoch: 19   Global Step: 201810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:51:07,800-Speed 5915.04 samples/sec   Loss 1.5222   LearningRate 0.0004   Epoch: 19   Global Step: 201820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:51:14,742-Speed 5901.79 samples/sec   Loss 1.5314   LearningRate 0.0004   Epoch: 19   Global Step: 201830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:51:21,620-Speed 5958.02 samples/sec   Loss 1.5260   LearningRate 0.0004   Epoch: 19   Global Step: 201840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:51:28,486-Speed 5966.45 samples/sec   Loss 1.5417   LearningRate 0.0004   Epoch: 19   Global Step: 201850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:51:35,354-Speed 5965.22 samples/sec   Loss 1.5564   LearningRate 0.0003   Epoch: 19   Global Step: 201860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:51:42,228-Speed 5959.10 samples/sec   Loss 1.5367   LearningRate 0.0003   Epoch: 19   Global Step: 201870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:51:49,104-Speed 5959.00 samples/sec   Loss 1.5189   LearningRate 0.0003   Epoch: 19   Global Step: 201880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:51:55,960-Speed 5975.70 samples/sec   Loss 1.5356   LearningRate 0.0003   Epoch: 19   Global Step: 201890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:52:02,826-Speed 5966.27 samples/sec   Loss 1.5238   LearningRate 0.0003   Epoch: 19   Global Step: 201900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:52:09,697-Speed 5962.65 samples/sec   Loss 1.5586   LearningRate 0.0003   Epoch: 19   Global Step: 201910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:52:16,549-Speed 5978.70 samples/sec   Loss 1.5233   LearningRate 0.0003   Epoch: 19   Global Step: 201920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:52:23,422-Speed 5962.35 samples/sec   Loss 1.5144   LearningRate 0.0003   Epoch: 19   Global Step: 201930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:52:30,284-Speed 5970.79 samples/sec   Loss 1.5496   LearningRate 0.0003   Epoch: 19   Global Step: 201940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:52:37,167-Speed 5951.99 samples/sec   Loss 1.5588   LearningRate 0.0003   Epoch: 19   Global Step: 201950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:52:44,037-Speed 5963.04 samples/sec   Loss 1.5506   LearningRate 0.0003   Epoch: 19   Global Step: 201960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:52:50,916-Speed 5956.05 samples/sec   Loss 1.5374   LearningRate 0.0003   Epoch: 19   Global Step: 201970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:52:57,776-Speed 5973.34 samples/sec   Loss 1.5374   LearningRate 0.0003   Epoch: 19   Global Step: 201980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:53:04,635-Speed 5972.33 samples/sec   Loss 1.5535   LearningRate 0.0003   Epoch: 19   Global Step: 201990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:53:11,493-Speed 5974.53 samples/sec   Loss 1.5464   LearningRate 0.0003   Epoch: 19   Global Step: 202000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:53:18,380-Speed 5948.76 samples/sec   Loss 1.5480   LearningRate 0.0003   Epoch: 19   Global Step: 202010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:53:25,241-Speed 5972.56 samples/sec   Loss 1.5254   LearningRate 0.0003   Epoch: 19   Global Step: 202020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:53:32,093-Speed 5978.68 samples/sec   Loss 1.5417   LearningRate 0.0003   Epoch: 19   Global Step: 202030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:53:38,949-Speed 5975.65 samples/sec   Loss 1.5100   LearningRate 0.0003   Epoch: 19   Global Step: 202040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:53:45,808-Speed 5972.93 samples/sec   Loss 1.5183   LearningRate 0.0003   Epoch: 19   Global Step: 202050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:53:52,679-Speed 5962.46 samples/sec   Loss 1.5351   LearningRate 0.0003   Epoch: 19   Global Step: 202060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:53:59,566-Speed 5948.64 samples/sec   Loss 1.5371   LearningRate 0.0003   Epoch: 19   Global Step: 202070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:54:06,411-Speed 5984.65 samples/sec   Loss 1.5468   LearningRate 0.0003   Epoch: 19   Global Step: 202080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:54:13,269-Speed 5974.30 samples/sec   Loss 1.5432   LearningRate 0.0003   Epoch: 19   Global Step: 202090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:54:20,120-Speed 5982.81 samples/sec   Loss 1.5234   LearningRate 0.0003   Epoch: 19   Global Step: 202100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:54:26,973-Speed 5979.33 samples/sec   Loss 1.5205   LearningRate 0.0003   Epoch: 19   Global Step: 202110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:54:33,837-Speed 5968.46 samples/sec   Loss 1.5388   LearningRate 0.0003   Epoch: 19   Global Step: 202120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:54:40,742-Speed 5932.97 samples/sec   Loss 1.5525   LearningRate 0.0003   Epoch: 19   Global Step: 202130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:54:47,598-Speed 5975.15 samples/sec   Loss 1.5519   LearningRate 0.0003   Epoch: 19   Global Step: 202140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:54:54,461-Speed 5971.63 samples/sec   Loss 1.5216   LearningRate 0.0003   Epoch: 19   Global Step: 202150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:55:01,352-Speed 5944.98 samples/sec   Loss 1.5020   LearningRate 0.0003   Epoch: 19   Global Step: 202160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:55:08,214-Speed 5969.98 samples/sec   Loss 1.5600   LearningRate 0.0003   Epoch: 19   Global Step: 202170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:55:15,074-Speed 5972.09 samples/sec   Loss 1.5073   LearningRate 0.0003   Epoch: 19   Global Step: 202180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:55:21,918-Speed 5985.24 samples/sec   Loss 1.5408   LearningRate 0.0003   Epoch: 19   Global Step: 202190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:55:28,790-Speed 5962.45 samples/sec   Loss 1.5178   LearningRate 0.0003   Epoch: 19   Global Step: 202200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:55:35,668-Speed 5955.80 samples/sec   Loss 1.5133   LearningRate 0.0003   Epoch: 19   Global Step: 202210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:55:42,526-Speed 5976.12 samples/sec   Loss 1.5348   LearningRate 0.0003   Epoch: 19   Global Step: 202220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:55:49,379-Speed 5978.39 samples/sec   Loss 1.5357   LearningRate 0.0003   Epoch: 19   Global Step: 202230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:55:56,255-Speed 5958.09 samples/sec   Loss 1.5569   LearningRate 0.0003   Epoch: 19   Global Step: 202240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:56:03,115-Speed 5971.89 samples/sec   Loss 1.5426   LearningRate 0.0003   Epoch: 19   Global Step: 202250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:56:09,971-Speed 5975.96 samples/sec   Loss 1.5239   LearningRate 0.0003   Epoch: 19   Global Step: 202260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:56:16,837-Speed 5966.51 samples/sec   Loss 1.5066   LearningRate 0.0003   Epoch: 19   Global Step: 202270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:56:23,681-Speed 5985.44 samples/sec   Loss 1.4980   LearningRate 0.0003   Epoch: 19   Global Step: 202280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:56:30,532-Speed 5980.29 samples/sec   Loss 1.5315   LearningRate 0.0003   Epoch: 19   Global Step: 202290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:56:37,405-Speed 5960.54 samples/sec   Loss 1.5401   LearningRate 0.0003   Epoch: 19   Global Step: 202300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:56:44,269-Speed 5968.09 samples/sec   Loss 1.5424   LearningRate 0.0003   Epoch: 19   Global Step: 202310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:56:51,147-Speed 5956.62 samples/sec   Loss 1.5409   LearningRate 0.0003   Epoch: 19   Global Step: 202320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:56:58,031-Speed 5954.27 samples/sec   Loss 1.5247   LearningRate 0.0003   Epoch: 19   Global Step: 202330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:57:04,890-Speed 5972.13 samples/sec   Loss 1.5379   LearningRate 0.0003   Epoch: 19   Global Step: 202340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:57:11,766-Speed 5958.15 samples/sec   Loss 1.5259   LearningRate 0.0003   Epoch: 19   Global Step: 202350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:57:18,631-Speed 5967.82 samples/sec   Loss 1.5507   LearningRate 0.0003   Epoch: 19   Global Step: 202360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:57:25,492-Speed 5972.05 samples/sec   Loss 1.5298   LearningRate 0.0003   Epoch: 19   Global Step: 202370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:57:32,349-Speed 5974.50 samples/sec   Loss 1.5476   LearningRate 0.0003   Epoch: 19   Global Step: 202380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:57:39,216-Speed 5966.21 samples/sec   Loss 1.5526   LearningRate 0.0003   Epoch: 19   Global Step: 202390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:57:46,076-Speed 5971.55 samples/sec   Loss 1.5053   LearningRate 0.0003   Epoch: 19   Global Step: 202400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:57:52,943-Speed 5968.10 samples/sec   Loss 1.5185   LearningRate 0.0003   Epoch: 19   Global Step: 202410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:57:59,811-Speed 5965.42 samples/sec   Loss 1.5307   LearningRate 0.0003   Epoch: 19   Global Step: 202420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:58:06,657-Speed 5984.53 samples/sec   Loss 1.5256   LearningRate 0.0003   Epoch: 19   Global Step: 202430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:58:13,513-Speed 5975.18 samples/sec   Loss 1.5256   LearningRate 0.0003   Epoch: 19   Global Step: 202440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:58:20,375-Speed 5970.54 samples/sec   Loss 1.5236   LearningRate 0.0003   Epoch: 19   Global Step: 202450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:58:27,225-Speed 5980.94 samples/sec   Loss 1.5322   LearningRate 0.0003   Epoch: 19   Global Step: 202460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:58:34,096-Speed 5962.59 samples/sec   Loss 1.5362   LearningRate 0.0003   Epoch: 19   Global Step: 202470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:58:40,954-Speed 5973.42 samples/sec   Loss 1.5387   LearningRate 0.0003   Epoch: 19   Global Step: 202480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:58:47,832-Speed 5956.20 samples/sec   Loss 1.5388   LearningRate 0.0003   Epoch: 19   Global Step: 202490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:58:54,698-Speed 5967.71 samples/sec   Loss 1.5322   LearningRate 0.0003   Epoch: 19   Global Step: 202500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:59:01,555-Speed 5974.12 samples/sec   Loss 1.5117   LearningRate 0.0003   Epoch: 19   Global Step: 202510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:59:08,436-Speed 5953.66 samples/sec   Loss 1.5229   LearningRate 0.0003   Epoch: 19   Global Step: 202520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:59:15,294-Speed 5976.04 samples/sec   Loss 1.5357   LearningRate 0.0003   Epoch: 19   Global Step: 202530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:59:22,179-Speed 5950.82 samples/sec   Loss 1.5472   LearningRate 0.0003   Epoch: 19   Global Step: 202540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 11:59:29,020-Speed 5988.48 samples/sec   Loss 1.5004   LearningRate 0.0003   Epoch: 19   Global Step: 202550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:59:35,863-Speed 5986.86 samples/sec   Loss 1.5094   LearningRate 0.0003   Epoch: 19   Global Step: 202560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:59:42,740-Speed 5957.49 samples/sec   Loss 1.5530   LearningRate 0.0003   Epoch: 19   Global Step: 202570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:59:49,587-Speed 5982.59 samples/sec   Loss 1.5328   LearningRate 0.0003   Epoch: 19   Global Step: 202580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 11:59:56,427-Speed 5989.29 samples/sec   Loss 1.5359   LearningRate 0.0003   Epoch: 19   Global Step: 202590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:00:03,277-Speed 5980.69 samples/sec   Loss 1.5245   LearningRate 0.0003   Epoch: 19   Global Step: 202600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:00:10,138-Speed 5970.66 samples/sec   Loss 1.5307   LearningRate 0.0003   Epoch: 19   Global Step: 202610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:00:16,991-Speed 5978.51 samples/sec   Loss 1.5247   LearningRate 0.0003   Epoch: 19   Global Step: 202620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:00:23,855-Speed 5967.62 samples/sec   Loss 1.5350   LearningRate 0.0003   Epoch: 19   Global Step: 202630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:00:30,737-Speed 5953.35 samples/sec   Loss 1.5305   LearningRate 0.0003   Epoch: 19   Global Step: 202640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:00:37,594-Speed 5974.40 samples/sec   Loss 1.5286   LearningRate 0.0003   Epoch: 19   Global Step: 202650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:00:44,446-Speed 5978.81 samples/sec   Loss 1.5409   LearningRate 0.0003   Epoch: 19   Global Step: 202660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:00:51,345-Speed 5938.23 samples/sec   Loss 1.5309   LearningRate 0.0003   Epoch: 19   Global Step: 202670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:00:58,197-Speed 5978.95 samples/sec   Loss 1.5681   LearningRate 0.0003   Epoch: 19   Global Step: 202680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:01:05,054-Speed 5974.75 samples/sec   Loss 1.5209   LearningRate 0.0003   Epoch: 19   Global Step: 202690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:01:11,920-Speed 5966.48 samples/sec   Loss 1.4903   LearningRate 0.0003   Epoch: 19   Global Step: 202700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:01:18,782-Speed 5971.31 samples/sec   Loss 1.5517   LearningRate 0.0003   Epoch: 19   Global Step: 202710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:01:25,644-Speed 5971.19 samples/sec   Loss 1.5057   LearningRate 0.0002   Epoch: 19   Global Step: 202720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:01:32,514-Speed 5963.12 samples/sec   Loss 1.5129   LearningRate 0.0002   Epoch: 19   Global Step: 202730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:01:39,388-Speed 5960.04 samples/sec   Loss 1.5261   LearningRate 0.0002   Epoch: 19   Global Step: 202740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:01:46,256-Speed 5965.51 samples/sec   Loss 1.5486   LearningRate 0.0002   Epoch: 19   Global Step: 202750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:01:53,134-Speed 5955.74 samples/sec   Loss 1.5389   LearningRate 0.0002   Epoch: 19   Global Step: 202760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:01:59,994-Speed 5971.90 samples/sec   Loss 1.5490   LearningRate 0.0002   Epoch: 19   Global Step: 202770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:02:06,915-Speed 5919.58 samples/sec   Loss 1.4950   LearningRate 0.0002   Epoch: 19   Global Step: 202780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:02:13,776-Speed 5970.51 samples/sec   Loss 1.4944   LearningRate 0.0002   Epoch: 19   Global Step: 202790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:02:20,622-Speed 5984.39 samples/sec   Loss 1.5272   LearningRate 0.0002   Epoch: 19   Global Step: 202800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:02:27,489-Speed 5965.78 samples/sec   Loss 1.5141   LearningRate 0.0002   Epoch: 19   Global Step: 202810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:02:34,331-Speed 5987.80 samples/sec   Loss 1.5160   LearningRate 0.0002   Epoch: 19   Global Step: 202820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:02:41,191-Speed 5972.14 samples/sec   Loss 1.5233   LearningRate 0.0002   Epoch: 19   Global Step: 202830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:02:48,060-Speed 5964.23 samples/sec   Loss 1.5131   LearningRate 0.0002   Epoch: 19   Global Step: 202840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:02:54,916-Speed 5974.68 samples/sec   Loss 1.5201   LearningRate 0.0002   Epoch: 19   Global Step: 202850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:03:01,831-Speed 5924.26 samples/sec   Loss 1.5095   LearningRate 0.0002   Epoch: 19   Global Step: 202860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:03:08,790-Speed 5887.46 samples/sec   Loss 1.5130   LearningRate 0.0002   Epoch: 19   Global Step: 202870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:03:15,651-Speed 5970.93 samples/sec   Loss 1.5174   LearningRate 0.0002   Epoch: 19   Global Step: 202880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:03:22,506-Speed 5976.84 samples/sec   Loss 1.5331   LearningRate 0.0002   Epoch: 19   Global Step: 202890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:03:29,358-Speed 5978.72 samples/sec   Loss 1.5039   LearningRate 0.0002   Epoch: 19   Global Step: 202900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:03:36,241-Speed 5951.62 samples/sec   Loss 1.5160   LearningRate 0.0002   Epoch: 19   Global Step: 202910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:03:43,083-Speed 5987.32 samples/sec   Loss 1.5097   LearningRate 0.0002   Epoch: 19   Global Step: 202920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:03:49,932-Speed 5982.09 samples/sec   Loss 1.5315   LearningRate 0.0002   Epoch: 19   Global Step: 202930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:03:56,791-Speed 5972.96 samples/sec   Loss 1.5306   LearningRate 0.0002   Epoch: 19   Global Step: 202940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:04:03,657-Speed 5966.89 samples/sec   Loss 1.5467   LearningRate 0.0002   Epoch: 19   Global Step: 202950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:04:10,518-Speed 5970.25 samples/sec   Loss 1.5220   LearningRate 0.0002   Epoch: 19   Global Step: 202960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:04:22,907-Speed 3306.57 samples/sec   Loss 1.5075   LearningRate 0.0002   Epoch: 19   Global Step: 202970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:04:29,739-Speed 5996.68 samples/sec   Loss 1.4997   LearningRate 0.0002   Epoch: 19   Global Step: 202980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:04:36,596-Speed 5974.63 samples/sec   Loss 1.5179   LearningRate 0.0002   Epoch: 19   Global Step: 202990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:04:43,475-Speed 5954.92 samples/sec   Loss 1.4868   LearningRate 0.0002   Epoch: 19   Global Step: 203000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:04:50,314-Speed 5990.11 samples/sec   Loss 1.4905   LearningRate 0.0002   Epoch: 19   Global Step: 203010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:04:57,161-Speed 5983.84 samples/sec   Loss 1.5195   LearningRate 0.0002   Epoch: 19   Global Step: 203020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:05:04,025-Speed 5968.59 samples/sec   Loss 1.5363   LearningRate 0.0002   Epoch: 19   Global Step: 203030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:05:10,870-Speed 5985.01 samples/sec   Loss 1.5158   LearningRate 0.0002   Epoch: 19   Global Step: 203040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:05:17,728-Speed 5973.90 samples/sec   Loss 1.5079   LearningRate 0.0002   Epoch: 19   Global Step: 203050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:05:24,601-Speed 5960.29 samples/sec   Loss 1.5179   LearningRate 0.0002   Epoch: 19   Global Step: 203060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:05:31,502-Speed 5936.77 samples/sec   Loss 1.5369   LearningRate 0.0002   Epoch: 19   Global Step: 203070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:05:38,348-Speed 5984.52 samples/sec   Loss 1.5324   LearningRate 0.0002   Epoch: 19   Global Step: 203080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:05:45,242-Speed 5942.29 samples/sec   Loss 1.5161   LearningRate 0.0002   Epoch: 19   Global Step: 203090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:05:52,097-Speed 5978.82 samples/sec   Loss 1.5003   LearningRate 0.0002   Epoch: 19   Global Step: 203100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:05:58,941-Speed 5987.30 samples/sec   Loss 1.5305   LearningRate 0.0002   Epoch: 19   Global Step: 203110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:06:05,814-Speed 5961.32 samples/sec   Loss 1.5170   LearningRate 0.0002   Epoch: 19   Global Step: 203120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:06:12,687-Speed 5961.50 samples/sec   Loss 1.5018   LearningRate 0.0002   Epoch: 19   Global Step: 203130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:06:19,575-Speed 5948.45 samples/sec   Loss 1.5162   LearningRate 0.0002   Epoch: 19   Global Step: 203140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:06:26,461-Speed 5949.79 samples/sec   Loss 1.5082   LearningRate 0.0002   Epoch: 19   Global Step: 203150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:06:33,338-Speed 5957.38 samples/sec   Loss 1.5279   LearningRate 0.0002   Epoch: 19   Global Step: 203160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:06:40,194-Speed 5975.82 samples/sec   Loss 1.5327   LearningRate 0.0002   Epoch: 19   Global Step: 203170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:06:47,041-Speed 5983.54 samples/sec   Loss 1.4960   LearningRate 0.0002   Epoch: 19   Global Step: 203180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:06:53,906-Speed 5967.74 samples/sec   Loss 1.4861   LearningRate 0.0002   Epoch: 19   Global Step: 203190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:07:00,768-Speed 5970.68 samples/sec   Loss 1.5053   LearningRate 0.0002   Epoch: 19   Global Step: 203200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:07:07,617-Speed 5981.24 samples/sec   Loss 1.5306   LearningRate 0.0002   Epoch: 19   Global Step: 203210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:07:14,475-Speed 5974.28 samples/sec   Loss 1.5011   LearningRate 0.0002   Epoch: 19   Global Step: 203220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:07:21,327-Speed 5979.01 samples/sec   Loss 1.5308   LearningRate 0.0002   Epoch: 19   Global Step: 203230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:07:28,183-Speed 5975.04 samples/sec   Loss 1.4988   LearningRate 0.0002   Epoch: 19   Global Step: 203240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:07:35,032-Speed 5981.37 samples/sec   Loss 1.4795   LearningRate 0.0002   Epoch: 19   Global Step: 203250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:07:41,899-Speed 5966.43 samples/sec   Loss 1.5105   LearningRate 0.0002   Epoch: 19   Global Step: 203260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:07:48,749-Speed 5980.67 samples/sec   Loss 1.5065   LearningRate 0.0002   Epoch: 19   Global Step: 203270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:07:55,601-Speed 5978.69 samples/sec   Loss 1.5304   LearningRate 0.0002   Epoch: 19   Global Step: 203280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:08:02,495-Speed 5942.66 samples/sec   Loss 1.5293   LearningRate 0.0002   Epoch: 19   Global Step: 203290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:08:09,371-Speed 5958.22 samples/sec   Loss 1.5335   LearningRate 0.0002   Epoch: 19   Global Step: 203300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:08:16,225-Speed 5976.78 samples/sec   Loss 1.4977   LearningRate 0.0002   Epoch: 19   Global Step: 203310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:08:23,068-Speed 5987.00 samples/sec   Loss 1.5355   LearningRate 0.0002   Epoch: 19   Global Step: 203320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:08:29,932-Speed 5968.43 samples/sec   Loss 1.5205   LearningRate 0.0002   Epoch: 19   Global Step: 203330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:08:36,784-Speed 5978.84 samples/sec   Loss 1.4926   LearningRate 0.0002   Epoch: 19   Global Step: 203340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:08:43,629-Speed 5985.53 samples/sec   Loss 1.5427   LearningRate 0.0002   Epoch: 19   Global Step: 203350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:08:50,487-Speed 5976.02 samples/sec   Loss 1.5280   LearningRate 0.0002   Epoch: 19   Global Step: 203360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:08:57,353-Speed 5966.55 samples/sec   Loss 1.5002   LearningRate 0.0002   Epoch: 19   Global Step: 203370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:09:04,212-Speed 5972.73 samples/sec   Loss 1.5248   LearningRate 0.0002   Epoch: 19   Global Step: 203380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:09:11,078-Speed 5967.08 samples/sec   Loss 1.4954   LearningRate 0.0002   Epoch: 19   Global Step: 203390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:09:17,938-Speed 5972.02 samples/sec   Loss 1.5439   LearningRate 0.0002   Epoch: 19   Global Step: 203400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:09:24,799-Speed 5975.57 samples/sec   Loss 1.4995   LearningRate 0.0002   Epoch: 19   Global Step: 203410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:09:31,642-Speed 5986.22 samples/sec   Loss 1.5210   LearningRate 0.0002   Epoch: 19   Global Step: 203420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:09:38,485-Speed 5989.46 samples/sec   Loss 1.5035   LearningRate 0.0002   Epoch: 19   Global Step: 203430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:09:45,360-Speed 5959.01 samples/sec   Loss 1.5107   LearningRate 0.0002   Epoch: 19   Global Step: 203440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:09:52,225-Speed 5967.54 samples/sec   Loss 1.5163   LearningRate 0.0002   Epoch: 19   Global Step: 203450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:09:59,076-Speed 5980.00 samples/sec   Loss 1.4974   LearningRate 0.0002   Epoch: 19   Global Step: 203460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 12:10:05,951-Speed 5959.16 samples/sec   Loss 1.5153   LearningRate 0.0002   Epoch: 19   Global Step: 203470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 12:10:12,809-Speed 5973.89 samples/sec   Loss 1.5295   LearningRate 0.0002   Epoch: 19   Global Step: 203480   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 12:10:19,663-Speed 5977.50 samples/sec   Loss 1.5472   LearningRate 0.0002   Epoch: 19   Global Step: 203490   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 12:10:26,544-Speed 5953.07 samples/sec   Loss 1.5035   LearningRate 0.0002   Epoch: 19   Global Step: 203500   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 12:10:33,439-Speed 5941.70 samples/sec   Loss 1.5120   LearningRate 0.0002   Epoch: 19   Global Step: 203510   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 12:10:40,309-Speed 5963.32 samples/sec   Loss 1.4910   LearningRate 0.0002   Epoch: 19   Global Step: 203520   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 12:10:47,163-Speed 5977.51 samples/sec   Loss 1.5378   LearningRate 0.0002   Epoch: 19   Global Step: 203530   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 12:10:54,013-Speed 5983.85 samples/sec   Loss 1.5391   LearningRate 0.0002   Epoch: 19   Global Step: 203540   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 12:11:00,880-Speed 5965.44 samples/sec   Loss 1.4865   LearningRate 0.0002   Epoch: 19   Global Step: 203550   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-01-09 12:11:07,729-Speed 5981.09 samples/sec   Loss 1.5008   LearningRate 0.0002   Epoch: 19   Global Step: 203560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:11:14,575-Speed 5984.57 samples/sec   Loss 1.5128   LearningRate 0.0002   Epoch: 19   Global Step: 203570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:11:21,432-Speed 5974.21 samples/sec   Loss 1.5023   LearningRate 0.0002   Epoch: 19   Global Step: 203580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:11:28,309-Speed 5956.86 samples/sec   Loss 1.4914   LearningRate 0.0002   Epoch: 19   Global Step: 203590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:11:35,179-Speed 5963.26 samples/sec   Loss 1.4950   LearningRate 0.0002   Epoch: 19   Global Step: 203600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:11:42,034-Speed 5976.79 samples/sec   Loss 1.4940   LearningRate 0.0002   Epoch: 19   Global Step: 203610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:11:48,909-Speed 5959.12 samples/sec   Loss 1.4983   LearningRate 0.0002   Epoch: 19   Global Step: 203620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:11:55,795-Speed 5948.87 samples/sec   Loss 1.5241   LearningRate 0.0002   Epoch: 19   Global Step: 203630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:12:02,659-Speed 5968.81 samples/sec   Loss 1.4978   LearningRate 0.0002   Epoch: 19   Global Step: 203640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:12:09,535-Speed 5958.54 samples/sec   Loss 1.5268   LearningRate 0.0002   Epoch: 19   Global Step: 203650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:12:16,411-Speed 5958.14 samples/sec   Loss 1.5069   LearningRate 0.0002   Epoch: 19   Global Step: 203660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:12:23,293-Speed 5953.16 samples/sec   Loss 1.4930   LearningRate 0.0002   Epoch: 19   Global Step: 203670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:12:30,186-Speed 5943.17 samples/sec   Loss 1.5190   LearningRate 0.0002   Epoch: 19   Global Step: 203680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:12:37,047-Speed 5972.56 samples/sec   Loss 1.5248   LearningRate 0.0002   Epoch: 19   Global Step: 203690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:12:43,898-Speed 5979.93 samples/sec   Loss 1.5339   LearningRate 0.0002   Epoch: 19   Global Step: 203700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:12:50,766-Speed 5965.56 samples/sec   Loss 1.5160   LearningRate 0.0002   Epoch: 19   Global Step: 203710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:12:57,619-Speed 5977.64 samples/sec   Loss 1.5199   LearningRate 0.0002   Epoch: 19   Global Step: 203720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:13:04,461-Speed 5988.15 samples/sec   Loss 1.5116   LearningRate 0.0002   Epoch: 19   Global Step: 203730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:13:11,324-Speed 5969.19 samples/sec   Loss 1.5070   LearningRate 0.0002   Epoch: 19   Global Step: 203740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:13:18,172-Speed 5982.98 samples/sec   Loss 1.5313   LearningRate 0.0002   Epoch: 19   Global Step: 203750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:13:25,019-Speed 5983.09 samples/sec   Loss 1.4880   LearningRate 0.0002   Epoch: 19   Global Step: 203760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:13:31,891-Speed 5961.56 samples/sec   Loss 1.5046   LearningRate 0.0001   Epoch: 19   Global Step: 203770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:13:38,760-Speed 5964.13 samples/sec   Loss 1.4996   LearningRate 0.0001   Epoch: 19   Global Step: 203780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:13:45,614-Speed 5980.15 samples/sec   Loss 1.4990   LearningRate 0.0001   Epoch: 19   Global Step: 203790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:13:52,455-Speed 5988.14 samples/sec   Loss 1.5181   LearningRate 0.0001   Epoch: 19   Global Step: 203800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:13:59,336-Speed 5954.48 samples/sec   Loss 1.5195   LearningRate 0.0001   Epoch: 19   Global Step: 203810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:14:06,186-Speed 5981.02 samples/sec   Loss 1.4705   LearningRate 0.0001   Epoch: 19   Global Step: 203820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:14:13,039-Speed 5977.93 samples/sec   Loss 1.5104   LearningRate 0.0001   Epoch: 19   Global Step: 203830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:14:19,919-Speed 5955.08 samples/sec   Loss 1.5244   LearningRate 0.0001   Epoch: 19   Global Step: 203840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:14:26,788-Speed 5964.79 samples/sec   Loss 1.4907   LearningRate 0.0001   Epoch: 19   Global Step: 203850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:14:33,656-Speed 5964.86 samples/sec   Loss 1.5312   LearningRate 0.0001   Epoch: 19   Global Step: 203860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:14:40,515-Speed 5973.47 samples/sec   Loss 1.5032   LearningRate 0.0001   Epoch: 19   Global Step: 203870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:14:47,391-Speed 5957.89 samples/sec   Loss 1.5457   LearningRate 0.0001   Epoch: 19   Global Step: 203880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:14:54,244-Speed 5977.54 samples/sec   Loss 1.4941   LearningRate 0.0001   Epoch: 19   Global Step: 203890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:15:01,073-Speed 5999.48 samples/sec   Loss 1.5009   LearningRate 0.0001   Epoch: 19   Global Step: 203900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:15:07,921-Speed 5982.99 samples/sec   Loss 1.5049   LearningRate 0.0001   Epoch: 19   Global Step: 203910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:15:14,773-Speed 5979.83 samples/sec   Loss 1.5331   LearningRate 0.0001   Epoch: 19   Global Step: 203920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:15:21,636-Speed 5969.67 samples/sec   Loss 1.5104   LearningRate 0.0001   Epoch: 19   Global Step: 203930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:15:28,482-Speed 5984.33 samples/sec   Loss 1.5001   LearningRate 0.0001   Epoch: 19   Global Step: 203940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:15:35,326-Speed 5985.91 samples/sec   Loss 1.5024   LearningRate 0.0001   Epoch: 19   Global Step: 203950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:15:42,176-Speed 5980.87 samples/sec   Loss 1.5260   LearningRate 0.0001   Epoch: 19   Global Step: 203960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:15:49,036-Speed 5971.83 samples/sec   Loss 1.4960   LearningRate 0.0001   Epoch: 19   Global Step: 203970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:15:55,892-Speed 5975.39 samples/sec   Loss 1.5067   LearningRate 0.0001   Epoch: 19   Global Step: 203980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:16:02,738-Speed 5983.93 samples/sec   Loss 1.4776   LearningRate 0.0001   Epoch: 19   Global Step: 203990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:16:09,594-Speed 5975.66 samples/sec   Loss 1.5173   LearningRate 0.0001   Epoch: 19   Global Step: 204000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:16:16,448-Speed 5979.55 samples/sec   Loss 1.4861   LearningRate 0.0001   Epoch: 19   Global Step: 204010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:16:23,313-Speed 5966.63 samples/sec   Loss 1.5274   LearningRate 0.0001   Epoch: 19   Global Step: 204020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:16:30,233-Speed 5920.34 samples/sec   Loss 1.5157   LearningRate 0.0001   Epoch: 19   Global Step: 204030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:16:37,084-Speed 5980.72 samples/sec   Loss 1.5331   LearningRate 0.0001   Epoch: 19   Global Step: 204040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:16:43,943-Speed 5972.63 samples/sec   Loss 1.5179   LearningRate 0.0001   Epoch: 19   Global Step: 204050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:16:50,793-Speed 5979.67 samples/sec   Loss 1.4872   LearningRate 0.0001   Epoch: 19   Global Step: 204060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:16:57,647-Speed 5978.63 samples/sec   Loss 1.5252   LearningRate 0.0001   Epoch: 19   Global Step: 204070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:17:04,491-Speed 5985.34 samples/sec   Loss 1.5090   LearningRate 0.0001   Epoch: 19   Global Step: 204080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:17:11,337-Speed 5984.37 samples/sec   Loss 1.4875   LearningRate 0.0001   Epoch: 19   Global Step: 204090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:17:18,175-Speed 5991.57 samples/sec   Loss 1.5111   LearningRate 0.0001   Epoch: 19   Global Step: 204100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:17:25,040-Speed 5967.05 samples/sec   Loss 1.5001   LearningRate 0.0001   Epoch: 19   Global Step: 204110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:17:31,896-Speed 5975.08 samples/sec   Loss 1.4848   LearningRate 0.0001   Epoch: 19   Global Step: 204120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:17:38,749-Speed 5978.11 samples/sec   Loss 1.5040   LearningRate 0.0001   Epoch: 19   Global Step: 204130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:17:45,673-Speed 5917.07 samples/sec   Loss 1.5004   LearningRate 0.0001   Epoch: 19   Global Step: 204140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:17:52,528-Speed 5976.72 samples/sec   Loss 1.5126   LearningRate 0.0001   Epoch: 19   Global Step: 204150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:17:59,377-Speed 5982.17 samples/sec   Loss 1.4927   LearningRate 0.0001   Epoch: 19   Global Step: 204160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:18:06,235-Speed 5973.36 samples/sec   Loss 1.5143   LearningRate 0.0001   Epoch: 19   Global Step: 204170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:18:13,091-Speed 5976.27 samples/sec   Loss 1.4911   LearningRate 0.0001   Epoch: 19   Global Step: 204180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:18:19,960-Speed 5964.31 samples/sec   Loss 1.4917   LearningRate 0.0001   Epoch: 19   Global Step: 204190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:18:26,834-Speed 5960.10 samples/sec   Loss 1.5101   LearningRate 0.0001   Epoch: 19   Global Step: 204200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:18:33,696-Speed 5969.29 samples/sec   Loss 1.4865   LearningRate 0.0001   Epoch: 19   Global Step: 204210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:18:40,550-Speed 5977.65 samples/sec   Loss 1.4758   LearningRate 0.0001   Epoch: 19   Global Step: 204220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:18:47,398-Speed 5982.25 samples/sec   Loss 1.4888   LearningRate 0.0001   Epoch: 19   Global Step: 204230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:18:54,261-Speed 5969.70 samples/sec   Loss 1.5213   LearningRate 0.0001   Epoch: 19   Global Step: 204240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:19:01,101-Speed 5989.19 samples/sec   Loss 1.4992   LearningRate 0.0001   Epoch: 19   Global Step: 204250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:19:07,947-Speed 5983.52 samples/sec   Loss 1.5425   LearningRate 0.0001   Epoch: 19   Global Step: 204260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:19:14,813-Speed 5967.20 samples/sec   Loss 1.5011   LearningRate 0.0001   Epoch: 19   Global Step: 204270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:19:21,662-Speed 5981.17 samples/sec   Loss 1.5106   LearningRate 0.0001   Epoch: 19   Global Step: 204280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:19:28,515-Speed 5978.36 samples/sec   Loss 1.4941   LearningRate 0.0001   Epoch: 19   Global Step: 204290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:19:35,367-Speed 5978.99 samples/sec   Loss 1.4900   LearningRate 0.0001   Epoch: 19   Global Step: 204300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:19:42,253-Speed 5950.30 samples/sec   Loss 1.5095   LearningRate 0.0001   Epoch: 19   Global Step: 204310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:19:49,175-Speed 5918.52 samples/sec   Loss 1.4808   LearningRate 0.0001   Epoch: 19   Global Step: 204320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:19:56,108-Speed 5908.79 samples/sec   Loss 1.4662   LearningRate 0.0001   Epoch: 19   Global Step: 204330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:20:03,030-Speed 5918.59 samples/sec   Loss 1.5052   LearningRate 0.0001   Epoch: 19   Global Step: 204340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:20:09,962-Speed 5910.16 samples/sec   Loss 1.5010   LearningRate 0.0001   Epoch: 19   Global Step: 204350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:20:16,879-Speed 5924.33 samples/sec   Loss 1.5020   LearningRate 0.0001   Epoch: 19   Global Step: 204360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:20:23,823-Speed 5899.59 samples/sec   Loss 1.4817   LearningRate 0.0001   Epoch: 19   Global Step: 204370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:20:30,745-Speed 5918.45 samples/sec   Loss 1.4981   LearningRate 0.0001   Epoch: 19   Global Step: 204380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:20:37,604-Speed 5973.27 samples/sec   Loss 1.5045   LearningRate 0.0001   Epoch: 19   Global Step: 204390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:20:44,454-Speed 5980.16 samples/sec   Loss 1.5284   LearningRate 0.0001   Epoch: 19   Global Step: 204400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:20:51,374-Speed 5920.29 samples/sec   Loss 1.5241   LearningRate 0.0001   Epoch: 19   Global Step: 204410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:20:58,224-Speed 5980.68 samples/sec   Loss 1.4941   LearningRate 0.0001   Epoch: 19   Global Step: 204420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:21:05,074-Speed 5980.13 samples/sec   Loss 1.4972   LearningRate 0.0001   Epoch: 19   Global Step: 204430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:21:11,914-Speed 5989.82 samples/sec   Loss 1.4993   LearningRate 0.0001   Epoch: 19   Global Step: 204440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:21:18,770-Speed 5977.22 samples/sec   Loss 1.5067   LearningRate 0.0001   Epoch: 19   Global Step: 204450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:21:25,647-Speed 5958.75 samples/sec   Loss 1.5227   LearningRate 0.0001   Epoch: 19   Global Step: 204460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:21:32,521-Speed 5959.69 samples/sec   Loss 1.5273   LearningRate 0.0001   Epoch: 19   Global Step: 204470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:21:39,374-Speed 5978.43 samples/sec   Loss 1.5040   LearningRate 0.0001   Epoch: 19   Global Step: 204480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:21:46,227-Speed 5977.74 samples/sec   Loss 1.5003   LearningRate 0.0001   Epoch: 19   Global Step: 204490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:21:53,079-Speed 5979.73 samples/sec   Loss 1.4905   LearningRate 0.0001   Epoch: 19   Global Step: 204500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:21:59,974-Speed 5941.71 samples/sec   Loss 1.5004   LearningRate 0.0001   Epoch: 19   Global Step: 204510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:22:06,859-Speed 5950.48 samples/sec   Loss 1.5053   LearningRate 0.0001   Epoch: 19   Global Step: 204520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:22:13,709-Speed 5980.44 samples/sec   Loss 1.4954   LearningRate 0.0001   Epoch: 19   Global Step: 204530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:22:20,590-Speed 5954.21 samples/sec   Loss 1.4983   LearningRate 0.0001   Epoch: 19   Global Step: 204540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:22:27,445-Speed 5976.57 samples/sec   Loss 1.5102   LearningRate 0.0001   Epoch: 19   Global Step: 204550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:22:34,300-Speed 5976.49 samples/sec   Loss 1.5072   LearningRate 0.0001   Epoch: 19   Global Step: 204560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:22:41,194-Speed 5942.33 samples/sec   Loss 1.4983   LearningRate 0.0001   Epoch: 19   Global Step: 204570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:22:48,103-Speed 5929.68 samples/sec   Loss 1.5116   LearningRate 0.0001   Epoch: 19   Global Step: 204580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:22:55,005-Speed 5936.05 samples/sec   Loss 1.4848   LearningRate 0.0001   Epoch: 19   Global Step: 204590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:23:01,869-Speed 5968.28 samples/sec   Loss 1.5056   LearningRate 0.0001   Epoch: 19   Global Step: 204600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:23:08,831-Speed 5884.89 samples/sec   Loss 1.4940   LearningRate 0.0001   Epoch: 19   Global Step: 204610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:23:15,775-Speed 5901.05 samples/sec   Loss 1.5045   LearningRate 0.0001   Epoch: 19   Global Step: 204620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:23:22,629-Speed 5977.70 samples/sec   Loss 1.5056   LearningRate 0.0001   Epoch: 19   Global Step: 204630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:23:29,482-Speed 5978.23 samples/sec   Loss 1.4862   LearningRate 0.0001   Epoch: 19   Global Step: 204640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-01-09 12:23:36,322-Speed 5991.47 samples/sec   Loss 1.5074   LearningRate 0.0001   Epoch: 19   Global Step: 204650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:23:43,177-Speed 5976.00 samples/sec   Loss 1.5140   LearningRate 0.0001   Epoch: 19   Global Step: 204660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:23:50,033-Speed 5975.92 samples/sec   Loss 1.4783   LearningRate 0.0001   Epoch: 19   Global Step: 204670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:23:56,898-Speed 5967.49 samples/sec   Loss 1.5047   LearningRate 0.0001   Epoch: 19   Global Step: 204680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:24:03,768-Speed 5965.74 samples/sec   Loss 1.5257   LearningRate 0.0001   Epoch: 19   Global Step: 204690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:24:10,612-Speed 5986.57 samples/sec   Loss 1.4892   LearningRate 0.0001   Epoch: 19   Global Step: 204700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:24:17,483-Speed 5962.72 samples/sec   Loss 1.4943   LearningRate 0.0001   Epoch: 19   Global Step: 204710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:24:24,328-Speed 5985.09 samples/sec   Loss 1.5183   LearningRate 0.0001   Epoch: 19   Global Step: 204720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:24:31,182-Speed 5976.91 samples/sec   Loss 1.4678   LearningRate 0.0001   Epoch: 19   Global Step: 204730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:24:38,038-Speed 5975.61 samples/sec   Loss 1.5193   LearningRate 0.0001   Epoch: 19   Global Step: 204740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:24:44,893-Speed 5976.44 samples/sec   Loss 1.5034   LearningRate 0.0001   Epoch: 19   Global Step: 204750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:24:51,743-Speed 5981.33 samples/sec   Loss 1.5013   LearningRate 0.0001   Epoch: 19   Global Step: 204760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:24:58,599-Speed 5975.03 samples/sec   Loss 1.5184   LearningRate 0.0001   Epoch: 19   Global Step: 204770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:25:05,450-Speed 5979.49 samples/sec   Loss 1.4927   LearningRate 0.0001   Epoch: 19   Global Step: 204780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:25:12,296-Speed 5983.93 samples/sec   Loss 1.4982   LearningRate 0.0001   Epoch: 19   Global Step: 204790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:25:19,141-Speed 5984.22 samples/sec   Loss 1.5074   LearningRate 0.0001   Epoch: 19   Global Step: 204800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:25:25,986-Speed 5985.56 samples/sec   Loss 1.4912   LearningRate 0.0001   Epoch: 19   Global Step: 204810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:25:32,868-Speed 5953.58 samples/sec   Loss 1.5144   LearningRate 0.0001   Epoch: 19   Global Step: 204820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-01-09 12:25:39,713-Speed 5984.84 samples/sec   Loss 1.4973   LearningRate 0.0001   Epoch: 19   Global Step: 204830   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:25:46,565-Speed 5979.13 samples/sec   Loss 1.5049   LearningRate 0.0001   Epoch: 19   Global Step: 204840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:25:53,415-Speed 5980.70 samples/sec   Loss 1.4961   LearningRate 0.0001   Epoch: 19   Global Step: 204850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:26:00,300-Speed 5950.37 samples/sec   Loss 1.4932   LearningRate 0.0001   Epoch: 19   Global Step: 204860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:26:07,142-Speed 5988.07 samples/sec   Loss 1.4968   LearningRate 0.0001   Epoch: 19   Global Step: 204870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:26:14,006-Speed 5970.39 samples/sec   Loss 1.5117   LearningRate 0.0001   Epoch: 19   Global Step: 204880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:26:20,853-Speed 5984.13 samples/sec   Loss 1.4984   LearningRate 0.0001   Epoch: 19   Global Step: 204890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:26:27,693-Speed 5989.27 samples/sec   Loss 1.5031   LearningRate 0.0001   Epoch: 19   Global Step: 204900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:26:34,544-Speed 5980.48 samples/sec   Loss 1.4856   LearningRate 0.0001   Epoch: 19   Global Step: 204910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:26:41,399-Speed 5975.73 samples/sec   Loss 1.4604   LearningRate 0.0001   Epoch: 19   Global Step: 204920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:26:48,244-Speed 5984.64 samples/sec   Loss 1.4839   LearningRate 0.0001   Epoch: 19   Global Step: 204930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:26:55,102-Speed 5974.01 samples/sec   Loss 1.5062   LearningRate 0.0001   Epoch: 19   Global Step: 204940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:27:01,980-Speed 5957.24 samples/sec   Loss 1.4974   LearningRate 0.0001   Epoch: 19   Global Step: 204950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:27:08,829-Speed 5981.08 samples/sec   Loss 1.4950   LearningRate 0.0001   Epoch: 19   Global Step: 204960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:27:15,703-Speed 5960.97 samples/sec   Loss 1.5100   LearningRate 0.0001   Epoch: 19   Global Step: 204970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:27:22,561-Speed 5974.15 samples/sec   Loss 1.4946   LearningRate 0.0001   Epoch: 19   Global Step: 204980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:27:29,430-Speed 5964.44 samples/sec   Loss 1.4932   LearningRate 0.0001   Epoch: 19   Global Step: 204990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:27:36,284-Speed 5976.76 samples/sec   Loss 1.4871   LearningRate 0.0001   Epoch: 19   Global Step: 205000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:28:03,131-[lfw][205000]XNorm: 23.454143
Training: 2022-01-09 12:28:03,132-[lfw][205000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-01-09 12:28:03,132-[lfw][205000]Accuracy-Highest: 0.99833
Training: 2022-01-09 12:28:34,222-[cfp_fp][205000]XNorm: 21.622859
Training: 2022-01-09 12:28:34,223-[cfp_fp][205000]Accuracy-Flip: 0.99271+-0.00329
Training: 2022-01-09 12:28:34,224-[cfp_fp][205000]Accuracy-Highest: 0.99286
Training: 2022-01-09 12:29:00,925-[agedb_30][205000]XNorm: 22.974586
Training: 2022-01-09 12:29:00,926-[agedb_30][205000]Accuracy-Flip: 0.98267+-0.00554
Training: 2022-01-09 12:29:00,927-[agedb_30][205000]Accuracy-Highest: 0.98300
Training: 2022-01-09 12:29:07,753-Speed 447.81 samples/sec   Loss 1.4973   LearningRate 0.0001   Epoch: 19   Global Step: 205010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:29:14,595-Speed 5988.60 samples/sec   Loss 1.5018   LearningRate 0.0001   Epoch: 19   Global Step: 205020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:29:21,446-Speed 5979.25 samples/sec   Loss 1.4811   LearningRate 0.0001   Epoch: 19   Global Step: 205030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:29:28,318-Speed 5962.55 samples/sec   Loss 1.5022   LearningRate 0.0001   Epoch: 19   Global Step: 205040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:29:35,155-Speed 5991.65 samples/sec   Loss 1.5117   LearningRate 0.0001   Epoch: 19   Global Step: 205050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:29:42,006-Speed 5979.84 samples/sec   Loss 1.5086   LearningRate 0.0001   Epoch: 19   Global Step: 205060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:29:48,870-Speed 5968.12 samples/sec   Loss 1.4964   LearningRate 0.0001   Epoch: 19   Global Step: 205070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:29:55,715-Speed 5984.96 samples/sec   Loss 1.4936   LearningRate 0.0001   Epoch: 19   Global Step: 205080   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 12:30:02,592-Speed 5956.91 samples/sec   Loss 1.5108   LearningRate 0.0001   Epoch: 19   Global Step: 205090   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 12:30:09,453-Speed 5971.28 samples/sec   Loss 1.4880   LearningRate 0.0001   Epoch: 19   Global Step: 205100   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 12:30:16,310-Speed 5974.15 samples/sec   Loss 1.5223   LearningRate 0.0001   Epoch: 19   Global Step: 205110   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 12:30:23,195-Speed 5955.98 samples/sec   Loss 1.4940   LearningRate 0.0001   Epoch: 19   Global Step: 205120   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 12:30:30,089-Speed 5942.15 samples/sec   Loss 1.4985   LearningRate 0.0001   Epoch: 19   Global Step: 205130   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 12:30:36,968-Speed 5955.28 samples/sec   Loss 1.5033   LearningRate 0.0001   Epoch: 19   Global Step: 205140   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 12:30:43,831-Speed 5970.45 samples/sec   Loss 1.5137   LearningRate 0.0001   Epoch: 19   Global Step: 205150   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 12:30:50,688-Speed 5974.20 samples/sec   Loss 1.5046   LearningRate 0.0001   Epoch: 19   Global Step: 205160   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 12:30:57,552-Speed 5968.28 samples/sec   Loss 1.4722   LearningRate 0.0001   Epoch: 19   Global Step: 205170   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 12:31:04,442-Speed 5946.21 samples/sec   Loss 1.5032   LearningRate 0.0001   Epoch: 19   Global Step: 205180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:31:11,297-Speed 5977.06 samples/sec   Loss 1.4892   LearningRate 0.0001   Epoch: 19   Global Step: 205190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:31:18,159-Speed 5970.56 samples/sec   Loss 1.5032   LearningRate 0.0001   Epoch: 19   Global Step: 205200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:31:25,022-Speed 5969.33 samples/sec   Loss 1.4915   LearningRate 0.0001   Epoch: 19   Global Step: 205210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:31:31,890-Speed 5965.47 samples/sec   Loss 1.4656   LearningRate 0.0001   Epoch: 19   Global Step: 205220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:31:38,798-Speed 5930.64 samples/sec   Loss 1.4846   LearningRate 0.0001   Epoch: 19   Global Step: 205230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:31:45,653-Speed 5976.77 samples/sec   Loss 1.4877   LearningRate 0.0001   Epoch: 19   Global Step: 205240   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:31:52,506-Speed 5977.85 samples/sec   Loss 1.5146   LearningRate 0.0001   Epoch: 19   Global Step: 205250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:31:59,377-Speed 5962.43 samples/sec   Loss 1.4800   LearningRate 0.0001   Epoch: 19   Global Step: 205260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:32:06,224-Speed 5982.62 samples/sec   Loss 1.5029   LearningRate 0.0001   Epoch: 19   Global Step: 205270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:32:13,121-Speed 5940.69 samples/sec   Loss 1.4662   LearningRate 0.0001   Epoch: 19   Global Step: 205280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:32:19,972-Speed 5979.64 samples/sec   Loss 1.5031   LearningRate 0.0001   Epoch: 19   Global Step: 205290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:32:26,828-Speed 5975.50 samples/sec   Loss 1.4907   LearningRate 0.0000   Epoch: 19   Global Step: 205300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:32:33,680-Speed 5978.83 samples/sec   Loss 1.5058   LearningRate 0.0000   Epoch: 19   Global Step: 205310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:32:40,559-Speed 5955.50 samples/sec   Loss 1.5026   LearningRate 0.0000   Epoch: 19   Global Step: 205320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:32:47,406-Speed 5982.80 samples/sec   Loss 1.5175   LearningRate 0.0000   Epoch: 19   Global Step: 205330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:32:54,274-Speed 5965.31 samples/sec   Loss 1.5022   LearningRate 0.0000   Epoch: 19   Global Step: 205340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:33:01,121-Speed 5983.29 samples/sec   Loss 1.5004   LearningRate 0.0000   Epoch: 19   Global Step: 205350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:33:07,975-Speed 5977.47 samples/sec   Loss 1.4864   LearningRate 0.0000   Epoch: 19   Global Step: 205360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:33:14,847-Speed 5965.03 samples/sec   Loss 1.4962   LearningRate 0.0000   Epoch: 19   Global Step: 205370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:33:21,706-Speed 5973.15 samples/sec   Loss 1.4815   LearningRate 0.0000   Epoch: 19   Global Step: 205380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:33:28,581-Speed 5959.44 samples/sec   Loss 1.4920   LearningRate 0.0000   Epoch: 19   Global Step: 205390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:33:35,429-Speed 5982.26 samples/sec   Loss 1.4983   LearningRate 0.0000   Epoch: 19   Global Step: 205400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:33:42,302-Speed 5960.51 samples/sec   Loss 1.4871   LearningRate 0.0000   Epoch: 19   Global Step: 205410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:33:49,156-Speed 5977.64 samples/sec   Loss 1.5182   LearningRate 0.0000   Epoch: 19   Global Step: 205420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:33:56,012-Speed 5975.44 samples/sec   Loss 1.4971   LearningRate 0.0000   Epoch: 19   Global Step: 205430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:34:02,867-Speed 5976.97 samples/sec   Loss 1.4812   LearningRate 0.0000   Epoch: 19   Global Step: 205440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:34:09,725-Speed 5973.77 samples/sec   Loss 1.4661   LearningRate 0.0000   Epoch: 19   Global Step: 205450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:34:16,579-Speed 5976.58 samples/sec   Loss 1.4916   LearningRate 0.0000   Epoch: 19   Global Step: 205460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:34:23,463-Speed 5952.00 samples/sec   Loss 1.5071   LearningRate 0.0000   Epoch: 19   Global Step: 205470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:34:30,316-Speed 5978.32 samples/sec   Loss 1.4695   LearningRate 0.0000   Epoch: 19   Global Step: 205480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:34:37,192-Speed 5957.29 samples/sec   Loss 1.4829   LearningRate 0.0000   Epoch: 19   Global Step: 205490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:34:44,053-Speed 5971.72 samples/sec   Loss 1.4960   LearningRate 0.0000   Epoch: 19   Global Step: 205500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:34:50,936-Speed 5951.61 samples/sec   Loss 1.4594   LearningRate 0.0000   Epoch: 19   Global Step: 205510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:34:57,796-Speed 5972.06 samples/sec   Loss 1.4759   LearningRate 0.0000   Epoch: 19   Global Step: 205520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:35:04,648-Speed 5978.88 samples/sec   Loss 1.4655   LearningRate 0.0000   Epoch: 19   Global Step: 205530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:35:11,499-Speed 5981.85 samples/sec   Loss 1.4785   LearningRate 0.0000   Epoch: 19   Global Step: 205540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:35:18,365-Speed 5966.63 samples/sec   Loss 1.4987   LearningRate 0.0000   Epoch: 19   Global Step: 205550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:35:25,234-Speed 5964.77 samples/sec   Loss 1.4826   LearningRate 0.0000   Epoch: 19   Global Step: 205560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:35:32,103-Speed 5964.19 samples/sec   Loss 1.5212   LearningRate 0.0000   Epoch: 19   Global Step: 205570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:35:38,950-Speed 5983.39 samples/sec   Loss 1.4603   LearningRate 0.0000   Epoch: 19   Global Step: 205580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:35:45,811-Speed 5970.68 samples/sec   Loss 1.5029   LearningRate 0.0000   Epoch: 19   Global Step: 205590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:35:52,656-Speed 5985.40 samples/sec   Loss 1.5123   LearningRate 0.0000   Epoch: 19   Global Step: 205600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:35:59,531-Speed 5958.96 samples/sec   Loss 1.4964   LearningRate 0.0000   Epoch: 19   Global Step: 205610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:36:06,396-Speed 5967.68 samples/sec   Loss 1.5054   LearningRate 0.0000   Epoch: 19   Global Step: 205620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:36:13,259-Speed 5978.25 samples/sec   Loss 1.5129   LearningRate 0.0000   Epoch: 19   Global Step: 205630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:36:20,127-Speed 5964.65 samples/sec   Loss 1.4915   LearningRate 0.0000   Epoch: 19   Global Step: 205640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:36:27,007-Speed 5967.78 samples/sec   Loss 1.5049   LearningRate 0.0000   Epoch: 19   Global Step: 205650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:36:33,856-Speed 5980.98 samples/sec   Loss 1.5230   LearningRate 0.0000   Epoch: 19   Global Step: 205660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:36:40,713-Speed 5976.71 samples/sec   Loss 1.4931   LearningRate 0.0000   Epoch: 19   Global Step: 205670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:36:47,566-Speed 5978.36 samples/sec   Loss 1.4989   LearningRate 0.0000   Epoch: 19   Global Step: 205680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:36:54,419-Speed 5979.68 samples/sec   Loss 1.4827   LearningRate 0.0000   Epoch: 19   Global Step: 205690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:37:01,294-Speed 5959.20 samples/sec   Loss 1.4998   LearningRate 0.0000   Epoch: 19   Global Step: 205700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:37:08,157-Speed 5968.77 samples/sec   Loss 1.5013   LearningRate 0.0000   Epoch: 19   Global Step: 205710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:37:15,003-Speed 5984.25 samples/sec   Loss 1.4782   LearningRate 0.0000   Epoch: 19   Global Step: 205720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:37:21,869-Speed 5966.76 samples/sec   Loss 1.4844   LearningRate 0.0000   Epoch: 19   Global Step: 205730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:37:28,745-Speed 5961.39 samples/sec   Loss 1.4913   LearningRate 0.0000   Epoch: 19   Global Step: 205740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:37:35,602-Speed 5974.74 samples/sec   Loss 1.4935   LearningRate 0.0000   Epoch: 19   Global Step: 205750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:37:42,457-Speed 5976.92 samples/sec   Loss 1.4598   LearningRate 0.0000   Epoch: 19   Global Step: 205760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:37:49,316-Speed 5972.36 samples/sec   Loss 1.4909   LearningRate 0.0000   Epoch: 19   Global Step: 205770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:37:56,160-Speed 5985.73 samples/sec   Loss 1.5134   LearningRate 0.0000   Epoch: 19   Global Step: 205780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:38:03,027-Speed 5966.38 samples/sec   Loss 1.4926   LearningRate 0.0000   Epoch: 19   Global Step: 205790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:38:09,877-Speed 5981.09 samples/sec   Loss 1.5133   LearningRate 0.0000   Epoch: 19   Global Step: 205800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:38:16,751-Speed 5959.67 samples/sec   Loss 1.4765   LearningRate 0.0000   Epoch: 19   Global Step: 205810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:38:23,749-Speed 5853.72 samples/sec   Loss 1.4968   LearningRate 0.0000   Epoch: 19   Global Step: 205820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:38:30,720-Speed 5877.81 samples/sec   Loss 1.4725   LearningRate 0.0000   Epoch: 19   Global Step: 205830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:38:37,580-Speed 5972.03 samples/sec   Loss 1.4962   LearningRate 0.0000   Epoch: 19   Global Step: 205840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:38:44,440-Speed 5971.82 samples/sec   Loss 1.4653   LearningRate 0.0000   Epoch: 19   Global Step: 205850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:38:51,289-Speed 5981.87 samples/sec   Loss 1.4577   LearningRate 0.0000   Epoch: 19   Global Step: 205860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:38:58,166-Speed 5957.41 samples/sec   Loss 1.4573   LearningRate 0.0000   Epoch: 19   Global Step: 205870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:39:05,117-Speed 5894.10 samples/sec   Loss 1.4899   LearningRate 0.0000   Epoch: 19   Global Step: 205880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:39:12,074-Speed 5888.79 samples/sec   Loss 1.4717   LearningRate 0.0000   Epoch: 19   Global Step: 205890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:39:19,028-Speed 5890.92 samples/sec   Loss 1.4909   LearningRate 0.0000   Epoch: 19   Global Step: 205900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:39:25,885-Speed 5975.63 samples/sec   Loss 1.4900   LearningRate 0.0000   Epoch: 19   Global Step: 205910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:39:32,746-Speed 5971.40 samples/sec   Loss 1.4908   LearningRate 0.0000   Epoch: 19   Global Step: 205920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:39:39,627-Speed 5952.89 samples/sec   Loss 1.5082   LearningRate 0.0000   Epoch: 19   Global Step: 205930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:39:46,474-Speed 5984.41 samples/sec   Loss 1.4709   LearningRate 0.0000   Epoch: 19   Global Step: 205940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:39:53,365-Speed 5944.30 samples/sec   Loss 1.4643   LearningRate 0.0000   Epoch: 19   Global Step: 205950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:40:00,223-Speed 5974.03 samples/sec   Loss 1.4800   LearningRate 0.0000   Epoch: 19   Global Step: 205960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:40:07,097-Speed 5959.94 samples/sec   Loss 1.4699   LearningRate 0.0000   Epoch: 19   Global Step: 205970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:40:13,944-Speed 5983.64 samples/sec   Loss 1.4905   LearningRate 0.0000   Epoch: 19   Global Step: 205980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:40:20,787-Speed 5987.04 samples/sec   Loss 1.4545   LearningRate 0.0000   Epoch: 19   Global Step: 205990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:40:27,649-Speed 5969.76 samples/sec   Loss 1.4700   LearningRate 0.0000   Epoch: 19   Global Step: 206000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:40:34,506-Speed 5975.27 samples/sec   Loss 1.4906   LearningRate 0.0000   Epoch: 19   Global Step: 206010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:40:41,358-Speed 5978.10 samples/sec   Loss 1.4983   LearningRate 0.0000   Epoch: 19   Global Step: 206020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:40:48,206-Speed 5982.27 samples/sec   Loss 1.4707   LearningRate 0.0000   Epoch: 19   Global Step: 206030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:40:55,066-Speed 5971.75 samples/sec   Loss 1.4682   LearningRate 0.0000   Epoch: 19   Global Step: 206040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:41:01,918-Speed 5980.02 samples/sec   Loss 1.5043   LearningRate 0.0000   Epoch: 19   Global Step: 206050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:41:08,759-Speed 5988.97 samples/sec   Loss 1.4912   LearningRate 0.0000   Epoch: 19   Global Step: 206060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:41:15,602-Speed 5986.44 samples/sec   Loss 1.4945   LearningRate 0.0000   Epoch: 19   Global Step: 206070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:41:22,441-Speed 5990.13 samples/sec   Loss 1.4782   LearningRate 0.0000   Epoch: 19   Global Step: 206080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:41:29,308-Speed 5966.50 samples/sec   Loss 1.4757   LearningRate 0.0000   Epoch: 19   Global Step: 206090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:41:36,182-Speed 5959.82 samples/sec   Loss 1.4916   LearningRate 0.0000   Epoch: 19   Global Step: 206100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:41:43,033-Speed 5979.48 samples/sec   Loss 1.4970   LearningRate 0.0000   Epoch: 19   Global Step: 206110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:41:49,893-Speed 5972.46 samples/sec   Loss 1.4619   LearningRate 0.0000   Epoch: 19   Global Step: 206120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:41:56,746-Speed 5977.94 samples/sec   Loss 1.4895   LearningRate 0.0000   Epoch: 19   Global Step: 206130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:42:10,589-Speed 2959.18 samples/sec   Loss 1.4999   LearningRate 0.0000   Epoch: 19   Global Step: 206140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:42:17,489-Speed 5939.98 samples/sec   Loss 1.4825   LearningRate 0.0000   Epoch: 19   Global Step: 206150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:42:24,332-Speed 5986.81 samples/sec   Loss 1.4730   LearningRate 0.0000   Epoch: 19   Global Step: 206160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:42:31,182-Speed 5980.54 samples/sec   Loss 1.4823   LearningRate 0.0000   Epoch: 19   Global Step: 206170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:42:38,050-Speed 5965.56 samples/sec   Loss 1.5093   LearningRate 0.0000   Epoch: 19   Global Step: 206180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:42:44,903-Speed 5978.11 samples/sec   Loss 1.4835   LearningRate 0.0000   Epoch: 19   Global Step: 206190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:42:51,760-Speed 5977.51 samples/sec   Loss 1.4997   LearningRate 0.0000   Epoch: 19   Global Step: 206200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:42:58,612-Speed 5978.94 samples/sec   Loss 1.5004   LearningRate 0.0000   Epoch: 19   Global Step: 206210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:43:05,483-Speed 5962.19 samples/sec   Loss 1.4868   LearningRate 0.0000   Epoch: 19   Global Step: 206220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:43:12,333-Speed 5980.92 samples/sec   Loss 1.4973   LearningRate 0.0000   Epoch: 19   Global Step: 206230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:43:19,198-Speed 5968.15 samples/sec   Loss 1.4981   LearningRate 0.0000   Epoch: 19   Global Step: 206240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:43:26,067-Speed 5963.87 samples/sec   Loss 1.5058   LearningRate 0.0000   Epoch: 19   Global Step: 206250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:43:32,926-Speed 5973.31 samples/sec   Loss 1.4919   LearningRate 0.0000   Epoch: 19   Global Step: 206260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:43:39,774-Speed 5982.58 samples/sec   Loss 1.4904   LearningRate 0.0000   Epoch: 19   Global Step: 206270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:43:46,636-Speed 5970.42 samples/sec   Loss 1.4766   LearningRate 0.0000   Epoch: 19   Global Step: 206280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:43:53,508-Speed 5961.48 samples/sec   Loss 1.4766   LearningRate 0.0000   Epoch: 19   Global Step: 206290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:44:00,371-Speed 5969.54 samples/sec   Loss 1.4807   LearningRate 0.0000   Epoch: 19   Global Step: 206300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:44:07,230-Speed 5973.64 samples/sec   Loss 1.4793   LearningRate 0.0000   Epoch: 19   Global Step: 206310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:44:14,078-Speed 5981.73 samples/sec   Loss 1.4798   LearningRate 0.0000   Epoch: 19   Global Step: 206320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:44:20,945-Speed 5966.44 samples/sec   Loss 1.4869   LearningRate 0.0000   Epoch: 19   Global Step: 206330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:44:27,798-Speed 5978.63 samples/sec   Loss 1.4891   LearningRate 0.0000   Epoch: 19   Global Step: 206340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:44:34,659-Speed 5971.11 samples/sec   Loss 1.4529   LearningRate 0.0000   Epoch: 19   Global Step: 206350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:44:41,507-Speed 5982.00 samples/sec   Loss 1.5222   LearningRate 0.0000   Epoch: 19   Global Step: 206360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:44:48,402-Speed 5941.90 samples/sec   Loss 1.4848   LearningRate 0.0000   Epoch: 19   Global Step: 206370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:44:55,260-Speed 5973.59 samples/sec   Loss 1.4902   LearningRate 0.0000   Epoch: 19   Global Step: 206380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:45:02,133-Speed 5961.30 samples/sec   Loss 1.4923   LearningRate 0.0000   Epoch: 19   Global Step: 206390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:45:08,991-Speed 5974.73 samples/sec   Loss 1.4686   LearningRate 0.0000   Epoch: 19   Global Step: 206400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:45:15,846-Speed 5976.10 samples/sec   Loss 1.5141   LearningRate 0.0000   Epoch: 19   Global Step: 206410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:45:22,695-Speed 5981.78 samples/sec   Loss 1.4679   LearningRate 0.0000   Epoch: 19   Global Step: 206420   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 12:45:29,548-Speed 5979.89 samples/sec   Loss 1.4586   LearningRate 0.0000   Epoch: 19   Global Step: 206430   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 12:45:36,435-Speed 5948.19 samples/sec   Loss 1.4692   LearningRate 0.0000   Epoch: 19   Global Step: 206440   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 12:45:43,295-Speed 5972.06 samples/sec   Loss 1.4942   LearningRate 0.0000   Epoch: 19   Global Step: 206450   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 12:45:50,165-Speed 5963.15 samples/sec   Loss 1.4813   LearningRate 0.0000   Epoch: 19   Global Step: 206460   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 12:45:57,028-Speed 5969.49 samples/sec   Loss 1.4887   LearningRate 0.0000   Epoch: 19   Global Step: 206470   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 12:46:03,906-Speed 5958.65 samples/sec   Loss 1.5010   LearningRate 0.0000   Epoch: 19   Global Step: 206480   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 12:46:10,753-Speed 5983.68 samples/sec   Loss 1.4597   LearningRate 0.0000   Epoch: 19   Global Step: 206490   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 12:46:17,603-Speed 5980.35 samples/sec   Loss 1.4771   LearningRate 0.0000   Epoch: 19   Global Step: 206500   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 12:46:24,450-Speed 5983.20 samples/sec   Loss 1.4856   LearningRate 0.0000   Epoch: 19   Global Step: 206510   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-01-09 12:46:31,338-Speed 5947.74 samples/sec   Loss 1.4824   LearningRate 0.0000   Epoch: 19   Global Step: 206520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:46:38,203-Speed 5968.04 samples/sec   Loss 1.5069   LearningRate 0.0000   Epoch: 19   Global Step: 206530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:46:45,064-Speed 5970.30 samples/sec   Loss 1.4820   LearningRate 0.0000   Epoch: 19   Global Step: 206540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:46:51,922-Speed 5974.22 samples/sec   Loss 1.4859   LearningRate 0.0000   Epoch: 19   Global Step: 206550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:46:58,788-Speed 5966.65 samples/sec   Loss 1.4710   LearningRate 0.0000   Epoch: 19   Global Step: 206560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:47:05,653-Speed 5967.31 samples/sec   Loss 1.4852   LearningRate 0.0000   Epoch: 19   Global Step: 206570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:47:12,513-Speed 5972.62 samples/sec   Loss 1.4961   LearningRate 0.0000   Epoch: 19   Global Step: 206580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:47:19,374-Speed 5970.56 samples/sec   Loss 1.4798   LearningRate 0.0000   Epoch: 19   Global Step: 206590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:47:26,233-Speed 5972.58 samples/sec   Loss 1.5281   LearningRate 0.0000   Epoch: 19   Global Step: 206600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:47:33,083-Speed 5980.84 samples/sec   Loss 1.4687   LearningRate 0.0000   Epoch: 19   Global Step: 206610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:47:40,118-Speed 5825.99 samples/sec   Loss 1.4895   LearningRate 0.0000   Epoch: 19   Global Step: 206620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:47:47,086-Speed 5879.85 samples/sec   Loss 1.4781   LearningRate 0.0000   Epoch: 19   Global Step: 206630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:47:53,924-Speed 5990.98 samples/sec   Loss 1.5233   LearningRate 0.0000   Epoch: 19   Global Step: 206640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:48:00,778-Speed 5976.96 samples/sec   Loss 1.4756   LearningRate 0.0000   Epoch: 19   Global Step: 206650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:48:07,619-Speed 5989.08 samples/sec   Loss 1.4716   LearningRate 0.0000   Epoch: 19   Global Step: 206660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:48:14,477-Speed 5973.64 samples/sec   Loss 1.4619   LearningRate 0.0000   Epoch: 19   Global Step: 206670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:48:21,339-Speed 5970.23 samples/sec   Loss 1.4790   LearningRate 0.0000   Epoch: 19   Global Step: 206680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:48:28,203-Speed 5968.52 samples/sec   Loss 1.4912   LearningRate 0.0000   Epoch: 19   Global Step: 206690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:48:35,058-Speed 5975.78 samples/sec   Loss 1.4995   LearningRate 0.0000   Epoch: 19   Global Step: 206700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:48:41,938-Speed 5954.68 samples/sec   Loss 1.4879   LearningRate 0.0000   Epoch: 19   Global Step: 206710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:48:48,809-Speed 5962.72 samples/sec   Loss 1.4759   LearningRate 0.0000   Epoch: 19   Global Step: 206720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:48:55,687-Speed 5956.50 samples/sec   Loss 1.4704   LearningRate 0.0000   Epoch: 19   Global Step: 206730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:49:02,535-Speed 5981.99 samples/sec   Loss 1.4970   LearningRate 0.0000   Epoch: 19   Global Step: 206740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:49:09,388-Speed 5978.42 samples/sec   Loss 1.5044   LearningRate 0.0000   Epoch: 19   Global Step: 206750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:49:16,236-Speed 5982.54 samples/sec   Loss 1.4825   LearningRate 0.0000   Epoch: 19   Global Step: 206760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:49:23,203-Speed 5879.66 samples/sec   Loss 1.4891   LearningRate 0.0000   Epoch: 19   Global Step: 206770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:49:30,045-Speed 5987.62 samples/sec   Loss 1.4959   LearningRate 0.0000   Epoch: 19   Global Step: 206780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:49:36,897-Speed 5978.98 samples/sec   Loss 1.4690   LearningRate 0.0000   Epoch: 19   Global Step: 206790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:49:43,743-Speed 5984.16 samples/sec   Loss 1.4949   LearningRate 0.0000   Epoch: 19   Global Step: 206800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:49:50,621-Speed 5956.05 samples/sec   Loss 1.4791   LearningRate 0.0000   Epoch: 19   Global Step: 206810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:49:57,487-Speed 5967.19 samples/sec   Loss 1.4862   LearningRate 0.0000   Epoch: 19   Global Step: 206820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:50:04,342-Speed 5976.03 samples/sec   Loss 1.5036   LearningRate 0.0000   Epoch: 19   Global Step: 206830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:50:11,215-Speed 5961.84 samples/sec   Loss 1.4929   LearningRate 0.0000   Epoch: 19   Global Step: 206840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:50:18,066-Speed 5979.12 samples/sec   Loss 1.4767   LearningRate 0.0000   Epoch: 19   Global Step: 206850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:50:24,915-Speed 5981.60 samples/sec   Loss 1.4572   LearningRate 0.0000   Epoch: 19   Global Step: 206860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:50:31,782-Speed 5965.97 samples/sec   Loss 1.5182   LearningRate 0.0000   Epoch: 19   Global Step: 206870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:50:38,637-Speed 5976.10 samples/sec   Loss 1.4795   LearningRate 0.0000   Epoch: 19   Global Step: 206880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:50:45,515-Speed 5957.22 samples/sec   Loss 1.4759   LearningRate 0.0000   Epoch: 19   Global Step: 206890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:50:52,376-Speed 5970.58 samples/sec   Loss 1.4836   LearningRate 0.0000   Epoch: 19   Global Step: 206900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:50:59,231-Speed 5977.41 samples/sec   Loss 1.4887   LearningRate 0.0000   Epoch: 19   Global Step: 206910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:51:06,088-Speed 5973.84 samples/sec   Loss 1.4588   LearningRate 0.0000   Epoch: 19   Global Step: 206920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:51:12,935-Speed 5983.42 samples/sec   Loss 1.4833   LearningRate 0.0000   Epoch: 19   Global Step: 206930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:51:19,789-Speed 5977.54 samples/sec   Loss 1.5066   LearningRate 0.0000   Epoch: 19   Global Step: 206940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:51:26,634-Speed 5985.22 samples/sec   Loss 1.4956   LearningRate 0.0000   Epoch: 19   Global Step: 206950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:51:33,508-Speed 5967.09 samples/sec   Loss 1.4967   LearningRate 0.0000   Epoch: 19   Global Step: 206960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:51:40,367-Speed 5972.29 samples/sec   Loss 1.5012   LearningRate 0.0000   Epoch: 19   Global Step: 206970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:51:47,231-Speed 5968.38 samples/sec   Loss 1.4477   LearningRate 0.0000   Epoch: 19   Global Step: 206980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:51:54,092-Speed 5971.50 samples/sec   Loss 1.4964   LearningRate 0.0000   Epoch: 19   Global Step: 206990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:52:00,943-Speed 5979.71 samples/sec   Loss 1.4602   LearningRate 0.0000   Epoch: 19   Global Step: 207000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:52:07,814-Speed 5962.82 samples/sec   Loss 1.4929   LearningRate 0.0000   Epoch: 19   Global Step: 207010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:52:14,673-Speed 5972.97 samples/sec   Loss 1.5120   LearningRate 0.0000   Epoch: 19   Global Step: 207020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:52:21,540-Speed 5965.46 samples/sec   Loss 1.4648   LearningRate 0.0000   Epoch: 19   Global Step: 207030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:52:28,418-Speed 5956.14 samples/sec   Loss 1.4635   LearningRate 0.0000   Epoch: 19   Global Step: 207040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:52:35,264-Speed 5984.49 samples/sec   Loss 1.4955   LearningRate 0.0000   Epoch: 19   Global Step: 207050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:52:42,132-Speed 5965.01 samples/sec   Loss 1.5012   LearningRate 0.0000   Epoch: 19   Global Step: 207060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:52:48,992-Speed 5971.65 samples/sec   Loss 1.4935   LearningRate 0.0000   Epoch: 19   Global Step: 207070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:52:55,838-Speed 5984.38 samples/sec   Loss 1.4688   LearningRate 0.0000   Epoch: 19   Global Step: 207080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:53:02,695-Speed 5975.08 samples/sec   Loss 1.4872   LearningRate 0.0000   Epoch: 19   Global Step: 207090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:53:09,555-Speed 5974.66 samples/sec   Loss 1.4717   LearningRate 0.0000   Epoch: 19   Global Step: 207100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:53:16,394-Speed 5990.38 samples/sec   Loss 1.5151   LearningRate 0.0000   Epoch: 19   Global Step: 207110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:53:23,236-Speed 5987.94 samples/sec   Loss 1.4629   LearningRate 0.0000   Epoch: 19   Global Step: 207120   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:53:30,106-Speed 5965.15 samples/sec   Loss 1.4757   LearningRate 0.0000   Epoch: 19   Global Step: 207130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:53:36,969-Speed 5969.58 samples/sec   Loss 1.4917   LearningRate 0.0000   Epoch: 19   Global Step: 207140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:53:43,844-Speed 5959.78 samples/sec   Loss 1.5104   LearningRate 0.0000   Epoch: 19   Global Step: 207150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:53:50,696-Speed 5978.73 samples/sec   Loss 1.4786   LearningRate 0.0000   Epoch: 19   Global Step: 207160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:53:57,552-Speed 5975.63 samples/sec   Loss 1.4782   LearningRate 0.0000   Epoch: 19   Global Step: 207170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:54:04,400-Speed 5982.11 samples/sec   Loss 1.4684   LearningRate 0.0000   Epoch: 19   Global Step: 207180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:54:11,266-Speed 5967.61 samples/sec   Loss 1.4684   LearningRate 0.0000   Epoch: 19   Global Step: 207190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:54:18,112-Speed 5984.26 samples/sec   Loss 1.4896   LearningRate 0.0000   Epoch: 19   Global Step: 207200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:54:24,985-Speed 5960.04 samples/sec   Loss 1.5111   LearningRate 0.0000   Epoch: 19   Global Step: 207210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:54:31,842-Speed 5977.17 samples/sec   Loss 1.5104   LearningRate 0.0000   Epoch: 19   Global Step: 207220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:54:38,717-Speed 5958.50 samples/sec   Loss 1.4772   LearningRate 0.0000   Epoch: 19   Global Step: 207230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:54:45,588-Speed 5962.89 samples/sec   Loss 1.4821   LearningRate 0.0000   Epoch: 19   Global Step: 207240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:54:52,458-Speed 5966.08 samples/sec   Loss 1.4929   LearningRate 0.0000   Epoch: 19   Global Step: 207250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:54:59,335-Speed 5956.85 samples/sec   Loss 1.4945   LearningRate 0.0000   Epoch: 19   Global Step: 207260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:55:06,206-Speed 5962.83 samples/sec   Loss 1.5052   LearningRate 0.0000   Epoch: 19   Global Step: 207270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:55:13,052-Speed 5983.66 samples/sec   Loss 1.4985   LearningRate 0.0000   Epoch: 19   Global Step: 207280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-01-09 12:55:19,893-Speed 5990.57 samples/sec   Loss 1.4743   LearningRate 0.0000   Epoch: 19   Global Step: 207290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:55:26,748-Speed 5979.68 samples/sec   Loss 1.4865   LearningRate 0.0000   Epoch: 19   Global Step: 207300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:55:33,616-Speed 5964.71 samples/sec   Loss 1.4767   LearningRate 0.0000   Epoch: 19   Global Step: 207310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:55:40,478-Speed 5970.87 samples/sec   Loss 1.4929   LearningRate 0.0000   Epoch: 19   Global Step: 207320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:55:47,326-Speed 5983.69 samples/sec   Loss 1.5018   LearningRate 0.0000   Epoch: 19   Global Step: 207330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:55:54,211-Speed 5954.38 samples/sec   Loss 1.4981   LearningRate 0.0000   Epoch: 19   Global Step: 207340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:56:01,056-Speed 5984.53 samples/sec   Loss 1.4933   LearningRate 0.0000   Epoch: 19   Global Step: 207350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:56:07,912-Speed 5975.63 samples/sec   Loss 1.4956   LearningRate 0.0000   Epoch: 19   Global Step: 207360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:56:14,765-Speed 5978.30 samples/sec   Loss 1.5111   LearningRate 0.0000   Epoch: 19   Global Step: 207370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-01-09 12:56:21,627-Speed 5972.10 samples/sec   Loss 1.5128   LearningRate 0.0000   Epoch: 19   Global Step: 207380   Fp16 Grad Scale: 32768   Required: -0 hours
Training: 2022-01-09 12:56:28,484-Speed 5975.01 samples/sec   Loss 1.4906   LearningRate 0.0000   Epoch: 19   Global Step: 207390   Fp16 Grad Scale: 65536   Required: -0 hours